Publications
Jailbreak Attacks and Defenses
ACL 2025

Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models, Yue Li*, Xin Yi*, Dongsheng Shi, Gerard de Melo, Xiaoling Wang and Linlin Wang†.
arXiv | Project | ACL Anthology
- Current pruning methods lead to a significant degradation of the model's safety at higher sparsity.
- Our proposed HSR (Hierarchical Safety Realignment) method realigns the safety of a pruned model by restoring only a very small number of neurons. HSR is effective for both LLMs and LVLMs.
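The idea described above can be illustrated with a minimal sketch, assuming a simplified setup in which pruning has zeroed out weight rows and a small set of safety-critical neurons is already identified (the function and layer names below are hypothetical, not the paper's actual implementation):

```python
# Hypothetical sketch of the restoration step: after pruning zeroes out
# weights, copy back the original values for a small set of
# safety-critical neurons only, leaving all other pruned weights at zero.

def restore_safety_neurons(pruned, original, safety_neurons):
    """pruned/original: {layer_name: [[float]]} weight matrices (row = neuron).
    safety_neurons: {layer_name: [row_index]} neurons deemed safety-critical."""
    restored = {layer: [row[:] for row in mat] for layer, mat in pruned.items()}
    for layer, rows in safety_neurons.items():
        for r in rows:
            restored[layer][r] = original[layer][r][:]
    return restored

# Toy example: a 3-neuron layer fully zeroed by pruning; only neuron 1 is
# marked safety-critical, so only its weights are restored.
original = {"mlp.0": [[0.5, -0.2], [1.0, 0.3], [-0.7, 0.9]]}
pruned = {"mlp.0": [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]}
restored = restore_safety_neurons(pruned, original, {"mlp.0": [1]})
```

This keeps the sparsity benefit of pruning almost intact while reinstating the few weights that matter for safety behavior.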
ESWA 2026
Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks, Xin Yi, Yue Li, Dongsheng Shi, Linlin Wang†, Xiaoling Wang and Liang He.
Watermarks and Fingerprints
Preprint
From Evaluation to Defense: Constructing Persistent Edit-Based Fingerprints for Large Language Models, Yue Li*, Xin Yi*, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Xiaoling Wang and Linlin Wang†.
KBS 2025
Unified Attacks to Large Language Model Watermarks: Spoofing and Scrubbing in Unauthorized Knowledge Distillation, Xin Yi, Yue Li, Shunfan Zheng, Linlin Wang†, Xiaoling Wang and Liang He.