Publications
Jailbreak Attacks and Defenses
ACL 2025

Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models, Yue Li*, Xin Yi*, Dongsheng Shi, Gerard de Melo, Xiaoling Wang and Linlin Wang†.
arXiv | Project | ACL Anthology
- Current pruning methods lead to a significant degradation of the model's safety at higher sparsity.
- Our proposed HSR (Hierarchical Safety Realignment) method realigns the safety of a pruned model by restoring only a very small number of neurons. HSR is effective for both LLMs and LVLMs.
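The idea described above can be illustrated with a minimal sketch, assuming a simplified setup in which pruning has zeroed out weight rows and a small set of safety-critical neurons is already identified (the function and layer names below are hypothetical, not the paper's actual implementation):

```python
# Hypothetical sketch of the restoration step: after pruning zeroes out
# weights, copy back the original values for a small set of
# safety-critical neurons only, leaving all other pruned weights at zero.

def restore_safety_neurons(pruned, original, safety_neurons):
    """pruned/original: {layer_name: [[float]]} weight matrices (row = neuron).
    safety_neurons: {layer_name: [row_index]} neurons deemed safety-critical."""
    restored = {layer: [row[:] for row in mat] for layer, mat in pruned.items()}
    for layer, rows in safety_neurons.items():
        for r in rows:
            restored[layer][r] = original[layer][r][:]
    return restored

# Toy example: a 3-neuron layer fully zeroed by pruning; only neuron 1 is
# marked safety-critical, so only its weights are restored.
original = {"mlp.0": [[0.5, -0.2], [1.0, 0.3], [-0.7, 0.9]]}
pruned = {"mlp.0": [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]}
restored = restore_safety_neurons(pruned, original, {"mlp.0": [1]})
```

This keeps the sparsity benefit of pruning almost intact while reinstating the few weights that matter for safety behavior.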
ESWA 2026
Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks, Xin Yi, Yue Li, Dongsheng Shi, Linlin Wang†, Xiaoling Wang and Liang He.
Watermarks and Fingerprints
Preprint
From Evaluation to Defense: Constructing Persistent Edit-Based Fingerprints for Large Language Models, Yue Li*, Xin Yi*, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Xiaoling Wang and Linlin Wang†.
KBS 2025
Unified Attacks to Large Language Model Watermarks: Spoofing and Scrubbing in Unauthorized Knowledge Distillation, Xin Yi, Yue Li, Shunfan Zheng, Linlin Wang†, Xiaoling Wang and Liang He.