📝 Publications
🗡️🛡️ Jailbreak Attacks and Defenses

Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models \ Yue Li*, Xin Yi*, Dongsheng Shi, Gerard de Melo, Xiaoling Wang and Linlin Wang†.
arXiv | Project | ACL Anthology
- Current pruning methods cause significant degradation of model safety at higher sparsity levels.
- Our proposed HSR (Hierarchical Safety Realignment) method realigns the safety of pruned models by restoring only a very small number of neurons. HSR is effective for both LLMs and LVLMs.
- ESWA 2026: Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks, Xin Yi, Yue Li, Dongsheng Shi, Linlin Wang†, Xiaoling Wang and Liang He.
- Preprint: Unified defense for large language models against jailbreak and fine-tuning attacks in education, Xin Yi, Yue Li, Dongsheng Shi, Linlin Wang†, Xiaoling Wang and Liang He.
📄🔍 Model Watermarks and Fingerprints
- Preprint: AGMark: Attention-Guided Dynamic Watermarking for Large Vision-Language Models, Yue Li*, Xin Yi*, Dongsheng Shi, Yongyi Cui, Gerard de Melo and Linlin Wang†.
- Preprint: From Construction to Injection: Edit-Based Fingerprints for Large Language Models, Yue Li*, Xin Yi*, Dongsheng Shi, Yongyi Cui, Gerard de Melo and Linlin Wang†.
- KBS 2025: Unified Attacks to Large Language Model Watermarks: Spoofing and Scrubbing in Unauthorized Knowledge Distillation, Xin Yi, Yue Li, Shunfan Zheng, Linlin Wang†, Xiaoling Wang and Liang He.
🗃️📊 Benchmarks
- ESWA 2026: Benchmarking Large Language Models for End-to-End Clinical Support in Traditional Chinese Medicine, Dongsheng Shi, Xin Yi, Yue Li and Linlin Wang†.
👻💭 Model Hallucinations
- IJCNN 2026: Process Alignment: Verifiable Knowledge Distillation for Mitigating Hallucinations in Large Language Models, Weicong Ni, Yue Li, Dongsheng Shi and Linlin Wang†.