📝 Publications

🗡️🛡️ Jailbreak Attacks and Defenses

ACL 2025

Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models
Yue Li*, Xin Yi*, Dongsheng Shi, Gerard de Melo, Xiaoling Wang and Linlin Wang.

arXiv | Project | ACL Anthology

  • Current pruning methods significantly degrade model safety at higher sparsity levels.
  • Our proposed HSR (Hierarchical Safety Realignment) method realigns the safety of a pruned model by restoring only a very small number of neurons. HSR is effective for both LLMs and LVLMs.

📄🔍 Model Watermarks and Fingerprints

🗃️📊 Benchmarks

👻💭 Model Hallucinations

  • IJCNN 2026: Process Alignment: Verifiable Knowledge Distillation for Mitigating Hallucinations in Large Language Models. Weicong Ni, Yue Li, Dongsheng Shi and Linlin Wang.