📝 Publications

🗡️🛡️ Jailbreak Attacks and Defenses

ACL 2025

Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models
Yue Li*, Xin Yi*, Dongsheng Shi, Gerard de Melo, Xiaoling Wang and Linlin Wang.

arXiv | Project | ACL Anthology

  • Current pruning methods significantly degrade model safety at higher sparsity levels.
  • Our proposed HSR (Hierarchical Safety Realignment) method realigns the safety of a pruned model by restoring only a very small number of neurons. HSR is effective for both LLMs and LVLMs.

📄🔍 Model Watermarks and Fingerprints

SIGKDD 2026

AGMark: Attention-Guided Dynamic Watermarking for Large Vision-Language Models
Yue Li*, Xin Yi*, Dongsheng Shi, Yongyi Cui, Gerard de Melo and Linlin Wang.

arXiv | Project

  • We propose AGMark, a watermarking method for LVLMs that follows the red–green token partitioning paradigm.
  • At each step of autoregressive generation, AGMark dynamically identifies candidate token weights and adaptively determines the size of the protected token set, mitigating the trade-off between text quality and watermark detectability.

🤖🎯 Agents

  • Preprint MAE: Continuous Learning–based Medical Multi-Agent System via Experience Mining and Reuse, Dongsheng Shi, Xin Yi, Yue Li, Linlin Wang.

  • Preprint SURGENT: A Surgical Multi-Agent Assistance System Across the Perioperative Workflow, Dongsheng Shi, Yue Li, Xin Yi, Huawei Feng, Linlin Wang.

🗃️📊 Benchmarks

👻💭 Model Hallucinations

  • IJCNN 2026 Process Alignment: Verifiable Knowledge Distillation for Mitigating Hallucinations in Large Language Models, Weicong Ni, Yue Li, Dongsheng Shi, Linlin Wang.