📝 Publications

🗡️🛡️ Jailbreak Attacks and Defenses

ACL 2025

Hierarchical Safety Realignment: Lightweight Restoration of Safety in Pruned Large Vision-Language Models

Yue Li*, Xin Yi*, Dongsheng Shi, Gerard de Melo, Xiaoling Wang, and Linlin Wang†.

arXiv | Project | ACL Anthology

  • Current pruning methods significantly degrade a model’s safety at higher sparsity levels.
  • Our proposed Hierarchical Safety Realignment (HSR) method realigns the safety of a pruned model by restoring only a very small number of neurons. HSR is effective for both LLMs and LVLMs.

📄🔍 Watermarks and Fingerprints