Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models Paper • 2601.14004 • Published 16 days ago • 46
Beyond Over-Refusal: Scenario-Based Diagnostics and Post-Hoc Mitigation for Exaggerated Refusals in LLMs Paper • 2510.08158 • Published Oct 9, 2025 • 1
LLM in the Loop: Creating the PARADEHATE Dataset for Hate Speech Detoxification Paper • 2506.01484 • Published Jun 2, 2025 • 6
Graph-Guided Textual Explanation Generation Framework Paper • 2412.12318 • Published Dec 16, 2024 • 4
Hallucinations Can Improve Large Language Models in Drug Discovery Paper • 2501.13824 • Published Jan 23, 2025 • 10