Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense Paper • 2510.07242 • Published Oct 8, 2025 • 30
Understanding Language Prior of LVLMs by Contrasting Chain-of-Embedding Paper • 2509.23050 • Published Sep 27, 2025 • 14
Clean First, Align Later: Benchmarking Preference Data Cleaning for Reliable LLM Alignment Paper • 2509.23564 • Published Sep 28, 2025 • 7
LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals Paper • 2509.21875 • Published Sep 26, 2025 • 9