Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback Paper • 2204.05862 • Published Apr 12, 2022 • 3
Article Introducing OptiMind, a research model designed for optimization 19 days ago • 33
Sparse Auto-Encoders (SAEs) for Mechanistic Interpretability Collection A compilation of sparse auto-encoders trained on large language models. • 37 items • Updated Dec 16, 2025 • 23
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints Paper • 2601.18137 • Published 9 days ago • 24
ECO: Quantized Training without Full-Precision Master Weights Paper • 2601.22101 • Published 5 days ago • 6
World events Collection A dataset of real-world events from 2023 to the present • 3 items • Updated 9 days ago • 5
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits Paper • 2512.20578 • Published Dec 23, 2025 • 85
A^3-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation Paper • 2601.09274 • Published 21 days ago • 84
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos Paper • 2601.00393 • Published Jan 1 • 130
LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published 28 days ago • 146
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 26 days ago • 218
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence Paper • 2401.14196 • Published Jan 25, 2024 • 70
Perplexed by Quality: A Perplexity-based Method for Adult and Harmful Content Detection in Multilingual Heterogeneous Web Data Paper • 2212.10440 • Published Dec 20, 2022 • 1