WUSH: Near-Optimal Adaptive Transforms for LLM Quantization Paper • 2512.00956 • Published Nov 30, 2025 • 23
CAGE: Curvature-Aware Gradient Estimation For Accurate Quantization-Aware Training Paper • 2510.18784 • Published Oct 21, 2025 • 1
Optimizers Qualitatively Alter Solutions And We Should Leverage This Paper • 2507.12224 • Published Jul 16, 2025 • 1
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization Paper • 2509.23202 • Published Sep 27, 2025 • 28
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models Paper • 2408.11743 • Published Aug 21, 2024 • 1
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations Paper • 2502.05003 • Published Feb 7, 2025 • 42
Flow Matching for Discrete Systems: Efficient Free Energy Sampling Across Lattice Sizes and Temperatures Paper • 2503.08063 • Published Mar 11, 2025 • 1
Quartet: Native FP4 Training Can Be Optimal for Large Language Models Paper • 2505.14669 • Published May 20, 2025 • 78
Accurate generation of chemical reaction transition states by conditional flow matching Paper • 2507.10530 • Published Jul 14, 2025 • 1
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm Paper • 2507.18553 • Published Jul 24, 2025 • 41
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence Paper • 2405.15593 • Published May 24, 2024 • 1
SVD-Free Low-Rank Adaptive Gradient Optimization for Large Language Models Paper • 2505.17967 • Published May 23, 2025 • 17
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention Paper • 2504.06261 • Published Apr 8, 2025 • 110