Avoiding Premature Collapse: Adaptive Annealing for Entropy-Regularized Structural Inference
Abstract
Researchers identify and address premature mode collapse in optimal transport-based structural prediction models through an adaptive stability control algorithm that prevents gradient explosions during large-scale training.
Differentiable matching layers and residual connection paradigms, often implemented via entropy-regularized Optimal Transport (OT), serve as critical mechanisms in structural prediction and architectural scaling. However, recovering discrete permutations or maintaining identity mappings by annealing ε → 0 is notoriously unstable. In this work, we identify a fundamental mechanism for this failure: Premature Mode Collapse. By analyzing the non-normal dynamics of the Sinkhorn fixed-point map, we reveal a theoretical thermodynamic speed limit: standard exponential cooling outpaces the contraction rate of the inference operator, which degrades as O(1/ε). To address this, we propose Efficient Piecewise Hybrid Adaptive Stability Control (EPH-ASC), an adaptive scheduling algorithm that monitors the stability of the inference process. We demonstrate that EPH-ASC is essential for stabilizing Manifold-Constrained Hyper-Connections (mHC) during large-scale training on the FineWeb-Edu dataset, effectively preventing late-stage gradient explosions by enforcing a linear stability law.
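For readers unfamiliar with the inference operator being annealed, the following is a minimal log-domain Sinkhorn sketch in NumPy (an illustration of entropy-regularized OT in general, not the paper's implementation). The temperature parameter `eps` is the ε that annealing schedules drive toward zero; as it shrinks, the transport plan sharpens toward a permutation and the fixed-point iteration becomes harder to contract.

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log-sum-exp along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)), axis=axis)

def sinkhorn(cost, eps, n_iters=500, tol=1e-9):
    """Entropy-regularized Sinkhorn iterations in the log domain.

    Returns an approximately doubly-stochastic transport plan for a
    square cost matrix with uniform marginals. Smaller `eps` sharpens
    the plan toward a permutation but slows/destabilizes convergence.
    """
    n = cost.shape[0]
    log_mu = -np.log(n) * np.ones(n)  # uniform row marginal (log space)
    log_nu = -np.log(n) * np.ones(n)  # uniform column marginal
    f = np.zeros(n)                   # dual potentials
    g = np.zeros(n)
    for _ in range(n_iters):
        # Alternating log-domain updates of the dual potentials.
        f = eps * (log_mu - logsumexp((g[None, :] - cost) / eps, axis=1))
        g_new = eps * (log_nu - logsumexp((f[:, None] - cost) / eps, axis=0))
        if np.max(np.abs(g_new - g)) < tol:
            g = g_new
            break
        g = g_new
    # Recover the primal transport plan from the potentials.
    return np.exp((f[:, None] + g[None, :] - cost) / eps)
```

Running this at successively smaller `eps` values (e.g. 1.0, 0.1, 0.01) on the same cost matrix makes the plan visibly converge toward a hard assignment, which is the regime where the paper's stability analysis applies.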
Community
Excited to share my latest work: "Avoiding Premature Collapse: Adaptive Annealing for Entropy-Regularized Structural Inference" (arXiv:2601.23039)
We identify a key failure mode in Sinkhorn annealing (ε → 0): Premature Mode Collapse, caused by a "Thermodynamic Speed Limit": standard exponential schedules violate stability due to O(1/ε) sensitivity and a vanishing spectral gap.
Solution: EPH-ASC (Efficient Piecewise Hybrid Adaptive Stability Control), a lightweight adaptive scheduler that monitors primal drift and triggers a "Thermodynamic Pause" when needed. Overhead is under 0.5%!
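The core control loop is easy to sketch. Below is a hypothetical stability-gated annealer in the spirit of EPH-ASC; the function name, drift metric, and thresholds are illustrative assumptions on my part, not the paper's actual algorithm. The idea: cool ε only when the primal iterate has stopped drifting, and hold the temperature (the "pause") otherwise.

```python
import numpy as np

def adaptive_anneal(step_fn, eps0=1.0, decay=0.95, drift_tol=1e-2, n_steps=100):
    """Hypothetical stability-gated annealing loop (illustrative only).

    `step_fn(eps)` runs one inference step at temperature `eps` and
    returns the current primal iterate so drift can be measured.
    Returns the final temperature and the full schedule history.
    """
    eps = eps0
    prev = step_fn(eps)
    history = [eps]
    for _ in range(n_steps):
        cur = step_fn(eps)
        drift = np.max(np.abs(cur - prev))  # simple sup-norm drift proxy
        # "Thermodynamic Pause": hold the temperature while the iterate
        # is still moving; cool only once it has settled.
        if drift < drift_tol:
            eps *= decay
        prev = cur
        history.append(eps)
    return eps, history
```

Compared with a fixed exponential schedule, the gate makes cooling data-dependent: the schedule automatically slows down exactly where the contraction rate degrades, which is the failure regime the paper identifies.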
Results:
- SPair-71k keypoint matching: 1.6× speedup over Gumbel-Sinkhorn (47 vs 75 epochs to 90%)
- FineWeb-Edu + Nano Gemma w/ mHC: detects instability early and prevents late gradient explosion with a 340-step safety margin
Also built an interactive demo to visualize annealing strategies: https://huggingface.co/spaces/leon0923/torch-sinkhorn-asc-demo
Would love feedback from OT, routing, or mHC folks! How does this resonate with your experiences in large-scale training? @leon0923 (me) happy to discuss / collaborate on integrations.
#OptimalTransport #Sinkhorn #MachineLearning #mHC
arXivLens breakdown of this paper: https://arxivlens.com/PaperView/Details/avoiding-premature-collapse-adaptive-annealing-for-entropy-regularized-structural-inference-4548-ff098f18
- Executive Summary
- Detailed Breakdown
- Practical Applications
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- The Homogeneity Trap: Spectral Collapse in Doubly-Stochastic Deep Networks (2026)
- Rate-Optimal Noise Annealing in Semi-Dual Neural Optimal Transport: Tangential Identifiability, Off-Manifold Ambiguity, and Guaranteed Recovery (2026)
- Smooth, Sparse, and Stable: Finite-Time Exact Skeleton Recovery via Smoothed Proximal Gradients (2026)
- The Geometric Mechanics of Contrastive Representation Learning: Alignment Potentials, Entropic Dispersion, and Cross-Modal Divergence (2026)
- A Unified Matrix-Spectral Framework for Stability and Interpretability in Deep Learning (2026)
- FISMO: Fisher-Structured Momentum-Orthogonalized Optimizer (2026)
- On the Spectral Flattening of Quantized Embeddings (2026)