Learning Unmasking Policies for Diffusion Language Models Paper • 2512.09106 • Published 6 days ago • 6
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence Paper • 2511.07384 • Published Nov 10 • 16
smcleish/Recurrent-TinyLlama-3T-train-recurrence-4-two-phase Text Generation • 0.8B • Updated Nov 11 • 7
smcleish/Recurrent-TinyLlama-3T-train-recurrence-4-single-phase Text Generation • 0.8B • Updated Nov 11 • 8