MOSS-Audio Collection An open-source audio understanding model supporting speech recognition, environmental sound analysis, music understanding, time-aware QA, and complex • 7 items • Updated 3 days ago • 55
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline Paper • 2408.15079 • Published Aug 27, 2024 • 56
Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models Paper • 2603.25750 • Published Mar 20 • 36
Cohere Multilingual ASR 🎙 Transcribe audio clips to text in many languages • Running on CPU Upgrade • Agents • Featured • 108
Voxtral TTS Demo ⚡ Generate realistic speech from text with custom or preset voices • Running • Agents • Featured • 207
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Paper • 2603.23483 • Published Mar 24 • 62
The Synthetic Data Playbook: Generating Trillions of the Finest Tokens 📝 Explore synthetic data experiments on a virtual bookshelf • Running on CPU Upgrade • 231