AI & ML interests

None defined yet.

Recent Activity

Jofthomas posted an update 4 days ago
The new Mistral 3 models are here!

Today, we announce Mistral 3, the next generation of Mistral models. Mistral 3 includes three state-of-the-art small, dense models (14B, 8B, and 3B) and Mistral Large 3, our most capable model to date: a sparse mixture-of-experts model with 41B active and 675B total parameters.

All models are released under the Apache 2.0 license.

Ministrals:
https://huggingface.co/collections/mistralai/ministral-3

Mistral Large 3:
https://huggingface.co/collections/mistralai/mistral-large-3
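
For a quick start, here is a minimal sketch of loading one of the new dense models with transformers; the repo id below is a hypothetical example, so check the collections above for the actual released checkpoints.

```python
# Minimal sketch of loading one of the new dense models with transformers.
# The repo id is a hypothetical placeholder; see the collections above for
# the real checkpoint names and hardware requirements.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Ministral-3-8B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize the Mistral 3 release."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```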
DmitryRyumin posted an update 29 days ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Poster)! 🌟👁️🚀
📄 Title: Is Less More? Exploring Token Condensation as Training-Free Test-Time Adaptation 🔝

📝 Description: Token Condensation as Adaptation (TCA) improves the performance and efficiency of Vision Language Models in zero-shot inference by introducing domain anchor tokens.

👥 Authors: Zixin Wang, Dong Gong, Sen Wang, Zi Huang, Yadan Luo

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation (2410.14729)

📁 Repository: https://github.com/Jo-wang/TCA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to Session 1: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/session-1.md

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #TestTimeAdaptation #TokenCondensation #VisionLanguageModels #TrainingFreeAdaptation #ZeroShotLearning #EfficientAI #AI #ICCV2025 #ResearchHighlight
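
For intuition, here is a toy sketch of the general token-condensation idea (keep the patch tokens the [CLS] token attends to most and merge the rest); this is a simplified stand-in, not TCA's actual algorithm, and all shapes are illustrative.

```python
# Toy illustration of training-free token condensation: keep the visual
# tokens that the [CLS] token attends to most and merge the remainder into
# a single token. A simplified stand-in for TCA, not the paper's algorithm.
import torch

def condense_tokens(tokens: torch.Tensor, cls_attn: torch.Tensor, keep: int):
    """tokens: (N, D) patch tokens; cls_attn: (N,) attention from [CLS]."""
    idx = cls_attn.topk(keep).indices
    kept = tokens[idx]
    mask = torch.ones(tokens.size(0), dtype=torch.bool)
    mask[idx] = False
    merged = tokens[mask].mean(dim=0, keepdim=True)  # condense the rest
    return torch.cat([kept, merged], dim=0)  # (keep + 1, D)

tokens = torch.randn(196, 768)  # e.g. 14x14 ViT patch tokens
cls_attn = torch.rand(196)      # attention weights from [CLS]
print(condense_tokens(tokens, cls_attn, keep=48).shape)  # torch.Size([49, 768])
```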
AdinaY posted an update 30 days ago
Kimi K2 Thinking is now live on the hub 🔥

moonshotai/Kimi-K2-Thinking

✨ 1T MoE for deep reasoning & tool use
✨ Native INT4 quantization = 2× faster inference
✨ 256K context window
✨ Modified MIT license
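
A minimal sketch of trying the model through huggingface_hub, assuming an inference provider serves this checkpoint:

```python
# Minimal sketch of querying the model via the Hugging Face Hub, assuming
# an inference provider serves this checkpoint; the repo id is the one
# linked above.
from huggingface_hub import InferenceClient

client = InferenceClient(model="moonshotai/Kimi-K2-Thinking")
response = client.chat_completion(
    messages=[{"role": "user", "content": "Walk through 17 * 24 step by step."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```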
AdinaY posted an update about 1 month ago
Chinese open-source AI in October wasn’t about bigger models; it was about real-world impact 🔥

https://huggingface.co/collections/zh-ai-community/october-2025-china-open-source-highlights

✨ Vision-Language & OCR wave 🌊
- DeepSeek-OCR: 3B
- PaddleOCR-VL: 0.9B
- Qwen3-VL: 2B / 4B / 8B / 32B / 30B-A3B
- Open-Bee: Bee-8B-RL
- Z.ai Glyph: 10B

OCR is industrializing; the real game now is understanding the (long-context) document, not just reading it.

✨ Text generation: scale or innovation?
- MiniMax-M2: 229B
- Antgroup Ling-1T & Ring-1T
- Moonshot Kimi-Linear: a linear-attention challenger
- Kwaipilot KAT-Dev

Efficiency is the key.

✨ Any-to-Any & World-Model: one step closer to the real world
- BAAI Emu 3.5
- Antgroup Ming-flash-omni
- HunyuanWorld-Mirror: 3D

Aligning with the global “world model” trend

✨ Audio & Speech + Video & Visual: moving from entertainment labs to delivery platforms
- SoulX-Podcast TTS
- LongCat-Audio-Codec & LongCat-Video by delivery platform Meituan
- xiabs DreamOmni 2

Looking forward to what's next 🚀
DmitryRyumin posted an update about 1 month ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟👁️🚀
📄 Title: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching 🔝

📝 Description: The proposed method enhances stereo matching by efficiently fusing unbiased monocular priors from vision foundation models, addressing misalignment and local-optimum issues with a binary local ordering map and pixel-wise linear regression.

👥 Authors: Chengtang Yao, Lidong Yu, Zhidan Liu, Jiaxi Zeng, Yuwei Wu, and Yunde Jia

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching (2505.14414)

📁 Repository: https://github.com/YaoChengTang/Diving-into-the-Fusion-of-Monocular-Priors-for-Generalized-Stereo-Matching

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the 3D Pose Understanding Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/3d-pose-understanding.md

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #StereoMatching #MonocularDepth #VisionFoundationModels #3DReconstruction #Generalization #AI #ICCV2025 #ResearchHighlight
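
As a rough illustration of the pixel-wise linear regression ingredient, here is a toy per-window least-squares fit that maps a scale- and shift-ambiguous monocular prior into disparity space; it sketches the general technique, not the authors' implementation.

```python
# Toy sketch of pixel-wise linear regression: align a scale/shift-ambiguous
# monocular prior to a disparity estimate via a per-window least-squares
# fit. Illustrates the general technique, not the paper's implementation.
import torch
import torch.nn.functional as F

def local_linear_align(mono: torch.Tensor, disp: torch.Tensor, k: int = 9):
    """mono, disp: (1, 1, H, W). Returns mono mapped into disparity space."""
    pad = k // 2
    ones = torch.ones_like(mono)
    def box(x):  # local window sums via average pooling
        return F.avg_pool2d(x, k, stride=1, padding=pad) * (k * k)
    n = box(ones)  # valid-pixel count per window
    sum_m, sum_d = box(mono), box(disp)
    sum_mm, sum_md = box(mono * mono), box(mono * disp)
    # closed-form per-pixel least squares for disp ≈ a * mono + b
    a = (n * sum_md - sum_m * sum_d) / (n * sum_mm - sum_m ** 2 + 1e-6)
    b = (sum_d - a * sum_m) / n
    return a * mono + b

mono = torch.rand(1, 1, 64, 64)
disp = 2.0 * mono + 0.5  # synthetic disparity perfectly explained by mono
print(torch.allclose(local_linear_align(mono, disp), disp, atol=1e-3))  # True
```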
DmitryRyumin posted an update about 1 month ago
🚀👌🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤌🚀
📄 Title: Understanding Co-speech Gestures in-the-wild 🔝

📝 Description: JEGAL is a tri-modal model that learns from gestures, speech and text simultaneously, enabling devices to interpret co-speech gestures in the wild.

👥 Authors: @sindhuhegde, K R Prajwal, Taein Kwon, and Andrew Zisserman

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Understanding Co-speech Gestures in-the-wild (2503.22668)

🌐 Web Page: https://www.robots.ox.ac.uk/~vgg/research/jegal
📁 Repository: https://github.com/Sindhu-Hegde/jegal
📺 Video: https://www.youtube.com/watch?v=TYFOLKfM-rM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Human Modeling Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/human-modeling.md

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #CoSpeechGestures #GestureUnderstanding #TriModalRepresentation #MultimodalLearning #AI #ICCV2025 #ResearchHighlight
pagezyhf posted an update about 1 month ago
🚀 Big news for AI builders!

We’re thrilled to announce that the Qwen3-VL family of vision-language models is now available on Azure AI Foundry, thanks to our collaboration with Microsoft.

We bring open-source innovation to enterprise-grade AI infrastructure, making it easier than ever for enterprises to deploy and scale the latest and greatest models from Hugging Face securely within Azure.

🔍 Highlights:

- Deploy Qwen3-VL instantly via managed endpoints
- Built-in governance, telemetry, and lifecycle management
- True multimodal reasoning — vision, language, and code understanding
- State-of-the-art performance, outperforming closed-source models like Gemini 2.5 Pro and GPT-5
- Available in both *Instruct* and *Thinking* modes, across 24 model sizes

👉 Get started today: search for Qwen3-VL in the Hugging Face Collection on Azure AI Foundry.
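
As a hypothetical sketch, a deployed endpoint could be queried like this, assuming it exposes an OpenAI-compatible chat completions API; the base URL, key, deployment name, and image URL are all placeholders, not a confirmed Azure interface.

```python
# Hypothetical sketch of querying a deployed endpoint, assuming it exposes
# an OpenAI-compatible chat completions API; base URL, key, model name, and
# image URL are placeholders for your own Azure AI Foundry deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-endpoint>.inference.ai.azure.com/v1",  # placeholder
    api_key="<your-azure-key>",  # placeholder
)
response = client.chat.completions.create(
    model="qwen3-vl",  # placeholder deployment name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```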
DmitryRyumin posted an update about 1 month ago
🚀💡🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🪄🚀
📄 Title: LoftUp: Learning a Coordinate-based Feature Upsampler for Vision Foundation Models 🔝

📝 Description: LoftUp is a coordinate-based transformer that upscales the low-resolution features of VFMs (e.g. DINOv2 and CLIP) using cross-attention and self-distilled pseudo-ground truth (pseudo-GT) from SAM.

👥 Authors: Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, and Dan Zhang

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models (2504.14032)

🌐 GitHub Page: https://andrehuang.github.io/loftup-site
📁 Repository: https://github.com/andrehuang/loftup

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Foundation Models and Representation Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/foundation-models-and-representation-learning.md

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #LoftUp #VisionFoundationModels #FeatureUpsampling #Cross-AttentionTransformer #CoordinateBasedLearning #SelfDistillation #PseudoGroundTruth #RepresentationLearning #AI #ICCV2025 #ResearchHighlight
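
For intuition, here is a toy coordinate-based upsampler in the spirit of the paper: high-resolution pixel coordinates query low-resolution VFM tokens via cross-attention. Dimensions and the coordinate encoding are illustrative assumptions, not the paper's architecture.

```python
# Toy coordinate-based feature upsampler: pixel coordinates at the target
# resolution act as queries over low-resolution VFM tokens via
# cross-attention. Illustrative assumptions, not LoftUp's architecture.
import torch
import torch.nn as nn

class CoordUpsampler(nn.Module):
    def __init__(self, dim: int = 384, heads: int = 4):
        super().__init__()
        self.coord_embed = nn.Linear(2, dim)  # embed (x, y) in [0, 1]^2
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, lowres_feats: torch.Tensor, out_hw: tuple):
        """lowres_feats: (B, N, D) flattened VFM tokens -> (B, H*W, D)."""
        B = lowres_feats.size(0)
        H, W = out_hw
        ys, xs = torch.meshgrid(
            torch.linspace(0, 1, H), torch.linspace(0, 1, W), indexing="ij"
        )
        coords = torch.stack([xs, ys], dim=-1).reshape(1, H * W, 2).expand(B, -1, -1)
        queries = self.coord_embed(coords)
        out, _ = self.attn(queries, lowres_feats, lowres_feats)
        return out

feats = torch.randn(1, 16 * 16, 384)  # e.g. DINOv2 tokens on a 16x16 grid
print(CoordUpsampler()(feats, (64, 64)).shape)  # torch.Size([1, 4096, 384])
```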
DmitryRyumin posted an update about 1 month ago
🚀🏷️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🧩🚀
📄 Title: Heavy Labels Out! Dataset Distillation with Label Space Lightening 🔝

📝 Description: The HeLlO framework is a new dataset distillation method that removes the need for large soft labels. It uses a lightweight online image-to-label projector built on CLIP, adapted with LoRA-style parameter-efficient tuning and initialized from text embeddings.

👥 Authors: @roseannelexie, @Huage001, Zigeng Chen, Jingwen Ye, and Xinchao Wang

📅 Conference: ICCV, 19 – 23 Oct, 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Heavy Labels Out! Dataset Distillation with Label Space Lightening (2408.08201)

📺 Video: https://www.youtube.com/watch?v=kAyK_3wskgA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #DatasetDistillation #LabelCompression #CLIP #LoRA #EfficientAI #FoundationModels #AI #ICCV2025 #ResearchHighlight
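
As a rough sketch of the projector idea: a linear head over CLIP image features, initialized from class-name text embeddings, with a LoRA-style low-rank correction as the only trainable part. Shapes and initialization details here are illustrative, not the paper's exact design.

```python
# Toy sketch of an online image-to-label projector: a linear head over CLIP
# image features, initialized from class-name text embeddings, with a
# LoRA-style low-rank correction as the only trainable parameters.
# Illustrative shapes; see the paper for the real design.
import torch
import torch.nn as nn

class LabelProjector(nn.Module):
    def __init__(self, text_embeds: torch.Tensor, rank: int = 4):
        super().__init__()
        d = text_embeds.size(1)
        self.base = nn.Parameter(text_embeds.clone(), requires_grad=False)  # frozen init
        self.lora_a = nn.Parameter(torch.randn(text_embeds.size(0), rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, d))

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        weight = self.base + self.lora_a @ self.lora_b  # (C, D)
        return image_feats @ weight.t()  # soft labels, (B, C)

text_embeds = torch.randn(100, 512)  # stand-in for CLIP text embeddings of 100 classes
proj = LabelProjector(text_embeds)
print(proj(torch.randn(8, 512)).shape)  # torch.Size([8, 100])
```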
AdinaY posted an update about 1 month ago
Ming-flash-omni Preview 🚀 Multimodal foundation model from AntGroup

inclusionAI/Ming-flash-omni-Preview

✨ Built on Ling-Flash-2.0: 100B total / 6B active
✨ Generative segmentation-as-editing
✨ SOTA contextual & dialect ASR
✨ High-fidelity image generation
AdinaY posted an update about 1 month ago

Glyph 🔥 a framework that scales context length by compressing text into images and processing them with vision–language models, released by Z.ai.

Paper: https://huggingface.co/papers/2510.17800
Model: https://huggingface.co/zai-org/Glyph

✨ Compresses long sequences visually to bypass token limits
✨ Reduces computational and memory costs
✨ Preserves meaning through multimodal encoding
✨ Built on GLM-4.1V-9B-Base
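
A minimal sketch of the core trick, rendering long text into an image so a VLM can read it as pixels; the font, page size, and layout below are arbitrary choices, not Glyph's actual rendering pipeline.

```python
# Minimal sketch: render long text into an image so a vision-language model
# can "read" it as pixels. Font, page size, and layout are arbitrary
# choices here, not Glyph's actual rendering pipeline.
from PIL import Image, ImageDraw

def render_text_page(text: str, width: int = 896, line_height: int = 14,
                     chars_per_line: int = 120) -> Image.Image:
    lines = [text[i:i + chars_per_line] for i in range(0, len(text), chars_per_line)]
    img = Image.new("RGB", (width, line_height * (len(lines) + 1)), "white")
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((8, i * line_height), line, fill="black")  # default bitmap font
    return img

page = render_text_page("A very long document... " * 200)
page.save("page.png")  # feed this image to a VLM instead of thousands of text tokens
```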