All HF Hub posts

danielhanchen
posted an update 1 day ago
We're excited to announce that Unsloth has joined the PyTorch Ecosystem! 🔥🦥

Unsloth is an open-source project that makes training & running models more accurate and faster with less compute. Our mission is to make local AI accessible to everyone. Thanks to all of you for making this possible! 💕

Blog: https://unsloth.ai/blog/pytorch
GitHub: https://github.com/unslothai/unsloth
spillai
posted an update 2 days ago
mm-ctx – fast, multimodal context for agents.

LLM-based agents handle text incredibly well, but images, videos, and PDFs with visual content are hard for them to interpret. mm-ctx gives your CLI agent multimodal skills.

Try it interactively in Spaces: vlm-run/mm-ctx

Readme: https://vlm-run.github.io/mm/
PyPI: https://pypi.org/project/mm-ctx
SKILL.md: https://github.com/vlm-run/skills/blob/main/skills/mm-cli-skill/SKILL.md

mm-ctx is meant to feel familiar: the UNIX tools we already love (find/cat/grep/wc), rebuilt for file types LLMs can't read natively and designed to work with agents via the CLI.
- mm grep "invoice #1234" ~/Downloads searches across PDFs and returns line-numbered matches
- mm cat <document>.pdf returns a metadata description of the file
- mm cat <photo>.jpg returns a caption of the photo
- mm cat <video>.mp4 returns a caption of the video
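A rough sketch of the line-numbered output shape of `mm grep` (a pure-Python illustration, not the actual Rust implementation; the real tool extracts text from PDFs, images, and videos before searching, which is stubbed out here):

```python
# Illustrative only: emulates the line-numbered match format of `mm grep`
# over already-extracted text.
def grep_lines(text: str, needle: str) -> list[str]:
    # Case-insensitive substring match, one "lineno: line" entry per hit.
    return [
        f"{i}: {line}"
        for i, line in enumerate(text.splitlines(), start=1)
        if needle.lower() in line.lower()
    ]

doc = "ACME Corp\nInvoice #1234\nTotal: $99"
print(grep_lines(doc, "invoice #1234"))  # -> ['2: Invoice #1234']
```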

A few things we obsessed over:
⚡ Speed: Rust core for the hot paths
🏠 Local-first, BYO model: uses any OpenAI-compatible endpoint (Ollama, vLLM/SGLang, LMStudio) with any multimodal LLM (Gemma4, Qwen3.5, GLM-4.6V)
🔗 Composable: stdin + structured outputs
🤖 Drops into any agent via mm-cli-skills: Claude Code, Codex, Gemini CLI, OpenClaw.

We'd love to hear your feedback, especially on the CLI and on which file types and workflows you'd like to see next.
HannesVonEssen
posted an update 3 days ago
SeaWolf-AI
posted an update about 1 hour ago
🧬 Darwin Family: Zero Gradient Steps, GPQA Diamond 88.89%

How far can we push LLM reasoning *without* training?

Our team at VIDRAFT submitted this paper to Daily Papers yesterday, and it's currently #3. Huge thanks to everyone who upvoted; sharing the core ideas below.

🔗 Paper: Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning (2605.14386)
🔗 arXiv: https://arxiv.org/abs/2605.14386
🔗 Model: FINAL-Bench/Darwin-28B-Opus

---

TL;DR

Darwin Family is a training-free evolutionary merging framework. By recombining the weight spaces of existing LLM checkpoints, with zero gradient-based training, it reaches frontier-level reasoning.

- πŸ† Darwin-28B-Opus: GPQA Diamond 88.89%
- πŸ’Έ Zero gradient steps β€” not a single B200 or H200 hour needed
- 🧬 Consistent gains across 4B β†’ 35B scale
- πŸ”€ Cross-architecture breeding between Transformer and Mamba families
- πŸ” Stable recursive multi-generation evolution

Three Core Mechanisms

① 14-dim Adaptive Merge Genome: fine-grained recombination at both the component level (Attention / FFN / MLP / LayerNorm / Embedding) and the block level, expanding the prior evolutionary-merge search space.
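A toy sketch of what component-level genome merging could look like (illustrative only; the actual genome is 14-dimensional, also operates at block level, and is found by evolutionary search): each component type gets its own interpolation coefficient when two parent checkpoints are recombined.

```python
# Hypothetical component-level merge: one interpolation gene per component
# type. Weights are plain floats here; real checkpoints would be tensors.
GENOME = {"attn": 0.7, "ffn": 0.3, "norm": 0.5}

def merge(parent_a: dict, parent_b: dict, genome: dict) -> dict:
    child = {}
    for name in parent_a:
        comp = name.split(".")[0]  # e.g. "attn.w_q" -> "attn"
        g = genome[comp]
        # Linear interpolation in weight space: no gradient steps involved.
        child[name] = g * parent_a[name] + (1 - g) * parent_b[name]
    return child

a = {"attn.w_q": 1.0, "ffn.w_in": 0.0}
b = {"attn.w_q": 0.0, "ffn.w_in": 1.0}
child = merge(a, b, GENOME)  # attn leans toward parent a, ffn toward parent b
```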

② MRI-Trust Fusion: we diagnose each layer's reasoning contribution via an **MRI (Model Reasoning Importance)** signal and fuse it with evolutionary search through a **learnable trust parameter**. Trust the diagnostic too much and search collapses; ignore it and search becomes inefficient. Darwin learns the balance from data.
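One plausible reading of the trust mechanism (a guess at the shape, not the paper's exact formula; the names are mine): the signal guiding the search is a trust-weighted blend of the MRI diagnostic and the raw evolutionary fitness.

```python
# Hypothetical trust-weighted fusion of the two signals described above.
def fused_score(mri: float, search_fitness: float, trust: float) -> float:
    """trust=1 follows the MRI diagnostic blindly; trust=0 ignores it."""
    return trust * mri + (1 - trust) * search_fitness

# trust=1: pure diagnostic; trust=0: pure search; in between: the blend
# that (per the post) Darwin learns from data.
print(fused_score(0.9, 0.1, 0.5))  # -> 0.5
```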

③ Architecture Mapper: weight-space breeding across heterogeneous families. Attention × SSM crossover actually works.

Why It Matters
> Diagnose latent capabilities already encoded in open checkpoints,
> and recombine them; no gradients required.

Replies and critiques welcome 🙌
blanchon
posted an update 1 day ago
I'm releasing OpenCS2, an 11 TB dataset of around 5,000 hours of Counter-Strike gameplay recordings.
- HD resolution: 1280×720 · 32 fps
- Per-frame keyboard and mouse input + world state (player position, velocity, weapon ...)
- HD stereo audio
- All 10 players' perspectives

https://huggingface.co/collections/blanchon/opencs2
Imosu
posted an update 2 days ago
# ZeroGPU Hardware Mismatch: Why Am I Getting RTX PRO 6000 Blackwell MIG Instead of the Documented H200?

I recently ran into a surprising issue while debugging a Hugging Face ZeroGPU Space.

According to the Hugging Face ZeroGPU documentation, ZeroGPU is described as using NVIDIA H200-based resources, with configurations such as "large" and "xlarge" offering H200-class memory. However, when I printed the actual GPU information inside my Space, I got something different:


```text
GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition MIG 2g.48gb
Capability: (12, 0)
Torch: 2.8.0+cu128
CUDA: 12.8
```
This is not an H200. It appears to be a MIG slice of an RTX PRO 6000 Blackwell Server Edition GPU, with 48GB VRAM.

This difference matters. It is not just a cosmetic hardware-name issue.

In my case, the Space was running Qwen3-TTS and failed with:

```text
CUDA error: no kernel image is available for execution on the device
```

The issue appears related to GPU architecture compatibility. The app was using kernels-community/flash-attn3, which is generally aligned with Hopper-class GPUs such as H100/H200, but the actual device exposed to the Space was Blackwell with compute capability 12.0. As a result, CUDA kernels that might work on the expected H200 environment failed on the actual assigned GPU.
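A minimal sketch of the compatibility rule behind that error (illustrative, not the actual flash-attn3 loader): a CUDA binary (cubin) compiled for one compute capability only runs on devices with the same major version, so a Hopper (sm_90) build has no usable kernel image on a capability-12.0 Blackwell device.

```python
# Sketch of CUDA cubin binary compatibility: same major version required,
# device minor must be >= the minor the kernel was built for.
def cubin_runs_on(built_for: tuple, device: tuple) -> bool:
    return device[0] == built_for[0] and device[1] >= built_for[1]

hopper_kernel = (9, 0)  # flash-attn3 builds target Hopper-class GPUs

print(cubin_runs_on(hopper_kernel, (9, 0)))   # H100/H200: usable
print(cubin_runs_on(hopper_kernel, (12, 0)))  # Blackwell MIG: "no kernel image"
```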

To be clear, I am not saying the RTX PRO 6000 Blackwell is a bad GPU. It is a newer architecture and may be powerful in many workloads. But it is not the same as H200, and the software ecosystem compatibility is different. For ML workloads, especially those relying on custom CUDA kernels, the exact GPU architecture matters a lot.

This raises a few questions:

Is Hugging Face ZeroGPU now assigning RTX PRO 6000 Blackwell MIG instances instead of H200 instances?
If yes, why is this not clearly documented?
PhysiQuanty
posted an update about 5 hours ago
❗ Dating apps do not allow us to control the profiles suggested to us based on our mutual search criteria ❗
🧬 If you want to see whether your soulmate already exists, I have published a dataset of 59k anonymized public profiles

SpiceeChat/OkCupid-59k-Anonymized-Profiles

Are you looking for a female ML engineer who is looking for a male ML engineer, and you can't find her on the apps?
You need to look for her, but more importantly, she needs to be looking for you.
Personally, I'm looking for a physicist and I run into the same problem: I can't find her.
My answer: the paradox of choice on dating apps, solved by patent ⚡ WO2026082672 ⚡
https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2026082672

I had to patent it to find you, and we will find each other soon!
cesear64
posted an update about 18 hours ago
Just published: how we built production Sango (Central African Republic) translation without fine-tuning, a parallel corpus, or training compute.

The method, vocabulary-augmented prompting with a 581-entry native-speaker-verified lexicon, generalizes to any of the ~2,000 African languages at the same level of data poverty. Recipe, dataset, and code template all included.
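The core of vocabulary-augmented prompting can be sketched in a few lines (an illustration of the general idea, not the post's exact recipe; the toy lexicon entries and prompt wording here are my own, while the real 581-entry lexicon is in MEYNG/sango-vocabulary): filter the lexicon to entries that appear in the source sentence and inject them into the prompt as verified glosses.

```python
# Toy lexicon for illustration; real entries come from the published dataset.
TOY_LEXICON = {
    "water": "ngu",
    "house": "da",
    "person": "zo",
}

def build_prompt(source: str, lexicon: dict) -> str:
    # Keep only entries whose source word appears in the sentence,
    # so the injected vocabulary stays small and relevant.
    hits = {w: t for w, t in lexicon.items() if w in source.lower().split()}
    gloss = "\n".join(f"- {w} = {t}" for w, t in sorted(hits.items()))
    return (
        "Translate English to Sango. Use this verified vocabulary:\n"
        f"{gloss}\n\nEnglish: {source}\nSango:"
    )

prompt = build_prompt("The person drinks water", TOY_LEXICON)
# Injects "person = zo" and "water = ngu", but not the unused "house = da".
```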

📄 Blog: https://huggingface.co/blog/MEYNG/sangoai
📦 Dataset: MEYNG/sango-vocabulary

Would especially value feedback from anyone working on other low-resource African languages; Ewondo, Lingala, and Wolof are next on our roadmap.
unmodeled-tyler
posted an update 1 day ago
The UFO/UAP Dataset is complete!

unmodeled-tyler/DoW-UFO-UAP-1

The most recent release from the Department of War is up there in full and ready for analysis!

The dataset ships with a Hermes Agent Skill so you can quickly and easily start parsing through the data immediately.

Go chase some anomalies! 🚀

rajkumarrawal
posted an update 2 days ago
LLMs aren't just answering questions anymore; they're learning to evolve. Self-evolving AI is the true endgame.

AI has shifted from short tasks to long missions. The breakthrough isn't just automation; it's machines learning human methods and applying them at machine speed. From cybersecurity to finance, from OPCs to NPCs, the wave is irreversible.

Read the full article: Self Evolving is the Endgame or final destiny

https://huggingface.co/blog/rajkumarrawal/self-evolving-is-the-endgame-or-final-destiny

What's your definition of true AGI? Comment below.