Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯 Demo (+ source code): webml-community/DINOv3-video-tracking
This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! 😍
How does it work? 🤔 1️⃣ Generate and cache image features for each frame 2️⃣ Create a list of embeddings for selected patch(es) 3️⃣ Compute cosine similarity between each patch and the selected patch(es) 4️⃣ Highlight those whose score is above some threshold
... et voilà! 🥳
You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! 🤯 🗣️ Transcribe videos, meeting notes, songs and more 🔐 Runs on-device, meaning no data is sent to a server 🌎 Multilingual (8 languages) 🤗 Completely free (forever) & open source
That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! 🔥
NEW: Real-time conversational AI models can now run 100% locally in your browser! 🤯
🔐 Privacy by design (no data leaves your device) 💰 Completely free... forever 📦 Zero installation required, just visit a website ⚡️ Blazingly-fast WebGPU-accelerated inference
For those interested, here's how it works: - Silero VAD for voice activity detection - Whisper for speech recognition - SmolLM2-1.7B for text generation - Kokoro for text to speech
Powered by Transformers.js and ONNX Runtime Web! 🤗 I hope you like it!
Introducing the ONNX model explorer: Browse, search, and visualize neural networks directly in your browser. 🤯 A great tool for anyone studying Machine Learning! We're also releasing the entire dataset of graphs so you can use them in your own projects! 🤗
Reasoning models like o3 and o4-mini are advancing faster than ever, but imagine what will be possible when they can run locally in your browser! 🤯
Well, with 🤗 Transformers.js, you can do just that! Here's Zyphra's new ZR1 model running at over 100 tokens/second on WebGPU! ⚡️
Giving models access to browser APIs (like File System, Screen Capture, and more) could unlock an entirely new class of web experiences that are personalized, interactive, and run locally in a secure, sandboxed environment.
The most difficult part was getting the model running in the first place, but the next steps are simple: ✂️ Implement sentence splitting, allowing for streamed responses 🌍 Multilingual support (only phonemization left)