All HF Hub posts

HProg 
posted an update 2 days ago
📣 I made a visualizer for Hugging Face models: https://hfviewer.com

✨ Simply paste a Hugging Face URL to get an interactive visualization of the architecture!

🔗 The recent Qwen3.6-27B model as an example: https://hfviewer.com/Qwen/Qwen3.6-27B

Feel free to try it out and give me feedback on how it can be improved! ❤️
eaddario 
posted an update 3 days ago
Experimental global target bits‑per‑weight quantization of Qwen/Qwen3.6-27B and Qwen/Qwen3.6-35B-A3B.

Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters most and produces high-quality models that hit a precise global file-size target.

Key Advantages:
- VRAM Maximization: Generates high-quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24GB of VRAM).
- Data-Driven Precision: The quantization mix is determined by actual weight-error sensitivity rather than hardcoded rules, often yielding better PPL/KLD-to-size trade-offs.

Full benchmarks (PPL, KLD, ARC, GPQA, MMLU, etc.) and methodology in the models' cards.
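
For intuition, here's a toy sketch of the allocation idea (not the actual llama.cpp implementation; the bit widths and sensitivity scores below are made up): start every tensor at the cheapest type, then spend the remaining bit budget where measured error sensitivity is highest.

```python
from dataclasses import dataclass

@dataclass
class TensorInfo:
    name: str
    n_params: int
    sensitivity: float  # measured error impact of quantizing this tensor harder

# Toy bit widths standing in for real quant types; not llama.cpp's actual values.
CANDIDATE_BPW = [2.5, 3.5, 4.5, 5.5, 6.5, 8.0]

def allocate_bits(tensors: list[TensorInfo], target_bpw: float) -> dict[str, float]:
    """Greedy per-tensor bit allocation under a global bits-per-weight budget."""
    bits = {t.name: CANDIDATE_BPW[0] for t in tensors}       # start at the cheapest type
    budget = target_bpw * sum(t.n_params for t in tensors)   # total bits allowed
    used = sum(bits[t.name] * t.n_params for t in tensors)

    while True:
        best, best_gain, best_cost = None, 0.0, 0.0
        for t in tensors:
            idx = CANDIDATE_BPW.index(bits[t.name])
            if idx + 1 == len(CANDIDATE_BPW):
                continue                                      # already at max precision
            cost = (CANDIDATE_BPW[idx + 1] - bits[t.name]) * t.n_params
            if used + cost > budget:
                continue                                      # upgrade would bust the target
            gain = t.sensitivity / cost                       # error reduced per extra bit spent
            if gain > best_gain:
                best, best_gain, best_cost = t, gain, cost
        if best is None:
            return bits                                       # budget exhausted
        bits[best.name] = CANDIDATE_BPW[CANDIDATE_BPW.index(bits[best.name]) + 1]
        used += best_cost
```

Run over a model's tensor list, this kind of scheme yields a file of exactly the requested size while concentrating precision where PPL/KLD would otherwise suffer most.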

eaddario/Qwen3.6-27B-GGUF
eaddario/Qwen3.6-35B-A3B-GGUF
salma-remyx 
posted an update 3 days ago
SciCrafter measured something AI practitioners have intuited: frontier agents are improving at executing inside well-framed problems, but lag at framing the problem in the first place.

GPT-5.2, Gemini-3-Pro, and Claude Opus 4.5 all plateaued near 26% on a new Minecraft benchmark for probing AI capabilities in the discovery-to-application loop.

So the authors ran targeted interventions:
* Hints about what to investigate doubled performance.
* A structured experimentation template added 7-14 more points.
* Structured consolidation beat free-form summaries by 6 points.
* Curriculum context beat independent task-solving.

These interventions helped the agent frame what’s worth investigating, and structure what gets learned so it compounds. The bottleneck for AI in scientific workflows is upstream of execution.

Their findings are congruent with the design patterns we've adopted at Remyx AI to help AI teams close the development loop scientifically.

Agents work well inside structured loops, but they perform poorly when tasked with creating the structure. Instrumenting your scientific workflows offers greater leverage than scaling compute with a less informed search.

When building production AI systems, teams fly through execution. The bigger challenge is identifying which experiments moved which production outcome, or deciding what to try next.

One of the more interesting results I found this week by tracking work in AI for scientific workflows using Remyx: https://engine.remyx.ai/papers/d8f23b9b-b14b-4ada-b44e-ccfc221c06b4
Crownelius 
posted an update 3 days ago
Day 3 - 05/02/2026
Scamp ships, hits the wall. New plan...

Scamp came back from training today... Didn't go so well, I'm still unsure...

Fast benchmark, temperature 0.7, top_p 0.9:
- "Capital of France is" produced "covered by the Crown" (grammatical, factually wrong)
- "23 + 19 = ?" produced "23. Answer: 23. Answer: 23..." (loops, math broken)
- "def fibonacci(n):" produced a list of letters

It speaks English. It can't reason. At 8K vocab and 50M params, it was never going to.

Next build: 412M MoE-3E. Three experts (math, language, code), top-1 routing, random init, letting specialization emerge from the gradient signal alone. I tried seeded Branch-Train-MiX first, then dropped it: it adds compute for no clear win when the router will find its own attractors anyway.
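
For reference, a minimal sketch of what top-1 routing looks like in plain PyTorch. The dimensions and expert MLPs are placeholders, not the actual 412M config:

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Toy top-1 routed MoE layer: a linear router picks one expert per token."""
    def __init__(self, d_model: int = 512, d_ff: int = 2048, n_experts: int = 3):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.shape[-1])            # (batch*seq, d_model)
        probs = self.router(tokens).softmax(dim=-1)    # routing distribution
        top_p, top_idx = probs.max(dim=-1)             # top-1: one expert per token
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                # scale by the router prob so the router still gets gradient signal
                out[mask] = expert(tokens[mask]) * top_p[mask].unsqueeze(-1)
        return out.reshape(x.shape)
```

In practice you'd add a load-balancing auxiliary loss so one expert doesn't capture all the traffic, but with random init this is the "let specialization emerge" setup in miniature.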

The big lesson today came from limit testing on an A100 80GB. Surprise: every planned phase ran out of memory, even on 80GB. Root cause: at vocab 262144 (the Gemma 3 standard), the output logits dominate both the forward and backward passes. Fix: Liger Kernel's fused cross-entropy, which streams the loss computation instead of materialising the full B × T × vocab logits tensor. Without it the build would not run.
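
To illustrate the memory problem and the shape of the fix, here's a plain-PyTorch sketch of chunked linear cross-entropy. It only shows the idea: Liger's actual kernel fuses the projection and loss and also avoids keeping chunk logits around for backward, which this naive version does not.

```python
import torch
import torch.nn.functional as F

def chunked_lm_loss(hidden: torch.Tensor, lm_head_w: torch.Tensor,
                    labels: torch.Tensor, chunk: int = 4096) -> torch.Tensor:
    """Cross-entropy over a huge vocab without materialising the full (B*T, V) logits."""
    hidden = hidden.reshape(-1, hidden.shape[-1])      # (B*T, d_model)
    labels = labels.reshape(-1)                        # (B*T,)
    total = hidden.new_zeros(())
    for start in range(0, hidden.shape[0], chunk):
        h = hidden[start:start + chunk]
        y = labels[start:start + chunk]
        logits = h @ lm_head_w.T                       # only a (chunk, V) slice exists here
        total = total + F.cross_entropy(logits, y, reduction="sum")
    return total / labels.numel()
```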

Scamp proved the pipeline runs end-to-end on real hardware. The 412M run starts tomorrow. If routing balances naturally and math finally crystallises, it ships as Crowfeather-412M-3E with GGUF builds in F16, Q8, Q5, and Q4.

So... the training might have produced a poet if I had done it better. But I didn't, so instead... we get a malformed robot named Scamp... This is progress.

-Shane

P.S. Join the Discord for discussion: https://discord.gg/8ZscHNmJYE
I post my finished work here:
CompactAI-O
cihatyldz 
posted an update 3 days ago
Şifahane, a dual-inference medical classification demo, is now live on Spaces. It features side-by-side Turkish BERT and Qwen2.5 architectures for real-time evaluation of the "Classifier vs. LLM" trade-offs, all within a single space. The system utilizes a fine-tuned Turkish BERT for high-speed, cost-effective inference and the Qwen2.5-7B model for flexible multi-task reasoning, with support for department classification, condition analysis, urgency assessment, and rationale generation across 12 medical departments.
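
A rough sketch of the "Classifier vs. LLM" split described above, assuming standard transformers checkpoints; the model id below is a placeholder, not the demo's actual repo name:

```python
from transformers import pipeline

# Placeholder checkpoint name; the real fine-tuned Turkish BERT is linked below.
department_clf = pipeline("text-classification", model="your-org/turkish-medical-bert")

def triage(complaint_text: str) -> dict:
    # Fast, cheap path: the fine-tuned classifier predicts the department directly.
    pred = department_clf(complaint_text)[0]
    return {"department": pred["label"], "confidence": pred["score"]}

# The slower, flexible path (condition analysis, urgency, rationale) would instead
# prompt the Qwen2.5-7B model, trading latency and cost for multi-task reasoning.
```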


🧠 BERT model: https://lnkd.in/dCUUASqq
📊 Dataset: https://lnkd.in/dGK9y24w
🤗 Demo: https://lnkd.in/dtWjCCPF
kanaria007 
posted an update about 2 hours ago
✅ Article highlight: *Verifier Packs and Conformance Harness* (art-60-227, v0.1)

TL;DR:
This article argues that “how we verify the spec” should itself be a governed artifact path.

A serious system should not stop at “we ran the tests and passed.” It should be able to say exactly **which verifier pack** was used, under **which harness manifest**, against **which vector bundle**, with **which reason-code linkage**, producing **which normalized run verdicts**, **which replay result**, and **which profile-level conformance report lineage**.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• turns conformance from hidden CI behavior into portable, auditable artifacts
• makes verifier choice, harness policy, vector completeness, and replay status explicit
• prevents “green badge” claims that cannot later be reconstructed
• keeps degraded, partial, and historically superseded runs visible instead of laundering them away

What’s inside:
• a clean distinction between *specification*, *verifier pack*, *harness manifest*, *conformance run*, *replay verification*, and *profile conformance report*
• a practical ladder: VH1 / VH2 / VH3
• core portable artifacts like si/verifier-pack/v1, si/harness-manifest/v1, si/test-vector-bundle/v1, si/conformance-run-report/v1, and si/replay-verification-record/v1
• hard gates for explicit pack, explicit harness, vector completeness, replay-backed claims, and report support
• the rule that a profile conformance report must point to supporting runs rather than float free as a status badge

Key idea:
Do not say:

*“the tests passed.”*

Say:

*“this scope was checked by this verifier pack, under this harness manifest, against this declared vector bundle, with this linkage, producing these run verdicts and this replay-backed report lineage.”*
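
To make that linkage concrete, here's a loose, hypothetical sketch of what a replay-backed run record could carry. Every field name below is invented for illustration; the protocol's actual schemas live in the repo above.

```python
# Hypothetical illustration only; not the article's actual artifact schema.
conformance_run = {
    "artifact": "si/conformance-run-report/v1",
    "spec": "example-spec@1.4.0",
    "verifier_pack": "si/verifier-pack/v1 :: pack-a@0.9.2",
    "harness_manifest": "si/harness-manifest/v1 :: ci-default@3",
    "vector_bundle": "si/test-vector-bundle/v1 :: core-vectors@7",
    "reason_codes": ["RC-101", "RC-204"],              # explicit reason-code linkage
    "verdicts": {"passed": 412, "failed": 0, "skipped": 3},
    "replay": {
        "verified": True,
        "record": "si/replay-verification-record/v1 :: run-88",
    },
}
```
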
DedeProGames 
posted an update about 9 hours ago
GRaPE 2 Pro is now available.

SL-AI/GRaPE-2-Pro

This is the flagship model of the GRaPE 2 family and the largest model I have trained to date, sitting at 27B parameters. It is built on Qwen3.5-27B and trained on a closed-source proprietary dataset, with roughly half of post-training focused on code and the rest split between STEAM subjects and structured logical reasoning. It punches seriously above its weight class.

GRaPE 2 Pro supports multimodal input (image + text) and features 6 thinking modes via the tag. This gives you real control over how hard the model thinks, from skipping the reasoning phase entirely with minimal, all the way up to xtra-Hi for deep, extended thought on hard problems. For most agentic use, auto or low is the move to keep things snappy.

It also runs on consumer hardware. You can get it going with as little as 12GB of VRAM on a quantized build.
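
For the low-VRAM route, here's a minimal sketch of a 4-bit load via bitsandbytes, assuming a standard text-only transformers path (multimodal inputs would need the model's own processor, and a GGUF build under llama.cpp is the other common option):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "SL-AI/GRaPE-2-Pro"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)

prompt = "Summarize top-1 expert routing in one sentence."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```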

If you want to try it out and give feedback, that would be really appreciated. Email us at contact@skinnertopia.com
salma-remyx 
posted an update about 12 hours ago
VQASynth is the open-source implementation of the SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities (2401.12168) paper, putting together the data-synthesis pipeline behind remyxai/SpaceQwen2.5-VL-3B-Instruct, remyxai/SpaceThinker-Qwen2.5VL-3B, and several other spatial reasoning models we've shared here on HF.

Here's how we use Remyx AI to build and improve VQASynth from the original concept forward.

Stage 1: When you connect a repo to Remyx, we extract development milestones from the commit history. For VQASynth, that surfaces the moments we changed how scenes get parsed, how captions get generated, how spatial relations get encoded. Those milestones power personalized recommendations for methods semantically relevant to improving your system.

Stage 2: When the model is serving in production, that same commit history delineates changes so you can learn from quasi-experiments through observational outcomes. This generates causal evidence about which changes drove which outcomes, sharpens recommendations, and supports inference on questions you haven't directly tested.

Stage 3: Once teams are running controlled experiments, the intervention outcomes tighten those estimates further.

Stage 4: When A/B testing becomes the operational bottleneck, we instrument decision points in the production system to explore via counterfactual perturbations. Initially in shadow mode, and after passing audits, with live traffic.

If you want recommendations tuned to your own project context, you can set up a feed here: https://docs.remyx.ai/platform/discover/feed
MikeDoes 
posted an update 1 day ago
AI4Privacy datasets are being used to decide what data should never leave the device.

A new paper on privacy-preserving cloud computing uses the AI4Privacy PII-Masking-65K dataset to train models that classify text as private or public before it’s ever sent to the cloud.

This is a subtle but important shift.

Instead of encrypting everything or trusting the cloud by default, the authors ask a simpler question:

Can we detect sensitive text early enough to keep it local?

Using DistilBERT, trained partly on AI4Privacy PII data, the system learns to:

- route private text to local processing
- send non-sensitive text to the cloud
- train collaboratively using federated learning, without sharing raw data

The result:

- 99.9% accuracy in private vs. public text detection
- Near-centralized performance in downstream tasks like SMS spam detection
- Privacy protection enforced by design, not policy

What stands out here is not just the model performance, but the architectural idea:
privacy as a routing decision, backed by large-scale PII annotations.
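
A minimal sketch of that routing decision, assuming a DistilBERT classifier fine-tuned on the PII-Masking data; the checkpoint name and labels below are placeholders, not the paper's artifacts:

```python
from transformers import pipeline

# Placeholder checkpoint; stands in for a DistilBERT fine-tuned on AI4Privacy PII data.
detector = pipeline("text-classification", model="your-org/distilbert-pii-router")

def route(text: str, threshold: float = 0.5) -> str:
    pred = detector(text)[0]
    is_private = pred["label"] == "PRIVATE" and pred["score"] >= threshold
    # Privacy as a routing decision: sensitive text stays on-device.
    return "local" if is_private else "cloud"
```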

This work reinforces a pattern we keep seeing: scalable privacy systems don't start with encryption; they start with good PII data.

📄 Full Paper here: https://dl.acm.org/doi/full/10.1145/3773276.3774872

#Ai4Privacy #DataPrivacy #PIIMasking #FederatedLearning #PrivacyEngineering #OpenSourceAI #ResponsibleAI #AcademicResearch #LLMSecurity
rajkumarrawal 
posted an update 1 day ago
I submitted the paper "Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization" by Zi-Bo Qin, Feng-Feng Wei, Tai-You Chen, and Wei-Neng Chen to Daily Papers on Hugging Face.

A trajectory-driven framework uses large language models to guide agent behavior and cooperation patterns in distributed black-box consensus optimization, improving solution quality and efficiency.

Learning to Act and Cooperate for Distributed Black-Box Consensus Optimization (2605.00691)