Request: Please provide a GGUF for qwen3-235b-a22b-thinking(2507) with UD 2.0 UD-IQ1 quantization
Hello mradermacher,
Sorry to bother you! Could you please help provide a GGUF quantization of qwen3-235b-a22b-thinking(2507) using Unsloth Dynamic 2.0 with the UD-IQ1 setting? There is currently no official UD-IQ1 file for this model, and I'd like to use one for local inference and low-bit evaluations. Would it be feasible to produce a UD-IQ1 GGUF via the UD 2.0 pipeline? If so, could you let me know whether you need the original weights/links and license details, and whether there are any recommended llama.cpp builds or runtime parameters?
Many thanks!
We already did this model under https://huggingface.co/mradermacher/Qwen3-235B-A22B-Thinking-2507-i1-GGUF
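Regarding runtime parameters: any reasonably recent llama.cpp build can load those files. Here is a minimal local-inference sketch using llama-cpp-python; the filename and settings are assumptions, so adjust them to whichever i1 quant you actually download:

```python
# Minimal sketch, assuming llama-cpp-python is installed (pip install llama-cpp-python).
# The model filename is an assumption; use whichever i1-GGUF file you download from the
# repository linked above (for multi-part files, point at the first part).
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-235B-A22B-Thinking-2507.i1-IQ1_M.gguf",  # assumed filename
    n_ctx=8192,        # context window; raise it if you have the memory
    n_gpu_layers=-1,   # offload as many layers as fit on the GPU; use 0 for CPU-only
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Briefly explain what an imatrix is."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```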
We don't do Unsloth Dynamic quants or any other custom quant mixtures such as bartowski's. The default weighted/imatrix quant mixture is about as good as it can get, and any further improvement comes with trade-offs such as requiring more GPU memory/RAM at the same stated size or being optimised for certain use cases. There is no evidence that such quant mixtures are any better at the same size unless you cherry-pick certain models or make the quant mixture architecture-dependent (as Unsloth tries to do). Weighted/imatrix quants already decide dynamically which weights are important and which are not, and with a high-quality proprietary imatrix calibration dataset like the one we use, we probably achieve similar performance to the Unsloth Dynamic quants.

Their marketing at https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs is quite misleading: they compare against static quants and against imatrix quants made with the tiny and terrible wikitext dataset that no professional quanter would use. I would have expected them to at least compare against quants made using bartowski's public imatrix calibration dataset.
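To make the imatrix point more concrete, here is a deliberately simplified sketch (illustration only, not llama.cpp's actual quantization code) of why minimizing an activation-weighted error, which is what imatrix quants do, already concentrates precision on the weights that matter:

```python
import numpy as np

# Toy illustration (not llama.cpp's actual code): the error a quantized weight causes
# in the layer's output depends on the activations that weight is multiplied by, so a
# quantizer that minimizes activation-weighted error (the imatrix idea) typically beats
# one that minimizes plain weight error at the same bit width.

rng = np.random.default_rng(0)
n = 64
w = rng.normal(size=n)                                          # one row of a weight matrix
acts = rng.normal(size=(2000, n)) * rng.uniform(0.1, 5.0, n)    # calibration activations
importance = np.mean(acts ** 2, axis=0)                         # imatrix-style statistic

def quantize(w, err_weights, qmax=7, candidates=64):
    """Pick the per-block scale that minimizes the (optionally weighted) squared error."""
    base = np.max(np.abs(w)) / qmax
    best = None
    for f in np.linspace(0.3, 1.0, candidates):
        scale = base * f
        wq = scale * np.clip(np.round(w / scale), -qmax, qmax)
        err = np.sum(err_weights * (w - wq) ** 2)
        if best is None or err < best[0]:
            best = (err, wq)
    return best[1]

wq_plain = quantize(w, np.ones(n))       # minimizes unweighted weight error
wq_imat  = quantize(w, importance)       # minimizes imatrix-weighted weight error

for name, wq in (("plain  ", wq_plain), ("imatrix", wq_imat)):
    out_err = np.mean((acts @ w - acts @ wq) ** 2)              # error in the layer's output
    print(name, out_err)                                        # imatrix version is typically lower
```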
The only real improvement without any trade-offs that I see happening in the future is better rounding, as proposed in https://github.com/ggml-org/llama.cpp/pull/12557
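For context, "better rounding" roughly means choosing the quantized codes so as to minimize the error that actually matters, rather than naively rounding every weight to its nearest grid point. The sketch below is only a generic illustration of that idea (a greedy GPTQ-style repair pass applied after round-to-nearest), not the algorithm from that PR:

```python
import numpy as np

# Generic illustration of "better rounding" (NOT the algorithm from the PR above):
# round-to-nearest treats each weight independently, but the rounding errors interact
# through the activations, so revisiting individual rounding decisions can lower the
# layer's real output error at exactly the same bit width and file size.

rng = np.random.default_rng(1)
n = 32
w = rng.normal(size=n)
common = rng.normal(size=(512, 1))
acts = 0.9 * common + 0.45 * rng.normal(size=(512, n))   # strongly correlated channels

scale = np.max(np.abs(w)) / 7
q = np.round(w / scale)                                   # plain round-to-nearest codes

def output_err(codes):
    return np.mean((acts @ w - acts @ (scale * codes)) ** 2)

err = output_err(q)
improved = True
while improved:                                           # greedy pass: nudge single codes by +-1
    improved = False
    for i in range(n):
        for step in (-1.0, 1.0):
            cand = q.copy()
            cand[i] += step
            if abs(cand[i]) <= 7 and output_err(cand) < err:
                q, err = cand, output_err(cand)
                improved = True

print("round-to-nearest:", output_err(np.round(w / scale)))
print("greedy-improved: ", err)                           # same bits per weight, lower output error
```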
While researching this I found https://github.com/ggml-org/llama.cpp/pull/15550 which is quite interesting as well.