bknyaz/Qwen3-Coder-Next-REAM
Are you planning to do an AWQ quant of the REAM model as well?
It would be really nice, as a 4-bit quant of it would fit in the typical 2x3090 setup without needing llama.cpp for offloading. It would be highly appreciated.
(I'm trying it myself currently, but I'm not very proficient in quantization.)
Do you maybe have any tips or resources for the quantization pipeline you use, and for the calibration dataset and its configuration?
It takes about 11 hours on my machine with the sequential pipeline in llm-compressor, but my first try sadly wasn't runnable. Also, I probably should use some code and tool-calling calibration data, right?
Also, thank you for your work, I really like your AWQ quants and use them a lot :D !
I am happy that you enjoy my quants!
I will quantize that model soon, after the new MiniMax-2.5 model that is currently in progress.
I would recommend the llm-compressor AWQ examples as a starting point. I think it is normal for Qwen3-Next models to take 11 hours; as I remember, it took even longer for me to quantize.
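Roughly, those examples boil down to something like this (a minimal, untested sketch; the dataset id and the exact `AWQModifier` arguments are placeholders you should check against your installed llm-compressor version, and Qwen3-Next's MoE layers will likely need extra `ignore` entries):

```python
# Minimal llm-compressor AWQ sketch, based on the public AWQ examples.
# NOTE: dataset id, sample counts, and AWQModifier arguments are assumptions;
# verify them against your llm-compressor version before a long run.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "bknyaz/Qwen3-Coder-Next-REAM"  # model discussed in this thread
NUM_CALIBRATION_SAMPLES = 256
MAX_SEQUENCE_LENGTH = 2048

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Any text dataset works for calibration; something code-heavy is plausibly
# a better match for a coder model (see the dataset question below).
ds = load_dataset(
    "HuggingFaceH4/ultrachat_200k", split=f"train_sft[:{NUM_CALIBRATION_SAMPLES}]"
)

def preprocess(example):
    # Render chat turns with the model's own chat template.
    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}

def tokenize(sample):
    return tokenizer(
        sample["text"],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=True,
        add_special_tokens=False,
    )

ds = ds.map(preprocess)
ds = ds.map(tokenize, remove_columns=ds.column_names)

# W4A16 AWQ on Linear layers, skipping the output head.
recipe = [AWQModifier(ignore=["lm_head"], scheme="W4A16_ASYM", targets=["Linear"])]

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)

model.save_pretrained("Qwen3-Coder-Next-REAM-AWQ", save_compressed=True)
tokenizer.save_pretrained("Qwen3-Coder-Next-REAM-AWQ")
```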
Thanks a lot :)
It might have been longer, but it was in that range.
Thanks for the examples link; I hadn't found the Qwen-Next example yet.
Do you have any tips regarding the datasets for calibration? Does it make sense to mix them somehow, and how much does the choice influence the quality of the resulting quant?
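To make the question concrete, this is roughly what I mean by mixing (an untested sketch using Hugging Face datasets' `interleave_datasets`; the dataset ids and ratios are just placeholders, not a recommendation):

```python
# Rough sketch of mixing calibration sources (untested; dataset ids and
# mixing ratios are placeholders).
from datasets import load_dataset, interleave_datasets

code = load_dataset("bigcode/the-stack-smol", split="train")            # code
chat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")  # chat

# Normalize both to a common "text" column so the calibration step can
# tokenize them uniformly.
code = code.map(lambda x: {"text": x["content"]}, remove_columns=code.column_names)
chat = chat.map(
    lambda x: {"text": "\n".join(m["content"] for m in x["messages"])},
    remove_columns=chat.column_names,
)

# 70% code / 30% chat, shuffled together into one calibration set.
mixed = interleave_datasets([code, chat], probabilities=[0.7, 0.3], seed=42)
calib = mixed.shuffle(seed=42).select(range(256))
```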
@cpatonn
Do you have any update on this?
I couldn't get the AWQ quant to run based on the examples.
(It could be a problem with my dual-3090 configuration.) But it seems the REAM model has a different group size than the base model, and I don't know if that is a problem; at least for me, no kernel could run it (see the snippet at the end of this message for how I'm comparing the group sizes).
Just wanted to ask if you ran into similar problems while quantizing, or if it's a problem on my side.
This isn't meant to rush you or anything; I just wanted to ask, as I've had about 5 runs already and cannot figure it out :D
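For reference, this is roughly how I'm comparing the group sizes, by reading `quantization_config` out of each repo's `config.json` (a rough sketch; the repo ids are placeholders):

```python
# Rough sketch: read the quantization group size from a quantized repo's
# config.json. Tries both the flat AutoAWQ-style layout and the nested
# compressed-tensors layout that llm-compressor writes.
import json
from huggingface_hub import hf_hub_download

def group_size(repo_id: str):
    path = hf_hub_download(repo_id, "config.json")
    with open(path) as f:
        qcfg = json.load(f).get("quantization_config", {})
    if "group_size" in qcfg:  # AutoAWQ-style config
        return qcfg["group_size"]
    # compressed-tensors-style config: config_groups -> <group> -> weights
    groups = qcfg.get("config_groups", {})
    return {name: g.get("weights", {}).get("group_size") for name, g in groups.items()}

# Replace with the actual base and REAM AWQ repo ids.
print(group_size("<base-awq-repo>"))
print(group_size("<ream-awq-repo>"))
```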