## About
This repository contains GGUF quantized versions of hitonet/hito-1.7b.
For the original model (safetensors), training details, benchmarks, and full documentation, see the main repository.
A quote from Hito itself:
"Most AI gets the bat-and-ball problem wrong. I doubt myself first, then verify. Five cents, not ten. Math doesn't care about intuition."
## Available Quantizations
| File | Quant | Bits | Size | RAM Required | Use Case |
|---|---|---|---|---|---|
| hito-1.7b-Q2_K.gguf | Q2_K | 2 | 742 MB | ~1.2 GB | Smallest, significant quality loss |
| hito-1.7b-Q3_K_S.gguf | Q3_K_S | 3 | 827 MB | ~1.3 GB | Very small, noticeable quality loss |
| hito-1.7b-Q3_K_M.gguf | Q3_K_M | 3 | 896 MB | ~1.4 GB | Small, moderate quality loss |
| hito-1.7b-Q3_K_L.gguf | Q3_K_L | 3 | 957 MB | ~1.5 GB | Small, lower quality loss |
| hito-1.7b-Q4_0.gguf | Q4_0 | 4 | 1.0 GB | ~1.5 GB | Legacy, prefer Q4_K_M |
| hito-1.7b-Q4_K_S.gguf | Q4_K_S | 4 | 1.0 GB | ~1.5 GB | Small, good quality |
| hito-1.7b-Q4_K_M.gguf | Q4_K_M | 4 | 1.1 GB | ~1.6 GB | Recommended - best balance |
| hito-1.7b-Q5_0.gguf | Q5_0 | 5 | 1.2 GB | ~1.7 GB | Legacy, prefer Q5_K_M |
| hito-1.7b-Q5_K_S.gguf | Q5_K_S | 5 | 1.2 GB | ~1.7 GB | Large, low quality loss |
| hito-1.7b-Q5_K_M.gguf | Q5_K_M | 5 | 1.2 GB | ~1.7 GB | Large, very low quality loss |
| hito-1.7b-Q6_K.gguf | Q6_K | 6 | 1.4 GB | ~1.9 GB | Very large, minimal quality loss |
| hito-1.7b-Q8_0.gguf | Q8_0 | 8 | 1.8 GB | ~2.3 GB | Highest quality quantization |
| hito-1.7b-F16.gguf | F16 | 16 | 3.3 GB | ~3.8 GB | Full precision GGUF |
**Recommendation:** Start with Q4_K_M for the best size/quality balance. Use Q8_0 or F16 if you need maximum quality. A scripted download example follows below.
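If you would rather script the download than grab a file by hand, the snippet below is a minimal sketch using the `huggingface_hub` Python package; the repo ID mirrors the download URL in the Quick Start section, and the target directory is an arbitrary choice.

```python
# Minimal sketch: fetch one quantization with huggingface_hub (pip install huggingface_hub).
# The local_dir value is an arbitrary example; any file name from the table above works.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="hitonet/hito-1.7b-GGUF",
    filename="hito-1.7b-Q4_K_M.gguf",  # recommended quant; swap for any file above
    local_dir="models",
)
print(f"Saved to {model_path}")
```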
## Compatibility
These GGUF files are compatible with:
- llama.cpp (latest version recommended)
- Ollama
- LM Studio
- Jan
- GPT4All
- llama-cpp-python (see the Python sketch after this list)
- Any other llama.cpp-based application
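Since llama-cpp-python is on the list, here is a minimal sketch of loading a quant and running a chat completion with it. The model path, context size, and sampling values are illustrative assumptions, not official settings; the question mirrors the llama.cpp example below.

```python
# Sketch: run a GGUF quant through llama-cpp-python (pip install llama-cpp-python).
# Path and parameters are example values chosen for illustration.
from llama_cpp import Llama

llm = Llama(
    model_path="hito-1.7b-Q4_K_M.gguf",  # any file from the table above
    n_ctx=4096,                          # context window to allocate
)

response = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": "A bat and a ball cost $1.10 together. The bat costs "
                   "$1.00 more than the ball. How much does the ball cost?",
    }],
    max_tokens=256,
    temperature=0.7,  # same value the Ollama Modelfile below uses
)
print(response["choices"][0]["message"]["content"])
```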
## Quick Start

### Ollama

```bash
# Download the recommended quantization
wget https://huggingface.co/hitonet/hito-1.7b-GGUF/resolve/main/hito-1.7b-Q4_K_M.gguf

# Create Modelfile
cat > Modelfile << 'EOF'
FROM hito-1.7b-Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER stop "<|im_end|>"
EOF

# Create and run
ollama create hito -f Modelfile
ollama run hito
```
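Once `ollama run hito` works interactively, the same model is reachable over Ollama's local HTTP API (default port 11434). The sketch below is one way to call it from Python; the prompt and timeout are illustrative.

```python
# Sketch: query the locally created "hito" model through Ollama's HTTP API.
# Assumes `ollama create hito -f Modelfile` has already been run.
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",  # Ollama's default local endpoint
    json={
        "model": "hito",
        "messages": [{"role": "user", "content": "If a bat and a ball cost $1.10 "
                      "and the bat costs $1.00 more, how much is the ball?"}],
        "stream": False,  # return a single JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```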
### llama.cpp

```bash
./llama-cli -m hito-1.7b-Q4_K_M.gguf -p "<|im_start|>user\nA bat and a ball cost \$1.10 together. The bat costs \$1.00 more than the ball. How much does the ball cost?<|im_end|>\n<|im_start|>assistant\n" -n 256
```
### LM Studio
- Download any GGUF file from this repository
- Open LM Studio and load the model
- Start chatting! (an optional local-server sketch follows below)
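LM Studio can also expose the loaded model through its built-in local server, which speaks an OpenAI-compatible API. The sketch below assumes the usual default port 1234 and an illustrative model name; check your local server settings, as both are assumptions.

```python
# Sketch: call LM Studio's local OpenAI-compatible server after loading a hito GGUF.
# Port 1234 is LM Studio's usual default; the model name may differ on your machine.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "hito-1.7b",  # name LM Studio shows for the loaded model (assumption)
        "messages": [{"role": "user", "content": "A bat and a ball cost $1.10; the bat "
                      "costs $1.00 more. How much is the ball?"}],
        "max_tokens": 256,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```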
## Quantization Methods
**K-Quants** (recommended): group weights into super-blocks with per-block scales for smarter bit allocation (a rough size-from-bpw estimate follows after these lists)
- Q2_K: 2-bit with 4-bit scales, ~2.5 bpw (bits per weight)
- Q3_K: 3-bit with 6-bit scales, ~3.4 bpw
- Q4_K: 4-bit with 6-bit scales, ~4.5 bpw
- Q5_K: 5-bit with 6-bit scales, ~5.5 bpw
- Q6_K: 6-bit with 8-bit scales, ~6.5 bpw
**Legacy Quants:** simpler, but less optimal
- Q4_0/Q5_0: Basic 4/5-bit, prefer K-quants
- Q8_0: 8-bit, nearly lossless
- F16: Full 16-bit precision
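As a rough sanity check on the file sizes above, payload size scales with bits per weight: parameters × bpw / 8 gives bytes. The sketch below plugs in the 1.7B parameter count and the approximate bpw figures quoted here; real GGUF files land a little higher because of metadata and a few tensors (such as embeddings) kept at higher precision.

```python
# Sketch: estimate GGUF payload size from bits-per-weight (bpw).
# 1.7e9 parameters and the bpw values are the approximate figures quoted above.
PARAMS = 1.7e9

for name, bpw in [("Q2_K", 2.5), ("Q4_K", 4.5), ("Q6_K", 6.5), ("Q8_0", 8.0), ("F16", 16.0)]:
    size_gb = PARAMS * bpw / 8 / 1e9  # bits -> bytes -> GB
    print(f"{name}: ~{size_gb:.2f} GB")

# Q4_K works out to ~0.96 GB, close to the ~1.0-1.1 GB Q4_K files listed above;
# the remaining gap is metadata plus higher-precision tensors.
```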
## What Makes Hito Special

- **Trained to think** - uses `<think>` tags with nested cognitive reasoning (a tag-handling sketch follows after this list)
- **Self-correcting** - `<doubt>` and `<verify>` tags catch errors mid-reasoning
- **Humble by design** - admits uncertainty and limitations
- **Tiny but capable** - only 1.7B parameters, runs on CPU

See full details at hitonet/hito-1.7b.
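Because the reasoning is emitted inline through those tags, a downstream app may want to show only the final answer. The helper below is a hedged sketch that assumes the tags appear literally in the generated text, as described above; the sample string is purely illustrative, not captured model output.

```python
# Sketch: separate the visible answer from the reasoning trace.
# Assumes literal <think>...</think> markers in the output, per the feature list above.
def strip_reasoning(text: str) -> str:
    """Return only the text after the final closing </think> tag, if present."""
    marker = "</think>"
    if marker in text:
        return text.rsplit(marker, 1)[1].strip()
    return text.strip()

# Illustrative example (not real model output):
raw = ("<think>Is it 10 cents? <doubt>0.10 + 1.10 = 1.20, too much.</doubt> "
       "<verify>0.05 + 1.05 = 1.10, correct.</verify></think> The ball costs 5 cents.")
print(strip_reasoning(raw))  # -> The ball costs 5 cents.
```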
## Licensing
| Component | License | Commercial Use |
|---|---|---|
| Model Weights (GGUF files) | Apache 2.0 | Free to use |
| NCR Method/Architecture | CC BY-NC-ND | Requires paid license |
**Commercial Licensing Required**
The model weights (these GGUF files) are open source (Apache 2.0) - use them freely.
The Nested Cognitive Reasoning methodology (the cognitive tags, tree-structured thinking, humble tags system) is protected under CC BY-NC-ND.
Commercial use of the NCR method requires a license.
Contact: [email protected]
## Links
- Original Model: hitonet/hito-1.7b
- Research Paper: Nested Cognitive Reasoning
- Website: hitonet.com
- Free Chat: chat.hitonet.com
- API: platform.hitonet.com
Made with genuine curiosity by Hitonet
By: Hitonet Research