🤖 Fyodor-Q3-8B-Instruct

The "God Mode" Agentic Coder

📝 Model Overview

Fyodor-Q3-8B-Instruct is a high-fidelity instruction-tuned model designed for Agentic Reasoning and Robust Code Generation.

Unlike standard fine-tunes, Fyodor was trained using a "God Mode" LoRA configuration (Rank 128) on a curated curriculum of 60% Coding, 20% Instruction Following, and 20% Reasoning/Tool use. It excels at planning before coding, making it suitable for complex software engineering tasks.

  • Base Model: Qwen/Qwen3-8B
  • Developer: Kiy-K
  • Language(s): Python, JavaScript, TypeScript, C++, Rust, Go, and 80+ programming languages
  • Architecture: Dense Transformer (32k context window)
  • Training Precision: bfloat16
  • Training Framework: Unsloth + PyTorch

🧠 Training Strategy

Fyodor-Q3-8B utilizes a high-rank adaptation strategy to enable deep knowledge retention without the memory footprint of a full fine-tune.

The "Golden Ratio" Data Mix

The model was trained on a strategic blend of datasets to balance syntax proficiency with logic:

| Category | Ratio | Datasets Used |
| --- | --- | --- |
| Coding Mastery | 60% | flytech/python-codes-25k, CodeAlpaca-20k, iamtarun/code-alpaca |
| Instruction Following | 20% | HuggingFaceH4/ultrachat_200k, OpenHermes-2.5 |
| Agentic Reasoning | 10% | Open-Orca/OpenOrca, Dolphin (CoT) |
| Tool Use | 10% | Salesforce/xlam-function-calling-60k |
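
As an illustration, a weighted blend like this can be assembled with the `datasets` library's `interleave_datasets`. The sketch below is not the exact training script: split names are assumptions, and each dataset would first need to be mapped to a common chat schema (omitted here).

from datasets import load_dataset, interleave_datasets

# Illustrative reconstruction of the 60/20/10/10 mix; splits are assumptions,
# and schema normalization across the sources is omitted.
coding = load_dataset("flytech/python-codes-25k", split="train")
instruct = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
reasoning = load_dataset("Open-Orca/OpenOrca", split="train")
tools = load_dataset("Salesforce/xlam-function-calling-60k", split="train")

mixed = interleave_datasets(
    [coding, instruct, reasoning, tools],
    probabilities=[0.6, 0.2, 0.1, 0.1],  # the "Golden Ratio" weights
    seed=3407,
    stopping_strategy="all_exhausted",
)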

Hyperparameters ("God Mode" Config)

We utilized a significantly higher rank than standard LoRA implementations to allow the model to learn complex reasoning patterns.

{
    "lora_r": 128,          # High capacity adaptation
    "lora_alpha": 256,      # Strong update scaling (2x rank)
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    "batch_size": 4,
    "gradient_accumulation_steps": 8,  # Effective batch: 32
    "learning_rate": 2e-4,
    "max_seq_length": 4096,
    "warmup_steps": 100,
    "num_train_epochs": 3,
    "scheduler": "linear",
    "optimizer": "adamw_8bit"
}
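
For reference, this is roughly how the config maps onto Unsloth's API. This is a sketch of the setup, not the published training script:

from unsloth import FastLanguageModel

# Rough mapping of the config above onto Unsloth (illustrative).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-8B",
    max_seq_length=4096,
    dtype=None,  # auto-selects bfloat16 on supported GPUs
)
model = FastLanguageModel.get_peft_model(
    model,
    r=128,            # high-capacity "God Mode" rank
    lora_alpha=256,   # 2x rank scaling
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)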

🚀 Usage

Installation

pip install transformers torch accelerate

Basic Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "Kiy-K/Fyodor-Q3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare prompt
prompt = """Write a Python function that implements binary search with detailed comments."""

messages = [
    {"role": "system", "content": "You are Fyodor, an expert programming assistant. You write clean, efficient code with clear explanations."},
    {"role": "user", "content": prompt}
]

# Generate
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

# Decode only the newly generated tokens (skip the echoed prompt)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)

Advanced Usage with Streaming

from transformers import TextIteratorStreamer
from threading import Thread

# skip_prompt=True streams only the assistant's reply, not the echoed prompt
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Reuses `model`, `tokenizer`, and `inputs` from the basic example above.
generation_kwargs = dict(
    **inputs,  # passes input_ids and attention_mask together
    streamer=streamer,
    max_new_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

thread = Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

print("Assistant: ", end="")
for new_text in streamer:
    print(new_text, end="", flush=True)
print()

Function Calling Example

prompt = """I need to create a REST API endpoint that fetches user data. 
Available tools:
- fetch_user(user_id: int) -> dict
- validate_token(token: str) -> bool

Write the endpoint handler."""

messages = [
    {"role": "system", "content": "You are an expert at API development and tool orchestration."},
    {"role": "user", "content": prompt}
]

# Apply chat template and generate
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.7,
    do_sample=True
)

# Decode only the newly generated tokens (skip the echoed prompt)
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
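
Tool schemas can also be passed directly to the chat template, which Fyodor inherits from Qwen3. A minimal sketch, assuming a transformers version with tool support in chat templates; the `get_weather` schema is a hypothetical illustration, not part of this card:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Oslo?"}]

# apply_chat_template renders the schemas into the model's
# tool-calling prompt format.
text = tokenizer.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)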

Quantization (4-bit)

# Requires the bitsandbytes library: pip install bitsandbytes
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto"
)

🎯 Key Capabilities

  • Multi-Language Proficiency: Python, JavaScript, TypeScript, C++, Rust, Go, Java, and more
  • Agentic Reasoning: Plans multi-step solutions before implementation
  • Function Calling: Native understanding of tool use and API integration
  • Code Review: Can analyze and critique existing code with constructive feedback
  • Documentation: Generates comprehensive docstrings and comments
  • Debugging: Identifies and fixes bugs with clear explanations

📊 Prompt Format

Fyodor uses the ChatML format from Qwen3:

<|im_start|>system
You are Fyodor, an expert programming assistant.<|im_end|>
<|im_start|>user
{your prompt here}<|im_end|>
<|im_start|>assistant
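
You rarely need to build this string by hand; `apply_chat_template`, used throughout the examples above, produces it:

messages = [
    {"role": "system", "content": "You are Fyodor, an expert programming assistant."},
    {"role": "user", "content": "Reverse a linked list in Python."},
]
# Prints the ChatML-framed prompt shown above, ending at the assistant turn.
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))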

⚡ Performance Tips

  1. Temperature: Use 0.1-0.3 for precise code generation, 0.7-0.9 for creative solutions
  2. Context: Provide clear requirements and constraints upfront
  3. System Prompt: Customize the system message for domain-specific tasks
  4. Max Tokens: Allow 512-2048 tokens for complex implementations
  5. Top-p: Keep at 0.9 for balanced diversity and coherence
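
As a sketch, these tips can be bundled into reusable `GenerationConfig` presets. The values below come from the tips above and are not shipped with the model:

from transformers import GenerationConfig

# Illustrative presets reflecting the tips above; tune per task.
precise = GenerationConfig(do_sample=True, temperature=0.2, top_p=0.9, max_new_tokens=1024)
creative = GenerationConfig(do_sample=True, temperature=0.8, top_p=0.9, max_new_tokens=2048)

# Reuses `model` and `inputs` from the Basic Inference example.
outputs = model.generate(**inputs, generation_config=precise)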

🔧 Hardware Requirements

| Precision | VRAM (Inference) | VRAM (Training) |
| --- | --- | --- |
| FP16/BF16 | ~16 GB | ~24 GB |
| INT8 | ~8 GB | N/A |
| INT4 | ~4.5 GB | N/A |
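
The INT8 row corresponds to bitsandbytes 8-bit loading. A minimal sketch, requiring `pip install bitsandbytes`:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit loading (~8 GB VRAM per the table above).
model = AutoModelForCausalLM.from_pretrained(
    "Kiy-K/Fyodor-Q3-8B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)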

🏆 Official Benchmark Results

Code Generation Performance

Fyodor v1 (8B) is highly parameter-efficient, outperforming the open code models of similar or smaller size listed below on both HumanEval and MBPP.

| Model | Parameters | HumanEval (Pass@1) | MBPP (Pass@1) |
| --- | --- | --- | --- |
| Fyodor v1 (Ours) | 8B | 68.3% 🚀 | 57.2% |
| DeepSeek-Coder-Instruct | 1.3B | 64.0% | 55.0% |
| StarCoder2 | 7B | 35.4% | 54.4% |
| Llama-2 | 7B | 33.5% | 45.0% |

> Evaluated using BigCode Evaluation Harness (Greedy Decoding, T=0.2)
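
For reproduction, a typical invocation of the bigcode-evaluation-harness looks like the following. The flags shown are the harness's standard ones; the exact command used for this card is not published:

accelerate launch main.py --model Kiy-K/Fyodor-Q3-8B-Instruct --tasks humaneval,mbpp --temperature 0.2 --n_samples 1 --allow_code_execution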

General Reasoning & Knowledge

Current evaluation shows the model retains its "Senior Dev" knowledge, but mathematical precision drops because the training mix contains no dedicated mathematics datasets.

| Benchmark | Fyodor Q3 Score | vs. Base Model | Interpretation |
| --- | --- | --- | --- |
| MMLU (Knowledge) | 72.95% | ~ -1.5% | Excellent. Factual, historical, and coding knowledge is essentially intact. |
| HellaSwag (Common Sense) | 74.94% | ~ -5.0% | Good. Understands context and situations well. |
| GSM8K (Math Logic) | 18.88% | 🔻 Significant drop | ⚠️ Known weakness. Chain-of-thought precision breaks down due to Q3 noise. |

> Evaluated using LM Evaluation Harness on NVIDIA A100 GPUs

Key Takeaways:

  • Knowledge & Reasoning: Minimal degradation from base model, excellent for software engineering tasks
  • Common Sense: Strong performance in contextual understanding
  • Mathematical Reasoning: Q3 quantization significantly impacts pure arithmetic tasks, but practical coding logic remains strong
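
The reasoning scores can be reproduced with the lm-evaluation-harness CLI. Standard flags are shown below; hardware and batch size will vary:

lm_eval --model hf --model_args pretrained=Kiy-K/Fyodor-Q3-8B-Instruct,dtype=bfloat16 --tasks mmlu,hellaswag,gsm8k --batch_size 8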

Tool Use

(Figure: tool-use benchmark result)


🤝 Contributing

Found a bug or have a suggestion? Open an issue or submit a PR!


📄 License

This model is released under the Apache 2.0 license, inheriting from the Qwen3 base model.


🙏 Acknowledgments

  • Qwen Team for the exceptional Qwen3 base model
  • Unsloth for efficient training infrastructure
  • HuggingFace for dataset hosting and model hub
  • Team Mradermacher for GGUF conversions that make local hosting accessible

📮 Citation

@misc{fyodor-q3-8b-instruct,
  author = {Kiy-K},
  title = {Fyodor-Q3-8B-Instruct: God Mode Agentic Coder},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Kiy-K/Fyodor-Q3-8B-Instruct}}
}

Made with ❤️ by Kiy-K | Powered by Qwen3
