# Qwen3-8B Turkish Reasoning Agent (QLoRA)

Repo: `BossriZytn/Qwen3-8B_tr_reasoning`
## Model Overview
This repository contains a QLoRA fine-tuned adapter for Qwen3-8B, tailored for Turkish instruction-following and chain-of-thought reasoning on the MCP (turkish-general-reasoning-28k) dataset. The base model remains the original Qwen3-8B in 4-bit quantized format, with low-rank adapters applied to enhance performance on Turkish reasoning tasks.
- Base model: Qwen3-8B (8B parameters) via `Qwen/Qwen3-8B` in 4-bit (bitsandbytes)
- Adapter: LoRA (rank=8, α=32) on `q_proj`, `v_proj`, `o_proj`
- Method: QLoRA (4-bit quantization + LoRA)
## Intended Uses & Limitations
Use cases:
- Turkish Q&A with chain-of-thought
- Instruction-following and step-by-step reasoning
- Agentic reasoning modules in applications
Limitations:
- Not intended for general-purpose text generation
- May hallucinate or make reasoning errors
- Outputs can be verbose if the generation budget is not managed
## Training Data
- Dataset: `ituperceptron/turkish-general-reasoning-28k`
- Size: ~28K examples
- Split: 90% train / 10% validation
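The exact split procedure is not published; a deterministic 90/10 shuffle-and-split could look like the sketch below (the helper name and seed are illustrative, not from the training script):

```python
import random

def split_90_10(examples, seed=42):
    """Shuffle a list of examples and split it 90% train / 10% validation."""
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    cut = int(len(idx) * 0.9)
    train = [examples[i] for i in idx[:cut]]
    val = [examples[i] for i in idx[cut:]]
    return train, val

# ~28K examples, as in the dataset card
data = [{"instruction": f"soru {i}"} for i in range(28_000)]
train, val = split_90_10(data)
print(len(train), len(val))  # 25200 2800
```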
## Fine-tuning
- Hardware: NVIDIA A100 40GB
- Quantization: 4-bit (`bnb_4bit_compute_dtype=torch.float16`)
- LoRA: rank=8, alpha=32, modules=`q_proj`, `v_proj`, `o_proj`
- Training: 3 epochs; batch size 2 (gradient accumulation 8); lr=5e-5; weight decay 0.01; warmup 5%; early stopping patience 1
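Under those hyperparameters, the QLoRA configuration might be set up roughly as follows. This is a sketch using standard `peft`/`transformers` objects; the actual training script is not published, and some argument names (e.g. `eval_strategy`) vary across `transformers` versions:

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit quantization with the compute dtype stated on the card
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA on the attention projections listed above
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=['q_proj', 'v_proj', 'o_proj'],
    task_type='CAUSAL_LM',
)

# Schedule/optimizer settings from the card; early stopping with
# patience 1 would be added via EarlyStoppingCallback at train time
training_args = TrainingArguments(
    output_dir='qwen3-tr-qlora',
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    weight_decay=0.01,
    warmup_ratio=0.05,
    eval_strategy='epoch',
    save_strategy='epoch',
    load_best_model_at_end=True,
)
```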
## Evaluation
- Validation loss: ~1.20 CE (PPL ≈ 3.5)
- Train–validation loss gap: <0.01
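As a sanity check, perplexity is the exponential of the mean cross-entropy loss, so exp(1.20) ≈ 3.32, consistent with the reported ≈3.5 given the rounding in both figures:

```python
import math

val_loss = 1.20           # reported validation cross-entropy
ppl = math.exp(val_loss)  # perplexity = exp(CE loss)
print(round(ppl, 2))      # 3.32
```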
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

# Tokenizer (shipped with the adapter repo)
tokenizer = AutoTokenizer.from_pretrained(
    'BossriZytn/Qwen3-8B_tr_reasoning', use_fast=True
)

# Base model in 4-bit
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
base = AutoModelForCausalLM.from_pretrained(
    'Qwen/Qwen3-8B', quantization_config=bnb, device_map='auto'
)

# Attach the LoRA adapter
model = PeftModel.from_pretrained(
    base, 'BossriZytn/Qwen3-8B_tr_reasoning', torch_dtype=torch.float16
)
model.eval()

# Prompt (Turkish: "What would you do if you heard an upside-down
# turtle's cry for help?")
prompt = (
    '### Talimat:\n'
    'Ters dönmüş bir kaplumbağanın yardım çığlığını duyduğunda ne yapardın?\n'
    '### Cevap:\n'
)
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)

# Greedy decoding, up to 512 new tokens; generation stops at EOS
out = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
## TODO

- Optimize this adapter as a "thinking" component for real-time agent scenarios
- Apply more aggressive quantization and integrate ONNX/TensorRT for low latency
- Improve model diversity through data augmentation with larger Turkish query sets
- Develop an API layer for seamless integration into agent control loops
## Ethical Considerations

- Outputs may reflect biases present in the training data
- Human oversight is recommended for downstream use
## License
Apache-2.0
## Citation

```bibtex
@misc{qwen3_turkish_qlora,
  title={Qwen3-8B Turkish Reasoning Agent (QLoRA)},
  author={BossriZytn},
  year={2025},
  howpublished={\url{https://huggingface.co/BossriZytn/Qwen3-8B_tr_reasoning}}
}
```