---
license: mit
base_model: microsoft/Phi-3-mini-4k-instruct
datasets:
- b-mc2/sql-create-context
---

# QLoRA Adapter for Phi-3-mini: A Technical SQL Assistant

This repository contains a QLoRA (quantized low-rank adaptation) adapter for the `microsoft/Phi-3-mini-4k-instruct` model. The adapter was fine-tuned to act as a technical assistant that generates SQL queries from natural language questions, based on a provided database schema.

This project was developed as an educational exercise to demonstrate a complete, end-to-end fine-tuning pipeline, from data preparation to model evaluation.

## Model Description

- **Base Model:** `microsoft/Phi-3-mini-4k-instruct`
- **Fine-tuning Method:** QLoRA (4-bit NormalFloat quantization)
- **Task:** Text-to-SQL Generation

The model is designed to receive a database schema (as a `CREATE TABLE` statement) and a user's question in natural language, and to generate the appropriate SQL query to answer that question.

## How to Use

First, ensure you have the necessary libraries installed:

```bash
pip install -U transformers peft accelerate bitsandbytes torch
```

Next, use the following Python code. The `AutoPeftModelForCausalLM` class will automatically load the base model (Phi-3-mini) and apply this fine-tuned adapter on top.

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch

# Your adapter's ID on the Hugging Face Hub
adapter_id = "manuelaschrittwieser/phi-3-mini-sql-assistant-adapter"

print("Loading model and adapter...")

# Load the fine-tuned model and adapter in one step
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# The tokenizer is also loaded from the adapter's repository
tokenizer = AutoTokenizer.from_pretrained(adapter_id, trust_remote_code=True)

print("Model loaded successfully!")

# --- Define your schema and question ---
context = "CREATE TABLE employees (name VARCHAR, department VARCHAR, salary INTEGER)"
question = "What are the names of employees in the 'Engineering' department with a salary over 80000?"

# --- Format the prompt using the Phi-3 chat template ---
prompt = f"""<|user|>
Given the database schema:
{context}

Generate the SQL query for the following request:
{question}<|end|>
<|assistant|>
"""

# --- Generate the response ---
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=False)

# --- Extract the generated SQL ---
# Decode only the newly generated tokens. Splitting the decoded text on
# "<|assistant|>" is unreliable here, because skip_special_tokens=True strips
# that token from the output; slicing off the prompt tokens avoids the issue.
generated_sql = tokenizer.decode(
    outputs[0][input_ids.shape[-1]:], skip_special_tokens=True
).strip()

print(f"\nGenerated SQL: {generated_sql}")
# Expected output: SELECT name FROM employees WHERE department = 'Engineering' AND salary > 80000
```

## Training Procedure

### Dataset

The model was fine-tuned on a 10,000-sample subset of the `b-mc2/sql-create-context` dataset. This dataset provides pairs of database schemas (`context`), natural language questions (`question`), and their corresponding SQL queries (`answer`). The subset was split into 9,000 examples for training and 1,000 for testing.

### Fine-tuning Configuration (QLoRA)

* Quantization: 4-bit NormalFloat (NF4) with bfloat16 compute dtype
* LoRA Rank (`r`): 8
* LoRA Alpha (`lora_alpha`): 16
* Target Modules: `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]`

### Training Hyperparameters

* Learning Rate: 2e-4
* Epochs: 1
* Batch Size: 2 (with 4 gradient accumulation steps for an effective batch size of 8)
* Optimizer: Paged AdamW (32-bit)
* LR Scheduler: Cosine
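For reference, the configuration above maps onto the standard `transformers`/`peft` objects roughly as follows. This is a sketch rather than the exact training script used for this adapter; in particular, the trainer wiring (e.g., `trl`'s `SFTTrainer`) and the `output_dir` name are assumptions.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base model, with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter applied to all attention and MLP projection matrices
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Effective batch size of 8 = 2 per device x 4 gradient accumulation steps
training_args = TrainingArguments(
    output_dir="phi-3-mini-sql-assistant-adapter",  # assumed name, not from the card
    learning_rate=2e-4,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
)
```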
## Evaluation

The model's primary goal was to generate syntactically correct and logically sound SQL queries. A qualitative evaluation on the held-out test set showed that the model successfully learned this task, often producing the exact ground-truth SQL.

However, after only one epoch of training, the model sometimes fails to adhere perfectly to the chat template format (e.g., omitting the `<|assistant|>` token), even when the generated SQL itself is correct. Further training would likely improve this formatting consistency.

## Limitations and Bias

* This model is a proof-of-concept and was trained on a limited dataset for a short duration. It may not generalize well to complex, unseen database schemas or highly intricate SQL queries.
* The model is not designed to be a chatbot and has no knowledge outside of the Text-to-SQL domain it was trained on.
* The training data may contain biases, and the model's outputs should be reviewed before execution in a production environment, especially to prevent potential SQL injection vulnerabilities (see the sanity-check sketch below).
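As a lightweight first line of defense against the execution risk noted above, a generated query can be parse-checked against the schema before it touches real data. The sketch below is illustrative and not part of this repository; it assumes a SQLite-compatible schema and uses Python's built-in `sqlite3` module, where `EXPLAIN` prepares a statement without executing it. The `is_valid_sql` helper is a hypothetical name.

```python
import sqlite3

def is_valid_sql(schema: str, query: str) -> bool:
    """Parse-check a generated query against the schema in a throwaway in-memory DB."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)        # build the tables from the CREATE TABLE context
        conn.execute(f"EXPLAIN {query}")  # parses and plans the query without running it
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

context = "CREATE TABLE employees (name VARCHAR, department VARCHAR, salary INTEGER)"
print(is_valid_sql(context, "SELECT name FROM employees WHERE salary > 80000"))  # True
print(is_valid_sql(context, "SELECT nme FROM employees"))                        # False (unknown column)
```

A parse check of this kind catches syntax errors and references to nonexistent tables or columns, but not logically wrong queries; human review remains advisable before production use.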