# Model Card for qwen2_vl_2b_sft_working_T365
This model is a fine-tuned version of [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct). It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from PIL import Image

# CHANGED: use the Hugging Face Hub model instead of a local path
model_path = "Tushar365/qwen2-vl-2b-sft-tushar365"  # Your HF repo

print(f"Loading model from Hugging Face: {model_path}")

# Load the fine-tuned model from the Hub
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # not strictly required: Qwen2-VL is natively supported in recent Transformers
)

# Load the processor and tokenizer from the base model (these stay the same)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", trust_remote_code=True)

print("Model loaded successfully from Hugging Face!")


def test_single_image(image_path, question):
    """Run the model on a single image with a text question."""
    image = Image.open(image_path)

    # Prepare the conversation in the Qwen2-VL chat format
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": question},
            ],
        }
    ]

    # Apply the chat template and process the inputs
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[image], return_tensors="pt")
    inputs = inputs.to(model.device)

    # Generate the response
    with torch.no_grad():
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode and keep only the assistant's turn
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    assistant_response = generated_text.split("assistant\n")[-1]
    return assistant_response


def test_disaster_assessment(pre_image_path, post_image_path):
    """Test disaster assessment with before/after images.

    Note: each call to test_single_image is an independent request; the model
    does not retain the pre-disaster image in context between calls.
    """
    # Describe the pre-disaster image first
    pre_response = test_single_image(
        pre_image_path,
        "This is a pre-disaster satellite image. I will show you the post-disaster image next for comparison.",
    )
    print("Pre-disaster response:", pre_response)
    print("\n" + "=" * 50 + "\n")

    # Then assess the post-disaster image
    post_response = test_single_image(
        post_image_path,
        "This is the post-disaster satellite image. Compare with the previous pre-disaster image and provide a comprehensive building damage assessment report.",
    )
    print("Disaster assessment:", post_response)
    return post_response


# Test the model with your own before/after images
print("Testing fine-tuned Qwen2-VL model from Hugging Face...")
result = test_disaster_assessment(
    "/pre_flood.png",
    "/post_flood.png",
)
```
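Splitting the decoded string on `"assistant\n"` works with Qwen2-VL's chat template, but a more robust pattern is to slice off the prompt tokens before decoding. The sketch below reuses the `inputs` and `generated_ids` names from the snippet above:

```python
# Sketch: decode only the newly generated tokens by trimming the prompt.
# Assumes `inputs` and `generated_ids` from the quick-start code above.
trimmed_ids = [
    output[len(prompt):] for prompt, output in zip(inputs.input_ids, generated_ids)
]
response = processor.batch_decode(
    trimmed_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
```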
## Training procedure
This model was trained with supervised fine-tuning (SFT).
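The training script, dataset, and hyperparameters are not published with this card. The sketch below only illustrates the general shape of an SFT run with TRL's `SFTTrainer`; the dataset, output directory, and hyperparameters are hypothetical placeholders, not the settings used for this model:

```python
# Minimal sketch of a TRL SFT run; the dataset, output_dir, and
# hyperparameters are hypothetical, not the settings used for this model.
import torch
from datasets import load_dataset
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from trl import SFTConfig, SFTTrainer

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype=torch.bfloat16
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

dataset = load_dataset("my-org/my-vlm-dataset", split="train")  # hypothetical

training_args = SFTConfig(
    output_dir="qwen2-vl-2b-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=processor,
)
trainer.train()
```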
### Framework versions
- TRL: 0.21.0
- Transformers: 4.55.4
- Pytorch: 2.8.0+cu126
- Datasets: 3.2.0
- Tokenizers: 0.21.4
## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```