# Model Card for qwen2_vl_2b_sft_working_T365
This model is a fine-tuned version of [Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct). It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start
```python
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from PIL import Image

# CHANGED: use the Hugging Face Hub model instead of a local path
model_path = "Tushar365/qwen2-vl-2b-sft-tushar365"  # Your HF repo

print(f"Loading model from Hugging Face: {model_path}")

# Load the fine-tuned model from the Hub
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # not strictly required: Qwen2-VL is natively supported in recent Transformers
)

# Load the processor and tokenizer from the base model (these stay the same)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-VL-2B-Instruct", trust_remote_code=True)

print("Model loaded successfully from Hugging Face!")


def test_single_image(image_path, question):
    """Run the model on a single image with a text question."""
    image = Image.open(image_path)

    # Prepare the conversation in the Qwen2-VL chat format
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": question},
            ],
        }
    ]

    # Apply the chat template and process the inputs
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[text], images=[image], return_tensors="pt")
    inputs = inputs.to(model.device)

    # Generate the response
    with torch.no_grad():
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode and keep only the assistant's turn
    generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
    assistant_response = generated_text.split("assistant\n")[-1]
    return assistant_response


def test_disaster_assessment(pre_image_path, post_image_path):
    """Test disaster assessment with before/after images.

    Note: each call to test_single_image is an independent request; the model
    does not retain the pre-disaster image in context between calls.
    """
    # Describe the pre-disaster image first
    pre_response = test_single_image(
        pre_image_path,
        "This is a pre-disaster satellite image. I will show you the post-disaster image next for comparison.",
    )
    print("Pre-disaster response:", pre_response)
    print("\n" + "=" * 50 + "\n")

    # Then assess the post-disaster image
    post_response = test_single_image(
        post_image_path,
        "This is the post-disaster satellite image. Compare with the previous pre-disaster image and provide a comprehensive building damage assessment report.",
    )
    print("Disaster assessment:", post_response)
    return post_response


# Test the model with your own before/after images
print("Testing fine-tuned Qwen2-VL model from Hugging Face...")
result = test_disaster_assessment(
    "/pre_flood.png",
    "/post_flood.png",
)
```
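Splitting the decoded string on `"assistant\n"` works with Qwen2-VL's chat template, but a more robust pattern is to slice off the prompt tokens before decoding. The sketch below reuses the `inputs` and `generated_ids` names from the snippet above:

```python
# Sketch: decode only the newly generated tokens by trimming the prompt.
# Assumes `inputs` and `generated_ids` from the quick-start code above.
trimmed_ids = [
    output[len(prompt):] for prompt, output in zip(inputs.input_ids, generated_ids)
]
response = processor.batch_decode(
    trimmed_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
```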
## Training procedure
This model was trained with supervised fine-tuning (SFT).
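The training script, dataset, and hyperparameters are not published with this card. The sketch below only illustrates the general shape of an SFT run with TRL's `SFTTrainer`; the dataset, output directory, and hyperparameters are hypothetical placeholders, not the settings used for this model:

```python
# Minimal sketch of a TRL SFT run; the dataset, output_dir, and
# hyperparameters are hypothetical, not the settings used for this model.
import torch
from datasets import load_dataset
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from trl import SFTConfig, SFTTrainer

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct", torch_dtype=torch.bfloat16
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

dataset = load_dataset("my-org/my-vlm-dataset", split="train")  # hypothetical

training_args = SFTConfig(
    output_dir="qwen2-vl-2b-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=processor,
)
trainer.train()
```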
### Framework versions
- TRL: 0.21.0
- Transformers: 4.55.4
- Pytorch: 2.8.0+cu126
- Datasets: 3.2.0
- Tokenizers: 0.21.4
## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```