abdulqadir02 committed
Commit e02c264 · 1 Parent(s): 4b61277

final push
.gitignore ADDED
@@ -0,0 +1 @@
+ caption
README.md CHANGED
@@ -12,3 +12,98 @@ short_description: This AI system recommends captions for the images
 ---

 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # AI Image Caption Recommender System with CLIP
+
+ This project is an AI-powered image caption recommender system built on OpenAI's CLIP model. It recommends captions for social media posts (Instagram, Facebook, etc.) by selecting the most relevant entries from a curated caption list based on the uploaded image.
+
+ ---
+
+ ## Problem Statement
+ Social media users often struggle to come up with engaging captions for their posts. This project solves that problem by automatically recommending captions that best match an uploaded image, helping users polish their social media presence with minimal effort.
+
+ ---
+
+ ## Features
+ - Upload images via a web interface.
+ - Generate image embeddings using CLIP.
+ - Match images to relevant captions from a predefined list.
+ - Display the top 5 most relevant captions with similarity scores.
+ - Modular code structure for easy maintenance and scalability.
+
+ ---
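The matching step above boils down to cosine-similarity ranking between one image embedding and many caption embeddings. Here is a minimal sketch with made-up 4-dimensional vectors (`rank_captions` is a hypothetical helper for illustration; the actual app works with 512-dimensional CLIP features):

```python
import numpy as np

def rank_captions(image_embedding, caption_embeddings, captions, top_k=5):
    """Rank captions by cosine similarity to a single image embedding."""
    img = image_embedding / np.linalg.norm(image_embedding)
    caps = caption_embeddings / np.linalg.norm(caption_embeddings, axis=1, keepdims=True)
    sims = caps @ img                       # one cosine score per caption
    order = np.argsort(sims)[::-1][:top_k]  # most similar first
    return [(captions[i], float(sims[i])) for i in order]

# Toy embeddings, invented for this example.
image_vec = np.array([1.0, 0.0, 0.0, 0.0])
caption_vecs = np.array([
    [0.9, 0.1, 0.0, 0.0],  # nearly parallel to the image vector
    [0.0, 1.0, 0.0, 0.0],  # orthogonal
    [0.5, 0.5, 0.0, 0.0],  # in between
])
ranked = rank_captions(image_vec, caption_vecs, ["sunset", "coffee", "beach"])
```

The same idea scales unchanged to a list of 92 captions: only the number of rows in `caption_vecs` grows.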
+
+ ## Project Structure
+ ```
+ clip_caption_recommender/
+ ├── app.py               # Main Streamlit app
+ ├── caption.txt          # List of predefined captions
+ ├── requirements.txt     # Project dependencies
+ └── modules/
+     ├── image_utils.py   # Image processing functions
+     ├── model_utils.py   # CLIP model loading and embeddings
+     └── caption_utils.py # Caption matching logic
+ ```
+
+ ---
+
+ ## Installation
+ ```bash
+ # Clone the repository
+ git clone https://github.com/yourusername/clip-caption-recommender.git
+ cd clip-caption-recommender
+
+ # Install dependencies
+ pip install -r requirements.txt
+ ```
+
+ ---
+
+ ## Usage
+ ```bash
+ # Run the Streamlit app
+ streamlit run app.py
+ ```
+ - Upload an image.
+ - View the top 5 recommended captions with similarity scores.
+
+ ---
+
+ ## Adding More Captions
+ - Add captions to `caption.txt`, one per line.
+
+ ---
+ ## Technologies Used
+ - Python
+ - OpenAI CLIP
+ - Streamlit
+ - Transformers
+ - scikit-learn
+ - Pillow
+
+ ---
+
+ ## Future Improvements
+ - Integrate with social media APIs for direct caption posting.
+ - Allow users to add personalized captions.
+ - Implement fine-tuning on domain-specific datasets.
+ - Deploy with Docker and AWS for scalability.
+
+ ---
+
+ ## Author
+ - **Your Name**
+   [LinkedIn](https://linkedin.com/in/abdulqadir02) | [GitHub](https://github.com/AbdulQadir0211)
+
+ ---
+
+ ## License
+ This project is licensed under the MIT License.
+
+ ---
+
+ Happy Posting! 🚀
app.py ADDED
@@ -0,0 +1,22 @@
+ import streamlit as st
+ from modules.image_utils import load_and_preprocess_image
+ from modules.model_utils import generate_image_embeddings
+ from modules.caption_utils import match_captions
+
+ def load_captions(path="caption.txt"):
+     # One caption per line; skip blank lines.
+     with open(path, encoding="utf-8") as f:
+         return [line.strip() for line in f if line.strip()]
+
+ def image_captioning(image_file, candidate_captions):
+     inputs, processor = load_and_preprocess_image(image_file)
+     image_features, clip_model = generate_image_embeddings(inputs)
+     best_captions, similarities = match_captions(image_features, candidate_captions, clip_model, processor)
+     return best_captions, similarities
+
+ st.title("AI Image Caption Recommender with CLIP")
+ uploaded_image = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
+
+ candidate_captions = load_captions()
+
+ if uploaded_image is not None:
+     st.image(uploaded_image, caption="Uploaded Image", use_column_width=True)
+     best_captions, similarities = image_captioning(uploaded_image, candidate_captions)
+     st.write("### Top Captions")
+     for idx, (caption, similarity) in enumerate(zip(best_captions[:5], similarities[:5])):
+         st.write(f"{idx + 1}. {caption} (Similarity: {similarity:.4f})")
caption.txt ADDED
@@ -0,0 +1,92 @@
+ Trees, Travel and Tea!
+ A refreshing beverage.
+ A moment of indulgence.
+ The perfect thirst quencher.
+ Your daily dose of delight.
+ Taste the tradition.
+ Savor the flavor.
+ Refresh and rejuvenate.
+ Unwind and enjoy.
+ The taste of home.
+ A treat for your senses.
+ A taste of adventure.
+ A moment of bliss.
+ Your travel companion.
+ Fuel for your journey.
+ The essence of nature.
+ The warmth of comfort.
+ A sip of happiness.
+ Pure indulgence.
+ Quench your thirst, ignite your spirit.
+ Awaken your senses, embrace the moment.
+ The taste of faraway lands.
+ A taste of home, wherever you are.
+ Your daily dose of delight.
+ Your moment of serenity.
+ The perfect pick-me-up.
+ The perfect way to unwind.
+ Taste the difference.
+ Experience the difference.
+ A refreshing escape.
+ A delightful escape.
+ The taste of tradition, the spirit of adventure.
+ The warmth of home, the joy of discovery.
+ Your passport to flavor.
+ Your ticket to tranquility.
+ Sip, savor, and explore.
+ Indulge, relax, and rejuvenate.
+ The taste of wanderlust.
+ The comfort of home.
+ A journey for your taste buds.
+ A haven for your senses.
+ Your refreshing companion.
+ Your delightful escape.
+ Taste the world, one sip at a time.
+ Embrace the moment, one cup at a time.
+ The essence of exploration.
+ The comfort of connection.
+ Quench your thirst for adventure.
+ Savor the moment of peace.
+ The taste of discovery.
+ The warmth of belonging.
+ Your travel companion, your daily delight.
+ Your moment of peace, your daily indulgence.
+ The spirit of exploration, the comfort of home.
+ The joy of discovery, the warmth of connection.
+ Sip, savor, and set off on an adventure.
+ Indulge, relax, and find your peace.
+ A delightful beverage.
+ A moment of relaxation.
+ The perfect way to start your day.
+ The perfect way to end your day.
+ A treat for yourself.
+ Something to savor.
+ A moment of calm.
+ A taste of something special.
+ A refreshing pick-me-up.
+ A comforting drink.
+ A taste of adventure.
+ A moment of peace.
+ A small indulgence.
+ A daily ritual.
+ A way to connect with others.
+ A way to connect with yourself.
+ A taste of home.
+ A taste of something new.
+ A moment to enjoy.
+ A moment to remember.
+ Capturing moments of joy and laughter.
+ A peaceful sunset by the ocean.
+ A cup of coffee on a rainy morning.
+ Adventurous hikes through mountain trails.
+ A serene lake reflecting the clear blue sky.
+ Busy city streets bustling with energy.
+ A cozy corner with a good book.
+ Soft petals of blooming flowers.
+ A majestic lion in the savannah.
+ Snow-capped mountains under a starlit sky.
+ Children playing in the park.
+ Freshly baked bread cooling on a rack.
+ A traveler gazing at ancient ruins.
+ Bright lights of a carnival at night.
+ A quiet beach with gentle waves.
modules/caption_utils.py ADDED
@@ -0,0 +1,16 @@
+ import torch
+ from sklearn.metrics.pairwise import cosine_similarity
+
+ def match_captions(image_features, captions, clip_model, processor):
+     # Embed all candidate captions with CLIP's text encoder.
+     text_inputs = processor(text=captions, return_tensors="pt", padding=True)
+     with torch.no_grad():
+         text_features = clip_model.get_text_features(**text_inputs)
+
+     image_features = image_features.detach().cpu().numpy()
+     text_features = text_features.detach().cpu().numpy()
+
+     # Cosine similarity between the one image and every caption,
+     # then sort captions from most to least similar.
+     similarities = cosine_similarity(image_features, text_features)
+     best_indices = similarities.argsort(axis=1)[0][::-1]
+     best_captions = [captions[i] for i in best_indices]
+
+     return best_captions, similarities[0][best_indices].tolist()
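The ranking logic in `match_captions` can be exercised in isolation with synthetic feature vectors, avoiding a model download (the 2-dimensional arrays below are made up; real CLIP features are 512-dimensional):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

captions = ["city lights", "quiet beach", "morning coffee"]

# Stand-ins for CLIP image/text features.
image_features = np.array([[1.0, 0.0]])
text_features = np.array([
    [0.0, 1.0],  # "city lights": orthogonal to the image vector
    [1.0, 0.1],  # "quiet beach": nearly parallel
    [0.7, 0.7],  # "morning coffee": 45 degrees away
])

# Same steps as match_captions, minus the model calls.
similarities = cosine_similarity(image_features, text_features)
best_indices = similarities.argsort(axis=1)[0][::-1]
best_captions = [captions[i] for i in best_indices]
```

Because `image_features` has a single row, `similarities` has shape (1, 3) and `best_indices` orders that one row from most to least similar.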
modules/image_utils.py ADDED
@@ -0,0 +1,8 @@
+ from PIL import Image
+ from transformers import CLIPProcessor
+
+ def load_and_preprocess_image(image_path):
+     # Accepts a file path or a file-like object (e.g. a Streamlit upload).
+     image = Image.open(image_path).convert("RGB")
+     processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
+     inputs = processor(images=image, return_tensors="pt")
+     return inputs, processor
modules/model_utils.py ADDED
@@ -0,0 +1,8 @@
+ import torch
+ from transformers import CLIPModel
+
+ def generate_image_embeddings(inputs):
+     # Load CLIP and encode the preprocessed image; no gradients needed at inference.
+     model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
+     with torch.no_grad():
+         image_features = model.get_image_features(**inputs)
+     return image_features, model
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ torch
+ transformers
+ pillow
+ scikit-learn
+ streamlit