abdulqadir02 committed
Commit e02c264 · 1 Parent(s): 4b61277

final push
.gitignore ADDED
@@ -0,0 +1 @@
+ caption
README.md CHANGED
@@ -12,3 +12,98 @@ short_description: This AI system recommends captions for the images
 ---

 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # AI Image Caption Recommender System with CLIP
+
+ This project is an AI-powered image caption recommender system built on OpenAI's CLIP model. It recommends captions for social media posts (Instagram, Facebook, etc.) by selecting the most relevant entries from a curated caption list based on the uploaded image.
+
+ ---
+
+ ## Problem Statement
+ Social media users often struggle to come up with engaging captions for their posts. This project solves that problem by automatically recommending captions that best match an uploaded image, helping users polish their social media presence with minimal effort.
+
+ ---
+
+ ## Features
+ - Upload images via a web interface.
+ - Generate image embeddings using CLIP.
+ - Match images to relevant captions from a predefined list.
+ - Display the top 5 most relevant captions with similarity scores.
+ - Modular code structure for easy maintenance and scalability.
+
+ ---
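The matching step above boils down to cosine-similarity ranking between one image embedding and many caption embeddings. Here is a minimal sketch with made-up 4-dimensional vectors (`rank_captions` is a hypothetical helper for illustration; the actual app works with 512-dimensional CLIP features):

```python
import numpy as np

def rank_captions(image_embedding, caption_embeddings, captions, top_k=5):
    """Rank captions by cosine similarity to a single image embedding."""
    img = image_embedding / np.linalg.norm(image_embedding)
    caps = caption_embeddings / np.linalg.norm(caption_embeddings, axis=1, keepdims=True)
    sims = caps @ img                       # one cosine score per caption
    order = np.argsort(sims)[::-1][:top_k]  # most similar first
    return [(captions[i], float(sims[i])) for i in order]

# Toy embeddings, invented for this example.
image_vec = np.array([1.0, 0.0, 0.0, 0.0])
caption_vecs = np.array([
    [0.9, 0.1, 0.0, 0.0],  # nearly parallel to the image vector
    [0.0, 1.0, 0.0, 0.0],  # orthogonal
    [0.5, 0.5, 0.0, 0.0],  # in between
])
ranked = rank_captions(image_vec, caption_vecs, ["sunset", "coffee", "beach"])
```

The same idea scales unchanged to a list of 92 captions: only the number of rows in `caption_vecs` grows.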
+
+ ## Project Structure
+ ```
+ clip_caption_recommender/
+ ├── app.py               # Main Streamlit app
+ ├── caption.txt          # List of predefined captions
+ ├── requirements.txt     # Project dependencies
+ └── modules/
+     ├── image_utils.py   # Image processing functions
+     ├── model_utils.py   # CLIP model loading and embeddings
+     └── caption_utils.py # Caption matching logic
+ ```
+
+ ---
+
+ ## Installation
+ ```bash
+ # Clone the repository
+ git clone https://github.com/yourusername/clip-caption-recommender.git
+ cd clip-caption-recommender
+
+ # Install dependencies
+ pip install -r requirements.txt
+ ```
+
+ ---
+
+ ## Usage
+ ```bash
+ # Run the Streamlit app
+ streamlit run app.py
+ ```
+ - Upload an image.
+ - View the top 5 recommended captions with similarity scores.
+
+ ---
+
+ ## Adding More Captions
+ - Add captions to `caption.txt`, one per line.
+
+ ---
+ ## Technologies Used
+ - Python
+ - OpenAI CLIP
+ - Streamlit
+ - Transformers
+ - scikit-learn
+ - Pillow
+
+ ---
+
+ ## Future Improvements
+ - Integrate with social media APIs for direct caption posting.
+ - Allow users to add personalized captions.
+ - Implement fine-tuning on domain-specific datasets.
+ - Deploy with Docker and AWS for scalability.
+
+ ---
+
+ ## Author
+ - **Your Name**
+   [LinkedIn](https://linkedin.com/in/abdulqadir02) | [GitHub](https://github.com/AbdulQadir0211)
+
+ ---
+
+ ## License
+ This project is licensed under the MIT License.
+
+ ---
+
+ Happy Posting! 🚀
app.py ADDED
@@ -0,0 +1,22 @@
+ import streamlit as st
+ from modules.image_utils import load_and_preprocess_image
+ from modules.model_utils import generate_image_embeddings
+ from modules.caption_utils import match_captions
+
+ def load_captions(path="caption.txt"):
+     # One caption per line; skip blank lines.
+     with open(path, encoding="utf-8") as f:
+         return [line.strip() for line in f if line.strip()]
+
+ def image_captioning(image_file, candidate_captions):
+     inputs, processor = load_and_preprocess_image(image_file)
+     image_features, clip_model = generate_image_embeddings(inputs)
+     best_captions, similarities = match_captions(image_features, candidate_captions, clip_model, processor)
+     return best_captions, similarities
+
+ st.title("AI Image Caption Recommender with CLIP")
+ uploaded_image = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
+
+ candidate_captions = load_captions()
+
+ if uploaded_image is not None:
+     st.image(uploaded_image, caption="Uploaded Image", use_column_width=True)
+     best_captions, similarities = image_captioning(uploaded_image, candidate_captions)
+     st.write("### Top Captions")
+     for idx, (caption, similarity) in enumerate(zip(best_captions[:5], similarities[:5])):
+         st.write(f"{idx + 1}. {caption} (Similarity: {similarity:.4f})")
caption.txt ADDED
@@ -0,0 +1,92 @@
+ Trees, Travel and Tea!
+ A refreshing beverage.
+ A moment of indulgence.
+ The perfect thirst quencher.
+ Your daily dose of delight.
+ Taste the tradition.
+ Savor the flavor.
+ Refresh and rejuvenate.
+ Unwind and enjoy.
+ The taste of home.
+ A treat for your senses.
+ A taste of adventure.
+ A moment of bliss.
+ Your travel companion.
+ Fuel for your journey.
+ The essence of nature.
+ The warmth of comfort.
+ A sip of happiness.
+ Pure indulgence.
+ Quench your thirst, ignite your spirit.
+ Awaken your senses, embrace the moment.
+ The taste of faraway lands.
+ A taste of home, wherever you are.
+ Your daily dose of delight.
+ Your moment of serenity.
+ The perfect pick-me-up.
+ The perfect way to unwind.
+ Taste the difference.
+ Experience the difference.
+ A refreshing escape.
+ A delightful escape.
+ The taste of tradition, the spirit of adventure.
+ The warmth of home, the joy of discovery.
+ Your passport to flavor.
+ Your ticket to tranquility.
+ Sip, savor, and explore.
+ Indulge, relax, and rejuvenate.
+ The taste of wanderlust.
+ The comfort of home.
+ A journey for your taste buds.
+ A haven for your senses.
+ Your refreshing companion.
+ Your delightful escape.
+ Taste the world, one sip at a time.
+ Embrace the moment, one cup at a time.
+ The essence of exploration.
+ The comfort of connection.
+ Quench your thirst for adventure.
+ Savor the moment of peace.
+ The taste of discovery.
+ The warmth of belonging.
+ Your travel companion, your daily delight.
+ Your moment of peace, your daily indulgence.
+ The spirit of exploration, the comfort of home.
+ The joy of discovery, the warmth of connection.
+ Sip, savor, and set off on an adventure.
+ Indulge, relax, and find your peace.
+ A delightful beverage.
+ A moment of relaxation.
+ The perfect way to start your day.
+ The perfect way to end your day.
+ A treat for yourself.
+ Something to savor.
+ A moment of calm.
+ A taste of something special.
+ A refreshing pick-me-up.
+ A comforting drink.
+ A taste of adventure.
+ A moment of peace.
+ A small indulgence.
+ A daily ritual.
+ A way to connect with others.
+ A way to connect with yourself.
+ A taste of home.
+ A taste of something new.
+ A moment to enjoy.
+ A moment to remember.
+ Capturing moments of joy and laughter.
+ A peaceful sunset by the ocean.
+ A cup of coffee on a rainy morning.
+ Adventurous hikes through mountain trails.
+ A serene lake reflecting the clear blue sky.
+ Busy city streets bustling with energy.
+ A cozy corner with a good book.
+ Soft petals of blooming flowers.
+ A majestic lion in the savannah.
+ Snow-capped mountains under a starlit sky.
+ Children playing in the park.
+ Freshly baked bread cooling on a rack.
+ A traveler gazing at ancient ruins.
+ Bright lights of a carnival at night.
+ A quiet beach with gentle waves.
modules/caption_utils.py ADDED
@@ -0,0 +1,16 @@
+ import torch
+ from sklearn.metrics.pairwise import cosine_similarity
+
+ def match_captions(image_features, captions, clip_model, processor):
+     # Embed all candidate captions with CLIP's text encoder.
+     text_inputs = processor(text=captions, return_tensors="pt", padding=True)
+     with torch.no_grad():
+         text_features = clip_model.get_text_features(**text_inputs)
+
+     image_features = image_features.detach().cpu().numpy()
+     text_features = text_features.detach().cpu().numpy()
+
+     # Cosine similarity between the one image and every caption,
+     # then sort captions from most to least similar.
+     similarities = cosine_similarity(image_features, text_features)
+     best_indices = similarities.argsort(axis=1)[0][::-1]
+     best_captions = [captions[i] for i in best_indices]
+
+     return best_captions, similarities[0][best_indices].tolist()
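The ranking logic in `match_captions` can be exercised in isolation with synthetic feature vectors, avoiding a model download (the 2-dimensional arrays below are made up; real CLIP features are 512-dimensional):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

captions = ["city lights", "quiet beach", "morning coffee"]

# Stand-ins for CLIP image/text features.
image_features = np.array([[1.0, 0.0]])
text_features = np.array([
    [0.0, 1.0],  # "city lights": orthogonal to the image vector
    [1.0, 0.1],  # "quiet beach": nearly parallel
    [0.7, 0.7],  # "morning coffee": 45 degrees away
])

# Same steps as match_captions, minus the model calls.
similarities = cosine_similarity(image_features, text_features)
best_indices = similarities.argsort(axis=1)[0][::-1]
best_captions = [captions[i] for i in best_indices]
```

Because `image_features` has a single row, `similarities` has shape (1, 3) and `best_indices` orders that one row from most to least similar.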
modules/image_utils.py ADDED
@@ -0,0 +1,8 @@
+ from PIL import Image
+ from transformers import CLIPProcessor
+
+ def load_and_preprocess_image(image_path):
+     # Accepts a file path or a file-like object (e.g. a Streamlit upload).
+     image = Image.open(image_path).convert("RGB")
+     processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
+     inputs = processor(images=image, return_tensors="pt")
+     return inputs, processor
modules/model_utils.py ADDED
@@ -0,0 +1,8 @@
+ import torch
+ from transformers import CLIPModel
+
+ def generate_image_embeddings(inputs):
+     # Load CLIP and encode the preprocessed image; no gradients needed at inference.
+     model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
+     with torch.no_grad():
+         image_features = model.get_image_features(**inputs)
+     return image_features, model
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ torch
+ transformers
+ pillow
+ scikit-learn
+ streamlit