---
title: Autonomous Rendering And Imaging Agent (A.R.I.A.)
emoji: 🚀
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: true
short_description: MCP's 1st Birthday - Hosted by Anthropic and Gradio Project
tags:
- mcp-in-action-track-creative
- agent-course
- stable-diffusion
- agentic-workflow
license: apache-2.0
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/68389182a9bdf98f9279e018/jr7YqXf2Q1rteSNJ7yIFB.png
---

# 🧬 A.R.I.A. (Autonomous Rendering & Imaging Agent)

> **"Autonomous reasoning for artistic execution."**

A.R.I.A. is an AI agent designed for the **Track 2: MCP in Action (Creative Applications)** category. It demonstrates a complete agentic loop (**Reasoning, Planning, and Execution**) to transform raw visual signals into complex artistic outputs without opaque "black box" generation.

---

## 🏆 Hackathon Submission Details

* **Track:** 🤖 Track 2: MCP in Action
* **Category:** Creative Applications
* **Tag:** `mcp-in-action-track-creative`
* **Agent Vision:** To create a system that acts as a "Digital Art Director," breaking down a user's vague intent into specific technical steps (Foundation vs. Atmosphere) and executing them sequentially.

---

## 🧠 Agentic Workflow (The "Reasoning" Core)

This application is not a simple filter. It utilizes a structured **Chain-of-Thought (CoT)** engine to simulate an artist's decision-making process.

### 1. Reasoning (Analysis)
The agent first analyzes the **Input Signal** (User Image) and the **Semantic Intent** (Prompt). It determines the necessary stylistic divergence based on the "Concept Divergence" parameter.

> *Log Example:* `💭 THOUGHT: Analyzing input... Intent: 'Cyberpunk'. Strategy: Structure (Step 1) + Atmosphere (Step 2).`
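
The reasoning step can be pictured as a small function that normalizes the two inputs and emits a log line. This is a minimal sketch, not the code in `app.py`: the `Analysis` type, the `analyze` function, and the assumption that "Concept Divergence" is a value in the range 0.0 to 1.0 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    intent: str        # cleaned-up user prompt
    divergence: float  # assumed range: 0.0 (stay close) to 1.0 (diverge freely)
    thought: str       # line destined for the Chain-of-Thought log


def analyze(intent: str, divergence: float) -> Analysis:
    """Hypothetical reasoning step: validate inputs and build a log line."""
    intent = intent.strip()
    if not intent:
        raise ValueError("intent must not be empty")
    # Clamp the slider value into the assumed [0.0, 1.0] range.
    divergence = min(1.0, max(0.0, divergence))
    thought = (
        f"THOUGHT: Analyzing input... Intent: '{intent}'. "
        "Strategy: Structure (Step 1) + Atmosphere (Step 2)."
    )
    return Analysis(intent, divergence, thought)


if __name__ == "__main__":
    print(analyze("Cyberpunk", 0.8).thought)
```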

### 2. Planning (Tool Selection)
Based on the analysis, A.R.I.A. selects the appropriate tools from its registry:

* **Foundation Tool:** Selected to rewrite the texture matrix (e.g., *Oil Painting* to break digital edges).
* **Atmosphere Tool:** Selected to inject lighting and mood (e.g., *Neon City* for color grading).
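
One way to model such a registry is a pair of dictionaries keyed by tool name. The sketch below is illustrative only: `StyleTool`, `plan`, and the prompt fragments are assumptions, and only the tool names *Oil Painting* and *Neon City* come from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleTool:
    name: str
    prompt_suffix: str  # assumed: text appended to the user's prompt


# Hypothetical registries; the prompt fragments are made up for illustration.
FOUNDATION_TOOLS = {
    "Oil Painting": StyleTool("Oil Painting", "oil painting, visible brush strokes"),
}
ATMOSPHERE_TOOLS = {
    "Neon City": StyleTool("Neon City", "neon lighting, night city glow"),
}


def plan(foundation: str, atmosphere: str) -> list[StyleTool]:
    """Return the ordered tool list: foundation first, then atmosphere."""
    try:
        return [FOUNDATION_TOOLS[foundation], ATMOSPHERE_TOOLS[atmosphere]]
    except KeyError as exc:
        raise ValueError(f"unknown tool: {exc.args[0]}") from None


if __name__ == "__main__":
    for step, tool in enumerate(plan("Oil Painting", "Neon City"), start=1):
        print(f"Step {step}: {tool.name}")
```

Keeping the plan as an ordered list makes the foundation-before-atmosphere sequencing explicit for the execution stage.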

### 3. Execution (Action)
The agent executes the plan in a multi-pass pipeline:

* **Pass 1 (High Denoising):** Destructive generation to establish the new style.
* **Pass 2 (Low Denoising):** Constructive generation to "glaze" the final output.
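
The two passes can be sketched as two image-to-image calls, where the output of the first becomes the input of the second. This is a sketch under stated assumptions: `run_two_pass` is a hypothetical helper, the strengths 0.75 and 0.35 are illustrative values rather than the app's settings, and `pipe` stands for any callable that accepts `prompt`, `image`, and `strength` (for example, a thin wrapper around a Diffusers image-to-image pipeline that returns a single image).

```python
from typing import Any, Callable, Tuple

# Assumed shape: pipe(prompt=..., image=..., strength=...) -> image
Img2Img = Callable[..., Any]


def run_two_pass(
    pipe: Img2Img,
    image: Any,
    intent: str,
    foundation: str,
    atmosphere: str,
    high_strength: float = 0.75,  # illustrative value, not from app.py
    low_strength: float = 0.35,   # illustrative value, not from app.py
) -> Tuple[Any, Any]:
    """Run a high-denoising pass, then a low-denoising pass on its result."""
    if not 0.0 < low_strength < high_strength <= 1.0:
        raise ValueError("expected 0 < low_strength < high_strength <= 1")
    # Pass 1: high strength lets the model replace most of the input texture.
    step1 = pipe(prompt=f"{intent}, {foundation}", image=image, strength=high_strength)
    # Pass 2: low strength keeps the pass-1 structure and adjusts mood/lighting.
    step2 = pipe(prompt=f"{intent}, {atmosphere}", image=step1, strength=low_strength)
    return step1, step2
```

Passing the pipeline in as a callable keeps the sequencing logic testable without loading model weights.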

---

## 🖼️ Three-Step Image Gallery

This gallery represents the **three-step pipeline workflow**:

* **Step 1:** Foundation / Structural Style
* **Step 2:** Atmosphere / Lighting & Mood
* **Step 3:** Final Output

<div style="display: grid; grid-template-rows: repeat(3, auto); gap: 40px;">

  <!-- Example 1 -->
  <div style="background:#fdfdfd; padding:15px; border-radius:8px;">
    <h3 style="text-align:center; margin-bottom:10px;">Example 1</h3>
    <div style="display: grid; grid-template-columns: repeat(3, 1fr); gap: 15px; text-align:center;">
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 1: Foundation</strong><br>
        <img src="assets/rabbit2.png" alt="Step 1" style="max-width:100%; border-radius:6px;" />
      </div>
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 2: Atmosphere</strong><br>
        <img src="assets/ex1-step2.webp" alt="Step 2" style="max-width:100%; border-radius:6px;" />
      </div>
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 3: Final Output</strong><br>
        <img src="assets/ex1-step3.webp" alt="Step 3" style="max-width:100%; border-radius:6px;" />
      </div>
    </div>
    <hr style="margin:20px 0; border:0; border-top:2px solid #ccc;">
  </div>

  <!-- Example 2 -->
  <div style="background:#f7f7f7; padding:15px; border-radius:8px;">
    <h3 style="text-align:center; margin-bottom:10px;">Example 2</h3>
    <div style="display: grid; grid-template-columns: repeat(3, 1fr); gap: 15px; text-align:center;">
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 1: Foundation</strong><br>
        <img src="assets/rabbit2.png" alt="Step 1" style="max-width:100%; border-radius:6px;" />
      </div>
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 2: Atmosphere</strong><br>
        <img src="assets/ex2-step2.webp" alt="Step 2" style="max-width:100%; border-radius:6px;" />
      </div>
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 3: Final Output</strong><br>
        <img src="assets/ex2-step3.webp" alt="Step 3" style="max-width:100%; border-radius:6px;" />
      </div>
    </div>
    <hr style="margin:20px 0; border:0; border-top:2px solid #ccc;">
  </div>

  <!-- Example 3 -->
  <div style="background:#fdfdfd; padding:15px; border-radius:8px;">
    <h3 style="text-align:center; margin-bottom:10px;">Example 3</h3>
    <div style="display: grid; grid-template-columns: repeat(3, 1fr); gap: 15px; text-align:center;">
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 1: Foundation</strong><br>
        <img src="assets/rabbit2.png" alt="Step 1" style="max-width:100%; border-radius:6px;" />
      </div>
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 2: Atmosphere</strong><br>
        <img src="assets/ex3-step2.webp" alt="Step 2" style="max-width:100%; border-radius:6px;" />
      </div>
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 3: Final Output</strong><br>
        <img src="assets/ex3-step3.webp" alt="Step 3" style="max-width:100%; border-radius:6px;" />
      </div>
    </div>
  </div>

</div>

---

## 🎥 Video Demonstrations

<!-- YouTube Video Thumbnail -->
<div>
  <h3>A.R.I.A. Overview (YouTube)</h3>
  <a href="https://www.youtube.com/watch?v=fZMkg6tRhEc" target="_blank">
    <img src="https://img.youtube.com/vi/fZMkg6tRhEc/0.jpg" alt="A.R.I.A. Overview" style="width:560px; max-width:100%;">
  </a>
  <p>Watch the full video on YouTube: <a href="https://www.youtube.com/watch?v=fZMkg6tRhEc" target="_blank">A.R.I.A. Overview</a></p>
</div>

### A.R.I.A. Video Demo

🎥 [Watch ARIA.mp4 Demo](assets/ARIA.mp4)  
*(Note: the video can only be viewed after downloading.)*  

![ARIA Demo Preview](assets/ARIA.gif)

## 💼 LinkedIn Showcase

<ul>
  <li><a href="https://www.linkedin.com/posts/activity-7398136529942081536-9PY_?utm_source=share&utm_medium=member_desktop&rcm=ACoAADV81lIBHfqnWPcrqTwi8q3nrm4-wpvkldE" target="_blank">💼 LinkedIn Post</a></li>
</ul>

---

## 🛠️ Technical Stack

* **Frontend:** Gradio 6 (`sdk_version: 6.0.1`; streaming interface for real-time feedback)
* **Backend:** Python / PyTorch / Diffusers
* **Model:** Stable Diffusion v1.5 (run on CPU, with attention slicing enabled to reduce peak memory use)
* **UX:** Custom "Mission Control" CSS theme with Typewriter-style logging to visualize the Agent's thinking speed
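
The typewriter-style log fits naturally onto a Python generator, since Gradio streams each value a generator event handler yields to the output component. The sketch below shows only the generator side; the `typewriter` name and the `delay` parameter are assumptions, not the app's actual code.

```python
import time
from typing import Iterable, Iterator


def typewriter(lines: Iterable[str], delay: float = 0.0) -> Iterator[str]:
    """Yield the accumulated log text, growing by one character per step."""
    shown = ""
    for line in lines:
        for ch in line:
            shown += ch
            yield shown  # each yield is one UI update when used as a handler
            if delay:
                time.sleep(delay)
        shown += "\n"  # the line break appears with the next line's first character


if __name__ == "__main__":
    for frame in typewriter(["THOUGHT: Analyzing input...", "PLAN: Oil Painting"]):
        pass
    print(frame)
```

Yielding the full accumulated text, rather than single characters, suits output components that replace their value on every update.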

---

## 🚀 How to Use

1.  **Upload** a source image (sketch, photo, or 3D render).
2.  **Define Intent:** Type a prompt (e.g., "A lonely rover on Mars").
3.  **Configure Agent:**
    * *Foundation:* Choose the base texture style.
    * *Atmosphere:* Choose the lighting/mood context.
4.  **Initialize:** Click **Initialize Agent Sequence**.
5.  **Observe:** Watch the **Chain-of-Thought Log** as the agent thinks, plans, and executes the visual transformation in real time.
6.  **Compare:** Review multiple outputs in the **three-step gallery grid**.

---

## 📦 Local Installation

```bash
pip install gradio diffusers torch transformers scipy
python app.py
```