---
title: Autonomous Rendering And Imaging Agent (A.R.I.A.)
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.1
app_file: app.py
pinned: true
short_description: MCP's 1st Birthday - Hosted by Anthropic and Gradio Project
tags:
  - mcp-in-action-track-creative
  - agent-course
  - stable-diffusion
  - agentic-workflow
license: apache-2.0
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/68389182a9bdf98f9279e018/jr7YqXf2Q1rteSNJ7yIFB.png
---

# 🧬 A.R.I.A. (Autonomous Rendering & Imaging Agent)

> **"Autonomous reasoning for artistic execution."**

A.R.I.A. is an AI agent built for the **Track 2: MCP in Action (Creative Applications)** category. It demonstrates a complete agentic loop (**Reasoning, Planning, and Execution**) that transforms raw visual signals into complex artistic outputs without opaque "black box" generation.

---

## Hackathon Submission Details

* **Track:** Track 2: MCP in Action
* **Category:** Creative Applications
* **Tag:** `mcp-in-action-track-creative`
* **Agent Vision:** To create a system that acts as a "Digital Art Director," breaking down a user's vague intent into specific technical steps (Foundation vs. Atmosphere) and executing them sequentially.

---

## 🧠 Agentic Workflow (The "Reasoning" Core)

This application is not a simple filter. It uses a structured **Chain-of-Thought (CoT)** engine to simulate an artist's decision-making process.

### 1. Reasoning (Analysis)
The agent first analyzes the **Input Signal** (User Image) and the **Semantic Intent** (Prompt). It then sets the required degree of stylistic divergence from the "Concept Divergence" parameter.

> *Log Example:* `THOUGHT: Analyzing input... Intent: 'Cyberpunk'. Strategy: Structure (Step 1) + Atmosphere (Step 2).`
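
The analysis step can be pictured with a small sketch. It is not taken from the A.R.I.A. source: the `Analysis` class, the `analyze` and `thought_line` helpers, and the assumption that "Concept Divergence" is a value between 0 and 1 are all hypothetical. It only shows how a prompt and a divergence value could be turned into a log line shaped like the example above.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    intent: str                 # semantic intent taken from the prompt
    divergence: float           # "Concept Divergence", assumed to lie in [0, 1]
    strategy: tuple[str, str]   # ordered step names


def analyze(prompt: str, divergence: float) -> Analysis:
    """Bundle the prompt and divergence into a two-step strategy (hypothetical helper)."""
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence is assumed to lie in [0, 1]")
    return Analysis(intent=prompt.strip(), divergence=divergence,
                    strategy=("Structure", "Atmosphere"))


def thought_line(analysis: Analysis) -> str:
    """Format the analysis as a single THOUGHT log entry."""
    steps = " + ".join(
        f"{name} (Step {i})" for i, name in enumerate(analysis.strategy, start=1)
    )
    return f"THOUGHT: Analyzing input... Intent: '{analysis.intent}'. Strategy: {steps}."


print(thought_line(analyze("Cyberpunk", 0.7)))
```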

### 2. Planning (Tool Selection)
Based on the analysis, A.R.I.A. selects the appropriate tools from its registry:

* **Foundation Tool:** Selected to rewrite the texture matrix (e.g., *Oil Painting* to break digital edges).
* **Atmosphere Tool:** Selected to inject lighting and mood (e.g., *Neon City* for color grading).
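
A tool registry of this kind might look like the sketch below. The dictionary names, the `plan` function, and the prompt fragments are hypothetical; only the tool names *Oil Painting* and *Neon City* come from this README.

```python
# Hypothetical registries: tool name -> prompt fragment applied at that step.
# The fragments are illustrative and not taken from the A.R.I.A. source.
FOUNDATION_TOOLS = {
    "Oil Painting": "oil painting, thick visible brush strokes",
}
ATMOSPHERE_TOOLS = {
    "Neon City": "neon city lights, cyberpunk color grading",
}


def plan(foundation: str, atmosphere: str) -> list[dict]:
    """Return an ordered two-step plan, one entry per selected tool."""
    if foundation not in FOUNDATION_TOOLS:
        raise KeyError(f"unknown foundation tool: {foundation!r}")
    if atmosphere not in ATMOSPHERE_TOOLS:
        raise KeyError(f"unknown atmosphere tool: {atmosphere!r}")
    return [
        {"step": 1, "role": "foundation", "tool": foundation,
         "style": FOUNDATION_TOOLS[foundation]},
        {"step": 2, "role": "atmosphere", "tool": atmosphere,
         "style": ATMOSPHERE_TOOLS[atmosphere]},
    ]


for step in plan("Oil Painting", "Neon City"):
    print(f"PLAN: Step {step['step']} -> {step['tool']} ({step['role']})")
```

Keeping the two registries separate means an unknown tool name fails before any image generation starts.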

### 3. Execution (Action)
The agent executes the plan in a multi-pass pipeline:

* **Pass 1 (High Denoising):** Destructive generation to establish the new style.
* **Pass 2 (Low Denoising):** Constructive generation to "glaze" the final output.
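
The two passes can be sketched as a function that feeds the output of the first pass into the second. This is a minimal illustration, not the project's code: `run_two_pass`, the injected `img2img` callable, and the strength values 0.75 and 0.35 are assumptions, since this README does not quantify "high" and "low" denoising.

```python
from typing import Any, Callable

# Hypothetical strengths; the README does not state the actual values.
HIGH_DENOISE = 0.75
LOW_DENOISE = 0.35


def run_two_pass(image: Any, foundation_prompt: str, atmosphere_prompt: str,
                 img2img: Callable[..., Any]) -> tuple[Any, Any]:
    """Run pass 1 on the input image, then pass 2 on the output of pass 1."""
    styled = img2img(prompt=foundation_prompt, image=image, strength=HIGH_DENOISE)
    final = img2img(prompt=atmosphere_prompt, image=styled, strength=LOW_DENOISE)
    return styled, final


# Stand-in backend so the sketch runs without model weights: it only records calls.
calls = []


def fake_img2img(prompt: str, image: Any, strength: float) -> str:
    calls.append((prompt, image, strength))
    return f"{image}+{prompt}"


styled, final = run_two_pass("input.png", "oil painting", "neon city", fake_img2img)
print(final)
```

With Diffusers, the injected callable could be a small wrapper that returns `pipe(prompt=..., image=..., strength=...).images[0]`, since `StableDiffusionImg2ImgPipeline` accepts a `strength` argument between 0 and 1.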

---

## 🖼️ Three-Step Image Gallery

This gallery represents the **three-step pipeline workflow**:

* **Step 1:** Foundation / Structural Style
* **Step 2:** Atmosphere / Lighting & Mood
* **Step 3:** Final Output

<div style="display: grid; grid-template-rows: repeat(3, auto); gap: 40px;">

  <!-- Example 1 -->
  <div style="background:#fdfdfd; padding:15px; border-radius:8px;">
    <h3 style="text-align:center; margin-bottom:10px;">Example 1</h3>
    <div style="display: grid; grid-template-columns: repeat(3, 1fr); gap: 15px; text-align:center;">
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 1: Foundation</strong><br>
        <img src="assets/rabbit2.png" alt="Step 1" style="max-width:100%; border-radius:6px;" />
      </div>
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 2: Atmosphere</strong><br>
        <img src="assets/ex1-step2.webp" alt="Step 2" style="max-width:100%; border-radius:6px;" />
      </div>
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 3: Final Output</strong><br>
        <img src="assets/ex1-step3.webp" alt="Step 3" style="max-width:100%; border-radius:6px;" />
      </div>
    </div>
    <hr style="margin:20px 0; border:0; border-top:2px solid #ccc;">
  </div>

  <!-- Example 2 -->
  <div style="background:#f7f7f7; padding:15px; border-radius:8px;">
    <h3 style="text-align:center; margin-bottom:10px;">Example 2</h3>
    <div style="display: grid; grid-template-columns: repeat(3, 1fr); gap: 15px; text-align:center;">
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 1: Foundation</strong><br>
        <img src="assets/rabbit2.png" alt="Step 1" style="max-width:100%; border-radius:6px;" />
      </div>
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 2: Atmosphere</strong><br>
        <img src="assets/ex2-step2.webp" alt="Step 2" style="max-width:100%; border-radius:6px;" />
      </div>
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 3: Final Output</strong><br>
        <img src="assets/ex2-step3.webp" alt="Step 3" style="max-width:100%; border-radius:6px;" />
      </div>
    </div>
    <hr style="margin:20px 0; border:0; border-top:2px solid #ccc;">
  </div>

  <!-- Example 3 -->
  <div style="background:#fdfdfd; padding:15px; border-radius:8px;">
    <h3 style="text-align:center; margin-bottom:10px;">Example 3</h3>
    <div style="display: grid; grid-template-columns: repeat(3, 1fr); gap: 15px; text-align:center;">
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 1: Foundation</strong><br>
        <img src="assets/rabbit2.png" alt="Step 1" style="max-width:100%; border-radius:6px;" />
      </div>
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 2: Atmosphere</strong><br>
        <img src="assets/ex3-step2.webp" alt="Step 2" style="max-width:100%; border-radius:6px;" />
      </div>
      <div style="background:#ffffff; padding:10px; border-radius:8px; box-shadow:0 1px 4px rgba(0,0,0,0.1);">
        <strong>Step 3: Final Output</strong><br>
        <img src="assets/ex3-step3.webp" alt="Step 3" style="max-width:100%; border-radius:6px;" />
      </div>
    </div>
  </div>

</div>

---

## 🎥 Video Demonstrations

<!-- YouTube Video Thumbnail -->
<div>
  <h3>A.R.I.A. Overview (YouTube)</h3>
  <a href="https://www.youtube.com/watch?v=fZMkg6tRhEc" target="_blank">
    <img src="https://img.youtube.com/vi/fZMkg6tRhEc/0.jpg" alt="A.R.I.A. Overview" style="width:560px; max-width:100%;">
  </a>
  <p>Watch the full video on YouTube: <a href="https://www.youtube.com/watch?v=fZMkg6tRhEc" target="_blank">A.R.I.A. Overview</a></p>
</div>

### A.R.I.A. Video Demo

🎥 [Watch ARIA.mp4 Demo](assets/ARIA.mp4)
*(Note: the video can only be viewed after downloading.)*



## 💼 LinkedIn Showcase

<ul>
  <li><a href="https://www.linkedin.com/posts/activity-7398136529942081536-9PY_?utm_source=share&utm_medium=member_desktop&rcm=ACoAADV81lIBHfqnWPcrqTwi8q3nrm4-wpvkldE" target="_blank">💼 LinkedIn Post</a></li>
</ul>

---

## 🛠️ Technical Stack

* **Frontend:** Gradio 6.0.1 (streaming interface for real-time feedback)
* **Backend:** Python / PyTorch / Diffusers
* **Model:** Stable Diffusion v1.5 (run on CPU, with attention slicing to reduce peak memory)
* **UX:** Custom "Mission Control" CSS theme with typewriter-style logging that visualizes the agent's thinking as it happens
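
As a rough illustration of the model setup, the sketch below loads an image-to-image pipeline on CPU and enables attention slicing. The model id and the `load_cpu_pipeline` helper are assumptions (this README names only "Stable Diffusion v1.5"), and the actual `app.py` may configure the pipeline differently.

```python
# Assumed Hub model id; the README only says "Stable Diffusion v1.5".
MODEL_ID = "stable-diffusion-v1-5/stable-diffusion-v1-5"


def load_cpu_pipeline(model_id: str = MODEL_ID):
    """Load an img2img pipeline on CPU with attention slicing enabled."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        model_id, torch_dtype=torch.float32  # float32 is the safe choice on CPU
    )
    pipe = pipe.to("cpu")
    # Computes attention in slices, lowering peak memory at some cost in speed.
    pipe.enable_attention_slicing()
    return pipe
```

Calling `load_cpu_pipeline()` downloads the model weights on first use, so it is defined here but not invoked.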

---

## How to Use

1. **Upload** a source image (sketch, photo, or 3D render).
2. **Define Intent:** Type a prompt (e.g., "A lonely rover on Mars").
3. **Configure Agent:**
    * *Foundation:* Choose the base texture style.
    * *Atmosphere:* Choose the lighting/mood context.
4. **Initialize:** Click **Initialize Agent Sequence**.
5. **Observe:** Watch the **Chain-of-Thought Log** as the agent thinks, plans, and executes the visual transformation in real time.
6. **Compare:** Review multiple outputs in the **three-step gallery grid**.
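
The live log in step 5 can be approximated with a generator that reveals text one character at a time. This is a sketch of the general technique rather than the app's code; the `typewriter` function and its `delay` parameter are hypothetical names.

```python
import time
from typing import Iterable, Iterator


def typewriter(lines: Iterable[str], delay: float = 0.0) -> Iterator[str]:
    """Yield the log text so far after each character, like a typewriter."""
    shown = ""
    for line in lines:
        for ch in line:
            shown += ch
            if delay:
                time.sleep(delay)  # pacing between characters
            yield shown
        shown += "\n"  # start the next entry on a new line


for frame in typewriter(["THOUGHT: Analyzing input...", "PLAN: Oil Painting"]):
    pass  # each frame would replace the contents of the log box
print(frame)
```

Gradio streams the values yielded by a generator event handler to the output component, so a function like this could drive a log textbox directly.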

---

## 📦 Local Installation

```bash
pip install gradio diffusers torch transformers scipy
python app.py
```