🎨 CS x Design Convergence Project: Generative AI Pipeline & Workflow Archive
"Bridging Technical Logic with Aesthetic Sensibility"
This repository serves as a portfolio archive documenting the construction of generative AI image-generation pipelines and the optimization of their workflows. Developed as part of an interdisciplinary curriculum combining Computer Science and Design, the project demonstrates the end-to-end process, from data collection and model fine-tuning to the design of advanced inference workflows.
1. Project Overview
The core objective of this project is to show, beyond simple prompt engineering, that a specific artistic style can be trained accurately and deployed in a highly controllable workflow. It aims to demonstrate both technical proficiency (model architecture, latent-space understanding) and artistic expression (style transfer).
- Key Activities: Custom LoRA Training, Advanced ComfyUI Workflow Design, Automated Pipeline Scripting.
- Tools Used: ComfyUI, OneTrainer, Stable Diffusion, Python, Hugging Face.
🧠 2. Model Training Methodology: Kirochy Style LoRA
To replicate the distinctive style of the illustrator Kirochy, I trained a LoRA (Low-Rank Adaptation) on a carefully curated and tagged dataset.
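For reference, the standard LoRA formulation (general background, not specific to this project) freezes a pretrained weight matrix W_0 and trains only a low-rank update BA, where r is the rank and α a scaling hyperparameter. The layer size and rank in the worked count are illustrative assumptions, not the settings used for this model.

```latex
\begin{aligned}
W' &= W_0 + \frac{\alpha}{r}\, B A, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k) \\
\#\text{params}_{\text{LoRA}} &= r\,(d + k) \quad \text{vs.} \quad \#\text{params}_{\text{full}} = d\,k \\
d = k = 1280,\ r = 16 &: \quad 16 \times 2560 = 40{,}960 \ \text{vs.}\ 1280^2 = 1{,}638{,}400 \ (2.5\%)
\end{aligned}
```

This small trainable footprint is what makes it practical to train a style adapter on a modest dataset while leaving the base model untouched.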
2.1 Data Acquisition & Preprocessing
- Data Source: Aggregated reference illustrations from the artist's official portfolios (Instagram @kirochy_00, X).
- Preprocessing: Used OneTrainer's aspect-ratio bucketing to handle images of varying resolutions and aspect ratios, and tagged each image in detail to capture specific stylistic features (line-art weight, color palettes, shading techniques).
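Aspect-ratio bucketing groups images by shape so each can be resized with minimal cropping instead of being squashed into a square. The following is a minimal sketch of the general technique, assuming an SDXL-style pixel budget of about 1024×1024 and sides in multiples of 64; it is not OneTrainer's actual implementation, and the function names are hypothetical.

```python
import math


def make_buckets(target_area=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Enumerate (width, height) buckets with sides that are multiples of
    `step` and an area that does not exceed `target_area`."""
    buckets = []
    for w in range(min_side, max_side + 1, step):
        # Largest height (a multiple of step) that keeps w * h <= target_area.
        h = (target_area // w) // step * step
        if min_side <= h <= max_side:
            buckets.append((w, h))
    return buckets


def assign_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's.

    Ratios are compared in log space so that 2:1 and 1:2 are equally far from 1:1.
    """
    ratio = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - ratio))
```

For example, `assign_bucket(768, 1152, make_buckets())` selects the 832×1216 bucket, so a portrait keeps roughly its original proportions.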
2.2 Training Framework & Optimization
- Engine: Trained using OneTrainer for precise parameter control.
- Optimization: Iteratively adjusted the number of epochs and the learning rate to balance style fidelity against generalization, so the model retains the artist's signature touch without overfitting.
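Epochs, dataset repeats, and batch size jointly determine how many optimizer steps a run takes, which matters when comparing learning rates across runs. The helper below is a hypothetical sketch: the cosine schedule with optional linear warmup is one common choice, and the example numbers are illustrative, not the values used for the Kirochy LoRA.

```python
import math


def total_steps(num_images, repeats, batch_size, epochs):
    """Optimizer steps for a run: each epoch sees every image `repeats` times."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def cosine_lr(step, total, base_lr, warmup=0):
    """Linear warmup for `warmup` steps, then cosine decay from base_lr to zero."""
    if warmup and step < warmup:
        return base_lr * (step + 1) / warmup
    progress = (step - warmup) / max(1, total - warmup)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

With a hypothetical 40 images, 10 repeats, batch size 2, and 10 epochs, `total_steps` gives 2000 steps. Saving a checkpoint per epoch and comparing samples across epochs is one practical way to see where style fidelity plateaus before overfitting begins.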
⚙️ 3. Workflow Architecture: P2A (Photo to Anime) Pipeline
The p2a.ai.json file in this repository is an Img2Img workflow that converts real-world photos into Kirochy-style illustrations. To address the structural distortion that commonly occurs in style transfer, I engineered it as a multi-stage processing pipeline.
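To get a quick overview of which stages a workflow file contains, a short script can count its node types. This sketch assumes p2a.ai.json is a standard ComfyUI export, in either the UI format (a top-level "nodes" list whose items carry a "type") or the API format (a mapping of node id to an object with "class_type"); the helper names are hypothetical.

```python
import json
from collections import Counter


def node_types(workflow):
    """Count node types in a ComfyUI workflow dict (UI or API export format)."""
    if isinstance(workflow.get("nodes"), list):
        # UI export: {"nodes": [{"id": ..., "type": "KSampler", ...}, ...], "links": [...]}
        return Counter(node.get("type", "?") for node in workflow["nodes"])
    # API export: {"3": {"class_type": "KSampler", "inputs": {...}}, ...}
    return Counter(
        node["class_type"]
        for node in workflow.values()
        if isinstance(node, dict) and "class_type" in node
    )


def summarize(path):
    """Load a workflow JSON file and return its node-type counts."""
    with open(path, encoding="utf-8") as f:
        return node_types(json.load(f))
```

Usage: `for name, n in summarize("p2a.ai.json").most_common(): print(n, name)` lists each node type with its count.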
3.1 Technical Logic & Customization
This workflow is not a copy of an existing one; it is a custom-built architecture that integrates advanced techniques researched from a range of community workflows and technical documentation.
ControlNet Integration (Structural Integrity):
- Used ControlNet conditioning to preserve the pose and depth information of the source image, reducing the structural "hallucinations" that generative models often introduce.
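ControlNet guidance is usually applied with a strength and an active window over the denoising process; ComfyUI's advanced ControlNet apply node exposes similar strength and start/end percentage settings. The function below is a simplified, hypothetical model of such a window using raw step indices (ComfyUI itself maps percentages onto the sampler's noise schedule), and the example values are illustrative.

```python
def control_schedule(num_steps, strength=1.0, start=0.0, end=1.0):
    """Per-step ControlNet strength under a simplified, step-based window.

    Steps whose fractional position t lies in [start, end) receive `strength`;
    all other steps receive 0.0.
    """
    schedule = []
    for i in range(num_steps):
        t = i / num_steps  # fraction of the denoising schedule already completed
        schedule.append(strength if start <= t < end else 0.0)
    return schedule
```

For instance, `control_schedule(10, 0.8, 0.0, 0.6)` applies strength 0.8 to the first six steps only. One common rationale for such a window is that early steps fix global layout such as pose and depth, so releasing control near the end leaves the style LoRA free to render fine detail. Pose and depth ControlNets would each get their own schedule.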
SAM (Segment Anything Model) & SAG (Self-Attention Guidance):
- Integrated SAM for precise object segmentation and SAG, which uses the model's self-attention maps to guide sampling toward sharper detail. SAM keeps the subject clearly separated from the background, while SAG sharpens detail, enhancing the clarity of the illustration style.
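A subject mask such as one produced by SAM can be used to recombine separately processed subject and background layers. This is a minimal sketch of mask-based alpha compositing with edge feathering; images are plain nested lists of grayscale values purely for illustration, and the function names are hypothetical rather than specific ComfyUI nodes.

```python
def composite(foreground, background, mask):
    """Alpha-composite two equally sized grayscale images with a soft mask.

    Mask values lie in [0, 1]: 1 keeps the foreground (the segmented subject),
    0 keeps the background, and fractional values blend the two.
    """
    return [
        [m * f + (1.0 - m) * b for f, b, m in zip(f_row, b_row, m_row)]
        for f_row, b_row, m_row in zip(foreground, background, mask)
    ]


def feather(mask, radius=1):
    """Soften mask edges with a box blur so the composite seam is less visible."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Average over the neighborhood, clipped at the image borders.
            vals = [
                mask[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = sum(vals) / len(vals)
    return out
```

Feathering trades a slightly blurred boundary for the absence of a hard cut-out edge, which usually suits soft illustration styles.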
Automatic Detailer (Face & Hand Refinement):
- Implemented a post-processing stage using Face and Hand Detailers. The workflow automatically detects and masks these complex regions, then re-generates them at a higher resolution to fix artifacts and improve anatomical correctness.
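The crop-and-upscale arithmetic behind this kind of detailer can be sketched as follows, assuming detection has already produced a bounding box. `crop_factor` and `target` are illustrative parameter names (detailer node packs expose comparable settings under their own names), and this is not the detailer nodes' actual code.

```python
def detail_region(bbox, image_w, image_h, crop_factor=2.0, target=1024):
    """Expand a detection box for re-generation and compute its scale factor.

    bbox is (x0, y0, x1, y1) in pixels. The box is grown around its center by
    `crop_factor` to include surrounding context, clamped to the image bounds,
    and the scale is chosen so the crop's longer side becomes `target` pixels.
    """
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w = (x1 - x0) * crop_factor / 2
    half_h = (y1 - y0) * crop_factor / 2
    crop = (
        max(0, int(cx - half_w)),
        max(0, int(cy - half_h)),
        min(image_w, int(cx + half_w)),
        min(image_h, int(cy + half_h)),
    )
    longest = max(crop[2] - crop[0], crop[3] - crop[1])
    return crop, target / longest
```

For a 128×128 face box at (400, 200) in a 1024×1536 image, the crop becomes (336, 136, 592, 392) with a 4.0× upscale. After re-generation, the crop is scaled back down by the same factor and pasted in through a feathered mask; a real detailer would also typically skip or cap the step when the region is already large.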
🖼️ 4. Results & Portfolio Showcase
The final outputs generated using this model and workflow are archived on Instagram. You can compare the reference inputs with the generated results to verify the technical quality.
- Instagram Portfolio: @eom0am
⚠️ 5. Ethical Considerations & License
This project was conducted strictly for Academic Study and Research purposes.
Copyright & Usage Warning
- Intellectual Property: The copyright and stylistic rights of the LoRA model belong entirely to the original artist, Kirochy (@kirochy_00).
- Non-Commercial Use Only: Utilizing this model file or the workflows for any commercial purpose (sales, paid commissions, advertising, etc.) is strictly prohibited.
- Legal Notice: Any commercial exploitation may result in legal consequences under copyright law.
Scope of Permitted Use
- Allowed: Personal study, portfolio research, non-commercial fan art.
- Prohibited: Commercial use, impersonation of the original artist, unauthorized redistribution for profit.
Author: Um Yunsang
Role: CS & Design Convergence Researcher / AI Engineer Candidate
Repository: umyunsang/comfyui-models
Base model: stabilityai/stable-diffusion-xl-base-1.0