numz committed · verified
Commit 09ced71 · 1 Parent(s): b77d5dd

Update README.md

Files changed (1):
  1. README.md +785 -125

README.md CHANGED
@@ -13,46 +13,153 @@ base_model:
13
 
14
  [![View Code](https://img.shields.io/badge/📂_View_Code-GitHub-181717?style=for-the-badge&logo=github)](https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler)
15
 
16
- Official release of [SeedVR2](https://github.com/ByteDance-Seed/SeedVR) for ComfyUI that enables video/image upscaling.
17
 
18
- <img src="https://raw.githubusercontent.com/numz/ComfyUI-SeedVR2_VideoUpscaler/refs/heads/main/docs/demo_01.jpg">
19
- <img src="https://raw.githubusercontent.com/numz/ComfyUI-SeedVR2_VideoUpscaler/refs/heads/main/docs/demo_02.jpg">
20
 
21
- <img src="https://raw.githubusercontent.com/numz/ComfyUI-SeedVR2_VideoUpscaler/refs/heads/main/docs/usage.png">
22
 
23
  ## 📋 Quick Access
24
 
25
- - [🆙 Note and future releases](#-note-and-future-releases)
26
  - [🚀 Updates](#-updates)
27
  - [🎯 Features](#-features)
28
  - [🔧 Requirements](#-requirements)
29
  - [📦 Installation](#-installation)
30
  - [📖 Usage](#-usage)
31
- - [📊 Benchmarks](#-benchmarks)
32
- - [⚠️ Limitations](#-limitations)
33
  - [🤝 Contributing](#-contributing)
34
  - [🙏 Credits](#-credits)
35
- - [📜 License](#-license)
36
 
37
- ## 🆙 Note and future releases
38
 
39
- - Improve FP8 integration; we are losing some FP8 advantages during the process.
40
- - Tile-VAE integration, if it works for video; I still have tests to run, and any dev who wants to help is welcome.
41
- - The 7B FP8 model seems to have quality issues; use 7B FP16 instead (if FP8 doesn't give OOM, then FP16 will work). I still have to review this.
42
 
43
  ## 🚀 Updates
44
 
45
  **2025.06.30**
46
 
47
- - 🚀 Sped up the process with lower VRAM usage (see new benchmark).
48
- - 🛠️ Fixed memory leak on 3B models.
49
- - ❌ Can now interrupt the process if needed.
50
- - ✅ Refactored the code for better sharing with the community; feel free to propose pull requests.
51
- - 🛠️ Removed flash attention dependency
52
 
53
  **2025.06.24**
54
 
55
- - 🚀 Sped up the process by up to 4× (see new benchmark)
56
 
57
  **2025.06.22**
58
 
@@ -67,168 +174,721 @@ Official release of [SeedVR2](https://github.com/ByteDance-Seed/SeedVR) for Comf
67
 
68
  ## 🎯 Features
69
 
70
- - High-quality Upscaling
71
- - Suitable for any video length once the right settings are found
72
- - Models will be downloaded automatically from [Models](https://huggingface.co/numz/SeedVR2_comfyUI/tree/main)
73
 
74
  ## 🔧 Requirements
75
 
76
- - Huge VRAM capacity is better; from my tests, even the 3B version needs a lot of VRAM, at least 18GB.
77
- - Latest ComfyUI version with Python 3.12.9 (may work with older versions, but I haven't tested it)
78
 
79
  ## 📦 Installation
80
 
81
- 1. Clone this repository into your ComfyUI custom nodes directory:
82
 
83
- ```bash
84
- cd ComfyUI/custom_nodes
85
- git clone https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler.git
86
- ```
87
 
88
- 2. Install the required dependencies:
89
 
90
- Load the venv and:
91
 
 
92
  ```bash
93
- pip install -r ComfyUI-SeedVR2_VideoUpscaler/requirements.txt
 
94
  ```
95
 
96
- Install flash_attn/triton for ~6% faster processing; not mandatory.
97
-
98
  ```bash
99
- pip install flash_attn
100
- pip install triton
101
  ```
102
 
103
- or, with the portable (embedded Python) version:
104
 
105
- ```bash
106
- python_embeded\python.exe -m pip install flash_attn
107
  ```
108
 
109
- Check the prebuilt releases at https://github.com/loscrossos/lib_flashattention/releases and https://github.com/woct0rdho/triton-windows
110
 
111
- 3. Models
112
 
113
- Will be automatically downloaded into:
114
- `models/SEEDVR2`
115
 
116
- or can be found here ([MODELS](https://huggingface.co/numz/SeedVR2_comfyUI/tree/main))
117
 
118
- ## 📖 Usage
119
 
120
- 1. In ComfyUI, locate the **SeedVR2 Video Upscaler** node in the node menu.
121
 
122
- <img src="https://raw.githubusercontent.com/numz/ComfyUI-SeedVR2_VideoUpscaler/refs/heads/main/docs/node.png" width="100%">
123
 
124
- 2. ⚠️ **THINGS TO KNOW!**
125
 
126
- **temporal consistency**: a **batch_size** of at least 5 is required to activate temporal consistency. SeedVR2 needs at least 5 frames to calculate it. A higher batch_size gives better performance/results but needs more than 24GB VRAM.
127
 
128
- **VRAM usage**: the input video resolution impacts VRAM consumption during the process. The larger the input video, the more VRAM the process will consume. So, if you experience OOMs with a batch_size of at least 5, try reducing the input video resolution until it resolves.
129
 
130
- Of course, the output resolution also has an impact, so if your hardware doesn't allow it, reduce the output resolution.
131
 
132
- 3. Configure the node parameters:
133
 
134
- - `model`: Select your 3B or 7B model
135
- - `seed`: a seed, but it generates another seed from this one
136
- - `new_resolution`: new desired short edge in px; the aspect ratio is kept on the other edge
137
- - `batch_size`: VERY IMPORTANT! This model consumes a lot of VRAM (all of it, even for the 3B model), so for GPUs under 24GB VRAM keep this value low. A good value is "1" without temporal consistency and "5" for temporal consistency, but the higher this value, the better the result.
138
- - `preserve_vram`: for VRAM < 24GB. If true, it will unload unused models during the process; slower, but it works. Otherwise you will probably hit OOM.
139
 
140
- ## 📊 Benchmarks
141
 
142
- **7B models on NVIDIA H100 93GB VRAM** (values in parentheses are from the previous benchmark):
143
 
144
- | nb frames | Resolution | Batch Size | execution time fp8 (s) | FPS fp8 | execution time fp16 (s) | FPS fp16 | perf progress since start |
145
- | --------- | ------------------- | ---------- | ---------------------- | ----------- | ----------------------- | ------------------ | ------------------------- |
146
- | 15 | 512×768 → 1080×1620 | 5 | 23.75 (26.71) | 0.63 (0.56) | 24.23 (27.75) | 0.61 (0.54) (0.10) | x6.1 |
147
- | 27 | 512×768 → 1080×1620 | 9 | 27.75 (33.97) | 0.97 (0.79) | 28.48 (35.08) | 0.94 (0.77) (0.15) | x6.2 |
148
- | 39 | 512×768 → 1080×1620 | 13 | 32.02 (41.01) | 1.21 (0.95) | 32.62 (42.08) | 1.19 (0.93) (0.19) | x6.2 |
149
- | 51 | 512×768 → 1080×1620 | 17 | 36.39 (48.12) | 1.40 (1.06) | 37.30 (49.44) | 1.36 (1.03) (0.21) | x6.4 |
150
- | 63 | 512×768 → 1080×1620 | 21 | 40.80 (55.40) | 1.54 (1.14) | 41.32 (56.70) | 1.52 (1.11) (0.23) | x6.6 |
151
- | 75 | 512×768 → 1080×1620 | 25 | 45.37 (62.60) | 1.65 (1.20) | 45.79 (63.80) | 1.63 (1.18) (0.24) | x6.8 |
152
- | 123 | 512×768 → 1080×1620 | 41 | 62.44 (91.38) | 1.96 (1.35) | 62.28 (92.90) | 1.97 (1.32) (0.28) | x7.0 |
153
- | 243 | 512×768 → 1080×1620 | 81 | 106.13 (164.25) | 2.28 (1.48) | 104.68 (166.09) | 2.32 (1.46) (0.31) | x7.4 |
154
- | 363 | 512×768 → 1080×1620 | 121 | 151.01 (238.18) | 2.40 (1.52) | 148.67 (239.80) | 2.44 (1.51) (0.33) | x7.4 |
155
- | 453 | 512×768 → 1080×1620 | 151 | 186.98 (296.52) | 2.42 (1.53) | 184.11 (298.65) | 2.46 (1.52) (0.33) | x7.4 |
156
- | 633 | 512×768 → 1080×1620 | 211 | 253.77 (406.65) | 2.49 (1.56) | 249.43 (409.44) | 2.53 (1.55) (0.34) | x7.4 |
157
- | 903 | 512×768 → 1080×1620 | 301 | OOM (OOM) | (OOM) | OOM (OOM) | (OOM) (OOM) | |
158
- | 149 | 854x480 → 1920x1080 | 149 | | | 450.22 | 0.41 | |
159
 
160
- **3B FP8 models on NVIDIA H100 93GB VRAM** (values in parentheses are from the previous benchmark):
161
 
162
- | nb frames | Resolution | Batch Size | execution time fp8 (s) | FPS fp8 | execution time fp16 (s) | FPS fp16 |
163
- | --------- | ------------------- | ---------- | ---------------------- | ------- | ----------------------- | -------- |
164
- | 149 | 854x480 → 1920x1080 | 149 | 361.22 | 0.41 | | |
165
 
166
- **NVIDIA RTX4090 24GB VRAM**
167
 
168
- | Model | nb frames | Resolution | Batch Size | execution time (seconds) | FPS | Note |
169
- | ------- | --------- | ------------------- | ---------- | ------------------------ | ----------- | ---------------------------------------- |
170
- | 3B fp8 | 5 | 512x768 → 1080x1620 | 1 | 14.66 (22.52) | 0.34 (0.22) | |
171
- | 3B fp16 | 5 | 512x768 → 1080x1620 | 1 | 17.02 (27.84) | 0.29 (0.18) | |
172
- | 7B fp8 | 5 | 512x768 → 1080x1620 | 1 | 46.23 (75.51) | 0.11 (0.07) | preserve_memory=on |
173
- | 7B fp16 | 5 | 512x768 → 1080x1620 | 1 | 43.58 (78.93) | 0.11 (0.06) | preserve_memory=on |
174
- | 3B fp8 | 10 | 512x768 → 1080x1620 | 5 | 39.75 | 0.25 | preserve_memory=on |
175
- | 3B fp8 | 100 | 512x768 → 1080x1620 | 5 | 322.77 | 0.31 | preserve_memory=on |
176
- | 3B fp8 | 1000 | 512x768 → 1080x1620 | 5 | 3624.08 | 0.28 | preserve_memory=on |
177
- | 3B fp8 | 20 | 512x768 → 1080x1620 | 1 | 40.71 (65.40) | 0.49 (0.31) | |
178
- | 3B fp16 | 20 | 512x768 → 1080x1620 | 1 | 44.76 (91.12) | 0.45 (0.22) | |
179
- | 3B fp8 | 20 | 512x768 → 1280x1920 | 1 | 61.14 (89.10) | 0.33 (0.22) | |
180
- | 3B fp8 | 20 | 512x768 → 1480x2220 | 1 | 79.66 (136.08) | 0.25 (0.15) | |
181
- | 3B fp8 | 20 | 512x768 → 1620x2430 | 1 | 125.79 (191.28) | 0.16 (0.10) | preserve_memory=off (preserve_memory=on) |
182
- | 3B fp8 | 149 | 854x480 → 1920x1080 | 5 | 782.76 | 0.19 | preserve_memory=on |
183
 
184
  ## ⚠️ Limitations
185
 
186
- - Uses a lot of VRAM; it will take all of it!
187
- - Processing speed depends on GPU capabilities
188
 
189
- ## 🤝 Contributing
190
 
191
- Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
192
 
193
- Please make sure to update tests as appropriate.
194
 
195
- ### How to contribute:
196
 
197
- 1. Fork the repository
198
- 2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
199
- 3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
200
- 4. Push to the branch (`git push origin feature/AmazingFeature`)
201
- 5. Open a Pull Request
 
202
 
203
- ### Development Setup:
204
 
205
- 1. Clone the repository
206
- 2. Install dependencies
207
- 3. Make your changes
208
- 4. Test your changes
209
- 5. Submit a pull request
210
 
211
- ### Code Style:
212
 
213
- - Follow the existing code style
214
- - Add comments for complex logic
215
- - Update documentation if needed
216
- - Ensure all tests pass
217
 
218
- ### Reporting Issues:
219
 
220
- When reporting issues, please include:
221
 
222
- - Your system specifications
223
- - ComfyUI version
224
- - Python version
225
- - Error messages
226
- - Steps to reproduce the issue
227
 
228
  ## 🙏 Credits
229
 
230
- - Original [SeedVR2](https://github.com/ByteDance-Seed/SeedVR) implementation
 
 
231
 
232
- # 📜 License
233
 
234
- - The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE).
 
13
 
14
  [![View Code](https://img.shields.io/badge/📂_View_Code-GitHub-181717?style=for-the-badge&logo=github)](https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler)
15
 
16
+ Official release of [SeedVR2](https://github.com/ByteDance-Seed/SeedVR) for ComfyUI that enables high-quality video and image upscaling.
17
 
18
+ It can also run as a **multi-GPU standalone CLI**; see the [🖥️ Run as Standalone](#️-run-as-standalone-cli) section.
 
19
 
20
+ [![SeedVR2 v2.5 Deep Dive Tutorial](https://img.youtube.com/vi/MBtWYXq_r60/maxresdefault.jpg)](https://youtu.be/MBtWYXq_r60)
21
+
22
+ ![Usage Example](docs/usage_01.png)
23
+
24
+ ![Usage Example](docs/usage_02.png)
25
 
26
  ## 📋 Quick Access
27
 
28
+ - [🆙 Future Releases](#-future-releases)
29
  - [🚀 Updates](#-updates)
30
  - [🎯 Features](#-features)
31
  - [🔧 Requirements](#-requirements)
32
  - [📦 Installation](#-installation)
33
  - [📖 Usage](#-usage)
34
+ - [🖥️ Run as Standalone](#️-run-as-standalone-cli)
35
+ - [⚠️ Limitations](#️-limitations)
36
  - [🤝 Contributing](#-contributing)
37
  - [🙏 Credits](#-credits)
38
+ - [📜 License](#-license)
39
+
40
+ ## 🆙 Future Releases
41
 
42
+ We're actively working on improvements and new features. To stay informed:
43
 
44
+ - **📌 Track Active Development**: Visit [Issues](https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler/issues) to see active development, report bugs, and request new features
45
+ - **💬 Join the Community**: Learn from others, share your workflows, and get help in the [Discussions](https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler/discussions)
46
+ - **🔮 Next Model Survey**: We're looking for community input on the next open-source super-powerful generic restoration model. Share your suggestions in [Issue #164](https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler/issues/164)
47
 
48
  ## 🚀 Updates
49
 
50
+ **2025.11.09 - Version 2.5.5**
51
+
52
+ - 💾 **Memory: Fixed RAM leak for long videos** - On-demand reconstruction with lightweight batch indices instead of storing full transformed videos; fixed release_tensor_memory to handle CPU/CUDA/MPS consistently; refactored batch-processing helpers
53
+
54
+ **2025.11.08 - Version 2.5.4**
55
+
56
+ - 🎨 **Fix: AdaIN color correction** - Replace `.view()` with `.reshape()` to handle non-contiguous tensors after spatial padding, resolving "view size is not compatible with input tensor's size and stride" error
57
+ - 🔴 **Fix: AMD ROCm compatibility** - Add cuDNN availability check in Conv3d workaround to prevent "ATen not compiled with cuDNN support" error on ROCm systems (AMD GPUs on Windows/Linux)
58
+
59
+ **2025.11.08 - Version 2.5.3**
60
+
61
+ - 🍎 **Fix: Apple Silicon MPS device handling** - Corrected MPS device enumeration to use `"mps"` instead of `"mps:0"`, resolving invalid device errors on M-series Macs
62
+ - 🪟 **Fix: torch.mps AttributeError on Windows** - Add defensive checks for `torch.mps.is_available()` to handle PyTorch versions where the method doesn't exist on non-Mac platforms
63
+
64
+ **2025.11.07 - Version 2.5.0** 🎉
65
+
66
+ ⚠️ **BREAKING CHANGE**: This is a major update requiring workflow recreation. All nodes and CLI parameters have been redesigned for better usability and consistency. Watch the latest video from [AInVFX](https://www.youtube.com/@AInVFX) for a deep dive and check out the [usage](#-usage) section.
67
+
68
+ **📦 Official Release**: Now available on main branch with ComfyUI Manager support for easy installation and automatic version tracking. Updated dependencies and local imports prevent conflicts with other ComfyUI custom nodes.
69
+
70
+ ### 🎨 ComfyUI Improvements
71
+
72
+ - **Four-Node Modular Architecture**: Split into dedicated nodes for DiT model, VAE model, torch.compile settings, and main upscaler for granular control
73
+ - **Global Model Cache**: Models now shared across multiple upscaler instances with automatic config updates - no more redundant loading
74
+ - **ComfyUI V3 Migration**: Full compatibility with ComfyUI V3 stateless node design
75
+ - **RGBA Support**: Native alpha channel processing with edge-guided upscaling for clean transparency
76
+ - **Improved Memory Management**: Streaming architecture prevents VRAM spikes regardless of video length
77
+ - **Flexible Resolution Support**: Upscale to any resolution divisible by 2 with lossless padding approach (replaced restrictive cropping)
78
+ - **Enhanced Parameters**: Added `uniform_batch_size`, `temporal_overlap`, `prepend_frames`, and `max_resolution` for better control
79
+
80
+ ### 🖥️ CLI Enhancements
81
+
82
+ - **Batch Directory Processing**: Process entire folders of videos/images with model caching for efficiency
83
+ - **Single Image Support**: Direct image upscaling without video conversion
84
+ - **Smart Output Detection**: Auto-detects output format (MP4/PNG) based on input type
85
+ - **Enhanced Multi-GPU**: Improved workload distribution with temporal overlap blending
86
+ - **Unified Parameters**: CLI and ComfyUI now use identical parameter names for consistency
87
+ - **Better UX**: Auto-display help, validation improvements, progress tracking, and cleaner output
88
+
89
+ ### ⚡ Performance & Optimization
90
+
91
+ - **torch.compile Support**: 20-40% DiT speedup and 15-25% VAE speedup with full graph compilation
92
+ - **Optimized BlockSwap**: Adaptive memory clearing (5% threshold), separate I/O component handling, reduced overhead
93
+ - **Enhanced VAE Tiling**: Tensor offload support for accumulation buffers, separate encode/decode configuration
94
+ - **Native Dtype Pipeline**: Eliminated unnecessary conversions, maintains bfloat16 precision throughout for speed and quality
95
+ - **Optimized Tensor Operations**: Replaced einops rearrange with native PyTorch ops for 2-5x faster transforms
96
+
97
+ ### 🎯 Quality Improvements
98
+
99
+ - **LAB Color Correction**: New perceptual color transfer method with superior color accuracy (now default)
100
+ - **Additional Color Methods**: HSV saturation matching, wavelet adaptive, and hybrid approaches
101
+ - **Deterministic Generation**: Seed-based reproducibility with phase-specific seeding strategy
102
+ - **Better Temporal Consistency**: Hann window blending for smooth transitions between batches
103
+
104
+ ### 💾 Memory Management
105
+
106
+ - **Smarter Offloading**: Independent device configuration for DiT, VAE, and tensors (CPU/GPU/none)
107
+ - **Four-Phase Pipeline**: Completes each phase (encode→upscale→decode→postprocess) for all batches before moving to next, minimizing model swaps
108
+ - **Better Cleanup**: Phase-specific resource management with proper tensor memory release
109
+ - **Peak VRAM Tracking**: Per-phase memory monitoring with summary display
110
+
111
+ ### 🔧 Technical Improvements
112
+
113
+ - **GGUF Quantization Support**: Added full GGUF support for 4-bit/8-bit inference on low-VRAM systems
114
+ - **Improved GGUF Handling**: Fixed VRAM leaks, torch.compile compatibility, non-persistent buffers
115
+ - **Apple Silicon Support**: Full MPS (Metal Performance Shaders) support for Apple Silicon Macs
116
+ - **AMD ROCm Compatibility**: Conditional FSDP imports for PyTorch ROCm 7+ support
117
+ - **Conv3d Memory Workaround**: Fixes PyTorch 2.9+ cuDNN memory bug (3x usage reduction)
118
+ - **Flash Attention Optional**: Graceful fallback to SDPA when flash-attn unavailable
119
+
120
+ ### 📚 Code Quality
121
+
122
+ - **Modular Architecture**: Split monolithic files into focused modules (generation_phases, model_configuration, etc.)
123
+ - **Comprehensive Documentation**: Extensive docstrings with type hints across all modules
124
+ - **Better Error Handling**: Early validation, clear error messages, installation instructions
125
+ - **Consistent Logging**: Unified indentation, better categorization, concise messages
126
+
127
+ **2025.08.07**
128
+
129
+ - 🎯 **Unified Debug System**: New structured logging with categories, timers, and memory tracking. `enable_debug` now available on main node
130
+ - ⚡ **Smart FP8 Optimization**: FP8 models now keep native FP8 storage, converting to BFloat16 only for arithmetic - faster and more memory efficient than FP16
131
+ - 📦 **Model Registry**: Multi-repo support (numz/ & AInVFX/), auto-discovery of user models, added mixed FP8 variants to fix 7B artifacts
132
+ - 💾 **Model Caching**: `cache_model` moved to main node, fixed memory leaks with proper RoPE/wrapper cleanup
133
+ - 🧹 **Code Cleanup**: New modular structure (`constants.py`, `model_registry.py`, `debug.py`), removed legacy code
134
+ - 🚀 **Performance**: Better memory management with `torch.cuda.ipc_collect()`, improved RoPE handling
135
+
136
+ **2025.07.17**
137
+
138
+ - 🛠️ Added 7B sharp models: 2 new 7B models with sharpened output
139
+
140
+ **2025.07.11**
141
+
142
+ - 🎬 Complete tutorial released: Adrien from [AInVFX](https://www.youtube.com/@AInVFX) created an in-depth ComfyUI SeedVR2 guide covering everything from basic setup to advanced BlockSwap techniques for running on consumer GPUs. Perfect for understanding memory optimization and upscaling of image sequences with alpha channel! [Watch the tutorial](#-usage)
143
+
144
+ **2025.07.09**
145
+
146
+ - 🛠️ Blockswap Integration: Big thanks to [Adrien Toupet](https://github.com/adrientoupet) from [AInVFX](https://www.youtube.com/@AInVFX) for this :), useful for low VRAM users (see [usage](#-usage) section)
147
+
148
+ **2025.07.03**
149
+
150
+ - 🛠️ Can run in **standalone mode** with **multi-GPU** support; see [🖥️ Run as Standalone](#️-run-as-standalone-cli)
151
+
152
  **2025.06.30**
153
 
154
+ - 🚀 Sped up the process with lower VRAM usage
155
+ - 🛠️ Fixed memory leak on 3B models
156
+ - ❌ Can now interrupt the process if needed
157
+ - ✅ Refactored the code for better sharing with the community; feel free to propose pull requests
158
+ - 🛠️ Removed flash attention dependency (thanks to [luke2642](https://github.com/Luke2642) !!)
159
 
160
  **2025.06.24**
161
 
162
+ - 🚀 Sped up the process by up to 4×
163
 
164
  **2025.06.22**
165
 
 
174
 
175
  ## 🎯 Features
176
 
177
+ ### Core Capabilities
178
+ - **High-Quality Diffusion-Based Upscaling**: One-step diffusion model for video and image enhancement
179
+ - **Temporal Consistency**: Maintains coherence across video frames with configurable batch processing
180
+ - **Multi-Format Support**: Handles RGB and RGBA (alpha channel) for both videos and images
181
+ - **Any Video Length**: Handles videos of any length; the streaming architecture keeps VRAM usage flat regardless of frame count
182
+
183
+ ### Model Support
184
+ - **Multiple Model Variants**: 3B and 7B parameter models with different precision options
185
+ - **FP16, FP8, and GGUF Quantization**: Choose between full precision (FP16), mixed precision (FP8), or heavily quantized GGUF models for different VRAM requirements
186
+ - **Automatic Model Downloads**: Models are automatically downloaded from HuggingFace on first use
187
+
188
+ ### Memory Optimization
189
+ - **BlockSwap Technology**: Dynamically swap transformer blocks between GPU and CPU memory to run large models on limited VRAM
190
+ - **VAE Tiling**: Process large resolutions with tiled encoding/decoding to reduce VRAM usage
191
+ - **Intelligent Offloading**: Offload models and intermediate tensors to CPU or secondary GPUs between processing phases
192
+ - **GGUF Quantization Support**: Run models with 4-bit or 8-bit quantization for extreme VRAM savings
193
+
194
+ ### Performance Features
195
+ - **torch.compile Integration**: Optional 20-40% DiT speedup and 15-25% VAE speedup with PyTorch 2.0+ compilation
196
+ - **Multi-GPU CLI**: Distribute workload across multiple GPUs with automatic temporal overlap blending
197
+ - **Model Caching**: Keep models loaded in memory for faster batch processing
198
+ - **Flexible Attention Backends**: Choose between PyTorch SDPA (stable, always available) or Flash Attention 2 (faster on supported hardware)
199
+
200
+ ### Quality Control
201
+ - **Advanced Color Correction**: Five methods including LAB (recommended for highest fidelity), wavelet, wavelet adaptive, HSV, and AdaIN
202
+ - **Noise Injection Controls**: Fine-tune input and latent noise scales for artifact reduction at high resolutions
203
+ - **Configurable Resolution Limits**: Set target and maximum resolutions with automatic aspect ratio preservation
204
+
205
+ ### Workflow Features
206
+ - **ComfyUI Integration**: Four dedicated nodes for complete control over the upscaling pipeline
207
+ - **Standalone CLI**: Command-line interface for batch processing and automation
208
+ - **Debug Logging**: Comprehensive debug mode with memory tracking, timing information, and processing details
209
+ - **Progress Reporting**: Real-time progress updates during processing
210
 
211
  ## 🔧 Requirements
212
 
213
+ ### Hardware
214
+
215
+ With the current optimizations (tiling, BlockSwap, GGUF quantization), SeedVR2 can run on a wide range of hardware:
216
+
217
+ - **Minimal VRAM** (8GB or less): Use GGUF Q4_K_M models with BlockSwap and VAE tiling enabled
218
+ - **Moderate VRAM** (12-16GB): Use FP8 models with BlockSwap or VAE tiling as needed
219
+ - **High VRAM** (24GB+): Use FP16 models for best quality and speed without memory optimizations
220
+
221
+ ### Software
222
+
223
+ - **ComfyUI**: Latest version recommended
224
+ - **Python**: 3.12+ (Python 3.12 and 3.13 tested and recommended)
225
+ - **PyTorch**: 2.0+ for torch.compile support (optional but recommended)
226
+ - **Triton**: Required for torch.compile with inductor backend (optional)
227
+ - **Flash Attention 2**: Provides faster attention computation on supported hardware (optional, falls back to PyTorch SDPA)
228
 
229
  ## 📦 Installation
230
 
231
+ ### Option 1: ComfyUI Manager (Recommended)
232
 
233
+ 1. Open ComfyUI Manager in your ComfyUI interface
234
+ 2. Click "Custom Nodes Manager"
235
+ 3. Search for "ComfyUI-SeedVR2_VideoUpscaler"
236
+ 4. Click "Install" and restart ComfyUI
237
 
238
+ **Registry Link**: [ComfyUI Registry - SeedVR2 Video Upscaler](https://registry.comfy.org/nodes/seedvr2_videoupscaler)
239
 
240
+ ### Option 2: Manual Installation
241
 
242
+ 1. **Clone the repository** into your ComfyUI custom nodes directory:
243
  ```bash
244
+ cd ComfyUI
245
+ git clone https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler.git custom_nodes/seedvr2_videoupscaler
246
  ```
247
 
248
+ 2. **Install dependencies using standalone Python**:
 
249
  ```bash
250
+ # Install requirements (from same ComfyUI directory)
251
+ # Windows:
252
+ .venv\Scripts\python.exe -m pip install -r custom_nodes\seedvr2_videoupscaler\requirements.txt
253
+ # Linux/macOS:
254
+ .venv/bin/python -m pip install -r custom_nodes/seedvr2_videoupscaler/requirements.txt
255
  ```
256
 
257
+ 3. **Restart ComfyUI**
258
 
259
+ ### Model Installation
260
+
261
+ Models will be **automatically downloaded** on first use and saved to `ComfyUI/models/SEEDVR2`.
262
+
263
+ You can also manually download models from:
264
+ - Main models available at [numz/SeedVR2_comfyUI](https://huggingface.co/numz/SeedVR2_comfyUI/tree/main) and [AInVFX/SeedVR2_comfyUI](https://huggingface.co/AInVFX/SeedVR2_comfyUI/tree/main)
265
+ - Additional GGUF models available at [cmeka/SeedVR2-GGUF](https://huggingface.co/cmeka/SeedVR2-GGUF/tree/main)
266
+
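+ If you prefer scripting the download, here is a minimal sketch using `huggingface_hub` (this mirrors what the automatic download does; any filename from the model lists in the Usage section works):
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Fetch one model file into ComfyUI's SeedVR2 model folder
+ hf_hub_download(
+     repo_id="numz/SeedVR2_comfyUI",
+     filename="seedvr2_ema_3b_fp16.safetensors",
+     local_dir="ComfyUI/models/SEEDVR2",
+ )
+ ```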
267
+ ## 📖 Usage
268
+
269
+ ### 🎬 Video Tutorials
270
+
271
+ #### Latest Version Deep Dive (Recommended)
272
+
273
+ Complete walkthrough of version 2.5 by Adrien from [AInVFX](https://www.youtube.com/@AInVFX), covering the new 4-node architecture, GGUF support, memory optimizations, and production workflows:
274
+
275
+ [![SeedVR2 v2.5 Deep Dive Tutorial](https://img.youtube.com/vi/MBtWYXq_r60/maxresdefault.jpg)](https://youtu.be/MBtWYXq_r60)
276
+
277
+ This comprehensive tutorial covers:
278
+ - Installing v2.5 through ComfyUI Manager and troubleshooting conflicts
279
+ - Understanding the new 4-node modular architecture and why we rebuilt it
280
+ - Running 7B models on 8GB VRAM with GGUF quantization
281
+ - Configuring BlockSwap, VAE tiling, and torch.compile for your hardware
282
+ - Image and video upscaling workflows with alpha channel support
283
+ - CLI for batch processing and multi-GPU rendering
284
+ - Memory optimization strategies for different VRAM levels
285
+ - Real production tips and the critical batch_size formula (4n+1)
286
+
287
+ #### Previous Version Tutorial
288
+
289
+ For reference, here's the original tutorial covering the initial release:
290
+
291
+ [![SeedVR2 Deep Dive Tutorial](https://img.youtube.com/vi/I0sl45GMqNg/maxresdefault.jpg)](https://youtu.be/I0sl45GMqNg)
292
+
293
+ *Note: This tutorial covers the previous single-node architecture. While the UI has changed significantly in v2.5, the core concepts about BlockSwap and memory management remain valuable.*
294
+
295
+ ### Node Setup
296
+
297
+ SeedVR2 uses a modular node architecture with four specialized nodes:
298
+
299
+ #### 1. SeedVR2 (Down)Load DiT Model
300
+
301
+ ![SeedVR2 (Down)Load DiT Model](docs/dit_model_loader.png)
302
+
303
+ Configure the DiT (Diffusion Transformer) model for video upscaling.
304
+
305
+ **Parameters:**
306
+
307
+ - **model**: Choose your DiT model
308
+ - **3B Models**: Faster, lower VRAM requirements
309
+ - `seedvr2_ema_3b_fp16.safetensors`: FP16 (best quality)
310
+ - `seedvr2_ema_3b_fp8_e4m3fn.safetensors`: FP8 8-bit (good quality)
311
+ - `seedvr2_ema_3b-Q4_K_M.gguf`: GGUF 4-bit quantized (acceptable quality)
312
+ - `seedvr2_ema_3b-Q8_0.gguf`: GGUF 8-bit quantized (good quality)
313
+ - **7B Models**: Higher quality, higher VRAM requirements
314
+ - `seedvr2_ema_7b_fp16.safetensors`: FP16 (best quality)
315
+ - `seedvr2_ema_7b_fp8_e4m3fn_mixed_block35_fp16.safetensors`: FP8 with last block in FP16 to reduce artifacts (good quality)
316
+ - `seedvr2_ema_7b-Q4_K_M.gguf`: GGUF 4-bit quantized (acceptable quality)
317
+ - `seedvr2_ema_7b_sharp_*`: Sharp variants for enhanced detail
318
+
319
+ - **device**: GPU device for DiT inference (e.g., `cuda:0`)
320
+
321
+ - **offload_device**: Device to offload DiT model when not actively processing
322
+ - `none`: Keep model on inference device (fastest, highest VRAM)
323
+ - `cpu`: Offload to system RAM (reduces VRAM)
324
+ - `cuda:X`: Offload to another GPU (good balance if available)
325
+
326
+ - **cache_model**: Keep DiT model loaded on offload_device between workflow runs
327
+ - Useful for batch processing to avoid repeated loading
328
+ - Requires offload_device to be set
329
+
330
+ - **blocks_to_swap**: BlockSwap memory optimization
331
+ - `0`: Disabled (default)
332
+ - `1-32`: Number of transformer blocks to swap for 3B model
333
+ - `1-36`: Number of transformer blocks to swap for 7B model
334
+ - Higher values = more VRAM savings but slower processing
335
+ - Requires offload_device to be set and different from device
336
+
337
+ - **swap_io_components**: Offload input/output embeddings and normalization layers
338
+ - Additional VRAM savings when combined with blocks_to_swap
339
+ - Requires offload_device to be set and different from device
340
+
341
+ - **attention_mode**: Attention computation backend
342
+ - `sdpa`: PyTorch scaled_dot_product_attention (default, stable, always available)
343
+ - `flash_attn`: Flash Attention 2 (faster on supported hardware, requires flash-attn package)
344
+
345
+ - **torch_compile_args**: Connect to SeedVR2 Torch Compile Settings node for 20-40% speedup
346
+
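+ In PyTorch terms, the `sdpa` backend maps onto the built-in fused attention call shown below (reference sketch; `flash_attn` swaps in the flash-attn package's kernel when it is installed):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ q = k = v = torch.randn(1, 8, 128, 64)         # (batch, heads, tokens, head_dim)
+ out = F.scaled_dot_product_attention(q, k, v)  # what the `sdpa` mode relies on
+ ```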
347
+ **BlockSwap Explained:**
348
+
349
+ BlockSwap enables running large models on GPUs with limited VRAM by dynamically swapping transformer blocks between GPU and CPU memory during inference. Here's how it works:
350
+
351
+ - **What it does**: Keeps only the currently-needed transformer blocks on the GPU, while storing the rest on CPU or another device
352
+ - **When to use it**: When you get OOM (Out of Memory) errors during the upscaling phase
353
+ - **How to configure**:
354
+ 1. Set `offload_device` to `cpu` or another GPU
355
+ 2. Start with `blocks_to_swap=16` (half the blocks)
356
+ 3. If still getting OOM, increase to 24 or 32 (3B) / 36 (7B)
357
+ 4. Enable `swap_io_components` for maximum VRAM savings
358
+ 5. If you have plenty of VRAM, decrease or set to 0 for faster processing
359
+
360
+ **Example Configuration for Low VRAM (8GB)**:
361
+ - model: `seedvr2_ema_3b-Q8_0.gguf`
362
+ - device: `cuda:0`
363
+ - offload_device: `cpu`
364
+ - blocks_to_swap: `32`
365
+ - swap_io_components: `True`
366
+
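+ To make the mechanics concrete, here is a minimal Python sketch of the BlockSwap idea (illustrative names, not the node's actual code): only the block currently executing lives in VRAM, the rest wait on the offload device.
+
+ ```python
+ def forward_with_blockswap(blocks, x, device="cuda:0", offload_device="cpu"):
+     """Run transformer blocks while keeping at most one of them in VRAM."""
+     for block in blocks:
+         block.to(device)            # fetch the block into VRAM
+         x = block(x)                # run it
+         block.to(offload_device)    # evict it so the next block fits
+     return x
+ ```
+
+ The actual implementation can overlap these transfers with compute (non-blocking copies; see `--use_non_blocking` in the CLI), which is why higher `blocks_to_swap` values trade speed for VRAM rather than failing outright.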
367
+ #### 2. SeedVR2 (Down)Load VAE Model
368
+
369
+ ![SeedVR2 (Down)Load VAE Model](docs/vae_model_loader.png)
370
+
371
+ Configure the VAE (Variational Autoencoder) model for encoding/decoding video frames.
372
+
373
+ **Parameters:**
374
+
375
+ - **model**: VAE model selection
376
+ - `ema_vae_fp16.safetensors`: Default and recommended
377
+
378
+ - **device**: GPU device for VAE inference (e.g., `cuda:0`)
379
+
380
+ - **offload_device**: Device to offload VAE model when not actively processing
381
+ - `none`: Keep model on inference device (default, fastest)
382
+ - `cpu`: Offload to system RAM (reduces VRAM)
383
+ - `cuda:X`: Offload to another GPU (good balance if available)
384
+
385
+ - **cache_model**: Keep VAE model loaded on offload_device between workflow runs
386
+ - Requires offload_device to be set
387
+
388
+ - **encode_tiled**: Enable tiled encoding to reduce VRAM usage during encoding phase
389
+ - Enable if you see OOM errors during the "Encoding" phase in debug logs
390
+
391
+ - **encode_tile_size**: Encoding tile size in pixels (default: 1024)
392
+ - Applied to both height and width
393
+ - Lower values reduce VRAM but may increase processing time
394
+
395
+ - **encode_tile_overlap**: Encoding tile overlap in pixels (default: 128)
396
+ - Reduces visible seams between tiles
397
+
398
+ - **decode_tiled**: Enable tiled decoding to reduce VRAM usage during decoding phase
399
+ - Enable if you see OOM errors during the "Decoding" phase in debug logs
400
+
401
+ - **decode_tile_size**: Decoding tile size in pixels (default: 1024)
402
+
403
+ - **decode_tile_overlap**: Decoding tile overlap in pixels (default: 128)
404
+
405
+ - **torch_compile_args**: Connect to SeedVR2 Torch Compile Settings node for 15-25% speedup
406
+
407
+ **VAE Tiling Explained:**
408
+
409
+ VAE tiling processes large resolutions in smaller tiles to reduce VRAM requirements. Here's how to use it:
410
+
411
+ 1. **Run without tiling first** and monitor the debug logs (enable `enable_debug` on main node)
412
+ 2. **If OOM during "Encoding" phase**:
413
+ - Enable `encode_tiled`
414
+ - If still OOM, reduce `encode_tile_size` (try 768, 512, etc.)
415
+ 3. **If OOM during "Decoding" phase**:
416
+ - Enable `decode_tiled`
417
+ - If still OOM, reduce `decode_tile_size`
418
+ 4. **Adjust overlap** (default 128) if you see visible seams in output (increase it) or processing times are too slow (decrease it).
419
+
420
+ **Example Configuration for High Resolution (4K)**:
421
+ - encode_tiled: `True`
422
+ - encode_tile_size: `1024`
423
+ - encode_tile_overlap: `128`
424
+ - decode_tiled: `True`
425
+ - decode_tile_size: `1024`
426
+ - decode_tile_overlap: `128`
427
+
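+ The interaction between tile size and overlap is easiest to see with a small sketch of how start offsets for overlapping tiles could be computed (illustrative only, not the extension's code):
+
+ ```python
+ def tile_starts(length, tile=1024, overlap=128):
+     """Start offsets of tiles covering `length` pixels, sharing `overlap` pixels."""
+     stride = tile - overlap
+     starts = list(range(0, max(length - tile, 0) + 1, stride))
+     if starts[-1] + tile < length:        # the last tile must reach the edge
+         starts.append(length - tile)
+     return starts
+
+ # 4K width with the defaults above:
+ print(tile_starts(3840))  # [0, 896, 1792, 2688, 2816]
+ ```
+
+ Smaller tiles mean more (and therefore slower) passes; larger overlap means fewer visible seams but more redundant pixels per pass.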
428
+ #### 3. SeedVR2 Torch Compile Settings (Optional)
429
+
430
+ ![SeedVR2 Torch Compile Settings](docs/torch_compile_settings.png)
431
+
432
+ Configure torch.compile optimization for 20-40% DiT speedup and 15-25% VAE speedup.
433
+
434
+ **Requirements:**
435
+ - PyTorch 2.0+
436
+ - Triton (for inductor backend)
437
+
438
+ **Parameters:**
439
+
440
+ - **backend**: Compilation backend
441
+ - `inductor`: Full optimization with Triton kernel generation and fusion (recommended)
442
+ - `cudagraphs`: Lightweight wrapper using CUDA graphs, no kernel optimization
443
+
444
+ - **mode**: Optimization level (compilation time vs runtime performance)
445
+ - `default`: Fast compilation with good speedup (recommended for development)
446
+ - `reduce-overhead`: Lower overhead, optimized for smaller models
447
+ - `max-autotune`: Slowest compilation, best runtime performance (recommended for production)
448
+ - `max-autotune-no-cudagraphs`: Like max-autotune but without CUDA graphs
449
+
450
+ - **fullgraph**: Compile entire model as single graph without breaks
451
+ - `False`: Allow graph breaks for better compatibility (default, recommended)
452
+ - `True`: Enforce no breaks for maximum optimization (may fail with dynamic shapes)
453
+
454
+ - **dynamic**: Handle varying input shapes without recompilation
455
+ - `False`: Specialize for exact input shapes (default)
456
+ - `True`: Create dynamic kernels that adapt to shape variations (enable when processing different resolutions or batch sizes)
457
+
458
+ - **dynamo_cache_size_limit**: Max cached compiled versions per function (default: 64)
459
+ - Higher = more memory, lower = more recompilation
460
+
461
+ - **dynamo_recompile_limit**: Max recompilation attempts before falling back to eager mode (default: 128)
462
+ - Safety limit to prevent compilation loops
463
+
464
+ **Usage:**
465
+ 1. Add this node to your workflow
466
+ 2. Connect its output to the `torch_compile_args` input of DiT and/or VAE loader nodes
467
+ 3. First run will be slow (compilation), subsequent runs will be much faster
468
+
469
+ **When to use:**
470
+ - torch.compile only makes sense when processing **multiple batches, long videos, or many tiles**
471
+ - For single images or short clips, the compilation time outweighs the speed improvement
472
+ - Best suited for batch processing workflows or long videos
473
+
474
+ **Recommended Settings:**
475
+ - For development/testing: `mode=default`, `backend=inductor`, `fullgraph=False`
476
+ - For production: `mode=max-autotune`, `backend=inductor`, `fullgraph=False`
477
+
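+ For reference, these settings map onto PyTorch's standard compile API roughly as follows (a sketch; `model` stands in for the loaded DiT or VAE module):
+
+ ```python
+ import torch
+
+ model = torch.nn.Linear(8, 8)   # stand-in for the loaded DiT or VAE module
+
+ # Dynamo cache limit corresponding to dynamo_cache_size_limit above
+ torch._dynamo.config.cache_size_limit = 64
+
+ model = torch.compile(
+     model,
+     backend="inductor",   # Triton kernel generation and fusion
+     mode="max-autotune",  # best runtime performance, slowest compilation
+     fullgraph=False,      # allow graph breaks for compatibility
+     dynamic=False,        # specialize kernels to exact input shapes
+ )
+ ```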
478
+ #### 4. SeedVR2 Video Upscaler (Main Node)
479
+
480
+ ![SeedVR2 Video Upscaler](docs/video_upscaler.png)
481
+
482
+ Main upscaling node that processes video frames using DiT and VAE models.
483
+
484
+ **Required Inputs:**
485
+
486
+ - **image**: Input video frames as image batch (RGB or RGBA format)
487
+ - **dit**: DiT model configuration from SeedVR2 (Down)Load DiT Model node
488
+ - **vae**: VAE model configuration from SeedVR2 (Down)Load VAE Model node
489
+
490
+ **Parameters:**
491
+
492
+ - **seed**: Random seed for reproducible generation (default: 42)
493
+ - Same seed with same inputs produces identical output
494
+
495
+ - **resolution**: Target resolution for shortest edge in pixels (default: 1080)
496
+ - Maintains aspect ratio automatically
497
+
498
+ - **max_resolution**: Maximum resolution for any edge (default: 0 = no limit)
499
+ - Automatically scales down if exceeded to prevent OOM
500
+
501
+ - **batch_size**: Frames per batch (default: 5)
502
+ - **CRITICAL REQUIREMENT**: Must follow the **4n+1 formula** (1, 5, 9, 13, 17, 21, 25, ...)
503
+ - **Why this matters**: The model uses these frames for temporal consistency calculations
504
+ - **Minimum 5 for temporal consistency**: Use 1 only for single images or when temporal consistency isn't needed
505
+ - **Match shot length ideally**: For best results, set batch_size to match your shot length (e.g., batch_size=21 for a 20-frame shot)
506
+ - **VRAM impact**: Higher batch_size = better quality and speed but requires more VRAM
507
+ - **If you get OOM with batch_size=5**: Try optimization techniques first (model offloading, BlockSwap, GGUF models...) before reducing batch_size or input resolution, as these directly impact quality
508
+
509
+ - **uniform_batch_size** (default: False)
510
+ - Pads the final batch to match `batch_size` for uniform processing
511
+ - Prevents temporal artifacts when the last batch is significantly smaller than others
512
+ - Example: 45 frames with `batch_size=33` creates [33, 33] instead of [33, 12]
513
+ - Recommended when using large batch sizes and video length is not a multiple of `batch_size`
514
+ - Increases VRAM usage slightly but ensures consistent temporal coherence across all batches
515
+
516
+ - **temporal_overlap**: Overlapping frames between batches (default: 0)
517
+ - Used for blending between batches to reduce temporal artifacts
518
+ - Range: 0-16 frames
519
+
520
+ - **prepend_frames**: Frames to prepend (default: 0)
521
+ - Prepends reversed frames to reduce artifacts at video start (see the sketch after this section)
522
+ - Automatically removed after processing
523
+ - Range: 0-32 frames
524
+
525
+ - **color_correction**: Color correction method (default: "wavelet")
526
+ - **`lab`**: Full perceptual color matching with detail preservation (recommended for highest fidelity to original)
527
+ - **`wavelet`**: Frequency-based natural colors, preserves details well
528
+ - **`wavelet_adaptive`**: Wavelet base + targeted saturation correction
529
+ - **`hsv`**: Hue-conditional saturation matching
530
+ - **`adain`**: Statistical style transfer (see the sketch after this parameter list)
531
+ - **`none`**: No color correction
532
+
533
+ - **input_noise_scale**: Input noise injection scale 0.0-1.0 (default: 0.0)
534
+ - Adds noise to input frames to reduce artifacts at very high resolutions
535
+ - Try 0.1-0.3 if you see artifacts with high output resolutions
536
+
537
+ - **latent_noise_scale**: Latent space noise scale 0.0-1.0 (default: 0.0)
538
+ - Adds noise during diffusion process, can soften excessive detail
539
+ - Use if input_noise doesn't help, try 0.05-0.15
540
+
541
+ - **offload_device**: Device for storing intermediate tensors between processing phases (default: "cpu")
542
+ - `none`: Keep all tensors on inference device (fastest but highest VRAM)
543
+ - `cpu`: Offload to system RAM (recommended for long videos, slower transfers)
544
+ - `cuda:X`: Offload to another GPU (good balance if available, faster than CPU)
545
+
546
+ - **enable_debug**: Enable detailed debug logging (default: False)
547
+ - Shows memory usage, timing information, and processing details
548
+ - **Highly recommended** for troubleshooting OOM issues
549
+
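+ Returning to the `color_correction` options above: `adain` boils down to matching per-channel statistics of the upscaled frames to the input. A sketch of the idea (not the node's exact code):
+
+ ```python
+ import torch
+
+ def adain_color_correct(output, reference, eps=1e-6):
+     """Match per-channel mean/std of `output` frames to `reference` (N, H, W, C)."""
+     dims = (0, 1, 2)                       # statistics over all frames and pixels
+     mu_o, std_o = output.mean(dims), output.std(dims) + eps
+     mu_r, std_r = reference.mean(dims), reference.std(dims)
+     corrected = (output - mu_o) / std_o * std_r + mu_r
+     return corrected.clamp(0.0, 1.0)       # stay in ComfyUI's [0, 1] range
+ ```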
550
+ **Output:**
551
+ - Upscaled video frames with color correction applied
552
+ - Format (RGB/RGBA) matches input
553
+ - Range [0, 1] normalized for ComfyUI compatibility
554
+
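+ To illustrate `prepend_frames` (assumed behavior inferred from the parameter description; the node's exact padding may differ): the opening frames are mirrored in front of the clip so the model has temporal context at frame 0, and the padding is stripped from the output.
+
+ ```python
+ def with_prepended(frames, n):
+     """Prepend `n` reversed frames as a temporal warm-up (removed after processing)."""
+     warmup = frames[1:n + 1][::-1]          # e.g. [f3, f2, f1] for n=3
+     return warmup + frames, len(warmup)     # also return how many frames to strip
+ ```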
555
+ ### Typical Workflow Setup
556
+
557
+ **Basic Workflow (High VRAM - 24GB+)**:
558
+ ```
559
+ Load Video Frames
560
+
561
+ SeedVR2 Load DiT Model
562
+ ├─ model: seedvr2_ema_3b_fp16.safetensors
563
+ └─ device: cuda:0
564
+
565
+ SeedVR2 Load VAE Model
566
+ ├─ model: ema_vae_fp16.safetensors
567
+ └─ device: cuda:0
568
+
569
+ SeedVR2 Video Upscaler
570
+ ├─ batch_size: 21
571
+ └─ resolution: 1080
572
+
573
+ Save Video/Frames
574
+ ```
575
+
576
+ **Low VRAM Workflow (8-12GB)**:
577
+ ```
578
+ Load Video Frames
579
+
580
+ SeedVR2 Load DiT Model
581
+ ├─ model: seedvr2_ema_3b-Q8_0.gguf
582
+ ├─ device: cuda:0
583
+ ├─ offload_device: cpu
584
+ ├─ blocks_to_swap: 32
585
+ └─ swap_io_components: True
586
+
587
+ SeedVR2 Load VAE Model
588
+ ├─ model: ema_vae_fp16.safetensors
589
+ ├─ device: cuda:0
590
+ ├─ encode_tiled: True
591
+ └─ decode_tiled: True
592
+
593
+ SeedVR2 Video Upscaler
594
+ ├─ batch_size: 5
595
+ └─ resolution: 720
596
+
597
+ Save Video/Frames
598
  ```
599
 
600
+ **High Performance Workflow (24GB+ with torch.compile)**:
601
+ ```
602
+ Load Video Frames
603
+
604
+ SeedVR2 Torch Compile Settings
605
+ ├─ mode: max-autotune
606
+ └─ backend: inductor
607
+
608
+ SeedVR2 Load DiT Model
609
+ ├─ model: seedvr2_ema_7b_sharp_fp16.safetensors
610
+ ├─ device: cuda:0
611
+ └─ torch_compile_args: connected
612
+
613
+ SeedVR2 Load VAE Model
614
+ ├─ model: ema_vae_fp16.safetensors
615
+ ├─ device: cuda:0
616
+ └─ torch_compile_args: connected
617
+
618
+ SeedVR2 Video Upscaler
619
+ ├─ batch_size: 81
620
+ └─ resolution: 1080
621
+
622
+ Save Video/Frames
623
+ ```
624
 
625
+ ## 🖥️ Run as Standalone (CLI)
626
 
627
+ The standalone CLI provides powerful batch processing capabilities with multi-GPU support and sophisticated optimization options.
 
628
 
629
+ ### Prerequisites
630
 
631
+ Choose the appropriate setup based on your installation:
632
 
633
+ #### Option 1: Already Have ComfyUI with SeedVR2 Installed
634
 
635
+ If you've already installed SeedVR2 as part of ComfyUI (via [ComfyUI installation](#-installation)), you can use the CLI directly:
636
 
637
+ ```bash
638
+ # Navigate to your ComfyUI directory
639
+ cd ComfyUI
640
+
641
+ # Run the CLI using standalone Python (display help message)
642
+ # Windows:
643
+ .venv\Scripts\python.exe custom_nodes\seedvr2_videoupscaler\inference_cli.py --help
644
+ # Linux/macOS:
645
+ .venv/bin/python custom_nodes/seedvr2_videoupscaler/inference_cli.py --help
646
+ ```
647
 
648
+ **Skip to [Command Line Usage](#command-line-usage) below.**
649
 
650
+ #### Option 2: Standalone Installation (Without ComfyUI)
651
 
652
+ If you want to use the CLI without ComfyUI installation, follow these steps:
653
 
654
+ 1. **Install [uv](https://docs.astral.sh/uv/getting-started/installation/)** (modern Python package manager):
655
+ ```bash
656
+ # Windows
657
+ powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
658
+
659
+ # macOS and Linux
660
+ curl -LsSf https://astral.sh/uv/install.sh | sh
661
+ ```
662
 
663
+ 2. **Clone the repository**:
664
+ ```bash
665
+ git clone https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler.git seedvr2_videoupscaler
666
+ cd seedvr2_videoupscaler
667
+ ```
668
 
669
+ 3. **Create virtual environment and install dependencies**:
670
+ ```bash
671
+ # Create virtual environment with Python 3.13
672
+ uv venv --python 3.13
673
+
674
+ # Activate virtual environment
675
+ # Windows:
676
+ .venv\Scripts\activate
677
+ # Linux/macOS:
678
+ source .venv/bin/activate
679
+
680
+ # Install PyTorch with CUDA support
681
+ # Pick the command matching your environment: https://pytorch.org/get-started/locally/
682
+ uv pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu130
683
+
684
+ # Install SeedVR2 requirements
685
+ uv pip install -r requirements.txt
686
+
687
+ # Run the CLI (display help message)
688
+ # Windows:
689
+ .venv\Scripts\python.exe inference_cli.py --help
690
+ # Linux/macOS:
691
+ .venv/bin/python inference_cli.py --help
692
+ ```
693
 
694
+ ### Command Line Usage
695
 
696
+ The CLI provides comprehensive options for single-GPU, multi-GPU, and batch processing workflows.
697
 
698
+ **Basic Usage Examples:**
699
 
700
+ ```bash
701
+ # Basic image upscaling
702
+ python inference_cli.py image.jpg
703
+
704
+ # Basic video upscaling with temporal consistency
705
+ python inference_cli.py video.mp4 --resolution 720 --batch_size 33
706
+
707
+ # Multi-GPU processing with temporal overlap
708
+ python inference_cli.py video.mp4 \
709
+ --cuda_device 0,1 \
710
+ --resolution 1080 \
711
+ --batch_size 81 \
712
+ --uniform_batch_size \
713
+ --temporal_overlap 3 \
714
+ --prepend_frames 4
715
+
716
+ # Memory-optimized for low VRAM (8GB)
717
+ python inference_cli.py image.png \
718
+ --dit_model seedvr2_ema_3b-Q8_0.gguf \
719
+ --resolution 1080 \
720
+ --blocks_to_swap 32 \
721
+ --swap_io_components \
722
+ --dit_offload_device cpu \
723
+ --vae_offload_device cpu
724
+
725
+ # High resolution with VAE tiling
726
+ python inference_cli.py video.mp4 \
727
+ --resolution 1440 \
728
+ --batch_size 33 \
729
+ --uniform_batch_size \
730
+ --temporal_overlap 3 \
731
+ --vae_encode_tiled \
732
+ --vae_decode_tiled
733
+
734
+ # Batch directory processing with model caching
735
+ python inference_cli.py media_folder/ \
736
+ --output processed/ \
737
+ --cuda_device 0 \
738
+ --cache_dit \
739
+ --cache_vae \
740
+ --dit_offload_device cpu \
741
+ --vae_offload_device cpu \
742
+ --resolution 1080 \
743
+ --max_resolution 1920
744
+ ```
745
 
746
+ ### Command Line Arguments
747
+
748
+ **Input/Output:**
749
+ - `<input>`: Input file (.mp4, .avi, .png, .jpg, etc.) or directory
750
+ - `--output`: Output path (default: auto-generated in 'output/' directory)
751
+ - `--output_format`: Output format: 'mp4' (video) or 'png' (image sequence). Default: auto-detect from input type
752
+ - `--model_dir`: Model directory (default: ./models/SEEDVR2)
753
+
754
+ **Model Selection:**
755
+ - `--dit_model`: DiT model to use. Options: 3B/7B with fp16/fp8/GGUF variants (default: 3B FP8)
756
+
757
+ **Processing Parameters:**
758
+ - `--resolution`: Target short-side resolution in pixels (default: 1080)
759
+ - `--max_resolution`: Maximum resolution for any edge. Scales down if exceeded. 0 = no limit (default: 0; see the sketch after this list)
760
+ - `--batch_size`: Frames per batch (must follow 4n+1: 1, 5, 9, 13, 17, 21...). Ideally matches shot length for best temporal consistency (default: 5)
761
+ - `--seed`: Random seed for reproducibility (default: 42)
762
+ - `--skip_first_frames`: Skip N initial frames (default: 0)
763
+ - `--load_cap`: Load maximum N frames from video. 0 = load all (default: 0)
764
+ - `--prepend_frames`: Prepend N reversed frames to reduce start artifacts (auto-removed) (default: 0)
765
+ - `--temporal_overlap`: Frames to overlap between batches/GPUs for smooth blending (default: 0)
766
+
767
+ **Quality Control:**
768
+ - `--color_correction`: Color correction method: 'lab' (perceptual, recommended), 'wavelet', 'wavelet_adaptive', 'hsv', 'adain', or 'none' (default: lab)
769
+ - `--input_noise_scale`: Input noise injection scale (0.0-1.0). Reduces artifacts at high resolutions (default: 0.0)
770
+ - `--latent_noise_scale`: Latent space noise scale (0.0-1.0). Softens details if needed (default: 0.0)
771
+
772
+ **Memory Management:**
773
+ - `--dit_offload_device`: Device to offload DiT model: 'none' (keep on GPU), 'cpu', or 'cuda:X' (default: none)
774
+ - `--vae_offload_device`: Device to offload VAE model: 'none', 'cpu', or 'cuda:X' (default: none)
775
+ - `--blocks_to_swap`: Number of transformer blocks to swap (0=disabled, 3B: 0-32, 7B: 0-36). Requires dit_offload_device (default: 0)
776
+ - `--swap_io_components`: Offload I/O components for additional VRAM savings. Requires dit_offload_device
777
+ - `--use_non_blocking`: Use non-blocking memory transfers for BlockSwap (recommended)
778
+
779
+ **VAE Tiling:**
780
+ - `--vae_encode_tiled`: Enable VAE encode tiling to reduce VRAM during encoding
781
+ - `--vae_encode_tile_size`: VAE encode tile size in pixels (default: 1024)
782
+ - `--vae_encode_tile_overlap`: VAE encode tile overlap in pixels (default: 128)
783
+ - `--vae_decode_tiled`: Enable VAE decode tiling to reduce VRAM during decoding
784
+ - `--vae_decode_tile_size`: VAE decode tile size in pixels (default: 1024)
785
+ - `--vae_decode_tile_overlap`: VAE decode tile overlap in pixels (default: 128)
786
+ - `--tile_debug`: Visualize tiles: 'false' (default), 'encode', or 'decode'
787
+
788
+ **Performance Optimization:**
789
+ - `--attention_mode`: Attention backend: 'sdpa' (default, stable) or 'flash_attn' (faster, requires package)
790
+ - `--compile_dit`: Enable torch.compile for DiT model (20-40% speedup, requires PyTorch 2.0+ and Triton)
791
+ - `--compile_vae`: Enable torch.compile for VAE model (15-25% speedup, requires PyTorch 2.0+ and Triton)
792
+ - `--compile_backend`: Compilation backend: 'inductor' (full optimization) or 'cudagraphs' (lightweight) (default: inductor)
793
+ - `--compile_mode`: Optimization level: 'default', 'reduce-overhead', 'max-autotune', 'max-autotune-no-cudagraphs' (default: default)
794
+ - `--compile_fullgraph`: Compile entire model as single graph (faster but less flexible) (default: False)
795
+ - `--compile_dynamic`: Handle varying input shapes without recompilation (default: False)
796
+ - `--compile_dynamo_cache_size_limit`: Max cached compiled versions per function (default: 64)
797
+ - `--compile_dynamo_recompile_limit`: Max recompilation attempts before fallback (default: 128)
798
+
799
+ **Model Caching (batch processing):**
800
+ - `--cache_dit`: Cache DiT model between files (single GPU only, speeds up directory processing)
801
+ - `--cache_vae`: Cache VAE model between files (single GPU only, speeds up directory processing)
802
+
803
+ **Multi-GPU:**
804
+ - `--cuda_device`: CUDA device id(s). Single id (e.g., '0') or comma-separated list '0,1' for multi-GPU
805
+
806
+ **Debugging:**
807
+ - `--debug`: Enable verbose debug logging
808
+
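+ How `--resolution` and `--max_resolution` interact, as a sketch (illustrative; the node's exact rounding may differ, but output edges stay divisible by 2):
+
+ ```python
+ def target_size(h, w, resolution=1080, max_resolution=0):
+     """Scale the short side to `resolution`, capping any edge at `max_resolution`."""
+     scale = resolution / min(h, w)
+     if max_resolution and max(h, w) * scale > max_resolution:
+         scale = max_resolution / max(h, w)          # shrink to respect the cap
+     even = lambda v: int(round(v * scale / 2)) * 2  # keep dimensions divisible by 2
+     return even(h), even(w)
+
+ print(target_size(480, 854, resolution=1080, max_resolution=1920))  # (1080, 1920)
+ ```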
809
+ ### Multi-GPU Processing Explained
810
+
811
+ The CLI's multi-GPU mode automatically distributes the workload across multiple GPUs with intelligent temporal overlap handling:
812
+
813
+ **How it works:**
814
+ 1. Video is split into chunks, one per GPU
815
+ 2. Each GPU processes its chunk independently
816
+ 3. Chunks overlap by `--temporal_overlap` frames
817
+ 4. Results are blended together seamlessly using the overlap region
818
+
819
+ **Example for 2 GPUs with temporal_overlap=4:**
820
+ ```
821
+ GPU 0: Frames 0-50 (includes 4 overlap frames at end)
822
+ GPU 1: Frames 46-100 (includes 4 overlap frames at beginning)
823
+ Result: Frames 0-100 with smooth transition at frame 48
824
+ ```
825
 
826
+ **Best practices:**
827
+ - Set `--temporal_overlap` to 2-8 frames for smooth blending
828
+ - Higher overlap = smoother transitions but more redundant processing
829
+ - Use `--prepend_frames` to reduce artifacts at video start
830
+ - batch_size should divide evenly into chunk sizes for best results
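+ A sketch of the chunk arithmetic and the overlap crossfade (illustrative; per the changelog the node blends with Hann windows, while a linear ramp is shown here):
+
+ ```python
+ import numpy as np
+
+ def split_with_overlap(n_frames, n_gpus, overlap):
+     """Per-GPU frame ranges, each sharing `overlap` frames with its neighbor."""
+     base = (n_frames + (n_gpus - 1) * overlap) // n_gpus
+     chunks, start = [], 0
+     for _ in range(n_gpus):
+         end = min(start + base, n_frames)
+         chunks.append((start, end))
+         start = end - overlap          # next GPU re-processes the overlap frames
+     return chunks
+
+ def blend_overlap(tail_a, head_b):
+     """Crossfade the shared frames of two neighboring chunks (frames: N, H, W, C)."""
+     w = np.linspace(0.0, 1.0, len(head_b))[:, None, None, None]
+     return (1 - w) * tail_a + w * head_b
+
+ print(split_with_overlap(100, 2, 4))   # [(0, 52), (48, 100)]
+ ```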
831
 
832
  ## ⚠️ Limitations
833
 
834
+ ### Model Limitations
 
835
 
836
+ **Batch Size Constraint**: The model requires batch_size to follow the **4n+1 formula** (1, 5, 9, 13, 17, 21, 25, ...) due to temporal consistency architecture. All frames in a batch are processed together for temporal coherence, then batches can be blended using temporal_overlap. Ideally, set batch_size to match your shot length for optimal quality.
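+ A small sketch of the constraint and of `uniform_batch_size`-style padding (the padding strategy here is assumed for illustration; the node's internals may differ):
+
+ ```python
+ import math
+
+ def valid_batch_size(requested):
+     """Round a requested batch size up to the nearest valid 4n+1 value."""
+     return requested if requested % 4 == 1 else 4 * math.ceil((requested - 1) / 4) + 1
+
+ def uniform_batches(frames, batch_size):
+     """Pad the tail (here by repeating the last frame) so every batch is full."""
+     remainder = len(frames) % batch_size
+     if remainder:
+         frames = frames + [frames[-1]] * (batch_size - remainder)
+     return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]
+
+ print(valid_batch_size(20))                        # 21 (for a 20-frame shot)
+ print(len(uniform_batches(list(range(45)), 33)))   # 2 uniform batches of 33
+ ```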
837
 
838
+ ### Performance Considerations
839
 
840
+ **VAE Bottleneck**: Even with optimized DiT upscaling (BlockSwap, GGUF, torch.compile), the VAE encoding/decoding stages can become the bottleneck, especially at high resolutions; the VAE is slow, so use a large batch_size to mitigate this.
841
 
842
+ **VRAM Usage**: While the integration now supports low VRAM systems (8GB or less with proper optimization), VRAM usage varies based on:
843
+ - Input/output resolution (larger = more VRAM)
844
+ - Batch size (higher = more VRAM but better temporal consistency and speed)
845
+ - Model choice (FP16 > FP8 > GGUF in VRAM usage)
846
+ - Optimization settings (BlockSwap, VAE tiling significantly reduce VRAM)
847
 
848
+ **Speed**: Processing speed depends on:
849
+ - GPU capabilities (compute performance, VRAM bandwidth, and architecture generation)
850
+ - Model size (3B faster than 7B)
851
+ - Batch size (larger batch sizes are faster per frame due to better GPU utilization)
852
+ - Optimization settings (torch.compile provides significant speedup)
853
+ - Resolution (higher resolutions are slower)
854
 
855
+ ### Best Practices
856
 
857
+ 1. **Start with debug enabled** to understand where VRAM is being used
858
+ 2. **For OOM errors during encoding**: Enable VAE encode tiling and reduce tile size
859
+ 3. **For OOM errors during upscaling**: Enable BlockSwap and increase blocks_to_swap
860
+ 4. **For OOM errors during decoding**: Enable VAE decode tiling and reduce tile size
861
+ - **If still getting OOM after trying all above**: Reduce batch_size or resolution
862
+ 5. **For best quality**: Use higher batch_size matching your shot length, FP16 models, and LAB color correction
863
+ 6. **For speed**: Use FP8/GGUF models, enable torch.compile, and use Flash Attention if available
864
+ 7. **Test settings with a short clip first** before processing long videos
865
+
866
+ ## 🤝 Contributing
867
 
868
+ Contributions are welcome! We value community input and improvements.
869
 
870
+ For detailed contribution guidelines, see [CONTRIBUTING.md](CONTRIBUTING.md).
871
 
872
+ **Quick Start:**
873
 
874
+ 1. Fork the repository
875
+ 2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
876
+ 3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
877
+ 4. Push to the branch (`git push origin feature/AmazingFeature`)
878
+ 5. Open a Pull Request to **main** branch for stable features or **nightly** branch for experimental features
879
 
880
+ **Get Help:**
881
+ - YouTube: [AInVFX Channel](https://www.youtube.com/@AInVFX)
882
+ - GitHub [Issues](https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler/issues): For bug reports and feature requests
883
+ - GitHub [Discussions](https://github.com/numz/ComfyUI-SeedVR2_VideoUpscaler/discussions): For questions and community support
884
+ - Discord: adrientoupet & NumZ#7184
885
 
886
  ## 🙏 Credits
887
 
888
+ This ComfyUI implementation is a collaborative project by **[NumZ](https://github.com/numz)** and **[AInVFX](https://www.youtube.com/@AInVFX)** (Adrien Toupet), based on the original [SeedVR2](https://github.com/ByteDance-Seed/SeedVR) by ByteDance Seed Team.
889
+
890
+ Special thanks to our community contributors including [benjaminherb](https://github.com/benjaminherb), [cmeka](https://github.com/cmeka), [FurkanGozukara](https://github.com/FurkanGozukara), [JohnAlcatraz](https://github.com/JohnAlcatraz), [lihaoyun6](https://github.com/lihaoyun6), [Luchuanzhao](https://github.com/Luchuanzhao), [Luke2642](https://github.com/Luke2642), [naxci1](https://github.com/naxci1), [q5sys](https://github.com/q5sys), and many others for their improvements, bug fixes, and testing.
891
 
892
+ ## 📜 License
893
 
894
+ The code in this repository is released under the MIT license as found in the [LICENSE](LICENSE) file.