---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- LoRA
- video
- video generation
base_model:
- Wan-AI/Wan2.2-I2V-A14B
- Wan-AI/Wan2.2-TI2V-5B
- Wan-AI/Wan2.1-I2V-14B-720P
pipeline_tags:
- image-to-video
- text-to-video
library_name: diffusers
---
# 🎨 LightVAE
## ⚑ Efficient Video Autoencoder (VAE) Model Collection
*From Official Models to LightX2V Distilled and Optimized Versions - Balancing Quality, Speed, and Memory*
![img_lightx2v](https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/tTnp8-ARpj3wGxfo5P55c.png)
---
[![πŸ€— HuggingFace](https://img.shields.io/badge/πŸ€—-HuggingFace-yellow)](https://huggingface.co/lightx2v)
[![GitHub](https://img.shields.io/badge/GitHub-LightX2V-blue?logo=github)](https://github.com/ModelTC/LightX2V)
[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](LICENSE)
---
For the VAE stage, the LightX2V team has carried out a series of deep optimizations, producing two model families: **LightVAE** and **LightTAE**. Both significantly reduce memory consumption and speed up inference while maintaining high quality.
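Both families are obtained by distilling against the official VAE. The training code is not yet released (see the Todo list below), but in general terms output distillation trains a small student autoencoder to match a frozen teacher's reconstructions. The snippet below is a generic sketch of that idea, not the team's actual training code:

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, video, optimizer):
    """One generic output-distillation step: the student learns to match
    the frozen teacher's reconstruction of the same clip."""
    with torch.no_grad():
        target = teacher.decode(teacher.encode(video))  # teacher round trip
    recon = student.decode(student.encode(video))       # student round trip
    loss = F.l1_loss(recon, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```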
## πŸ’‘ Core Advantages
<table>
<tr>
<td width="50%">

### πŸ“Š Official VAE

**Features**: Highest Quality ⭐⭐⭐⭐⭐

βœ… Best reconstruction accuracy<br>
βœ… Complete detail preservation<br>
❌ Large memory usage (~8-12 GB)<br>
❌ Slow inference speed

</td>
<td width="50%">

### πŸš€ Open Source TAE Series

**Features**: Fastest Speed ⚑⚑⚑⚑⚑

βœ… Minimal memory usage (~0.4 GB)<br>
βœ… Extremely fast inference<br>
❌ Average quality ⭐⭐⭐<br>
❌ Potential detail loss

</td>
</tr>
<tr>
<td width="50%">

### 🎯 **LightVAE Series** (Our Optimization)

**Features**: Best Balanced Solution βš–οΈ

βœ… Uses **Causal 3D Conv** (same as official)<br>
βœ… **Quality close to official** ⭐⭐⭐⭐<br>
βœ… Memory reduced by **~50%** (~4-5 GB)<br>
βœ… Speed increased by **2-3x**<br>
βœ… Balances quality, speed, and memory πŸ†

</td>
<td width="50%">

### ⚑ **LightTAE Series** (Our Optimization)

**Features**: Fast Speed + Good Quality πŸ†

βœ… Minimal memory usage (~0.4 GB)<br>
βœ… Extremely fast inference<br>
βœ… **Quality close to official** ⭐⭐⭐⭐<br>
βœ… **Significantly surpasses open source TAE**

</td>
</tr>
</table>
---
## πŸ“¦ Available Models
### 🎯 Wan2.1 Series VAE
| Model Name | Type | Architecture | Description |
|:--------|:-----|:-----|:-----|
| `Wan2.1_VAE` | Official VAE | Causal Conv3D | Wan2.1 official video VAE model<br>**Highest quality, large memory, slow speed** |
| `taew2_1` | Open Source Small AE | Conv2D | Open source model based on [taeHV](https://github.com/madebyollin/taeHV)<br>**Small memory, fast speed, average quality** |
| **`lighttaew2_1`** | **LightTAE Series** | Conv2D | **Our distilled optimized version based on `taew2_1`**<br>**Small memory, fast speed, quality close to official** ✨ |
| **`lightvaew2_1`** | **LightVAE Series** | Causal Conv3D | **Our WanVAE2.1-architecture model, pruned by 75% and then trained + distilled**<br>**Best balance: high quality + low memory + fast speed** πŸ† |
### 🎯 Wan2.2 Series VAE
| Model Name | Type | Architecture | Description |
|:--------|:-----|:-----|:-----|
| `Wan2.2_VAE` | Official VAE | Causal Conv3D | Wan2.2 official video VAE model<br>**Highest quality, large memory, slow speed** |
| `taew2_2` | Open Source Small AE | Conv2D | Open source model based on [taeHV](https://github.com/madebyollin/taeHV)<br>**Small memory, fast speed, average quality** |
| **`lighttaew2_2`** | **LightTAE Series** | Conv2D | **Our distilled optimized version based on `taew2_2`**<br>**Small memory, fast speed, quality close to official** ✨ |
---
## πŸ“Š Wan2.1 Series Performance Comparison
- **Precision**: BF16
- **Test Hardware**: NVIDIA H100
### Video Reconstruction (5s 81-frame video)
| Speed | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
|:-----|:--------------|:------------|:---------------------|:-------------|
| **Encode Speed** | 4.1721 s | 0.3956 s | 0.3956 s | 1.5014 s |
| **Decode Speed** | 5.4649 s | 0.2463 s | 0.2463 s | 2.0697 s |

| GPU Memory | Wan2.1_VAE | taew2_1 | lighttaew2_1 | lightvaew2_1 |
|:-----|:--------------|:------------|:---------------------|:-------------|
| **Encode Memory** | 8.4954 GB | 0.00858 GB | 0.00858 GB | 4.7631 GB |
| **Decode Memory** | 10.1287 GB | 0.41199 GB | 0.41199 GB | 5.5673 GB |
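For context, timings like these are typically collected with CUDA-event timing plus peak-memory tracking. The sketch below shows that generic pattern; it is not the team's benchmark script, and the `vae` object and input tensor layout are assumptions:

```python
import torch

def benchmark(fn, x, warmup=3, iters=10):
    """Time a CUDA call with event-based timing and report peak memory in GB."""
    for _ in range(warmup):
        fn(x)
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(x)
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1000.0 / iters  # elapsed_time() is in ms
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    return seconds, peak_gb

# Hypothetical usage for an 81-frame clip; the shape is an assumption.
# video = torch.randn(1, 3, 81, 480, 832, dtype=torch.bfloat16, device="cuda")
# print(benchmark(vae.encode, video))
```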
### Video Generation
**Task**: S2V (speech-to-video)<br>
**Model**: seko-talk
<table>
<tr>
<td width="25%" align="center">
<strong>Wan2.1_VAE</strong><br>
<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/6l-P-3Hr9JKL3xgUyJXWJ.mp4"></video>
</td>
<td width="25%" align="center">
<strong>taew2_1</strong><br>
<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/rcVHrCKB4nRAs2VSjJd2d.mp4"></video>
</td>
<td width="25%" align="center">
<strong>lighttaew2_1</strong><br>
<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/Wq9p9Z7NDYwaKw4SqVbYT.mp4"></video>
</td>
<td width="25%" align="center">
<strong>lightvaew2_1</strong><br>
<video controls autoplay muted width="100%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/NpKOzFcvsHzSFfFACzUKP.mp4"></video>
</td>
</tr>
</table>
## πŸ“Š Wan2.2 Series Performance Comparison
- **Precision**: BF16
- **Test Hardware**: NVIDIA H100
### Video Reconstruction
| Speed | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
|:-----|:--------------|:------------|:---------------------|
| **Encode Speed** | 1.1369 s | 0.3499 s | 0.3499 s |
| **Decode Speed** | 3.1268 s | 0.0891 s | 0.0891 s |

| GPU Memory | Wan2.2_VAE | taew2_2 | lighttaew2_2 |
|:-----|:--------------|:------------|:---------------------|
| **Encode Memory** | 6.1991 GB | 0.0064 GB | 0.0064 GB |
| **Decode Memory** | 12.3487 GB | 0.4120 GB | 0.4120 GB |
### Video Generation
**Task**: T2V (text-to-video)<br>
**Model**: [Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)
<table>
<tr>
<td width="33%" align="center">
<strong>Wan2.2_VAE</strong><br>
<video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/KUY7Ifz9gFJqDjWga6A53.mp4"></video>
</td>
<td width="33%" align="center">
<strong>taew2_2</strong><br>
<video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/OYA8VfNlCv_hBkj_n_OMl.mp4"></video>
</td>
<td width="33%" align="center">
<strong>lighttaew2_2</strong><br>
<video controls autoplay width="95%" src="https://cdn-uploads.huggingface.co/production/uploads/680de13385293771bc57400b/gaHRr6uuAF0NlH4YlMbHO.mp4"></video>
</td>
</tr>
</table>
## 🎯 Model Selection Recommendations
### Selection by Use Case
<table>
<tr>
<td width="33%">

#### πŸ† Pursuing Best Quality

**Recommended**: `Wan2.1_VAE` / `Wan2.2_VAE`

- βœ… Official model, quality ceiling
- βœ… Highest reconstruction accuracy
- βœ… Suitable for final product output
- ⚠️ **Large memory usage** (~8-12 GB)
- ⚠️ **Slow inference speed**

</td>
<td width="33%">

#### βš–οΈ **Best Balance** πŸ†

**Recommended**: **`lightvaew2_1`**

- βœ… **Uses Causal 3D Conv** (same as official)
- βœ… **Quality close to official** ⭐⭐⭐⭐
- βœ… Memory reduced by **~50%** (~4-5 GB)
- βœ… Speed increased by **2-3x**

**Use Cases**: Daily production, strongly recommended ⭐

</td>
<td width="33%">

#### ⚑ **Speed + Quality Balance** ✨

**Recommended**: **`lighttaew2_1`** / **`lighttaew2_2`**

- βœ… Extremely low memory usage (~0.4 GB)
- βœ… Extremely fast inference
- βœ… **Quality significantly surpasses open source TAE**
- βœ… **Close to official quality** ⭐⭐⭐⭐

**Use Cases**: Development testing, rapid iteration

</td>
</tr>
</table>
### πŸ”₯ Our Optimization Results Comparison
| Comparison | Open Source TAE | **LightTAE (Ours)** | Official VAE | **LightVAE (Ours)** |
|:------|:--------|:---------------------|:---------|:---------------------|
| **Architecture** | Conv2D | Conv2D | Causal Conv3D | Causal Conv3D |
| **Memory Usage** | Minimal (~0.4 GB) | Minimal (~0.4 GB) | Large (~8-12 GB) | Medium (~4-5 GB) |
| **Inference Speed** | Extremely Fast ⚑⚑⚑⚑⚑ | Extremely Fast ⚑⚑⚑⚑⚑ | Slow ⚑⚑ | Fast ⚑⚑⚑⚑ |
| **Generation Quality** | Average ⭐⭐⭐ | **Close to Official** ⭐⭐⭐⭐ | Highest ⭐⭐⭐⭐⭐ | **Close to Official** ⭐⭐⭐⭐ |
## πŸ“‘ Todo List
- [x] LightX2V integration
- [x] ComfyUI integration
- [ ] Training & Distillation Code
## πŸš€ Usage
### Download VAE Models
```bash
# Download all VAE models in this collection
huggingface-cli download lightx2v/Autoencoders \
--local-dir ./models/vae/
```
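Alternatively, the same download can be scripted with the `huggingface_hub` Python API:

```python
# Same download via the huggingface_hub Python API.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="lightx2v/Autoencoders", local_dir="./models/vae/")
```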
### πŸ§ͺ Video Reconstruction Test
We provide a standalone script, `vid_recon.py`, to test VAE models in isolation. The script reads a video, encodes it with the selected VAE, then decodes the latents back to pixels so you can inspect reconstruction quality.
**Script Location**: `LightX2V/lightx2v/models/video_encoders/hf/vid_recon.py`
```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
```
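Conceptually, the script performs an encode/decode round trip like the sketch below (illustrative only, not the actual `vid_recon.py`; the `[0, 1]` value range assumed for PSNR is noted in the comment):

```python
import torch

def reconstruct(vae, video: torch.Tensor) -> torch.Tensor:
    """Encode a video to latents, then decode back to pixels (the round trip)."""
    with torch.no_grad():
        latents = vae.encode(video)  # pixels -> compact latent sequence
        return vae.decode(latents)   # latents -> reconstructed pixels

def psnr(a: torch.Tensor, b: torch.Tensor) -> float:
    """PSNR between original and reconstruction, assuming values in [0, 1]."""
    mse = torch.mean((a.float() - b.float()) ** 2)
    return float(10 * torch.log10(1.0 / mse))
```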
**1. Test Official VAE (Wan2.1)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
input_video.mp4 \
--checkpoint ./models/vae/Wan2.1_VAE.pth \
--model_type vaew2_1 \
--device cuda \
--dtype bfloat16
```
**2. Test Official VAE (Wan2.2)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
input_video.mp4 \
--checkpoint ./models/vae/Wan2.2_VAE.pth \
--model_type vaew2_2 \
--device cuda \
--dtype bfloat16
```
**3. Test LightTAE (Wan2.1)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
input_video.mp4 \
--checkpoint ./models/vae/lighttaew2_1.pth \
--model_type taew2_1 \
--device cuda \
--dtype bfloat16
```
**4. Test LightTAE (Wan2.2)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
input_video.mp4 \
--checkpoint ./models/vae/lighttaew2_2.pth \
--model_type taew2_2 \
--device cuda \
--dtype bfloat16
```
**5. Test LightVAE (Wan2.1)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
input_video.mp4 \
--checkpoint ./models/vae/lightvaew2_1.pth \
--model_type vaew2_1 \
--device cuda \
--dtype bfloat16 \
--use_lightvae
```
**6. Test TAE (Wan2.1)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
input_video.mp4 \
--checkpoint ./models/vae/taew2_1.pth \
--model_type taew2_1 \
--device cuda \
--dtype bfloat16
```
**7. Test TAE (Wan2.2)**
```bash
python -m lightx2v.models.video_encoders.hf.vid_recon \
input_video.mp4 \
--checkpoint ./models/vae/taew2_2.pth \
--model_type taew2_2 \
--device cuda \
--dtype bfloat16
```
### Use in LightX2V
Specify the VAE path in the configuration file:
**Using Official VAE Series:**
```json
{
"vae_path": "./models/vae/Wan2.1_VAE.pth"
}
```
**Using LightVAE Series:**
```json
{
"use_lightvae": true,
"vae_path": "./models/vae/lightvaew2_1.pth"
}
```
**Using LightTAE Series:**
```json
{
"use_tae": true,
"need_scaled": true,
"tae_path": "./models/vae/lighttaew2_1.pth"
}
```
**Using TAE Series:**
```json
{
"use_tae": true,
"tae_path": "./models/vae/taew2_1.pth"
}
```
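These flags select which decoder the pipeline loads. The sketch below shows one hypothetical way such a dispatch could look; the loader callables are invented for illustration and are not LightX2V's real internals:

```python
# Hypothetical dispatch over the config flags above. The loader callables
# are placeholders for illustration, not LightX2V's real internals.
from typing import Any, Callable, Dict

def select_vae(config: Dict[str, Any], loaders: Dict[str, Callable]) -> Any:
    """Pick and load a decoder based on the config flags shown above."""
    if config.get("use_tae"):
        # LightTAE configs additionally set "need_scaled".
        return loaders["tae"](config["tae_path"],
                              scaled=config.get("need_scaled", False))
    if config.get("use_lightvae"):
        return loaders["lightvae"](config["vae_path"])
    return loaders["official"](config["vae_path"])
```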
Then run the inference script:
```bash
cd LightX2V/scripts
bash wan/run_wan_i2v.sh # or other inference scripts
```
### Use in ComfyUI
Please refer to [ComfyUI-LightVAE](https://github.com/ModelTC/ComfyUI-LightVAE).
## ⚠️ Important Notes
### 1. Compatibility
- Wan2.1 series VAEs only work with Wan2.1 backbone models
- Wan2.2 series VAEs only work with Wan2.2 backbone models
- Do not mix VAEs and backbone models from different versions
## πŸ“š Related Resources
### Documentation Links
- **LightX2V Quick Start**: [Quick Start Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/quickstart.html)
- **Model Structure Description**: [Model Structure Documentation](https://lightx2v-zhcn.readthedocs.io/zh-cn/latest/getting_started/model_structure.html)
- **taeHV Project**: [GitHub - madebyollin/taeHV](https://github.com/madebyollin/taeHV)
### Related Models
- **Wan2.1 Backbone Models**: [Wan-AI Model Collection](https://huggingface.co/Wan-AI)
- **Wan2.2 Backbone Models**: [Wan-AI/Wan2.2-TI2V-5B](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B)
- **LightX2V Optimized Models**: [lightx2v Model Collection](https://huggingface.co/lightx2v)
---
## 🀝 Community & Support
- **GitHub Issues**: https://github.com/ModelTC/LightX2V/issues
- **HuggingFace**: https://huggingface.co/lightx2v
- **LightX2V Homepage**: https://github.com/ModelTC/LightX2V
If you find this project helpful, please give us a ⭐ on [GitHub](https://github.com/ModelTC/LightX2V)!