Upload folder using huggingface_hub

Files changed (8)

.gitattributes +3 -0
README.md +82 -0
README_from_modelscope.md +96 -0
configuration.json +1 -0
model.safetensors +3 -0
video_merged_1.mp4 +3 -0
video_merged_2.mp4 +3 -0
video_merged_3.mp4 +3 -0

.gitattributes CHANGED

@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+video_merged_1.mp4 filter=lfs diff=lfs merge=lfs -text
+video_merged_2.mp4 filter=lfs diff=lfs merge=lfs -text
+video_merged_3.mp4 filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,82 @@

+---
+license: apache-2.0
+---
+# Tongyi Wanxiang 2.1-1.3B-LoRA-Speed-Control-v1
+## Model Introduction
+This model is trained from the [Tongyi Wanxiang 2.1-1.3B](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-1.3B) model with the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework. Following a structure similar to T2I-Adapter, it adds a motion speed encoder that takes a *motion bucket id* as input to control the magnitude of motion. The encoder uses RoPE encoding followed by an MLP. The *motion bucket id* is computed from the quantiles of the standard deviation of the latents along the temporal axis and mapped to a control range of 0–100.
+* **motion bucket id = 1**: Slower motion, enhanced visual quality
+* **motion bucket id = 100**: Faster motion, reduced visual quality
+## Model Performance
+**Prompt**: Documentary photography style, an energetic puppy rapidly running on lush green grass. The puppy has brownish-yellow fur, upright ears, and an expression that is focused and joyful. Sunlight shines on its body, making its fur appear exceptionally soft and shiny. The background features an open grassland with occasional wildflowers, and in the distance, a faint view of blue sky and scattered clouds. Strong perspective emphasizes the dynamic movement of the running puppy and the vitality of the surrounding grass. Medium shot with a side-moving viewpoint.
+**Negative Prompt**: Vivid colors, overexposure, static, blurry details, subtitles, style, artwork, painting, stillness, overall grayish tone, worst quality, low quality, JPEG compression artifacts, ugly, defective, extra fingers, poorly drawn hands, poorly drawn face, deformed limbs, fused fingers, motionless frames, cluttered background, three legs, crowded background people, walking backwards.
+**Example 1** (seed=1, left: motion bucket id=1, right: motion bucket id=100)
+<div align="center"><video width="80%" controls><source src="video_merged_1.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>
+**Example 2** (seed=2, left: motion bucket id=1, right: motion bucket id=100)
+<div align="center"><video width="80%" controls><source src="video_merged_2.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>
+**Example 3** (seed=3, left: motion bucket id=1, right: motion bucket id=100)
+<div align="center"><video width="80%" controls><source src="video_merged_3.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>
+## Usage Instructions
+This model is trained using the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework. Please install it first:
+```
+pip install diffsynth
+```
+```python
+import torch
+from diffsynth import ModelManager, WanVideoPipeline, save_video, VideoData
+from modelscope import snapshot_download
+# Download models
+snapshot_download("Wan-AI/Wan2.1-T2V-1.3B", local_dir="models/Wan-AI/Wan2.1-T2V-1.3B")
+snapshot_download("DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1", local_dir="models/DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1")
+# Load models
+model_manager = ModelManager(device="cpu")
+model_manager.load_models(
+    [
+        "models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
+        "models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
+        "models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
+        "models/DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1/model.safetensors",
+    ],
+    torch_dtype=torch.bfloat16, # You can set `torch_dtype=torch.float8_e4m3fn` to enable FP8 quantization.
+)
+pipe = WanVideoPipeline.from_model_manager(model_manager, torch_dtype=torch.bfloat16, device="cuda")
+pipe.enable_vram_management(num_persistent_param_in_dit=None)
+# Text-to-video
+video = pipe(
+    prompt="Documentary photography style scene: a lively little dog rapidly running on a green grassy field. The dog has a brownish-yellow coat, upright ears, and an expression of focus and joy. Sunlight shines on its body, making its fur appear exceptionally soft and shiny. The background is an open grassland, occasionally dotted with a few wildflowers, with a faint view of blue sky and some white clouds in the distance. Strong sense of perspective captures the dynamic motion of the running dog and the vitality of the surrounding grass. Medium shot with a side-moving viewpoint.",
+    negative_prompt="vivid colors, overexposed, static, blurry details, subtitles, style, artwork, painting, frame, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards",
+    num_inference_steps=50,
+    seed=1, tiled=True,
+    motion_bucket_id=0
+)
+save_video(video, "video_slow.mp4", fps=15, quality=5)
+video = pipe(
+    prompt="Documentary photography style scene: a lively little dog rapidly running on a green grassy field. The dog has a brownish-yellow coat, upright ears, and an expression of focus and joy. Sunlight shines on its body, making its fur appear exceptionally soft and shiny. The background is an open grassland, occasionally dotted with a few wildflowers, with a faint view of blue sky and some white clouds in the distance. Strong sense of perspective captures the dynamic motion of the running dog and the vitality of the surrounding grass. Medium shot with a side-moving viewpoint.",
+    negative_prompt="vivid colors, overexposed, static, blurry details, subtitles, style, artwork, painting, frame, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards",
+    num_inference_steps=50,
+    seed=1, tiled=True,
+    motion_bucket_id=100
+)
+save_video(video, "video_fast.mp4", fps=15, quality=5)
+```
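
The model introduction in README.md says the *motion bucket id* is derived from quantiles of the standard deviation of the latents along the temporal axis and mapped to 0–100. A minimal sketch of that idea in plain Python follows. It is not the DiffSynth-Studio training code: the function names `temporal_std_score` and `motion_bucket_id`, the averaging over latent positions, the toy clips, and the reference scores are all assumptions made for illustration.

```python
import bisect
import statistics


def temporal_std_score(latents):
    """Mean, over latent positions, of the standard deviation across frames.

    `latents` is a list of frames; each frame is a flat list of floats.
    (Assumed layout; real latents would be multi-dimensional tensors.)
    """
    positions = zip(*latents)  # regroup values by position across frames
    return statistics.fmean(statistics.pstdev(values) for values in positions)


def motion_bucket_id(score, reference_scores):
    """Map a score to 0..100 by its quantile rank among sorted reference scores."""
    rank = bisect.bisect_right(reference_scores, score)
    return round(100 * rank / len(reference_scores))


# Toy data: a static clip and a clip whose values change on every frame.
static_clip = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
moving_clip = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

# Hypothetical reference distribution of scores; it must be sorted for bisect.
reference = sorted([0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.0])

slow_id = motion_bucket_id(temporal_std_score(static_clip), reference)
fast_id = motion_bucket_id(temporal_std_score(moving_clip), reference)
print(slow_id, fast_id)
```

Under these assumptions a clip with no frame-to-frame change lands in the lowest bucket, and a clip with large temporal variation lands in a higher one, which matches the slow/fast interpretation given in the README.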

README_from_modelscope.md ADDED

	@@ -0,0 +1,96 @@

+---
+base_model: MusePublic/wan2.1-1.3b@v1
+frameworks:
+- Pytorch
+license: Apache License 2.0
+tags:
+- LoRA
+- text2video generation
+tasks:
+- text-to-video-synthesis
+trigger_words:
+- "low speed"
+vision_foundation: WAN_VIDEO_2_1_T2V_1_3_B
+---
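+# Tongyi Wanxiang 2.1-1.3B-LoRA-Speed-Control-v1
+## Model Introduction
+This model is trained from the [Tongyi Wanxiang 2.1-1.3B](https://www.modelscope.cn/models/Wan-AI/Wan2.1-T2V-1.3B) model with the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework. Following a structure similar to T2I-Adapter, it adds a motion speed encoder that takes a motion bucket id as input to control the magnitude of motion. The encoder uses RoPE encoding followed by an MLP. The motion bucket id is computed from the quantiles of the standard deviation of the latents along the temporal axis and mapped to a control range of 0–100.
+* **motion bucket id = 1**: Slower motion, enhanced visual quality
+* **motion bucket id = 100**: Faster motion, reduced visual quality
+## Model Performance
+Prompt: Documentary photography style scene: a lively little dog rapidly running on a green grassy field. The dog has a brownish-yellow coat, upright ears, and an expression of focus and joy. Sunlight shines on its body, making its fur appear exceptionally soft and shiny. The background is an open grassland, occasionally dotted with a few wildflowers, with a faint view of blue sky and some white clouds in the distance. Strong sense of perspective captures the dynamic motion of the running dog and the vitality of the surrounding grass. Medium shot with a side-moving viewpoint.
+Negative prompt: vivid colors, overexposed, static, blurry details, subtitles, style, artwork, painting, frame, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards
+Example 1 (seed=1, left: motion bucket id=1, right: motion bucket id=100)
+<div align="center"><video width="80%" controls><source src="video_merged_1.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>
+Example 2 (seed=2, left: motion bucket id=1, right: motion bucket id=100)
+<div align="center"><video width="80%" controls><source src="video_merged_2.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>
+Example 3 (seed=3, left: motion bucket id=1, right: motion bucket id=100)
+<div align="center"><video width="80%" controls><source src="video_merged_3.mp4" type="video/mp4">Your browser does not support the video tag.</video></div>
+## Usage Instructions
+This model is trained with the [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) framework. Please install it first: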
+```
+pip install diffsynth
+```
+```python
+import torch
+from diffsynth import ModelManager, WanVideoPipeline, save_video, VideoData
+from modelscope import snapshot_download
+# Download models
+snapshot_download("Wan-AI/Wan2.1-T2V-1.3B", local_dir="models/Wan-AI/Wan2.1-T2V-1.3B")
+snapshot_download("DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1", local_dir="models/DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1")
+# Load models
+model_manager = ModelManager(device="cpu")
+model_manager.load_models(
+    [
+        "models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
+        "models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
+        "models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
+        "models/DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1/model.safetensors",
+    ],
+    torch_dtype=torch.bfloat16, # You can set `torch_dtype=torch.float8_e4m3fn` to enable FP8 quantization.
+)
+pipe = WanVideoPipeline.from_model_manager(model_manager, torch_dtype=torch.bfloat16, device="cuda")
+pipe.enable_vram_management(num_persistent_param_in_dit=None)
+# Text-to-video
+video = pipe(
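+    prompt="Documentary photography style scene: a lively little dog rapidly running on a green grassy field. The dog has a brownish-yellow coat, upright ears, and an expression of focus and joy. Sunlight shines on its body, making its fur appear exceptionally soft and shiny. The background is an open grassland, occasionally dotted with a few wildflowers, with a faint view of blue sky and some white clouds in the distance. Strong sense of perspective captures the dynamic motion of the running dog and the vitality of the surrounding grass. Medium shot with a side-moving viewpoint.",
+    negative_prompt="vivid colors, overexposed, static, blurry details, subtitles, style, artwork, painting, frame, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards",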
+    num_inference_steps=50,
+    seed=1, tiled=True,
+    motion_bucket_id=0
+)
+save_video(video, "video_slow.mp4", fps=15, quality=5)
+video = pipe(
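+    prompt="Documentary photography style scene: a lively little dog rapidly running on a green grassy field. The dog has a brownish-yellow coat, upright ears, and an expression of focus and joy. Sunlight shines on its body, making its fur appear exceptionally soft and shiny. The background is an open grassland, occasionally dotted with a few wildflowers, with a faint view of blue sky and some white clouds in the distance. Strong sense of perspective captures the dynamic motion of the running dog and the vitality of the surrounding grass. Medium shot with a side-moving viewpoint.",
+    negative_prompt="vivid colors, overexposed, static, blurry details, subtitles, style, artwork, painting, frame, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards",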
+    num_inference_steps=50,
+    seed=1, tiled=True,
+    motion_bucket_id=100
+)
+save_video(video, "video_fast.mp4", fps=15, quality=5)
+```
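
The usage script above renders only the two extremes, `motion_bucket_id=0` and `motion_bucket_id=100`. To compare intermediate settings, one option is to build the per-run arguments up front and keep everything except the bucket id fixed. The sketch below does only that bookkeeping and does not import `diffsynth`; the helper name `build_sweep` and the output filename pattern are hypothetical, while the keyword names are the ones the README passes to `pipe`.

```python
def build_sweep(bucket_ids, seed=1, num_inference_steps=50):
    """Return one (filename, kwargs) pair per motion bucket id.

    Seed and step count are held constant so that the bucket id is the
    only setting that differs between runs.
    """
    runs = []
    for bucket_id in bucket_ids:
        if not 0 <= bucket_id <= 100:
            raise ValueError(f"motion_bucket_id must be in 0..100, got {bucket_id}")
        kwargs = {
            "num_inference_steps": num_inference_steps,
            "seed": seed,
            "tiled": True,
            "motion_bucket_id": bucket_id,
        }
        runs.append((f"video_motion_{bucket_id:03d}.mp4", kwargs))
    return runs


runs = build_sweep([0, 25, 50, 75, 100])

# With `pipe`, `save_video`, and the prompts from the README already defined,
# each run could then be rendered like this (left commented out here):
# for filename, kwargs in runs:
#     video = pipe(prompt=prompt, negative_prompt=negative_prompt, **kwargs)
#     save_video(video, filename, fps=15, quality=5)
```

Holding the seed constant mirrors the README's own slow/fast comparison, where both calls use `seed=1`.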

configuration.json ADDED

	@@ -0,0 +1 @@


+{"framework":"Pytorch","task":"text-to-video-synthesis"}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:44f06a18b0ea2fea0be49641d7a7db23491028208677f6a349ebff8ed93eadd7
+size 33841656
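
The three lines above are a Git LFS pointer: the repository stores only the spec version, the SHA-256 object id, and the byte size, while the actual weights live in LFS storage. As a rough sketch, a downloaded file can be checked against such a pointer with the standard library alone. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:44f06a18b0ea2fea0be49641d7a7db23491028208677f6a349ebff8ed93eadd7\n"
    "size 33841656\n"
)
print(pointer["algo"], pointer["size"])
```

To verify a local copy, read the file in binary mode and pass the bytes to `matches_pointer`. For files much larger than this one, hashing in chunks would avoid loading everything into memory at once.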

video_merged_1.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b7417ec6b1192a226a8a03f8a8696773057d11f4a17e7a9fd9b63961ceba8da4
+size 1481100

video_merged_2.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ab9926a8e4a012bf6edfa2a2c549eeb43fb80ab0ee043d45f8aeb385591b1ee0
+size 1239117

video_merged_3.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b005d6e99d07e1733b9c5b4e6799f8b4b98b22cb4e6f9931126c357a4b70833b
+size 1286867