Wan2.1-1.3B-LoRA-Speed-Control-v1

Model Introduction

This model is trained on top of the Wan2.1-1.3B model using the DiffSynth-Studio framework. Following a structure similar to T2I-Adapter, it introduces an additional motion speed encoder that injects a motion bucket id to control the magnitude of motion. The encoder uses a RoPE encoding + MLP architecture. The motion bucket id is computed from the quantiles of the standard deviation of the latents along the temporal axis, mapped to a control range of 0–100.

  • motion bucket id = 1: Slower motion, enhanced visual quality
  • motion bucket id = 100: Faster motion, reduced visual quality
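The mapping above can be illustrated with a small sketch. This is not the released training code: the function name, the `[T, C, H, W]` latent layout, and the precomputed `quantiles` array (std values estimated on training data) are all assumptions made for illustration.

```python
import numpy as np

def motion_bucket_id(latents: np.ndarray, quantiles: np.ndarray) -> int:
    """Map the temporal std of latents to a bucket id in 0-100.

    `latents` is assumed to be shaped [T, C, H, W]; `quantiles` is a
    sorted array of temporal-std values precomputed on training data.
    """
    # Standard deviation along the temporal axis, averaged over the rest.
    motion = latents.std(axis=0).mean()
    # Fraction of training clips with less motion than this clip, scaled to 0-100.
    return int(np.searchsorted(quantiles, motion) / len(quantiles) * 100)
```

A static clip (zero temporal variance) maps to bucket 0, while a clip whose temporal std exceeds all training quantiles maps to 100, matching the slow/fast extremes described above.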

Model Performance

Prompt: Documentary photography style, an energetic puppy rapidly running on lush green grass. The puppy has brownish-yellow fur, upright ears, and an expression that is focused and joyful. Sunlight shines on its body, making its fur appear exceptionally soft and shiny. The background features an open grassland with occasional wildflowers, and in the distance, a faint view of blue sky and scattered clouds. Strong perspective emphasizes the dynamic movement of the running puppy and the vitality of the surrounding grass. Medium shot with a side-moving viewpoint.

Negative Prompt: Vivid colors, overexposure, static, blurry details, subtitles, style, artwork, painting, stillness, overall grayish tone, worst quality, low quality, JPEG compression artifacts, ugly, defective, extra fingers, poorly drawn hands, poorly drawn face, deformed limbs, fused fingers, motionless frames, cluttered background, three legs, crowded background people, walking backwards.

Example 1 (seed=1, left: motion bucket id=1, right: motion bucket id=100)

Example 2 (seed=2, left: motion bucket id=1, right: motion bucket id=100)

Example 3 (seed=3, left: motion bucket id=1, right: motion bucket id=100)

Usage Instructions

This model is trained using the DiffSynth-Studio framework. Please install it first:

pip install diffsynth

Then run the following script to generate a slow-motion and a fast-motion video from the same seed:
import torch
from diffsynth import ModelManager, WanVideoPipeline, save_video, VideoData
from modelscope import snapshot_download


# Download models
snapshot_download("Wan-AI/Wan2.1-T2V-1.3B", local_dir="models/Wan-AI/Wan2.1-T2V-1.3B")
snapshot_download("DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1", local_dir="models/DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1")

# Load models
model_manager = ModelManager(device="cpu")
model_manager.load_models(
    [
        "models/Wan-AI/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
        "models/Wan-AI/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
        "models/Wan-AI/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
        "models/DiffSynth-Studio/Wan2.1-1.3b-speedcontrol-v1/model.safetensors",
    ],
    torch_dtype=torch.bfloat16, # You can set `torch_dtype=torch.float8_e4m3fn` to enable FP8 quantization.
)
pipe = WanVideoPipeline.from_model_manager(model_manager, torch_dtype=torch.bfloat16, device="cuda")
pipe.enable_vram_management(num_persistent_param_in_dit=None)

# Text-to-video
video = pipe(
    prompt="Documentary photography style scene: a lively little dog rapidly running on a green grassy field. The dog has a brownish-yellow coat, upright ears, and an expression of focus and joy. Sunlight shines on its body, making its fur appear exceptionally soft and shiny. The background is an open grassland, occasionally dotted with a few wildflowers, with a faint view of blue sky and some white clouds in the distance. Strong sense of perspective captures the dynamic motion of the running dog and the vitality of the surrounding grass. Medium shot with a side-moving viewpoint.",
    negative_prompt="vivid colors, overexposed, static, blurry details, subtitles, style, artwork, painting, frame, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards",
    num_inference_steps=50,
    seed=1, tiled=True,
    motion_bucket_id=0
)
save_video(video, "video_slow.mp4", fps=15, quality=5)

video = pipe(
    prompt="Documentary photography style scene: a lively little dog rapidly running on a green grassy field. The dog has a brownish-yellow coat, upright ears, and an expression of focus and joy. Sunlight shines on its body, making its fur appear exceptionally soft and shiny. The background is an open grassland, occasionally dotted with a few wildflowers, with a faint view of blue sky and some white clouds in the distance. Strong sense of perspective captures the dynamic motion of the running dog and the vitality of the surrounding grass. Medium shot with a side-moving viewpoint.",
    negative_prompt="vivid colors, overexposed, static, blurry details, subtitles, style, artwork, painting, frame, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards",
    num_inference_steps=50,
    seed=1, tiled=True,
    motion_bucket_id=100
)
save_video(video, "video_fast.mp4", fps=15, quality=5)
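For intuition about the motion speed encoder described above (RoPE encoding followed by an MLP), here is a minimal NumPy sketch. It is not the released implementation: the embedding dimension, frequency base, layer widths, and SiLU activation are all assumptions, and the weights are hypothetical placeholders.

```python
import numpy as np

def rope_embed(bucket_id: float, dim: int = 256) -> np.ndarray:
    """Sinusoidal (RoPE-style) embedding of the scalar motion bucket id.

    Illustrative only; `dim` and the 10000 frequency base are assumptions.
    """
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    angles = bucket_id * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def mlp(x, w1, b1, w2, b2):
    """A minimal two-layer MLP with SiLU activation (hypothetical weights)."""
    h = x @ w1 + b1
    h = h / (1.0 + np.exp(-h))  # SiLU: x * sigmoid(x), applied elementwise
    return h @ w2 + b2
```

In the real model, the MLP output would be projected to the DiT's hidden size and added as a conditioning signal, analogous to a timestep embedding.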