LTX2.3-ICEdit-Insight

LTX2.3-ICEdit-Insight is a task-aware video restoration and editing model family developed by JoyFox Lab, built on top of the LTX-2.3 DiT-based audio-video foundation model.

This release focuses on four practical video editing directions:

Video Restoration: degradation recovery, compression cleanup, blur and noise reduction, and damaged detail restoration.
Video HD Enhancement: super-resolution, detail reconstruction, texture sharpening, and perceptual quality improvement.
Watermark Removal: logo cleanup, semi-transparent overlay removal, and occlusion-aware background reconstruction.
Subtitle Removal: hard subtitle removal, caption cleanup, text overlay removal, and temporally stable inpainting.

Unlike conventional frame-level enhancement pipelines, this model family operates as a generative video restoration system in latent video space. It is designed to preserve global structure, camera motion, object identity, and temporal consistency while reconstructing missing or degraded visual content.

Project links: GitHub project | JoyFox on Hugging Face

📦 Model Files

File	Purpose
`ltx-2.3-edit-insight-dev-fp8.safetensors`	Unified Insight base checkpoint for LTX-2.3 editing (passed via `--model-checkpoint`)
`ltx2.3-video-restoration-general.safetensors`	IC-LoRA for video restoration, artifact cleanup, blur and noise recovery (passed via `--lora`)
`ltx2.3-ic-video-upscale-general.safetensors`	IC-LoRA for video HD enhancement, super-resolution, and detail recovery (passed via `--lora`)
`ltx2.3-ic-watermark-remove-general.safetensors`	IC-LoRA for watermark removal and occlusion-aware reconstruction (passed via `--lora`)
`ltx2.3-ic-subtitles-remove-general.safetensors`	IC-LoRA for subtitle removal and text overlay cleanup (passed via `--lora`)

🎬 Showcase

[Showcase videos: Video Restoration | Video HD Enhancement | Watermark Removal | Subtitle Removal]

🚀 Script Usage

Run all scripts from the project root. Where a script is shown with an argument, that argument is the path to the input video.

bash run_restoration.sh
bash run_hd.sh
bash run_hd.sh /path/to/input.mp4
bash run_watermark_rm.sh
bash run_watermark_rm.sh /path/to/input.mp4
bash run_subtitle_rm.sh
bash run_subtitle_rm.sh /path/to/input.mp4

💻 Command Examples

Video Restoration

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python run_pipeline.py \
  --mode restoration \
  --video ./inputs/input_480p.mp4 \
  --prompt "Convert the video to ultra-high-definition quality while removing artifacts and rebuilding high-frequency details." \
  --output ./outputs/output_restoration.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 42 \
  --sigma-profile workflow \
  --streaming-prefetch-count 2 \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
  --lora ./models/loras/ltx2.3-train/ltx2.3-video-restoration-general.safetensors

Video HD Enhancement

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python run_pipeline.py \
  --mode hd \
  --video ./inputs/input_480p.mp4 \
  --prompt "Convert the video to ultra-high-definition quality, significantly improving clarity, fine detail richness, texture fidelity, and overall perceptual sharpness." \
  --output ./outputs/output_hd.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 42 \
  --sigma-profile workflow \
  --streaming-prefetch-count 2 \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
  --lora ./models/loras/ltx2.3-train/ltx2.3-ic-video-upscale-general.safetensors

Watermark Removal

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python run_pipeline.py \
  --mode watermark_rm \
  --video ./inputs/input_480p.mp4 \
  --prompt "Remove short-video platform watermarks and related occlusions from the video, restoring a clean, clear, and natural original image." \
  --output ./outputs/output_watermark_rm.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 1546 \
  --sigma-profile workflow \
  --streaming-prefetch-count 2 \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
  --lora ./models/loras/ltx2.3-train/ltx2.3-ic-watermark-remove-general.safetensors

Subtitle Removal

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python run_pipeline.py \
  --mode subtitle_rm \
  --video ./inputs/input_480p.mp4 \
  --prompt "Remove subtitles, captions, and related text occlusions from the video, restoring a clean and natural underlying image." \
  --output ./outputs/output_subtitle_rm.mp4 \
  --height 1184 --width 704 --num-frames 97 \
  --fps 24.0 --seed 42 \
  --sigma-profile workflow \
  --streaming-prefetch-count 2 \
  --model-checkpoint ./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors \
  --lora ./models/loras/ltx2.3-train/ltx2.3-ic-subtitles-remove-general.safetensors
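
The four commands above differ only in `--mode`, prompt, seed, output path, and LoRA file. As a minimal sketch, the shared flags can be assembled in Python. `build_command`, `LORAS`, `CHECKPOINT`, and `LORA_DIR` are illustrative names and not part of the project; every flag, default value, and path is copied from the examples above.

```python
import shlex

# Paths and file names copied from the command examples above.
# The constant and function names are hypothetical helpers for this sketch.
CHECKPOINT = "./models/checkpoints/ltx-2.3-edit-insight-dev-fp8.safetensors"
LORA_DIR = "./models/loras/ltx2.3-train"
LORAS = {
    "restoration": "ltx2.3-video-restoration-general.safetensors",
    "hd": "ltx2.3-ic-video-upscale-general.safetensors",
    "watermark_rm": "ltx2.3-ic-watermark-remove-general.safetensors",
    "subtitle_rm": "ltx2.3-ic-subtitles-remove-general.safetensors",
}


def build_command(mode, video, prompt, output,
                  height=1184, width=704, num_frames=97, fps=24.0, seed=42):
    """Return the argument list for one run_pipeline.py invocation."""
    if mode not in LORAS:
        raise ValueError(f"unknown mode: {mode!r}")
    return [
        "python", "run_pipeline.py",
        "--mode", mode,
        "--video", video,
        "--prompt", prompt,
        "--output", output,
        "--height", str(height), "--width", str(width),
        "--num-frames", str(num_frames),
        "--fps", str(fps), "--seed", str(seed),
        "--sigma-profile", "workflow",
        "--streaming-prefetch-count", "2",
        "--model-checkpoint", CHECKPOINT,
        "--lora", f"{LORA_DIR}/{LORAS[mode]}",
    ]


if __name__ == "__main__":
    cmd = build_command(
        "subtitle_rm",
        "./inputs/input_480p.mp4",
        "Remove subtitles, captions, and related text occlusions from the "
        "video, restoring a clean and natural underlying image.",
        "./outputs/output_subtitle_rm.mp4",
    )
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
```

To launch a run rather than print it, the list could be handed to `subprocess.run(cmd, check=True, env={**os.environ, "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True"})`, which reproduces the environment variable used in the examples.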

✨ Key Improvements

Task-Aware IC-Edit Framework

We introduce a task-aware IC-Edit training framework for LTX-2.3, where each restoration direction is optimized with dedicated instruction conditioning and task-specific IC-LoRA adapters.

The model is trained not only to improve visual quality, but also to understand the editing goal behind different restoration tasks, including watermark removal, subtitle cleanup, damaged region recovery, and high-definition enhancement.

LTX-2.3 DiT Backbone Adaptation

The model family is built on LTX-2.3, a diffusion-transformer (DiT) audio-video foundation model designed for high-fidelity image-to-video and video generation workflows.

Our adaptation targets video restoration by improving:

latent-space editability
instruction-following behavior
frame-to-frame stability
high-frequency detail recovery
local reconstruction around degraded or occluded regions

Spatiotemporal Consistency Optimization

Video restoration requires more than strong single-frame quality. We optimize temporal consistency so that restored areas remain stable across adjacent frames.

This reduces common artifacts such as:

flickering textures
unstable reconstructed backgrounds
inconsistent watermark removal
subtitle ghosting
frame-wise color shift
detail popping during motion
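
As a rough illustration of what flicker looks like numerically, the sketch below computes the mean absolute change between consecutive frames. This is a generic diagnostic chosen for the example, not the consistency objective used in training. Frames are plain nested lists of grayscale values so the snippet has no dependencies, and `mean_abs_frame_diff` is a hypothetical name.

```python
def mean_abs_frame_diff(frames):
    """Average absolute per-pixel change between consecutive frames.

    frames: list of frames, each a list of rows of grayscale values.
    All frames are assumed to have the same shape.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    total = 0.0
    count = 0
    for prev, curr in zip(frames, frames[1:]):
        for row_prev, row_curr in zip(prev, curr):
            for a, b in zip(row_prev, row_curr):
                total += abs(a - b)
                count += 1
    if count == 0:
        raise ValueError("frames contain no pixels")
    return total / count


if __name__ == "__main__":
    # A static 2x2 patch that stays constant over three frames.
    stable = [[[10, 10], [10, 10]] for _ in range(3)]
    # The same patch with brightness alternating between 10 and 14.
    flickering = [[[v, v], [v, v]] for v in (10, 14, 10)]
    print(mean_abs_frame_diff(stable))
    print(mean_abs_frame_diff(flickering))
```

Real motion also raises this score, so it is only meaningful on a region that should be static, such as the area where a watermark or subtitle was removed, or when comparing input and output over the same region.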

Degradation-Aware Training Curriculum

The training curriculum covers realistic video defects including:

compression artifacts
motion blur
sensor noise
low-bitrate video
text overlays
hard subtitles
semi-transparent watermarks
platform logos
local occlusions
low-resolution inputs

This improves generalization across short videos, social-media clips, mobile footage, downloaded videos, and compressed production material.
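
The curriculum itself is not specified in this card. Purely as an assumption-laden sketch of what a degradation curriculum can look like, the snippet below samples a degradation type from the list above and caps its severity with a limit that grows as training progresses. The linear schedule, the 0.3 to 1.0 range, and the names `max_severity` and `sample_degradation` are all hypothetical.

```python
import random

# Degradation names taken from the list above.
DEGRADATIONS = [
    "compression artifacts",
    "motion blur",
    "sensor noise",
    "low-bitrate video",
    "text overlays",
    "hard subtitles",
    "semi-transparent watermarks",
    "platform logos",
    "local occlusions",
    "low-resolution inputs",
]


def max_severity(progress, start=0.3, end=1.0):
    """Hypothetical linear ramp of the severity cap for progress in [0, 1]."""
    progress = min(max(progress, 0.0), 1.0)
    return (1.0 - progress) * start + progress * end


def sample_degradation(progress, rng):
    """Pick a degradation type and a severity below the current cap."""
    kind = rng.choice(DEGRADATIONS)
    severity = rng.uniform(0.0, max_severity(progress))
    return kind, severity


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    for progress in (0.0, 0.5, 1.0):
        kind, severity = sample_degradation(progress, rng)
        print(f"progress={progress:.1f} kind={kind} severity={severity:.2f}")
```

Starting with mild degradations and widening the range later is one common way to structure such a curriculum; the actual schedule used for these checkpoints may differ.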

Occlusion-Aware Reconstruction

For watermark and subtitle removal, the model is optimized to reconstruct the hidden visual content behind occluded regions.

Instead of smearing or blurring the target area, it uses surrounding spatial context and temporal cues to infer plausible background structure, object boundaries, lighting, and texture continuity.

Frequency-Enhanced HD Restoration

For HD enhancement, the model improves perceptual sharpness and fine visual detail through frequency-aware restoration training.

This is especially helpful for recovering:

hair strands
fabric texture
skin detail
product edges
background patterns
typography-like fine structures
natural image clarity

🧠 Inference Notes

Single-stage inference is recommended for most editing tasks.
Two-stage refinement can improve visual polish but may weaken task-specific LoRA constraints.
Watermark and subtitle removal perform best when the occlusion area is stable and not excessively large.
HD enhancement quality depends on input resolution, motion complexity, and compression level.
Higher output resolution improves detail but requires more VRAM.
For strong-motion videos, conservative denoising settings are recommended to preserve temporal structure.
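Frame count should have the form 8k + 1 for a non-negative integer k (for example, 97 = 8 × 12 + 1, as in the commands above).
Output height and width should each be a multiple of 32 in single-stage inference (for example, 1184 × 704).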
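
The two shape constraints in the notes above can be checked before launching a run. The helpers below are a minimal sketch with hypothetical names. They round down rather than to the nearest value, which is a choice made for this example so that the frame count never exceeds the source clip.

```python
def floor_frame_count(n):
    """Largest value of the form 8k + 1 that does not exceed n."""
    if n < 1:
        raise ValueError("frame count must be at least 1")
    return 8 * ((n - 1) // 8) + 1


def floor_to_multiple(x, base=32):
    """Largest multiple of base that does not exceed x."""
    if x < base:
        raise ValueError(f"value must be at least {base}")
    return (x // base) * base


def is_valid_shape(height, width, num_frames):
    """True if the shape satisfies both rules from the notes above."""
    return (
        height > 0 and width > 0 and num_frames > 0
        and height % 32 == 0
        and width % 32 == 0
        and (num_frames - 1) % 8 == 0
    )


if __name__ == "__main__":
    print(floor_frame_count(100))            # 97
    print(floor_to_multiple(1200))           # 1184
    print(floor_to_multiple(720))            # 704
    print(is_valid_shape(1184, 704, 97))     # True
    print(is_valid_shape(1080, 704, 97))     # False, 1080 is not a multiple of 32
```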

🏗️ Training

This model family was trained and optimized by JoyFox Lab (Chengdu Xuanhu Technology Co., Ltd.).

The training pipeline includes:

task-aware video restoration data construction
degradation synthesis and curriculum training
IC-LoRA specialization for four editing directions
temporal consistency regularization
occlusion-aware reconstruction training
high-frequency perceptual enhancement
instruction-guided video editing optimization

📬 Contact

For research collaboration, commercial licensing, or workflow integration, contact:

z@vvicat.com

📜 License

Licensed under Apache 2.0.

Please also review the license terms of the upstream LTX-2.3 base model when using or redistributing derivative checkpoints.
