DiTFuse: Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach (Official Weights)

This repository provides the official pretrained weights for DiTFuse. The project code is available on GitHub:

👉 GitHub: https://github.com/Henry-Lee-real/DiTFuse

DiTFuse supports multiple fusion tasks, including infrared–visible fusion, multi-focus fusion, multi-exposure fusion, and instruction-driven controllable fusion / segmentation, all within a single unified model.
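If you only need the weights, the snippet below is a minimal download sketch using the standard `huggingface_hub` library. The repo id is an assumption used as a placeholder (this card does not state it); replace it with the actual id of this repository.

```python
# Minimal sketch: download all files of this model repository from the
# Hugging Face Hub. The repo_id below is a placeholder assumption; replace
# it with the actual id of this repository if it differs.
from huggingface_hub import snapshot_download

weights_dir = snapshot_download(
    repo_id="Henry-Lee-real/DiTFuse",   # assumed repo id (placeholder)
    local_dir="./DiTFuse-weights",      # where the files are stored locally
)
print(f"DiTFuse weights downloaded to: {weights_dir}")
```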


📌 Available Model Versions

🔹 V1 – Stronger Zero-Shot Generalization

  • Designed for stronger zero-shot fusion capability.
  • Performs robustly on unseen fusion scenarios.
  • Recommended if your use case emphasizes cross-dataset generalization.

🔹 V2 – Full Capability Version (Paper Model)

  • This is the main model used in the DiTFuse paper.

  • Provides the most comprehensive capabilities:

    • Full instruction-following control
    • Joint fusion + segmentation
    • Better fidelity and controllability
    • Stronger alignment with text prompts
  • Recommended for research reproduction, benchmarking, and controllable image fusion tasks; see the per-version download sketch below.
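To fetch only one of the two versions, a hedged sketch is given below. It assumes the weights are organized in per-version subfolders (e.g. `V1/` and `V2/`), which this card does not confirm, so check the repository's file list and adjust the patterns accordingly.

```python
# Minimal sketch: fetch only one model version, assuming (not confirmed by
# this card) that the repository stores each version in its own subfolder,
# e.g. "V1/..." and "V2/...". Adjust allow_patterns to the real file layout.
from huggingface_hub import snapshot_download

version = "V2"  # "V1" = stronger zero-shot generalization, "V2" = paper model
weights_dir = snapshot_download(
    repo_id="Henry-Lee-real/DiTFuse",   # assumed repo id (placeholder)
    allow_patterns=[f"{version}/*"],    # download only the chosen version
    local_dir=f"./DiTFuse-{version}",
)
print(f"{version} weights downloaded to: {weights_dir}")
```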
