NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
Abstract
Phase-Preserving Diffusion and Frequency-Selective Structured noise enable structure-aligned generation in diffusion models without altering architecture or introducing extra parameters, enhancing performance in tasks like re-rendering and simulation.
Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While this is effective for unconditional or text-to-image generation, corrupting the phase components destroys spatial structure, making it ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves input phase while randomizing magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. φ-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, φ-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our project page: https://yuzeng-at-tri.github.io/ppd-page/.
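The phase/magnitude split described above can be sketched in a few lines of numpy. This is a hypothetical illustration of the idea, not the authors' implementation: `phase_preserving_noise` keeps the input's Fourier phase while taking its magnitude spectrum from fresh Gaussian noise, and `fss_noise` is one plausible reading of the frequency-cutoff control, preserving the input phase only below a normalized cutoff.

```python
import numpy as np

def phase_preserving_noise(x, rng=None):
    """Noise that shares x's Fourier phase but has random Gaussian magnitude.

    Illustrative sketch of phase-preserving corruption; function names and
    details are assumptions, not the paper's reference code.
    """
    rng = np.random.default_rng(rng)
    X = np.fft.fft2(x)
    phase = np.angle(X)                              # keep the input's phase
    G = np.fft.fft2(rng.standard_normal(x.shape))    # spectrum of Gaussian noise
    mag = np.abs(G)                                  # random magnitude
    # Recombine: random magnitude, preserved phase (result is real-valued
    # because both factors are Hermitian-symmetric).
    return np.fft.ifft2(mag * np.exp(1j * phase)).real

def fss_noise(x, cutoff, rng=None):
    """Frequency-Selective Structured noise (one possible interpretation):
    preserve the input phase below a normalized cutoff in [0, 1], and use
    fully random phase above it, so `cutoff` trades structural rigidity
    against freedom.
    """
    rng = np.random.default_rng(rng)
    X = np.fft.fft2(x)
    G = np.fft.fft2(rng.standard_normal(x.shape))
    fy = np.fft.fftfreq(x.shape[0])
    fx = np.fft.fftfreq(x.shape[1])
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    low = radius <= cutoff * 0.5                     # max per-axis |freq| is 0.5
    phase = np.where(low, np.angle(X), np.angle(G))  # structured below cutoff
    return np.fft.ifft2(np.abs(G) * np.exp(1j * phase)).real
```

With `cutoff` near 0 this degenerates toward ordinary unstructured noise, while larger cutoffs keep more of the input's spatial layout in the corruption, matching the abstract's description of a single parameter controlling structural rigidity.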
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Training-Free Generation of Diverse and High-Fidelity Images via Prompt Semantic Space Optimization (2025)
- ReMix: Towards a Unified View of Consistent Character Generation and Editing (2025)
- GeoVideo: Introducing Geometric Regularization into Video Generation Model (2025)
- PickStyle: Video-to-Video Style Transfer with Context-Style Adapters (2025)
- Progressive Image Restoration via Text-Conditioned Video Generation (2025)
- FreeControl: Efficient, Training-Free Structural Control via One-Step Attention Extraction (2025)
- Are Image-to-Video Models Good Zero-Shot Image Editors? (2025)
Models citing this paper: 1
Datasets citing this paper: 0
Spaces citing this paper: 0
Collections including this paper: 0
