VMAE MR25 ImageNet256 400ep LPIPS Tuned
This is a VMAE (Vision Masked AutoEncoder) model trained as the image autoencoder for LDMAE (Latent Diffusion with Masked AutoEncoder).
Model Details
- Architecture: VMAE (Vision Masked AutoEncoder) in the f8d16 configuration (downsampling factor 8, 16 latent channels)
- Training: 400 epochs on ImageNet 256x256
- Optimization: trained with an LPIPS perceptual loss term (weight 0.1) for better perceptual quality
- Compression: 8x spatial compression, 16-dimensional latent space
- Input Size: 256x256 RGB images
- Output: 32x32x16 latent representation
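The f8d16 numbers can be checked with a little arithmetic: 256 / 8 = 32 latent positions per side, each carrying 16 channels. The sketch below uses hypothetical helper names to compute the latent shape and the resulting reduction in stored values. Note that "8x" applies to each spatial side, so the total number of values drops by 12x (196,608 input values to 16,384 latent values). The channels-first ordering `(16, 32, 32)` is an assumption based on PyTorch's usual tensor layout.

```python
# Hypothetical helpers: derive the latent shape of an f8d16 autoencoder
# from the input resolution, the downsampling factor and the latent channels.
def latent_shape(img_size: int = 256, factor: int = 8, channels: int = 16) -> tuple[int, int, int]:
    if img_size % factor != 0:
        raise ValueError("img_size must be divisible by the downsampling factor")
    side = img_size // factor  # 256 // 8 = 32 positions per side
    return (channels, side, side)  # channels-first layout (assumed)


def compression_ratio(img_size: int = 256, factor: int = 8,
                      channels: int = 16, in_channels: int = 3) -> float:
    # Ratio of input values (3 x 256 x 256) to latent values (16 x 32 x 32)
    c, h, w = latent_shape(img_size, factor, channels)
    return (in_channels * img_size * img_size) / (c * h * w)


print(latent_shape())       # (16, 32, 32)
print(compression_ratio())  # 12.0
```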
Usage
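```python
import torch
from models_mae import mae_for_ldmae_f8d16_prev  # local module from the LDMAE code

# Build the model with the same flags used in training
model = mae_for_ldmae_f8d16_prev(
    ldmae_mode=True,
    no_cls=True,
    kl_loss_weight=True,
    smooth_output=True,
    img_size=256,
)

# Load checkpoint; use the 'model' entry if present, else the raw dict
checkpoint = torch.load('pytorch_model.bin', map_location='cpu')
state_dict = checkpoint.get('model', checkpoint)
# strict=False silently skips mismatched keys, so report them
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
model.eval()

# images: float tensor of shape (N, 3, 256, 256). Placeholder batch shown;
# the expected pixel normalization is not documented on this card.
images = torch.randn(1, 3, 256, 256)

with torch.no_grad():
    # Encode: take the mode of the latent distribution (deterministic)
    latents = model.encode(images).latent_dist.mode()  # 16 channels on a 32x32 grid
    # Decode latents back to image space
    reconstructed = model.decode(latents).sample
```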
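Since PSNR is one of the listed evaluation metrics, a quick sanity check on a round trip is the PSNR between `images` and `reconstructed`. The helper below is a hypothetical sketch that works from a precomputed mean squared error. Its `data_range` argument must match the pixel normalization fed to the encoder, which this card does not state.

```python
import math


def psnr_from_mse(mse: float, data_range: float) -> float:
    """Peak signal-to-noise ratio in dB (hypothetical helper).

    data_range is the span of valid pixel values, e.g. 1.0 for inputs
    in [0, 1] or 2.0 for inputs in [-1, 1].
    """
    if mse < 0:
        raise ValueError("mse must be non-negative")
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


print(psnr_from_mse(0.01, data_range=1.0))  # 20.0
```

With the tensors from the usage snippet, `mse = torch.mean((images - reconstructed) ** 2).item()` supplies the first argument.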
Training Configuration
```yaml
augmentation:
  color_jitter: 0.4
  random_crop: true
  random_flip: true
data:
  batch_size: 32
  data_path: /data/dataset/imagenet/1K_dataset
  image_size: 256
  num_workers: 8
evaluation:
  metrics:
  - rfid
  - psnr
  - lpips
  - ssim
loss:
  kl_weight: 1.0e-06
  lpips_weight: 0.1
  reconstruction_weight: 1.0
model_info:
  compression_ratio: 8
  description: VMAE with 8x spatial compression and 16-dimensional latent space
  latent_channels: 16
  optimization: LPIPS-tuned for perceptual quality
  training_dataset: ImageNet 256x256
training:
  epochs: 400
  learning_rate: 0.00015
  min_lr: 0.0
  warmup_epochs: 40
  weight_decay: 0.05
vae:
  architecture: mae_for_ldmae_f8d16_prev
  model_name: vmae_f8d16
  params:
    decoder_depth: 8
    decoder_embed_dim: 512
    decoder_num_heads: 16
    depth: 24
    embed_dim: 512
    img_size: 256
    in_channels: 3
    kl_loss_weight: true
    latent_dim: 16
    ldmae_mode: true
    mlp_ratio: 4.0
    no_cls: true
    norm_layer: LayerNorm
    num_heads: 16
    patch_size: 8
    smooth_output: true
  weight_path: pretrain_weight/vmaef8d16.pth
```
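The `loss` and `training` entries can be read as a weighted objective plus a learning-rate schedule. The sketch below makes two assumptions. First, it assumes the three loss terms are simply summed with their configured weights. Second, it assumes the schedule is the linear-warmup, half-cycle-cosine decay from the original MAE code, whose pretraining recipe uses the same base rate (1.5e-4), weight decay (0.05) and warmup length (40 epochs). The function names are hypothetical.

```python
import math

# Values copied from the training configuration
RECONSTRUCTION_WEIGHT = 1.0
LPIPS_WEIGHT = 0.1
KL_WEIGHT = 1.0e-6
BASE_LR = 1.5e-4
MIN_LR = 0.0
WARMUP_EPOCHS = 40
EPOCHS = 400


def total_loss(rec: float, lpips: float, kl: float) -> float:
    """Weighted sum of the three loss terms (assumed to be a plain sum)."""
    return RECONSTRUCTION_WEIGHT * rec + LPIPS_WEIGHT * lpips + KL_WEIGHT * kl


def lr_at_epoch(epoch: float) -> float:
    """Linear warmup, then half-cycle cosine decay to MIN_LR (assumed MAE-style)."""
    if epoch < WARMUP_EPOCHS:
        return BASE_LR * epoch / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + (BASE_LR - MIN_LR) * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In the MAE reference code, the configured rate is a base value that is multiplied by effective batch size / 256 before training. If this model followed that convention, the actual peak rate would differ from 1.5e-4.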
Citation
If you use this model, please cite:
```bibtex
@article{ldmae2025,
  title={LDMAE: Latent Diffusion with Masked AutoEncoder},
  author={Your Name},
  year={2025}
}
```
License
Apache-2.0