postprocess at model res, defer resize+write to CPU (saves ~35s GPU) f4a7288 Nekochu committed on 1 day ago
add blue 1024 ONNX (FP16 on disk, FP32 at runtime), rename models 646f0cd Nekochu committed on 1 day ago
safetensors loading, Phase 0 4x faster (uint8), total time in status 33e616b Nekochu committed on 1 day ago
quality: lower clean_matte threshold 0.25→0.02, always keep largest component 1363975 Nekochu committed on 1 day ago
cleanup: stale comments, dead import, redundant makedirs, fix batch size in UI a2a7a3e Nekochu committed on 1 day ago
simplify: merge write functions, fix missing Processed output, bulk transfer 9d23c67 Nekochu committed on 1 day ago
remove dead code: AOTI export, inductor/triton cache, shared_results, deferred write 2a4471f Nekochu committed on 1 day ago
disable torch.compile on ZeroGPU: net negative for GreenFormer f4a2965 Nekochu committed on 1 day ago
fix: reduce-overhead instead of max-autotune (118s→~30s), dedicated export endpoint c53eb28 Nekochu committed on 1 day ago
fix README: accurate torch.compile description, no triton/AOTI claim cdef1d9 Nekochu committed on 1 day ago
add ZeroGPU GPU inference (FP16, flash-attn, batch=32@1024/16@2048) 0b6961f Nekochu committed on Mar 25