Visor - Natural Language Anime Tagging

Visor is an image tagging model that describes images in natural language. It is based on the BLIP model architecture.

One potential use case is captioning anime images to produce training data for diffusion models.
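
The captioning use case could look roughly like the sketch below. It is not taken from the model's documentation. It assumes the checkpoint loads with the standard `BlipProcessor` and `BlipForConditionalGeneration` classes from the `transformers` library under the repo id `shadowlilac/visor`, and that the training pipeline reads captions from a `.txt` file placed next to each image. The helper names (`load_captioner`, `caption_image`, `write_sidecar`) are hypothetical.

```python
from pathlib import Path

# Repo id as shown on the model page; the loading classes below are an assumption.
MODEL_ID = "shadowlilac/visor"


def load_captioner(model_id=MODEL_ID):
    """Load processor and model (assumes a standard BLIP captioning checkpoint)."""
    # Imported lazily so write_sidecar works without torch/transformers installed.
    import torch
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16  # matches the BF16 weights
    )
    model.eval()
    return processor, model


def caption_image(image_path, processor, model, max_new_tokens=64):
    """Generate one natural-language caption for a single image file."""
    import torch
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    # Move pixel values to the model's device and cast them to its dtype.
    inputs = processor(images=image, return_tensors="pt").to(model.device, model.dtype)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()


def write_sidecar(image_path, caption):
    """Save the caption as <image name>.txt beside the image (assumed convention)."""
    txt_path = Path(image_path).with_suffix(".txt")
    txt_path.write_text(caption.strip() + "\n", encoding="utf-8")
    return txt_path
```

A dataset folder could then be captioned by calling `load_captioner()` once and looping over the image files, passing each result of `caption_image` to `write_sidecar`. The sidecar layout is only one common convention; adjust it to whatever the training tool expects.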

Model size: 0.5B params (Safetensors)
Tensor type: BF16