video-SALMONN 2
Collection
video-SALMONN 2 is an audio-visual large language model (LLM) that generates high-quality video captions covering both audio and visual content.
7 items
Official model release of video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models
Base model: Qwen/Qwen2-7B