johannhartmann's Collections: Document & UI Intelligence
8B • Updated • 39 • 9
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 71
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Paper • 2401.10935 • Published • 5
Text Generation • 10B • Updated • 194 • 18
jadechoghari/Ferret-UI-Llama8b
Image-Text-to-Text • Updated • 191 • 68
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Paper • 2410.18967 • Published • 1
Image-Text-to-Text • Updated • 451 • 1.71k
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Paper • 2501.04575 • Published • 25
Updated • 2.89k • 273
Image-Text-to-Text • 0.3B • Updated • 917 • 99