DISCO: Document Intelligence Suite for COmparative Evaluation
Abstract
DISCO evaluates OCR pipelines and vision-language models for document intelligence tasks, revealing varying performance across document types and highlighting the importance of task-aware approach selection.
Document intelligence requires accurate text extraction and reliable reasoning over document content. We introduce DISCO, a Document Intelligence Suite for COmparative Evaluation, that evaluates optical character recognition (OCR) pipelines and vision-language models (VLMs) separately on parsing and question answering across diverse document types, including handwritten text, multilingual scripts, medical forms, infographics, and multi-page documents. Our evaluation shows that performance varies substantially across tasks and document characteristics, underscoring the need for complexity-aware approach selection. OCR pipelines are generally more reliable for handwriting and for long or multi-page documents, where explicit text grounding supports text-heavy reasoning, while VLMs perform better on multilingual text and visually rich layouts. Task-aware prompting yields mixed effects, improving performance on some document types while degrading it on others. These findings provide empirical guidance for selecting document processing strategies based on document structure and reasoning demands.
Get this paper in your agent:
hf papers read 2603.23511 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper