Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models Paper • 2504.03624 • Published Apr 4, 2025 • 15
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published Apr 10, 2025 • 48
A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges Paper • 2501.02189 • Published Jan 4, 2025 • 1
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27, 2025 • 84
First Frame Is the Place to Go for Video Content Customization Paper • 2511.15700 • Published Nov 2025 • 52
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 87
MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning Paper • 2311.10774 • Published Nov 15, 2023 • 2
Mosaic IT: Enhancing Instruction Tuning with Data Mosaics Paper • 2405.13326 • Published May 22, 2024 • 1
Visual News: Benchmark and Challenges in News Image Captioning Paper • 2010.03743 • Published Oct 8, 2020
DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents Paper • 2306.06306 • Published Jun 9, 2023 • 1
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models Paper • 2310.14566 • Published Oct 23, 2023 • 27
Aligning Large Multi-Modal Model with Robust Instruction Tuning Paper • 2306.14565 • Published Jun 26, 2023 • 6