Lijun Wu

apeters

https://apeterswu.github.io/

Recent Activity

updated a dataset about 11 hours ago

OpenDataArena/ODA-Mixture-100k

updated a dataset about 11 hours ago

OpenDataArena/ODA-Mixture-500k

updated a dataset about 11 hours ago

OpenDataArena/ODA-Math-460k

upvoted a collection about 23 hours ago

BioT5

Collection

BioT5 and BioT5+ collections • 18 items • Updated Oct 23, 2025 • 3

upvoted 2 collections 4 days ago

ODA-Mixture

Collection

High-quality mixture datasets for post-training covering multiple domains. • 7 items • Updated 5 days ago • 2

ODA-Math

Collection

High-quality mathematical datasets for post-training. • 5 items • Updated 5 days ago • 1

upvoted a paper 5 days ago

Closing the Data Loop: Using OpenDataArena to Engineer Superior Training Datasets

Paper • 2601.09733 • Published 22 days ago • 7

upvoted 2 papers about 1 month ago

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

Paper • 2512.17260 • Published Dec 19, 2025 • 49

OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value

Paper • 2512.14051 • Published Dec 16, 2025 • 44

upvoted a paper about 2 months ago

Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights

Paper • 2512.01816 • Published Dec 1, 2025 • 91

upvoted 4 papers 4 months ago

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 139

upvoted 2 papers 6 months ago

Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning

Paper • 2507.17512 • Published Jul 23, 2025 • 36

REST: Stress Testing Large Reasoning Models by Asking Multiple Problems at Once

Paper • 2507.10541 • Published Jul 14, 2025 • 29

upvoted 5 papers 9 months ago

CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges

Paper • 2504.19093 • Published Apr 27, 2025 • 18

A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis

Paper • 2504.12322 • Published Apr 11, 2025 • 28

Heimdall: test-time scaling on the generative verification

Paper • 2504.10337 • Published Apr 14, 2025 • 33

FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

Paper • 2504.09925 • Published Apr 14, 2025 • 38

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 306

upvoted 2 papers 10 months ago

BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning

Paper • 2402.17810 • Published Feb 27, 2024 • 1

GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation

Paper • 2504.02782 • Published Apr 3, 2025 • 57
