leaderboards - a Praise2112 Collection

leaderboards

updated Jan 25, 2025

Configuration error

4.71k

LMArena Leaderboard

🏆

View the LMArena model performance leaderboard
Running on CPU Upgrade

13.8k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running on CPU Upgrade

7.02k

MTEB Leaderboard

🥇

Embedding Leaderboard
Running on CPU Upgrade

Featured

1.22k

Open ASR Leaderboard

🏆

Explore speech model benchmarks and submit evaluation requests
Running

Featured

582

LLM-Perf Leaderboard

🏆

Compare LLM performance to find the best model for your hardware
Running

1.49k

Big Code Models Leaderboard

📈

Explore and submit evaluations for code generation models
Runtime error

78

Human & GPT-4 Evaluation of LLMs Leaderboard

👩

Running

450

Can Ai Code Results

🏆

Can AI Code? An LLM leaderboard including quantized models.
Runtime error

145

Hallucinations Leaderboard

🔥

View and submit LLM evaluations
Build error

105

Enterprise Scenarios Leaderboard

🥇

Running on CPU Upgrade

93

LLM Safety Leaderboard

🥇

Explore and submit LLM benchmarks
Running

Featured

560

Vision Arena (Testing VLMs side-by-side)

🖼

Explore AI vision models by uploading an image
Running

71

CyberSecEvalTest

📈

Evaluate LLMs' cybersecurity risks and capabilities
Running

Featured

438

LLM Performance Leaderboard

🐨

View LLM performance rankings on an interactive leaderboard
Running on CPU Upgrade

75

AIR-Bench Leaderboard

🥇

Explore and compare QA and long-document benchmarks
Running on CPU Upgrade

986

Open VLM Leaderboard

🌎

VLMEvalKit Evaluation Results Collection
Running

420

Reward Bench Leaderboard

📐

Explore and compare LLM reward benchmark scores
Running

230

BigCodeBench Leaderboard

🥇

Explore and analyze code completion benchmarks
Runtime error

10

MJ Bench Leaderboard

🥇

Display and filter multimodal model leaderboard results
Running

116

MTEB Arena

⚔

Display MTEB Arena interface
Runtime error

Featured

151

Open LLM Progress Tracker

🔬

Visualize Open vs. Proprietary LLM Progress
Running

109

Judge Arena

💻

Compare and rank AI models through human voting