LLM-as-a-judge - a JM-Brun Collection

LLM-as-a-judge

updated Sep 30

Preference Leakage: A Contamination Problem in LLM-as-a-judge

Paper • 2502.01534 • Published Feb 3 • 40

Great Models Think Alike and this Undermines AI Oversight

Paper • 2502.04313 • Published Feb 6 • 33

CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives

Paper • 2504.10823 • Published Apr 15 • 15

CLEAR: Error Analysis via LLM-as-a-Judge Made Easy

Paper • 2507.18392 • Published Jul 24 • 19

TrustJudge: Inconsistencies of LLM-as-a-Judge and How to Alleviate Them

Paper • 2509.21117 • Published Sep 25 • 29