Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Published in arXiv, 2024
A comprehensive study of the LLM-as-a-judge paradigm in a controlled setup that reveals new results about its strengths and weaknesses.
Published in Reinforcement Learning Conference, 2024
We propose a novel benchmark MDP for sepsis treatment in the ICU built using medical data from real patients.