A Methodology for Evaluating LLMs on Any Task
The Problem
"Which LLM is best?" is the wrong question. "Best for what?" is the right one. Traditional evals tell you which model is smartest. This methodology tells you which model is right for your specific task. Every model has a design philosophy - a personality that shapes