In Defense of AI Evals, for Everyone
Trey Causey
The more interesting question, then, is not whether you do evals, but when you can afford to be less rigorous and when you cannot.
It’s OK to be less rigorous when your tasks are already heavily baked into the foundation model’s post-training, as with coding.
It’s also OK when you have sufficient domain expertise and you dogfood early and often.
In the end, it’s always better to avoid Twitter and form your own conclusions about things instead of parroting the hot takes of the moment.