When 4 of 40 Models Beat Coin Flip: Measuring Claims About Anthropic Opus and Claude Upgrades
https://reportz.io/ai/when-40-ai-models-faced-1200-hard-questions-what-the-numbers-actually-show/
Only 4 of 40 Models Beat Coin Flip on Hard Questions About Anthropic Opus Improvements
The data suggests a surprising gap between vendor claims and real-world discriminative power on narrowly targeted technical questions.