A new benchmark called OTelBench reveals that even advanced AI models stumble on straightforward site‑reliability tasks: Opus 4.5 scored just 29%. The results raise questions about how ready AI really is to handle everyday operations work.
https://quesma.com/blog/introducing-otel-bench/