Evals that predict production, not vibes
4:12 A free overview of why most LLM eval suites pass while production regresses, and the offline/online split that actually catches it.
RAG is becoming retrieval-plus-reasoning
6:48 Why naive vector RAG underperforms, and the hybrid-search + re-ranking stack that fixes most of it.