Feb 20, 2025 · 9 min
The agent reliability problem
Why the agent demo that wowed everyone falls apart in production: compounding error, brittle tool use, planning that doesn't replan, and the verification loops that actually make autonomy survivable.
#agents #reliability #tool-use
Feb 2, 2025 · 9 min
RAG is becoming retrieval-plus-reasoning
Naive vector RAG was a 2023 pattern. What works now is hybrid retrieval, re-ranking, context engineering, and treating retrieval as a step the model reasons over — not a lookup it trusts.
#rag #retrieval #reranking
Jan 14, 2025 · 8 min
Evals that predict production behavior, not vibes
Why most eval suites pass while production regresses, how LLM-as-judge quietly lies to you, and the harness that actually catches what ships broken.
#evals #llm-as-judge #regression