Articles

Open writing on applied generative AI: evals, retrieval, agents, and where the field is heading.

Feb 20, 2025 · 9 min

The agent reliability problem

Why the agent demo that wowed everyone falls apart in production: compounding error, brittle tool use, planning that doesn't replan, and the verification loops that actually make autonomy survivable.

#agents #reliability #tool-use

Feb 2, 2025 · 9 min

RAG is becoming retrieval-plus-reasoning

Naive vector RAG was a 2023 pattern. What works now is hybrid retrieval, re-ranking, context engineering, and treating retrieval as a step the model reasons over — not a lookup it trusts.

#rag #retrieval #reranking

Jan 14, 2025 · 8 min

Evals that predict production behavior, not vibes

Why most eval suites pass while production regresses, how LLM-as-judge quietly lies to you, and the harness that actually catches what ships broken.

#evals #llm-as-judge #regression