
Production · Days 55-61

LLMOps and Observability

Production AI needs traces, token accounting, model and prompt versions, A/B tests, replay tools, drift detection, and feedback loops.

Intermediate · 7 subtopics · 7 daily blocks

Outcome

Trace, debug, version, replay, and monitor AI features so production behavior is visible instead of magical.

Practice builds

LLM tracing middleware
Prompt replay console
Feedback-to-eval pipeline

What to learn

Tracing with Langfuse, LangSmith, Helicone, Arize Phoenix, Braintrust
Token, cost, and latency tracking per request and per user
Prompt and agent versioning
A/B testing prompts and models in production (see the bucketing sketch after this list)
Replay and debugging of failed runs
Drift detection from silent model upgrades
Feedback loops from thumbs up/down into eval datasets
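
The A/B testing item above comes down to stable bucketing: each user must always see the same variant, or the comparison between arms is meaningless. A minimal sketch, assuming hypothetical variant labels and a two-arm test; hashing the user ID together with the experiment name keeps assignment deterministic across requests.

```python
# A minimal sketch of deterministic A/B bucketing for prompt variants.
# The variant labels and experiment name are hypothetical; any stable hash
# of the user ID works, as long as the mapping never changes mid-test.
import hashlib

VARIANTS = ["prompt_v1", "prompt_v2"]  # hypothetical prompt version labels

def assign_variant(user_id: str, experiment: str) -> str:
    """Hash user ID + experiment name so each user always sees one variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(VARIANTS)
    return VARIANTS[bucket]

if __name__ == "__main__":
    # Same user, same experiment -> same variant on every request.
    print(assign_variant("user-123", "summarizer-tone-test"))
```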

Daily study plan

Day 55: Add request IDs and structured logs around LLM calls (sketch below).
Day 56: Track token count, cost, latency, model, and user ID per request (sketch below).
Day 57: Add trace spans for prompt assembly, retrieval, model call, and tools (sketch below).
Day 58: Version prompts and agent workflows.
Day 59: Replay failed runs from stored inputs and context snapshots (sketch below).
Day 60: Add feedback capture and convert examples into eval rows (sketch below).
Day 61: Design a dashboard for cost, latency, and failure hot spots.
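
For Day 55, a minimal sketch of structured logging around an LLM call. The call_llm() helper is a hypothetical stand-in for whatever client you use; the point is one request ID and one JSON log line per call, emitted whether the call succeeds or fails.

```python
# A minimal sketch for Day 55: wrap each LLM call with a request ID and a
# structured JSON log line. call_llm() is a hypothetical stand-in client.
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("llm")

def call_llm(prompt: str) -> str:  # hypothetical stand-in for a real client
    return f"echo: {prompt}"

def traced_call(prompt: str) -> str:
    request_id = str(uuid.uuid4())
    start = time.perf_counter()
    status = "error"  # assume failure until the call returns
    try:
        response = call_llm(prompt)
        status = "ok"
        return response
    finally:
        # The finally block runs on both success and exception, so every
        # call produces exactly one structured log line.
        log.info(json.dumps({
            "request_id": request_id,
            "event": "llm_call",
            "status": status,
            "latency_ms": round((time.perf_counter() - start) * 1000, 1),
            "prompt_chars": len(prompt),
        }))
```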
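
For Day 56, a sketch of a per-request usage record. The price table is a placeholder, not real rates; in practice the token counts come from the provider's response object rather than being typed in by hand.

```python
# A minimal sketch for Day 56: one usage record per request, with cost
# derived from token counts. PRICE_PER_1K holds assumed rates, not real ones.
from dataclasses import asdict, dataclass

PRICE_PER_1K = {"some-model": {"input": 0.0005, "output": 0.0015}}  # assumed rates

@dataclass
class UsageRecord:
    request_id: str
    user_id: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float

    @property
    def cost_usd(self) -> float:
        # Cost = tokens / 1000 * per-1K price, separately for input and output.
        p = PRICE_PER_1K[self.model]
        return (self.input_tokens / 1000) * p["input"] + \
               (self.output_tokens / 1000) * p["output"]

record = UsageRecord("req-1", "user-123", "some-model", 812, 256, 1430.0)
print(asdict(record) | {"cost_usd": round(record.cost_usd, 6)})
```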
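
For Day 57, a hand-rolled span sketch that only shows the shape of the data. Real SDKs such as OpenTelemetry or the tracing tools listed above handle context propagation and export for you; this version just times each stage and tags it with a shared trace ID.

```python
# A minimal sketch for Day 57: nested spans around the stages of one request.
# Spans are collected in memory here; a real system would export them.
import time
import uuid
from contextlib import contextmanager

spans = []  # in-memory stand-in for a span exporter

@contextmanager
def span(name: str, trace_id: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({
            "trace_id": trace_id,
            "name": name,
            "duration_ms": round((time.perf_counter() - start) * 1000, 1),
        })

trace_id = str(uuid.uuid4())
with span("request", trace_id):
    with span("prompt_assembly", trace_id):
        time.sleep(0.01)  # stand-in for template rendering
    with span("retrieval", trace_id):
        time.sleep(0.02)  # stand-in for a vector store query
    with span("model_call", trace_id):
        time.sleep(0.05)  # stand-in for the LLM request
print(spans)
```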
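
For Day 59, a replay sketch: persist the exact prompt, retrieved context, and model name at failure time, then re-run from the snapshot later. The snapshot schema, directory, and run() helper are assumptions about your pipeline.

```python
# A minimal sketch for Day 59: snapshot everything a run needs, then replay it.
import json
from pathlib import Path

SNAP_DIR = Path("snapshots")  # assumed location for stored failed runs

def snapshot(request_id: str, prompt: str, context: list[str], model: str) -> None:
    """Store the exact inputs of a failed run as a JSON file."""
    SNAP_DIR.mkdir(exist_ok=True)
    (SNAP_DIR / f"{request_id}.json").write_text(json.dumps({
        "prompt": prompt, "context": context, "model": model,
    }))

def replay(request_id: str) -> str:
    """Re-run a stored failure with the exact same inputs and context."""
    snap = json.loads((SNAP_DIR / f"{request_id}.json").read_text())
    return run(snap["prompt"], snap["context"], snap["model"])

def run(prompt: str, context: list[str], model: str) -> str:
    # Hypothetical stand-in for your real pipeline entry point.
    return f"[{model}] {prompt} | {len(context)} context docs"
```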
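
For Day 60, a sketch that appends each piece of thumbs up/down feedback as a JSONL eval row. The schema and file path are assumptions; match them to whatever eval harness you use.

```python
# A minimal sketch for Day 60: turn user feedback into eval dataset rows.
import json
from pathlib import Path

EVAL_FILE = Path("evals/from_feedback.jsonl")  # assumed dataset location

def record_feedback(request_id: str, prompt: str, response: str,
                    thumbs_up: bool) -> None:
    """Append one feedback event as an eval row in JSONL format."""
    EVAL_FILE.parent.mkdir(parents=True, exist_ok=True)
    row = {
        "source": "user_feedback",
        "request_id": request_id,
        "input": prompt,
        "output": response,
        "label": "good" if thumbs_up else "bad",
    }
    with EVAL_FILE.open("a") as f:
        f.write(json.dumps(row) + "\n")
```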

Resources