
Production · Days 55-61

LLMOps and Observability

Production AI needs traces, token accounting, model and prompt versions, A/B tests, replay tools, drift detection, and feedback loops.

Intermediate · 7 subtopics · 7 daily blocks

Outcome

Trace, debug, version, replay, and monitor AI features so production behavior is visible instead of magical.

Practice builds

LLM tracing middleware
Prompt replay console
Feedback-to-eval pipeline

What to learn

Tracing with Langfuse, LangSmith, Helicone, Arize Phoenix, Braintrust
Token, cost, and latency tracking per request and per user
Prompt and agent versioning
A/B testing prompts and models in production (see the bucketing sketch after this list)
Replay and debugging of failed runs
Drift detection from silent model upgrades
Feedback loops from thumbs up/down into eval datasets
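
The A/B testing item above comes down to stable bucketing: each user must always see the same variant, or the comparison between arms is meaningless. A minimal sketch, assuming hypothetical variant labels and a two-arm test; hashing the user ID together with the experiment name keeps assignment deterministic across requests.

```python
# A minimal sketch of deterministic A/B bucketing for prompt variants.
# The variant labels and experiment name are hypothetical; any stable hash
# of the user ID works, as long as the mapping never changes mid-test.
import hashlib

VARIANTS = ["prompt_v1", "prompt_v2"]  # hypothetical prompt version labels

def assign_variant(user_id: str, experiment: str) -> str:
    """Hash user ID + experiment name so each user always sees one variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(VARIANTS)
    return VARIANTS[bucket]

if __name__ == "__main__":
    # Same user, same experiment -> same variant on every request.
    print(assign_variant("user-123", "summarizer-tone-test"))
```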

Daily study plan

Day 55: Add request IDs and structured logs around LLM calls (sketch below).
Day 56: Track token count, cost, latency, model, and user ID per request (sketch below).
Day 57: Add trace spans for prompt assembly, retrieval, model call, and tools (sketch below).
Day 58: Version prompts and agent workflows.
Day 59: Replay failed runs from stored inputs and context snapshots (sketch below).
Day 60: Add feedback capture and convert examples into eval rows (sketch below).
Day 61: Design a dashboard for cost, latency, and failure hot spots.
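
For Day 55, a minimal sketch of structured logging around an LLM call. The call_llm() helper is a hypothetical stand-in for whatever client you use; the point is one request ID and one JSON log line per call, emitted whether the call succeeds or fails.

```python
# A minimal sketch for Day 55: wrap each LLM call with a request ID and a
# structured JSON log line. call_llm() is a hypothetical stand-in client.
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("llm")

def call_llm(prompt: str) -> str:  # hypothetical stand-in for a real client
    return f"echo: {prompt}"

def traced_call(prompt: str) -> str:
    request_id = str(uuid.uuid4())
    start = time.perf_counter()
    status = "error"  # assume failure until the call returns
    try:
        response = call_llm(prompt)
        status = "ok"
        return response
    finally:
        # The finally block runs on both success and exception, so every
        # call produces exactly one structured log line.
        log.info(json.dumps({
            "request_id": request_id,
            "event": "llm_call",
            "status": status,
            "latency_ms": round((time.perf_counter() - start) * 1000, 1),
            "prompt_chars": len(prompt),
        }))
```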
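
For Day 56, a sketch of a per-request usage record. The price table is a placeholder, not real rates; in practice the token counts come from the provider's response object rather than being typed in by hand.

```python
# A minimal sketch for Day 56: one usage record per request, with cost
# derived from token counts. PRICE_PER_1K holds assumed rates, not real ones.
from dataclasses import asdict, dataclass

PRICE_PER_1K = {"some-model": {"input": 0.0005, "output": 0.0015}}  # assumed rates

@dataclass
class UsageRecord:
    request_id: str
    user_id: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float

    @property
    def cost_usd(self) -> float:
        # Cost = tokens / 1000 * per-1K price, separately for input and output.
        p = PRICE_PER_1K[self.model]
        return (self.input_tokens / 1000) * p["input"] + \
               (self.output_tokens / 1000) * p["output"]

record = UsageRecord("req-1", "user-123", "some-model", 812, 256, 1430.0)
print(asdict(record) | {"cost_usd": round(record.cost_usd, 6)})
```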
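
For Day 57, a hand-rolled span sketch that only shows the shape of the data. Real SDKs such as OpenTelemetry or the tracing tools listed above handle context propagation and export for you; this version just times each stage and tags it with a shared trace ID.

```python
# A minimal sketch for Day 57: nested spans around the stages of one request.
# Spans are collected in memory here; a real system would export them.
import time
import uuid
from contextlib import contextmanager

spans = []  # in-memory stand-in for a span exporter

@contextmanager
def span(name: str, trace_id: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({
            "trace_id": trace_id,
            "name": name,
            "duration_ms": round((time.perf_counter() - start) * 1000, 1),
        })

trace_id = str(uuid.uuid4())
with span("request", trace_id):
    with span("prompt_assembly", trace_id):
        time.sleep(0.01)  # stand-in for template rendering
    with span("retrieval", trace_id):
        time.sleep(0.02)  # stand-in for a vector store query
    with span("model_call", trace_id):
        time.sleep(0.05)  # stand-in for the LLM request
print(spans)
```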
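
For Day 59, a replay sketch: persist the exact prompt, retrieved context, and model name at failure time, then re-run from the snapshot later. The snapshot schema, directory, and run() helper are assumptions about your pipeline.

```python
# A minimal sketch for Day 59: snapshot everything a run needs, then replay it.
import json
from pathlib import Path

SNAP_DIR = Path("snapshots")  # assumed location for stored failed runs

def snapshot(request_id: str, prompt: str, context: list[str], model: str) -> None:
    """Store the exact inputs of a failed run as a JSON file."""
    SNAP_DIR.mkdir(exist_ok=True)
    (SNAP_DIR / f"{request_id}.json").write_text(json.dumps({
        "prompt": prompt, "context": context, "model": model,
    }))

def replay(request_id: str) -> str:
    """Re-run a stored failure with the exact same inputs and context."""
    snap = json.loads((SNAP_DIR / f"{request_id}.json").read_text())
    return run(snap["prompt"], snap["context"], snap["model"])

def run(prompt: str, context: list[str], model: str) -> str:
    # Hypothetical stand-in for your real pipeline entry point.
    return f"[{model}] {prompt} | {len(context)} context docs"
```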
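
For Day 60, a sketch that appends each piece of thumbs up/down feedback as a JSONL eval row. The schema and file path are assumptions; match them to whatever eval harness you use.

```python
# A minimal sketch for Day 60: turn user feedback into eval dataset rows.
import json
from pathlib import Path

EVAL_FILE = Path("evals/from_feedback.jsonl")  # assumed dataset location

def record_feedback(request_id: str, prompt: str, response: str,
                    thumbs_up: bool) -> None:
    """Append one feedback event as an eval row in JSONL format."""
    EVAL_FILE.parent.mkdir(parents=True, exist_ok=True)
    row = {
        "source": "user_feedback",
        "request_id": request_id,
        "input": prompt,
        "output": response,
        "label": "good" if thumbs_up else "bad",
    }
    with EVAL_FILE.open("a") as f:
        f.write(json.dumps(row) + "\n")
```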

Resources