Skill Entry

LangSmith production trace investigation playbook

Turns LangSmith observability documentation into a repeatable incident workflow for LLM and agent outages. Start from a failing run ID or thread, then use the UI or the LangSmith MCP tools (`fetch_runs`, `get_thread_history`) to reconstruct prompts, tool calls, and errors. Narrow scope with the documented filters (`run_type`, `is_root`, and the FQL `filter` / `trace_filter` / `tree_filter` parameters) before proposing code or prompt changes. The playbook cites the official pagination rules (character-budget pages reported through `page_number` and `total_pages`) so investigators do not assume a single response holds the whole trace. It also reminds teams to keep Cloud OAuth Remote MCP paths separate from self-hosted `LANGSMITH_ENDPOINT` configurations when collecting evidence.
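As a rough illustration of the narrowing step, the sketch below gathers the filter arguments named above into one dictionary. The helper `build_run_query`, the project name `rag-prod`, and the timestamp are hypothetical. The filter string follows the FQL style shown in the LangSmith docs and should be checked against them before use.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_run_query(
    project_name: str,
    *,
    run_type: Optional[str] = None,
    is_root: Optional[bool] = None,
    fql_filter: Optional[str] = None,
    trace_filter: Optional[str] = None,
    tree_filter: Optional[str] = None,
    start_time: Optional[datetime] = None,
    limit: int = 20,
) -> dict[str, Any]:
    """Hypothetical helper: keep only the narrowing arguments that were set."""
    candidates = {
        "project_name": project_name,
        "run_type": run_type,
        "is_root": is_root,
        "filter": fql_filter,          # applies to the runs being returned
        "trace_filter": trace_filter,  # applies to the root run of each trace
        "tree_filter": tree_filter,    # applies to any run in the trace tree
        "start_time": start_time,
        "limit": limit,
    }
    return {key: value for key, value in candidates.items() if value is not None}


# Root runs that recorded an error since the start of the incident window.
query = build_run_query(
    "rag-prod",  # hypothetical project name
    is_root=True,
    fql_filter="neq(error, null)",
    start_time=datetime(2026, 5, 18, tzinfo=timezone.utc),  # hypothetical window
)
```

With the LangSmith Python SDK installed, a dictionary like this could be passed as `Client().list_runs(**query)`. Whether the MCP `fetch_runs` tool accepts exactly the same argument names is an assumption to confirm in the MCP server docs.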

Category Debugging
Platform LangSmith / LangChain
Published 2026-05-19
Tags langsmith, tracing, debugging

Use cases

  • On-call receives a spike in 5xx or empty completions from a RAG route backed by LangChain
  • Product reports a single customer thread where the assistant contradicted policy mid-conversation
  • Release candidate shows higher p95 latency after a prompt swap without obvious infra regression
  • Security review asks for evidence that tool calls stayed within approved scopes during an agent run
  • Finance questions the trace usage on the bill after a marketing campaign drives traffic

Key features

  • Capture identifiers: project name, run UUID, thread ID, deployment version, and approximate timestamp window from the ticket.
  • Pull the root run with `fetch_runs` (set `is_root`, pass `limit`, and use documented FQL filters) or open the equivalent trace in the LangSmith UI.
  • If payloads truncate, page with `get_thread_history` or `fetch_runs` + `trace_id`, incrementing `page_number` until `total_pages` is exhausted.
  • Map the failure to a layer: retrieval miss, tool schema mismatch, model refusal, rate limit, or downstream HTTP error. Cite child spans rather than guessing.
  • Compare against the last known-good prompt revision via `get_prompt_by_name` or prompt hub history before editing production templates.
  • Record mitigations, owners, and whether a dataset example or online eval should guard the regression per LangSmith evaluation docs.
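The paging step in the list above can be sketched as a loop that keeps requesting pages until `total_pages` is reached. This is a minimal sketch, not the MCP client itself: `fetch_page` is a hypothetical callable standing in for a `fetch_runs` or `get_thread_history` call, the `result` key is an assumed field name, and 1-based page numbering is assumed.

```python
from typing import Any, Callable

Page = dict[str, Any]


def collect_all_pages(fetch_page: Callable[[int], Page], max_pages: int = 200) -> list[Page]:
    """Request page 1, 2, ... until the reported total_pages is reached."""
    pages: list[Page] = []
    page_number = 1  # assumption: pages are numbered from 1
    while True:
        page = fetch_page(page_number)
        pages.append(page)
        total_pages = int(page.get("total_pages", 1))
        if page_number >= total_pages:
            return pages
        if page_number >= max_pages:
            # Fail loudly instead of silently dropping later pages.
            raise RuntimeError(f"stopped at page {page_number} of {total_pages}")
        page_number += 1


def fake_fetch_page(page_number: int) -> Page:
    """Offline stand-in for the real tool call, with made-up trace content."""
    chunks = [
        "user: what is the refund policy?",
        "tool: search_kb returned 3 documents",
        "error: 429 from model API",
    ]
    return {
        "page_number": page_number,
        "total_pages": len(chunks),
        "result": chunks[page_number - 1],
    }


pages = collect_all_pages(fake_fetch_page)
evidence = [page["result"] for page in pages]
```

In this made-up trace the error only appears on the last page, which is the situation the pagination rule guards against: stopping after page 1 would leave the incident note without the failing span.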

When to Use This Skill

  • Production LLM regressions where unstructured application logs lack prompt/tool detail
  • Handoffs between support and engineering that already standardize on LangSmith projects
  • Postmortems requiring trace-backed timelines instead of chat screenshots

Expected Output

An incident note with run links, paginated evidence excerpts, hypothesized root cause tied to specific spans, and a verification plan (eval dataset or canary) before closing.
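One way to keep that note consistent across incidents is a small record type. The sketch below is illustrative only: the class names, field names, and the `is_closable` rule are assumptions layered on the description above, not a LangSmith schema, and the run link is left as a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceExcerpt:
    run_link: str      # link to the run in the LangSmith UI
    page_number: int   # which page of the paginated payload this came from
    total_pages: int
    excerpt: str


@dataclass
class IncidentNote:
    title: str
    root_cause_hypothesis: str
    suspect_span_ids: list[str]
    verification_plan: str  # for example an eval dataset or a canary
    evidence: list[EvidenceExcerpt] = field(default_factory=list)

    def is_closable(self) -> bool:
        """Closable only with evidence, span-backed hypothesis, and a plan."""
        return bool(
            self.evidence
            and self.suspect_span_ids
            and self.verification_plan.strip()
        )


note = IncidentNote(
    title="Empty completions on RAG route",
    root_cause_hypothesis="Model API rate limit surfaced as an empty answer",
    suspect_span_ids=["<span id from the trace>"],
    verification_plan="Add the failing input to an eval dataset and rerun",
    evidence=[
        EvidenceExcerpt(
            run_link="<run link>",
            page_number=3,
            total_pages=3,
            excerpt="error: 429 from model API",
        )
    ],
)
```

Recording `page_number` and `total_pages` next to each excerpt lets a reviewer see whether the full payload was read before the hypothesis was written.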

Frequently Asked Questions

Do we need the MCP server for every investigation?
No. The UI suffices for many cases; MCP accelerates agent-assisted triage when Cursor or Claude Code already has LangSmith credentials configured as the docs describe.
Why emphasize character pagination?
The LangSmith MCP docs cap each page by a character budget (roughly 25,000 characters by default) to keep assistants within context limits, so skipping pages can lose tool outputs.
Does this replace OWASP or threat modeling skills?
No. It focuses on observability forensics; pair it with security skills when investigating prompt injection or data exfiltration hypotheses.
