Finds genuine performance bottlenecks using CPU profiles, flame graphs, memory traces, and system metrics under realistic load before any code is rewritten. This prevents the common anti-pattern of spending days optimizing code that is not on the critical path, guided by intuition rather than measurement.
Use cases
- A latency regression where p99 response time doubled after a recent deployment without an obvious cause
- An endpoint that is consuming disproportionate CPU or memory relative to its traffic share
- A garbage collection pause that is causing visible latency spikes in a JVM or Node.js service
- A memory leak suspected based on gradually increasing heap usage over days of operation
- A new feature that works correctly in development but degrades significantly under production load patterns
Key features
- Reproduce the performance issue under a realistic load profile that matches production traffic patterns in volume and diversity
- Capture a CPU or memory profile using a profiling tool appropriate for your runtime—flame graphs for CPU, heap dumps for memory
- Identify the hottest code paths: the functions that consume the most CPU time or allocate the most memory
- Form a hypothesis about the root cause (algorithmic complexity, memory allocation pattern, lock contention) before changing code
- Validate the fix by running the same benchmark or profile with the same load harness and confirming the metric improved
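The steps above can be sketched with Python's built-in cProfile and pstats modules. This is a minimal illustration, not a prescribed toolchain: the handler, both dedupe functions, and the synthetic load are hypothetical stand-ins, and a real investigation would replay production-like traffic and typically render the result as a flame graph.

```python
import cProfile
import pstats


def slow_dedupe(items):
    """Hypothetical hot path: O(n^2), because `in` scans a list each time."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def fast_dedupe(items):
    """Candidate fix: O(n) membership checks with a set, order preserved."""
    seen = set()
    unique = []
    for item in items:
        if item not in seen:
            seen.add(item)
            unique.append(item)
    return unique


def handle_request(items, dedupe):
    """Hypothetical endpoint handler that calls the suspect function."""
    total = sum(items)
    unique = dedupe(items)
    return total, len(unique)


def profile_total_time(func, *args):
    """Run func under cProfile; return (pstats.Stats, total profiled seconds)."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stats = pstats.Stats(profiler)
    return stats, stats.total_tt


def hottest_function(stats):
    """Name of the function with the largest self time (tottime)."""
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    key = max(stats.stats, key=lambda k: stats.stats[k][2])
    return key[2]


if __name__ == "__main__":
    # Assumption: a synthetic load standing in for replayed production traffic.
    load = list(range(2000)) * 2

    before_stats, before_total = profile_total_time(handle_request, load, slow_dedupe)
    print("hottest function before fix:", hottest_function(before_stats))
    before_stats.sort_stats("tottime").print_stats(3)

    # Same load, same harness, only the suspect function swapped out.
    _, after_total = profile_total_time(handle_request, load, fast_dedupe)
    print(f"profiled time before: {before_total:.4f}s, after: {after_total:.4f}s")
```

The hypothesis here (quadratic membership checks) is stated before the code changes, and the fix is accepted only because the same harness reports a lower total afterwards.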
When to Use This Skill
- When a latency regression is suspected but the cause is not visible in application logs
- When adding a new feature that processes large data volumes and you want to establish a performance baseline
- When CI or a performance monitor has surfaced a degradation that needs root-cause diagnosis
Expected Output
A performance profile (flame graph, heap snapshot, or trace) with annotated hotspots and a verified fix that improves the measured metric.
Frequently Asked Questions
- What is the difference between profiling and benchmarking?
- Profiling identifies where time or memory is spent in an existing program. Benchmarking measures the absolute performance of a specific operation or workload. Profile first to find the problem area, then benchmark to measure whether the fix helps.
- How do I profile a production service without disrupting it?
- Use continuous profiling tools (e.g., Pyroscope, Parca) that sample profiles with low overhead in production. For one-off profiling sessions, take a short (30-60 second) sample during a low-traffic window to minimize impact.
- What if the flame graph shows everything is slow with no clear hotspot?
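- This often means the bottleneck is in I/O (database queries, network calls, file reads) rather than CPU. A CPU flame graph only counts time spent executing on the processor, so time spent waiting does not appear in it. Use distributed tracing to identify which downstream calls are adding latency, or switch to a wall-clock (off-CPU) profile so that time spent waiting on I/O becomes visible.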
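A quick way to check whether a code path is dominated by waiting rather than computing is to compare wall-clock time with CPU time for the same call. The sketch below uses only the standard time module; `fake_downstream_call` is a hypothetical stand-in that sleeps in place of a real database or network call.

```python
import time


def measure(func, *args):
    """Return (wall_seconds, cpu_seconds) for a single call of func."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def waiting_fraction(wall, cpu):
    """Share of wall-clock time during which the process was not on the CPU."""
    if wall <= 0:
        return 0.0
    return max(0.0, 1.0 - cpu / wall)


def fake_downstream_call(seconds):
    """Hypothetical I/O: sleeping stands in for a database or network wait."""
    time.sleep(seconds)


def cpu_bound_work(n):
    """Pure computation for comparison."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for name, func, arg in [
        ("downstream call", fake_downstream_call, 0.2),
        ("cpu-bound work", cpu_bound_work, 500_000),
    ]:
        wall, cpu = measure(func, arg)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s "
              f"waiting={waiting_fraction(wall, cpu):.0%}")
```

A high waiting fraction suggests that a CPU profile will look flat and that tracing the downstream calls is the more useful next step.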
Related
Observability baselines
Establishes golden signals (latency, traffic, errors, saturation), SLO windows, and dashboard checks before agents automate deployments so that 'healthy' and 'degraded' have measurable definitions rather than subjective interpretations. This is essential when AI agents are managing deploys because agents need objective metrics to make decisions, not human gut feelings.
SEO audit for web properties
Diagnoses indexing, crawlability, and on-page SEO issues across an entire site using automated crawls, Lighthouse checks, and structured output. An SEO audit surfaces actionable findings ranked by priority before manual review, making it possible to address critical issues quickly rather than discovering them through traffic drops.
Structured logging
Defines a consistent set of log fields—request ID, user ID, feature flag, latency bucket, error code—so production debugging does not rely on grep across inconsistent printf-style strings. Structured JSON or key=value logging enables dashboards, alerts, and log aggregation tools to parse and query logs programmatically rather than through manual text searching.