Walks through agent install, library instrumentation, and service configuration for your language and framework. Outputs setup code and explains what you're measuring.
Best for: Engineers setting up performance monitoring without reading docs for every runtime variation.
Creator's repository · datadog-labs/agent-skills
License: MIT
---
name: dd-apm
description: APM - install, onboard, instrument, enable, set up, configure, traces, services, dependencies, performance analysis. Use for any request involving Datadog APM setup, instrumentation (SSI, ddtrace, agent install), or analysis.
alwaysApply: true
metadata:
version: "1.1.0"
author: datadog-labs
repository: https://github.com/datadog-labs/agent-skills
tags: datadog,apm,tracing,performance,distributed-tracing,dd-apm,install,onboarding,instrumentation,ssi,agent
globs: "**/ddtrace*,**/datadog*.yaml,**/*trace*"
---
# Datadog APM
Distributed tracing, service maps, and performance analysis.
## Routing — Read This First
Match the user's request to one of the entries below. Each entry has the same shape: **triggers** → which sub-skill to load → the anti-pattern to avoid. If a request seems to fit more than one entry, see "Overlap disambiguation". If nothing matches, see "None of the above" at the end.
---
**Kubernetes APM install / instrument / onboard** — trigger when the user mentions Kubernetes, K8s, EKS, GKE, AKS, kind, minikube, K3s, helm, DatadogAgent CR, kubectl, SSI on a cluster, pod injection, or init containers.
**Immediately read** `.claude/skills/dd-apm/k8s-ssi/agent-install/SKILL.md` now, then `.claude/skills/dd-apm/k8s-ssi/enable-ssi/SKILL.md`, then `.claude/skills/dd-apm/k8s-ssi/verify-ssi/SKILL.md` — do not proceed from memory.
> **Common wrong approaches that LOOK like they work but silently fail:**
> - `helm install datadog datadog/datadog` — the standard chart does NOT support SSI via DatadogAgent CR.
> - Adding `ddtrace` imports or `ddtrace-run` to the app — SSI auto-instruments WITHOUT any code changes.
> - `admission.datadoghq.com/enabled` annotations — that's admission controller config injection, not SSI init container injection.
---
**Linux APM install / instrument / onboard** — trigger when the user mentions a single host, VM, EC2 instance, bare-metal, RHEL/Ubuntu/Debian, systemd, or no orchestrator.
**Immediately read** `.claude/skills/dd-apm/linux-ssi/agent-install/SKILL.md` now, then `.claude/skills/dd-apm/linux-ssi/enable-ssi/SKILL.md`, then `.claude/skills/dd-apm/linux-ssi/verify-ssi/SKILL.md` — do not proceed from memory.
> **Do NOT** install the agent via plain `apt-get install datadog-agent` (or yum equivalent) and assume SSI follows — host auto-instrumentation requires the install script with the SSI flags, which the sub-skill walks through.
---
**Service rename / service remapping** — trigger when the user mentions renaming a service, collapsing multiple service names, stripping suffixes/prefixes, or cleaning up inferred services.
**Immediately read** `.claude/skills/dd-apm/service-remapping/SKILL.md` now — do not proceed from memory.
> **Do NOT** change `tags.datadoghq.com/service` labels or `DD_SERVICE` env vars to rename a service in Datadog. That requires a rollout and only affects new data. Use a service remapping rule — it rewrites the name at ingestion time with no deployment change.
---
### Overlap disambiguation
When a request could plausibly fit more than one entry above, use these tiebreakers:
| Hint | Route to |
|---|---|
| Cluster orchestrator mentioned (EKS/GKE/AKS/kind/K3s/minikube) — even if "just one node" | k8s-ssi |
| Single host, VM, or EC2 with no orchestrator | linux-ssi |
| "Several services that should be one" | service-remapping — the sub-skill picks the rule type based on whether the duplicates are real instrumented services or inferred entities (DBs, queues, external APIs) |
| "My service shows under the wrong name" | First check `DD_SERVICE` on the deploy. If correct and the name is still wrong → service-remapping. |
| "Reduce APM volume / cost / noise" | No sub-skill yet. Ask whether the user means sampling (fewer ingested traces) or retention filters (less indexed data) before suggesting commands. |
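
The entries and tiebreakers above can be sketched as a keyword matcher. This is an illustration only: `route_request` and its keyword lists are hypothetical, whole-word matching is far cruder than reading the request in context, and the ambiguous trigger `kind` is left out on purpose. Orchestrator keywords are checked first, so a request that mentions both EC2 and K3s routes to k8s-ssi, as the first tiebreaker requires.

```bash
# Hypothetical helper: map a free-text request to a sub-skill name.
# Prints one of: k8s-ssi, service-remapping, linux-ssi, none.
route_request() {
  local req="$1"
  # -w matches whole words only, so "breaks" does not trigger "aks".
  if printf '%s\n' "$req" | grep -Eqiw 'kubernetes|k8s|eks|gke|aks|minikube|k3s|helm|kubectl|datadogagent'; then
    echo "k8s-ssi"            # any orchestrator mention wins, even on "one node"
  elif printf '%s\n' "$req" | grep -Eqiw 'rename|renaming|remap|remapping'; then
    echo "service-remapping"
  elif printf '%s\n' "$req" | grep -Eqiw 'single host|vm|ec2|bare[- ]metal|rhel|ubuntu|debian|systemd'; then
    echo "linux-ssi"
  else
    echo "none"               # fall through to "None of the above"
  fi
}
```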
---
### None of the above
If the request doesn't match any entry above, continue reading the trace-search, service analysis, and metrics content below. If even that doesn't fit, **ask the user to clarify** — do not invent a workflow.
---
## Requirements
The Datadog Labs `pup` CLI must be installed. If it is not, see [Setup Pup](https://github.com/datadog-labs/agent-skills/tree/main?tab=readme-ov-file#setup-pup).
## Command Execution Order (Token-Efficient)
For scoped commands, use this order:
1. Check context first (prior outputs, conversation, saved values).
2. If a required value is missing, run a discovery command first.
3. If still ambiguous, ask the user to confirm.
4. Then run the target command.
5. Avoid speculative commands likely to fail.
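
The first three steps can be sketched as a small resolver. `resolve_value` is a hypothetical helper, not part of pup: it takes a value already known from context (possibly empty) followed by an optional discovery command, and returns non-zero when the caller should ask the user instead of guessing.

```bash
# Usage: resolve_value KNOWN_VALUE [DISCOVERY_CMD ARGS...]
resolve_value() {
  local value="$1"
  shift
  # Step 1: a value already in context wins; nothing is executed.
  if [ -n "$value" ]; then
    printf '%s\n' "$value"
    return 0
  fi
  # Step 2: run the caller-supplied discovery command, if one was given.
  if [ "$#" -gt 0 ]; then
    value=$("$@" 2>/dev/null) || value=""
  fi
  # Step 3: empty or multi-line output is ambiguous, so signal "ask the user".
  local lines
  lines=$(printf '%s\n' "$value" | wc -l | tr -d ' ')
  if [ -z "$value" ] || [ "$lines" -ne 1 ]; then
    return 1
  fi
  printf '%s\n' "$value"
}
```

Only after this returns a single value would the target command run (step 4). A non-zero return is the cue to stop and confirm with the user, not to try a speculative command.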
## Quick Start
```bash
pup auth login
# Confirm env tag with the user first (do not assume production/prod/prd).
pup apm services list --env <env> --from 1h --to now
pup traces search --query "service:api-gateway" --from 1h
```
## Services
### List Services
```bash
pup apm services list --env <env> --from 1h --to now
```
### Service Stats
```bash
pup apm services stats --env <env> --from 1h --to now
```
### Service Map
```bash
# View dependencies
pup apm flow-map --query "service:api-gateway&from=$(($(date +%s)-3600))000&to=$(date +%s)000" --env <env> --limit 10
```
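The command above calls `date` twice, so the two timestamps can disagree if the clock ticks between the calls. A minimal sketch of a helper that reads the clock once and builds the same `from`/`to` fragment in epoch milliseconds follows. `window_ms` is a hypothetical name, and the fragment format is copied from the example above, not taken from pup documentation.

```bash
# Usage: window_ms LOOKBACK_SECONDS [NOW_EPOCH_SECONDS]
# Prints "from=<ms>&to=<ms>" for the window ending at NOW (default: current time).
window_ms() {
  local lookback_s="$1"
  local now_s="${2:-$(date +%s)}"   # read the clock once for both bounds
  printf 'from=%s000&to=%s000\n' "$((now_s - lookback_s))" "$now_s"
}

# Intended use (not executed here):
#   pup apm flow-map --query "service:api-gateway&$(window_ms 3600)" --env <env> --limit 10
```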
## Traces
### Search Traces
```bash
# By service
pup traces search --query "service:api-gateway" --from 1h
# Errors only
pup traces search --query "service:api-gateway status:error" --from 1h
# Slow traces (>1s)
pup traces search --query "service:api-gateway @duration:>1000ms" --from 1h
# With specific tag
pup traces search --query "service:api-gateway @http.url:/api/users" --from 1h
```
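Datadog records span duration in nanoseconds. Whether `pup traces search` accepts unit suffixes such as `1000ms` is not confirmed here, so if a duration filter returns nothing it may be worth retrying with a raw nanosecond value. A small conversion sketch, with `to_ns` as a hypothetical helper name:

```bash
# Usage: to_ns <integer><unit>, where unit is ns, us, ms, or s.
# Prints the duration in nanoseconds; returns 1 on anything else.
to_ns() {
  local amount="${1%%[a-z]*}"   # leading digits, e.g. "250" from "250ms"
  local unit="${1##*[0-9]}"     # trailing unit, e.g. "ms" from "250ms"
  case "$amount" in
    ''|*[!0-9]*) return 1 ;;    # reject empty or non-integer amounts
  esac
  case "$unit" in
    ns) echo "$amount" ;;
    us) echo "$((amount * 1000))" ;;
    ms) echo "$((amount * 1000000))" ;;
    s)  echo "$((amount * 1000000000))" ;;
    *)  return 1 ;;
  esac
}

# Intended use (not executed here):
#   pup traces search --query "service:api-gateway @duration:>$(to_ns 1s)" --from 1h
```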
### Trace Detail
```bash
# No direct get command for a single trace ID.
# Use traces search with a narrow query and time window.
pup traces search --query "trace_id:<trace_id>" --from 1h
```
## Key Metrics
| Metric | What It Measures |
|--------|------------------|
| `trace.http.request.hits` | Request count |
| `trace.http.request.duration` | Latency |
| `trace.http.request.errors` | Error count |
| `trace.http.request.apdex` | Apdex score: user satisfaction with response time, on a 0 to 1 scale |
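
For reference, the standard Apdex formula counts requests at or under a latency threshold T as satisfied and requests between T and 4T as tolerating, then divides (satisfied + tolerating / 2) by the total. A sketch of that arithmetic follows. `apdex` is a hypothetical helper, and the threshold applied to a given service is configured elsewhere and not shown here.

```bash
# Usage: apdex SATISFIED TOLERATING TOTAL
# Prints the score to two decimals; returns 1 when TOTAL is not positive.
apdex() {
  awk -v s="$1" -v t="$2" -v n="$3" \
    'BEGIN { if (n <= 0) exit 1; printf "%.2f\n", (s + t / 2) / n }'
}
```

With 80 satisfied and 10 tolerating requests out of 100, `apdex 80 10 100` prints `0.85`.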
## Service Level Objectives
Link APM to SLOs by creating an SLO from a definition file:
```bash
pup slos create --file slo.json
```
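The contents of `slo.json` are not defined in this document. As a sketch, the body below follows the shape of a metric-based SLO in the Datadog SLO API (`name`, `type`, `query.numerator`, `query.denominator`, `thresholds`). That `pup slos create --file` accepts exactly this shape is an assumption, as are the helper name `write_slo_json`, the SLO name, and the 99.9% over 30 days target.

```bash
# Usage: write_slo_json SERVICE ENV [OUTPUT_FILE]
# Writes an availability SLO definition built from the trace.http.request.* metrics.
write_slo_json() {
  local service="$1" env="$2" out="${3:-slo.json}"
  local scope="{service:${service},env:${env}}"
  cat > "$out" <<EOF
{
  "name": "${service} availability",
  "type": "metric",
  "description": "Share of requests to ${service} that complete without error",
  "query": {
    "numerator": "sum:trace.http.request.hits${scope}.as_count() - sum:trace.http.request.errors${scope}.as_count()",
    "denominator": "sum:trace.http.request.hits${scope}.as_count()"
  },
  "thresholds": [{ "timeframe": "30d", "target": 99.9 }],
  "tags": ["service:${service}", "env:${env}"]
}
EOF
}
```

As with the other commands, the env value passed in should be one the user has confirmed, not an assumed `prod`.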
## Common Queries
| Goal | Query |
|------|-------|
| Slowest endpoints | `avg:trace.http.request.duration{*} by {resource_name}` |
| Error rate | `sum:trace.http.request.errors{*} / sum:trace.http.request.hits{*}` |
| Throughput | `sum:trace.http.request.hits{*}.as_rate()` |
## Troubleshooting
| Problem | Fix |
|---------|-----|
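| No traces | Check that the tracing library (for example `ddtrace`) is installed or SSI is enabled, and that `DD_TRACE_ENABLED` is not set to `false` |
| Missing service | Verify that the `DD_SERVICE` env var is set on the deployment |
| Traces not linked | Check that trace context headers are propagated between services |
| High cardinality | Don't tag spans with unbounded values such as `user_id` or `request_id` |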
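
Some of the checks in the table can be run from inside the service's environment. A rough sketch, assuming the standard tracer variables `DD_TRACE_ENABLED`, `DD_SERVICE`, and `DD_TAGS`. `apm_preflight` is a hypothetical helper, and it cannot detect missing header propagation, which needs a look at the traces themselves.

```bash
# Prints one line per problem found; returns 1 if any problem was found.
apm_preflight() {
  local problems=0
  # Tracers treat an unset DD_TRACE_ENABLED as enabled; only an explicit false is a problem.
  if [ "${DD_TRACE_ENABLED:-true}" = "false" ]; then
    echo "DD_TRACE_ENABLED=false: tracing is switched off"
    problems=1
  fi
  if [ -z "${DD_SERVICE:-}" ]; then
    echo "DD_SERVICE is unset: the service may be missing or appear under a default name"
    problems=1
  fi
  # Per-user or per-request tag keys produce high-cardinality tags.
  case ",${DD_TAGS:-}," in
    *user_id:*|*request_id:*)
      echo "DD_TAGS contains user_id/request_id: high-cardinality tag"
      problems=1
      ;;
  esac
  return "$problems"
}
```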
## References/Docs
- [APM Setup](https://docs.datadoghq.com/tracing/)
- [Trace Search](https://docs.datadoghq.com/tracing/trace_explorer/)