---
name: observing-agentforce
description: "Analyze production Agentforce agent behavior using session traces and Data Cloud. TRIGGER when: user queries STDM session data or Data Cloud trace records; investigates production agent failures, regressions, or performance issues; asks about session traces, conversation logs, or agent metrics; wants to reproduce a reported production issue in preview; runs findSessions or trace analysis queries. DO NOT TRIGGER when: user creates, modifies, or debugs .agent files during development (use developing-agentforce); writes or runs test specs (use testing-agentforce); uses sf agent preview for local development iteration; deploys or publishes agents."
allowed-tools: Bash Read Write Edit Glob Grep
license: Apache-2.0
metadata:
  version: "1.0"
  last_updated: "2026-04-08"
  argument-hint: "<org-alias> [--agent-file <path>] [--session-id <id>] [--days <n>]"
  compatibility: claude-code
---


# Agentforce Observability

Improve Agentforce agents using session trace data and live preview testing.

**Three-phase workflow:**
- **Observe** -- Query STDM sessions from Data Cloud (if available), OR run test suites + preview with local traces as fallback
- **Reproduce** -- Use `sf agent preview` to simulate problematic conversations live
- **Improve** -- Edit the `.agent` file directly, validate, publish, verify

---

## Platform Notes

- Shell examples below use bash syntax. On Windows, use PowerShell equivalents or Git Bash.
- Replace `python3` with `python` on Windows.
- Replace `/tmp/` with `$env:TEMP\` (PowerShell) or `%TEMP%\` (cmd).
- Replace `jq` with `python -c "import json,sys; ..."` if jq is not installed.

---

## Routing

Gather these inputs before starting:

- **Org alias** (required)
- **Agent API name** (required for preview and deploy; ask if not provided)
- **Agent file path** (optional) -- path to the `.agent` file, typically `force-app/main/default/aiAuthoringBundles/<AgentName>/<AgentName>.agent`. Auto-detect if not provided.
- **Session IDs** (optional) -- analyze specific sessions; if absent, query last 7 days
- **Days to look back** (optional, default 7)

Determine intent from user input:

- **No specific action** -> run all three phases: Observe -> surface issues -> ask if user wants to Reproduce and/or Improve
- **"analyze" / "sessions" / "what's wrong"** -> Phase 1 only, then suggest next steps
- **"reproduce" / "test" / "preview"** -> Phase 2 (run Phase 1 first if no issues in hand)
- **"fix" / "improve" / "update"** -> Phase 3 (run Phase 1 first if no issues in hand)
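
The routing rules above can be sketched as a small keyword matcher. This is only an illustration: the function name `route_intent`, the `have_issues` flag, and the order in which the keyword groups are checked are assumptions, and the substring matching is deliberately naive (for example, "test" would also match "latest").

```python
def route_intent(user_input: str, have_issues: bool = False) -> list[str]:
    """Return the ordered phases to run for a user request (hypothetical helper)."""
    text = user_input.lower()

    # "analyze" / "sessions" / "what's wrong" -> Phase 1 only
    if any(k in text for k in ("analyze", "sessions", "what's wrong")):
        return ["observe"]

    # "reproduce" / "test" / "preview" -> Phase 2, preceded by Phase 1 if no issues in hand
    if any(k in text for k in ("reproduce", "test", "preview")):
        return ["reproduce"] if have_issues else ["observe", "reproduce"]

    # "fix" / "improve" / "update" -> Phase 3, preceded by Phase 1 if no issues in hand
    if any(k in text for k in ("fix", "improve", "update")):
        return ["improve"] if have_issues else ["observe", "improve"]

    # No specific action: all three phases, with Reproduce and Improve
    # gated on user confirmation after Observe surfaces issues.
    return ["observe", "reproduce", "improve"]
```

For example, `route_intent("reproduce the bug")` yields `["observe", "reproduce"]` because no issues are in hand yet.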

### Resolve agent name

Before any STDM query, resolve the user-provided agent name against the org to get the exact `MasterLabel` and `DeveloperName`:

```bash
sf data query --json \
  --query "SELECT Id, MasterLabel, DeveloperName FROM GenAiPlannerDefinition WHERE MasterLabel LIKE '%<user-provided-name>%' OR DeveloperName LIKE '%<user-provided-name>%'" \
  -o <org>
```

- `MasterLabel` = display name used by STDM `findSessions` and Agent Builder UI (e.g. "Order Service")
- `DeveloperName` = API name with version suffix used in metadata (e.g. "OrderService_v9")
- The `--api-name` flag for `sf agent preview/activate/publish` uses `DeveloperName` **without** the `_vN` suffix (e.g. "OrderService")

Store these values:
- `AGENT_MASTER_LABEL` -- for `findSessions()` agent filter
- `AGENT_API_NAME` -- `DeveloperName` without `_vN` suffix, for `sf agent` CLI commands
- `PLANNER_ID` -- the Salesforce record ID for this agent
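
Deriving `AGENT_API_NAME` from `DeveloperName` amounts to dropping a trailing `_vN` suffix. A minimal sketch, assuming the suffix is always `_v` followed by digits at the end of the name (the helper name is hypothetical):

```python
import re


def api_name_from_developer_name(developer_name: str) -> str:
    """Strip a trailing _v<digits> version suffix, if one is present."""
    return re.sub(r"_v\d+$", "", developer_name)


print(api_name_from_developer_name("OrderService_v9"))  # OrderService
```

Anchoring the pattern at the end of the string keeps names that merely contain `_v` in the middle intact.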

### Locate the .agent file

**Step 1 -- Search locally:**

```bash
find <project-root>/force-app/main/default/aiAuthoringBundles -name "*.agent" 2>/dev/null
```

If the user provided an agent file path, use that directly. Otherwise, search for files matching `AGENT_API_NAME`.

**Step 2 -- If not found locally, retrieve from the org:**

```bash
sf project retrieve start --json --metadata "AiAuthoringBundle:<AGENT_API_NAME>" -o <org>
```

> **Known bug:** `sf project retrieve start` creates a double-nested path: `force-app/main/default/main/default/aiAuthoringBundles/...`. Fix it immediately after retrieve:

```bash
if [ -d "force-app/main/default/main/default/aiAuthoringBundles" ]; then
  mkdir -p force-app/main/default/aiAuthoringBundles
  cp -r force-app/main/default/main/default/aiAuthoringBundles/* \
    force-app/main/default/aiAuthoringBundles/
  rm -rf force-app/main/default/main
fi
```

**Step 3 -- Validate the retrieved file:**

Read the `.agent` file and verify it has proper Agent Script structure:
- `system:` block with `instructions:`
- `config:` block with `developer_name:`
- `start_agent` or `subagent` blocks with `reasoning: instructions:`
- Each subagent should have distinct `instructions:` content (not identical across subagents)

Store the resolved path as `AGENT_FILE` for Phase 3.

---

## Phase 0: Discover Data Space

Before running any STDM query, determine the correct Data Cloud Data Space API name.

```bash
sf api request rest "/services/data/v63.0/ssot/data-spaces" -o <org>
```

Note: `sf api request rest` is a beta command -- do not add `--json` (that flag is unsupported and causes an error).

The response shape is:
```json
{
  "dataSpaces": [
    {
      "id": "0vhKh000000g3DjIAI",
      "label": "default",
      "name": "default",
      "status": "Active",
      "description": "Your org's default data space."
    }
  ],
  "totalSize": 1
}
```

The `name` field is the API name to pass to `AgentforceOptimizeService`.

**Decision logic:**
- If the command fails (e.g. 404 or permission error), fall back to `'default'` and note it as an assumption.
- Filter to only `status: "Active"` entries.
- If exactly one active Data Space exists, use it automatically and confirm to the user: "Using Data Space: `<name>`".
- If multiple active Data Spaces exist, show the list (label + name) and ask the user which to use.

Store the selected `name` value as `DATA_SPACE` for all subsequent steps.
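
The decision logic above could be expressed roughly as follows. This is a sketch under assumptions: `select_data_space` is a hypothetical helper, `response` is the parsed JSON body (or `None` when the command failed), and the zero-active case is left to the caller because the rules above do not cover it.

```python
def select_data_space(response):
    """Return (name, candidates); name is None when the user must choose."""
    if response is None:
        # Command failed (e.g. 404 or permission error): assume 'default'.
        return "default", []

    active = [
        ds for ds in response.get("dataSpaces", [])
        if ds.get("status") == "Active"
    ]

    if len(active) == 1:
        # Exactly one active Data Space: use it automatically.
        return active[0]["name"], []

    # Several active Data Spaces: return (label, name) pairs to show the user.
    # Zero active Data Spaces yields (None, []), which the caller must handle.
    return None, [(ds["label"], ds["name"]) for ds in active]
```

When the first element is `None` and the candidate list is non-empty, present the candidates and ask which one to store as `DATA_SPACE`.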

### Prerequisite check: STDM DMOs

After deploying the helper class (step 1.0), run a quick probe to verify the STDM Data Model Objects exist in Data Cloud:

```bash
sf apex run -o <org> -f /dev/stdin << 'APEX'
ConnectApi.CdpQueryInput qi = new ConnectApi.CdpQueryInput();
qi.sql = 'SELECT ssot__Id__c FROM "ssot__AiAgentSession__dlm" LIMIT 1';
try {
    ConnectApi.CdpQueryOutputV2 out = ConnectApi.CdpQuery.queryAnsiSqlV2(qi, '<DATA_SPACE>');
    System.debug('STDM_CHECK:OK rows=' + (out.data != null ? out.data.size() : 0));
} catch (Exception e) {
    System.debug('STDM_CHECK:FAIL ' + e.getMessage());
}
APEX
```

**If `STDM_CHECK:FAIL`:** STDM is not activated. Inform the user and switch to **Phase 1-ALT**:

> STDM (Session Trace Data Model) is not available in this org. To enable it, go to Setup -> Data Cloud -> Data Streams and verify that "Agentforce Activity" is active. **Proceeding with fallback: test suites + local traces.**

**If `STDM_CHECK:OK`**, proceed to Phase 1 (STDM path).
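
One way to read the probe result from the captured `sf apex run` output is sketched below. It assumes the standard Apex debug log line shape (`...|USER_DEBUG|[n]|DEBUG|<message>`) and matches on the `|DEBUG|` prefix, because the log normally echoes the anonymous Apex source, which itself contains the literal marker strings. The helper name `stdm_available` is hypothetical.

```python
import re

# Match only real debug output, not the echoed "Execute Anonymous" source lines.
_MARKER = re.compile(r"\|DEBUG\|STDM_CHECK:(OK|FAIL)\s*(.*)")


def stdm_available(debug_log: str):
    """Return (ok, detail) based on the first STDM_CHECK debug line found."""
    for line in debug_log.splitlines():
        match = _MARKER.search(line)
        if match:
            return match.group(1) == "OK", match.group(2).strip()
    # No marker at all is treated as a failure so the fallback path is taken.
    return False, "no STDM_CHECK marker found"
```

A `False` result maps to the Phase 1-ALT branch; the `detail` string carries either the row count or the exception message for the user-facing note.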

---

## Phase 1-ALT: Observe Without STDM (Fallback Path)

When STDM is not available, use test suites and `sf agent preview --authoring-bundle` with local trace analysis.

| Data source | When to use | Pros | Cons |
|---|---|---|---|
| STDM (Phase 1) | Historical production analysis | Real user data, volume | Requires Data Cloud, 15-min lag |
| Test suites + local traces (Phase 1-ALT) | Dev iteration, orgs without STDM | Instant, full LLM prompt, variable state | Preview only, no real user data |

### 1-ALT.1 Run existing test suite (if available)

```bash
sf agent test list --json -o <org>
sf agent test run --json --api-name <TestSuiteName> --wait 10 --result-format json -o <org> | tee /tmp/test_run.json
JOB_ID=$(python3 -c "import json; print(json.load(open('/tmp/test_run.json'))['result']['runId'])")
sf agent test results --json --job-id "$JOB_ID" --result-format json -o <org>
```

### 1-ALT.2 Derive test utterances from .agent file (if no test suite)

If no test suite exists, derive utterances: one per non-entry subagent (from `description:` keywords), one per key action, one guardrail test, one multi-turn test.

### 1-ALT.3 Preview with `--authoring-bundle` (local traces)

Run each test utterance through preview to generate local trace files:

```bash
sf agent preview start --json --authoring-bundle <BundleName> -o <org> | tee /tmp/preview_start.json
SESSION_ID=$(python3 -c "import json; print(json.load(open('/tmp/preview_start.json'))['result']['sessionId'])")

sf agent preview send --json --session-id "$SESSION_ID" --authoring-bundle <BundleName> \
  --utterance "$UTT" -o <org> | tee /tmp/preview_response.json

sf agent preview end --json --session-id "$SESSION_ID" --authoring-bundle <BundleName> -o <org>
```

**Trace file location:** `.sfdx/agents/{BundleName}/sessions/{sessionId}/traces/{planId}.json`
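
To collect the trace files for a session without hand-building the path, something like the following could work. It is a sketch that assumes the layout shown above relative to the project root; `find_traces` is a hypothetical helper, and sorting by modification time is an assumption meant to approximate turn order.

```python
from pathlib import Path


def find_traces(bundle_name: str, session_id: str, project_root: str = "."):
    """Return the session's trace files, oldest first (empty list if none)."""
    traces_dir = (
        Path(project_root) / ".sfdx" / "agents" / bundle_name
        / "sessions" / session_id / "traces"
    )
    if not traces_dir.is_dir():
        return []
    # One JSON file per plan ID; order by modification time.
    return sorted(traces_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
```

Each returned path can then be used as `$TRACE` in the diagnosis commands.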

### 1-ALT.4 Local trace diagnosis

| Issue type | Trace command |
|---|---|
| Subagent misroute | `jq -r '.plan[] \| select(.type=="NodeEntryStateStep") \| .data.agent_name' "$TRACE"` |
| Action not called | `jq -r '.plan[] \| select(.type=="EnabledToolsStep") \| .data.enabled_tools[]' "$TRACE"` |
| LOW adherence | `jq -r '.plan[] \| select(.type=="ReasoningStep") \| {category, reason}' "$TRACE"` |
| Variable capture fail | `jq -r '.plan[] \| select(.type=="VariableUpdateStep") \| .data.variable_updates[]' "$TRACE"` |
| Vague instructions | `jq -r '.plan[] \| select(.type=="LLMStep") \| .data.messages_sent[0].content' "$TRACE"` |

**DefaultTopic trace quirk:** With `--authoring-bundle`, the root `.topic` field often shows `"DefaultTopic"` even when routing works. Always use `NodeEntryStateStep.data.agent_name` for the real subagent chain.
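
Where `jq` is not installed, the subagent chain can be read with a few lines of Python instead. This sketch mirrors the `NodeEntryStateStep` query from the table and assumes the same trace shape (`plan` as a list of steps with `type` and `data.agent_name`); the function names are hypothetical.

```python
import json


def load_trace(path):
    """Load a local trace file as a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def subagent_chain(trace: dict) -> list:
    """Return agent_name for every NodeEntryStateStep, in plan order."""
    return [
        step["data"]["agent_name"]
        for step in trace.get("plan", [])
        if step.get("type") == "NodeEntryStateStep"
    ]
```

Calling `subagent_chain(load_trace(path))` gives the real routing sequence regardless of what the root `.topic` field reports.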

**Entry answering directly (SMALL_TALK pattern):** If the `start_agent` trace shows `SMALL_TALK` grounding with transition tools visible but none invoked, add "You are a router only. Do NOT answer questions directly." to the `start_agent` instructions.

### 1-ALT.5 Classify and present

Classify issues using the categories in `references/issue-classification.md`. After presenting findings, automatically proceed to agent config evidence analysis.

---

## Phase 1: Observe -- Query STDM

> Full STDM query details, Apex service deployment, and response parsing: see `references/stdm-queries.md`

### 1.0 Deploy helper class (once per org)

Deploy `AgentforceOptimizeService` Apex class to the org. Check if already deployed first:

```bash
sf data query --json --query "SELECT Id, Name FROM ApexClass WHERE Name = 'AgentforceOptimizeService'" -o <org>
```

If not deployed, copy from skill directory and deploy. See `references/stdm-queries.md` for full steps.

### 1.1 Find sessions

Query recent sessions using `findSessions()`. Parse `DEBUG|STDM_RESULT:` from the Apex debug log. If `findSessions` returns empty, switch to Phase 1-ALT.
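
Extracting the payload from the debug log might look like the sketch below. It assumes each result is a single-line JSON document immediately following the `DEBUG|STDM_RESULT:` marker; lines that are truncated or not valid JSON are skipped. `extract_stdm_results` is a hypothetical helper, and `references/stdm-queries.md` remains the authority on the actual response format.

```python
import json

MARKER = "DEBUG|STDM_RESULT:"


def extract_stdm_results(debug_log: str) -> list:
    """Return every JSON payload that follows the STDM_RESULT marker."""
    results = []
    for line in debug_log.splitlines():
        _, found, payload = line.partition(MARKER)
        if not found:
            continue
        try:
            results.append(json.loads(payload))
        except json.JSONDecodeError:
            # Truncated or non-JSON debug line: ignore it.
            continue
    return results
```

An empty return value, or a payload with no sessions in it, is the signal to switch to Phase 1-ALT.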

### 1.2 Get conversation details

Use `getMultipleConversationDetails()` for up to 5 sessions (most recent first). Returns turn-by-turn data with messages, steps, topics, and action results.

### 1.2b Get LLM prompt/response (optional)

When LOW adherence is detected, use `getLlmStepDetails()` to get the actual LLM prompt and response.

### 1.2c Get aggregated metrics (recommended first step)

Use `getAggregatedMetrics()` for a high-level health dashboard: session rates, top intents, quality distribution, RAG averages.

### 1.2d Get moment insights (per-session detail)

Use `getMomentInsights()` for intent summaries, quality scores (1-5), and retriever metrics per session.

### 1.2e Run observability queries (RAG deep-dive)

Use `runObservabilityQuery()` for targeted RAG analysis: KnowledgeGap, Hallucination, RetrievalQuality, AnswerRelevancy, Leaderboard.

### 1.3 Reconstruct conversations

Render a turn-by-turn timeline from the `ConversationData` JSON for each session.

### 1.4 Identify issues

> Full issue pattern table and classification categories: see `references/issue-classification.md`

Check each session for: action errors, subagent misroutes, missing actions, wrong inputs, variable capture failures, no transitions, slow actions, LOW adherence, abandoned sessions, dead subagents, publish drift, dead hub anti-pattern, entry answering directly, and safety issues.

Priority: P1 = action errors, misroutes, LOW adherence; P2 = missing actions, variable bugs, knowledge gaps; P3 = performance, abandoned sessions.
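
The priority tiers can be applied mechanically once issues are tagged. In this sketch the issue-type keys (`action_error`, `misroute`, and so on) are hypothetical identifiers chosen for illustration; issue types without a stated priority sort last as unranked.

```python
# 1 = P1, 2 = P2, 3 = P3, following the priority line above.
PRIORITY = {
    "action_error": 1, "misroute": 1, "low_adherence": 1,
    "missing_action": 2, "variable_bug": 2, "knowledge_gap": 2,
    "performance": 3, "abandoned_session": 3,
}
UNRANKED = 99  # issue types the priority line does not mention


def prioritize(issues: list) -> list:
    """Sort issue dicts by priority; ties keep their original order."""
    return sorted(issues, key=lambda issue: PRIORITY.get(issue["type"], UNRANKED))
```

Because `sorted` is stable, issues within the same tier stay in the order they were found, which keeps the report aligned with the session timeline.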

### 1.5 Present findings and agent config evidence

Present sessions analyzed, issues grouped by root cause category, and uplift estimate. Then automatically proceed to analyze the `.agent` file to confirm root causes.

> Full structural analysis checks, cross-reference procedures, and publish drift detection: see `references/issue-classification.md`

Retrieve the `.agent` file from the org, run automated checks (subagent count vs action blocks, dead hub detection, orphan actions, cross-subagent variable dependencies), and cross-reference STDM symptoms against the file structure.

---

## Phase 2: Reproduce -- Live Preview

> Full preview procedures, trace diagnosis commands, and classification criteria: see `references/reproduce-reference.md`

Build one test scenario per confirmed issue from Phase 1. Run each through `sf agent preview` with `--authoring-bundle` (generates local traces). Run each scenario **3 times** and classify:

| Verdict | Criteria |
|---|---|
| `[CONFIRMED]` | Same failure in 3/3 runs |
| `[INTERMITTENT]` | Failure in 1-2 of 3 runs |
| `[NOT REPRODUCED]` | Passes in 3/3 runs |

Only `[CONFIRMED]` and `[INTERMITTENT]` issues proceed to Phase 3.
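
The verdict table reduces to counting failures across the three runs. A minimal sketch, where each entry in `run_failed` is `True` when that run showed the same failure (the function names are hypothetical):

```python
def classify(run_failed: list) -> str:
    """Map three pass/fail observations to a verdict label."""
    if len(run_failed) != 3:
        raise ValueError("expected exactly 3 runs")
    failures = sum(bool(f) for f in run_failed)
    if failures == 3:
        return "[CONFIRMED]"
    if failures == 0:
        return "[NOT REPRODUCED]"
    return "[INTERMITTENT]"  # failure in 1 or 2 of the 3 runs


def proceeds_to_phase_3(verdict: str) -> bool:
    """Only confirmed and intermittent issues move on to Phase 3."""
    return verdict in ("[CONFIRMED]", "[INTERMITTENT]")
```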

**Key commands:**

```bash
sf agent preview start --json --authoring-bundle <Name> -o <org>
sf agent preview send --json --session-id "$SID" --utterance "<text>" --authoring-bundle <Name> -o <org>
sf agent preview end --json --session-id "$SID" --authoring-bundle <Name> -o <org>
```

**Trace location:** `.sfdx/agents/{Name}/sessions/{sessionId}/traces/{planId}.json`

---

## Phase 3: Improve -- Edit .agent File Directly

> Full procedures for pre-flight checks, fix mapping, instruction principles, regression prevention, deployment chain, verification, safety re-verification, and test case creation: see `references/improve-reference.md`

### 3.0 Pre-flight

Verify all action targets exist and are registered in the org before editing. If targets are missing, present options: deploy stubs, remove actions, register via UI, or proceed with routing-only fixes.

### 3.1-3.3 Map issue, edit, and follow instruction principles

Map each confirmed issue to a fix location in the `.agent` file (description, instructions, actions, bindings, transitions). Use the Edit tool for targeted changes. Follow instruction principles: name actions explicitly, state pre-conditions, scope tightly, keep persona in `system:` only.

### 3.4 Regression prevention

Establish baseline before editing. Make minimal edits. Test immediately after each edit. One fix per publish cycle. Check cross-subagent dependencies. Test adjacent subagents.

### 3.5 Apply fixes

Read the `.agent` file, edit with the Edit tool (tabs for indentation), show the diff.

### 3.6 Validate, deploy, publish, activate

```bash
# Validate (dry run)
sf agent validate authoring-bundle --json --api-name <AGENT_API_NAME> -o <org>

# Publish (compile + deploy + activate)
sf agent publish authoring-bundle --json --api-name <AGENT_API_NAME> -o <org>
```

If publish fails, fall back to deploy + activate. Note that this fallback is incomplete: it does not propagate `reasoning: actions:` to live metadata.

### 3.7 Verify

Run Phase 2 scenarios post-fix. Check trace for correct routing, grounding, tools, and variables. After 24-48 hours, re-run Phase 1 to compare against baseline.

### 3.7b Safety re-verification (required)

Re-run the safety review (Section 15 of `/developing-agentforce`) on the modified `.agent` file. Revert any changes that introduce BLOCK findings.

### 3.8 Update Testing Center test cases

Create regression test cases from confirmed issues in Testing Center YAML format. Deploy with `sf agent test create` and verify all previously-broken scenarios pass.

---

## Reference Files

| Reference | Contents |
|---|---|
| `references/stdm-queries.md` | STDM query procedures, Apex service deployment, response parsing |
| `references/issue-classification.md` | Issue pattern table, root cause categories, structural analysis checks |
| `references/reproduce-reference.md` | Phase 2 preview procedures, trace diagnosis, classification criteria |
| `references/improve-reference.md` | Phase 3 editing, deployment chain, verification, safety, test cases |
| `references/stdm-schema.md` | DMO field schemas, data hierarchy, quality notes, agent name resolution |

Source

Creator's repository · forcedotcom/sf-skills
