---
name: dt-obs-frontends
description: Real User Monitoring (RUM), Web Vitals, user sessions, mobile crashes, page performance, user interactions, and frontend errors. Query web and mobile frontend telemetry.
license: Apache-2.0
---

# Frontend Observability Skill

Monitor web and mobile frontends using Real User Monitoring (RUM) with DQL queries.
This skill targets the new RUM experience only; do not use classic RUM data.

## Overview

This skill helps you:
- Monitor Core Web Vitals and frontend performance
- Track user sessions, engagement, and behavior
- Analyze errors and correlate with backend traces
- Optimize mobile app startup and stability
- Diagnose performance issues with detailed timing analysis

**Data Sources:**
- **Metrics**: `timeseries` with `dt.frontend.*` (trends, alerting)
- **Events**: `fetch user.events` (individual page views, requests, clicks, errors)
- **Sessions**: `fetch user.sessions` (session-level aggregates: duration, bounce, counts)
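
A minimal sketch of the metrics path, assuming the `dt.frontend.request.duration` metric from the Quick Reference can be split by `frontend.name`. The 2h window and 5m interval are illustrative choices, not recommendations from the source:

```dql
// Average request latency per frontend, one data point every 5 minutes
timeseries avg_request_duration = avg(dt.frontend.request.duration),
    by: {frontend.name},
    from: now() - 2h,
    interval: 5m
```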


## Quick Reference

### Common Metrics
- `dt.frontend.user_action.count` - User action volume
- `dt.frontend.user_action.duration` - User action duration
- `dt.frontend.request.count` - Request volume
- `dt.frontend.request.duration` - Request latency (ms)
- `dt.frontend.error.count` - Error counts
- `dt.frontend.session.active.estimated_count` - Active sessions
- `dt.frontend.user.active.estimated_count` - Unique users
- `dt.frontend.web.page.cumulative_layout_shift` - CLS metric
- `dt.frontend.web.navigation.dom_interactive` - DOM interactive time
- `dt.frontend.web.page.first_input_delay` - FID metric (legacy; prefer INP)
- `dt.frontend.web.page.largest_contentful_paint` - LCP metric
- `dt.frontend.web.page.interaction_to_next_paint` - INP metric
- `dt.frontend.web.navigation.load_event_end` - Load event end
- `dt.frontend.web.navigation.time_to_first_byte` - Time to first byte

### Common Filters
- `frontend.name` - Filter by frontend name (e.g. `my-frontend`)
- `dt.rum.user_type` - Exclude synthetic monitoring
- `geo.country.iso_code` - Geographic filtering
- `device.type` - Mobile, desktop, tablet
- `browser.name` - Browser filtering
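
The filters above can be combined in one query. This sketch assumes all of the listed fields are populated on request events; `my-frontend` is a placeholder name:

```dql
fetch user.events, from: now() - 2h
// Placeholder frontend name; replace with a real one
| filter frontend.name == "my-frontend"
// Keep real users only (drops synthetic and robot traffic)
| filter dt.rum.user_type == "real_user"
| filter characteristics.has_request == true
| summarize request_count = count(), by: {device.type, browser.name, geo.country.iso_code}
| sort request_count desc
| limit 20
```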

### Common Timeseries Dimensions
Use these for `dt.frontend.*` timeseries splits and breakdowns:
- `frontend.name` - Frontend name
- `geo.country.iso_code`
- `device.type`
- `browser.name`
- `os.name`
- `user_type` - `real_user`, `synthetic`, `robot`

Example: page views per frontend from `user.events`:

```dql
fetch user.events, from: now() - 2h
| filter characteristics.has_page_summary == true
| summarize page_views = count(), by: {frontend.name}
| sort page_views desc
```

### Event Characteristics
- `characteristics.has_page_summary` - Page views (web)
- `characteristics.has_view_summary` - Views (mobile)
- `characteristics.has_navigation` - Navigation events
- `characteristics.has_user_interaction` - Clicks, forms, etc.
- `characteristics.has_request` - Network request events
- `characteristics.has_error` - Error events
- `characteristics.has_crash` - Mobile crashes
- `characteristics.has_long_task` - Long JavaScript tasks
- `characteristics.has_csp_violation` - CSP violations

Full event model: https://docs.dynatrace.com/docs/semantic-dictionary/model/rum/user-events
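
As a rough illustration, the characteristics flags can be counted side by side to see which event types a frontend actually sends. This is a sketch using only the flags listed above:

```dql
fetch user.events, from: now() - 2h
| summarize
    total_events = count(),
    page_views = countIf(characteristics.has_page_summary == true),
    requests = countIf(characteristics.has_request == true),
    errors = countIf(characteristics.has_error == true),
    long_tasks = countIf(characteristics.has_long_task == true),
    by: {frontend.name}
| sort total_events desc
```

A frontend that shows requests but no page views or interactions may be producing mostly background traffic.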

### Session Data (`user.sessions`)

`user.sessions` contains session-level aggregates produced by the session aggregation service from `user.events`. **Field names differ from `user.events`** — sessions use underscores where events use dots.

**Session identity and context:**
- `dt.rum.session.id` — Session ID (NOT `dt.rum.session_id`)
- `dt.rum.instance.id` — Instance ID
- `frontend.name` - Array of frontends involved in the session
- `dt.rum.application.type` — `web` or `mobile`
- `dt.rum.user_type` — `real_user`, `synthetic`, or `robot`

**Session aggregates (underscore naming — NOT dot):**

| Field | Description | ⚠️ NOT this |
|-------|-------------|-------------|
| `navigation_count` | Number of navigations | ~~`navigation.count`~~ |
| `user_interaction_count` | Clicks, form submissions | ~~`user_interaction.count`~~ |
| `user_action_count` | User actions | ~~`user_action.count`~~ |
| `request_count` | XHR/fetch requests | ~~`request.count`~~ |
| `event_count` | Total events in session | ~~`event.count`~~ |
| `page_summary_count` | Page views (web) | ~~`page_summary.count`~~ |
| `view_summary_count` | Views (mobile/SPA) | ~~`view_summary.count`~~ |

**Error fields (dot naming — same as events):**
- `error.count`, `error.exception_count`, `error.http_4xx_count`, `error.http_5xx_count`
- `error.anr_count`, `error.csp_violation_count`, `error.has_crash`

**Session lifecycle:**
- `start_time`, `end_time`, `duration` (nanoseconds)
- `end_reason` — `timeout`, `synthetic_execution_finished`, etc.
- `characteristics.is_bounce` — Boolean bounce flag
- `characteristics.has_replay` — Session replay available

**User identity:**
- `dt.rum.user_tag` — User identifier (typically email, username or customerId), set via `dtrum.identifyUser()` API call in the instrumented frontend. **Not always populated** — only present when the frontend explicitly calls `identifyUser()`.
- When `dt.rum.user_tag` is empty, `dt.rum.instance.id` is often the only user differentiator. The value is a random ID assigned by the RUM agent on the client side, so it is not personally identifiable but can be used to distinguish unique users when `user_tag` is not set. On web this is based on a persistent cookie, so it can be deleted by the user.
- The user tag is a **session-level field** — query it from `user.sessions`, not `user.events` (where it may be empty even if the session has one).
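
One way to approximate unique users is to fall back to the instance ID when no user tag is set. This is a sketch under the assumption that an unset `dt.rum.user_tag` is null; `coalesce` will not catch empty strings. `user_key` is a hypothetical field name introduced for the example:

```dql
fetch user.sessions, from: now() - 24h
| filter dt.rum.user_type == "real_user"
// Prefer the explicit user tag, otherwise use the agent-assigned instance ID
| fieldsAdd user_key = coalesce(dt.rum.user_tag, dt.rum.instance.id)
| summarize
    sessions = count(),
    tagged_sessions = countIf(isNotNull(dt.rum.user_tag)),
    distinct_users = countDistinct(user_key)
```

Because the instance ID depends on a cookie on web, `distinct_users` is an estimate rather than an exact count.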

**Client/device context:**
- `browser.name`, `browser.version`, `device.type`, `os.name`
- `geo.country.iso_code`, `client.ip`, `client.isp`

**Synthetic-only fields:**
- `dt.entity.synthetic_test`, `dt.entity.synthetic_location`, `dt.entity.synthetic_test_step`

**Time window behavior:**
- `fetch user.sessions, from: X, to: Y` only returns sessions that **started** in `[X, Y]` — NOT sessions that were merely active during that window.
- Sessions can last 8h+ (the aggregation service waits 30+ minutes of inactivity before closing a session).
- To find all sessions active during a time window, extend the lookback by at least 8 hours: e.g., to cover events from the last 24h, query `fetch user.sessions, from: now() - 32h`.
- This matters for correlation queries (e.g., matching `user.events` to `user.sessions` by session ID) — a narrow `user.sessions` window will miss long-running sessions and produce false "orphans."

**Session creation delay:**
- The session aggregation service waits for ~30+ minutes of inactivity before closing a session and writing the `user.sessions` record.
- This means **recent events (last ~1 hour) will not yet have a matching `user.sessions` entry** — this is normal, not a data gap.
- When correlating `user.events` with `user.sessions`, exclude recent data (e.g., use `to: now() - 1h`) to avoid counting in-progress sessions as orphans.
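
The two timing rules above can be combined in a single correlation query. The following is a sketch, not a definitive implementation: it assumes `lookup` can hold the full set of sessions for the window (subquery size limits may apply in large environments), and the `session.` prefix is an arbitrary choice:

```dql
// Events from a closed 24h window that ends 1h ago, so in-progress sessions are skipped
fetch user.events, from: now() - 25h, to: now() - 1h
| filter isNotNull(dt.rum.session.id)
| summarize events = count(), by: {dt.rum.session.id}
// Sessions that started up to 8h before the event window began
| lookup [
    fetch user.sessions, from: now() - 33h
    | fields dt.rum.session.id, start_time
  ], sourceField: dt.rum.session.id, lookupField: dt.rum.session.id, prefix: "session."
| summarize
    event_sessions = count(),
    matched = countIf(isNotNull(session.start_time)),
    unmatched = countIf(isNull(session.start_time))
```

Some unmatched session IDs are still expected, since not every session ID in `user.events` gets a `user.sessions` record.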

**Zombie sessions (events without a `user.sessions` record):**
- Not every `dt.rum.session.id` in `user.events` will have a corresponding `user.sessions` record. The session aggregation service intentionally skips **zombie sessions** — sessions with no real user activity (zero navigations and zero user interactions).
- Zombie sessions contain only background, machine-driven activity (e.g., automatic XHR requests, heartbeats) with no page views or clicks. Serializing them would add no value to users.
- When correlating `user.events` with `user.sessions`, expect a large number of unmatched session IDs. This is **by design**, not a data gap. Filter to sessions with activity before diagnosing orphans:
  ```dql
  fetch user.events, from: now() - 2h, to: now() - 1h
  | filter isNotNull(dt.rum.session.id)
  | summarize navs = countIf(characteristics.has_navigation == true),
      interactions = countIf(characteristics.has_user_interaction == true),
      by: {dt.rum.session.id}
  | filter navs > 0 or interactions > 0
  ```

**Example — bounce rate and session quality:**
```dql
fetch user.sessions, from: now() - 24h
| filter dt.rum.user_type == "real_user"
| summarize
    total_sessions = count(),
    bounces = countIf(characteristics.is_bounce == true),
    zero_activity = countIf(toLong(navigation_count) == 0 and toLong(user_interaction_count) == 0),
    avg_duration_s = avg(toLong(duration)) / 1000000000
| fieldsAdd bounce_rate_pct = round((bounces * 100.0) / total_sessions, decimals: 1)
```




### Performance Thresholds
- **LCP**: Good <2.5s | Poor >4.0s
- **INP**: Good <200ms | Poor >500ms
- **CLS**: Good <0.1 | Poor >0.25
- **Cold Start**: Good <3s | Poor >5s
- **Long Tasks**: >50ms problematic, >250ms severe
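
To show how the LCP thresholds might be applied, the sketch below rates each frontend by its average LCP. It assumes the metric is reported in milliseconds, which the source does not state, so verify the unit before relying on the rating. Core Web Vitals are normally assessed at the 75th percentile, so an average is only a rough proxy:

```dql
timeseries lcp = avg(dt.frontend.web.page.largest_contentful_paint),
    by: {frontend.name},
    from: now() - 24h,
    interval: 1h
// Collapse the series to one value per frontend (assumed unit: ms)
| fieldsAdd lcp_avg_ms = arrayAvg(lcp)
| fieldsAdd lcp_rating = if(lcp_avg_ms < 2500, "good",
    else: if(lcp_avg_ms > 4000, "poor", else: "needs improvement"))
| sort lcp_avg_ms desc
```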

## Core Workflows

### 1. Web Performance Monitoring
Track Core Web Vitals, page performance, and request latency for SEO and UX optimization.

**Primary Files:**
- `references/WebVitals.md` - Core Web Vitals (LCP, INP, CLS)
- `references/performance-analysis.md` - Request and page performance

**Common Queries:**
- All Core Web Vitals summary
- Web Vitals by page/device
- Request duration SLA monitoring
- Page load performance trends

### 2. User Session & Behavior Analysis
Understand user engagement, navigation patterns, and session characteristics. Analyze button clicks, form interactions, and user journeys.

**Data source choice:**
- Use `fetch user.sessions` for session-level analysis (bounce rate, session duration, session counts)
- Use `fetch user.events` for event-level detail (individual clicks, navigation timing, specific pages)

**Primary Files:**
- `references/user-sessions.md` - Session tracking and user analytics
- `references/performance-analysis.md` - Navigation and engagement patterns

**Common Queries:**
- Active sessions by frontend
- Sessions by custom property
- Bounce rate analysis (use `user.sessions` with `characteristics.is_bounce`)
- Session quality (zero-activity sessions via `navigation_count`, `user_interaction_count`)
- Click analysis on UI elements (use `user.events` with `characteristics.has_user_interaction`)
- External referrers (traffic sources)

### 3. Error Tracking & Debugging
Monitor error rates, analyze exceptions, and correlate frontend issues with backend.

**Primary Files:**
- `references/error-tracking.md` - Error analysis and debugging
- `references/performance-analysis.md` - Trace correlation

**Common Queries:**
- Error rate monitoring
- JavaScript exceptions by type
- Failed requests with backend traces
- Request timing breakdown
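
A possible starting point for error rate monitoring, using only the session-level error fields from the Quick Reference. The `toLong` casts mirror the bounce-rate example and are a precaution, not a confirmed requirement:

```dql
fetch user.sessions, from: now() - 24h
| filter dt.rum.user_type == "real_user"
| summarize
    sessions = count(),
    sessions_with_errors = countIf(toLong(error.count) > 0),
    exceptions = sum(toLong(error.exception_count)),
    http_4xx = sum(toLong(error.http_4xx_count)),
    http_5xx = sum(toLong(error.http_5xx_count))
| fieldsAdd error_session_pct = round((sessions_with_errors * 100.0) / sessions, decimals: 1)
```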

### 4. Mobile Frontend Monitoring
Track mobile app performance, startup times, and crash analytics for iOS and Android. Analyze app version performance and device-specific issues.

**Primary Files:**
- `references/mobile-monitoring.md` - App starts, crashes, and mobile-specific metrics

**Common Queries:**
- Cold start performance by app version (iOS, Android)
- Warm start and hot start metrics
- Crash rate by device model and OS version
- ANR events (Android)
- Native crash signals
- App version comparison
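
As an illustration of crash analysis at session level, this sketch computes the share of mobile sessions that crashed, split by operating system. It relies on the `error.has_crash` and `error.anr_count` session fields described in the Quick Reference; app version and device model fields are left out because the source does not name them:

```dql
fetch user.sessions, from: now() - 24h
| filter dt.rum.application.type == "mobile"
| filter dt.rum.user_type == "real_user"
| summarize
    sessions = count(),
    crashed_sessions = countIf(error.has_crash == true),
    anr_events = sum(toLong(error.anr_count)),
    by: {os.name}
| fieldsAdd crash_session_pct = round((crashed_sessions * 100.0) / sessions, decimals: 1)
| sort crash_session_pct desc
```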

### 5. Advanced Performance Optimization
Deep performance diagnostics including JavaScript profiling, main thread blocking, UI jank analysis, and geographic performance.

**Primary Files:**
- `references/performance-analysis.md` - Advanced diagnostics and long tasks

**Common Queries:**
- Long JavaScript tasks blocking main thread
- UI jank and rendering delays
- Tasks >50ms impacting responsiveness
- Third-party long tasks (iframes)
- Single-page app performance issues
- Geographic performance distribution
- Performance degradation detection

## Best Practices

1. **Use metrics for trends, events for debugging**
   - Metrics: Timeseries dashboards, alerting, capacity planning
   - Events: Root cause analysis, detailed diagnostics

2. **Filter by frontend in multi-app environments**
   - Always use `frontend.name` for clarity

3. **Match interval to time range**
   - 5m intervals for hours, 1h for days, 1d for weeks

4. **Exclude synthetic traffic when analyzing real users**
   - Filter on `dt.rum.user_type` (e.g. `dt.rum.user_type == "real_user"`) to focus on genuine behavior

5. **Combine metrics with events for complete insights**
   - Start with metric trends, drill into events for details

6. **Extend `user.sessions` time window for correlation queries**
   - `user.sessions` only returns sessions that **started** in the query window
   - Sessions can last 8h+, so extend lookback by at least 8h when joining with `user.events`
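
The first, third, and fifth practices can be sketched as a two-step flow: a metric trend over a week at a 1h interval, followed by an event drill-down over a narrow window. The `sum` aggregation for `dt.frontend.error.count` and the `my-frontend` name are assumptions for the example:

```dql
// Step 1: weekly error trend per frontend (1h interval for a multi-day range)
timeseries errors = sum(dt.frontend.error.count),
    by: {frontend.name},
    from: now() - 7d,
    interval: 1h
```

```dql
// Step 2: drill into error events for one frontend around the spike
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| filter characteristics.has_error == true
| summarize error_events = count(), by: {page.url.path}
| sort error_events desc
| limit 20
```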

## Slow Page Load Playbook

Start by segmenting the problem by page, browser, geolocation, and `dt.rum.user_type`.

Heuristics:
- High TTFB -> slow backend
- High LCP with normal TTFB -> render bottleneck
- High CLS -> layout shifts (late-loading content, ads, fonts)
- Long tasks dominate -> JavaScript execution bottlenecks (heavy frameworks, large bundles)
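
The segmentation step could look like the sketch below. It reuses the request fields from the backend latency query in this playbook and assumes the browser, geo, and user type fields are present on request events:

```dql
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| filter characteristics.has_request == true
| summarize
    requests = count(),
    avg_ttfb = avg(request.time_to_first_byte),
    avg_duration = avg(duration),
    by: {page.url.path, browser.name, geo.country.iso_code, dt.rum.user_type}
| sort avg_ttfb desc
| limit 20
```

A segment that stands out on `avg_ttfb` points toward the backend or network; a segment with normal TTFB but high duration points toward the client side.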

### Backend latency (high TTFB)

```dql
fetch user.events
| filter frontend.name == "my-frontend" and characteristics.has_request == true
| filter page.url.path == "/checkout"
| summarize avg_ttfb = avg(request.time_to_first_byte), avg_duration = avg(duration)
```

If TTFB is high, analyze backend spans by correlating frontend events with backend traces using `dt.rum.trace_id`.
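
To collect trace IDs for that correlation, one option is to list the slowest requests that carry a `dt.rum.trace_id`. This is a sketch: it assumes the field is set directly on request events, and it stops short of the backend lookup because the source does not specify the span-side field names:

```dql
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend" and characteristics.has_request == true
| filter page.url.path == "/checkout"
// Only requests that can be linked to a backend trace
| filter isNotNull(dt.rum.trace_id)
| fields timestamp, page.url.path, request.time_to_first_byte, duration, dt.rum.trace_id
| sort request.time_to_first_byte desc
| limit 20
```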

### Heavy JavaScript execution (long tasks)

Long tasks by page:

```dql
fetch user.events, from: now() - 2h
| filter characteristics.has_long_task == true
| summarize
   long_task_count = count(),
   total_blocking_time = sum(duration),
   by: {frontend.name, page.url.path}
| sort total_blocking_time desc
| limit 20
```

Long tasks by script source:

```dql
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| filter characteristics.has_long_task == true
| summarize
   long_task_count = count(),
   total_blocking_time = sum(duration),
   by: {long_task.attribution.container_src}
| sort total_blocking_time desc
| limit 20
```

### Large JavaScript bundles

```dql
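fetch user.events
| filter frontend.name == "my-frontend"
| filter characteristics.has_request == true
// Keep only script requests (URLs with a query string after .js will not match)
| filter endsWith(url.full, ".js")
// Largest decoded (uncompressed) body size seen per script URL
| summarize max_decoded_bytes = max(performance.decoded_body_size), by: {url.full}
| sort max_decoded_bytes desc
| limit 20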
```

### Large resources

```dql
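fetch user.events
| filter frontend.name == "my-frontend"
| filter characteristics.has_request == true
// Largest decoded (uncompressed) body size seen per resource URL
| summarize max_decoded_bytes = max(performance.decoded_body_size), by: {url.full}
| sort max_decoded_bytes desc
| limit 20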
```

### Cache effectiveness

```dql
fetch user.events, from: now() - 2h
| filter frontend.name == "my-frontend"
| filter characteristics.has_request == true
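// "cached": flagged as local cache, or nothing transferred although a body exists.
// Parentheses make the original and/or precedence explicit.
| fieldsAdd cache_status = if(
   performance.incomplete_reason == "local_cache" or
   (performance.transfer_size == 0 and
    (performance.encoded_body_size > 0 or performance.decoded_body_size > 0)),
   "cached",
   else: if(performance.transfer_size > 0, "network", else: "uncached")
  )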
| summarize
   request_count = count(),
   avg_duration = avg(duration),
   by: {url.domain, cache_status}
```

### Compression waste

```dql
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| filter isNotNull(performance.encoded_body_size) and isNotNull(performance.decoded_body_size)
| filter performance.encoded_body_size > 0
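| fieldsAdd
   // toDouble guards against integer division when both sizes are integers;
   // a ratio near 1 suggests the response body was not compressed
   expansion_ratio = toDouble(performance.decoded_body_size) / performance.encoded_body_size,
   wasted_bytes = performance.decoded_body_size - performance.encoded_body_size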
| summarize
   requests = count(),
   avg_expansion_ratio = avg(expansion_ratio),
   total_wasted_bytes = sum(wasted_bytes),
   by: {request.url.host, request.url.path}
| sort total_wasted_bytes desc
| limit 50
```

### Network issues

Compare by location and domain when TTFB is high but backend performance is good:

```dql
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| summarize
   request_count = count(),
   avg_duration = avg(duration),
   p75_duration = percentile(duration, 75),
   p95_duration = percentile(duration, 95),
   by: {geo.country.iso_code, request.url.domain}
| sort p95_duration desc
| limit 50
```

Analyze DNS time:

```dql
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| filter isNotNull(performance.domain_lookup_start) and isNotNull(performance.domain_lookup_end)
| fieldsAdd dns_ms = performance.domain_lookup_end - performance.domain_lookup_start
| summarize
   request_count = count(),
   avg_dns_ms = avg(dns_ms),
   p75_dns_ms = percentile(dns_ms, 75),
   p95_dns_ms = percentile(dns_ms, 95),
   by: {request.url.domain}
| sort p95_dns_ms desc
| limit 50
```

Analyze by protocol (http/1.1, h2, h3):

```dql
fetch user.events
| filter characteristics.has_request == true
| summarize cnt = count(), by: {url.domain, performance.next_hop_protocol}
| sort cnt desc
| limit 50
```

### Third-party dependencies

Analyze request performance by domain:

```dql
fetch user.events, from: now() - 2h
| filter characteristics.has_request == true
| summarize
   request_count = count(),
   avg_duration = avg(duration),
   p75_duration = percentile(duration, 75),
   p95_duration = percentile(duration, 95),
   by: {request.url.domain}
| sort p95_duration desc
| limit 50
```

## Troubleshooting

### Handling Zero Results

When queries return no data, follow this diagnostic workflow:

1. **Validate Timeframe**
   - Check if timeframe is appropriate for the data type
   - RUM data may be delayed (1-2 minutes for recent events)
   - Verify timeframe syntax: `from: now() - 1h, to: now()` or similar
   - Try expanding the timeframe: `from: now() - 24h` for initial exploration

2. **Verify Frontend Configuration**
   - Confirm frontend is instrumented and sending RUM data
   - Check `frontend.name` filter is correct
   - Test without frontend filter to see if any RUM data exists
   - Verify frontend name matches the environment

3. **Check Data Availability**
   - Run basic query: `fetch user.events | limit 1`
   - If no events exist, RUM may not be configured
   - Check if timeframe predates frontend deployment
   - Verify user has access to the environment

4. **Review Query Syntax**
   - Validate filters aren't too restrictive
   - Check for typos in field names or metric names
   - Test query incrementally: start simple, add filters gradually
   - Verify characteristics filters match event types
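
Steps 2 and 3 can be checked with one broad query. A minimal sketch that lists every frontend reporting events in the last 24 hours, so a misspelled `frontend.name` filter becomes obvious:

```dql
// No frontend filter on purpose: shows which names actually exist in the data
fetch user.events, from: now() - 24h
| summarize events = count(), by: {frontend.name}
| sort events desc
```

If this returns no rows, the cause is more likely configuration or permissions than the original query's filters.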

**When to Ask User for Clarification:**
- No RUM data exists in environment → "Is RUM configured for this frontend?"
- Timeframe unclear → "What time period should I analyze?"
- Expected data missing → "Has this frontend sent data recently?"

### Handling Anomalous Results

When query results seem unexpected or suspicious:

**Unexpected High Values:**
- **Metric spikes**: Verify interval aggregation (avg vs. max vs. sum)
- **Session counts**: Check for bot traffic or synthetic monitoring
- **Error rates**: Confirm error definition matches expectations
- **Performance degradation**: Look for deployment or infrastructure changes

**Unexpected Low Values:**
- **Missing sessions**: Verify `dt.rum.user_type` filter isn't excluding real users
- **Low request counts**: Check if frontend filter is too narrow
- **Few errors**: Confirm error characteristics filter is correct
- **Missing mobile data**: Verify platform-specific fields exist

**Inconsistent Data:**
- **Metrics vs. Events mismatch**: Small differences are expected because metrics and events are aggregated differently
- **Geographic anomalies**: Check timezone assumptions
- **Device distribution skew**: May reflect actual user base
- **Version mismatches**: Verify app version filtering logic
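
For suspicious session counts, a quick breakdown by user type shows how much of the traffic is synthetic or robot. This sketch uses the `dt.rum.user_type` values listed in the Quick Reference:

```dql
fetch user.sessions, from: now() - 24h
| summarize sessions = count(), by: {dt.rum.user_type}
| sort sessions desc
```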

### Decision Tree: Ask vs. Investigate

```
Query returns unexpected results
│
├─ Is this a zero-result scenario?
│  ├─ YES → Follow "Handling Zero Results" workflow
│  └─ NO → Continue
│
├─ Can I validate the result independently?
│  ├─ YES → Run validation query
│  │        ├─ Validation confirms result → Report findings
│  │        └─ Validation contradicts → Investigate further
│  └─ NO → Continue
│
├─ Is the anomaly clearly explained by data?
│  ├─ YES → Report with explanation
│  └─ NO → Continue
│
├─ Do I need domain knowledge to interpret?
│  ├─ YES → Ask user for context
│  │        Example: "The error rate is 15%. Is this expected for your frontend?"
│  └─ NO → Continue
│
└─ Is the issue ambiguous, or does it require clarification?
   ├─ YES → Ask specific question with data context
   │        Example: "I see two frontends matching 'web-app'. Which frontend name should I use?"
   └─ NO → Investigate and report findings with caveats
```

### Common Investigation Steps

**For Performance Issues:**
1. Compare to baseline: Query same metric for previous week
2. Segment by dimension: Break down by device, browser, geography
3. Check for outliers: Use percentiles (p50, p95, p99) vs. averages
4. Correlate with deployments: Filter by app version or time windows
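
Steps 2 and 3 can be combined in one query. This is a sketch that compares the average with p50, p95, and p99 request duration per device and browser; `my-frontend` is a placeholder:

```dql
fetch user.events, from: now() - 24h
| filter frontend.name == "my-frontend"
| filter characteristics.has_request == true
| summarize
    requests = count(),
    avg_duration = avg(duration),
    p50_duration = percentile(duration, 50),
    p95_duration = percentile(duration, 95),
    p99_duration = percentile(duration, 99),
    by: {device.type, browser.name}
| sort p95_duration desc
```

A large gap between `p50_duration` and `p99_duration` indicates that a small share of slow requests is skewing the average.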

**For Data Availability Issues:**
1. Start broad: Query all RUM data without filters
2. Add filters incrementally: Isolate which filter eliminates data
3. Check related metrics: If events missing, try timeseries
4. Validate entity relationships: Confirm frontend-to-service links

**For Unexpected Patterns:**
1. Expand timeframe: Look for historical context
2. Cross-reference data sources: Compare events and metrics
3. Check sampling: Verify no sampling is affecting results
4. Consider external factors: Holidays, outages, traffic changes

### Red Flags: When to Stop and Ask

**Always ask the user when:**
- ❌ No RUM data exists anywhere in the environment
- ❌ Multiple frontends match the user's description
- ❌ Results explicitly contradict the user's stated expectations
- ❌ Data suggests monitoring is misconfigured
- ❌ Query requires business context (e.g., "acceptable error rate")
- ❌ Timeframe is ambiguous and affects interpretation significantly

**Example clarifying questions:**
- "I found two frontends matching 'checkout'. Which one: `checkout-web` or `checkout-mobile`?"
- "The query returns 0 results for the past hour. Should I expand the timeframe, or do you expect real-time data?"
- "The average LCP is 8 seconds, which exceeds the 4-second threshold. Is this frontend known to have performance issues?"
- "I see only synthetic traffic. Should I filter on `dt.rum.user_type == "real_user"` to focus on real users?"

## When to Use This Skill

**Use the `dt-obs-frontends` skill when:**
- Monitoring web or mobile frontend performance
- Analyzing Core Web Vitals for SEO
- Tracking user sessions, engagement, or behavior
- Analyzing click events and button interactions
- Debugging frontend errors or slow requests
- Correlating frontend issues with backend traces
- Optimizing mobile app startup or crash rates (iOS, Android)
- Analyzing app version performance
- Diagnosing UI jank and main thread blocking
- Analyzing security compliance (CSP violations)
- Profiling JavaScript performance (long tasks)

**Do NOT use for:**
- Backend service monitoring (use services skill)
- Infrastructure metrics (use infrastructure skill)
- Log analysis (use logs skill)
- Business process monitoring (use business-events skill)

## Progressive Disclosure

### Always Available
- **FrontendBasics.md** - RUM fundamentals and quick reference

### Loaded by Workflow
- **Web Performance**: WebVitals.md, performance-analysis.md
- **User Behavior**: user-sessions.md, performance-analysis.md
- **Error Analysis**: error-tracking.md, performance-analysis.md
- **Mobile Apps**: mobile-monitoring.md

### Load on Explicit Request
- Advanced diagnostics (long tasks, user actions)
- Security compliance (CSP violations, visibility tracking)
- Specialized mobile features (platform-specific phases)

## Reference Files

### Core Reference Documents
- `references/WebVitals.md` - Core Web Vitals monitoring
- `references/user-sessions.md` - Session and user analytics
- `references/error-tracking.md` - Error analysis and debugging
- `references/mobile-monitoring.md` - Mobile app performance and crashes
- `references/performance-analysis.md` - Advanced performance diagnostics

Source

Creator's repository · dynatrace/dynatrace-for-ai


License: Apache-2.0
