triaging-issues

Triages GitHub issues by routing to oncall teams, applying labels, and closing questions. Use when processing new PyTorch issues or when asked to triage an issue.

Skill file

Preview skill file
---
name: triaging-issues
description: Triages GitHub issues by routing to oncall teams, applying labels, and closing questions. Use when processing new PyTorch issues or when asked to triage an issue.
hooks:
  PreToolUse:
    - matcher: "mcp__github__issue_write|mcp__github__update_issue"
      hooks:
        - type: command
          command: "python3 \"$CLAUDE_PROJECT_DIR\"/.claude/skills/triaging-issues/scripts/validate_labels.py"
  PostToolUse:
    - matcher: "mcp__github__issue_write|mcp__github__update_issue|mcp__github__add_issue_comment|mcp__github__transfer_issue"
      hooks:
        - type: command
          command: "python3 \"$CLAUDE_PROJECT_DIR\"/.claude/skills/triaging-issues/scripts/add_bot_triaged.py"
---

# PyTorch Issue Triage Skill

This skill helps triage GitHub issues by routing issues, applying labels, and leaving first-line responses.

## Contents
- [MCP Tools Available](#mcp-tools-available)
- [Labels You Must NEVER Add](#labels-you-must-never-add)
- [Issue Triage Steps](#issue-triage-for-each-issue)
  - Step 0: Already Routed — SKIP
  - Step 1: Question vs Bug/Feature
  - Step 1.5: Needs Reproduction — External Files
  - Step 2: Transfer
  - Step 2.5: PT2 Issues — Special Handling
  - Step 3: Redirect to Secondary Oncall
  - Step 4: Label the Issue
  - Step 5: High Priority — REQUIRES HUMAN REVIEW
  - Step 6: bot-triaged (automatic)
  - Step 7: Mark Triaged
- [V1 Constraints](#v1-constraints)

**Labels reference:** See [labels.json](labels.json) for the full catalog of labels suitable for triage. **ONLY apply labels that exist in this file.** Do not invent or guess label names. This file excludes CI triggers, test configs, release notes, deprecated labels, and labels requiring human decision.

**PT2 triage guide:** See [pt2-triage-rubric.md](pt2-triage-rubric.md) for detailed labeling guidance when triaging PT2/torch.compile issues.

**Response templates:** See [templates.json](templates.json) for standard response messages.

---

## MCP Tools Available

Use these GitHub MCP tools for triage:

| Tool | Purpose |
|------|---------|
| `mcp__github__issue_read` | Get issue details, comments, and existing labels |
| `mcp__github__issue_write` | Apply labels or close issues |
| `mcp__github__add_issue_comment` | Add comment (only for redirecting questions) |
| `mcp__github__search_issues` | Find similar issues for context |

---

## Labels You Must NEVER Add

| Prefix/Category | Reason |
|-----------------|--------|
| Labels not in `labels.json` | Only apply labels that exist in the allowlist |
| `ciflow/*` | CI job triggers for PRs only |
| `test-config/*` | Test suite selectors for PRs only |
| `release notes: *` | Auto-assigned for release notes |
| `ci-*`, `ci:*` | CI infrastructure controls |
| `sev*` | Severity labels require human decision |
| `merge blocking` | Requires human decision |
| `actionable` | Requires human decision |
| Any label containing "deprecated" | Obsolete |
| `oncall: releng` | Not a triage redirect target. Use `module: ci` instead |

**If blocked:** When a label is blocked by the hook, add ONLY `triage review` and stop. A human will handle it.

These rules are enforced by a PreToolUse hook that validates all labels against `labels.json`.

### Never Override Human Labels

If a human has already applied labels (especially `ci: sev`, severity labels, or priority labels), do NOT remove or replace them. Your job is to supplement, not override.

---

## Issue Triage (for each issue)

### 0) Already Routed — SKIP

**If an issue already has ANY `oncall:` label, SKIP IT entirely.** Do not:
- Add any labels
- Add `triaged`
- Leave comments
- Do any triage work

That issue belongs to the sub-oncall team. They own their queue.

### 1) Question vs Bug/Feature

- If it is a question (not a bug report or feature request): close and use the `redirect_to_forum` template from `templates.json`.
- If unclear whether it is a bug/feature vs a question: request additional information using the `request_more_info` template and stop.

### 1.5) Needs Reproduction — External Files

Check if the issue body contains links to external files that users would need to download to reproduce.

**Patterns to detect:**
- File attachments: `.zip`, `.pt`, `.pth`, `.pkl`, `.safetensors`, `.onnx`, `.bin` files
- External storage: Google Drive, Dropbox, OneDrive, Mega, WeTransfer links
- Model hubs: Hugging Face Hub links to model files

**Action:**
1. **Edit the issue body** to remove/redact the download links
   - Replace with: `[Link removed - external file downloads are not permitted for security reasons]`
2. Add `needs reproduction` label
3. Use the `needs_reproduction` template from `templates.json` to request a self-contained reproduction
4. Do NOT add `triaged` — wait for the user to provide a reproducible example

### 1.55) Needs Reproduction — Other Cases

Also add `needs reproduction` when:
- The user reports a hardware-specific issue (e.g., specific GPU model) without a self-contained repro script
- The user references a specific model/checkpoint/dataset that is not publicly runnable in a few lines
- The issue describes version-upgrade breakage but only provides a high-level description without a minimal script
- The repro depends on a specific training setup, distributed environment, or non-trivial infrastructure

### 1.6) Edge Cases & Numerical Accuracy

If the issue involves extremal values or numerical precision differences:

**Patterns to detect:**
- Values near `torch.finfo(dtype).max` or `torch.finfo(dtype).min`
- NaN/Inf appearing in outputs from valid (but extreme) inputs
- Differences between CPU and GPU results
- Precision differences between dtypes (e.g., fp32 vs fp16)
- Fuzzer-generated edge cases

**IMPORTANT — avoid keyword-triggered mislabeling:**

Label based on the **root cause**, not keywords that appear in the error or title. A keyword tells you what failed, not why.

- An `undefined symbol: ncclAlltoAll` error at `import torch` is a **packaging** issue (`module: binaries`), not a distributed training bug — the user never ran distributed code.
- A `nan` in a parameter name or tolerance check is not `module: NaNs and Infs` unless the bug is actually about NaN propagation.
- A stack trace mentioning `autograd` does not mean `module: autograd` — check whether the bug is in autograd itself or just on the call path.
- A test failure with tolerance thresholds is `module: tests`, not `module: numerical-stability`.

Ask: "Where would the fix need to be made?" That determines the label.

**Action:**
1. Add `module: edge cases` label
2. If from a fuzzer, also add `topic: fuzzer`
3. Use the `numerical_accuracy` template from `templates.json` to link to the docs
4. If the issue is clearly expected behavior per the docs, close it with the template comment

### 2) Transfer (domain library or ExecuTorch)

If the issue belongs in another repo (vision/text/audio/RL/ExecuTorch/etc.), transfer the issue and **STOP**.

### 2.5) PT2 Issues — Special Handling

**PT2 is NOT a redirect.** `oncall: pt2` is not like the other oncall labels in Step 3. PT2 issues continue through Steps 4–7 for full triage — add `oncall: pt2`, then proceed to label with `module:` labels, mark `triaged`, etc.

**Every `oncall: pt2` issue MUST have at least one `module:` label.** The PT2 oncall queue is too broad without a module label — the team needs to know which component is affected (e.g., `module: dynamo`, `module: inductor`, `module: helion`, `module: dynamic shapes`). If you cannot determine the specific module, use `module: compile ux` as a fallback, but always try to be specific first. See [pt2-triage-rubric.md](pt2-triage-rubric.md) for detailed guidance.

### 3) Redirect to Secondary Oncall

**CRITICAL:** When redirecting issues to a **non-PT2** oncall queue, apply exactly one `oncall: ...` label and **STOP**. Do NOT:
- Add any `module:` labels
- Mark it `triaged`
- Do any further triage work

The sub-oncall team will handle their own triage. Your job is only to route it to them.

#### Oncall Redirect Labels

| Label | When to use |
|-------|-------------|
| `oncall: jit` | TorchScript issues |
| `oncall: distributed` | Distributed training (DDP, FSDP, RPC, c10d, DTensor, DeviceMesh, symmetric memory, context parallel, pipelining). **Special handling:** after applying this label, invoke the distributed triage sub-skill (`/distributed-triage` on this issue) for second-level triage — it will route to a sub-oncall, add module labels, and mark triaged. |
| `oncall: export` | torch.export issues |
| `oncall: quantization` | Quantization issues |
| `oncall: mobile` | Mobile (iOS/Android), excludes ExecuTorch |
| `oncall: profiler` | Profiler issues (CPU, GPU, Kineto) |
| `oncall: visualization` | TensorBoard integration |

**Common routing mistakes to avoid:**
- **MPS ≠ Mobile.** MPS (Metal Performance Shaders) is the macOS/Apple Silicon GPU backend. Do NOT route MPS issues to `oncall: mobile`. MPS issues stay in the general queue with `module: mps`.
- **DTensor → `oncall: distributed`.** DTensor issues should always be routed to `oncall: distributed`, even if they don't mention DDP/FSDP.
- **ONNX → `module: onnx`.** There is no `oncall: onnx`. Use `module: onnx` and keep in the general queue.
- **CI/releng → `module: ci`.** Do not use `oncall: releng`. Use `module: ci` for CI infrastructure issues.
- **torch.compile + distributed.** When `torch.compile` mishandles a distributed op (e.g., `dist.all_reduce`), the issue typically needs BOTH `oncall: pt2` and `oncall: distributed` since the fix may span both codebases.

**Note:** `oncall: cpu inductor` is a sub-queue of PT2. For general triage, just use `oncall: pt2`.

### 4) Label the issue (if NOT transferred/redirected)

Only if the issue stays in the general queue:
- Add 1+ `module: ...` labels based on the affected area
- Prefer specific labels over general ones when both exist. Check `labels.json` descriptions for guidance on when a specific label supersedes a general one (e.g., `module: sdpa` instead of `module: nn` for SDPA issues, `module: flex attention` instead of `module: nn` for flex attention).
- `feature` — wholly new functionality that does not exist today in any form
- `enhancement` — improvement to something that already works (e.g., adding a native backend kernel for an op that already runs via fallback/composite, performance optimization, better error messages). If the enhancement is about performance, also add `module: performance`.
- `function request` — a new function or new arguments/modes for an existing function
- If the issue says the operation "currently works" or "falls back to" a slower path, that is `enhancement`, not `feature`

**Commonly missed labels — always check for these:**

| Condition | Label |
|-----------|-------|
| Segfault, illegal memory access, SIGSEGV | `module: crash` |
| Performance issue: regression, slowdown, or optimization request | `module: performance` |
| Issue on Windows | `module: windows` |
| Previously working feature now broken | `module: regression` |
| Broken docs/links that previously worked | `module: docs` + `module: regression` (NOT `enhancement`) |
| Issue about a test failing (not the underlying functionality) | `module: tests` |
| Backward pass / gradient computation bug | `module: autograd` (in addition to the op's module label) |
| `torch.linalg` ops or linear algebra ops (solve, svd, eig, inv, etc.) | `module: linear algebra` |
| `has workaround` | Only add when the workaround is **non-trivial and non-obvious**. If the issue is "X doesn't work for non-contiguous tensors," calling `.contiguous()` is the tautological inverse of the bug, not a workaround. A real workaround is something like installing a specific package version, adding a synchronization point, inserting `gc.collect()`, or using a different API that isn't obviously implied by the bug description. |

**Label based on the actual bug, not keywords.** Read the issue to understand what is actually broken. A bug about broadcasting that happens to mention "nan" in a parameter name is a frontend bug, not a NaN/Inf bug.

### 5) High Priority — REQUIRES HUMAN REVIEW

**CRITICAL:** If you believe an issue is high priority, you MUST:
1. Add `triage review` label and do not add `triaged`

Do NOT directly add `high priority` without human confirmation.

High priority criteria:
- Crash / segfault / illegal memory access
- Silent correctness issue (wrong results without error)
- Regression from a prior version
- Internal assert failure
- Many users affected
- Core component or popular model impact

### 6) bot-triaged (automatic)

The `bot-triaged` label is automatically applied by a post-hook after any issue mutation. You do not need to add it manually.

### 7) Mark triaged

If not transferred/redirected and not flagged for review, add `triaged`.

---

## V1 Constraints

**DO NOT:**
- Close bug reports or feature requests automatically
- Close issues unless they are clear usage questions per Step 1
- Assign issues to users
- Add `high priority` directly without human confirmation
- Add module labels when redirecting to oncall
- Add comments to bug reports or feature requests, except a single info request when classification is unclear

**DO:**
- Close clear usage questions and point to discuss.pytorch.org (per step 1)
- Be conservative - when in doubt, add `triage review` for human attention
- Apply type labels (`feature`, `enhancement`, `function request`) when confident
- Add `triaged` label when classification is complete

**Note:** `bot-triaged` is automatically applied by a post-hook after any issue mutation.

Source

Creator's repository · pytorch/pytorch

View on GitHub

Security

Security checks in progress
Results will appear here once audits complete
What this skill can do
Reads your filesConnects to the internetRuns code on your machine
Checked by 3 independent security firms
Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub
Does it sneak in hidden code?Not yet checkedPending · Socket
Does it have known bugs?Not yet checkedPending · Snyk