mcore-split-pr

Split a PR into multiple PRs to reduce the number of required CODEOWNERS reviewer groups.

---
name: mcore-split-pr
description: Split a PR into multiple PRs to reduce the number of required CODEOWNERS reviewer groups.
license: Apache-2.0
when_to_use: User asks to split a PR, reduce reviewer groups, or break up a large PR; 'too many CODEOWNERS', 'split this PR', 'break up PR', 'reduce reviewers needed'.
user_invocable: true
argument: "PR URL or number"
metadata:
  author: Philip Petrakian <ppetrakian@nvidia.com>
---

# Split PR by CODEOWNERS Groups

Split a large pull request into multiple smaller PRs, where each PR touches
the fewest possible CODEOWNERS reviewer groups. The goal is to reduce review
burden: a PR that only touches `megatron/core/` needs only the core reviewers,
while a PR that also touches `examples/`, `tools/`, and `megatron/training/`
pulls in many additional groups.

## Answer-First Constraints

For split-planning questions, lead with these constraints before the full
workflow:

- Minimize CODEOWNERS reviewer groups per PR, but each resulting PR must still
  be independently mergeable and reviewable.
- Tests travel with the production code they validate; do not split tests into a
  separate PR just to reduce reviewer groups.
- If PR B depends on symbols renamed in PR A, call out the dependency and put
  backward-compatible aliases, re-exports, or shims in PR A when needed.
- Wait for user approval before execution.
- Execution creates draft PRs from the right base, applies file-scoped diffs
  with `git diff upstream/main..<source-branch> -- <paths> | git apply`, pushes
  to the user's fork, and never pushes directly to upstream.

## Workflow

### 1. Analyze the PR

1. Fetch the PR details: `gh pr view <number> --repo NVIDIA/Megatron-LM --json title,body,headRefName,author` and `gh pr diff <number> --repo NVIDIA/Megatron-LM --stat`. Also determine the current GitHub user with `gh api user --jq .login`.
2. Parse `.github/CODEOWNERS` to build a mapping from file path patterns to owner groups.
3. For each changed file in the PR, determine which CODEOWNERS groups would be required to review it.
4. Build a summary table grouped by CODEOWNERS group, showing which files pull in which groups.
5. Count the total number of distinct reviewer groups the PR currently requires.
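
The mapping in steps 2 to 5 can be sketched in Python as follows. This is a minimal illustration, not the skill's actual implementation: the rules in `SAMPLE` and the `@org/...` group names are hypothetical, and `matches` is a deliberately simplified matcher (directory prefixes and basic globs only, with `*` allowed to cross `/`, unlike real CODEOWNERS matching). It does follow GitHub's rule that the last matching pattern in the file takes precedence.

```python
from fnmatch import fnmatch

# Hypothetical rules for illustration; NOT the real Megatron-LM CODEOWNERS.
SAMPLE = """
# default owners
* @org/maintainers
megatron/core/ @org/core
megatron/training/ @org/training
examples/ @org/examples
"""

def parse_codeowners(text):
    """Return (pattern, owners) rules in file order, skipping blanks and comments."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, *owners = line.split()
        rules.append((pattern, tuple(owners)))
    return rules

def matches(pattern, path):
    """Simplified matcher: treats directory patterns as root-anchored prefixes."""
    pat = pattern.lstrip("/")
    if pat.endswith("/"):
        return path.startswith(pat)  # directory: everything beneath it
    if "/" not in pat:
        return fnmatch(path.rsplit("/", 1)[-1], pat)  # bare glob: match the basename
    return fnmatch(path, pat) or path.startswith(pat + "/")

def owners_for(path, rules):
    """Last matching rule wins, as in GitHub's CODEOWNERS semantics."""
    owners = ()
    for pattern, rule_owners in rules:
        if matches(pattern, path):
            owners = rule_owners
    return owners

def required_groups(paths, rules):
    """Distinct owner groups pulled in by a set of changed files."""
    return {owner for p in paths for owner in owners_for(p, rules)}
```

Calling `required_groups(changed_files, parse_codeowners(text))` on the PR's changed files gives the count asked for in step 5; the per-file result of `owners_for` feeds the summary table in step 4.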

### 2. Propose a split that minimizes reviewer groups per PR

The primary optimization goal: **minimize the number of CODEOWNERS reviewer groups required for each resulting PR**.

Strategy:
1. Cluster files by their CODEOWNERS groups. Files owned by the same set of groups naturally belong together.
2. Identify the cluster with the most files; this becomes the first (and usually largest) PR.
3. Remaining files form one or more additional PRs, each ideally requiring only one or two reviewer groups.
4. If a split creates a dependency (e.g., PR B uses symbols renamed in PR A), PR B must be merged after PR A. Note this ordering explicitly in the proposal.
5. Each PR must be independently mergeable to main, with no broken imports and no missing symbols. Backward-compatible aliases and re-export stubs in the first PR can make this possible.
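
A rough sketch of the clustering in steps 1 and 2, assuming a file-to-owners mapping has already been built from CODEOWNERS. The function name and the example paths and groups are hypothetical. The sketch only groups by owner set; keeping tests with their production code and resolving cross-PR dependencies still needs manual adjustment.

```python
from collections import defaultdict

def cluster_by_owners(file_owners):
    """Group files by their exact set of owner groups, largest cluster first."""
    clusters = defaultdict(list)
    for path, owners in file_owners.items():
        clusters[frozenset(owners)].append(path)
    # Sort by descending file count; break ties on owner names for determinism.
    return sorted(clusters.items(), key=lambda kv: (-len(kv[1]), sorted(kv[0])))

# Hypothetical input: path -> owner groups required for that path.
example = {
    "megatron/core/a.py": ("@org/core",),
    "megatron/core/b.py": ("@org/core",),
    "examples/run.sh": ("@org/examples",),
}

for owners, files in cluster_by_owners(example):
    print(sorted(owners), files)
```

The first entry in the result is the candidate for the first PR; each remaining entry is a candidate for a follow-up PR.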

Present the proposed split as a table:
- PR name/description
- Files included
- CODEOWNERS groups required
- Dependencies on other PRs (if any)

Wait for user approval before proceeding.

### 3. Execute the split (after user approval)

For each new PR:
1. Create a new branch from the appropriate base (`main`, or a dependency PR's branch).
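2. Extract the relevant changes: `git diff upstream/main..<source-branch> -- <file paths> | git apply`. If the source branch is behind `upstream/main`, rebase it first or use the three-dot form `upstream/main...<source-branch>`, so that upstream changes to those files are not reverted.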
3. Stage, commit with a clear message, and push to the user's fork.
4. Create the PR as a **draft** (per repo contributing guidelines).
5. If the original PR needs to be narrowed in scope, confirm with the user before force-pushing.
6. Report all PR URLs when done.
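
The per-PR steps above could be assembled into a reviewable command list before anything is run. The following is a sketch under stated assumptions: `split_commands` and `pr_body` are hypothetical helpers, `origin` is assumed to be the user's fork, and `gh pr create` is assumed to pick up the current branch as the PR head. The functions only build strings; nothing is executed.

```python
import shlex

def pr_body(summary, original_author, original_number, current_user):
    """Append a credit line when the splitter is not the original author."""
    if current_user != original_author:
        return f"{summary}\n\nOriginal changes by @{original_author} in #{original_number}."
    return summary

def split_commands(branch, base, source_branch, paths, title, body, fork_remote="origin"):
    """Return the shell commands for one split PR, in execution order."""
    q = shlex.quote
    files = " ".join(q(p) for p in paths)
    return [
        f"git checkout -b {q(branch)} {q(base)}",
        # File-scoped diff from the source branch, applied to the working tree.
        f"git diff upstream/main..{q(source_branch)} -- {files} | git apply",
        f"git add -- {files}",
        f"git commit -m {q(title)}",
        # Push to the fork remote only; never to upstream.
        f"git push -u {q(fork_remote)} {q(branch)}",
        f"gh pr create --draft --repo NVIDIA/Megatron-LM --title {q(title)} --body {q(body)}",
    ]
```

Printing the list and showing it to the user fits the approval gate in this workflow: the commands can be inspected, edited, and then run one at a time.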

## Important guidelines

- Always create PRs as **drafts** and push to the user's fork, never directly to upstream.
- Backward-compatible changes (aliases, re-exports, deprecation shims) should go in the first PR so subsequent PRs can depend on them.
- Test files should go with the production code they test, not in a separate PR.
- Prefer a single clean commit per split PR over replaying the original commit history.
- If a file is hard to categorize (e.g., it is owned by two groups that would otherwise land in different PRs), ask the user which PR it should go in.
- If the current GitHub user is not the author of the original PR, each new PR's description must explicitly credit the original author (e.g., "Original changes by @<author> in #<number>").
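
As an illustration of the backward-compatible aliases mentioned in the guidelines above, the first PR might keep an old name alive as a thin wrapper around the renamed function. Both function names here are hypothetical and are not taken from Megatron-LM; this is one common pattern, not necessarily the one the repository uses.

```python
import warnings

def build_attention_mask(seq_len):
    """New name introduced by the first PR (hypothetical example function)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def make_attention_mask(seq_len):
    """Deprecated alias, kept so that code landing in later PRs still imports cleanly."""
    warnings.warn(
        "make_attention_mask is deprecated; use build_attention_mask instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return build_attention_mask(seq_len)
```

With the alias in place, a dependent PR that still calls the old name keeps working on main, and the alias can be removed in a later cleanup once all callers have migrated.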

Source

Creator's repository · nvidia/skills


License: Apache-2.0
