gsd-2-agent-framework

Meta-prompting, context engineering, and spec-driven development system for autonomous long-running coding agents

---
name: gsd-2-agent-framework
description: Meta-prompting, context engineering, and spec-driven development system for autonomous long-running coding agents
triggers:
  - gsd autonomous agent
  - spec-driven development
  - context engineering coding
  - long running agent task
  - gsd auto mode
  - milestone slice task hierarchy
  - gsd-pi cli agent
  - autonomous coding agent framework
---

# GSD 2 — Autonomous Spec-Driven Agent Framework

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection

GSD 2 is a standalone CLI that turns a structured spec into running software autonomously. It controls the agent harness directly — managing fresh context windows per task, git worktree isolation, crash recovery, cost tracking, and stuck detection — rather than relying on LLM self-loops. One command, walk away, come back to a built project with clean git history.

---

## Installation

```bash
npm install -g gsd-pi
```

Requires Node.js 18+. Works with Claude (Anthropic) as the underlying model via the Pi SDK.

---

## Core Concepts

### Work Hierarchy

```
Milestone  →  a shippable version (4–10 slices)
  Slice    →  one demoable vertical capability (1–7 tasks)
    Task   →  one context-window-sized unit of work
```

**Iron rule:** A task must fit in one context window. If it can't, split it into smaller tasks.
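To make the hierarchy concrete, here is a minimal sketch that models it as plain TypeScript data and flags units outside the size guidelines above. The `Milestone`, `Slice`, and `Task` types and `checkHierarchy` are hypothetical illustrations, not part of the `gsd-pi` package. The one-context-window rule cannot be checked mechanically, so it is not modeled.

```typescript
// Hypothetical types modelling the GSD work hierarchy; not the gsd-pi API.
interface Task {
  id: string; // e.g. "T1"
  title: string;
}

interface Slice {
  id: string; // e.g. "S1"
  title: string;
  tasks: Task[];
}

interface Milestone {
  id: string; // e.g. "M1"
  title: string;
  slices: Slice[];
}

// Size guidelines taken from the hierarchy above.
const SLICES_PER_MILESTONE = { min: 4, max: 10 };
const TASKS_PER_SLICE = { min: 1, max: 7 };

// Returns one human-readable warning per unit that falls outside the guidelines.
function checkHierarchy(milestone: Milestone): string[] {
  const warnings: string[] = [];
  const n = milestone.slices.length;
  if (n < SLICES_PER_MILESTONE.min || n > SLICES_PER_MILESTONE.max) {
    warnings.push(
      `${milestone.id} has ${n} slices (expected ${SLICES_PER_MILESTONE.min}-${SLICES_PER_MILESTONE.max})`,
    );
  }
  for (const slice of milestone.slices) {
    const t = slice.tasks.length;
    if (t < TASKS_PER_SLICE.min || t > TASKS_PER_SLICE.max) {
      warnings.push(
        `${milestone.id}/${slice.id} has ${t} tasks (expected ${TASKS_PER_SLICE.min}-${TASKS_PER_SLICE.max})`,
      );
    }
  }
  return warnings;
}

// Usage: a milestone with only two slices, one of which has no tasks yet.
const m1: Milestone = {
  id: "M1",
  title: "Auth System",
  slices: [
    { id: "S1", title: "User model + migrations", tasks: [{ id: "T1", title: "implement user model" }] },
    { id: "S2", title: "Password auth endpoints", tasks: [] },
  ],
};
for (const warning of checkHierarchy(m1)) console.log(warning);
```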

### Directory Layout

```
project/
├── .gsd/
│   ├── STATE.md          # current auto-mode position
│   ├── DECISIONS.md      # architecture decisions register
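│   ├── LOCK              # crash recovery lock file
│   ├── config.json       # timeouts, models, hooks, reassessment toggle
│   ├── hooks.ts          # optional custom dispatch hooks
│   ├── sessions/         # per-attempt session logs
│   ├── worktrees/        # git worktrees for milestone branches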
│   ├── milestones/
│   │   └── M1/
│   │       ├── slices/
│   │       │   └── S1/
│   │       │       ├── PLAN.md        # task breakdown with must-haves
│   │       │       ├── RESEARCH.md    # codebase/doc scouting output
│   │       │       ├── SUMMARY.md     # completion summary
│   │       │       └── tasks/
│   │       │           └── T1/
│   │       │               ├── PLAN.md
│   │       │               └── SUMMARY.md
│   └── costs/
│       └── ledger.json   # per-unit token/cost tracking
├── ROADMAP.md            # milestone/slice structure
└── PROJECT.md            # project description and goals
```
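Because every unit lives at a predictable path, a unit ID such as `M1/S4/T2` maps directly onto its directory. The helpers below (`unitDir`, `summaryPath`) are hypothetical and derived only from the layout above. GSD's own path handling may differ.

```typescript
// Hypothetical helper: maps a GSD unit ID ("M1", "M1/S4", "M1/S4/T2") to its
// directory under .gsd/, following the directory layout shown above.
function unitDir(unitId: string): string {
  const parts = unitId.split("/");
  const [milestone, slice, task] = parts;
  if (parts.length > 3 || !/^M\d+$/.test(milestone)) {
    throw new Error(`Invalid unit ID: ${unitId}`);
  }
  let dir = `.gsd/milestones/${milestone}`;
  if (slice !== undefined) {
    if (!/^S\d+$/.test(slice)) throw new Error(`Invalid slice in ${unitId}`);
    dir += `/slices/${slice}`;
  }
  if (task !== undefined) {
    if (!/^T\d+$/.test(task)) throw new Error(`Invalid task in ${unitId}`);
    dir += `/tasks/${task}`;
  }
  return dir;
}

// In the layout above, slices and tasks record completion in SUMMARY.md.
function summaryPath(unitId: string): string {
  return `${unitDir(unitId)}/SUMMARY.md`;
}

console.log(unitDir("M1/S4"));        // .gsd/milestones/M1/slices/S4
console.log(summaryPath("M1/S4/T2")); // .gsd/milestones/M1/slices/S4/tasks/T2/SUMMARY.md
```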

---

## Commands

### `/gsd auto` — Primary Autonomous Mode

Run the full automation loop. Reads `.gsd/STATE.md`, dispatches each unit in a fresh session, handles recovery, and advances through the entire milestone without intervention.

```bash
/gsd auto
# or with options:
/gsd auto --budget 5.00        # pause if cost exceeds $5
/gsd auto --milestone M1       # run only milestone 1
/gsd auto --dry-run            # show dispatch plan without executing
```

### `/gsd init` — Initialize a Project

Scaffold the `.gsd/` directory from a `ROADMAP.md` and optional `PROJECT.md`.

```bash
/gsd init
```

Creates the initial `STATE.md`, registers milestones and slices from your roadmap, and sets up the cost ledger.

### `/gsd status` — Dashboard

Shows current position, per-slice costs, token usage, and what's queued next.

```bash
/gsd status
```

Output example:
```
Milestone 1: Auth System  [3/5 slices complete]
  ✓ S1: User model + migrations
  ✓ S2: Password auth endpoints
  ✓ S3: JWT session management
  → S4: OAuth integration  [PLANNING]
    S5: Role-based access control

Cost: $1.84 / $5.00 budget
Tokens: 142k input, 38k output
```

### `/gsd run` — Single Unit Dispatch

Execute one specific unit manually instead of running the full loop.

```bash
/gsd run --slice M1/S4            # run research + plan + execute for a slice
/gsd run --task M1/S4/T2          # run a single task
/gsd run --phase research M1/S4   # run just the research phase
/gsd run --phase plan M1/S4       # run just the planning phase
```

### `/gsd migrate` — Migrate from v1

Import `.planning/` directories from the original Get Shit Done (GSD v1).

```bash
/gsd migrate                        # migrate current directory
/gsd migrate ~/projects/old-project # migrate specific path
```

### `/gsd costs` — Cost Report

Detailed cost breakdown with projections.

```bash
/gsd costs
/gsd costs --by-phase
/gsd costs --by-slice
/gsd costs --export costs.csv
```

---

## Project Setup

### 1. Write `ROADMAP.md`

```markdown
# My Project Roadmap

## Milestone 1: Core API

### S1: Database schema and migrations
Set up Postgres schema for users, posts, and comments.

### S2: REST endpoints
CRUD endpoints for all resources with validation.

### S3: Authentication
JWT-based auth with refresh tokens.

## Milestone 2: Frontend

### S1: React app scaffold
...
```
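`/gsd init` registers milestones and slices from this file, so a consistent heading structure matters. As a rough illustration of how such headings can be read, the hypothetical parser below (not GSD's own) collects `## Milestone N: ...` and `### SN: ...` headings into a nested list. Slice IDs restart in each milestone, so each slice is attached to the most recent milestone heading.

```typescript
// Hypothetical ROADMAP.md reader: collects "## Milestone N: Title" and
// "### SN: Title" headings. GSD's real parser may accept other formats.
interface RoadmapSlice { id: string; title: string }
interface RoadmapMilestone { id: string; title: string; slices: RoadmapSlice[] }

function parseRoadmap(markdown: string): RoadmapMilestone[] {
  const milestones: RoadmapMilestone[] = [];
  // Split on \r?\n so Windows line endings do not break the anchored regexes.
  for (const line of markdown.split(/\r?\n/)) {
    const m = /^## Milestone (\d+):\s*(.+)$/.exec(line);
    if (m) {
      milestones.push({ id: `M${m[1]}`, title: m[2].trim(), slices: [] });
      continue;
    }
    const s = /^### (S\d+):\s*(.+)$/.exec(line);
    if (s && milestones.length > 0) {
      // Slice IDs restart at S1 per milestone, so attach to the latest milestone.
      milestones[milestones.length - 1].slices.push({ id: s[1], title: s[2].trim() });
    }
  }
  return milestones;
}

const roadmap = `# My Project Roadmap

## Milestone 1: Core API

### S1: Database schema and migrations
Set up Postgres schema for users, posts, and comments.

### S2: REST endpoints
CRUD endpoints for all resources with validation.

## Milestone 2: Frontend

### S1: React app scaffold
`;

for (const ms of parseRoadmap(roadmap)) {
  console.log(ms.id, ms.title, ms.slices.map((sl) => `${ms.id}/${sl.id}`).join(", "));
}
```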

### 2. Write `PROJECT.md`

```markdown
# My Project

A REST API for a blogging platform built with Express + TypeScript + Postgres.

## Tech Stack
- Node.js 20, TypeScript 5
- Express 4
- PostgreSQL 15 via pg + kysely
- Jest for tests

## Conventions
- All endpoints return `{ data, error }` envelope
- Database migrations in `db/migrations/`
- Feature modules in `src/features/<name>/`
```

### 3. Initialize

```bash
/gsd init
```

### 4. Run

```bash
/gsd auto
```

---

## The Auto-Mode State Machine

```
Research → Plan → Execute (per task) → Complete → Reassess → Next Slice
```

Each phase runs in a **fresh session** with context pre-inlined into the dispatch prompt:

| Phase | What the LLM receives | What it produces |
|---|---|---|
| Research | PROJECT.md, ROADMAP.md, slice description, codebase index | RESEARCH.md with findings, gotchas, relevant files |
| Plan | Research output, slice description, must-haves | PLAN.md with task breakdown, verification steps |
| Execute (task N) | Task plan, prior task summaries, dependency summaries, DECISIONS.md | Working code committed to git |
| Complete | All task summaries, slice plan | SUMMARY.md, UAT script, updated ROADMAP.md |
| Reassess | Completed slice summary, full ROADMAP.md | Updated roadmap with any corrections |
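The phase sequence can be written as a small transition function. This is a hypothetical model of the loop as described in the table, not GSD's internal code: a slice with N tasks runs Research, Plan, one Execute per task, Complete, then Reassess, and each step would be dispatched in its own fresh session.

```typescript
// Hypothetical model of the auto-mode phase sequence for one slice.
type Phase =
  | { kind: "research" }
  | { kind: "plan" }
  | { kind: "execute"; task: number } // 1-based task index
  | { kind: "complete" }
  | { kind: "reassess" }
  | { kind: "next-slice" };

// Returns the phase that follows `current` in a slice with `taskCount` tasks.
function nextPhase(current: Phase, taskCount: number): Phase {
  switch (current.kind) {
    case "research":
      return { kind: "plan" };
    case "plan":
      return taskCount > 0 ? { kind: "execute", task: 1 } : { kind: "complete" };
    case "execute":
      return current.task < taskCount
        ? { kind: "execute", task: current.task + 1 }
        : { kind: "complete" };
    case "complete":
      return { kind: "reassess" };
    case "reassess":
    case "next-slice":
      return { kind: "next-slice" };
  }
}

// Lists the phases one slice passes through; each entry is one fresh session.
function sliceSchedule(taskCount: number): string[] {
  const steps: string[] = [];
  let phase: Phase = { kind: "research" };
  while (phase.kind !== "next-slice") {
    steps.push(phase.kind === "execute" ? `execute T${phase.task}` : phase.kind);
    phase = nextPhase(phase, taskCount);
  }
  return steps;
}

console.log(sliceSchedule(3).join(" -> "));
// research -> plan -> execute T1 -> execute T2 -> execute T3 -> complete -> reassess
```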

---

## Must-Haves: Mechanically Verifiable Outcomes

Every task plan includes must-haves — explicit, checkable criteria the LLM uses to confirm completion. Write them as shell commands or file existence checks:

```markdown
## Must-Haves

- [ ] `npm test -- --testPathPattern=auth` passes with 0 failures
- [ ] File `src/features/auth/jwt.ts` exists and exports `signToken`, `verifyToken`
- [ ] `curl -X POST http://localhost:3000/auth/login` returns 200 with `{ data: { token } }`
- [ ] No TypeScript errors: `npx tsc --noEmit` exits 0
```

The execute phase ends only when the LLM can check off every must-have.
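Because must-haves are a Markdown checklist, a task's completion status can be read straight from its `PLAN.md`. The sketch below is a hypothetical helper, not part of GSD, that lists unchecked items under the `## Must-Haves` heading. It only inspects the checkboxes; actually verifying each item (running `npx tsc --noEmit`, the test suite, and so on) is a separate step.

```typescript
// Hypothetical helper: returns the unchecked items from the "## Must-Haves"
// section of a task PLAN.md. It reads checkboxes only; it runs no commands.
function uncheckedMustHaves(planMd: string): string[] {
  const pending: string[] = [];
  let inSection = false;
  for (const line of planMd.split(/\r?\n/)) {
    if (/^##\s/.test(line)) {
      // Any level-2 heading either opens or closes the Must-Haves section.
      inSection = /^##\s+Must-Haves\s*$/i.test(line);
      continue;
    }
    if (!inSection) continue;
    const item = /^\s*- \[( |x|X)\]\s+(.*)$/.exec(line);
    if (item && item[1] === " ") pending.push(item[2]);
  }
  return pending;
}

const plan = `## Must-Haves

- [x] File \`src/features/auth/jwt.ts\` exists and exports \`signToken\`, \`verifyToken\`
- [ ] No TypeScript errors: \`npx tsc --noEmit\` exits 0

## Notes
- [ ] not a must-have, ignored
`;

console.log(uncheckedMustHaves(plan));
```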

---

## Git Strategy

GSD manages git automatically in auto mode:

```
main
 └── milestone/M1          ← worktree branch created at start
      ├── commit: [M1/S1/T1] implement user model
      ├── commit: [M1/S1/T2] add migrations
      ├── commit: [M1/S1] slice complete
      ├── commit: [M1/S2/T1] POST /users endpoint
      └── ...
 
 After milestone complete:
main ← squash merge of milestone/M1 as "[M1] Auth system"
```

Each task gets its own commit with a structured message, and each slice ends with a summary commit. When the milestone is complete, its branch is squash-merged into `main` as a single clean commit.
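If you build tooling on top of GSD's history (changelogs, per-commit cost attribution), the `[unit] message` convention is easy to format and parse. The helpers below are hypothetical, and the format is inferred from the example history, so treat the regex as a starting point.

```typescript
// Hypothetical helpers for the "[M1/S1/T1] message" commit convention shown above.
type UnitLevel = "milestone" | "slice" | "task";

interface ParsedCommit {
  unitId: string;
  level: UnitLevel;
  summary: string;
}

function formatCommit(unitId: string, summary: string): string {
  return `[${unitId}] ${summary}`;
}

// Returns null for subjects that do not follow the convention.
function parseCommit(subject: string): ParsedCommit | null {
  const m = /^\[(M\d+(?:\/S\d+(?:\/T\d+)?)?)\]\s+(.+)$/.exec(subject);
  if (!m) return null;
  const depth = m[1].split("/").length;
  const level: UnitLevel = depth === 1 ? "milestone" : depth === 2 ? "slice" : "task";
  return { unitId: m[1], level, summary: m[2] };
}

const subjects = [
  formatCommit("M1/S1/T1", "implement user model"),
  formatCommit("M1/S1", "slice complete"),
  "fix typo", // not a GSD commit
];
for (const subject of subjects) {
  console.log(subject, "=>", parseCommit(subject));
}
```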

---

## Crash Recovery

GSD writes a lock file at `.gsd/LOCK` when a unit starts and removes it on clean completion. If the process dies:

```bash
# Next run detects the lock and auto-recovers:
/gsd auto

# Output:
# ⚠ Lock file found: M1/S3/T2 was interrupted
# Synthesizing recovery briefing from session artifacts...
# Resuming with full context
```

The recovery briefing is synthesized from every tool call that reached disk — file writes, shell output, partial completions — so the resumed session has context continuity.
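The lock protocol described above (write on start, delete on clean completion, treat a leftover lock as an interrupted unit) can be sketched with Node's `fs` module. The JSON contents of the lock, `runWithLock`, and `detectInterrupted` are assumptions for illustration; this file does not document GSD's actual `LOCK` format. A thrown error stands in for a crash.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumed lock contents; GSD's real LOCK format may differ.
interface LockInfo {
  unitId: string;
  startedAt: string;
}

// Returns the interrupted unit if a lock was left behind, otherwise null.
function detectInterrupted(gsdDir: string): LockInfo | null {
  const lockPath = path.join(gsdDir, "LOCK");
  if (!fs.existsSync(lockPath)) return null;
  return JSON.parse(fs.readFileSync(lockPath, "utf8")) as LockInfo;
}

// Runs `work` while holding the lock. The lock is removed only on success,
// so a crash (here, a thrown error) leaves it on disk for the next run.
function runWithLock(gsdDir: string, unitId: string, work: () => void): void {
  const lockPath = path.join(gsdDir, "LOCK");
  const info: LockInfo = { unitId, startedAt: new Date().toISOString() };
  // The "wx" flag fails if a lock already exists, so two runs cannot overlap.
  fs.writeFileSync(lockPath, JSON.stringify(info), { flag: "wx" });
  work();
  fs.unlinkSync(lockPath); // reached only on clean completion
}

// Usage in a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "gsd-demo-"));
runWithLock(demoDir, "M1/S3/T1", () => {});
try {
  runWithLock(demoDir, "M1/S3/T2", () => {
    throw new Error("simulated crash");
  });
} catch {
  // The unit did not finish cleanly, so its lock stays on disk.
}
console.log(detectInterrupted(demoDir)?.unitId); // M1/S3/T2
```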

---

## Cost Controls

Set a budget ceiling to pause auto mode before overspending:

```bash
/gsd auto --budget 10.00
```

The cost ledger at `.gsd/costs/ledger.json`:

```json
{
  "units": [
    {
      "id": "M1/S1/research",
      "model": "claude-opus-4",
      "inputTokens": 12400,
      "outputTokens": 3200,
      "costUsd": 0.21,
      "completedAt": "2025-01-15T10:23:44Z"
    }
  ],
  "totalCostUsd": 1.84,
  "budgetUsd": 10.00
}
```
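Given the ledger shape above, budget tracking reduces to summing per-unit costs and comparing the total against `budgetUsd`. The sketch below works over that JSON shape only. `summarize`, the by-slice grouping (first two segments of the unit ID), and the second ledger entry's figures are illustrative assumptions, not GSD's implementation or real data.

```typescript
// Types mirroring the ledger.json example above.
interface LedgerUnit {
  id: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  completedAt: string;
}

interface Ledger {
  units: LedgerUnit[];
  totalCostUsd: number;
  budgetUsd: number;
}

// Recomputes the total from the units and groups cost by slice.
function summarize(ledger: Ledger) {
  const bySlice = new Map<string, number>();
  let total = 0;
  for (const unit of ledger.units) {
    total += unit.costUsd;
    const slice = unit.id.split("/").slice(0, 2).join("/"); // "M1/S1/research" -> "M1/S1"
    bySlice.set(slice, (bySlice.get(slice) ?? 0) + unit.costUsd);
  }
  return {
    totalUsd: total,
    remainingUsd: ledger.budgetUsd - total,
    overBudget: total > ledger.budgetUsd, // "pause if cost exceeds" the budget
    bySlice,
  };
}

const ledger: Ledger = {
  units: [
    { id: "M1/S1/research", model: "claude-opus-4", inputTokens: 12400, outputTokens: 3200, costUsd: 0.21, completedAt: "2025-01-15T10:23:44Z" },
    // Illustrative entry: ID format and figures are made up for this example.
    { id: "M1/S1/plan", model: "claude-opus-4", inputTokens: 9000, outputTokens: 2500, costUsd: 0.18, completedAt: "2025-01-15T10:31:02Z" },
  ],
  totalCostUsd: 0.39,
  budgetUsd: 10.0,
};

const s = summarize(ledger);
console.log(`$${s.totalUsd.toFixed(2)} of $${ledger.budgetUsd.toFixed(2)} used, over budget: ${s.overBudget}`);
```

Floating-point sums of dollar amounts can drift slightly, which is why the output is rounded with `toFixed(2)`; a ledger that needs exact totals could store integer cents instead.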

---

## Decisions Register

`.gsd/DECISIONS.md` is auto-injected into every task dispatch. Record architectural decisions here and the LLM will respect them across all future sessions:

```markdown
# Decisions Register

## D1: Use kysely not prisma
**Date:** 2025-01-14
**Reason:** Better TypeScript inference, no code generation step needed.
**Impact:** All DB queries use kysely QueryBuilder syntax.

## D2: JWT in httpOnly cookie, not Authorization header
**Date:** 2025-01-14  
**Reason:** Better XSS protection for the web client.
**Impact:** Auth middleware reads `req.cookies.token`.
```

---

## Stuck Detection

If a unit's session ends without producing its expected artifact, GSD:

1. Retries once with a deep diagnostic prompt that includes what was expected vs. what exists on disk.
2. If the second attempt also fails, **stops auto mode** and reports:

```
✗ Stuck on M1/S3/T1 after 2 attempts
Expected: src/features/auth/jwt.ts (not found)
Last session: .gsd/sessions/M1-S3-T1-attempt2.log
Run `/gsd run --task M1/S3/T1` to retry manually
```
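The retry policy amounts to a loop of at most two attempts in which success is judged by whether the expected artifact exists on disk, not by what the model claims. In this hypothetical sketch, `dispatch` and `artifactExists` stand in for GSD's session dispatch and disk check, which are not exposed under those names.

```typescript
// Hypothetical sketch of stuck detection: success means the expected artifact exists.
type Dispatch = (unitId: string, diagnostic: string | null) => void;

interface StuckReport {
  unitId: string;
  attempts: number;
  expected: string;
}

// Returns null when the unit succeeds, or a report once all attempts fail.
function runUnit(
  unitId: string,
  expectedArtifact: string,
  dispatch: Dispatch,
  artifactExists: (path: string) => boolean,
  maxAttempts = 2,
): StuckReport | null {
  let diagnostic: string | null = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    dispatch(unitId, diagnostic);
    if (artifactExists(expectedArtifact)) return null; // unit finished
    // The retry prompt contrasts what was expected with what is on disk.
    diagnostic = `Expected ${expectedArtifact}, but it was not found on disk.`;
  }
  return { unitId, attempts: maxAttempts, expected: expectedArtifact };
}

// Usage with an in-memory "disk" where the model never writes the file.
const disk = new Set<string>();
const prompts: (string | null)[] = [];
const report = runUnit(
  "M1/S3/T1",
  "src/features/auth/jwt.ts",
  (_id, diag) => { prompts.push(diag); },
  (p) => disk.has(p),
);
console.log(report); // non-null: stop auto mode and surface this report
```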

---

## Skills Integration

GSD supports auto-detecting and installing relevant skills during the research phase. Create `SKILLS.md` in your project:

```markdown
# Project Skills

- name: postgres-kysely
- name: express-typescript  
- name: jest-testing
```

Skills are injected into the research and plan dispatch prompts, giving the LLM curated knowledge about your exact stack without burning context on irrelevant docs.

---

## Timeout Supervision

Three timeout tiers prevent runaway sessions:

| Timeout | Default | Behavior |
|---|---|---|
| Soft | 8 min | Sends "please wrap up" steering message |
| Idle | 3 min no tool calls | Sends "are you stuck?" recovery prompt |
| Hard | 15 min | Pauses auto mode, preserves all disk state |

Configure in `.gsd/config.json`:

```json
{
  "timeouts": {
    "softMinutes": 8,
    "idleMinutes": 3,
    "hardMinutes": 15
  },
  "defaultModel": "claude-opus-4",
  "researchModel": "claude-sonnet-4"
}
```
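The three tiers can be evaluated with one periodic check against a session's start time and the time of its last tool call. The sketch below assumes the `timeouts` keys shown in `config.json`; the precedence (hard, then idle, then soft) and the action names are assumptions rather than documented GSD behavior.

```typescript
// Keys mirror the "timeouts" object in .gsd/config.json shown above.
interface TimeoutConfig {
  softMinutes: number;
  idleMinutes: number;
  hardMinutes: number;
}

type TimeoutAction = "none" | "wrap-up" | "are-you-stuck" | "pause";

// Decides which tier, if any, applies right now. Checked in the order
// hard, idle, soft (an assumed precedence). A real supervisor would also
// avoid re-sending the same steering message on every tick.
function checkTimeouts(
  cfg: TimeoutConfig,
  startedAtMs: number,
  lastToolCallAtMs: number,
  nowMs: number,
): TimeoutAction {
  const minutes = (ms: number) => ms / 60_000;
  if (minutes(nowMs - startedAtMs) >= cfg.hardMinutes) return "pause";
  if (minutes(nowMs - lastToolCallAtMs) >= cfg.idleMinutes) return "are-you-stuck";
  if (minutes(nowMs - startedAtMs) >= cfg.softMinutes) return "wrap-up";
  return "none";
}

const cfg: TimeoutConfig = { softMinutes: 8, idleMinutes: 3, hardMinutes: 15 };
const min = 60_000;
console.log(checkTimeouts(cfg, 0, 4 * min, 5 * min));   // none
console.log(checkTimeouts(cfg, 0, 8 * min, 9 * min));   // wrap-up
console.log(checkTimeouts(cfg, 0, 2 * min, 6 * min));   // are-you-stuck
console.log(checkTimeouts(cfg, 0, 14 * min, 15 * min)); // pause
```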

---

## TypeScript Integration (Pi SDK)

GSD is built on the [Pi SDK](https://github.com/badlogic/pi-mono). You can extend it programmatically:

```typescript
import { GSDProject, AutoRunner } from 'gsd-pi';

const project = await GSDProject.load('/path/to/project');

// Check current state
const state = await project.getState();
console.log(state.currentMilestone, state.currentSlice);

// Run a single slice programmatically
const runner = new AutoRunner(project, {
  budget: 5.00,
  onUnitComplete: (unit, cost) => {
    console.log(`Completed ${unit.id}, cost: $${cost.toFixed(3)}`);
  },
  onStuck: (unit, attempts) => {
    console.error(`Stuck on ${unit.id} after ${attempts} attempts`);
    process.exit(1);
  }
});

await runner.runSlice('M1/S4');
```

---

## Custom Dispatch Hooks

Inject custom context into any dispatch prompt:

```typescript
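// .gsd/hooks.ts
import { readFile } from 'node:fs/promises';
import type { DispatchHook } from 'gsd-pi';

// Hypothetical helper (not part of gsd-pi): loads internal API docs from a
// local Markdown file. The path is an assumption; point it at your own docs.
async function fetchInternalAPIDocs(): Promise<string> {
  return readFile('docs/internal-api.md', 'utf8');
}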

export const beforeTaskDispatch: DispatchHook = async (ctx) => {
  // Append custom context to every task dispatch
  return {
    ...ctx,
    extraContext: `
## Live API Docs
${await fetchInternalAPIDocs()}
    `
  };
};
```

Register in `.gsd/config.json`:

```json
{
  "hooks": "./hooks.ts"
}
```

---

## Roadmap Reassessment

After each slice completes, GSD runs a reassessment pass that may:

- Re-order upcoming slices based on discovered dependencies
- Split a slice that turned out larger than expected
- Mark a slice as no longer needed
- Add a new slice for discovered work

The LLM edits `ROADMAP.md` in place. You can review diffs with:

```bash
git diff ROADMAP.md
```

To disable reassessment, set this in `.gsd/config.json`:

```json
{
  "reassessment": false
}
```

---

## Troubleshooting

### Auto mode stops immediately with "no pending slices"
All slices in `ROADMAP.md` are marked `[x]`. To reset a slice, remove the `[x]` from its entry and delete its `SUMMARY.md` (for M1/S3, that is `.gsd/milestones/M1/slices/S3/SUMMARY.md`).

### LLM keeps failing must-haves
Check `.gsd/sessions/` for the last session log. Common causes: a must-have references the wrong file path, or the test command needs an environment variable. Adjust the must-haves in the task's `PLAN.md` and re-run with `/gsd run --task M1/S3/T2`.

### Cost ceiling hit unexpectedly
The research phase can be expensive on large codebases. Set `researchModel` in `.gsd/config.json` to a cheaper model, or reduce the codebase index depth.

### Lock file left after clean exit
```bash
rm .gsd/LOCK
/gsd auto
```

### Git worktree conflicts
```bash
git worktree list          # see active worktrees
git worktree remove .gsd/worktrees/M1 --force
/gsd auto                  # recreates cleanly
```

### Session file too large for recovery
If `.gsd/sessions/` grows large, GSD compresses sessions older than 24h automatically. Manual cleanup:
```bash
/gsd cleanup --sessions --older-than 7d
```

---

## Links

- [GitHub: gsd-build/GSD-2](https://github.com/gsd-build/GSD-2)
- [npm: gsd-pi](https://www.npmjs.com/package/gsd-pi)
- [Pi SDK](https://github.com/badlogic/pi-mono)
- [Original GSD v1](https://github.com/gsd-build/get-shit-done)

Source

Creator's repository · aradotso/trending-skills
