experiment-code

Write ML experiment code with iterative improvement. Generate training/evaluation pipelines, debug errors, and optimize results through code reflection. Use when implementing experiments for a research paper.

Skill file

Preview skill file↓↑

---
name: experiment-code
description: Write ML experiment code with iterative improvement. Generate training/evaluation pipelines, debug errors, and optimize results through code reflection. Use when implementing experiments for a research paper.
argument-hint: [plan-or-idea]
---

# Experiment Code

Generate and iteratively improve ML experiment code for research papers.

## Input

- `$0` — Task: `generate`, `improve`, `debug`, `plot`
- `$1` — Research plan, idea description, or error message

## References

- Experiment prompts and patterns: `~/.claude/skills/experiment-code/references/experiment-prompts.md`
- Code patterns (error handling, repair, hill-climbing): `~/.claude/skills/experiment-code/references/code-patterns.md`

## Action: `generate`

Generate initial experiment code following this structure:

1. **Plan experiments first** — List all runs needed (hyperparameter sweeps, ablations, baselines)
2. **Write self-contained code** — All code in project directory, no external imports from reference repos
3. **Include proper logging** — Save results to JSON, print intermediate metrics
4. **Generate figures** — At minimum Figure_1.png and Figure_2.png

### Mandatory Structure
```
project/
├── experiment.py # Main experiment script
├── plot.py # Visualization script
├── notes.txt # Experiment descriptions and results
├── run_1/ # Results from run 1
│ └── final_info.json
├── run_2/
└── ...
```

### Constraints
- No placeholder code (`pass`, `...`, `raise NotImplementedError`)
- Must use actual datasets (not toy data unless explicitly requested)
- PyTorch or scikit-learn preferred (no TensorFlow/Keras)
- Each run uses: `python experiment.py --out_dir=run_i`

## Action: `improve`

Improve existing experiment code:
1. Read current code and results
2. Reflect on what worked and what didn't
3. Apply targeted edits (prefer small edits over full rewrites)
4. Re-run and compare scores
5. Keep the best-performing code variant

## Action: `debug`

Fix experiment code errors:
1. Read the error message (truncate to last 1500 chars if very long)
2. Identify the root cause
3. Apply minimal fix
4. Up to 4 retry attempts before changing approach

## Action: `plot`

Generate publication-quality plots from experiment results:
1. Read all `run_*/final_info.json` files
2. Generate comparison plots with proper labels
3. Use the figure-generation skill for styling

## Rules

- Always plan experiments before writing code
- After each run, document results in notes.txt
- Include print statements explaining what results show
- Method MUST not get 0% accuracy — verify accuracy calculations
- Use seeds for reproducibility
- Before each experiment include a print statement explaining exactly what the results are meant to show

## Related Skills
- Upstream: [experiment-design](../experiment-design/), [algorithm-design](../algorithm-design/)
- Downstream: [data-analysis](../data-analysis/), [backward-traceability](../backward-traceability/)
- See also: [code-debugging](../code-debugging/), [paper-to-code](../paper-to-code/)

Source

Creator's repository · lingzhi227/agent-research-skills

View on GitHub ↗

Security

Security checks in progress

Results will appear here once audits complete

What this skill can do

Reads your filesConnects to the internetRuns code on your machine

Checked by 3 independent security firms

Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub

Does it sneak in hidden code?Not yet checkedPending · Socket

Does it have known bugs?Not yet checkedPending · Snyk