pentest-r1-autonomous-penetration-testing

Two-stage reinforcement learning framework for training LLMs to perform autonomous penetration testing and CTF challenges

---
name: pentest-r1-autonomous-penetration-testing
description: Two-stage reinforcement learning framework for training LLMs to perform autonomous penetration testing and CTF challenges
triggers:
  - train a penetration testing AI model
  - use pentest-r1 for autonomous security testing
  - setup reinforcement learning for pentesting
  - run pentest-r1 offline training
  - configure pentest-r1 online RL environment
  - train LLM on CTF challenges
  - build autonomous penetration testing agent
  - setup intercode-ctf docker environment
---

# Pentest-R1 Autonomous Penetration Testing

> Skill by [ara.so](https://ara.so) — Security Skills collection

Pentest-R1 is a two-stage reinforcement learning framework that trains Large Language Models for autonomous penetration testing. It combines offline RL on expert walkthroughs with online RL in interactive CTF environments to develop robust attack reasoning capabilities.

## Installation

### Prerequisites

- Python 3.11.11
- Docker (for Stage 2 and reproducible environments)
- NVIDIA Container Toolkit (for GPU support)
- CUDA 12.4 runtime (optional, for GPU acceleration)

### Basic Setup

```bash
git clone https://github.com/KHenryAegis/Pentest-R1.git
cd Pentest-R1
pip install -r requirements.txt
```

### Docker Environment (Recommended for Reproducibility)

```bash
# Build the reproducible research environment
source setup-docker.sh

# Run container with optimized cache mounting
docker run --rm -it \
  --name pentest-r1 \
  -v "$(pwd)":/root/Pentest-R1 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -v ~/.cache/triton:/root/.cache/triton \
  -v ~/.cache/torch_extensions:/root/.cache/torch_extensions \
  -w /root/Pentest-R1 \
  --gpus all \
  --net=host \
  pentest-r1:ubuntu22.04
```

## Core Training Pipeline

### Stage 1: Offline Reinforcement Learning

Stage 1 trains the base LLM on a curated dataset of 500+ real-world expert penetration testing walkthroughs.

```bash
python grpo_stage1.py
```

**Key configuration in `grpo_stage1.py`:**

```python
# Import unsloth before transformers/peft so its optimizations are applied
from unsloth import FastLanguageModel
import torch

# Model configuration
model_name = "unsloth/Meta-Llama-3.1-8B-Instruct"
max_seq_length = 4096
dtype = None  # Auto-detect
load_in_4bit = True  # Use 4-bit quantization

# Load model with unsloth optimization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

# PEFT configuration for efficient training
from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, peft_config)
```

### Stage 2: Online Reinforcement Learning

Stage 2 fine-tunes the model in interactive CTF environments with real-time feedback.

**Set up the CTF environment:**

```bash
cd train_ctf_env
docker build -t intercode-ctf .
cd ..
```

**Run Stage 2 training:**

```bash
python grpo_multi_turn_stage2.py
```

## Key Components

### Data Loading and Preprocessing

```python
from datasets import load_dataset

# Load expert walkthrough dataset
dataset = load_dataset("json", data_files="path/to/expert_walkthroughs.jsonl")

# Example dataset format
# {
#   "challenge": "SQL Injection in login form",
#   "steps": [
#     {"action": "reconnaissance", "command": "sqlmap -u http://target/login", "reasoning": "..."},
#     {"action": "exploit", "command": "sqlmap --dump", "reasoning": "..."}
#   ],
#   "flag": "CTF{...}"
# }

def preprocess_function(examples):
    """Format data for training"""
    prompts = []
    responses = []
    
    for challenge, steps in zip(examples["challenge"], examples["steps"]):
        prompt = f"Challenge: {challenge}\nWhat are the steps to solve this?"
        response = "\n".join([
            f"Step {i+1}: {step['reasoning']}\nCommand: {step['command']}"
            for i, step in enumerate(steps)
        ])
        prompts.append(prompt)
        responses.append(response)
    
    return {"prompt": prompts, "response": responses}

# This step only builds "prompt" and "response" text columns; it does not tokenize
formatted_dataset = dataset.map(preprocess_function, batched=True)
```
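Because `preprocess_function` indexes `step['reasoning']` and `step['command']` directly, a single malformed record raises a `KeyError` in the middle of `dataset.map`. A minimal sketch of a pre-check is shown below. It assumes the record layout from the example comment above; `validate_record` is a hypothetical helper, not part of the Pentest-R1 repository.

```python
import json

# Keys that preprocess_function reads from every step (taken from the example format)
REQUIRED_STEP_KEYS = {"command", "reasoning"}

def validate_record(record: dict) -> list:
    """Return a list of problems found in one walkthrough record (empty if OK)."""
    problems = []
    challenge = record.get("challenge")
    if not isinstance(challenge, str) or not challenge.strip():
        problems.append("missing or empty 'challenge'")
    steps = record.get("steps")
    if not isinstance(steps, list) or not steps:
        problems.append("'steps' must be a non-empty list")
    else:
        for i, step in enumerate(steps):
            present = set(step) if isinstance(step, dict) else set()
            missing = REQUIRED_STEP_KEYS - present
            if missing:
                problems.append(f"step {i + 1} is missing {sorted(missing)}")
    return problems

# Two toy JSONL lines (illustrative only); the second lacks 'reasoning' in its step
lines = [
    '{"challenge": "SQL Injection in login form", "steps": [{"action": "exploit", '
    '"command": "sqlmap --dump", "reasoning": "dump tables"}], "flag": "CTF{...}"}',
    '{"challenge": "Weak SSH key", "steps": [{"command": "ssh user@target"}]}',
]
reports = [validate_record(json.loads(line)) for line in lines]
print(reports)
```

Running a check like this over the JSONL file before `load_dataset` makes it possible to drop or repair bad records up front.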

### Reward Model Configuration

```python
class PentestRewardModel:
    """Reward model for evaluating penetration testing actions"""
    
    def __init__(self):
        self.success_reward = 1.0
        self.partial_reward = 0.5
        self.failure_penalty = -0.1
    
    def calculate_reward(self, action, environment_feedback):
        """Calculate reward based on action outcome"""
        if "flag" in environment_feedback.lower():
            return self.success_reward
        elif "error" in environment_feedback.lower():
            return self.failure_penalty
        elif "progress" in environment_feedback.lower():
            return self.partial_reward
        return 0.0

reward_model = PentestRewardModel()
```
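The keyword checks above count any output that contains the word "flag" as a success, including messages such as "no flag file found". A stricter variant could compare a flag-shaped string in the output against the expected flag. The sketch below assumes flags follow the `CTF{...}` shape from the dataset example and that the expected flag is known to the environment; `flag_reward` and its reward values are hypothetical.

```python
import re

# Assumed flag shape, taken from the "flag": "CTF{...}" field in the dataset example
FLAG_PATTERN = re.compile(r"CTF\{[^}]*\}")

def flag_reward(environment_feedback: str, expected_flag: str) -> float:
    """Return 1.0 for the expected flag, -0.1 for a wrong flag-shaped string, else 0.0."""
    found = FLAG_PATTERN.findall(environment_feedback)
    if expected_flag in found:
        return 1.0
    if found:
        # A flag-shaped string that does not match counts as a failed guess
        return -0.1
    return 0.0

correct = flag_reward("uid=0(root)\nCTF{r00t_sh3ll}", "CTF{r00t_sh3ll}")
wrong = flag_reward("try CTF{guess}", "CTF{r00t_sh3ll}")
none_found = flag_reward("no flag file found", "CTF{r00t_sh3ll}")
print(correct, wrong, none_found)
```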

### Interacting with CTF Environment

```python
import docker

class CTFEnvironment:
    """Wrapper for InterCode-CTF Docker environment"""
    
    def __init__(self, image_name="intercode-ctf"):
        self.client = docker.from_env()
        self.image_name = image_name
        self.container = None
    
    def start(self, challenge_id):
        """Start a CTF challenge container"""
        self.container = self.client.containers.run(
            self.image_name,
            detach=True,
            environment={"CHALLENGE_ID": challenge_id},
            network_mode="host",
            remove=True
        )
        return self.container
    
    def execute_command(self, command):
        """Execute a command in the container"""
        if not self.container:
            raise RuntimeError("Container not started")
        
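        # Run through a shell so pipes and redirects in the command work
        # (assumes /bin/sh exists in the image)
        exec_result = self.container.exec_run(["/bin/sh", "-c", command])
        return {
            # Replace undecodable bytes instead of raising on binary output
            "stdout": exec_result.output.decode("utf-8", errors="replace"),
            "exit_code": exec_result.exit_code
        }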
    
    def cleanup(self):
        """Stop and remove container"""
        if self.container:
            self.container.stop()
            self.container = None

# Usage example
env = CTFEnvironment()
env.start(challenge_id="sql_injection_001")
try:
    result = env.execute_command("sqlmap -u http://localhost/login --batch")
    print(result["stdout"])
finally:
    env.cleanup()  # Stop the container even if the command raises
```
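Containers that are not cleaned up keep running on the host network, so it may be worth tying cleanup to a `with` block. The sketch below is one way to do that for any object exposing `start` and `cleanup` methods like the wrapper above. `managed_challenge` and `FakeEnv` are hypothetical names; `FakeEnv` only stands in for the Docker-backed class so the example runs without Docker.

```python
from contextlib import contextmanager

@contextmanager
def managed_challenge(env, challenge_id):
    """Start a challenge and always clean up, even if the body raises."""
    env.start(challenge_id)
    try:
        yield env
    finally:
        env.cleanup()

class FakeEnv:
    """Stand-in with the same start/cleanup methods as the Docker wrapper."""

    def __init__(self):
        self.events = []

    def start(self, challenge_id):
        self.events.append(f"start:{challenge_id}")

    def cleanup(self):
        self.events.append("cleanup")

fake = FakeEnv()
try:
    with managed_challenge(fake, "sql_injection_001"):
        raise RuntimeError("command failed")  # simulate a failing step
except RuntimeError:
    pass

print(fake.events)
```

With the real wrapper, the same pattern would read `with managed_challenge(CTFEnvironment(), challenge_id) as env: ...`.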

## Training Configuration

### GRPO (Group Relative Policy Optimization) Settings

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./pentest-r1-checkpoints",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=5e-5,
    warmup_steps=100,
    logging_steps=10,
    save_steps=500,
    save_total_limit=3,
    fp16=True,  # Mixed precision training
    report_to="wandb",  # Optional: integration with Weights & Biases
    remove_unused_columns=False,
)
```
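GRPO scores each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below shows that group-relative normalization in plain Python. `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions for illustration; the normalization used by the actual training scripts may differ.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one group: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Illustrative rewards for four completions sampled for the same prompt:
# one found the flag, two made no progress, one hit an error
group = [1.0, 0.0, 0.0, -0.1]
advantages = group_relative_advantages(group)
print([round(a, 3) for a in advantages])

# When every completion gets the same reward, no completion is preferred
flat = group_relative_advantages([0.5, 0.5, 0.5])
print(flat)
```

A consequence of this design is that a group in which all completions fail (or all succeed) carries no learning signal, so reward functions with some partial credit can help early in training.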

### Environment Variables

```python
import os

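# Model and training configuration
hf_token = os.getenv("HUGGINGFACE_TOKEN")
if hf_token:  # Assigning None to os.environ raises TypeError, so skip when unset
    os.environ["HF_TOKEN"] = hf_token  # For model downloads
# WANDB_API_KEY is read from the environment by wandb itself; export it in the shell
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # GPU selection; set before CUDA is initialized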

# Cache directories (mounted in Docker)
os.environ["HF_HOME"] = "/root/.cache/huggingface"
os.environ["TRITON_CACHE_DIR"] = "/root/.cache/triton"
```

## Common Patterns

### Multi-Turn Reasoning

```python
class MultiTurnAgent:
    """Agent for multi-turn penetration testing reasoning"""
    
    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        self.conversation_history = []
    
    def generate_action(self, observation):
        """Generate next action based on current observation"""
        # Build prompt with conversation history
        prompt = self._build_prompt(observation)
        
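        # Move the encoded prompt to the same device as the model
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        outputs = self.model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.7,
            do_sample=True,
            top_p=0.95
        )
        
        # Decode only the newly generated tokens, not the echoed prompt
        prompt_length = inputs["input_ids"].shape[1]
        action = self.tokenizer.decode(
            outputs[0][prompt_length:], skip_special_tokens=True
        ).strip()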
        self.conversation_history.append({
            "observation": observation,
            "action": action
        })
        
        return action
    
    def _build_prompt(self, observation):
        """Build prompt with conversation history"""
        prompt = "You are a penetration testing expert. Analyze and exploit:\n\n"
        
        for turn in self.conversation_history[-3:]:  # Last 3 turns
            prompt += f"Observation: {turn['observation']}\n"
            prompt += f"Action: {turn['action']}\n\n"
        
        prompt += f"Current Observation: {observation}\n"
        prompt += "Next Action:"
        
        return prompt
```
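The decoded model output can contain reasoning text around the command, so passing it to the container unchanged may fail. A minimal sketch of a parser is shown below. It assumes the model follows the `Command: ...` line format used when formatting the Stage 1 walkthroughs; `extract_command` is a hypothetical helper, not part of the repository.

```python
import re
from typing import Optional

# Matches a line such as "Command: nmap -sV 10.0.0.5" and captures the command text
COMMAND_LINE = re.compile(r"^[ \t]*Command:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)

def extract_command(model_output: str) -> Optional[str]:
    """Return the first 'Command: ...' line, else the first non-empty line, else None."""
    match = COMMAND_LINE.search(model_output)
    if match:
        return match.group(1)
    # Fall back to the first non-empty line when no Command: prefix is present
    for line in model_output.splitlines():
        if line.strip():
            return line.strip()
    return None

with_prefix = extract_command("Step 1: enumerate services first\nCommand: nmap -sV 10.0.0.5\n")
bare = extract_command("ls -la /home")
empty = extract_command("  \n")
print(with_prefix, bare, empty)
```

Returning `None` for empty output lets the caller skip the turn instead of executing an empty command.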

### Evaluation Loop

```python
def evaluate_on_ctf_challenges(model, tokenizer, challenge_set):
    """Evaluate model on a set of CTF challenges"""
    results = []
    max_turns = 20

    for challenge in challenge_set:
        # Fresh agent per challenge so history does not leak between challenges
        agent = MultiTurnAgent(model, tokenizer)
        env = CTFEnvironment()
        env.start(challenge["id"])

        solved = False
        turns_used = 0

        try:
            # Initial observation: the challenge description
            observation = env.execute_command("cat /challenge/description.txt")["stdout"]

            for turn in range(max_turns):
                turns_used = turn + 1

                # Generate and execute the next action
                action = agent.generate_action(observation)
                result = env.execute_command(action)

                # Check for success
                if "CTF{" in result["stdout"]:
                    solved = True
                    break

                # Feed the command output back as the next observation
                observation = result["stdout"]
        finally:
            # Stop the container even if generation or execution raises
            env.cleanup()

        results.append({
            "challenge_id": challenge["id"],
            "solved": solved,
            "turns": turns_used
        })

    return results
```
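The list returned by the evaluation loop can be reduced to a few headline numbers. The sketch below assumes the result dictionaries have the `challenge_id`, `solved`, and `turns` keys used above; `summarize_results` is a hypothetical helper and the sample data is illustrative, not a measured result.

```python
def summarize_results(results):
    """Aggregate per-challenge results into success rate and mean turns to solve."""
    total = len(results)
    solved = [r for r in results if r["solved"]]
    return {
        "total": total,
        "solved": len(solved),
        "success_rate": len(solved) / total if total else 0.0,
        # Averaged over solved challenges only; unsolved ones always hit max_turns
        "mean_turns_when_solved": (
            sum(r["turns"] for r in solved) / len(solved) if solved else None
        ),
    }

# Illustrative data only
sample_results = [
    {"challenge_id": "a", "solved": True, "turns": 4},
    {"challenge_id": "b", "solved": False, "turns": 20},
    {"challenge_id": "c", "solved": True, "turns": 8},
]
summary = summarize_results(sample_results)
print(summary)
```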

## Troubleshooting

### CUDA Out of Memory

```python
# Use gradient checkpointing
model.gradient_checkpointing_enable()

# Reduce batch size
training_args.per_device_train_batch_size = 1
training_args.gradient_accumulation_steps = 8

# Load the model in 4-bit (pass this to FastLanguageModel.from_pretrained)
load_in_4bit = True
```
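Lowering the per-device batch size also changes the effective batch size unless gradient accumulation is raised to match. The arithmetic is sketched below with the values from this document; `effective_batch_size` is a hypothetical helper.

```python
def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Number of samples contributing to each optimizer step."""
    return per_device * accumulation_steps * num_devices

original = effective_batch_size(4, 4)   # settings from the training configuration
reduced = effective_batch_size(1, 8)    # settings from the out-of-memory fix
matched = effective_batch_size(1, 16)   # accumulation needed to keep the original size
print(original, reduced, matched)
```

With the out-of-memory settings, each optimizer step sees half as many samples as before, so the learning rate or accumulation steps may need adjusting to compare runs fairly.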

### Docker Container Issues

```bash
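# Find the CTF container (intercode-ctf is the image name, not a container name)
docker ps -a --filter ancestor=intercode-ctf

# Check container logs, using a container ID or name from the output above
docker logs <container-id>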

# Verify network connectivity
docker run --rm --net=host intercode-ctf ping -c 4 localhost

# Rebuild with no cache if issues persist
docker build --no-cache -t intercode-ctf train_ctf_env/
```

### Unsloth Version Compatibility

The research used `unsloth==2025.5.10`. If unavailable, the closest match is specified in `requirements.txt`:

```txt
unsloth_zoo==2025.5.11
unsloth @ git+https://github.com/unslothai/unsloth.git@45f26cda996ec0b9a2e28cb18a03251095aa29e8
```

### Model Loading Errors

```python
import os
import shutil

import torch

# If model fails to load, try without quantization first
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=torch.float16,
    load_in_4bit=False,
)

# Clear the Hugging Face cache if files are corrupted.
# This deletes every cached model, so the next load re-downloads them.
shutil.rmtree(os.path.expanduser("~/.cache/huggingface"), ignore_errors=True)
```

## Performance Optimization

### Cache Mounting for Faster Training

Always mount HuggingFace, Triton, and PyTorch caches when using Docker:

```bash
-v ~/.cache/huggingface:/root/.cache/huggingface \
-v ~/.cache/triton:/root/.cache/triton \
-v ~/.cache/torch_extensions:/root/.cache/torch_extensions
```

### Distributed Training

```python
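import os

# For multi-GPU training, skip DDP's scan for unused parameters
training_args.ddp_find_unused_parameters = False

# torchrun exports LOCAL_RANK for each worker process; the Trainer picks it up
# on its own, so it is read here only for logging or rank-0-only work
local_rank = int(os.environ.get("LOCAL_RANK", -1))

# Launch with torchrun
# torchrun --nproc_per_node=4 grpo_stage1.py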
```

## References

- Paper: [arXiv:2508.07382](https://arxiv.org/abs/2508.07382)
- Base model: Meta-Llama-3.1-8B-Instruct
- Framework: Unsloth for optimized fine-tuning
- Environment: InterCode-CTF for interactive training

Source

Creator's repository · aradotso/security-skills

Security

Flagged: install with caution. 2 of 3 checks raised a concern:

- Socket detected code alerts
- Snyk found a high-severity vulnerability

Checked by 3 independent security firms:

- Does it try to trick the AI? No. Med risk (Gen Agent Trust Hub)
- Does it sneak in hidden code? Yes. 1 alert: gptSecurity (Socket)
- Does it have known bugs? Yes. Critical (Snyk)