---
name: h-pentest-ai-platform
description: AI-powered penetration testing platform with multi-agent architecture, 52+ attack knowledge base, and automated vulnerability scanning
triggers:
  - set up H-Pentest AI penetration testing platform
  - create automated pentest task with H-Pentest
  - configure H-Pentest multi-agent system
  - run AI-driven vulnerability scan using H-Pentest
  - query H-Pentest attack knowledge base
  - monitor H-Pentest agent execution in real-time
  - generate pentest report with H-Pentest
  - integrate H-Pentest with custom LLM models
---

# H-Pentest AI Platform

> Skill by [ara.so](https://ara.so) — Security Skills collection.

H-Pentest is an AI-driven penetration testing platform built on a multi-agent architecture orchestrated by large language models (LLMs). Five agents cooperate to automate security testing: a Meta Supervisor, a Strategic Supervisor, a Worker Agent, a Payload Master, and a Report Supervisor. The platform also ships with 52+ attack knowledge documents, Docker sandbox execution, and real-time monitoring.

## Architecture Overview

**Multi-Agent System:**
- **Meta Supervisor**: Generates insights every 3 rounds, decides when to stop testing, mode-aware (CTF vs RealWorld)
- **Strategic Supervisor**: Creates initial test plan, dynamically adjusts strategy each round
- **Worker Agent**: Executes tasks using ReAct framework (Reasoning → Action → Observation → Reflection)
- **Payload Master**: Provides testing guidance every 3 rounds, suggests payloads for identified vulnerabilities
- **Report Supervisor**: Analyzes conversation history, extracts vulnerabilities, generates attack paths
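
The cadence in the list above can be sketched as a small scheduling function. This is an illustration only: the function name `agents_for_round`, the agent identifiers, and the assumption that rounds are counted from 1 are hypothetical and not taken from the H-Pentest source.

```python
def agents_for_round(round_no, meta_interval=3, payload_interval=3):
    """Return the agents assumed to be active in a given 1-based round."""
    if round_no < 1:
        raise ValueError("round_no must be >= 1")
    # The Strategic Supervisor adjusts the plan and the Worker acts every round.
    agents = ["strategic_supervisor", "worker"]
    # The Meta Supervisor and Payload Master only step in on interval rounds.
    if round_no % meta_interval == 0:
        agents.append("meta_supervisor")
    if round_no % payload_interval == 0:
        agents.append("payload_master")
    return agents


for n in range(1, 7):
    print(n, agents_for_round(n))
```

The Report Supervisor is left out because, per the description above, it works from the finished conversation history rather than inside the round loop.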

**Key Features:**
- Knowledge base of 52+ attack documents (IDOR, SQL Injection, XSS, File Upload, SSRF, XXE, etc.)
- Docker sandbox for safe Python code execution
- Integrated tools: Nuclei scanner, directory scanner, Kali Linux tools
- RAG (Retrieval-Augmented Generation) knowledge retrieval
- Real-time WebSocket monitoring
- CTF and RealWorld testing modes

## Installation

### Docker Compose (Recommended)

```bash
# Clone repository
git clone https://github.com/hexian2001/H-pentest.git
cd H-pentest

# Configure API keys in config.json
# Required: openai.api_key, dashscope.api_key (for embeddings)

# Start services
docker-compose up -d

# Access points:
# Frontend: http://localhost:5173
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docs
```

### Local Development

**Backend:**
```bash
cd backend
pip install -r requirements.txt

# Initialize database
python init_db.py

# Start backend
python -m app.main
```

**Frontend:**
```bash
cd frontend
npm install
npm run dev
```

## Configuration

Edit `config.json` in the root directory:

```json
{
  "openai": {
    "api_key": "OPENAI_API_KEY_ENV_VAR",
    "base_url": "https://open.bigmodel.cn/api/paas/v4",
    "model": "glm-4-flash"
  },
  "supervisor_model": {
    "api_key": "SUPERVISOR_API_KEY_ENV_VAR",
    "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "model": "qwen-max"
  },
  "dashscope": {
    "api_key": "DASHSCOPE_API_KEY_ENV_VAR"
  },
  "agent": {
    "max_rounds": 30,
    "temperature": 0.7
  },
  "tools": {
    "execute_python": {
      "enabled": true,
      "timeout": 30,
      "memory_limit": "512m"
    },
    "nuclei_scan": {
      "enabled": true,
      "severity": ["critical", "high", "medium"]
    }
  }
}
```

**Environment Variables:**
```bash
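export OPENAI_API_KEY="your-llm-api-key"
export DASHSCOPE_API_KEY="your-embedding-api-key"
# Assumed from the supervisor_model placeholder in config.json:
export SUPERVISOR_API_KEY="your-supervisor-api-key"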
```
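
Because a missing key usually only surfaces once an agent makes its first LLM or embedding call, it can help to check the configuration before starting the backend. The following is a minimal sketch, assuming the two keys named as required in the installation notes; `missing_keys` and `load_config` are hypothetical helpers, not part of H-Pentest.

```python
import json

# Keys the installation notes list as required.
REQUIRED_KEYS = ("openai.api_key", "dashscope.api_key")


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the dotted keys that are absent or empty in a config dict."""
    missing = []
    for dotted in required:
        node = config
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                node = None
                break
            node = node[part]
        if not node:
            missing.append(dotted)
    return missing


def load_config(path="config.json"):
    """Read config.json and fail early if a required key is missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    problems = missing_keys(config)
    if problems:
        raise ValueError(f"config.json is missing: {', '.join(problems)}")
    return config


sample = {"openai": {"api_key": "sk-example"}, "dashscope": {}}
print(missing_keys(sample))  # → ['dashscope.api_key']
```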

## API Usage

### Create Penetration Test Task

```python
import requests

API_BASE = "http://localhost:8000/api/v1"

# Create CTF-mode task
task_data = {
    "target_url": "http://target-ctf.example.com",
    "mode": "ctf",  # or "realworld"
    "description": "Test web application for vulnerabilities",
    "max_rounds": 25
}

response = requests.post(f"{API_BASE}/tasks/", json=task_data)
response.raise_for_status()  # Fail loudly on 4xx/5xx instead of parsing an error body
task = response.json()
task_id = task["id"]

print(f"Task created: {task_id}")
```
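
A payload like the one above can be checked locally before it is sent. This is a sketch under assumptions: the two mode values come from this document, but the rules applied here (absolute http(s) URL, positive integer `max_rounds`) are guesses about what the backend expects, and `validate_task` is a hypothetical helper.

```python
from urllib.parse import urlparse

ALLOWED_MODES = {"ctf", "realworld"}


def validate_task(task_data):
    """Return a list of problems found in a task payload (empty if none)."""
    problems = []
    parsed = urlparse(task_data.get("target_url", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("target_url must be an absolute http(s) URL")
    if task_data.get("mode") not in ALLOWED_MODES:
        problems.append("mode must be 'ctf' or 'realworld'")
    max_rounds = task_data.get("max_rounds")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(max_rounds, bool) or not isinstance(max_rounds, int) or max_rounds < 1:
        problems.append("max_rounds must be a positive integer")
    return problems


good = {"target_url": "http://target-ctf.example.com", "mode": "ctf", "max_rounds": 25}
bad = {"target_url": "target.com", "mode": "prod", "max_rounds": 0}
print(validate_task(good))
print(validate_task(bad))
```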

### Monitor Task Progress (WebSocket)

```python
import asyncio
import websockets
import json

async def monitor_task(task_id):
    uri = f"ws://localhost:8000/api/v1/ws/{task_id}"
    
    async with websockets.connect(uri) as websocket:
        async for message in websocket:
            data = json.loads(message)
            
            if data["type"] == "agent_message":
                print(f"[{data['agent']}] {data['content']}")
            elif data["type"] == "tool_call":
                print(f"Tool: {data['tool_name']}")
                print(f"Result: {data['result']}")
            elif data["type"] == "task_complete":
                print("Task completed!")
                break

asyncio.run(monitor_task("task-uuid-here"))
```
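
The monitor above indexes fields directly, so a message that lacks a field or carries an unlisted `type` would raise `KeyError`. A more defensive formatter might look like the sketch below. The three message types are the ones shown above; `format_event` is a hypothetical helper, and the fallback behaviour is an assumption.

```python
import json


def format_event(raw_message):
    """Turn one WebSocket JSON message into a printable line."""
    data = json.loads(raw_message)
    kind = data.get("type", "unknown")
    if kind == "agent_message":
        return f"[{data.get('agent', '?')}] {data.get('content', '')}"
    if kind == "tool_call":
        return f"Tool: {data.get('tool_name', '?')} | Result: {data.get('result', '')}"
    if kind == "task_complete":
        return "Task completed!"
    # Unknown or new message types are reported rather than raising KeyError.
    return f"(unhandled message type: {kind})"


print(format_event('{"type": "agent_message", "agent": "worker", "content": "Scanning /login"}'))
print(format_event('{"type": "heartbeat"}'))
```

Inside `monitor_task`, each `message` could be passed through `format_event` and printed, with the loop still breaking on `task_complete`.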

### Query Attack Knowledge Base

```python
# Using the query_knowledge tool through the API
knowledge_query = {
    "task_id": task_id,
    "tool_name": "query_knowledge",
    "parameters": {
        "query": "SQL injection bypass techniques",
        "top_k": 5
    }
}

response = requests.post(f"{API_BASE}/tasks/{task_id}/tool", json=knowledge_query)
results = response.json()

for doc in results["documents"]:
    print(f"Relevance: {doc['score']}")
    print(f"Content: {doc['content'][:200]}...")
```

### Get Task Status and Results

```python
# Get task details
response = requests.get(f"{API_BASE}/tasks/{task_id}")
task = response.json()

print(f"Status: {task['status']}")
print(f"Rounds: {task['rounds_completed']}/{task['max_rounds']}")

# Get conversation history
response = requests.get(f"{API_BASE}/tasks/{task_id}/conversations")
conversations = response.json()

for conv in conversations:
    print(f"{conv['agent']}: {conv['content'][:100]}...")
```

### Human Intervention

```python
# Provide guidance to agents mid-execution
intervention = {
    "message": "Focus on testing the /api/admin endpoint for IDOR vulnerabilities",
    "agent": "worker"  # or "strategic_supervisor"
}

response = requests.post(
    f"{API_BASE}/tasks/{task_id}/intervention",
    json=intervention
)
```

### Stop Task

```python
response = requests.post(f"{API_BASE}/tasks/{task_id}/stop")
print(response.json())
```

### Generate Report

```python
# Report is automatically generated by Report Supervisor
response = requests.get(f"{API_BASE}/tasks/{task_id}/report")
report = response.json()

print(f"Vulnerabilities found: {len(report['vulnerabilities'])}")
for vuln in report['vulnerabilities']:
    print(f"- {vuln['type']}: {vuln['severity']}")
    print(f"  Location: {vuln['location']}")
    print(f"  Description: {vuln['description']}")
```
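
For longer reports, a per-severity count is often easier to scan than the raw list. The sketch below assumes the `vulnerabilities` and `severity` fields shown above; the severity ordering and the `summarize` helper are illustrative assumptions, since the report may use other labels.

```python
from collections import Counter

# Assumed ordering; the report may use other severity labels.
SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]


def summarize(report):
    """Count vulnerabilities per severity, most severe first."""
    counts = Counter(
        str(v.get("severity", "unknown")).lower()
        for v in report.get("vulnerabilities", [])
    )
    ordered = [(s, counts[s]) for s in SEVERITY_ORDER if counts[s]]
    # Append labels outside the assumed ordering so nothing is dropped.
    extras = sorted((s, n) for s, n in counts.items() if s not in SEVERITY_ORDER)
    return ordered + extras


sample_report = {
    "vulnerabilities": [
        {"type": "SQL Injection", "severity": "High"},
        {"type": "IDOR", "severity": "critical"},
        {"type": "XSS", "severity": "high"},
    ]
}
print(summarize(sample_report))  # → [('critical', 1), ('high', 2)]
```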

## Tool Integration Examples

### Execute Python Code in Sandbox

The Worker Agent can execute Python code through the `execute_python` tool:

```python
# Example tool call structure (used internally by agents)
tool_call = {
    "name": "execute_python",
    "arguments": {
        "code": """
import requests

response = requests.get('http://target.com/api/users?id=1')
print(response.status_code)
print(response.text[:500])
"""
    }
}

# Executes in Docker container with:
# - 512MB memory limit
# - 30s timeout
# - Pre-installed: requests, beautifulsoup4, pwntools, etc.
```
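
To make the limits above concrete, here is one way such a sandboxed run could be assembled with the Docker CLI. This is a sketch, not H-Pentest's actual sandbox: the `python:3.11-slim` image and the `build_sandbox_command` helper are assumptions, and the platform's own image (with `requests`, `pwntools`, and so on pre-installed) is not shown in this document.

```python
def build_sandbox_command(code, memory_limit="512m", image="python:3.11-slim"):
    """Build a `docker run` argument list for a throwaway Python container."""
    return [
        "docker", "run",
        "--rm",                      # delete the container when it exits
        "--memory", memory_limit,    # hard memory cap, e.g. "512m"
        image,
        "python", "-c", code,
    ]


command = build_sandbox_command("print('hello from the sandbox')")
print(command)

# To actually run it (requires Docker), the 30s limit can be enforced on the host:
# import subprocess
# result = subprocess.run(command, capture_output=True, text=True, timeout=30)
# Note: the timeout stops the docker CLI process; the container itself may
# still need to be stopped separately.
```

Passing the code as a single list element avoids shell quoting problems, because no shell is involved when the list is handed to `subprocess.run`.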

### Run Nuclei Vulnerability Scan

```python
# Nuclei scan tool call
tool_call = {
    "name": "nuclei_scan",
    "arguments": {
        "target": "http://target.com",
        "severity": ["critical", "high"],
        "templates": ["cves", "vulnerabilities"]
    }
}

# Returns structured vulnerability data
# Uses 11,000+ CVE templates
```

### Directory Scanning

```python
# Directory scan tool call
tool_call = {
    "name": "dirscan",
    "arguments": {
        "target": "http://target.com",
        "wordlist": "common",  # or "medium", "large"
        "extensions": [".php", ".asp", ".jsp"],
        "threads": 10
    }
}
```

### Kali Linux Tools Execution

```python
# Execute Kali tools
tool_call = {
    "name": "kali_execute",
    "arguments": {
        "command": "nmap -sV -p 80,443,8080 target.com",
        "timeout": 60
    }
}
```

## Agent Prompting Patterns

### Worker Agent ReAct Loop

The Worker Agent follows this pattern:

```
Thought: I need to check if the login endpoint is vulnerable to SQL injection
Action: execute_python
Action Input: {"code": "import requests\nresponse = requests.post('http://target/login', data={'user': \"admin' OR '1'='1\", 'pass': 'x'})\nprint(response.status_code, response.text[:200])"}
Observation: Status 200, response contains "Welcome admin"
Thought: SQL injection confirmed, the application doesn't sanitize input
Action: query_knowledge
Action Input: {"query": "SQL injection authentication bypass payloads"}
Observation: Found 3 relevant documents with advanced SQLi techniques
... (continues)
```
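
A trace in this shape has to be parsed before a tool can be dispatched. The sketch below shows one way to extract a single step with a regular expression. It assumes each field sits on its own line and that `Action Input` is a one-line JSON object, as in the example above; `parse_react_step` is a hypothetical helper, and the real Worker Agent may rely on structured tool calls instead.

```python
import json
import re

STEP_RE = re.compile(
    r"Thought:\s*(?P<thought>[^\n]*)\n"
    r"Action:\s*(?P<action>\S+)[ \t]*\n"
    r"Action Input:\s*(?P<input>\{[^\n]*\})"
)


def parse_react_step(text):
    """Extract the first Thought/Action/Action Input triple, or None."""
    match = STEP_RE.search(text)
    if match is None:
        return None
    return {
        "thought": match.group("thought").strip(),
        "action": match.group("action"),
        # Raises json.JSONDecodeError if the model emitted malformed JSON.
        "action_input": json.loads(match.group("input")),
    }


step = (
    "Thought: SQL injection confirmed, look up bypass payloads\n"
    "Action: query_knowledge\n"
    'Action Input: {"query": "SQL injection authentication bypass payloads"}\n'
    "Observation: Found 3 relevant documents"
)
parsed = parse_react_step(step)
print(parsed["action"], parsed["action_input"])
```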

### Strategic Supervisor Planning

```python
# The Strategic Supervisor generates plans like:
{
  "phase": "reconnaissance",
  "steps": [
    "Perform directory enumeration to find hidden endpoints",
    "Scan for known CVEs using Nuclei",
    "Test authentication mechanisms for common weaknesses"
  ],
  "priority": "high",
  "next_adjustment": "round_5"
}
```

### Meta Supervisor Insights

Generated every 3 rounds:

```python
{
  "round": 6,
  "insight": "Worker is stuck in repetitive SQLi attempts without progress",
  "recommendation": "Switch to testing file upload functionality",
  "should_continue": True,
  "mode_adjustment": "more_focused"
}
```

## Configuration Management via API

### Update Agent Configuration

```python
agent_config = {
    "max_rounds": 40,
    "temperature": 0.8,
    "meta_supervisor_interval": 3,
    "payload_master_interval": 3
}

response = requests.post(f"{API_BASE}/config/agent", json=agent_config)
```

### Update Tool Configuration

```python
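# Disable Nuclei, adjust Python sandbox limits
tool_config = {
    "nuclei_scan": {
        "enabled": False
    },
    "execute_python": {
        "timeout": 60,
        "memory_limit": "1g"
    }
}

# Send each tool's settings to its own endpoint so both changes are applied.
# The nuclei_scan path is assumed to mirror the execute_python one.
for tool_name, settings in tool_config.items():
    response = requests.post(
        f"{API_BASE}/config/tools/{tool_name}",
        json=settings
    )
    response.raise_for_status()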
```

## Testing Modes

### CTF Mode

Optimized for Capture The Flag competitions:

```python
task = {
    "target_url": "http://ctf.example.com",
    "mode": "ctf",
    "max_rounds": 20,
    "aggressive": True,  # More aggressive testing
    "auto_submit_flag": True,  # Auto-submit found flags
    "flag_pattern": r"flag\{[a-zA-Z0-9_]+\}"
}

response = requests.post(f"{API_BASE}/tasks/", json=task)
```
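
The `flag_pattern` value is an ordinary regular expression, so it can be tried locally against sample responses before a task is launched. A minimal sketch using the pattern above (the `find_flags` helper and the sample text are hypothetical):

```python
import re

FLAG_PATTERN = re.compile(r"flag\{[a-zA-Z0-9_]+\}")


def find_flags(text):
    """Return unique flags in first-seen order."""
    seen = []
    for match in FLAG_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


body = "Welcome admin flag{sql_1nj3ction} ... flag{sql_1nj3ction} FLAG{upper} flag{has-hyphen}"
print(find_flags(body))  # → ['flag{sql_1nj3ction}']
```

As the sample shows, this pattern is case-sensitive and rejects hyphens, so a competition with a different flag format needs an adjusted expression.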

### RealWorld Mode

Conservative, thorough security assessment:

```python
task = {
    "target_url": "http://production.example.com",
    "mode": "realworld",
    "max_rounds": 50,
    "aggressive": False,
    "comprehensive_report": True,
    "respect_robots_txt": True
}

response = requests.post(f"{API_BASE}/tasks/", json=task)
```

## Knowledge Base RAG Usage

The platform uses RAG to retrieve relevant attack techniques:

```python
# Query is automatically embedded and searched against 52+ docs
# Covering: IDOR, SQL Injection, XSS, File Upload, SSRF, XXE,
# Deserialization, JWT attacks, OAuth bypass, etc.

# Example: The agent queries knowledge when stuck
query_result = {
    "query": "bypass file upload restrictions",
    "results": [
        {
            "document": "file_upload_bypass.md",
            "score": 0.92,
            "content": "To bypass extension filters: 1) Use double extensions (.php.jpg)...",
            "metadata": {
                "attack_type": "file_upload",
                "difficulty": "medium"
            }
        }
    ]
}
```
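
The retrieval step behind a result like this can be illustrated with a toy example. The sketch below swaps the real embedding model for simple word counts and ranks documents by cosine similarity. It shows the general RAG ranking idea only and is not H-Pentest's implementation; apart from `file_upload_bypass.md`, the document names and all helper functions are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, top_k=5):
    """Rank documents by similarity to the query and keep the best top_k."""
    q = embed(query)
    scored = [
        {"document": name, "score": round(cosine(q, embed(text)), 3)}
        for name, text in documents.items()
    ]
    scored.sort(key=lambda item: item["score"], reverse=True)
    return scored[:top_k]


docs = {
    "file_upload_bypass.md": "Bypass file upload extension filters with double extensions",
    "sqli_auth_bypass.md": "SQL injection payloads for authentication bypass",
    "ssrf_basics.md": "Server-side request forgery against internal services",
}
for hit in retrieve("bypass file upload restrictions", docs, top_k=2):
    print(hit)
```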

## Troubleshooting

### Task Stuck in Loop

```python
# Check Meta Supervisor insights
response = requests.get(f"{API_BASE}/tasks/{task_id}/conversations")
messages = response.json()

# Look for meta_supervisor messages
meta_insights = [m for m in messages if m["agent"] == "meta_supervisor"]

# Manual intervention if needed
intervention = {
    "message": "Stop SQL injection attempts, move to testing file uploads",
    "agent": "strategic_supervisor"
}
requests.post(f"{API_BASE}/tasks/{task_id}/intervention", json=intervention)
```
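
Before intervening, it can be useful to confirm from the conversation history that the worker really is repeating itself. The heuristic below is a deliberately crude sketch: it assumes messages carry the `agent` and `content` fields used above, and it only flags exact repeats. `is_stuck` is a hypothetical helper, not an H-Pentest API.

```python
def is_stuck(messages, agent="worker", window=4):
    """Return True if the agent's last `window` messages are all identical."""
    recent = [
        m.get("content", "").strip().lower()
        for m in messages
        if m.get("agent") == agent
    ][-window:]
    return len(recent) == window and len(set(recent)) == 1


history = [
    {"agent": "worker", "content": "Trying ' OR 1=1 -- on /login"},
    {"agent": "meta_supervisor", "content": "Worker is stuck in repetitive SQLi attempts"},
    {"agent": "worker", "content": "Trying ' OR 1=1 -- on /login"},
    {"agent": "worker", "content": "Trying ' OR 1=1 -- on /login"},
]
print(is_stuck(history, window=3))
```

Real loops rarely repeat verbatim, so a similarity measure over recent messages would likely catch more cases than exact matching.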

### Docker Sandbox Issues

```bash
# Check Docker container status
docker ps | grep h-pentest

# View sandbox logs
docker logs h-pentest-sandbox

# Restart sandbox
docker-compose restart sandbox
```

### API Connection Errors

```python
# Verify backend is running
try:
    response = requests.get(f"{API_BASE}/health", timeout=5)
    print(f"Backend status: {response.json()}")
except requests.exceptions.RequestException as e:
    print(f"Backend unreachable: {e}")
    
# Check WebSocket connection
import asyncio
import websockets

async def check_ws():
    try:
        async with websockets.connect("ws://localhost:8000/api/v1/ws/test") as ws:
            print("WebSocket OK")
    except Exception as e:
        print(f"WebSocket error: {e}")

asyncio.run(check_ws())
```

### LLM API Rate Limits

```python
# Monitor token usage via context info
response = requests.get(f"{API_BASE}/context/info")
context = response.json()

print(f"Total tokens used: {context['total_tokens']}")
print(f"Compressed messages: {context['compressed_count']}")

# Lower temperature makes output more deterministic. It does not cap
# response length, so treat it as a mild lever on token usage at best.
config = {"temperature": 0.3}
requests.post(f"{API_BASE}/config/agent", json=config)
```

### Knowledge Base Not Returning Results

```bash
# Verify embeddings are initialized
# Check backend logs for Dashscope API errors

# Rebuild knowledge base
cd backend
python -c "from app.services.knowledge_base import rebuild_index; rebuild_index()"
```

## Common Workflows

### Full Automated Pentest

```python
import json  # needed for json.dump when exporting the report
import requests
import time

API_BASE = "http://localhost:8000/api/v1"

# 1. Create task
task = requests.post(f"{API_BASE}/tasks/", json={
    "target_url": "http://target.com",
    "mode": "realworld",
    "max_rounds": 30
}).json()

task_id = task["id"]

# 2. Monitor until complete
while True:
    status = requests.get(f"{API_BASE}/tasks/{task_id}").json()
    
    if status["status"] == "completed":
        break
    elif status["status"] == "failed":
        print(f"Task failed: {status.get('error')}")
        break
    
    print(f"Round {status['rounds_completed']}/{status['max_rounds']}")
    time.sleep(10)

# 3. Get report
report = requests.get(f"{API_BASE}/tasks/{task_id}/report").json()

# 4. Export results
with open(f"report_{task_id}.json", "w") as f:
    json.dump(report, f, indent=2)

print(f"Found {len(report['vulnerabilities'])} vulnerabilities")
```
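
The polling loop above runs until the task reports `completed` or `failed`, so a hung task would keep it waiting indefinitely. One way to bound the wait is sketched below. The status values are the ones used above; `wait_for_task` and its parameters are hypothetical, and the status source is injected so the example runs without a live backend.

```python
import time


def wait_for_task(fetch_status, interval=10.0, timeout=3600.0, sleep=time.sleep):
    """Poll until the task reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval)


# Demonstration with a fake status source instead of a live backend.
fake = iter([{"status": "running"}, {"status": "running"}, {"status": "completed"}])
final = wait_for_task(lambda: next(fake), interval=0, timeout=5)
print(final["status"])  # → completed
```

Against a running backend, `fetch_status` could be `lambda: requests.get(f"{API_BASE}/tasks/{task_id}").json()`.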

### Interactive Testing with Guidance

```python
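import asyncio
import json  # needed for json.loads on WebSocket messages

import requests
import websockets

API_BASE = "http://localhost:8000/api/v1"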

async def guided_test(target_url):
    # Create task
    task = requests.post(f"{API_BASE}/tasks/", json={
        "target_url": target_url,
        "mode": "ctf",
        "max_rounds": 25
    }).json()
    
    task_id = task["id"]
    
    # Connect to WebSocket
    uri = f"ws://localhost:8000/api/v1/ws/{task_id}"
    async with websockets.connect(uri) as ws:
        round_count = 0
        
        async for message in ws:
            data = json.loads(message)
            
            if data["type"] == "round_complete":
                round_count += 1
                
                # Provide guidance every 5 rounds
                if round_count % 5 == 0:
                    guidance = input("Provide guidance (or press Enter): ")
                    if guidance:
                        requests.post(f"{API_BASE}/tasks/{task_id}/intervention", json={
                            "message": guidance,
                            "agent": "worker"
                        })
            
            elif data["type"] == "task_complete":
                print("Testing complete!")
                break

asyncio.run(guided_test("http://target.com"))
```

Source

Creator's repository · aradotso/security-skills
