---
name: h-pentest-ai-platform
description: AI-powered penetration testing platform with multi-agent architecture, 52+ attack knowledge base, and automated vulnerability scanning
triggers:
- set up H-Pentest AI penetration testing platform
- create automated pentest task with H-Pentest
- configure H-Pentest multi-agent system
- run AI-driven vulnerability scan using H-Pentest
- query H-Pentest attack knowledge base
- monitor H-Pentest agent execution in real-time
- generate pentest report with H-Pentest
- integrate H-Pentest with custom LLM models
---
# H-Pentest AI Platform
> Skill by [ara.so](https://ara.so) — Security Skills collection.
H-Pentest is an AI-driven penetration testing platform built on a multi-agent architecture orchestrated by large language models (LLMs). It coordinates five agents (Meta Supervisor, Strategic Supervisor, Worker Agent, Payload Master, and Report Supervisor) that work together to perform automated security testing. The platform includes 52+ attack knowledge documents, Docker sandbox execution, and real-time monitoring.
## Architecture Overview
**Multi-Agent System:**
- **Meta Supervisor**: Generates insights every 3 rounds, decides when to stop testing, mode-aware (CTF vs RealWorld)
- **Strategic Supervisor**: Creates initial test plan, dynamically adjusts strategy each round
- **Worker Agent**: Executes tasks using ReAct framework (Reasoning → Action → Observation → Reflection)
- **Payload Master**: Provides testing guidance every 3 rounds, suggests payloads for identified vulnerabilities
- **Report Supervisor**: Analyzes conversation history, extracts vulnerabilities, generates attack paths
**Key Features:**
- 52+ attack knowledge base (IDOR, SQL Injection, XSS, File Upload, SSRF, XXE, etc.)
- Docker sandbox for safe Python code execution
- Integrated tools: Nuclei scanner, directory scanner, Kali Linux tools
- RAG (Retrieval-Augmented Generation) knowledge retrieval
- Real-time WebSocket monitoring
- CTF and RealWorld testing modes
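The agent cadence described above can be summarized as a per-round schedule. The sketch below is illustrative only: it assumes rounds are 1-indexed, that the Strategic Supervisor and Worker act every round, and that the Meta Supervisor and Payload Master act on every third round (the `meta_supervisor_interval` and `payload_master_interval` settings shown later). The function name `agents_for_round` is hypothetical, and the Report Supervisor is omitted because it is assumed to run once on the finished conversation history.

```python
# Hypothetical sketch of the round schedule, not H-Pentest's actual scheduler.
META_SUPERVISOR_INTERVAL = 3  # "generates insights every 3 rounds"
PAYLOAD_MASTER_INTERVAL = 3   # "provides testing guidance every 3 rounds"


def agents_for_round(round_no):
    """Return the agents assumed to act in round `round_no` (1-indexed)."""
    if round_no < 1:
        raise ValueError("rounds are 1-indexed")
    # Strategic Supervisor adjusts strategy each round; Worker executes each round.
    agents = ["strategic_supervisor", "worker"]
    if round_no % PAYLOAD_MASTER_INTERVAL == 0:
        agents.append("payload_master")
    if round_no % META_SUPERVISOR_INTERVAL == 0:
        agents.append("meta_supervisor")
    return agents


for r in range(1, 7):
    print(r, agents_for_round(r))
```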
## Installation
### Docker Compose (Recommended)
```bash
# Clone repository
git clone https://github.com/hexian2001/H-pentest.git
cd H-pentest
# Configure API keys in config.json
# Required: openai.api_key, dashscope.api_key (for embeddings)
# Start services
docker-compose up -d
# Access points:
# Frontend: http://localhost:5173
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docs
```
### Local Development
**Backend:**
```bash
cd backend
pip install -r requirements.txt
# Initialize database
python init_db.py
# Start backend
python -m app.main
```
**Frontend:**
```bash
cd frontend
npm install
npm run dev
```
## Configuration
Edit `config.json` in the root directory:
```json
{
  "openai": {
    "api_key": "OPENAI_API_KEY_ENV_VAR",
    "base_url": "https://open.bigmodel.cn/api/paas/v4",
    "model": "glm-4-flash"
  },
  "supervisor_model": {
    "api_key": "SUPERVISOR_API_KEY_ENV_VAR",
    "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "model": "qwen-max"
  },
  "dashscope": {
    "api_key": "DASHSCOPE_API_KEY_ENV_VAR"
  },
  "agent": {
    "max_rounds": 30,
    "temperature": 0.7
  },
  "tools": {
    "execute_python": {
      "enabled": true,
      "timeout": 30,
      "memory_limit": "512m"
    },
    "nuclei_scan": {
      "enabled": true,
      "severity": ["critical", "high", "medium"]
    }
  }
}
```
**Environment Variables:**
```bash
export OPENAI_API_KEY="your-llm-api-key"
export DASHSCOPE_API_KEY="your-embedding-api-key"
```
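If you load `config.json` from your own scripts, you may want to resolve the `*_ENV_VAR` placeholders against the environment variables above. This is a sketch under an assumed convention: a string value `NAME_ENV_VAR` is taken to mean "read `NAME` from the environment". Whether the H-Pentest backend itself follows this convention is not confirmed here, and `resolve_placeholders`/`load_config` are hypothetical helpers.

```python
import json
import os

# Assumed placeholder convention; not confirmed as H-Pentest's own behavior.
PLACEHOLDER_SUFFIX = "_ENV_VAR"


def resolve_placeholders(node):
    """Recursively replace "NAME_ENV_VAR" strings with os.environ["NAME"]."""
    if isinstance(node, dict):
        return {key: resolve_placeholders(value) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_placeholders(value) for value in node]
    if isinstance(node, str) and node.endswith(PLACEHOLDER_SUFFIX):
        var = node[: -len(PLACEHOLDER_SUFFIX)]
        value = os.environ.get(var)
        if value is None:
            raise KeyError(f"environment variable {var} is not set")
        return value
    return node  # numbers, booleans, and ordinary strings pass through unchanged


def load_config(path="config.json"):
    """Read config.json and substitute environment-backed secrets."""
    with open(path, encoding="utf-8") as f:
        return resolve_placeholders(json.load(f))
```

With `OPENAI_API_KEY` exported, `load_config()["openai"]["api_key"]` would return its value, and a missing variable fails loudly instead of sending a placeholder string to the LLM provider.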
## API Usage
### Create Penetration Test Task
```python
import requests
API_BASE = "http://localhost:8000/api/v1"
# Create CTF-mode task
task_data = {
    "target_url": "http://target-ctf.example.com",
    "mode": "ctf",  # or "realworld"
    "description": "Test web application for vulnerabilities",
    "max_rounds": 25
}
response = requests.post(f"{API_BASE}/tasks/", json=task_data)
task = response.json()
task_id = task["id"]
print(f"Task created: {task_id}")
```
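To catch simple mistakes before a request reaches the server, a client-side builder can check the fields used above. This is a hypothetical helper: `build_task` and its checks are assumptions, and the backend's own validation rules are not documented here. It only enforces the two modes named in this document and a positive round count.

```python
VALID_MODES = {"ctf", "realworld"}  # the two modes described in this document


def build_task(target_url, mode="ctf", max_rounds=25, description=""):
    """Build a task payload like the one above, rejecting obvious mistakes early."""
    if not target_url.startswith(("http://", "https://")):
        raise ValueError("target_url must be an http(s) URL")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if max_rounds < 1:
        raise ValueError("max_rounds must be positive")
    payload = {"target_url": target_url, "mode": mode, "max_rounds": max_rounds}
    if description:
        payload["description"] = description
    return payload
```

Usage: `requests.post(f"{API_BASE}/tasks/", json=build_task("http://target-ctf.example.com", "ctf", 25))`.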
### Monitor Task Progress (WebSocket)
```python
import asyncio
import websockets
import json
async def monitor_task(task_id):
    uri = f"ws://localhost:8000/api/v1/ws/{task_id}"
    async with websockets.connect(uri) as websocket:
        async for message in websocket:
            data = json.loads(message)
            if data["type"] == "agent_message":
                print(f"[{data['agent']}] {data['content']}")
            elif data["type"] == "tool_call":
                print(f"Tool: {data['tool_name']}")
                print(f"Result: {data['result']}")
            elif data["type"] == "task_complete":
                print("Task completed!")
                break
asyncio.run(monitor_task("task-uuid-here"))
```
### Query Attack Knowledge Base
```python
# Using the query_knowledge tool through the API
knowledge_query = {
    "task_id": task_id,
    "tool_name": "query_knowledge",
    "parameters": {
        "query": "SQL injection bypass techniques",
        "top_k": 5
    }
}
response = requests.post(f"{API_BASE}/tasks/{task_id}/tool", json=knowledge_query)
results = response.json()
for doc in results["documents"]:
    print(f"Relevance: {doc['score']}")
    print(f"Content: {doc['content'][:200]}...")
```
### Get Task Status and Results
```python
# Get task details
response = requests.get(f"{API_BASE}/tasks/{task_id}")
task = response.json()
print(f"Status: {task['status']}")
print(f"Rounds: {task['rounds_completed']}/{task['max_rounds']}")
# Get conversation history
response = requests.get(f"{API_BASE}/tasks/{task_id}/conversations")
conversations = response.json()
for conv in conversations:
    print(f"{conv['agent']}: {conv['content'][:100]}...")
```
### Human Intervention
```python
# Provide guidance to agents mid-execution
intervention = {
    "message": "Focus on testing the /api/admin endpoint for IDOR vulnerabilities",
    "agent": "worker"  # or "strategic_supervisor"
}
response = requests.post(
f"{API_BASE}/tasks/{task_id}/intervention",
json=intervention
)
```
### Stop Task
```python
response = requests.post(f"{API_BASE}/tasks/{task_id}/stop")
print(response.json())
```
### Generate Report
```python
# Report is automatically generated by Report Supervisor
response = requests.get(f"{API_BASE}/tasks/{task_id}/report")
report = response.json()
print(f"Vulnerabilities found: {len(report['vulnerabilities'])}")
for vuln in report['vulnerabilities']:
    print(f"- {vuln['type']}: {vuln['severity']}")
    print(f"  Location: {vuln['location']}")
    print(f"  Description: {vuln['description']}")
```
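For triage it often helps to count findings per severity before reading individual entries. The sketch below works on the report shape used above (a `vulnerabilities` list whose items carry a `severity` field). The severity scale `critical`/`high`/`medium`/`low`/`info` is an assumption, and any other labels are appended at the end rather than dropped.

```python
from collections import Counter

SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]  # assumed scale


def summarize(report):
    """Count vulnerabilities per severity, most severe first."""
    counts = Counter(v["severity"].lower() for v in report.get("vulnerabilities", []))
    ordered = [(sev, counts.pop(sev)) for sev in SEVERITY_ORDER if sev in counts]
    # Severity labels outside the assumed scale go last, alphabetically.
    ordered += sorted(counts.items())
    return ordered
```

Usage: `for severity, n in summarize(report): print(f"{severity}: {n}")`.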
## Tool Integration Examples
### Execute Python Code in Sandbox
The Worker Agent can execute Python code through the `execute_python` tool:
```python
# Example tool call structure (used internally by agents)
tool_call = {
    "name": "execute_python",
    "arguments": {
        "code": """
import requests
response = requests.get('http://target.com/api/users?id=1')
print(response.status_code)
print(response.text[:500])
"""
    }
}
# Executes in Docker container with:
# - 512MB memory limit
# - 30s timeout
# - Pre-installed: requests, beautifulsoup4, pwntools, etc.
```
### Run Nuclei Vulnerability Scan
```python
# Nuclei scan tool call
tool_call = {
    "name": "nuclei_scan",
    "arguments": {
        "target": "http://target.com",
        "severity": ["critical", "high"],
        "templates": ["cves", "vulnerabilities"]
    }
}
# Returns structured vulnerability data
# Uses 11,000+ CVE templates
```
### Directory Scanning
```python
# Directory scan tool call
tool_call = {
    "name": "dirscan",
    "arguments": {
        "target": "http://target.com",
        "wordlist": "common",  # or "medium", "large"
        "extensions": [".php", ".asp", ".jsp"],
        "threads": 10
    }
}
```
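Directory brute-forcers generally expand each wordlist entry into a bare path plus one path per extension, which is what the `wordlist` and `extensions` arguments above control. The following sketch shows that general technique; it is not the `dirscan` tool's actual implementation, and `candidate_paths` is a hypothetical name. It skips blank lines and `#` comments, as is common in wordlist files.

```python
def candidate_paths(words, extensions):
    """Expand a wordlist into request paths: each bare word plus word+extension."""
    paths = []
    for word in words:
        word = word.strip().strip("/")
        if not word or word.startswith("#"):
            continue  # skip blank lines and comments in wordlist files
        paths.append(f"/{word}")
        paths.extend(f"/{word}{ext}" for ext in extensions)
    return paths
```

With three extensions, each word yields four requests, so the `threads` setting matters more as the wordlist grows from `common` to `large`.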
### Kali Linux Tools Execution
```python
# Execute Kali tools
tool_call = {
    "name": "kali_execute",
    "arguments": {
        "command": "nmap -sV -p 80,443,8080 target.com",
        "timeout": 60
    }
}
```
## Agent Prompting Patterns
### Worker Agent ReAct Loop
The Worker Agent follows this pattern:
```
Thought: I need to check if the login endpoint is vulnerable to SQL injection
Action: execute_python
Action Input: {"code": "import requests\nresponse = requests.post('http://target/login', data={'user': \"admin' OR '1'='1\", 'pass': 'x'})\nprint(response.status_code, response.text[:200])"}
Observation: Status 200, response contains "Welcome admin"
Thought: SQL injection confirmed; the login query does not sanitize or parameterize user input
Action: query_knowledge
Action Input: {"query": "SQL injection authentication bypass payloads"}
Observation: Found 3 relevant documents with advanced SQLi techniques
... (continues)
```
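A harness driving a text-based ReAct loop has to pull the tool name and its JSON arguments out of each model response. The sketch below parses one step in the `Action:` / `Action Input:` format shown above. It is a hypothetical illustration: H-Pentest may use native function calling instead, and `parse_step` is not part of its API. It assumes the arguments are single-line JSON, as in the trace.

```python
import json
import re

# "Action: <tool>" on its own line, then "Action Input: {...}" as one-line JSON.
ACTION_RE = re.compile(r"^Action:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
INPUT_RE = re.compile(r"^Action Input:[ \t]*(\{.*\})[ \t]*$", re.MULTILINE)


def parse_step(llm_output):
    """Extract (tool_name, arguments) from one ReAct step, or None if the model
    produced no action (for example, it only reasoned)."""
    action = ACTION_RE.search(llm_output)
    if action is None:
        return None
    # Look for the arguments only after the Action line.
    args_match = INPUT_RE.search(llm_output, action.end())
    args = json.loads(args_match.group(1)) if args_match else {}
    return action.group(1), args
```

The returned tuple maps directly onto a tool call such as `{"name": tool_name, "arguments": args}`; the tool's output is then fed back to the model as the next `Observation`.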
### Strategic Supervisor Planning
```python
# The Strategic Supervisor generates plans like:
{
    "phase": "reconnaissance",
    "steps": [
        "Perform directory enumeration to find hidden endpoints",
        "Scan for known CVEs using Nuclei",
        "Test authentication mechanisms for common weaknesses"
    ],
    "priority": "high",
    "next_adjustment": "round_5"
}
```
### Meta Supervisor Insights
Generated every 3 rounds:
```python
{
    "round": 6,
    "insight": "Worker is stuck in repetitive SQLi attempts without progress",
    "recommendation": "Switch to testing file upload functionality",
    "should_continue": True,
    "mode_adjustment": "more_focused"
}
```
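An insight like "stuck in repetitive SQLi attempts" needs some notion of repetition without progress. The heuristic below is one hypothetical way to express it and is not the Meta Supervisor's actual logic (which is LLM-driven). It assumes each attempt is recorded as a dict with a `category` and an optional `finding`, and flags the Worker when one category dominates the recent window with no findings.

```python
from collections import Counter


def looks_stuck(recent_attempts, window=6, threshold=0.8):
    """Heuristic: True when one attack category dominates the last `window`
    attempts and none of them produced a finding."""
    tail = recent_attempts[-window:]
    if len(tail) < window:
        return False  # not enough history yet
    if any(a.get("finding") for a in tail):
        return False  # recent progress, keep going
    _, top_count = Counter(a["category"] for a in tail).most_common(1)[0]
    return top_count / window >= threshold
```

A rule like this is cheap enough to run every round, leaving the more expensive LLM reflection for the 3-round cadence.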
## Configuration Management via API
### Update Agent Configuration
```python
agent_config = {
    "max_rounds": 40,
    "temperature": 0.8,
    "meta_supervisor_interval": 3,
    "payload_master_interval": 3
}
response = requests.post(f"{API_BASE}/config/agent", json=agent_config)
```
### Update Tool Configuration
```python
# Disable Nuclei, adjust Python sandbox limits
tool_config = {
    "nuclei_scan": {
        "enabled": False
    },
    "execute_python": {
        "timeout": 60,
        "memory_limit": "1g"
    }
}
# Each tool has its own endpoint, so send one request per tool
for tool_name, settings in tool_config.items():
    response = requests.post(
        f"{API_BASE}/config/tools/{tool_name}",
        json=settings
    )
    print(tool_name, response.status_code)
```
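The partial `tool_config` above only lists the keys being changed. If the backend merges such updates into the existing settings (an assumption; it may replace them instead), the effective configuration can be previewed locally with a recursive merge. `deep_merge` is a hypothetical helper; it copies its inputs so the original defaults are left untouched.

```python
import copy


def deep_merge(base, override):
    """Return a copy of `base` with `override` applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace outright
    return merged
```

Merging the update into the `tools` section of `config.json` would keep `execute_python.enabled` and the Nuclei `severity` list while applying the new timeout, memory limit, and `enabled: false`.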
## Testing Modes
### CTF Mode
Optimized for Capture The Flag competitions:
```python
task = {
    "target_url": "http://ctf.example.com",
    "mode": "ctf",
    "max_rounds": 20,
    "aggressive": True,  # More aggressive testing
    "auto_submit_flag": True,  # Auto-submit found flags
    "flag_pattern": r"flag\{[a-zA-Z0-9_]+\}"
}
response = requests.post(f"{API_BASE}/tasks/", json=task)
```
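The `flag_pattern` above is a regular expression, so you can test it against captured responses before starting a task. The sketch below uses the same pattern with Python's `re` module to extract unique flags in order of appearance; `extract_flags` is a hypothetical helper, not how the platform detects flags internally.

```python
import re

FLAG_PATTERN = re.compile(r"flag\{[a-zA-Z0-9_]+\}")  # same pattern as the task above


def extract_flags(text):
    """Return unique flags found in `text`, in order of first appearance."""
    # dict.fromkeys de-duplicates while preserving insertion order.
    return list(dict.fromkeys(FLAG_PATTERN.findall(text)))
```

Note that the character class excludes `-`, so a flag such as `flag{bad-char}` would not match; widen the pattern if the CTF's flag format allows other characters.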
### RealWorld Mode
Conservative, thorough security assessment:
```python
task = {
    "target_url": "http://production.example.com",
    "mode": "realworld",
    "max_rounds": 50,
    "aggressive": False,
    "comprehensive_report": True,
    "respect_robots_txt": True
}
response = requests.post(f"{API_BASE}/tasks/", json=task)
```
## Knowledge Base RAG Usage
The platform uses RAG to retrieve relevant attack techniques:
```python
# Query is automatically embedded and searched against 52+ docs
# Covering: IDOR, SQL Injection, XSS, File Upload, SSRF, XXE,
# Deserialization, JWT attacks, OAuth bypass, etc.
# Example: The agent queries knowledge when stuck
query_result = {
    "query": "bypass file upload restrictions",
    "results": [
        {
            "document": "file_upload_bypass.md",
            "score": 0.92,
            "content": "To bypass extension filters: 1) Use double extensions (.php.jpg)...",
            "metadata": {
                "attack_type": "file_upload",
                "difficulty": "medium"
            }
        }
    ]
}
```
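The `score` in these results is a relevance score from the retrieval step. A common way RAG systems compute it is cosine similarity between the query embedding and each document embedding, keeping the `top_k` best. The sketch below shows that ranking with toy three-dimensional vectors; the real platform uses DashScope embeddings and may use a different similarity measure or vector store, so treat this as an illustration of the technique only.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, docs, k=5):
    """Rank (name, vector) pairs by similarity to the query and keep the best k."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in docs]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

Usage with toy vectors: `top_k([1.0, 0.0, 0.0], [("file_upload_bypass.md", [1.0, 0.0, 0.0]), ("sqli.md", [0.0, 1.0, 0.0])], k=1)` ranks the file-upload document first.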
## Troubleshooting
### Task Stuck in Loop
```python
# Check Meta Supervisor insights
response = requests.get(f"{API_BASE}/tasks/{task_id}/conversations")
messages = response.json()
# Look for meta_supervisor messages
meta_insights = [m for m in messages if m["agent"] == "meta_supervisor"]
for m in meta_insights[-3:]:
    print(m["content"][:200])
# Manual intervention if needed
intervention = {
    "message": "Stop SQL injection attempts, move to testing file uploads",
    "agent": "strategic_supervisor"
}
requests.post(f"{API_BASE}/tasks/{task_id}/intervention", json=intervention)
```
### Docker Sandbox Issues
```bash
# Check Docker container status
docker ps | grep h-pentest
# View sandbox logs
docker logs h-pentest-sandbox
# Restart sandbox
docker-compose restart sandbox
```
### API Connection Errors
```python
# Verify backend is running
try:
    response = requests.get(f"{API_BASE}/health", timeout=5)
    print(f"Backend status: {response.json()}")
except requests.exceptions.RequestException as e:
    print(f"Backend unreachable: {e}")
# Check WebSocket connection
import asyncio
import websockets
async def check_ws():
    try:
        async with websockets.connect("ws://localhost:8000/api/v1/ws/test") as ws:
            print("WebSocket OK")
    except Exception as e:
        print(f"WebSocket error: {e}")
asyncio.run(check_ws())
```
### LLM API Rate Limits
```python
# Monitor token usage via context info
response = requests.get(f"{API_BASE}/context/info")
context = response.json()
print(f"Total tokens used: {context['total_tokens']}")
print(f"Compressed messages: {context['compressed_count']}")
# Lower temperature makes output more deterministic (it does not cap length);
# fewer rounds means fewer LLM calls per task
config = {"temperature": 0.3, "max_rounds": 20}
requests.post(f"{API_BASE}/config/agent", json=config)
```
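When a provider returns rate-limit errors (typically HTTP 429), client-side retries with exponential backoff and jitter are the standard remedy. The helper below is a generic sketch, not part of H-Pentest: `with_backoff` and its parameters are hypothetical, and deciding which errors are retryable is left to the `is_retryable` callback. `sleep` is injectable so the logic can be tested without waiting.

```python
import random
import time


def with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                 is_retryable=lambda exc: True, sleep=time.sleep):
    """Call `call()`, retrying retryable failures with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise  # out of attempts, or an error that retrying will not fix
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries
```

For example, with `requests` you might pass `is_retryable=lambda e: isinstance(e, requests.HTTPError) and e.response is not None and e.response.status_code == 429` and have `call` invoke `raise_for_status()`.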
### Knowledge Base Not Returning Results
```bash
# Verify embeddings are initialized
# Check backend logs for Dashscope API errors
# Rebuild knowledge base
cd backend
python -c "from app.services.knowledge_base import rebuild_index; rebuild_index()"
```
## Common Workflows
### Full Automated Pentest
```python
import json
import time

import requests

API_BASE = "http://localhost:8000/api/v1"
# 1. Create task
task = requests.post(f"{API_BASE}/tasks/", json={
    "target_url": "http://target.com",
    "mode": "realworld",
    "max_rounds": 30
}).json()
task_id = task["id"]
# 2. Poll until the task reaches a terminal state
while True:
    status = requests.get(f"{API_BASE}/tasks/{task_id}").json()
    if status["status"] == "completed":
        break
    elif status["status"] == "failed":
        print(f"Task failed: {status.get('error')}")
        break
    print(f"Round {status['rounds_completed']}/{status['max_rounds']}")
    time.sleep(10)
# 3. Get report
report = requests.get(f"{API_BASE}/tasks/{task_id}/report").json()
# 4. Export results
with open(f"report_{task_id}.json", "w") as f:
    json.dump(report, f, indent=2)
print(f"Found {len(report['vulnerabilities'])} vulnerabilities")
```
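The polling loop above runs forever if a task never reaches `completed` or `failed`. A bounded variant is sketched below; `wait_for_task` is a hypothetical helper, and the one-hour default timeout is an arbitrary assumption. The status fetcher, `sleep`, and the clock are passed in so the loop can be tested offline.

```python
import time


def wait_for_task(fetch_status, interval=10.0, timeout=3600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll `fetch_status()` until it reports a terminal state or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status["status"] in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task still {status['status']!r} after {timeout}s")
        sleep(interval)
```

Usage: `status = wait_for_task(lambda: requests.get(f"{API_BASE}/tasks/{task_id}").json())`, then branch on `status["status"]` before fetching the report.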
### Interactive Testing with Guidance
```python
import asyncio
import json

import requests
import websockets

API_BASE = "http://localhost:8000/api/v1"


async def guided_test(target_url):
    # Create task
    task = requests.post(f"{API_BASE}/tasks/", json={
        "target_url": target_url,
        "mode": "ctf",
        "max_rounds": 25
    }).json()
    task_id = task["id"]
    # Connect to WebSocket
    uri = f"ws://localhost:8000/api/v1/ws/{task_id}"
    async with websockets.connect(uri) as ws:
        round_count = 0
        async for message in ws:
            data = json.loads(message)
            if data["type"] == "round_complete":
                round_count += 1
                # Provide guidance every 5 rounds; input() runs in a worker thread
                # so the event loop keeps servicing the WebSocket connection
                if round_count % 5 == 0:
                    guidance = await asyncio.to_thread(
                        input, "Provide guidance (or press Enter): "
                    )
                    if guidance:
                        requests.post(f"{API_BASE}/tasks/{task_id}/intervention", json={
                            "message": guidance,
                            "agent": "worker"
                        })
            elif data["type"] == "task_complete":
                print("Testing complete!")
                break


asyncio.run(guided_test("http://target.com"))
```
Creator's repository · aradotso/security-skills