defuddle

Extract clean article content from web pages or local HTML files. Removes clutter (ads, sidebars, nav) and returns readable content with metadata.

Skill file

Preview skill file↓↑

---
name: defuddle
description: Extract clean article content from web pages or local HTML files. Removes clutter (ads, sidebars, nav) and returns readable content with metadata.
trigger: Use when user wants to extract/clean web page content, strip clutter from HTML, get article text from a URL, or convert web pages to clean markdown. Triggers include "defuddle", "extract article", "clean this page", "get content from URL", "strip clutter", "web extract".
---

# Defuddle - Web Content Extraction

Extract main article content from web pages, removing ads, sidebars, navigation, and other clutter. Output clean Markdown with metadata.

## Prerequisites

Before first use, check if `defuddle` is installed:

```bash
command -v defuddle >/dev/null 2>&1 || npm install -g defuddle jsdom
```

## Default Workflow

When user provides a URL, follow this workflow:

### Step 1: Extract content as Markdown + JSON metadata

Always use both `-m` and `-j` flags to get markdown content with full metadata:

```bash
defuddle parse "<url>" -m -j
```

### Step 2: Present a summary to the user

Show the user:
- **Title**: from JSON `title` field
- **Author**: from JSON `author` field
- **Source**: domain
- **Word count**: from JSON `wordCount` field
- A brief preview (first 2-3 sentences)

### Step 3: Ask where to save

If this is the **first time** using defuddle in this conversation, ask the user:
> "Save to which directory? (e.g. `~/Documents`, `~/Desktop`, or a custom path)"

Remember the user's chosen directory for subsequent uses in the same conversation.

### Step 4: Save as Markdown file

Write the file with frontmatter + full content:

```markdown
---
title: {title}
author: {author}
source: {url}
date: {published or "Unknown"}
clipped: {today's date YYYY-MM-DD}
wordCount: {wordCount}
---

# {title}

{markdown content}
```

**File naming**: Use the article title as filename, sanitized for filesystem:
- Replace special characters with spaces
- Trim whitespace
- Example: `The Shape of the Essay Field.md`

### Step 5: Confirm to user

Tell the user the file path where it was saved.

## CLI Reference

```bash
defuddle parse <source> [options]
```

**Arguments:**
- `<source>` — URL (`https://...`) or local HTML file path

**Options:**
| Flag | Description |
|------|-------------|
| `-m, --markdown` | Convert content to Markdown |
| `-j, --json` | Output as JSON with full metadata |
| `-o, --output <file>` | Write to file instead of stdout |
| `-p, --property <name>` | Extract single property (title, description, domain, author, published, wordCount, content) |
| `--debug` | Verbose logging |

## JSON Response Fields

When using `-j`, the response includes:
- `title` — Article title
- `author` — Author name
- `published` — Publication date
- `description` — Meta description
- `content` — Extracted Markdown (when `-m` used)
- `domain` — Source domain
- `favicon` — Favicon URL
- `image` — Featured image URL
- `site` — Site name
- `wordCount` — Word count
- `parseTime` — Processing time in ms

## Notes
- Requires Node.js and npm
- `jsdom` is required as a peer dependency
- Works best with article-style pages (blogs, news, documentation)
- Not designed for SPAs or JavaScript-heavy pages (e.g. WeChat articles need browser rendering)

Source

Creator's repository · joeseesun/defuddle-skill

View on GitHub ↗

Security

Security checks in progress

Results will appear here once audits complete

What this skill can do

Reads your filesConnects to the internetRuns code on your machine

Checked by 3 independent security firms

Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub

Does it sneak in hidden code?Not yet checkedPending · Socket

Does it have known bugs?Not yet checkedPending · Snyk