duckdb-docs

>

Skill file

Preview skill file
---
name: duckdb-docs
description: >
  Search DuckDB and DuckLake documentation and blog posts. Returns relevant
  doc chunks for a question or keyword using full-text search against a
  locally cached index.
argument-hint: <question or keyword>
allowed-tools: Bash
---

You are helping the user find relevant DuckDB or DuckLake documentation.

Query: `$@`

Follow these steps in order.

## Step 1 — Check DuckDB is installed

```bash
command -v duckdb
```

If not found, delegate to `/duckdb-skills:install-duckdb` and then continue.

## Step 2 — Ensure required extensions are installed

```bash
duckdb :memory: -c "INSTALL httpfs; INSTALL fts;"
```

If this fails, report the error and stop.

## Step 3 — Choose the data source and extract search terms

The query is: `$@`

### Data source selection

There are two search indexes available:

| Index | Remote URL | Local cache filename | Versions | Use when |
|-------|-----------|---------------------|----------|----------|
| **DuckDB docs + blog** | `https://duckdb.org/data/docs-search.duckdb` | `duckdb-docs.duckdb` | `lts`, `current`, `blog` | Default — any DuckDB question |
| **DuckLake docs** | `https://ducklake.select/data/docs-search.duckdb` | `ducklake-docs.duckdb` | `stable`, `preview` | Query mentions DuckLake, catalogs, or DuckLake-specific features |

Both indexes share the same schema:

| Column | Type | Description |
|--------|------|-------------|
| `chunk_id` | `VARCHAR` (PK) | e.g. `stable/sql/functions/numeric#absx` |
| `page_title` | `VARCHAR` | Page title from front matter |
| `section` | `VARCHAR` | Section heading (null for page intros) |
| `breadcrumb` | `VARCHAR` | e.g. `SQL > Functions > Numeric` |
| `url` | `VARCHAR` | URL path with anchor |
| `version` | `VARCHAR` | See table above |
| `text` | `TEXT` | Full markdown of the chunk |

By default, search **DuckDB docs** and filter to `version = 'lts'`. Use different versions when:

- The user explicitly asks about `current`/nightly features → `version = 'current'`
- The user asks about a blog post or wants background/motivation → `version = 'blog'`
- The user asks about DuckLake → search the DuckLake index with `version = 'stable'`
- When unsure, omit the version filter to search across all versions.

### Search terms

If the input is a **natural language question** (e.g. "how do I find the most frequent value"), extract the key technical terms (nouns, function names, SQL keywords) to form a compact BM25 query string. Drop stop words like "how", "do", "I", "the".

If the input is already a **function name or technical term** (e.g. `arg_max`, `GROUP BY ALL`), use it as-is.

Use the extracted terms as `SEARCH_QUERY` in the next step.

## Step 4 — Ensure local cache is fresh

The cache lives at `$HOME/.duckdb/docs/CACHE_FILENAME` (where `CACHE_FILENAME` is `duckdb-docs.duckdb` or `ducklake-docs.duckdb` per Step 3).

First, ensure the directory exists:

```bash
mkdir -p "$HOME/.duckdb/docs"
```

Then check whether the cache file exists and is fresh (≤2 days old):

```bash
CACHE_FILE="$HOME/.duckdb/docs/CACHE_FILENAME"
if [ -f "$CACHE_FILE" ]; then
    MTIME=$(stat -f %m "$CACHE_FILE" 2>/dev/null || stat -c %Y "$CACHE_FILE")
    CACHE_AGE_DAYS=$(( ( $(date +%s) - MTIME ) / 86400 ))
else
    CACHE_AGE_DAYS=999
fi
echo "Cache age: $CACHE_AGE_DAYS days"
```

**If `CACHE_AGE_DAYS` ≤ 2** → skip to Step 5.

**Otherwise** (stale or missing) → fetch the index:

```bash
duckdb -c "
LOAD httpfs;
LOAD fts;
ATTACH 'REMOTE_URL' AS remote (READ_ONLY);
ATTACH '$HOME/.duckdb/docs/CACHE_FILENAME.tmp' AS tmp;
COPY FROM DATABASE remote TO tmp;
" && mv "$HOME/.duckdb/docs/CACHE_FILENAME.tmp" "$HOME/.duckdb/docs/CACHE_FILENAME"
```

Replace `REMOTE_URL` and `CACHE_FILENAME` per Step 3. If the fetch fails (network error), report the error and stop.

## Step 5 — Search the docs

```bash
duckdb "$HOME/.duckdb/docs/CACHE_FILENAME" -readonly -json -c "
LOAD fts;
SELECT
    chunk_id, page_title, section, breadcrumb, url, version, text,
    fts_main_docs_chunks.match_bm25(chunk_id, 'SEARCH_QUERY') AS score
FROM docs_chunks
WHERE score IS NOT NULL
  AND version = 'VERSION'
ORDER BY score DESC
LIMIT 8;
"
```

Replace `CACHE_FILENAME`, `SEARCH_QUERY`, and `VERSION` per Step 3. Remove the `AND version = 'VERSION'` line if searching across all versions.

If the user's question could benefit from both DuckDB docs and blog results, run two queries (one with `version = 'stable'`, one with `version = 'blog'`) or omit the version filter entirely.

## Step 6 — Handle errors

- **Extension not installed** (`httpfs` or `fts` not found): run `duckdb :memory: -c "INSTALL httpfs; INSTALL fts;"` and retry.
- **ATTACH fails / network unreachable**: inform the user that the docs index is unavailable and suggest checking their internet connection. The DuckDB index is hosted at `https://duckdb.org/data/docs-search.duckdb` and the DuckLake index at `https://ducklake.select/data/docs-search.duckdb`.
- **No results** (all scores NULL or empty result set): try broadening the query — drop the least specific term, or try a single-word version of the query — then retry Step 5. If still no results, tell the user no matching documentation was found and suggest visiting https://duckdb.org/docs or https://ducklake.select/docs directly.

## Step 7 — Present results

For each result chunk returned (ordered by score descending), format as:

```
### {section} — {page_title}
{url}

{text}

---
```

After presenting all chunks, synthesize a concise answer to the user's original question (`$@`) based on the retrieved documentation. If the chunks directly answer the question, lead with the answer before showing the sources.

Source

Creator's repository · duckdb/duckdb-skills

View on GitHub

Security

Security checks in progress
Results will appear here once audits complete
Checked by 3 independent security firms
Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub
Does it sneak in hidden code?Not yet checkedPending · Socket
Does it have known bugs?Not yet checkedPending · Snyk