firecrawl-knowledge-ingest

Ingest public or authenticated knowledge bases and docs portals with Firecrawl browser. Use for JS-heavy docs, login-gated portals, paginated help centers, support knowledge bases, or structured JSON/markdown extraction from documentation sites.

Skill file

Preview skill file
---
name: firecrawl-knowledge-ingest
description: Ingest public or authenticated knowledge bases and docs portals with Firecrawl browser. Use for JS-heavy docs, login-gated portals, paginated help centers, support knowledge bases, or structured JSON/markdown extraction from documentation sites.
license: ISC
metadata:
  author: firecrawl
  version: "0.1.0"
  homepage: https://www.firecrawl.dev
  source: https://github.com/firecrawl/firecrawl-workflows
inputs:
  - name: FIRECRAWL_API_KEY
    description: Firecrawl API key for hosted Firecrawl requests.
    required: true
---

# Firecrawl Knowledge Ingest

Use this when a docs portal needs browser navigation, auth, pagination, or JS rendering.

## Onboarding Interview

Infer the portal URL, output format, auth needs, and page limit from context. If the portal is clear, proceed immediately.

Ask at most 1-3 concise questions only if blocked, such as the portal URL, whether authentication is required, or the desired output format.

## Firecrawl Collection Plan

Use Firecrawl browser to:

- open the portal and inspect navigation
- identify sections, categories, sidebar links, and article URLs
- follow sidebar navigation, next links, pagination, load-more controls, or search
- scrape article content as markdown
- extract metadata such as title, section, last updated date, author, and tags

Try Firecrawl map as a supplement for public URLs, but use browser navigation for auth-gated or JS-heavy content.

## Final Deliverable

```markdown
# Knowledge Ingest: [Portal]

## Summary
[Pages extracted, sections covered, limitations]

## Output
[JSON/markdown/merged file path or content]

## Sections
[Section names and article counts]

## Failed Or Restricted Pages
[Any access/loading issues]

## Sources
[URLs extracted]

## Rerun Inputs
workflow: firecrawl-knowledge-ingest
url: [portal url]
format: [json/markdown/merged]
max_pages: [number]
```

## JSON Shape

Use `source`, `url`, `extractedAt`, `totalArticles`, and `sections[]` with article `title`, `url`, `section`, `content`, and `metadata`.

## Quality Bar

- Preserve code examples, tables, and formatting.
- Strip nav chrome, headers, and footers.
- Track extraction progress and page failures.
- Respect authentication boundaries.

Source

Creator's repository · firecrawl/firecrawl-workflows

View on GitHub

License: ISC

Security

Verified — safe to install
Passed all 3 independent security checks
What this skill can do
Reads your filesConnects to the internetRuns code on your machine
Checked by 3 independent security firms
Does it try to trick the AI?NoSAFE · Gen Agent Trust Hub
Does it sneak in hidden code?NoNo alerts · Socket
Does it have known bugs?NoMed risk · Snyk