run-models

Run AI models on Replicate via predictions, webhooks, and streaming.


---
name: run-models
description: Run AI models on Replicate via predictions, webhooks, and streaming.
---

## Docs

- Reference: <https://replicate.com/docs/llms.txt>
- OpenAPI schema: <https://api.replicate.com/openapi.json>
- MCP server: <https://mcp.replicate.com>
- Per-model docs: `https://replicate.com/{owner}/{model}/llms.txt`
- Set `Accept: text/markdown` when requesting docs pages for Markdown responses.

## Workflow

1. **Choose the right model** - Search with the API or ask the user.
2. **Get model metadata** - Fetch input and output schema via API.
3. **Create prediction** - POST to /v1/predictions.
4. **Poll for results** - GET the prediction until its status is terminal: `succeeded`, `failed`, or `canceled`.
5. **Return output** - Usually URLs to generated content.
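
The polling step can be sketched as below. This is a minimal illustration, not an official client: `poll_prediction` and its `get_prediction` argument are hypothetical names, and `get_prediction` is assumed to wrap a GET request for a single prediction and return the decoded JSON. The loop stops on any terminal state, since a failed or canceled prediction never reaches `succeeded`.

```python
import time
from typing import Any, Callable, Dict

# Terminal states as listed in the Predictions section of this document.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def poll_prediction(
    get_prediction: Callable[[str], Dict[str, Any]],  # hypothetical fetcher
    prediction_id: str,
    interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Fetch the prediction repeatedly until it reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        prediction = get_prediction(prediction_id)
        if prediction["status"] in TERMINAL_STATES:
            return prediction
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"prediction {prediction_id} is still {prediction['status']}"
            )
        sleep(interval)
```

Callers should still check the returned `status` before reading `output`, because the function also returns for `failed` and `canceled`.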

## Three ways to get output

1. Create a prediction, store its id from the response, and poll until completion.
2. Set a `Prefer: wait` header when creating a prediction to get a blocking, synchronous response. The wait is capped at 60 seconds, so this is only recommended for very fast models.
3. Set an HTTPS webhook URL when creating a prediction, and Replicate will POST to that URL when the prediction completes.
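
To make the difference between the three modes concrete, here is a rough sketch that only builds the request and sends nothing. `build_create_request` is a hypothetical helper. The `version` and `input` body fields and the bearer-token `Authorization` header are assumptions about the request shape, so check them against the OpenAPI schema before relying on them.

```python
import json
from typing import Any, Dict, Optional, Tuple

API_URL = "https://api.replicate.com/v1/predictions"


def build_create_request(
    token: str,
    model: str,
    model_input: Dict[str, Any],
    wait: bool = False,
    webhook: Optional[str] = None,
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (url, headers, body) for a create-prediction POST."""
    headers = {
        "Authorization": f"Bearer {token}",  # assumed auth scheme
        "Content-Type": "application/json",
    }
    # Assumed body shape: model reference under "version", inputs under "input".
    body: Dict[str, Any] = {"version": model, "input": model_input}
    if wait:
        # Mode 2: block until the prediction finishes (capped at 60 seconds).
        headers["Prefer"] = "wait"
    if webhook is not None:
        # Mode 3: have the result POSTed to an HTTPS endpoint.
        if not webhook.startswith("https://"):
            raise ValueError("webhook must be an HTTPS URL")
        body["webhook"] = webhook
        body["webhook_events_filter"] = ["completed"]
    # Mode 1 (polling) is the default: no extra header or field.
    return API_URL, headers, json.dumps(body).encode("utf-8")
```

The returned triple can be passed to any HTTP client, for example `urllib.request.Request(url, data=body, headers=headers, method="POST")`.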

## Guidelines

- Use the `POST /v1/predictions` endpoint, as it supports both official and community models.
- Every model has its own OpenAPI schema. Always fetch and check model schemas to make sure you're setting valid inputs. Even popular models change their schemas.
- Validate input parameters against schema constraints (`minimum`, `maximum`, `enum` values). Don't generate values that violate them.
- When unsure about a parameter value, use the model's default example or omit the optional parameter.
- Don't set optional inputs unless you have a reason to. Stick to the required inputs and let the model's defaults do the work.
- Use HTTPS URLs for file inputs whenever possible. Base64-encoded files are also accepted, but avoid them.
- Fire off multiple predictions concurrently. Don't wait for one to finish before starting the next.
- Output file URLs expire after 1 hour. If you need to keep the files, copy them to your own storage, such as Cloudflare R2.
- Webhooks are a good mechanism for receiving and storing prediction output.
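
The schema-validation guideline can be illustrated with a small checker. This is a sketch under assumptions: `validate_input` is a hypothetical helper, and `input_schema` is assumed to be the model's input schema object with `required` and `properties` keys. It checks only `minimum`, `maximum`, and `enum`, and does not resolve `$ref` or `allOf` indirection.

```python
from typing import Any, Dict, List


def validate_input(input_schema: Dict[str, Any], values: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems: List[str] = []
    properties = input_schema.get("properties", {})

    for name in input_schema.get("required", []):
        if name not in values:
            problems.append(f"missing required input: {name}")

    for name, value in values.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unknown input: {name}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}: {value!r} is not one of {spec['enum']}")
        # bool is a subclass of int in Python, so exclude it from range checks.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if "minimum" in spec and value < spec["minimum"]:
                problems.append(f"{name}: {value} is below minimum {spec['minimum']}")
            if "maximum" in spec and value > spec["maximum"]:
                problems.append(f"{name}: {value} is above maximum {spec['maximum']}")
    return problems
```

Running the check before creating a prediction catches invalid values locally instead of spending an API call on a request that will be rejected.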

## Predictions

- A prediction goes through these states: `starting` -> `processing` -> `succeeded` / `failed` / `canceled`.
- Official models use `owner/name` format. Community models require `owner/name:version_id`.
- The `POST /v1/predictions` endpoint handles both.
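
The two reference formats above can be told apart with a small parser, which helps when deciding whether a reference pins a specific version. `ModelRef` and `parse_model_ref` are hypothetical names used only for this sketch.

```python
from typing import NamedTuple, Optional


class ModelRef(NamedTuple):
    owner: str
    name: str
    version_id: Optional[str]  # None for the bare owner/name format


def parse_model_ref(ref: str) -> ModelRef:
    """Split 'owner/name' or 'owner/name:version_id' into its parts."""
    model, colon, version_id = ref.partition(":")
    owner, slash, name = model.partition("/")
    malformed = (
        not owner
        or not slash
        or not name
        or "/" in name
        or (colon and not version_id)  # trailing colon with no version
    )
    if malformed:
        raise ValueError(f"expected owner/name or owner/name:version_id, got {ref!r}")
    return ModelRef(owner, name, version_id or None)
```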

## Webhooks

- Set `webhook` to an HTTPS URL when creating a prediction. Replicate POSTs the full prediction object when it completes.
- Filter events with `webhook_events_filter`: `start`, `output`, `logs`, `completed`.
- Validate webhook signatures using the `Webhook-ID`, `Webhook-Timestamp`, and `Webhook-Signature` headers. Get the signing secret from `GET /v1/webhooks/default/secret`.
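
A signature check might look like the sketch below. It assumes the widely used signing scheme in which the secret is a `whsec_`-prefixed base64 key, the signed content is `{id}.{timestamp}.{body}`, and the signature header holds space-separated `v1,<base64 HMAC-SHA256>` entries. Confirm those details against Replicate's webhook documentation before depending on them. `verify_webhook` and the 5-minute tolerance are illustrative choices, not part of the API.

```python
import base64
import hashlib
import hmac
import time
from typing import Mapping, Optional


def verify_webhook(
    headers: Mapping[str, str],
    body: bytes,
    secret: str,
    tolerance_seconds: int = 300,  # assumed replay window
    now: Optional[float] = None,
) -> bool:
    """Return True if the raw request body matches a v1 signature."""
    lowered = {key.lower(): value for key, value in headers.items()}
    webhook_id = lowered.get("webhook-id")
    timestamp = lowered.get("webhook-timestamp")
    signatures = lowered.get("webhook-signature")
    if not (webhook_id and timestamp and signatures):
        return False
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - sent_at) > tolerance_seconds:
        return False  # too old or too far in the future

    prefix = "whsec_"
    key_b64 = secret[len(prefix):] if secret.startswith(prefix) else secret
    key = base64.b64decode(key_b64)
    signed_content = f"{webhook_id}.{timestamp}.".encode("utf-8") + body
    expected = base64.b64encode(
        hmac.new(key, signed_content, hashlib.sha256).digest()
    ).decode("ascii")

    for entry in signatures.split():
        version, _, candidate = entry.partition(",")
        if version == "v1" and hmac.compare_digest(candidate, expected):
            return True
    return False
```

Verify against the raw request bytes, before any JSON parsing, because re-serializing the body can change whitespace and break the signature.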

## Prediction lifetime

- Set `lifetime` to auto-cancel predictions that run too long (e.g. `30s`, `5m`, `1h`). Measured from creation time.

## Streaming

- Language models that support streaming include a `stream` URL in the prediction response. Connect to it with server-sent events (SSE) to receive incremental output.
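
For readers unfamiliar with SSE, the sketch below parses the event stream format from an iterable of text lines, such as the decoded lines of a streaming HTTP response. `parse_sse` is a hypothetical helper that handles only the `event` and `data` fields. The event names used in the usage note (`output`, `done`) are assumptions about what the stream emits.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {'event': ..., 'data': ...} for each event in an SSE stream."""
    event = "message"  # default event type in the SSE format
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format strips one leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

Assuming text chunks arrive as `output` events and the stream ends with `done`, the full text is the concatenation of the `data` fields of the `output` events.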

## File handling

- Prefer HTTPS URLs for file inputs. Output URLs from one prediction can be passed directly as file inputs to the next model.
- Output file URLs expire after 1 hour. Download and store them immediately if you need to keep them.
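
Downloading an output before its URL expires can be as simple as the following sketch. `save_output` is a hypothetical helper, and the `opener` parameter exists only so the download can be swapped out in tests. By default it uses `urllib.request.urlopen`, and it assumes the output URL can be fetched without extra headers.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path
from typing import BinaryIO, Callable


def save_output(
    url: str,
    dest_dir: Path,
    opener: Callable[[str], BinaryIO] = urllib.request.urlopen,
) -> Path:
    """Stream the file at `url` into `dest_dir` and return the local path."""
    # Derive a file name from the URL path, ignoring any query string.
    name = Path(urllib.parse.urlparse(url).path).name or "output"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / name
    with opener(url) as response, open(dest, "wb") as handle:
        shutil.copyfileobj(response, handle)
    return dest
```

For durable storage, the saved file can then be uploaded to an object store, which is outside the scope of this sketch.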

## Multi-model workflows

- Chain models by passing output URLs as file inputs to the next model.
- Start all independent predictions in parallel, then collect results.
- Output URLs are valid for 1 hour, which is enough for pipeline steps.
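
The parallel-then-chain pattern can be sketched with a thread pool. `run_parallel` is a hypothetical helper, and its `run` argument is assumed to be a blocking function that creates a prediction, waits for it, and returns the output, taken here to be a single URL string. The model names in the usage note are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical signature: run(model_ref, model_input) -> output
RunFn = Callable[[str, Dict[str, Any]], Any]
Job = Tuple[str, Dict[str, Any]]


def run_parallel(run: RunFn, jobs: List[Job], max_workers: int = 8) -> List[Any]:
    """Start every job at once and return outputs in the same order as `jobs`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run, model, model_input) for model, model_input in jobs]
        # result() re-raises any exception from the worker thread.
        return [future.result() for future in futures]
```

A two-stage pipeline then becomes two calls: first `urls = run_parallel(run, [("acme/generate", {"prompt": p}) for p in prompts])`, then `run_parallel(run, [("acme/upscale", {"image": u}) for u in urls])`, passing each first-stage output URL as a file input to the second model.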

Source

Creator's repository · replicate/skills
