---
name: podcast
description: >-
  Two-host podcast video for any URL or free-form topic — 1 minute, 4 acts × ~15s,
  native multi-shot dialogue, optional voice cloning for Host A. Use when the user
  asks to "make a podcast", "podcast about [thing]", "podcast review of [url]",
  "two-host explainer", "interview-style clip", "two people talking on camera",
  "I/me and X talk about Y", or "interview with [persona] about [topic]". Native
  audio is the deliverable; captions are skipped by default because podcast dialogue
  mistranscribes domain terms.
argument-hint: <url-or-topic> [bg_img=] [host_a_img=] [host_b_img=] [voice_a=] [voice_b=] [use_avatar] [aspect_ratio=16:9]
---

# /pika:podcast

4 acts × 15s each = 60s. Host A always LEFT, Host B always RIGHT. Accepts a URL **or** a free-form topic / brief.

## Parameters

| Param | Default | Notes |
|---|---|---|
| `input` | required | URL to review **or** free-form topic / brief (e.g. "I and Elon Musk talk about Mars") |
| `bg_img` | auto-generated | Podcast studio background |
| `host_a_img` | auto-generated | Host A portrait — see Real-person handling below |
| `host_b_img` | auto-generated | Host B portrait — see Real-person handling below |
| `voice_a` | `876341503281471517` | Kling preset or cloned voice ID for Host A |
| `voice_b` | `829837252279803904` | Kling preset or cloned voice ID for Host B |
| `use_avatar` | off | Clone user's identity voice as Host A via `clone_voice` |
| `aspect_ratio` | `16:9` | Output aspect ratio |
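One way to apply these defaults is a small parser over the raw `$ARGUMENTS` string. This is a minimal Python sketch, not the plugin's actual parser. The function name `parse_arguments` and the whitespace tokenization are assumptions. Only the parameter names and default values come from the table.

```python
# Hypothetical parser for the raw $ARGUMENTS string; not the plugin's own code.
DEFAULTS = {
    "bg_img": None,        # None means "auto-generate in Step 1"
    "host_a_img": None,
    "host_b_img": None,
    "voice_a": "876341503281471517",
    "voice_b": "829837252279803904",
    "use_avatar": False,
    "aspect_ratio": "16:9",
}


def parse_arguments(arguments: str) -> dict:
    """Merge explicit key=value overrides into the defaults; the rest is `input`."""
    params = dict(DEFAULTS)
    rest = []
    for token in arguments.split():
        key, sep, value = token.partition("=")
        if token == "use_avatar":
            params["use_avatar"] = True  # bare switch form, as in the examples
        elif sep and key in DEFAULTS:
            if key == "use_avatar":
                params[key] = value.lower() in ("1", "true", "yes", "on")
            else:
                params[key] = value  # explicit override, e.g. voice_a=<id>
        elif token.startswith("--"):
            continue  # flags such as --yes (a no-op) never become part of the input
        else:
            rest.append(token)  # URL or topic prose, including URLs containing "="
    params["input"] = " ".join(rest)
    return params
```

Unknown `name=value` tokens, such as a URL with a query string, stay in the input instead of being treated as parameters.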

## Defaults — fire fast, no mid-flow confirmation

- **Use the param-table defaults silently for voices.** `voice_a` defaults to the Kling preset `876341503281471517` and `voice_b` to `829837252279803904`. Do **not** ask "which voice?" or "should I clone yours?" before firing — only honor explicit overrides (`voice_a=`, `voice_b=`, `use_avatar`).
- **Auto-generate any missing host portraits silently** (Step 1's archetype prompts). Do **not** ask "should I generate a host image?" — just generate.
- **No "type yes to proceed" gates.** Submit → render the 4 acts → return URL. Account credit balance + provider failover are the canonical guardrails. The `--yes` flag is accepted as a no-op for backward compatibility.
- **Topic-mode personas (Step 3)** — when the user names a real public figure, follow Step 4 (Real-person handling) silently: archetype portrait by default, no auto-generated photographic likeness, no question to the user about likeness rights.

## Local images on Claude Desktop

Claude Desktop can't pass inline-pasted images to MCP tools yet (Anthropic-side limitation). If the user pastes a photo inline, or mentions a local file they want as `host_a_img` / `host_b_img`, pause Step 1 and kindly send them this — something like:

> Heads up — pasted images don't reach MCP tools on Claude Desktop yet (Anthropic limitation). Two easy options for your photo:
>
> - **Paste a URL** if it's already hosted (Imgur, S3, your site) — fastest
> - **Attach the image file** so I can upload it before generation.

When a local file arrives, convert it to a public URL with `upload_asset` and use the returned `public_url` as the corresponding parameter value before Step 1 runs. Already-hosted `https://...` URLs work as-is and skip this entirely.
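The following is a sketch of that decision. It assumes `upload_asset` is exposed as a Python callable that takes a local file path and returns a dict with a `public_url` key. The field name comes from the text above, but the call shape is an assumption, and `resolve_image_param` is a hypothetical name.

```python
from typing import Callable, Optional


def resolve_image_param(value: Optional[str],
                        upload_asset: Callable[[str], dict]) -> Optional[str]:
    """Turn a user-supplied image reference into a URL the generation tools accept."""
    if value is None:
        return None  # nothing supplied: Step 1 auto-generates this asset
    if value.startswith("https://"):
        return value  # already hosted: use as-is, no upload needed
    # Local file (assumed to be passed as a path): upload it, keep the public URL.
    return upload_asset(value)["public_url"]
```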

If the user names a real public figure without attaching anything, do NOT auto-generate their likeness — Step 4 (Real-person handling) uses an archetype portrait instead.

## Steps

### 0. Resolve input (empty-args menu)

Strip flags (`--yes`, `--no-captions`, etc.) and `key=value` parameters from `$ARGUMENTS`. **If what remains is empty or whitespace-only**, print this menu **verbatim** as your full response, then **stop and wait for the user's next message** — do NOT call any tool, do NOT proceed to Step 1, do NOT invent a topic or URL. If the stripped input is non-empty (a URL or any prose), skip this step silently and proceed to Step 1.

> **What would you like a podcast about?** I can take any of:
>
> - **A website URL** (product page, docs site, launch page) — e.g. `https://pika.art`
> - **A GitHub repo** — e.g. `https://github.com/anthropics/claude-code`
> - **A blog post / article URL** — e.g. a recent piece you'd like discussed
> - **A free-form topic or brief** — e.g. *"I and Elon Musk talk about Mars"* or *"two scientists debate AGI"*
>
> Reply with your choice and I'll generate a 1-minute two-host podcast video (4 acts × ~15s).
>
> *Tip: you don't need to type `/pika:podcast` — just say things like "make a podcast about <topic>", "podcast review of <url>", or "I and <persona> talk about <topic>" and I'll fire this skill automatically.*

When the user replies, treat their reply as the resolved input (URL or topic) and proceed to Step 1. Do not re-prompt.
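The Step 0 check can be sketched as a function that returns `None` when the menu should be printed. This Python sketch makes a few assumptions. The regex treats any `--flag`, any `name=value` token, and the bare `use_avatar` switch as parameters. `resolve_input` is a hypothetical name.

```python
import re

# Tokens removed before the emptiness check: --flags, name=value params,
# and the bare `use_avatar` switch.
_STRIP = re.compile(r"^(--[\w-]+|[A-Za-z_]\w*=\S*|use_avatar)$")


def resolve_input(arguments: str):
    """Return the remaining input, or None when the Step 0 menu should be shown."""
    remaining = " ".join(t for t in arguments.split() if not _STRIP.match(t))
    return remaining or None
```

When the function returns `None`, the agent should print the menu verbatim and stop. No tool is called on that path.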

### 1. Generate missing assets (parallel)

Generate only what's not provided. Default archetype prompts:
- `bg_img` — modern podcast studio, two chairs, warm lighting, no people, 16:9
- `host_a_img` — enthusiastic host, studio portrait, left-side framing, 1:1
- `host_b_img` — pragmatic skeptic host, studio portrait, right-side framing, 1:1

If the input mentions specific personas (Step 3), tune the archetype to match the persona vibe — see Real-person handling below.
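"Generate only what's not provided, in parallel" could look like the sketch below. `generate_image(prompt, aspect_ratio)` is a hypothetical stand-in, because this file does not name the image tool. Persona tuning is left out of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional

# Step 1 archetype prompts, paired with the aspect ratio listed for each asset.
ARCHETYPES = {
    "bg_img": ("modern podcast studio, two chairs, warm lighting, no people", "16:9"),
    "host_a_img": ("enthusiastic host, studio portrait, left-side framing", "1:1"),
    "host_b_img": ("pragmatic skeptic host, studio portrait, right-side framing", "1:1"),
}


def fill_missing_assets(provided: Dict[str, Optional[str]],
                        generate_image: Callable[[str, str], str]) -> Dict[str, str]:
    """Keep provided assets and generate the missing ones concurrently."""
    result = {key: url for key, url in provided.items() if url}
    missing = [key for key in ARCHETYPES if not provided.get(key)]
    with ThreadPoolExecutor(max_workers=max(1, len(missing))) as pool:
        # One job per missing asset; provided assets are never regenerated.
        futures = {key: pool.submit(generate_image, *ARCHETYPES[key]) for key in missing}
        for key, future in futures.items():
            result[key] = future.result()  # re-raises if a generation failed
    return result
```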

### 2. Resolve voice IDs (only if `use_avatar` is set)

1. Call `identity_voice_info` → `{ voice_id, platform, sample_url }`
2. If `sample_url` is present: call `clone_voice(voice_url=sample_url, voice_name="host_a_voice")` → set `voice_a` to the returned Kling voice ID
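Step 2 can be sketched as a single function. The sketch assumes both MCP tools are exposed as Python callables: `identity_voice_info()` returns a dict with the keys listed above, and `clone_voice(...)` returns the new voice ID string directly (the real response shape may differ). The sketch also falls back to the existing `voice_a` when no `sample_url` comes back. That fallback is an assumption, since the text does not cover this case.

```python
from typing import Callable


def resolve_voice_a(voice_a: str, use_avatar: bool,
                    identity_voice_info: Callable[[], dict],
                    clone_voice: Callable[..., str]) -> str:
    """Return Host A's voice ID, cloning the user's voice only when use_avatar is set."""
    if not use_avatar:
        return voice_a  # default preset or explicit voice_a= override
    info = identity_voice_info()  # expected keys: voice_id, platform, sample_url
    sample_url = info.get("sample_url")
    if sample_url:
        return clone_voice(voice_url=sample_url, voice_name="host_a_voice")
    return voice_a  # assumption: no sample, so keep the current voice_a
```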

### 3. Parse input mode — URL vs topic

Strip flags (`--yes`, `--no-captions`, etc.) and `key=value` parameters from `$ARGUMENTS`, then inspect what remains.

**URL mode** — input contains a `https?://` URL:
- Call `capture_website` on the URL.
- Extract: product name, value prop, 2–3 specific features or facts, pricing, one jokeable detail.
- Use these as the script's factual anchors.

**Topic mode** — input is free-form prose (no URL):
- Treat the whole input as the brief. Parse for:
  - **Subject** — what the conversation is about
  - **Hosts** — explicit if mentioned ("I and Elon Musk", "two scientists", "Joe and Sarah"); otherwise use defaults (enthusiastic host + skeptic host)
  - **Angle** — debate / interview / explainer / casual
  - **Concrete facts** — any specific claims, numbers, dates, quotes the user gave
- If no concrete facts are given, use **2–3 clearly framed observations or hypotheses** to anchor jokes and the "wait, actually..." pivot. Do not present invented claims as facts; if factual accuracy matters for the topic, ask for a source or URL.
- If the user says "I and X" or "me and X", Host A is the user (run the `use_avatar` flow if it is not already set, otherwise use the default avatar) and Host B is X.
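The URL-vs-topic split and the "I and X" case can be sketched with two regular expressions. This is a Python sketch, not the skill's parser. The verb list in `SELF_HOST_RE` and the trailing-punctuation trim are assumptions. In practice the agent reads the brief with judgment rather than a fixed pattern.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Assumed phrasing for "I and X" / "me and X" briefs; the verb list is illustrative.
SELF_HOST_RE = re.compile(
    r"^\s*(?:i|me)\s+and\s+(.+?)\s+(?:talk|discuss|debate|chat)\b", re.IGNORECASE)


def parse_mode(text: str) -> dict:
    """Classify the stripped input as URL mode or topic mode."""
    url = URL_RE.search(text)
    if url:
        # URL mode wins whenever a URL appears, even inside a topic prompt.
        return {"mode": "url", "url": url.group(0).rstrip(".,;:!?)")}
    host = SELF_HOST_RE.match(text)
    return {
        "mode": "topic",
        "brief": text.strip(),
        "host_a_is_user": bool(host),  # "I and X": Host A is the user
        "host_b": host.group(1) if host else None,  # ... and Host B is X
    }
```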

### 4. Real-person handling (topic mode only)

If the parsed input names a specific real public figure as a host (e.g. "Elon Musk", "Taylor Swift", "Joe Rogan"):

- **Default behavior**: do NOT auto-generate that person's photographic likeness. Generate an **archetype portrait** matching the persona vibe, e.g. "tech-billionaire-energy CEO at a podcast desk" for an Elon-style host, "pop-star aesthetic" for a Taylor-style host. The portrait should read as clearly inspired by the persona, not as an impersonation.
- **Override**: if the user explicitly provides `host_a_img=<url>` or `host_b_img=<url>`, use the provided image as-is. The user takes responsibility for likeness rights.
- **Voices**: same logic — default to a generic Kling preset; only use a cloned voice when the user provides one (`voice_a=` / `voice_b=`) or invokes `use_avatar` (which clones the user's own voice for Host A).
- **Script tone**: the dialogue can riff on the named persona's known public positions or vibe (e.g. Mars enthusiasm for Elon-style) — public-record opinions are fair game. Do NOT put specific defamatory, off-character, or fabricated-private-life statements in their mouth.

This guardrail keeps the skill creative ("I want a podcast where I argue with a tech CEO about Mars") without auto-generating deepfakes of named real people.
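The precedence for each host portrait can be sketched in Python. A user-supplied image always wins. Otherwise the prompt describes an archetype and never a named person. `plan_host_portrait` is a hypothetical helper. It assumes the agent has already reduced any named figure to a vibe description such as "tech-billionaire-energy CEO at a podcast desk".

```python
from typing import Optional


def plan_host_portrait(user_img: Optional[str], persona_vibe: Optional[str],
                       side: str) -> dict:
    """Decide how to source one host portrait under the real-person guardrail.

    persona_vibe is an archetype description, never a person's name;
    None means no persona was named. side is "left" (Host A) or "right" (Host B).
    """
    if user_img:
        # Explicit host_a_img= / host_b_img= override: used as-is.
        return {"source": "user", "image": user_img}
    base = persona_vibe or ("enthusiastic host" if side == "left" else "pragmatic skeptic host")
    # Only the archetype is prompted for, so no photographic likeness is requested.
    return {"source": "generated", "prompt": f"{base}, studio portrait, {side}-side framing, 1:1"}
```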

### 5. Write script

Write 4 acts × 2 lines (HOST_A / HOST_B). Each line ~10–12s of spoken dialogue.

**Required (Matan rules — apply to both URL and topic modes):**
- One specific joke tied to a concrete detail (scraped fact in URL mode; topic-derived claim in topic mode)
- One "wait, actually..." skeptic-flip moment
- At least one mid-sentence interruption
- Natural filler: "okay so", "wait", "right?", "i mean", "honestly"
- Real reactions, not generic praise
- Reference at least one actual feature name, price, claim, or quote
- Natural ending — no forced "bye!"

Acts: Hook → Feature deep-dive → The Turn → Verdict
(In topic mode the analogue: Hook → Substance → The Pivot → Verdict.)
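Only some of the rules above can be checked mechanically: act order, the skeptic flip, and the absence of a forced sign-off. The joke, the interruption, and the quality of the reactions still need judgment. The following is a hypothetical Python checker. It assumes the script is held as four `Act` records, each with one HOST_A line and one HOST_B line.

```python
import re
from dataclasses import dataclass
from typing import List

URL_ACTS = ["Hook", "Feature deep-dive", "The Turn", "Verdict"]
TOPIC_ACTS = ["Hook", "Substance", "The Pivot", "Verdict"]


@dataclass
class Act:
    name: str
    host_a: str  # HOST_A line
    host_b: str  # HOST_B line


def check_script(acts: List[Act], topic_mode: bool = False) -> List[str]:
    """Return mechanically checkable rule violations (empty list means it passes)."""
    problems = []
    expected = TOPIC_ACTS if topic_mode else URL_ACTS
    if [act.name for act in acts] != expected:
        problems.append(f"acts must be {' -> '.join(expected)}")
    spoken = " ".join(f"{act.host_a} {act.host_b}" for act in acts).lower()
    if "wait, actually" not in spoken:
        problems.append("missing the 'wait, actually...' skeptic flip")
    if acts and any(re.search(r"\bbye\W*$", line, re.IGNORECASE)
                    for line in (acts[-1].host_a, acts[-1].host_b)):
        problems.append("forced 'bye' ending")
    return problems
```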

### 6. Generate video acts (subagent, sequential)

Delegate to a subagent with all resolved assets and the script. The subagent runs acts 1→2→3→4 sequentially — do NOT parallelize.

Each act is one `generate_reference_video` call (`kling-v3-omni`, `duration=15`, `sound=true`). Pass `reference_images=[bg_img, host_a_img, host_b_img]` and `voice_ids=[voice_a, voice_b]`. Optional knobs (added by `pika-mcp-server` BACK-339, 2026-05-10): `quality_mode: "pro"` for higher-fidelity Kling output (longer wall-clock time; reserve it for high-stakes renders), and `kling_model` to pin a specific Kling family member when you need reproducibility across runs. Each act uses three shots:

- Wide 5s: both hosts, no voice token
- MCU-A 5s: `<<<voice_1>>> '<HOST_A line>'`
- MCU-B 5s: `<<<voice_2>>> '<HOST_B line>'`

Emotional beats per act:
- Act 1: A excited, B skeptical
- Act 2: A gesturing/explaining, B questioning
- Act 3: A firm, B surprised and reconsidering
- Act 4: A satisfied, B conceding
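The per-act arguments can be assembled as in the Python sketch below. The function name `build_act_request`, the `model` and `prompt` field names, and the shot-prompt wording are all assumptions. The `<<<image_1>>>` = background mapping is inferred from the `reference_images` order and the Host A = `<<<image_2>>>` / Host B = `<<<image_3>>>` rule.

```python
# Emotional beats per act: (Host A, Host B).
EMOTIONAL_BEATS = {
    1: ("excited", "skeptical"),
    2: ("gesturing, explaining", "questioning"),
    3: ("firm", "surprised and reconsidering"),
    4: ("satisfied", "conceding"),
}


def build_act_request(act: int, host_a_line: str, host_b_line: str,
                      bg_img: str, host_a_img: str, host_b_img: str,
                      voice_a: str, voice_b: str) -> dict:
    """Assemble arguments for one 15s act: wide shot, then MCU-A, then MCU-B."""
    beat_a, beat_b = EMOTIONAL_BEATS[act]
    shots = [
        # image_1 = background, image_2 = Host A (LEFT), image_3 = Host B (RIGHT)
        f"Wide 5s: <<<image_1>>> studio, <<<image_2>>> on the LEFT ({beat_a}), "
        f"<<<image_3>>> on the RIGHT ({beat_b}), no dialogue",
        f"MCU-A 5s: <<<image_2>>> ({beat_a}) <<<voice_1>>> '{host_a_line}'",
        f"MCU-B 5s: <<<image_3>>> ({beat_b}) <<<voice_2>>> '{host_b_line}'",
    ]
    return {
        "model": "kling-v3-omni",  # assumed field name for the engine
        "duration": 15,
        "sound": True,
        "reference_images": [bg_img, host_a_img, host_b_img],
        "voice_ids": [voice_a, voice_b],  # voice_1 -> voice_a, voice_2 -> voice_b
        "prompt": "\n".join(shots),  # assumed field name for the shot plan
    }
```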

After act 4, subagent calls `edit_concat([act1, act2, act3, act4])` and returns the final video URL.
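The subagent's ordering constraint can be sketched as a plain loop. `generate_reference_video` and `edit_concat` are stand-in callables for the MCP tools and are assumed to return video URLs. `render_episode` is a hypothetical name.

```python
from typing import Callable, List


def render_episode(requests: List[dict],
                   generate_reference_video: Callable[..., str],
                   edit_concat: Callable[[List[str]], str]) -> str:
    """Render the acts strictly in order, then concatenate them into the final video."""
    if len(requests) != 4:
        raise ValueError("expected exactly 4 acts")
    act_urls = []
    for request in requests:  # acts 1 -> 2 -> 3 -> 4, never in parallel
        act_urls.append(generate_reference_video(**request))
    return edit_concat(act_urls)
```

A sequential loop is used because the text asks for it explicitly (to reduce host and voice drift). A thread pool would be faster, but it would break that guarantee.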

### 7. Output

Return the final video URL and a one-sentence verdict. **Do not call `add_captions`** — Whisper auto-transcription is unreliable on the domain-specific terms typical of podcast dialogue (product names, persona names, technical jargon). Native Kling Omni audio is the deliverable.

---

**Rules:**
- `voice_ids` must be valid Kling voice IDs — never use name-style strings like `Calm_Man`
- Host A always LEFT (`<<<image_2>>>`), Host B always RIGHT (`<<<image_3>>>`) — never swapped

## Load-bearing phrases

These anchors keep the podcast output coherent across URL and topic modes:

| Phrase | Where | Why load-bearing |
|---|---|---|
| `Host A always LEFT, Host B always RIGHT` | Layout and shot prompts | Prevents host identity swapping across the four separate act renders. |
| `4 acts × 15s each` | Overall structure | Keeps the concat predictable and avoids uneven act pacing. |
| `Hook → Feature deep-dive → The Turn → Verdict` | Script structure | Gives the episode a conversational arc instead of four disconnected reactions. |
| `wait, actually...` skeptic-flip moment | Script requirements | Creates the pivot that makes the podcast feel like a real exchange. |
| `Do not call add_captions` | Output rule | Avoids low-quality burned captions on fast two-host dialogue with names and jargon. |

## Engine choice: Kling v3-omni for native two-host dialogue

Use Kling v3-omni for the four acts because it supports native dialogue with two reference hosts and voice tokens in a single shot plan. The tradeoff is that acts run sequentially for consistency and can take longer than pure edit/composite flows. Do not add a separate caption or music layer by default; the value of this skill is the native spoken exchange.

## Runtime expectations

Typical wall-clock time is 8-18 minutes:

| Step | Wall clock | Notes |
|---|---:|---|
| Missing asset generation | 30-90s | Skipped for provided background/host refs |
| URL/topic parse + script | 1-3 min | URL mode depends on page fetch quality |
| Four Kling acts | 6-14 min | Runs sequentially to reduce host/voice drift |
| Concat + return | 30-90s | Final URL only; captions skipped by default |

## Examples

URL mode (review a website / repo / blog):

```
/pika:podcast https://pika.art
/pika:podcast https://github.com/anthropics/claude-code
/pika:podcast https://cursor.com use_avatar
```

Topic mode (free-form brief):

```
/pika:podcast Two AI researchers debate whether AGI arrives before 2030
/pika:podcast I and a Mars-obsessed tech CEO talk about colonization timelines
/pika:podcast interview with a seed-stage VC about what kills most startups
/pika:podcast podcast about quantum computing breakthroughs in 2026
```

Mixed (URL inside a topic prompt — agent prefers URL mode if a valid URL is found):

```
/pika:podcast podcast about https://pika.art with skeptical investor energy
```

Source: creator's repository, pika-labs/pika-plugins