---
name: wildworld-dataset
description: WildWorld is a large-scale action-conditioned world modeling dataset with 108M+ frames from a photorealistic ARPG, featuring per-frame annotations, 450+ actions, and explicit state information for generative world modeling research.
triggers:
  - use WildWorld dataset
  - load WildWorld ARPG data
  - work with WildWorld annotations
  - WildWorld world modeling
  - action conditioned video dataset
  - WildBench benchmark evaluation
  - WildWorld frame annotations
  - generative ARPG dataset
---

# WildWorld Dataset Skill

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

## What WildWorld Is

**WildWorld** is a large-scale action-conditioned world modeling dataset automatically collected from a photorealistic AAA action role-playing game (ARPG). It is designed for training and evaluating **dynamic world models** — generative models that predict future game states given past observations and player actions.

### Key Statistics

| Property | Value |
|---|---|
| Total frames | 108M+ |
| Actions | 450+ semantically meaningful |
| Monster species | 29 |
| Player characters | 4 |
| Weapon types | 4 |
| Distinct stages | 5 |
| Max clip length | 30+ minutes continuous |
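
For a rough sense of scale, the frame counts above can be converted to hours of footage. The sketch below assumes a capture rate of 30 fps, which is only a placeholder: the frame rate is not stated here, so substitute the real value once it is published. `frames_to_hours` and `frames_in_clip` are hypothetical helper names.

```python
def frames_to_hours(total_frames: int, fps: float) -> float:
    """Convert a frame count to hours of footage at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return total_frames / fps / 3600.0


def frames_in_clip(minutes: float, fps: float) -> int:
    """Number of frames in a continuous clip of the given length."""
    return int(round(minutes * 60.0 * fps))


ASSUMED_FPS = 30.0  # assumption, not taken from the dataset documentation

hours = frames_to_hours(108_000_000, ASSUMED_FPS)
clip_frames = frames_in_clip(30, ASSUMED_FPS)
print(f"{hours:.0f} hours of footage, {clip_frames} frames per 30-minute clip")
```

Numbers like these help when sizing storage and choosing `sequence_length`, but they shift proportionally with the true frame rate.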

### Per-Frame Annotations

Every frame includes:
- **Character skeletons** — joint positions for player and monsters
- **Actions & states** — HP, animation state, stamina, etc.
- **Camera poses** — position, rotation, field of view
- **Depth maps** — monocular depth for each frame
- **Hierarchical captions** — action-level and sample-level natural language descriptions
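
One way to hold the annotation types listed above in memory is a small typed record. This is a sketch only: the field names, the Euler-angle rotation, and the `FrameAnnotation` and `CameraPose` classes are all assumptions, since the released schema is not yet available.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class CameraPose:
    position: Vec3
    rotation: Vec3       # assumed Euler angles; the real format may be a quaternion
    fov_degrees: float


@dataclass
class FrameAnnotation:
    frame_id: int
    skeletons: Dict[str, List[Vec3]]  # entity name -> joint positions
    state: Dict[str, float]           # e.g. {"hp": 0.8, "stamina": 0.5}
    action: Optional[str]
    camera: CameraPose
    depth_path: Optional[str] = None
    captions: Dict[str, str] = field(default_factory=dict)  # keyed by caption level

    def joint_count(self, entity: str) -> int:
        """Number of joints recorded for an entity, or 0 if it is absent."""
        return len(self.skeletons.get(entity, []))


# Hypothetical values, for illustration only
example = FrameAnnotation(
    frame_id=0,
    skeletons={"player": [(0.0, 1.0, 0.0), (0.0, 1.5, 0.0)]},
    state={"hp": 0.8, "stamina": 0.5},
    action="idle",
    camera=CameraPose(position=(0.0, 2.0, -5.0), rotation=(10.0, 0.0, 0.0), fov_degrees=60.0),
)
print(example.joint_count("player"))
```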

---

## Project Status

> ⚠️ As of March 2026, the dataset and WildBench benchmark have **not yet been released**. Monitor the repository for updates.

```bash
# Watch the repository for dataset release
# https://github.com/ShandaAI/WildWorld
```

---

## Repository Setup

```bash
# Clone the repository
git clone https://github.com/ShandaAI/WildWorld.git
cd WildWorld

# Install dependencies (when benchmark code is released)
pip install -r requirements.txt
```

---

## Expected Dataset Structure

Based on the paper and framework description, the dataset is expected to follow this structure:

```
WildWorld/
├── data/
│   ├── sequences/
│   │   ├── stage_01/
│   │   │   ├── clip_000001/
│   │   │   │   ├── frames/          # RGB frames (e.g., PNG)
│   │   │   │   ├── depth/           # Depth maps
│   │   │   │   ├── skeleton/        # Per-frame skeleton JSON
│   │   │   │   ├── states/          # HP, animation, stamina JSON
│   │   │   │   ├── camera/          # Camera pose JSON
│   │   │   │   └── actions/         # Action label files
│   │   │   └── clip_000002/
│   │   └── stage_02/
│   └── captions/
│       ├── action_level/            # Per-action descriptions
│       └── sample_level/            # Clip-level descriptions
├── benchmark/
│   └── wildbench/                   # WildBench evaluation code
├── assets/
│   └── framework-arxiv.png
├── LICENSE
└── README.md
```
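
Once data is downloaded, a quick completeness check can catch missing annotation files before training. The sketch below assumes the per-clip subdirectories shown above and the file extensions used elsewhere in this skill (`.png`, `.npy`, `.json`); `check_clip` and `EXPECTED_SUBDIRS` are hypothetical names, and the real layout may differ.

```python
from pathlib import Path
from typing import Dict, List

# Assumed per-clip subdirectories and file extensions (not a released specification)
EXPECTED_SUBDIRS: Dict[str, str] = {
    "frames": ".png",
    "depth": ".npy",
    "skeleton": ".json",
    "states": ".json",
    "camera": ".json",
    "actions": ".json",
}


def check_clip(clip_dir: str) -> List[str]:
    """Return a list of problems; an empty list means the clip looks complete."""
    clip = Path(clip_dir)
    frames_dir = clip / "frames"
    if not frames_dir.is_dir():
        return [f"missing directory: {frames_dir}"]

    # Every annotation directory should hold one file per RGB frame
    frame_ids = {p.stem for p in frames_dir.glob("*.png")}
    problems: List[str] = []
    for name, ext in EXPECTED_SUBDIRS.items():
        sub = clip / name
        if not sub.is_dir():
            problems.append(f"missing directory: {name}/")
            continue
        ids = {p.stem for p in sub.glob(f"*{ext}")}
        missing = sorted(frame_ids - ids)
        if missing:
            problems.append(f"{name}/ lacks {len(missing)} file(s), first: {missing[0]}{ext}")
    return problems


for problem in check_clip("data/sequences/stage_01/clip_000001"):
    print(problem)
```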

---

## Working with the Dataset (Anticipated API)

### Loading Frame Annotations

```python
import json
from pathlib import Path
from typing import Iterator, Optional

import numpy as np
from PIL import Image


class WildWorldClip:
    """Helper class to load a WildWorld clip and its annotations.

    Directory names, file extensions, and the zero-padded file naming are
    assumptions based on the expected layout, not a released specification.
    """

    def __init__(self, clip_dir: str):
        self.clip_dir = Path(clip_dir)
        self.frames_dir = self.clip_dir / "frames"
        self.depth_dir = self.clip_dir / "depth"
        self.skeleton_dir = self.clip_dir / "skeleton"
        self.states_dir = self.clip_dir / "states"
        self.camera_dir = self.clip_dir / "camera"
        self.actions_dir = self.clip_dir / "actions"

    def _load_json(self, directory: Path, frame_id: int) -> dict:
        """Load the per-frame JSON annotation stored in `directory`."""
        with open(directory / f"{frame_id:06d}.json") as f:
            return json.load(f)

    def get_frame(self, frame_id: int) -> Image.Image:
        """Returns the RGB frame as a fully loaded PIL image."""
        # Convert inside the context manager so the file handle is released
        # right away instead of staying open for every frame of a long clip.
        with Image.open(self.frames_dir / f"{frame_id:06d}.png") as img:
            return img.convert("RGB")

    def get_depth(self, frame_id: int) -> np.ndarray:
        """Returns the depth map as a NumPy array."""
        return np.load(self.depth_dir / f"{frame_id:06d}.npy")

    def get_skeleton(self, frame_id: int) -> dict:
        """Returns joint positions for the player and monsters."""
        return self._load_json(self.skeleton_dir, frame_id)

    def get_state(self, frame_id: int) -> dict:
        """Returns HP, animation state, stamina, etc."""
        return self._load_json(self.states_dir, frame_id)

    def get_camera(self, frame_id: int) -> dict:
        """Returns camera position, rotation, and FOV."""
        return self._load_json(self.camera_dir, frame_id)

    def get_action(self, frame_id: int) -> dict:
        """Returns the action label for the frame."""
        return self._load_json(self.actions_dir, frame_id)

    def iter_frames(self, start: int = 0, end: Optional[int] = None) -> Iterator[dict]:
        """Iterate over frames in the clip, sliced by position in sorted file order."""
        frame_files = sorted(self.frames_dir.glob("*.png"))
        for frame_path in frame_files[start:end]:
            frame_id = int(frame_path.stem)
            yield {
                "frame_id": frame_id,
                "frame": self.get_frame(frame_id),
                "depth": self.get_depth(frame_id),
                "skeleton": self.get_skeleton(frame_id),
                "state": self.get_state(frame_id),
                "camera": self.get_camera(frame_id),
                "action": self.get_action(frame_id),
            }


# Usage
clip = WildWorldClip("data/sequences/stage_01/clip_000001")
for sample in clip.iter_frames(start=0, end=100):
    frame_id = sample["frame_id"]
    state = sample["state"]
    action = sample["action"]
    # "hp" and "action_name" are assumed key names
    print(f"Frame {frame_id}: HP={state.get('hp')}, Action={action.get('action_name')}")
```

### PyTorch Dataset

```python
import torch
from torch.utils.data import Dataset, DataLoader
from pathlib import Path
import json
import numpy as np
from PIL import Image
import torchvision.transforms as T

class WildWorldDataset(Dataset):
    """
    PyTorch Dataset for WildWorld action-conditioned world modeling.
    
    Returns sequences of (frames, actions, states) for next-frame prediction.
    """

    def __init__(
        self,
        root_dir: str,
        sequence_length: int = 16,
        image_size: tuple = (256, 256),
        stage: str = None,
        split: str = "train",
    ):
        self.root_dir = Path(root_dir)
        self.sequence_length = sequence_length
        self.image_size = image_size

        self.transform = T.Compose([
            T.Resize(image_size),
            T.ToTensor(),
            T.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
        ])

        # Discover all clips
        self.clips = self._discover_clips(stage, split)
        self.samples = self._build_sample_index()

    def _discover_clips(self, stage, split):
        clips = []
        stage_dirs = (
            [self.root_dir / "data" / "sequences" / stage]
            if stage
            else sorted((self.root_dir / "data" / "sequences").iterdir())
        )
        for stage_dir in stage_dirs:
            if stage_dir.is_dir():
                for clip_dir in sorted(stage_dir.iterdir()):
                    if clip_dir.is_dir():
                        clips.append(clip_dir)
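        # Simple deterministic 90/10 split: every 10th clip goes to validation,
        # so both splits cover all stages rather than validation holding only
        # the tail of the last stage.
        if split == "train":
            return [c for i, c in enumerate(clips) if i % 10 != 9]
        return [c for i, c in enumerate(clips) if i % 10 == 9]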

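    def _build_sample_index(self):
        """Build index of (clip_dir, start_frame) pairs with 50% window overlap."""
        samples = []
        # Guard against a zero step when sequence_length == 1
        stride = max(1, self.sequence_length // 2)
        for clip_dir in self.clips:
            frames = sorted((clip_dir / "frames").glob("*.png"))
            n_frames = len(frames)
            # +1 so the last full-length window of the clip is included
            for start in range(0, n_frames - self.sequence_length + 1, stride):
                samples.append((clip_dir, start))
        return samples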

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        clip_dir, start = self.samples[idx]
        frames_dir = clip_dir / "frames"
        frame_files = sorted(frames_dir.glob("*.png"))[start:start + self.sequence_length]

        frames, actions, states = [], [], []
        for frame_path in frame_files:
            frame_id = int(frame_path.stem)

            # Load RGB frame
            img = Image.open(frame_path).convert("RGB")
            frames.append(self.transform(img))

            # Load action
            action_path = clip_dir / "actions" / f"{frame_id:06d}.json"
            with open(action_path) as f:
                action_data = json.load(f)
            actions.append(action_data.get("action_id", 0))

            # Load state
            state_path = clip_dir / "states" / f"{frame_id:06d}.json"
            with open(state_path) as f:
                state_data = json.load(f)
            states.append([
                state_data.get("hp", 1.0),
                state_data.get("stamina", 1.0),
                state_data.get("animation_id", 0),
            ])

        return {
            "frames": torch.stack(frames),            # (T, C, H, W)
            "actions": torch.tensor(actions, dtype=torch.long),   # (T,)
            "states": torch.tensor(states, dtype=torch.float32),  # (T, S)
        }


# Usage
dataset = WildWorldDataset(
    root_dir="/path/to/WildWorld",
    sequence_length=16,
    image_size=(256, 256),
    split="train",
)

loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=4)

for batch in loader:
    frames = batch["frames"]   # (B, T, C, H, W)
    actions = batch["actions"] # (B, T)
    states = batch["states"]   # (B, T, S)
    print(f"Frames: {frames.shape}, Actions: {actions.shape}")
    break
```

### Filtering by Action Type

```python
import json
from pathlib import Path

# Illustrative action categories. The names below are placeholders: the actual
# 450+ action vocabulary has not been published yet.
ACTION_CATEGORIES = {
    "movement": ["walk", "run", "sprint", "dodge", "jump"],
    "attack": ["light_attack", "heavy_attack", "combo_finisher"],
    "skill": ["skill_cast_1", "skill_cast_2", "skill_cast_3", "skill_cast_4"],
    "defense": ["block", "parry", "guard"],
    "idle": ["idle", "idle_combat"],
}

def filter_clips_by_action(dataset_root: str, action_category: str) -> list:
    """Find all frames whose action label belongs to the given category."""
    if action_category not in ACTION_CATEGORIES:
        raise ValueError(f"Unknown action category: {action_category!r}")
    target_actions = set(ACTION_CATEGORIES[action_category])
    results = []

    # Clips sit two levels below sequences/ (<stage>/<clip>/). A recursive "**"
    # pattern would also visit every frames/, depth/, ... subdirectory.
    for clip_dir in sorted(Path(dataset_root).glob("data/sequences/*/*")):
        actions_dir = clip_dir / "actions"
        if not actions_dir.is_dir():
            continue
        for action_file in sorted(actions_dir.glob("*.json")):
            with open(action_file) as f:
                data = json.load(f)
            # "action_name" is an assumed key name
            if data.get("action_name") in target_actions:
                results.append({
                    "clip": str(clip_dir),
                    "frame_id": int(action_file.stem),
                    "action": data.get("action_name"),
                })
    return results

# Find all skill cast frames
skill_frames = filter_clips_by_action("/path/to/WildWorld", "skill")
print(f"Found {len(skill_frames)} skill cast frames")
```

---

## WildBench Evaluation

```python
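# WildBench is expected to evaluate world models on next-frame prediction quality.
# Expected metrics: FVD, PSNR, SSIM, action accuracy. Only PSNR and SSIM are
# sketched here, since the official protocol has not been released.
from pathlib import Path

import numpy as np
import torch
from torchmetrics.image import PeakSignalNoiseRatio, StructuralSimilarityIndexMeasure

# ImageNet statistics applied by WildWorldDataset, shaped to broadcast over (B, C, H, W)
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)


def to_unit_range(x: torch.Tensor) -> torch.Tensor:
    """Undo the ImageNet normalization so pixel values lie in [0, 1]."""
    return (x.cpu() * IMAGENET_STD + IMAGENET_MEAN).clamp(0.0, 1.0)


class WildBenchEvaluator:
    """Evaluator for world model predictions on WildBench."""

    def __init__(self, benchmark_dir: str):
        self.benchmark_dir = Path(benchmark_dir)
        self.metrics = {}

    def evaluate(self, model, dataloader):
        # Metrics are computed on de-normalized images, so the data range is 1.0
        ssim = StructuralSimilarityIndexMeasure(data_range=1.0)
        psnr = PeakSignalNoiseRatio(data_range=1.0)

        all_psnr, all_ssim = [], []

        for batch in dataloader:
            frames = batch["frames"]       # (B, T, C, H, W)
            actions = batch["actions"]     # (B, T)
            states = batch["states"]       # (B, T, S)

            # Use first T-1 frames to predict the T-th frame
            context_frames = frames[:, :-1]
            context_actions = actions[:, :-1]
            target_frame = frames[:, -1]

            with torch.no_grad():
                # Assumes the model outputs frames in the same normalized space as its inputs
                predicted_frame = model(context_frames, context_actions, states[:, :-1])

            pred = to_unit_range(predicted_frame)
            target = to_unit_range(target_frame)
            all_psnr.append(psnr(pred, target).item())
            all_ssim.append(ssim(pred, target).item())

        # Averages of per-batch scores
        self.metrics = {
            "PSNR": float(np.mean(all_psnr)),
            "SSIM": float(np.mean(all_ssim)),
        }
        return self.metrics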
```
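
As a reference for what the PSNR score measures, it is defined as `10 * log10(MAX^2 / MSE)`, where `MAX` is the largest possible pixel value and `MSE` is the mean squared error between prediction and target. The standard-library sketch below works on flat sequences of pixel values and is meant for sanity-checking small cases; `psnr` here is a hypothetical helper, not part of WildBench.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for pixel values in [0, max_value]."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical inputs have no noise
    return 10.0 * math.log10(max_value ** 2 / mse)


# A uniform error of 0.1 on a [0, 1] scale gives roughly 20 dB
print(round(psnr([0.0, 0.0], [0.1, 0.1]), 2))
```

Higher is better, and the score depends on `max_value`, which is why the pixel range has to match between prediction and target before comparing numbers across models.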

---

## Citation

```bibtex
@misc{li2026wildworldlargescaledatasetdynamic,
      title={WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG}, 
      author={Zhen Li and Zian Meng and Shuwei Shi and Wenshuo Peng and Yuwei Wu and Bo Zheng and Chuanhao Li and Kaipeng Zhang},
      year={2026},
      eprint={2603.23497},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.23497}, 
}
```

---

## Resources

- **Project Page**: https://shandaai.github.io/wildworld-project/
- **arXiv Paper**: https://arxiv.org/abs/2603.23497
- **YouTube Demo**: https://www.youtube.com/watch?v=9vcSg553r2g
- **GitHub**: https://github.com/ShandaAI/WildWorld

---

## Troubleshooting

| Issue | Solution |
|---|---|
| Dataset not yet available | Monitor the repo; dataset release is pending as of March 2026 |
| Frame loading OOM | Reduce `sequence_length` or `image_size` in the Dataset |
| Missing annotation files | Check that all subdirs (frames, depth, skeleton, states, camera, actions) are fully downloaded |
| Slow DataLoader | Increase `num_workers`, use SSD storage, or preprocess to HDF5 |
| Benchmark code not found | The `benchmark/wildbench` directory will be released separately — watch the repo |
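
For the slow DataLoader case, much of the cost can come from opening one small JSON file per frame. The table suggests HDF5; the sketch below swaps in a simpler standard-library alternative that merges a clip's per-frame JSON files into a single JSON Lines file. The function names and the `frame_id`/`data` record layout are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Dict


def pack_annotations(src_dir: str, out_file: str) -> int:
    """Merge per-frame JSON files into one JSONL file, one frame per line."""
    files = sorted(Path(src_dir).glob("*.json"))
    with open(out_file, "w") as out:
        for path in files:
            with open(path) as f:
                record = {"frame_id": int(path.stem), "data": json.load(f)}
            out.write(json.dumps(record) + "\n")
    return len(files)


def load_packed(packed_file: str) -> Dict[int, dict]:
    """Read a packed file back into a frame_id -> annotation mapping."""
    with open(packed_file) as f:
        return {rec["frame_id"]: rec["data"] for rec in map(json.loads, f)}
```

A clip's `states/` directory could then be packed once with `pack_annotations("clip_000001/states", "clip_000001/states.jsonl")` and read with a single file open per clip. For image and depth data, a binary container such as HDF5 remains the better fit.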

Source

Creator's repository · aradotso/trending-skills
