altimate-data-engineering-skills

Claude Code skills for analytics and data engineers working with dbt, Snowflake, and data pipelines

Skill file

Preview skill file↓↑

---
name: altimate-data-engineering-skills
description: Claude Code skills for analytics and data engineers working with dbt, Snowflake, and data pipelines
triggers:
  - "help me create a dbt model"
  - "debug this dbt compilation error"
  - "optimize this Snowflake query"
  - "add tests to my dbt models"
  - "document my dbt models"
  - "convert this SQL to dbt"
  - "find expensive queries in Snowflake"
  - "refactor this dbt model safely"
---

# Altimate Data Engineering Skills

> Skill by [ara.so](https://ara.so) — Data Skills collection.

Altimate Data Engineering Skills is a collection of Claude Code skills that encode the workflows and best practices of experienced analytics engineers. These skills transform Claude from a code generator into a capable data engineering assistant by teaching **how to approach tasks**, not just what syntax to use.

The project demonstrates 53% accuracy on ADE-bench (43 real-world dbt tasks), 3x improvement on model creation tasks, and 84% pass rate on Snowflake query optimization.

## What This Project Does

Data Engineering Skills provides:
- **7 dbt skills**: Model creation, debugging, testing, documentation, migration, refactoring, incremental models
- **3 Snowflake skills**: Cost analysis, query optimization by ID, query optimization by text
- **1 delegation skill**: Hand off complex tasks to altimate-code CLI tool
- **Workflow automation**: Skills trigger automatically based on user intent
- **Best practices**: Encoded patterns from experienced analytics engineers

Skills are markdown files with YAML frontmatter that define trigger conditions and step-by-step workflows.

## Installation

### Installing Skills in Claude Code

```bash
# Add the marketplace plugin
/plugin marketplace add AltimateAI/data-engineering-skills

# Install all skills
/plugin install dbt-skills@data-engineering-skills
/plugin install snowflake-skills@data-engineering-skills
/plugin install altimate-code@data-engineering-skills
```

### Installing Kits

Kits bundle skills, MCP servers, and instructions:

```bash
# Install altimate-code CLI (required for kits)
npm install -g altimate-code

# Install kit system
altimate-code kit install AltimateAI/data-engineering-skills

# Activate the dbt-snowflake kit
altimate-code kit activate dbt-snowflake

# Check status
altimate-code kit status
```

### Manual Installation

Clone and reference skills directly:

```bash
git clone https://github.com/AltimateAI/data-engineering-skills.git
cd data-engineering-skills
```

## Available Skills

### dbt Skills

**creating-dbt-models**: Creates new dbt models following project conventions
- Discovers existing patterns before writing
- Runs `dbt build` after creation (not just compile)
- Verifies output with `dbt show`
- Handles staging, intermediate, and mart models

**debugging-dbt-errors**: Troubleshoots dbt compilation and runtime errors
- Reads full error messages carefully
- Checks upstream dependencies
- Applies fixes and rebuilds
- Stops after 3 failed attempts to reassess

**testing-dbt-models**: Adds schema tests to models
- Studies existing test patterns in project
- Matches project testing style
- Covers uniqueness, not_null, relationships, accepted_values

**documenting-dbt-models**: Generates dbt model documentation
- Analyzes model logic and columns
- Creates descriptions for models and fields
- Follows project documentation patterns

**migrating-sql-to-dbt**: Converts legacy SQL to dbt models
- Parses raw SQL queries
- Creates proper dbt model structure
- Handles CTEs, refs, and sources

**refactoring-dbt-models**: Safely restructures dbt models
- Tracks all dependencies before changes
- Applies refactoring
- Verifies downstream models still work

**developing-incremental-models**: Creates incremental dbt models
- Selects appropriate strategy (append, merge, delete+insert)
- Designs proper unique_key
- Handles late-arriving data and edge cases

### Snowflake Skills

**finding-expensive-queries**: Identifies costly Snowflake queries
- Finds queries by cost, time, or data scanned
- Ranks by impact
- Provides query IDs for optimization

**optimizing-query-by-id**: Optimizes using Snowflake query history ID
- Retrieves query profile from ID
- Applies optimization patterns
- Validates semantic preservation

**optimizing-query-text**: Optimizes raw SQL query text
- Profiles query execution
- Identifies bottlenecks (scans, joins, aggregations)
- Applies anti-pattern fixes
- Tests performance improvement

### Delegation Skill

**altimate-code**: Delegates complex data tasks to altimate-code CLI
- Verifies altimate-code installation
- Invokes `altimate-code run --yolo` non-interactively
- Reads output file and summarizes results
- Requires: `npm install -g altimate-code` (Node 20+)

## Skill Structure

Skills are markdown files with YAML frontmatter:

```yaml
---
name: creating-dbt-models
description: |
  Guide for creating dbt models. ALWAYS use this skill when:
  (1) Creating ANY new model (staging, intermediate, mart)
  (2) Task mentions "create", "build", "add" with model/table
  (3) Modifying model logic or columns
---
```

Followed by workflow instructions:

```markdown
# dbt Model Development

**Read before you write. Build after you write. Verify your output.**

## Critical Rules
1. ALWAYS run `dbt build` after creating models - compile is NOT enough
2. ALWAYS verify output after build using `dbt show`
3. If build fails 3+ times, stop and reassess your approach

## Workflow
1. Discover Conventions
   - Check existing models in same layer (staging/intermediate/mart)
   - Note naming patterns, CTE style, column ordering
   
2. Write Model
   - Follow discovered patterns
   - Use proper refs and sources
   
3. Build and Verify
   ```bash
   dbt build --select model_name
   dbt show --select model_name --limit 10
   ```
```

## Usage Examples

### Creating a dbt Model

```yaml
# User request:
"Create a staging model for raw customers data"

# Skill triggers: creating-dbt-models
# AI workflow:
# 1. Checks models/staging/ for naming patterns
# 2. Creates models/staging/stg_customers.sql
# 3. Runs: dbt build --select stg_customers
# 4. Verifies: dbt show --select stg_customers
```

Example model created:

```sql
-- models/staging/stg_customers.sql
with source as (
    select * from {{ source('jaffle_shop', 'customers') }}
),

renamed as (
    select
        id as customer_id,
        first_name,
        last_name,
        email,
        created_at
    from source
)

select * from renamed
```

### Debugging dbt Errors

```yaml
# User request:
"Fix this error: Compilation Error in model customers (models/customers.sql)"

# Skill triggers: debugging-dbt-errors
# AI workflow:
# 1. Reads full error message
# 2. Checks upstream model dependencies
# 3. Identifies issue (e.g., missing ref)
# 4. Applies fix
# 5. Rebuilds: dbt build --select customers
```

### Optimizing Snowflake Query

```yaml
# User request:
"This query is taking 5 minutes, can you optimize it?"

# Skill triggers: optimizing-query-text
# AI workflow:
# 1. Profiles query (EXPLAIN, execution stats)
# 2. Identifies bottlenecks (table scans, inefficient joins)
# 3. Applies optimizations (clustering, filters, join order)
# 4. Tests and measures improvement
```

Example optimization:

```sql
-- Before (slow)
SELECT 
    o.order_id,
    c.customer_name,
    SUM(oi.amount) as total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= '2024-01-01'
GROUP BY 1, 2;

-- After (optimized)
SELECT 
    o.order_id,
    c.customer_name,
    SUM(oi.amount) as total
FROM orders o
INNER JOIN order_items oi ON o.order_id = oi.order_id
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2024-01-01'
  AND o.order_date < '2024-12-31'  -- Added upper bound for partition pruning
GROUP BY 1, 2;
-- Assumes orders table is clustered by order_date
```

### Using Kits

```bash
# Activate dbt-snowflake kit for a project
cd /path/to/dbt-project
altimate-code kit activate dbt-snowflake

# This configures:
# - All dbt and Snowflake skills
# - MCP server for dbt (live project access)
# - Project-specific instructions

# Deactivate when done
altimate-code kit deactivate
```

Kit configuration (`.altimate/kits/active.yaml`):

```yaml
kit: dbt-snowflake
version: 1.0.0
skills:
  - creating-dbt-models
  - debugging-dbt-errors
  - testing-dbt-models
  - documenting-dbt-models
  - migrating-sql-to-dbt
  - refactoring-dbt-models
  - developing-incremental-models
  - finding-expensive-queries
  - optimizing-query-by-id
  - optimizing-query-text
mcp_servers:
  - dbt
instructions: |
  You are working on a dbt + Snowflake project.
  Always check project conventions before creating models.
  Run dbt build (not compile) after creating models.
```

## Configuration

### Skill Configuration

Skills are auto-triggered based on user requests. No explicit configuration needed.

### MCP Integration

Skills work best with Altimate MCP server for live project access:

```json
// claude_desktop_config.json or similar
{
  "mcpServers": {
    "altimate-dbt": {
      "command": "npx",
      "args": ["-y", "@altimate/mcp-server-dbt"],
      "env": {
        "DBT_PROJECT_DIR": "/path/to/dbt/project",
        "DBT_PROFILES_DIR": "/path/to/.dbt"
      }
    }
  }
}
```

MCP tools available to skills:
- `dbt_project_info`: Project structure, model list, sources
- `dbt_model_details`: Column types, dependencies, compiled SQL
- `dbt_compile`: Compile models without CLI
- `snowflake_query_history`: Recent query executions and stats
- `snowflake_table_stats`: Row counts, clustering info

## Common Patterns

### Pattern 1: Multi-Step Model Creation

```yaml
# User: "Create a mart model for monthly revenue by customer"
# Skills auto-chains:
# 1. creating-dbt-models: Creates initial model
# 2. testing-dbt-models: Adds schema tests
# 3. documenting-dbt-models: Adds documentation
```

### Pattern 2: Debug-Fix-Verify Loop

```yaml
# User: "My model won't compile"
# Skill: debugging-dbt-errors
# Workflow:
# 1. Read error
# 2. Check dependencies
# 3. Apply fix
# 4. Rebuild
# 5. If still fails, repeat up to 3x
# 6. If 3x fails, stop and reassess approach
```

### Pattern 3: Safe Refactoring

```yaml
# User: "Break this large model into smaller ones"
# Skill: refactoring-dbt-models
# Workflow:
# 1. Map all downstream dependencies
# 2. Create new intermediate models
# 3. Update refs in downstream models
# 4. Run dbt build on entire DAG
# 5. Verify no breakage
```

### Pattern 4: Incremental Model Development

```sql
-- Pattern: merge strategy with unique_key
{{ config(
    materialized='incremental',
    unique_key='event_id',
    merge_update_columns=['status', 'updated_at']
) }}

select
    event_id,
    user_id,
    event_type,
    status,
    created_at,
    updated_at
from {{ source('events', 'raw_events') }}
{% if is_incremental() %}
    where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

### Pattern 5: Query Optimization Workflow

```python
# Skill: optimizing-query-text
# Steps encoded:
# 1. Profile query
# 2. Identify bottleneck type:
#    - Full table scan → Add filters/clustering
#    - Inefficient join → Reorder, change type
#    - Large aggregation → Pre-aggregate or partition
# 3. Apply pattern-based fix
# 4. Validate semantics unchanged
# 5. Measure improvement
```

## Troubleshooting

### Skills Not Triggering

**Problem**: Skills don't activate for your request

**Solution**: 
- Be explicit: "Create a dbt model" not "make a model"
- Check skill installation: `/plugin list`
- Review trigger phrases in skill YAML
- Use exact trigger phrases from skill description

### dbt Build Failures

**Problem**: Models fail to build after creation

**Solution**:
- Skill will auto-retry up to 3 times
- After 3 failures, skill stops to reassess
- Check `dbt debug` for connection issues
- Verify refs and sources exist
- Check for SQL syntax errors

### MCP Server Not Connected

**Problem**: Skills can't access live project data

**Solution**:
```bash
# Check MCP server config
cat claude_desktop_config.json | grep altimate

# Verify dbt project path
export DBT_PROJECT_DIR=/correct/path
export DBT_PROFILES_DIR=/correct/.dbt/path

# Restart Claude Code
```

### altimate-code Delegation Fails

**Problem**: "altimate-code not found" error

**Solution**:
```bash
# Install altimate-code CLI
npm install -g altimate-code

# Verify installation
altimate-code --version

# Check Node version (requires 20+)
node --version
```

### Query Optimization No Improvement

**Problem**: Optimized query performs the same

**Solution**:
- Skill checks for anti-patterns but can't fix all issues
- May need manual clustering/indexing setup
- Check if bottleneck is data volume (consider aggregation)
- Review Snowflake warehouse size
- Use `finding-expensive-queries` to compare before/after

### Kit Activation Issues

**Problem**: Kit won't activate

**Solution**:
```bash
# Check kit is installed
altimate-code kit list

# Reinstall if needed
altimate-code kit install AltimateAI/data-engineering-skills

# Check for conflicting active kits
altimate-code kit status

# Deactivate others first
altimate-code kit deactivate
```

## Performance

Benchmark results (ADE-bench, 43 real-world dbt tasks):
- **Baseline Claude**: 46.5% accuracy (20/43 tasks)
- **Claude + Skills**: 53.5% accuracy (23/43 tasks)
- **Model Creation**: 3x improvement (40% → 65%)

Snowflake optimization (TPC-H SF1000, 62 queries):
- **Baseline**: 77.4% pass rate, 4.7% avg improvement
- **With Skills**: 83.9% pass rate, 16.8% avg improvement (3.6x better)

## Additional Resources

- Documentation: https://docs.myaltimate.com/
- ADE-bench Framework: https://github.com/dbt-labs/ade-bench
- altimate-code CLI: https://github.com/AltimateAI/altimate-code
- dbt Slack: https://getdbt.slack.com/archives/C05KPDGRMDW
- Contact: https://app.myaltimate.com/contactus

Source

Creator's repository · aradotso/data-skills

View on GitHub ↗

Security

Security checks in progress

Results will appear here once audits complete

What this skill can do

Reads your filesConnects to the internetRuns code on your machine

Checked by 3 independent security firms

Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub

Does it sneak in hidden code?Not yet checkedPending · Socket

Does it have known bugs?Not yet checkedPending · Snyk