rustyseo-toolkit

Cross-platform SEO/GEO toolkit built with Tauri and Rust for crawling, analyzing websites, and parsing server logs without crawl limits


---
name: rustyseo-toolkit
description: Cross-platform SEO/GEO toolkit built with Tauri and Rust for crawling, analyzing websites, and parsing server logs without crawl limits
triggers:
  - how do I crawl a website with RustySEO
  - analyze SEO metrics using RustySEO
  - parse nginx or apache logs for SEO insights
  - integrate Google Search Console with RustySEO
  - use RustySEO API connectors for PageSpeed Insights
  - generate keyword clusters and topics with RustySEO
  - run deep crawl with RustySEO headless mode
  - export RustySEO crawl data to CSV or Excel
---

# RustySEO Toolkit

> Skill by [ara.so](https://ara.so) — Marketing Skills collection.

RustySEO is a free, cross-platform SEO/GEO toolkit built with Tauri, Rust, Next.js, and TypeScript. It provides website crawling, technical SEO analysis, log parsing (Nginx/Apache), AI-powered insights, and integrations with Google Search Console, GA4, PageSpeed Insights, and more. It imposes no crawl limits and runs fully locally, with optional cloud API integrations.

## Installation

### Desktop Application

Download the latest release for your platform from the [releases page](https://github.com/mascanho/RustySEO/releases):

- **Windows**: `.msi` installer (ignore "Unknown Developer" warning)
- **macOS**: `.dmg` installer (allow it under System Preferences > Security & Privacy, or System Settings > Privacy & Security on macOS 13 and later)
- **Linux**: `.AppImage` or `.deb` package

### Development Setup

```bash
# Clone the repository
git clone https://github.com/mascanho/RustySEO.git
cd RustySEO

# Install dependencies
npm install

# Run in development mode
npm run tauri dev

# Build for production
npm run tauri build
```

### TUI/Headless Mode (Separate Installation)

For terminal-based crawling, install the headless version:

```bash
git clone https://github.com/mascanho/RustySEO-Headless.git
cd RustySEO-Headless
cargo build --release
```

## Core Features

### 1. Website Crawling

**Shallow Crawl (Single Page)**
```typescript
// Example: Trigger shallow crawl programmatically
// Located in: src/components/Crawler.tsx
// Import path for Tauri v2; on Tauri v1 use '@tauri-apps/api/tauri'
import { invoke } from '@tauri-apps/api/core';

interface CrawlConfig {
  url: string;
  followLinks: boolean;
  respectRobotsTxt: boolean;
  userAgent: string;
}

const shallowCrawl = async (config: CrawlConfig) => {
  const result = await invoke('shallow_crawl', {
    url: config.url,
    followLinks: false, // a shallow crawl stays on the single page
    respectRobotsTxt: config.respectRobotsTxt
  });
  return result;
};
```

**Deep Crawl (Multiple Pages)**
```typescript
// Example: Deep crawl with concurrency control
const deepCrawl = async (baseUrl: string, maxPages: number = 100) => {
  const result = await invoke('deep_crawl', {
    url: baseUrl,
    maxPages: maxPages,
    concurrency: 5, // Concurrent requests
    delay: 1000 // Milliseconds between requests
  });
  return result;
};
```
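
A deep crawl is essentially a bounded traversal of a site's link graph. The sketch below is not RustySEO's implementation. It only illustrates how a `maxPages` cap and de-duplication interact in a breadth-first frontier. `planCrawl` and `LinkSource` are hypothetical names, and the link lookup is injected so the example runs without network access. Concurrency and delays are left out.

```typescript
// Hypothetical helper: plans a breadth-first crawl order with a page cap.
// `getLinks` stands in for whatever returns the links found on a page.
type LinkSource = (url: string) => string[];

function planCrawl(startUrl: string, maxPages: number, getLinks: LinkSource): string[] {
  const visited = new Set<string>();
  const queue: string[] = [startUrl];
  const crawlOrder: string[] = [];

  while (queue.length > 0 && crawlOrder.length < maxPages) {
    const url = queue.shift() as string;
    if (visited.has(url)) continue; // skip pages already crawled
    visited.add(url);
    crawlOrder.push(url);
    for (const link of getLinks(url)) {
      if (!visited.has(link)) queue.push(link);
    }
  }
  return crawlOrder;
}

// Usage with an in-memory link graph instead of real HTTP requests
const demoSite: Record<string, string[]> = {
  '/': ['/a', '/b'],
  '/a': ['/b', '/c'],
  '/b': ['/'],
  '/c': [],
};
const demoOrder = planCrawl('/', 3, (url) => demoSite[url] ?? []);
console.log(demoOrder); // ['/', '/a', '/b']
```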

**Keyboard Shortcuts**
- `CTRL + D`: Deep Crawl
- `CTRL + S`: Shallow Crawl
- `CTRL + H`: Toggle Sidebar
- `CTRL + L`: Toggle Task Manager

### 2. API Connectors Configuration

RustySEO integrates with multiple APIs. Configure them via the UI (Connectors menu) or directly in the config:

```typescript
// Example: Configure PageSpeed Insights API
// Stored in local SQLite database

interface APIConfig {
  provider: 'pagespeed' | 'gemini' | 'gsc' | 'ga4';
  apiKey: string;
  enabled: boolean;
}

const setAPIKey = async (config: APIConfig) => {
  await invoke('save_api_config', {
    provider: config.provider,
    apiKey: config.apiKey,
    enabled: config.enabled
  });
};

// Usage
await setAPIKey({
  provider: 'pagespeed',
  apiKey: process.env.GOOGLE_PAGESPEED_API_KEY || '',
  enabled: true
});
```

**Environment Variables for API Keys**
```bash
# .env.local
GOOGLE_PAGESPEED_API_KEY=your_pagespeed_key_here
GOOGLE_GEMINI_API_KEY=your_gemini_key_here
GOOGLE_GSC_CLIENT_ID=your_oauth_client_id
GOOGLE_GSC_CLIENT_SECRET=your_oauth_client_secret
GOOGLE_GA4_PROPERTY_ID=your_ga4_property_id
```
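
Before wiring keys into the connectors, it can help to check which variables are actually set. The following is a minimal sketch, assuming all five variables above are wanted. In practice the connectors are optional, so the list can be trimmed. `missingKeys` and `REQUIRED_KEYS` are hypothetical names, not part of RustySEO.

```typescript
// Hypothetical helper: report which of the variable names from .env.local are unset.
const REQUIRED_KEYS = [
  'GOOGLE_PAGESPEED_API_KEY',
  'GOOGLE_GEMINI_API_KEY',
  'GOOGLE_GSC_CLIENT_ID',
  'GOOGLE_GSC_CLIENT_SECRET',
  'GOOGLE_GA4_PROPERTY_ID',
];

function missingKeys(
  env: Record<string, string | undefined>,
  required: string[] = REQUIRED_KEYS
): string[] {
  return required.filter((key) => {
    const value = env[key];
    // Treat unset and whitespace-only values as missing
    return value === undefined || value.trim() === '';
  });
}

// Usage: pass process.env in Node, or any plain object
const stillMissing = missingKeys({
  GOOGLE_PAGESPEED_API_KEY: 'abc',
  GOOGLE_GEMINI_API_KEY: '  ',
});
console.log(stillMissing);
```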

### 3. Log Analysis (Nginx/Apache)

```typescript
// Example: Parse and analyze server logs
interface LogAnalysisConfig {
  logPath: string;
  logType: 'nginx' | 'apache';
  startDate?: string;
  endDate?: string;
}

const analyzeLogs = async (config: LogAnalysisConfig) => {
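  // Loosely typed so the property reads below compile; swap in the real result type
  const analysis = await invoke<Record<string, any>>('parse_logs', {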
    path: config.logPath,
    logType: config.logType,
    filters: {
      startDate: config.startDate,
      endDate: config.endDate
    }
  });
  
  // Returns: bot traffic, crawl frequency, errors, popular pages
  return analysis;
};

// Usage
const logData = await analyzeLogs({
  logPath: '/var/log/nginx/access.log',
  logType: 'nginx',
  startDate: '2024-01-01',
  endDate: '2024-01-31'
});

console.log(logData.botTraffic); // Googlebot, Bingbot stats
console.log(logData.errorPages); // 404, 500 errors
```
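
To make the log analysis step more concrete, here is a rough sketch of parsing one line in the combined log format that Apache and Nginx both use by default. It is an illustration only, not RustySEO's parser. `parseLogLine`, `LogEntry`, and `KNOWN_BOTS` are hypothetical names. Bot detection here is a plain user-agent substring match, which does not prove a request really came from that crawler.

```typescript
// Combined log format: ip ident user [time] "request" status bytes "referrer" "user-agent"
const COMBINED_LOG = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

interface LogEntry {
  ip: string;
  timestamp: string;
  method: string;
  path: string;
  statusCode: number;
  bytes: number;
  referrer: string;
  userAgent: string;
  bot: string | null;
}

// Illustrative list; extend it with the crawlers you care about
const KNOWN_BOTS = ['Googlebot', 'bingbot'];

function parseLogLine(line: string): LogEntry | null {
  const m = COMBINED_LOG.exec(line.trim());
  if (m === null) return null; // not in combined format
  const userAgent = m[8];
  const lowered = userAgent.toLowerCase();
  const bot = KNOWN_BOTS.find((bot) => lowered.includes(bot.toLowerCase())) ?? null;
  return {
    ip: m[1],
    timestamp: m[2],
    method: m[3],
    path: m[4],
    statusCode: Number(m[5]),
    bytes: m[6] === '-' ? 0 : Number(m[6]),
    referrer: m[7],
    userAgent,
    bot,
  };
}

// Usage
const sampleLine =
  '66.249.66.1 - - [10/Jan/2024:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 5123 "-" ' +
  '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"';
console.log(parseLogLine(sampleLine));
```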

### 4. SEO Analysis & Reporting

```typescript
// Example: Extract SEO metrics from crawl
interface SEOMetrics {
  title: string;
  metaDescription: string;
  h1Tags: string[];
  h2Tags: string[];
  canonicalUrl: string;
  openGraphTags: Record<string, string>;
  structuredData: object[];
  imageCount: number;
  internalLinks: number;
  externalLinks: number;
  wordCount: number;
  loadTime: number;
}

const analyzePage = async (url: string): Promise<SEOMetrics> => {
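  // Loosely typed so the field reads below compile; swap in the real result type
  const crawlResult = await invoke<Record<string, any>>('shallow_crawl', { url });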
  
  return {
    title: crawlResult.title,
    metaDescription: crawlResult.meta_description,
    h1Tags: crawlResult.h1_tags,
    h2Tags: crawlResult.h2_tags,
    canonicalUrl: crawlResult.canonical,
    openGraphTags: crawlResult.og_tags,
    structuredData: crawlResult.structured_data,
    imageCount: crawlResult.images.length,
    internalLinks: crawlResult.internal_links.length,
    externalLinks: crawlResult.external_links.length,
    wordCount: crawlResult.word_count,
    loadTime: crawlResult.load_time_ms
  };
};
```
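
The `internalLinks` and `externalLinks` counts depend on how each link is classified. A minimal sketch of one common rule follows: resolve every `href` against the page URL and compare hostnames. This is an assumption about how such a split could work, not the rule RustySEO applies. `classifyLinks` and `resolveHref` are hypothetical names.

```typescript
// Resolve an href against the page URL; returns null for unparseable values
function resolveHref(href: string, base: URL): URL | null {
  try {
    return new URL(href, base);
  } catch {
    return null;
  }
}

function classifyLinks(
  pageUrl: string,
  hrefs: string[]
): { internal: string[]; external: string[] } {
  const base = new URL(pageUrl);
  const internal: string[] = [];
  const external: string[] = [];
  for (const href of hrefs) {
    const resolved = resolveHref(href, base);
    if (resolved === null) continue;
    // Ignore mailto:, tel:, javascript: and other non-web schemes
    if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') continue;
    if (resolved.hostname === base.hostname) internal.push(resolved.href);
    else external.push(resolved.href);
  }
  return { internal, external };
}

// Usage
const linkSplit = classifyLinks('https://example.com/blog/', [
  '/about',
  'post-1',
  'https://other.org/x',
  'mailto:hi@example.com',
]);
console.log(linkSplit.internal.length, linkSplit.external.length); // 2 1
```

Note that comparing hostnames treats `www.example.com` and `example.com` as different sites; whether that is desirable depends on the audit.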

**Export Data**
```typescript
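// Example: Export crawl data to various formats
type ExportFormat = 'csv' | 'excel' | 'pdf' | 'sheets';

// Map each format to a real file extension; Google Sheets has no local file
const EXPORT_EXTENSIONS: Record<ExportFormat, string | null> = {
  csv: 'csv',
  excel: 'xlsx',
  pdf: 'pdf',
  sheets: null
};

// currentCrawlId is assumed to be in scope and to identify the crawl to export
const exportData = async (format: ExportFormat) => {
  const extension = EXPORT_EXTENSIONS[format];
  await invoke('export_crawl_data', {
    format: format,
    crawlId: currentCrawlId,
    ...(extension ? { outputPath: `./exports/crawl-${Date.now()}.${extension}` } : {})
  });
};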

// CSV Export
await exportData('csv');

// Google Sheets (requires GSC OAuth)
await exportData('sheets');
```
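
For readers post-processing exports themselves, the quoting rules are the part of CSV that most often goes wrong. The sketch below follows the RFC 4180 conventions (quote cells containing commas, quotes, or line breaks, and double any embedded quotes). `toCsv` and `escapeCsvCell` are hypothetical helpers and say nothing about how RustySEO writes its own files.

```typescript
type CsvCell = string | number | null | undefined;

function escapeCsvCell(value: CsvCell): string {
  const text = value === null || value === undefined ? '' : String(value);
  // Quote cells containing a comma, quote, or line break; double embedded quotes
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(headers: string[], rows: CsvCell[][]): string {
  const allRows: CsvCell[][] = [headers, ...rows];
  return allRows.map((row) => row.map(escapeCsvCell).join(',')).join('\r\n');
}

// Usage
const csvText = toCsv(
  ['url', 'title', 'word_count'],
  [
    ['https://example.com/', 'Home, sweet "home"', 512],
    ['https://example.com/about', null, 230],
  ]
);
console.log(csvText);
```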

### 5. AI Features

**Topic & Keyword Generation**
```typescript
// Example: Generate content topics using AI
interface TopicRequest {
  seed: string;
  count: number;
  model: 'gemini' | 'ollama';
}

const generateTopics = async (request: TopicRequest) => {
  const topics = await invoke('ai_generate_topics', {
    seed: request.seed,
    count: request.count,
    provider: request.model
  });
  return topics;
};

// Usage with Google Gemini
const topics = await generateTopics({
  seed: 'organic gardening tips',
  count: 10,
  model: 'gemini'
});

// Returns: ["Companion Planting Guide", "Composting 101", ...]
```

**Keyword Clustering**
```typescript
// Example: Cluster keywords using ML
const clusterKeywords = async (keywords: string[]) => {
  const clusters = await invoke('cluster_keywords', {
    keywords: keywords,
    minClusterSize: 2, // the two-keyword clusters shown below need a minimum of 2
    algorithm: 'kmeans'
  });
  
  return clusters;
};

const keywords = ["seo tools", "seo software", "website crawler", "site audit"];
const result = await clusterKeywords(keywords);
// Returns: { cluster1: ["seo tools", "seo software"], cluster2: ["website crawler", "site audit"] }
```
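
To give a feel for what clustering does, here is a deliberately simple stand-in that groups keywords by shared words (Jaccard similarity) with a greedy pass. It is not k-means and not RustySEO's algorithm. `clusterByOverlap` and the `0.3` threshold are illustrative assumptions. Because it is purely lexical, it cannot group semantically related phrases such as "website crawler" and "site audit".

```typescript
function keywordTokens(keyword: string): Set<string> {
  return new Set(keyword.toLowerCase().split(/\s+/).filter((t) => t.length > 0));
}

// Jaccard similarity: shared words divided by total distinct words
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared += 1;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Greedy pass: join the first cluster with a similar-enough member, else start a new one
function clusterByOverlap(keywords: string[], threshold = 0.3): string[][] {
  const clusters: string[][] = [];
  for (const keyword of keywords) {
    const kw = keywordTokens(keyword);
    const home = clusters.find((cluster) =>
      cluster.some((member) => jaccard(kw, keywordTokens(member)) >= threshold)
    );
    if (home) home.push(keyword);
    else clusters.push([keyword]);
  }
  return clusters;
}

// Usage
console.log(clusterByOverlap(['seo tools', 'seo software', 'website crawler', 'site audit']));
// [['seo tools', 'seo software'], ['website crawler'], ['site audit']]
```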

**AI Chatbot with Crawl Context**
```typescript
// Example: Query AI about crawled pages
const askChatbot = async (question: string, crawlContext: string) => {
  const response = await invoke('ai_chat', {
    question: question,
    context: crawlContext,
    provider: 'gemini'
  });
  return response;
};

// Usage
const answer = await askChatbot(
  "What are the main SEO issues on this page?",
  JSON.stringify(currentPageData)
);
```

### 6. Google Search Console Integration

```typescript
// Example: Fetch GSC data
interface GSCQuery {
  siteUrl: string;
  startDate: string;
  endDate: string;
  dimensions?: ('query' | 'page' | 'country' | 'device')[];
}

const fetchGSCData = async (query: GSCQuery) => {
  // Requires OAuth2 authentication
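  // Loosely typed so data.rows compiles; swap in the real result type
  const data = await invoke<{ rows: Record<string, any>[] }>('gsc_fetch_data', {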
    siteUrl: query.siteUrl,
    startDate: query.startDate,
    endDate: query.endDate,
    dimensions: query.dimensions || ['query', 'page']
  });
  
  return data.rows; // { query, clicks, impressions, ctr, position }
};

// Usage
const gscData = await fetchGSCData({
  siteUrl: 'https://example.com',
  startDate: '2024-01-01',
  endDate: '2024-01-31',
  dimensions: ['query', 'page']
});
```
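
Once rows are fetched, a typical next step is to look for queries with many impressions but few clicks. The sketch below recomputes CTR as clicks divided by impressions and filters on two thresholds. The row shape follows the comment in the example above. `lowCtrOpportunities` and the default thresholds are hypothetical and should be tuned per site.

```typescript
interface GscRow {
  query: string;
  clicks: number;
  impressions: number;
  position: number;
}

function lowCtrOpportunities(
  rows: GscRow[],
  minImpressions = 1000,
  maxCtr = 0.02
): Array<GscRow & { ctr: number }> {
  return rows
    .map((row) => ({
      ...row,
      ctr: row.impressions > 0 ? row.clicks / row.impressions : 0,
    }))
    .filter((row) => row.impressions >= minImpressions && row.ctr <= maxCtr)
    .sort((a, b) => b.impressions - a.impressions); // biggest opportunities first
}

// Usage
const gscSample: GscRow[] = [
  { query: 'seo crawler', clicks: 10, impressions: 2000, position: 8.2 },
  { query: 'rust seo tool', clicks: 300, impressions: 3000, position: 2.1 },
  { query: 'log file analysis', clicks: 5, impressions: 500, position: 12.4 },
  { query: 'free site audit', clicks: 50, impressions: 5000, position: 9.7 },
];
console.log(lowCtrOpportunities(gscSample).map((row) => row.query));
// ['free site audit', 'seo crawler']
```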

### 7. Schema Generator & Validator

```typescript
// Example: Generate and validate structured data
interface SchemaConfig {
  type: 'Article' | 'Product' | 'LocalBusiness' | 'FAQPage'; // schema.org names the FAQ type FAQPage
  data: Record<string, any>;
}

const generateSchema = (config: SchemaConfig) => {
  const schema = {
    "@context": "https://schema.org",
    "@type": config.type,
    ...config.data
  };
  
  return JSON.stringify(schema, null, 2);
};

// Usage
const articleSchema = generateSchema({
  type: 'Article',
  data: {
    headline: 'Complete SEO Guide',
    author: { "@type": "Person", name: "John Doe" },
    datePublished: '2024-01-01',
    image: 'https://example.com/image.jpg'
  }
});

// Validate schema
const validateSchema = async (schemaJson: string) => {
  const validation = await invoke<{ isValid: boolean }>('validate_schema', {
    schema: schemaJson
  });
  return validation.isValid;
};
```
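
Generated JSON-LD usually ends up inside a `<script type="application/ld+json">` tag in the page head. A small sketch of that last step is shown here. `toJsonLdScript` is a hypothetical helper. It escapes `<` as `\u003c`, which JSON parsers read back as `<`, so a string value containing `</script>` cannot close the tag early.

```typescript
function toJsonLdScript(schema: Record<string, unknown>): string {
  // Escape "<" so a string value containing "</script>" cannot end the tag early
  const json = JSON.stringify(schema, null, 2).replace(/</g, '\\u003c');
  return `<script type="application/ld+json">\n${json}\n</script>`;
}

// Usage
const scriptTag = toJsonLdScript({
  '@context': 'https://schema.org',
  '@type': 'Article',
  headline: 'Complete SEO Guide',
  datePublished: '2024-01-01',
});
console.log(scriptTag);
```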

## Common Patterns

### Database Access (SQLite)

```typescript
// Example: Query crawl history
const getCrawlHistory = async (limit: number = 50) => {
  const history = await invoke('get_crawl_history', { limit });
  return history;
};

// Clear crawl logs
const clearLogs = async () => {
  await invoke('clear_crawl_logs');
};
```

### Task Manager

```typescript
// Example: Create and track SEO tasks
interface SEOTask {
  title: string;
  description: string;
  priority: 'low' | 'medium' | 'high';
  dueDate?: string;
}

const createTask = async (task: SEOTask) => {
  await invoke('create_task', {
    title: task.title,
    description: task.description,
    priority: task.priority,
    dueDate: task.dueDate
  });
};

// Keyboard shortcut: CTRL + T
```
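
When working through a backlog, it can be handy to order tasks by priority and then by due date. The following sketch reuses the `SEOTask` shape from the example above. `sortTasks` and `PRIORITY_RANK` are hypothetical names, and placing undated tasks last within a priority is an arbitrary choice.

```typescript
interface SEOTask {
  title: string;
  description: string;
  priority: 'low' | 'medium' | 'high';
  dueDate?: string; // ISO date, e.g. '2024-02-01'
}

const PRIORITY_RANK = { high: 0, medium: 1, low: 2 };

function sortTasks(tasks: SEOTask[]): SEOTask[] {
  // Copy first so the caller's array is left untouched
  return [...tasks].sort((a, b) => {
    const byPriority = PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority];
    if (byPriority !== 0) return byPriority;
    // ISO dates compare correctly as strings; undated tasks go last
    const aDue = a.dueDate ?? '9999-12-31';
    const bDue = b.dueDate ?? '9999-12-31';
    return aDue < bDue ? -1 : aDue > bDue ? 1 : 0;
  });
}

// Usage
const backlog: SEOTask[] = [
  { title: 'Compress hero images', description: '', priority: 'low' },
  { title: 'Fix 404s', description: '', priority: 'high', dueDate: '2024-02-10' },
  { title: 'Add canonical tags', description: '', priority: 'high', dueDate: '2024-02-01' },
  { title: 'Rewrite meta descriptions', description: '', priority: 'medium' },
];
console.log(sortTasks(backlog).map((task) => task.title));
// ['Add canonical tags', 'Fix 404s', 'Rewrite meta descriptions', 'Compress hero images']
```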

### Image Optimization

```typescript
// Example: Convert and optimize images
interface ImageOptimizeConfig {
  inputPath: string;
  outputPath: string;
  format: 'webp' | 'avif' | 'jpeg';
  quality: number; // 1-100
}

const optimizeImage = async (config: ImageOptimizeConfig) => {
  await invoke('optimize_image', {
    input: config.inputPath,
    output: config.outputPath,
    format: config.format,
    quality: config.quality
  });
};
```

## Configuration

### User Agent Customization

```typescript
const setUserAgent = async (userAgent: string) => {
  await invoke('set_user_agent', { userAgent });
};

// Respect robots.txt
const setRobotsTxt = async (respect: boolean) => {
  await invoke('set_robots_txt_respect', { respect });
};
```

### Cache Management

**Keyboard Shortcuts**
- `CTRL + /`: Clear cache
- `CTRL + Shift + /`: Full app reset

## Troubleshooting

### Issue: "Unknown Developer" Warning (Windows/Mac)

- **Windows**: Settings > Apps > Apps & features > Choose where to get apps > Allow apps from anywhere
- **macOS**: System Preferences > Security & Privacy > Open Anyway (System Settings > Privacy & Security on macOS 13 and later)

### Issue: API Rate Limiting

```typescript
// Add delay between requests
const crawlWithDelay = async (urls: string[], delayMs: number = 2000) => {
  for (const url of urls) {
    await invoke('shallow_crawl', { url });
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
};
```
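
A fixed delay works for steady crawling, but when an API starts rejecting requests, a common alternative is exponential backoff: wait longer after each failed attempt, up to a cap. The sketch below shows that technique in isolation. `backoffDelay` and `withRetry` are hypothetical helpers, and the defaults (1 second base, 30 second cap, 4 attempts) are illustrative rather than values RustySEO uses.

```typescript
// Delay before the next retry: baseMs, 2x, 4x, ... capped at maxMs
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Run `task`, retrying with growing delays until it succeeds or attempts run out
async function withRetry<T>(
  task: () => Promise<T>,
  maxAttempts = 4,
  baseMs = 1000
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        const waitMs = backoffDelay(attempt, baseMs);
        await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
      }
    }
  }
  throw lastError;
}

// Usage: the delays that four attempts would wait between tries
console.log([0, 1, 2].map((attempt) => backoffDelay(attempt))); // [1000, 2000, 4000]
```

Any request can be wrapped this way, for example `withRetry(() => invoke('shallow_crawl', { url }))` inside the app.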

### Issue: Large Website Crawling (>100K URLs)

Use the headless/TUI version for better performance:

```bash
# Run from the RustySEO-Headless checkout built in the Installation section
cd RustySEO-Headless
cargo run --release -- --url https://example.com --max-pages 100000
```

### Issue: OAuth2 Authentication Fails

OAuth currently runs server-side, so the redirect URI below must be registered in the Google Cloud Console:

```
Authorized redirect URIs:
http://localhost:3000/api/auth/callback/google
```

### Issue: Local LLM (Ollama) Not Performing Well

Use Google Gemini instead for better AI features:

```typescript
// Switch to Gemini
const config = {
  provider: 'gemini',
  apiKey: process.env.GOOGLE_GEMINI_API_KEY
};
```

## CLI Commands (Headless Mode)

```bash
# Basic crawl
rustyseo-headless --url https://example.com

# Deep crawl with limits
rustyseo-headless --url https://example.com --max-pages 500 --concurrency 10

# Parse logs
rustyseo-headless --parse-logs /var/log/nginx/access.log --log-type nginx

# Export to CSV
rustyseo-headless --url https://example.com --export-csv output.csv
```

## Database Schema

RustySEO stores data in SQLite (`~/.rustyseo/data.db`):

```sql
-- Crawl results table
CREATE TABLE crawls (
  id INTEGER PRIMARY KEY,
  url TEXT NOT NULL,
  title TEXT,
  meta_description TEXT,
  status_code INTEGER,
  load_time_ms INTEGER,
  word_count INTEGER,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- SEO tasks table
CREATE TABLE tasks (
  id INTEGER PRIMARY KEY,
  title TEXT NOT NULL,
  description TEXT,
  priority TEXT,
  status TEXT DEFAULT 'pending',
  due_date TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
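
When reading rows from the `crawls` table in TypeScript, a typed mapper keeps the snake_case column names out of the rest of the code. This is a sketch under the assumption that rows arrive as plain objects keyed by column name. `CrawlRow`, `CrawlRecord`, and `toCrawlRecord` are hypothetical names. SQLite's `CURRENT_TIMESTAMP` is UTC in `YYYY-MM-DD HH:MM:SS` form, so it is converted to ISO 8601 before parsing.

```typescript
// Shape of a row as stored in the crawls table
interface CrawlRow {
  id: number;
  url: string;
  title: string | null;
  meta_description: string | null;
  status_code: number | null;
  load_time_ms: number | null;
  word_count: number | null;
  created_at: string;
}

// Shape used by application code
interface CrawlRecord {
  id: number;
  url: string;
  title: string | null;
  metaDescription: string | null;
  statusCode: number | null;
  loadTimeMs: number | null;
  wordCount: number | null;
  createdAt: Date;
}

function toCrawlRecord(row: CrawlRow): CrawlRecord {
  return {
    id: row.id,
    url: row.url,
    title: row.title,
    metaDescription: row.meta_description,
    statusCode: row.status_code,
    loadTimeMs: row.load_time_ms,
    wordCount: row.word_count,
    // "2024-01-15 10:30:00" (UTC) becomes "2024-01-15T10:30:00Z"
    createdAt: new Date(row.created_at.replace(' ', 'T') + 'Z'),
  };
}

// Usage
const crawlRecord = toCrawlRecord({
  id: 1,
  url: 'https://example.com/',
  title: 'Example',
  meta_description: null,
  status_code: 200,
  load_time_ms: 340,
  word_count: 512,
  created_at: '2024-01-15 10:30:00',
});
console.log(crawlRecord.createdAt.toISOString()); // 2024-01-15T10:30:00.000Z
```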

## Best Practices

1. **Rate Limiting**: Always add delays between requests to avoid getting blocked
2. **API Keys**: Store in environment variables, never hardcode
3. **Large Crawls**: Use headless mode for sites with >10K pages
4. **OAuth**: Set up proper redirect URIs in Google Cloud Console
5. **AI Features**: Prefer Gemini over local Ollama for production use
6. **Logging**: Enable debug mode for troubleshooting crawl issues

## Resources

- [Official Website](https://www.rustyseo.com)
- [GitHub Repository](https://github.com/mascanho/RustySEO)
- [Headless/TUI Version](https://github.com/mascanho/RustySEO-Headless)
- [Discord Community](https://discord.gg/X49Kj7AT)
- [Google PageSpeed API Docs](https://developers.google.com/speed/docs/insights/v5/get-started)
- [Google Gemini API](https://ai.google.dev/gemini-api/docs/api-key)

Source

Creator's repository · aradotso/marketing-skills
