gemma-gem-browser-ai

Build and extend Gemma Gem, an on-device AI browser assistant Chrome extension running Google's Gemma 4 model via WebGPU with no cloud dependencies.

Skill file

Preview skill file
---
name: gemma-gem-browser-ai
description: Build and extend Gemma Gem, an on-device AI browser assistant Chrome extension running Google's Gemma 4 model via WebGPU with no cloud dependencies.
triggers:
  - "add a tool to gemma gem"
  - "extend the gemma gem extension"
  - "how do I build the gemma gem chrome extension"
  - "gemma gem agent loop"
  - "add a new gemma gem tool"
  - "gemma 4 webgpu chrome extension"
  - "gemma gem content script tool"
  - "on-device llm browser extension"
---

# Gemma Gem Browser AI

> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.

Gemma Gem is a Chrome extension that runs Google's Gemma 4 model entirely on-device via WebGPU. It injects a chat overlay into every page and exposes a tool-calling agent loop that can read pages, click elements, fill forms, execute JavaScript, and take screenshots — all without sending data to any server.

## Architecture Overview

```
Offscreen Document          Service Worker           Content Script
(Gemma 4 + Agent Loop)  <-> (Message Router)    <-> (Chat UI + DOM Tools)
       |                         |
  WebGPU inference          Screenshot capture
  Token streaming           JS execution
```

- **Offscreen document** (`offscreen/`): Loads the ONNX model via `@huggingface/transformers`, runs the agent loop, streams tokens.
- **Service worker** (`background/`): Routes messages, handles `take_screenshot` and `run_javascript`.
- **Content script** (`content/`): Injects shadow DOM chat UI, executes DOM tools.
- **`agent/`**: Zero-dependency module defining `ModelBackend` and `ToolExecutor` interfaces — extractable as a standalone library.

## Install & Build

```bash
# Prerequisites: Node.js 18+, pnpm
pnpm install

# Development build (logging active, source maps)
pnpm build

# Production build (errors only, minified)
pnpm build:prod
```

Load the extension:
1. Open `chrome://extensions`
2. Enable **Developer mode**
3. Click **Load unpacked** → select `.output/chrome-mv3-dev/`

**Model download** happens automatically on first chat open:
- `onnx-community/gemma-4-E2B-it-ONNX` — ~500 MB (default)
- `onnx-community/gemma-4-E4B-it-ONNX` — ~1.5 GB

Models are cached in the browser's cache storage after the first run.

## Key Interfaces (`agent/`)

### ModelBackend

```typescript
// agent/types.ts
export interface ModelBackend {
  generate(
    messages: ChatMessage[],
    tools: ToolDefinition[],
    options: GenerateOptions
  ): AsyncGenerator<StreamChunk>;
}

export interface ToolDefinition {
  name: string;
  description: string;
  parameters: JSONSchema;
}

export interface GenerateOptions {
  maxNewTokens?: number;
  thinking?: boolean;
}
```

### ToolExecutor

```typescript
// agent/types.ts
export interface ToolExecutor {
  execute(toolName: string, args: Record<string, unknown>): Promise<unknown>;
}
```

### Agent Loop

```typescript
// agent/loop.ts — simplified illustration
export async function* runAgentLoop(
  userMessage: string,
  history: ChatMessage[],
  model: ModelBackend,
  tools: ToolExecutor,
  toolDefs: ToolDefinition[],
  maxIterations: number
): AsyncGenerator<AgentEvent> {
  const messages = [...history, { role: "user", content: userMessage }];

  for (let i = 0; i < maxIterations; i++) {
    for await (const chunk of model.generate(messages, toolDefs, {})) {
      if (chunk.type === "token") yield { type: "token", token: chunk.token };
      if (chunk.type === "tool_call") {
        yield { type: "tool_start", name: chunk.name };
        const result = await tools.execute(chunk.name, chunk.args);
        yield { type: "tool_result", name: chunk.name, result };
        messages.push({ role: "tool", name: chunk.name, content: String(result) });
      }
      if (chunk.type === "done") return;
    }
  }
}
```

## Built-in Tools

| Tool | Location | Description |
|------|----------|-------------|
| `read_page_content` | Content script | Read page text/HTML or a CSS selector |
| `take_screenshot` | Service worker | Capture visible tab as PNG |
| `click_element` | Content script | Click by CSS selector |
| `type_text` | Content script | Type into input by CSS selector |
| `scroll_page` | Content script | Scroll by pixel amount |
| `run_javascript` | Service worker | Execute JS in page context |

## Adding a New Tool

Tools live in two places: the **definition** (in the offscreen agent) and the **executor** (in content script or service worker).

### Step 1 — Define the tool schema

```typescript
// offscreen/tools/definitions.ts
export const MY_TOOL_DEFINITION: ToolDefinition = {
  name: "get_page_title",
  description: "Returns the document title of the current page.",
  parameters: {
    type: "object",
    properties: {},
    required: [],
  },
};
```

### Step 2 — Register in the tool list

```typescript
// offscreen/tools/index.ts
import { MY_TOOL_DEFINITION } from "./definitions";

export const ALL_TOOLS: ToolDefinition[] = [
  // ...existing tools
  MY_TOOL_DEFINITION,
];
```

### Step 3 — Implement execution in the content script

```typescript
// content/tools/executor.ts
export async function executeContentTool(
  name: string,
  args: Record<string, unknown>
): Promise<unknown> {
  switch (name) {
    case "get_page_title":
      return document.title;

    case "read_page_content": {
      const selector = args.selector as string | undefined;
      if (selector) {
        return document.querySelector(selector)?.textContent ?? "Not found";
      }
      return document.body.innerText;
    }

    case "click_element": {
      const el = document.querySelector(args.selector as string) as HTMLElement;
      if (!el) throw new Error(`Element not found: ${args.selector}`);
      el.click();
      return "clicked";
    }

    case "type_text": {
      const input = document.querySelector(args.selector as string) as HTMLInputElement;
      if (!input) throw new Error(`Input not found: ${args.selector}`);
      input.focus();
      input.value = args.text as string;
      input.dispatchEvent(new Event("input", { bubbles: true }));
      input.dispatchEvent(new Event("change", { bubbles: true }));
      return "typed";
    }

    default:
      throw new Error(`Unknown content tool: ${name}`);
  }
}
```

### Step 4 — Handle service-worker-side tools

```typescript
// background/tools.ts
export async function executeSwTool(
  name: string,
  args: Record<string, unknown>,
  tabId: number
): Promise<unknown> {
  switch (name) {
    case "take_screenshot": {
      const dataUrl = await chrome.tabs.captureVisibleTab({ format: "png" });
      return dataUrl;
    }

    case "run_javascript": {
      const results = await chrome.scripting.executeScript({
        target: { tabId },
        func: new Function(args.code as string) as () => unknown,
      });
      return results[0]?.result ?? null;
    }

    default:
      return null; // not a SW tool — forward to content script
  }
}
```

## Message Routing Pattern

The service worker acts as a message bus. All communication uses `chrome.runtime.sendMessage`.

```typescript
// Message types (shared/messages.ts)
export type ExtMessage =
  | { type: "TOOL_CALL"; name: string; args: Record<string, unknown>; tabId: number }
  | { type: "TOOL_RESULT"; name: string; result: unknown }
  | { type: "TOKEN"; token: string }
  | { type: "AGENT_DONE" }
  | { type: "AGENT_ERROR"; error: string };

// Offscreen → SW
chrome.runtime.sendMessage<ExtMessage>({
  type: "TOOL_CALL",
  name: "click_element",
  args: { selector: "#submit-btn" },
  tabId: currentTabId,
});

// SW → Content script
chrome.tabs.sendMessage<ExtMessage>(tabId, {
  type: "TOOL_CALL",
  name: "click_element",
  args: { selector: "#submit-btn" },
  tabId,
});
```

## Model Configuration

```typescript
// offscreen/model.ts — loading with transformers.js
import { pipeline, TextGenerationPipeline } from "@huggingface/transformers";

const MODEL_IDS = {
  E2B: "onnx-community/gemma-4-E2B-it-ONNX",
  E4B: "onnx-community/gemma-4-E4B-it-ONNX",
} as const;

export type ModelSize = keyof typeof MODEL_IDS;

export async function loadModel(
  size: ModelSize,
  onProgress: (progress: number) => void
): Promise<TextGenerationPipeline> {
  return pipeline("text-generation", MODEL_IDS[size], {
    dtype: "q4f16",
    device: "webgpu",
    progress_callback: (p: { progress: number }) => onProgress(p.progress),
  });
}
```

## Settings & Persistence

Settings are stored via `chrome.storage.sync`:

```typescript
export interface GemmaGemSettings {
  modelSize: "E2B" | "E4B";
  thinking: boolean;
  maxIterations: number;
  disabledHosts: string[];
}

const DEFAULT_SETTINGS: GemmaGemSettings = {
  modelSize: "E2B",
  thinking: false,
  maxIterations: 10,
  disabledHosts: [],
};

export async function getSettings(): Promise<GemmaGemSettings> {
  const stored = await chrome.storage.sync.get("settings");
  return { ...DEFAULT_SETTINGS, ...(stored.settings ?? {}) };
}

export async function saveSettings(patch: Partial<GemmaGemSettings>): Promise<void> {
  const current = await getSettings();
  await chrome.storage.sync.set({ settings: { ...current, ...patch } });
}

// Disable extension on current host
async function disableOnCurrentSite() {
  const host = new URL(location.href).hostname;
  const settings = await getSettings();
  if (!settings.disabledHosts.includes(host)) {
    await saveSettings({ disabledHosts: [...settings.disabledHosts, host] });
  }
}
```

## Shadow DOM Chat UI Pattern

The content script injects a shadow DOM to isolate styles:

```typescript
// content/ui.ts
export function injectChatOverlay(): ShadowRoot {
  const host = document.createElement("div");
  host.id = "gemma-gem-host";
  // Prevent page styles from leaking in
  const shadow = host.attachShadow({ mode: "closed" });

  // Inject styles
  const style = document.createElement("style");
  style.textContent = CHAT_STYLES; // imported CSS string
  shadow.appendChild(style);

  // Inject chat container
  const container = document.createElement("div");
  container.id = "gemma-gem-container";
  shadow.appendChild(container);

  document.body.appendChild(host);
  return shadow;
}
```

## Debugging

All logs use `[Gemma Gem]` prefix. Development builds log info/debug/warn; production only logs errors.

```
# Service worker logs
chrome://extensions → Gemma Gem → "Inspect views: service worker"

# Offscreen document (most useful: model loading, prompts, tool calls)
chrome://extensions → Gemma Gem → "Inspect views: offscreen.html"

# Content script logs
DevTools on any page → Console (filter: [Gemma Gem])

# All extension contexts
chrome://inspect#other
```

Key things to check in offscreen document logs:
- Model download progress
- Full prompt construction
- Token counts per turn
- Raw model output (before tool call parsing)
- Tool execution results

## Common Patterns & Gotchas

**WebGPU availability check:**
```typescript
if (!navigator.gpu) {
  throw new Error("WebGPU not supported. Use Chrome 113+ with hardware acceleration enabled.");
}
const adapter = await navigator.gpu.requestAdapter();
if (!adapter) throw new Error("No WebGPU adapter found.");
```

**Offscreen document lifecycle** — Chrome may suspend the offscreen document. Ping it before sending messages:
```typescript
async function ensureOffscreen() {
  const existing = await chrome.offscreen.hasDocument();
  if (!existing) {
    await chrome.offscreen.createDocument({
      url: "offscreen.html",
      reasons: [chrome.offscreen.Reason.WORKERS],
      justification: "Run Gemma 4 inference via WebGPU",
    });
  }
}
```

**Context window management** — Gemma 4 supports 128K tokens but inference slows with long contexts. Clear history per-page with `clear_context` or limit stored turns:
```typescript
const MAX_HISTORY_TURNS = 20;
function trimHistory(messages: ChatMessage[]): ChatMessage[] {
  if (messages.length <= MAX_HISTORY_TURNS * 2) return messages;
  return messages.slice(-MAX_HISTORY_TURNS * 2);
}
```

**Tool call parsing** — Gemma 4 emits tool calls in a structured format. If adding custom parsing, guard against partial/streamed JSON:
```typescript
function safeParseToolCall(raw: string): { name: string; args: Record<string, unknown> } | null {
  try {
    return JSON.parse(raw);
  } catch {
    return null; // still streaming
  }
}
```

**CSS selector safety for DOM tools:**
```typescript
function safeQuerySelector(selector: string): Element | null {
  try {
    return document.querySelector(selector);
  } catch {
    return null; // invalid selector from model
  }
}
```

Source

Creator's repository · aradotso/trending-skills

View on GitHub

Security

Security checks in progress
Results will appear here once audits complete
What this skill can do
Reads your filesConnects to the internetRuns code on your machine
Checked by 3 independent security firms
Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub
Does it sneak in hidden code?Not yet checkedPending · Socket
Does it have known bugs?Not yet checkedPending · Snyk