llm-evaluation

Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.

Source

Creator's repository · wshobson/agents

View on GitHub ↗

Security

Security checks in progress

Results will appear here once audits complete

What this skill can do

Reads your filesConnects to the internetRuns code on your machine

Checked by 3 independent security firms

Does it try to trick the AI?Not yet checkedPending · Gen Agent Trust Hub

Does it sneak in hidden code?Not yet checkedPending · Socket

Does it have known bugs?Not yet checkedPending · Snyk