Subagent Testing Commands - Quick Reference

Purpose: Quick command reference for testing subagents

Last Updated: 2026-01-07

Standalone Mode (Unit Testing)

Run All Standalone Tests

cd evals/framework
npm run eval:sdk -- --subagent=ContextScout --pattern="standalone/*.yaml"

Run Single Test

npm run eval:sdk -- --subagent=ContextScout --pattern="standalone/01-simple-discovery.yaml"

Debug Mode

npm run eval:sdk -- --subagent=ContextScout --pattern="standalone/*.yaml" --debug

Delegation Mode (Integration Testing)

Run Delegation Tests

npm run eval:sdk -- --agent=core/openagent --pattern="delegation/*.yaml"

Test Specific Delegation

npm run eval:sdk -- --agent=core/openagent --pattern="delegation/01-contextscout-delegation.yaml"

Verification Commands

Check Agent File

# View agent frontmatter
head -30 .opencode/agent/ContextScout.md

# Check tool permissions
grep -A 10 "^tools:" .opencode/agent/ContextScout.md

Check Test Config

cat evals/agents/ContextScout/config/config.yaml

View Latest Results

# Summary
cat evals/results/latest.json | jq '.summary'

# Agent loaded
cat evals/results/latest.json | jq '.meta.agent'

# Tool calls
cat evals/results/latest.json | jq '.tests[0]' | grep -A 5 "Tool"

# Violations
cat evals/results/latest.json | jq '.tests[0].violations'

Common Test Patterns

Smoke Test

npm run eval:sdk -- --subagent=ContextScout --pattern="smoke-test.yaml"

Specific Test Suite

npm run eval:sdk -- --subagent=ContextScout --pattern="discovery/*.yaml"

All Tests for Subagent

npm run eval:sdk -- --subagent=ContextScout

Flag Reference

| Flag | Purpose | Example |
|------|---------|---------|
| `--subagent` | Test subagent in standalone mode | `--subagent=ContextScout` |
| `--agent` | Test primary agent (or delegation) | `--agent=core/openagent` |
| `--pattern` | Filter test files | `--pattern="standalone/*.yaml"` |
| `--debug` | Show detailed output | `--debug` |
| `--timeout` | Override timeout | `--timeout=120000` |
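The flags above can presumably be combined on one command line. The sketch below is a hypothetical helper, `eval_cmd`, that is not part of the eval framework. It only assembles and prints a command from the documented flags so the result can be inspected before running it. That `--debug` and `--timeout` may be used together is an assumption, not something this reference confirms.

```shell
# Hypothetical helper (not part of the eval framework): prints an eval:sdk
# command line built from the documented flags, without running it.
# Usage: eval_cmd <subagent|agent> <name> [pattern] [extra flags...]
eval_cmd() {
  mode="$1"
  name="$2"
  pattern="${3:-}"
  # Drop the positional arguments so only extra flags remain in "$@".
  if [ "$#" -gt 3 ]; then shift 3; else shift "$#"; fi

  cmd="npm run eval:sdk -- --${mode}=${name}"
  if [ -n "$pattern" ]; then
    cmd="${cmd} --pattern=\"${pattern}\""
  fi
  for flag in "$@"; do
    cmd="${cmd} ${flag}"
  done
  printf '%s\n' "$cmd"
}

# Print a standalone run with debug output and a timeout override.
eval_cmd subagent ContextScout 'standalone/*.yaml' --debug --timeout=120000
```

Printing the command rather than executing it keeps the helper safe to try outside the repository. Copy the printed line into a shell in `evals/framework` to run it.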

Troubleshooting Commands

Check Which Agent Ran

# Should show subagent name for standalone mode
cat evals/results/latest.json | jq '.meta.agent'

Check Tool Usage

# Should show tool calls > 0
cat evals/results/latest.json | jq '.tests[0]' | grep "Tool Calls"

View Test Timeline

# See full conversation
cat evals/results/history/2026-01/07-*.json | jq '.tests[0].timeline'

Check for Errors

# View violations
cat evals/results/latest.json | jq '.tests[0].violations.details'

File Locations

Agent Files

.opencode/agent/subagents/core/{subagent}.md

Test Files

evals/agents/subagents/core/{subagent}/
├── config/config.yaml
└── tests/
    ├── standalone/
    │   ├── 01-simple-discovery.yaml
    │   └── 02-advanced-test.yaml
    └── delegation/
        └── 01-delegation-test.yaml

Results

evals/results/
├── latest.json                    # Latest test run
└── history/2026-01/              # Historical results
    └── 07-HHMMSS-{agent}.json
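Given the history layout above, the newest file in a month directory can be found by sorting on the filename. This is a minimal sketch: `newest_result` is a hypothetical helper, and it assumes the `DD-HHMMSS` prefix is zero-padded and uses a 24-hour clock, so that name order matches chronological order.

```shell
# Hypothetical helper: prints the path of the newest result file in a history
# directory. Assumes the DD-HHMMSS filename prefix is zero-padded, so sorting
# by name is the same as sorting by time.
newest_result() {
  dir="$1"
  # Prints nothing if the directory has no .json files.
  ls "$dir"/*.json 2>/dev/null | sort | tail -n 1
}

# Example use (requires jq and an existing history directory):
#   jq '.tests[0].timeline' "$(newest_result evals/results/history/2026-01)"
```

Sorting by name avoids relying on file modification times, which can change when results are copied or checked out.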

Quick Checks

Is Agent Loaded Correctly?

# Should show: "agent": "ContextScout"
cat evals/results/latest.json | jq '.meta.agent'

Did Agent Use Tools?

# Should show: Tool Calls: 1 (or more)
cat evals/results/latest.json | jq '.tests[0]' | grep "Tool Calls"

Did Test Pass?

# Should show: "passed": 1, "failed": 0
cat evals/results/latest.json | jq '.summary'
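The agent and pass/fail checks above can be folded into a single exit status, which is convenient in scripts or CI. This is a sketch under assumptions: `check_results` is a hypothetical helper, and the field names `.meta.agent`, `.summary.passed`, and `.summary.failed` are inferred from the expected outputs quoted in the comments above. The tool-call check is left out because the results file's structure for tool calls is not shown here.

```shell
# Hypothetical helper: exits 0 only if the expected agent ran, at least one
# test passed, and no test failed. Field names are inferred from the expected
# outputs quoted in this reference and may differ in the real results file.
check_results() {
  file="$1"
  expected_agent="$2"
  # jq -e sets the exit status from the truthiness of the final expression.
  jq -e --arg agent "$expected_agent" \
    '.meta.agent == $agent and .summary.failed == 0 and .summary.passed > 0' \
    "$file" >/dev/null
}

# Example use (requires jq and an existing results file):
#   check_results evals/results/latest.json ContextScout && echo "OK"
```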

Related

concepts/subagent-testing-modes.md - Understand testing modes
guides/testing-subagents.md - Step-by-step testing guide
errors/tool-permission-errors.md - Fix common issues

Reference: evals/framework/src/sdk/run-sdk-tests.ts
