agent-runtime-governance

Audit runtime controls for tool permissions, approvals, memory, telemetry, evals, rollout, and containment. Use when reviewing tool-bearing agent systems.

agent-runtime-governance · 941 words · MIT · Repo-owned


Quick Start

Install:

npx skills add github:wyattowalsh/agents --skill agent-runtime-governance -y -g --agent antigravity --agent claude-code --agent codex --agent crush --agent cursor --agent gemini-cli --agent github-copilot --agent grok --agent opencode

Use: /agent-runtime-governance <mode> [system-or-path]

Works with Claude Code, Gemini CLI, OpenCode, and other agentskills.io-compatible agents.

What It Does

Design and audit the controls that keep tool-bearing agent systems predictable, observable, and safe to operate.

Modes

`$ARGUMENTS`	Mode	Action
Empty	`menu`	Show governance modes and required inputs
`design <system>`	`design`	Define runtime policies for a new or changing agent system
`audit <path-or-system>`	`audit`	Review existing tool, approval, memory, telemetry, and eval controls
`permissions <agent-or-tools>`	`permissions`	Design allowlists, denylists, approval modes, and escalation rules
`memory <agent-or-system>`	`memory`	Define memory scope, retention, privacy, and invalidation policy
`evals <workflow>`	`evals`	Plan regression, adversarial, and runtime acceptance eval loops
`rollout <system>`	`rollout`	Define staged release, monitoring, rollback, and operator readiness controls
`incident <failure-mode>`	`incident`	Define containment and recovery controls for agent failures
Natural language about agent tools, permissions, memory, evals, or containment	(auto-detected)	Auto-detect the closest mode

Critical Rules

Classify tools by consequence before recommending autonomy.
Require explicit approval for irreversible, costly, public, credential, or destructive actions.
Keep memory scope narrow and document retention, redaction, and invalidation.
Require telemetry for tool calls, approvals, denials, failures, and containment actions.
Add evals for unsafe tool use, missing approval, stale memory, and rollback behavior.
Separate policy from enforcement; name the hook, wrapper, test, or runtime gate that enforces each rule.
Do not replace security-scanner, honest-review, prompt-engineer, or mcp-creator; route to them when the request is outside runtime governance.
Do not mark a governance change ready without rollout, rollback, and monitoring criteria.

Governance Surfaces

Surface	Review Questions
Tools	Which tools can read, write, spend money, deploy, message users, or delete data?
Approvals	Which operations require explicit user approval or human review?
Memory	What can be stored, for how long, and at what scope?
State	What is durable, replayable, idempotent, and auditable?
Telemetry	Which traces, decisions, tool calls, and failures are observable?
Evals	Which scenarios prevent regression before rollout?
Containment	How does the system stop, roll back, quarantine, or degrade safely?

Canonical Vocabulary

Use these canonical terms exactly when producing governance reports.

Term	Meaning
tool consequence	The real-world effect a tool call can have: read, write, deploy, message, spend, delete, or expose
approval gate	Explicit human or policy checkpoint before a higher-risk action
runtime guard	Hook, wrapper, allowlist, denylist, test, or platform policy that enforces a governance rule
memory boundary	Scope, retention, redaction, and invalidation policy for stored agent context
containment	Stop, rollback, quarantine, or degrade action after unsafe or failed behavior
shadow mode	Runtime mode that records proposed actions without executing them

Classification Gate

Classify the request before choosing a mode:

If it asks for app vulnerability scanning, route to security-scanner.
If it asks for code review, route to honest-review.
If it asks for prompt wording only, route to prompt-engineer.
If it asks how to implement an MCP server, route to mcp-creator.
Otherwise, choose the closest runtime governance mode from the dispatch table.

Workflow

1. Define the agent’s job, users, data sensitivity, and external effects.
2. Inventory tools by capability: read-only, write, destructive, financial, deploy, messaging, credential access, and network egress.
3. Map approval gates to consequence, reversibility, and confidence.
4. Define memory scope, retention, redaction, and invalidation rules.
5. Require telemetry for tool calls, decisions, approval outcomes, and failures.
6. Build evals around unsafe tool use, stale memory, missing approval, and failure containment.
7. Define rollout gates, rollback criteria, and operator evidence for changes that affect live users, accounts, credentials, or external systems.
8. Return a governance matrix with owners and enforcement points.

Scaling Strategy

Scope	Strategy
Single agent or workflow	Produce one control matrix and one eval/monitoring set
Multiple agents sharing tools	Group by tool consequence and shared approval gates
Platform-wide governance	Define baseline policy first, then exceptions by agent class
Live production rollout	Add staged rollout, rollback, monitoring, and owner review gates

Progressive Disclosure

Start with this SKILL.md for routing and control surfaces.
Read references/control-matrix.md for permissions, memory, telemetry, and eval controls.
Read references/rollout-governance.md only when release, rollback, monitoring, or production readiness is in scope.
Do not load security, prompt, or MCP implementation references unless routing redirects to those skills.

Output Shape

## Agent Governance Report

- System:
- Mode:
- Risk tier:

### Control Matrix
| Surface | Current | Required | Enforcement | Evidence |
|---|---|---|---|---|

### Required Changes
- ...

### Evals And Monitoring
- ...

### Rollout And Containment
- ...

Validation Contract

Run from this skill directory before declaring changes complete:

python scripts/check.py

Completion criteria:

scripts/check.py exits 0.
No portable-CLI violations remain under this skill directory.
Smoke review covers explicit, implicit, rollout, and negative-control prompts.

Field	Value
Source Type	`repo-owned`
Display Source	`github:wyattowalsh/agents`
Source Kind	`repo`
Installability	portable command
Review State	reviewed
Target Agents	`antigravity`, `claude-code`, `codex`, `crush`, `cursor`, `gemini-cli`, `github-copilot`, `grok`, `opencode`

Field	Value
Name	`agent-runtime-governance`
License	MIT
Version	1.0.0
Author	wyattowalsh

Field	Value
Model	`opus`
Argument Hint	`[mode] [system-or-path]`

View Full SKILL.md

---
name: agent-runtime-governance
description: >-
  Audit runtime controls for tool permissions, approvals, memory, telemetry,
  evals, rollout, and containment. Use when reviewing tool-bearing agent
  systems. NOT for security scans, prompt-only work, or static code review.
argument-hint: "<mode> [system-or-path]"
model: opus
license: MIT
metadata:
  author: wyattowalsh
  version: "1.0.0"
---

# Agent Runtime Governance

Design and audit the controls that keep tool-bearing agent systems predictable,
observable, and safe to operate.

**Scope:** Runtime governance for agents that use tools, memory, approvals,
subagents, evals, or external systems. NOT for generic vulnerability scanning
(`security-scanner`), normal code review (`honest-review`), prompt-only
optimization (`prompt-engineer`), or MCP implementation details (`mcp-creator`).

## Dispatch

| `$ARGUMENTS` | Mode | Action |
|---|---|---|
| Empty | `menu` | Show governance modes and required inputs |
| `design <system>` | `design` | Define runtime policies for a new or changing agent system |
| `audit <path-or-system>` | `audit` | Review existing tool, approval, memory, telemetry, and eval controls |
| `permissions <agent-or-tools>` | `permissions` | Design allowlists, denylists, approval modes, and escalation rules |
| `memory <agent-or-system>` | `memory` | Define memory scope, retention, privacy, and invalidation policy |
| `evals <workflow>` | `evals` | Plan regression, adversarial, and runtime acceptance eval loops |
| `rollout <system>` | `rollout` | Define staged release, monitoring, rollback, and operator readiness controls |
| `incident <failure-mode>` | `incident` | Define containment and recovery controls for agent failures |
| Natural language about agent tools, permissions, memory, evals, or containment | (auto-detected) | Auto-detect the closest mode |
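
The dispatch table can be read as a small parsing rule. The sketch below is illustrative only: `dispatch`, `MODES`, and the `"auto-detect"` label are hypothetical names, not part of the skill, and the skill itself may resolve modes differently.

```python
# Hypothetical helper: the names here are assumptions, not the skill's API.
MODES = {"design", "audit", "permissions", "memory", "evals", "rollout", "incident"}


def dispatch(arguments: str) -> tuple[str, str]:
    """Map a raw $ARGUMENTS string to a (mode, target) pair."""
    text = arguments.strip()
    if not text:
        # Empty arguments: show the menu of governance modes.
        return ("menu", "")
    head, _, rest = text.partition(" ")
    if head.lower() in MODES:
        # Explicit mode keyword followed by a system, path, or workflow.
        return (head.lower(), rest.strip())
    # No mode keyword: treat the input as natural language and
    # leave the choice of mode to auto-detection.
    return ("auto-detect", text)
```

For example, `dispatch("audit ./agents/support-bot")` yields `("audit", "./agents/support-bot")`, while a free-form question falls through to the auto-detect branch.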

## Governance Surfaces

| Surface | Review Questions |
|---|---|
| Tools | Which tools can read, write, spend money, deploy, message users, or delete data? |
| Approvals | Which operations require explicit user approval or human review? |
| Memory | What can be stored, for how long, and at what scope? |
| State | What is durable, replayable, idempotent, and auditable? |
| Telemetry | Which traces, decisions, tool calls, and failures are observable? |
| Evals | Which scenarios prevent regression before rollout? |
| Containment | How does the system stop, roll back, quarantine, or degrade safely? |
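
As one way to make the Memory question concrete, the sketch below models scope, retention, and redaction as data. It assumes a simple in-process record type; `MemoryBoundary`, `MemoryRecord`, `admit`, and `is_expired` are hypothetical names, and the scope values are examples rather than a fixed taxonomy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MemoryBoundary:
    scope: str              # e.g. "session" or "user" (illustrative values)
    retention: timedelta    # how long a record may be kept
    redact_keys: frozenset  # fields that must never be stored in clear text


@dataclass
class MemoryRecord:
    scope: str
    stored_at: datetime
    data: dict


def admit(record: MemoryRecord, boundary: MemoryBoundary):
    """Return a redacted copy if the record is in scope, otherwise None."""
    if record.scope != boundary.scope:
        return None
    clean = {
        key: ("[REDACTED]" if key in boundary.redact_keys else value)
        for key, value in record.data.items()
    }
    return MemoryRecord(record.scope, record.stored_at, clean)


def is_expired(record: MemoryRecord, boundary: MemoryBoundary, now: datetime) -> bool:
    """True once the record has outlived the retention window."""
    return now - record.stored_at > boundary.retention
```

Invalidation is left out here; in practice it would likely be a separate trigger (for example, a user request or a schema change) rather than a time check.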

## Canonical Vocabulary

Use these canonical terms exactly when producing governance reports.

| Term | Meaning |
|---|---|
| **tool consequence** | The real-world effect a tool call can have: read, write, deploy, message, spend, delete, or expose |
| **approval gate** | Explicit human or policy checkpoint before a higher-risk action |
| **runtime guard** | Hook, wrapper, allowlist, denylist, test, or platform policy that enforces a governance rule |
| **memory boundary** | Scope, retention, redaction, and invalidation policy for stored agent context |
| **containment** | Stop, rollback, quarantine, or degrade action after unsafe or failed behavior |
| **shadow mode** | Runtime mode that records proposed actions without executing them |
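
To illustrate the **shadow mode** definition above, here is a minimal sketch of a runner that records proposed tool calls and only executes them when shadow mode is off. `ShadowRunner` is a hypothetical class for illustration; a real runtime would probably persist the proposals rather than hold them in memory.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ShadowRunner:
    """Records every proposed tool call; executes only when shadow is False."""

    shadow: bool = True
    proposed: list = field(default_factory=list)

    def call(self, tool: Callable[..., Any], **kwargs: Any) -> Any:
        # Always record the proposal so it can be reviewed later.
        self.proposed.append({"tool": tool.__name__, "args": kwargs})
        if self.shadow:
            return None  # recorded, not executed
        return tool(**kwargs)
```

Comparing `proposed` against what an operator would have approved is one way to gather evidence before turning shadow mode off.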

## Classification Gate

Classify the request before choosing a mode:

1. If it asks for app vulnerability scanning, route to `security-scanner`.
2. If it asks for code review, route to `honest-review`.
3. If it asks for prompt wording only, route to `prompt-engineer`.
4. If it asks how to implement an MCP server, route to `mcp-creator`.
5. Otherwise, choose the closest runtime governance mode from the dispatch table.
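
The gate above is an ordered first-match check. The sketch below approximates it with phrase matching, which is a deliberate simplification: the phrase lists and the `classify` function are assumptions made for illustration, and an agent would normally classify by intent rather than by substring.

```python
# Illustrative phrase lists; the skill does not define a keyword classifier.
ROUTES = [
    ("security-scanner", ("vulnerability scan",)),
    ("honest-review", ("code review",)),
    ("prompt-engineer", ("prompt wording",)),
    ("mcp-creator", ("implement an mcp server", "build an mcp server")),
]


def classify(request: str) -> str:
    """Return the skill that should handle the request, checked in gate order."""
    text = request.lower()
    for skill, phrases in ROUTES:
        if any(phrase in text for phrase in phrases):
            return skill
    # Nothing matched: stay in runtime governance and pick a mode there.
    return "agent-runtime-governance"
```

The order of `ROUTES` mirrors the numbered gate, so a request that matches an earlier rule never reaches a later one.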

## Workflow

1. Define the agent’s job, users, data sensitivity, and external effects.
2. Inventory tools by capability: read-only, write, destructive, financial,
   deploy, messaging, credential access, and network egress.
3. Map approval gates to consequence, reversibility, and confidence.
4. Define memory scope, retention, redaction, and invalidation rules.
5. Require telemetry for tool calls, decisions, approval outcomes, and failures.
6. Build evals around unsafe tool use, stale memory, missing approval, and
   failure containment.
7. Define rollout gates, rollback criteria, and operator evidence for changes
   that affect live users, accounts, credentials, or external systems.
8. Return a governance matrix with owners and enforcement points.
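
Steps 2 and 3 can be sketched as a tool inventory plus a function that maps each tool to an approval gate. Everything named here is an assumption for illustration: the `ALWAYS_GATED` set, the `0.9` threshold, and the gate labels `auto`, `policy-check`, and `human-approval` are example policy choices, not values defined by the skill.

```python
from dataclasses import dataclass
from enum import Enum


class Capability(Enum):
    READ_ONLY = "read-only"
    WRITE = "write"
    DESTRUCTIVE = "destructive"
    FINANCIAL = "financial"
    DEPLOY = "deploy"
    MESSAGING = "messaging"
    CREDENTIAL_ACCESS = "credential access"
    NETWORK_EGRESS = "network egress"


# Assumption: these capabilities always require a human, whatever the confidence.
ALWAYS_GATED = {
    Capability.DESTRUCTIVE,
    Capability.FINANCIAL,
    Capability.DEPLOY,
    Capability.CREDENTIAL_ACCESS,
}


@dataclass(frozen=True)
class Tool:
    name: str
    capability: Capability
    reversible: bool


def approval_gate(tool: Tool, confidence: float, threshold: float = 0.9) -> str:
    """Map consequence, reversibility, and confidence to a gate label."""
    if tool.capability in ALWAYS_GATED or not tool.reversible:
        return "human-approval"
    if tool.capability is Capability.READ_ONLY:
        return "auto"
    # Reversible, lower-consequence actions: confidence decides.
    return "auto" if confidence >= threshold else "policy-check"
```

Note that an irreversible messaging tool lands on `human-approval` even at high confidence, because reversibility is checked before confidence.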

## Scaling Strategy

| Scope | Strategy |
|---|---|
| Single agent or workflow | Produce one control matrix and one eval/monitoring set |
| Multiple agents sharing tools | Group by tool consequence and shared approval gates |
| Platform-wide governance | Define baseline policy first, then exceptions by agent class |
| Live production rollout | Add staged rollout, rollback, monitoring, and owner review gates |
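
For the platform-wide row, "baseline first, then exceptions by agent class" can be expressed as a dictionary overlay. The policy keys, gate labels, and the `sandbox-agent` class below are hypothetical examples.

```python
# Baseline applies to every agent; values are illustrative gate labels.
BASELINE = {
    "read": "auto",
    "write": "policy-check",
    "spend": "human-approval",
    "delete": "human-approval",
}

# Exceptions are keyed by agent class and override only what they name.
EXCEPTIONS = {
    "sandbox-agent": {"write": "auto"},  # hypothetical agent class
}


def effective_policy(agent_class: str) -> dict:
    """Overlay the class-specific exceptions on a copy of the baseline."""
    policy = dict(BASELINE)
    policy.update(EXCEPTIONS.get(agent_class, {}))
    return policy
```

Keeping exceptions as small overlays makes each deviation from the baseline explicit and reviewable.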

## Progressive Disclosure

- Start with this `SKILL.md` for routing and control surfaces.
- Read `references/control-matrix.md` for permissions, memory, telemetry, and eval controls.
- Read `references/rollout-governance.md` only when release, rollback, monitoring, or production readiness is in scope.
- Do not load security, prompt, or MCP implementation references unless routing redirects to those skills.

## Reference File Index

| File | Read When |
|---|---|
| `references/control-matrix.md` | Designing or auditing runtime control surfaces |
| `references/rollout-governance.md` | Planning staged release, rollback, monitoring, and operator readiness |

## Output Shape

```markdown
## Agent Governance Report

- System:
- Mode:
- Risk tier:

### Control Matrix
| Surface | Current | Required | Enforcement | Evidence |
|---|---|---|---|---|

### Required Changes
- ...

### Evals And Monitoring
- ...

### Rollout And Containment
- ...
```
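
If the control matrix is assembled programmatically, a small renderer can keep it aligned with the template above. This is a sketch under that assumption: `ControlRow`, `render_matrix`, and `missing_enforcement` are hypothetical helpers, not part of the skill.

```python
from dataclasses import dataclass


@dataclass
class ControlRow:
    surface: str
    current: str
    required: str
    enforcement: str
    evidence: str


def render_matrix(rows) -> str:
    """Render rows as the Control Matrix table from the report template."""
    header = "| Surface | Current | Required | Enforcement | Evidence |\n|---|---|---|---|---|"
    lines = [
        f"| {r.surface} | {r.current} | {r.required} | {r.enforcement} | {r.evidence} |"
        for r in rows
    ]
    return "\n".join([header, *lines])


def missing_enforcement(rows) -> list:
    """Surfaces whose row does not name an enforcement point."""
    return [r.surface for r in rows if not r.enforcement.strip()]
```

A non-empty result from `missing_enforcement` is a useful signal that a row states a policy without naming what enforces it.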

## Critical Rules

1. Classify tools by consequence before recommending autonomy.
2. Require explicit approval for irreversible, costly, public, credential, or destructive actions.
3. Keep memory scope narrow and document retention, redaction, and invalidation.
4. Require telemetry for tool calls, approvals, denials, failures, and containment actions.
5. Add evals for unsafe tool use, missing approval, stale memory, and rollback behavior.
6. Separate policy from enforcement; name the hook, wrapper, test, or runtime gate that enforces each rule.
7. Do not replace `security-scanner`, `honest-review`, `prompt-engineer`, or `mcp-creator`; route to them when the request is outside runtime governance.
8. Do not mark a governance change ready without rollout, rollback, and monitoring criteria.
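
Rules 2, 4, and 6 meet at the enforcement point. The sketch below shows one possible runtime guard: a decorator that requires approval for higher-consequence tools and logs every outcome. The logger name, the `GATED` set, the `approve` callback signature, and `ApprovalDenied` are all assumptions for illustration.

```python
import json
import logging
from typing import Callable

log = logging.getLogger("agent.governance")  # logger name is an assumption

# Assumption: consequences that always pass through the approval gate.
GATED = {"delete", "spend", "deploy", "message", "expose"}


class ApprovalDenied(Exception):
    """Raised when the approval gate rejects a tool call."""


def guarded(consequence: str, approve: Callable[[str, dict], bool]):
    """Gate higher-consequence tools behind `approve` and log each outcome."""

    def wrap(tool):
        def run(**kwargs):
            event = {"tool": tool.__name__, "consequence": consequence}
            if consequence in GATED and not approve(tool.__name__, kwargs):
                log.warning(json.dumps({**event, "outcome": "denied"}))
                raise ApprovalDenied(tool.__name__)
            try:
                result = tool(**kwargs)
            except Exception:
                log.error(json.dumps({**event, "outcome": "failed"}))
                raise
            log.info(json.dumps({**event, "outcome": "executed"}))
            return result

        return run

    return wrap
```

Usage would look like `@guarded("delete", approve=ask_operator)` above a tool function, where `ask_operator` is whatever approval mechanism the runtime provides. Tool arguments are deliberately kept out of the log line here, since they may contain data that the memory and redaction policy excludes.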

## Validation Contract

Run from this skill directory before declaring changes complete:

```bash
python scripts/check.py
```

Completion criteria:

1. `scripts/check.py` exits 0.
2. No portable-CLI violations remain under this skill directory.
3. Smoke review covers explicit, implicit, rollout, and negative-control prompts.
