devops-engineer
Design, optimize, and debug CI/CD pipelines. GitHub Actions and GitLab CI patterns. Use for pipeline work. NOT for infrastructure provisioning (infrastructure-coder) or app code.
Quick Start
Install:
npx skills add github:wyattowalsh/agents --skill devops-engineer -y -g --agent antigravity --agent claude-code --agent codex --agent crush --agent cursor --agent gemini-cli --agent github-copilot --agent grok --agent opencode
Use:
/devops-engineer <mode> [target]
Works with Claude Code, Gemini CLI, OpenCode, and other agentskills.io-compatible agents.
What It Does
CI/CD pipeline design, optimization, and deployment strategy. Six modes: generate workflows, generate GitHub Action steps or jobs, optimize build times, design deployment strategies, review existing pipelines, and debug CI failures.
| $ARGUMENTS | Mode |
|---|---|
| `pipeline <requirements>` | Generate: new CI/CD workflow from requirements |
| `action <description>` | Action: GitHub Action step/job generation |
| `optimize <workflow>` | Optimize: pipeline build time optimization |
| `deploy <strategy>` | Deploy: deployment strategy design |
| `review <workflow>` | Review: audit existing pipeline |
| `debug <logs>` | Debug: analyze CI failure logs |
| Natural language about CI/CD | Auto-detect appropriate mode |
| Empty | Show mode menu with examples |
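The dispatch table above can be read as a lookup on the first word of the argument string. The following is a minimal sketch of that reading, not the skill's actual dispatcher: the `dispatch` function, the `MODES` mapping, and the `"Menu"` and `"Auto-detect"` labels are hypothetical names introduced for illustration.

```python
# Hypothetical sketch: maps the first word of $ARGUMENTS to a mode name.
MODES = {
    "pipeline": "Generate",
    "action": "Action",
    "optimize": "Optimize",
    "deploy": "Deploy",
    "review": "Review",
    "debug": "Debug",
}


def dispatch(arguments: str) -> tuple[str, str]:
    """Return (mode, target) for a raw argument string."""
    text = arguments.strip()
    if not text:
        # Empty input: the skill shows its mode menu.
        return ("Menu", "")
    keyword, _, rest = text.partition(" ")
    mode = MODES.get(keyword.lower())
    if mode is None:
        # Natural language: the skill auto-detects a mode, which is not modelled here.
        return ("Auto-detect", text)
    return (mode, rest.strip())
```

For example, `dispatch("optimize .github/workflows/ci.yml")` selects the Optimize mode with the workflow path as its target.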
Critical Rules
- Never generate workflows with unpinned third-party actions; always use full SHA pins (`uses: actions/checkout@<sha>`)
- Never use `pull_request_target` with `actions/checkout` of the PR head: script injection risk
- Always set an explicit `permissions` block; never rely on default (overly broad) permissions
- Never hardcode secrets in workflow files; use `${{ secrets.NAME }}` or environment variables
- Always include a `concurrency` group for deployment workflows to prevent parallel deploys
- Always add `timeout-minutes` to every job to prevent runaway jobs from consuming quota
- Never generate `runs-on: self-hosted` without an explicit user request: security implications
- Always validate generated YAML by running `workflow-analyzer.py` before presenting
- Deployment workflows must include health checks and rollback triggers
- Debug mode must truncate/sample large logs (>500 lines) before analysis; do not load entire CI logs into context
- Review mode is read-only until the user approves fixes (approval gate)
- Load ONE reference file at a time; do not preload all references into context
- Every optimization recommendation must include estimated time savings
- Generated workflows must include inline comments explaining non-obvious configuration choices
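Several of the rules above can be approximated with plain text checks. The sketch below is an illustration only and is not the bundled `workflow-analyzer.py`, whose behavior is not shown in this document. The `check_rules` function and its messages are hypothetical, and the checks are deliberately coarse: they look for patterns in the text rather than parsing YAML.

```python
import re

# Matches "uses: owner/repo@ref". Local "./path" actions have no "@ref" and are skipped.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s@#]+)@(\S+)", re.MULTILINE)
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def check_rules(workflow_text: str) -> list[str]:
    """Return human-readable findings for a few of the critical rules."""
    findings = []
    for match in USES_RE.finditer(workflow_text):
        action, ref = match.groups()
        if not FULL_SHA_RE.match(ref):
            findings.append(f"unpinned action: {action}@{ref}")
    if not re.search(r"^\s*permissions:", workflow_text, re.MULTILINE):
        findings.append("missing explicit permissions block")
    if not re.search(r"^\s*timeout-minutes:", workflow_text, re.MULTILINE):
        findings.append("no timeout-minutes found")
    if re.search(r"^\s*runs-on:\s*self-hosted\b", workflow_text, re.MULTILINE):
        findings.append("self-hosted runner requested")
    return findings
```

A text scan like this cannot tell whether every job has its own `timeout-minutes`; a real check would need to parse the workflow and inspect each job.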
Canonical Vocabulary
Use these terms exactly throughout all modes:
| Term | Definition |
|---|---|
| workflow | A CI/CD pipeline definition file (.github/workflows/*.yml, .gitlab-ci.yml) |
| job | A named unit of work within a workflow containing one or more steps |
| step | A single action within a job (run command, uses action) |
| stage | A logical grouping of jobs (build, test, deploy) |
| artifact | Build output passed between jobs or stages |
| cache | Dependency/build cache persisted across runs to reduce build time |
| matrix | Parameterized job expansion across multiple configurations |
| concurrency group | Mutual exclusion mechanism preventing parallel runs |
| environment | Deployment target with protection rules (staging, production) |
| promotion | Moving artifacts through environments (dev -> staging -> prod) |
| rollback | Reverting a deployment to a previous known-good state |
| canary | Incremental traffic shift to new version (1% -> 5% -> 25% -> 100%) |
| blue/green | Two identical environments with instant traffic switch |
| rolling | Gradual instance-by-instance replacement |
| gate | Manual or automated approval checkpoint before deployment proceeds |
| runner | Execution environment for CI/CD jobs (GitHub-hosted, self-hosted) |
| reusable workflow | Callable workflow template invoked from other workflows |
| composite action | Multi-step action packaged as a single reusable unit |
Mode 1: Generate (pipeline)
Design and generate CI/CD workflow files from requirements.
- Gather requirements: language, framework, test suite, deployment targets, branch strategy
- Select platform: GitHub Actions (default), GitLab CI, or both
- Load patterns: read `references/github-actions-patterns.md` or `references/gitlab-ci-patterns.md`
- Design structure: jobs, stages, dependencies, triggers, caching strategy
- Generate workflow: complete YAML file with inline comments explaining non-obvious choices
- Validate: run `uv run python skills/devops-engineer/scripts/workflow-analyzer.py <file>` on generated output
Output
Complete workflow YAML file written to the appropriate location.
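To make the shape of a generated workflow concrete, here is a minimal sketch that renders a GitHub Actions test workflow as text while following the critical rules (SHA-pinned checkout, explicit `permissions`, `timeout-minutes`, inline comments). The `render_workflow` function, the `main` branch name, the 15-minute timeout, and the `ci-` concurrency prefix are assumptions made for illustration; the skill's real output depends on the gathered requirements.

```python
import re


def render_workflow(name: str, test_command: str, checkout_sha: str) -> str:
    """Render a small CI workflow; assumes test_command is a simple one-line command."""
    if not re.fullmatch(r"[0-9a-f]{40}", checkout_sha):
        raise ValueError("checkout_sha must be a full 40-character commit SHA")
    lines = [
        f"name: {name}",
        "on:",
        "  pull_request:",
        "  push:",
        "    branches: [main]",
        "# Least privilege: the job only needs to read the repository.",
        "permissions:",
        "  contents: read",
        "# Cancel superseded runs on the same ref.",
        "concurrency:",
        "  group: ci-${{ github.ref }}",
        "  cancel-in-progress: true",
        "jobs:",
        "  test:",
        "    runs-on: ubuntu-latest",
        "    timeout-minutes: 15  # stop runaway jobs from consuming quota",
        "    steps:",
        f"      - uses: actions/checkout@{checkout_sha}  # pinned to a full commit SHA",
        f"      - run: {test_command}",
        "",
    ]
    return "\n".join(lines)
```

Building the file from a list of lines avoids having to escape the `${{ ... }}` expression inside a format string.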
Mode 2: Action (action)
Generate individual GitHub Action steps or jobs.
- Parse description: what the action should accomplish
- Load patterns: read `references/github-actions-patterns.md`
- Generate: step or job YAML with correct `uses`, `with`, `env` configuration
- Context check: if an existing workflow is referenced, read it and integrate the new action
Output: YAML snippet ready for insertion into a workflow file.
Mode 3: Optimize (optimize)
Analyze and optimize pipeline build times.
Analysis
- Analyze: run `uv run python skills/devops-engineer/scripts/workflow-analyzer.py <workflow>`
- Estimate costs: run `uv run python skills/devops-engineer/scripts/pipeline-cost-estimator.py <workflow>`
- Load techniques: read `references/pipeline-optimization.md`
Optimization Opportunities
- Identify opportunities:
  - Missing caches (dependency, build artifact, Docker layer)
  - Sequential jobs that could run in parallel
  - Missing matrix strategy for multi-version testing
  - Unnecessary full checkouts (use sparse-checkout or shallow clone)
  - Redundant steps across jobs
  - Missing path filters for selective runs
  - Oversized runner for lightweight tasks
- Present plan: ranked optimization recommendations with estimated time savings
- Implement: apply approved optimizations to the workflow file
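The "present plan" step calls for a ranked list in which every recommendation carries an estimated time saving. One way to sketch that ranking is shown below. The `Recommendation` class, the three-level effort scale, and the tie-breaking rule (lower effort first) are hypothetical choices, and any numbers passed in are the caller's own estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    title: str
    minutes_saved: float  # estimated savings per run
    effort: str  # "low", "medium", or "high" (hypothetical scale)


EFFORT_ORDER = {"low": 0, "medium": 1, "high": 2}


def rank(recommendations):
    """Largest estimated saving first; ties go to the lower-effort change."""
    return sorted(
        recommendations,
        key=lambda r: (-r.minutes_saved, EFFORT_ORDER[r.effort]),
    )


def format_plan(recommendations) -> str:
    """Render a numbered plan, one recommendation per line."""
    return "\n".join(
        f"{i}. {r.title} (~{r.minutes_saved:g} min/run, {r.effort} effort)"
        for i, r in enumerate(rank(recommendations), start=1)
    )
```

Keeping the estimate as a required field means a recommendation cannot be constructed without one, which mirrors the rule that every optimization must state its expected savings.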
Mode 4: Deploy (deploy)
Design deployment strategies with rollback plans.
- Assess requirements: uptime SLA, rollback speed, traffic management capability
- Load strategies: read `references/deployment-strategies.md`
- Recommend strategy: blue/green, canary, or rolling based on requirements
| Factor | Blue/Green | Canary | Rolling |
|---|---|---|---|
| Rollback speed | Instant | Fast | Slow |
| Resource cost | 2x | 1.1-1.5x | 1x |
| Risk exposure | None (pre-switch) | Gradual | Gradual |
| Complexity | Medium | High | Low |
| Best for | Critical services | High-traffic APIs | Cost-sensitive apps |
- Generate: deployment workflow with health checks, gates, and rollback triggers
- Document: runbook with rollback procedure and escalation path
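The interaction between traffic steps, health checks, and the rollback trigger in a canary rollout can be sketched as a small control loop. This is an illustration under stated assumptions, not a deployment tool: `set_traffic` and `is_healthy` are hypothetical callbacks the caller would wire to a real load balancer and health endpoint, and the step percentages come from the canary definition in the vocabulary table.

```python
from typing import Callable, Iterable

# Traffic percentages from the canary definition: 1% -> 5% -> 25% -> 100%.
CANARY_STEPS = (1, 5, 25, 100)


def run_canary(
    set_traffic: Callable[[int], None],
    is_healthy: Callable[[], bool],
    steps: Iterable[int] = CANARY_STEPS,
) -> bool:
    """Shift traffic step by step; roll back to 0% on the first failed health check."""
    for percent in steps:
        set_traffic(percent)
        if not is_healthy():
            set_traffic(0)  # rollback trigger: send all traffic back to the old version
            return False
    return True
```

A real rollout would usually wait and observe metrics between steps, and might place a manual gate before the final step; both are omitted here to keep the control flow visible.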
Mode 5: Review (review)
Audit an existing CI/CD pipeline for issues and improvements.
Audit Process
- Read workflow: parse the target workflow file(s)
- Analyze: run `uv run python skills/devops-engineer/scripts/workflow-analyzer.py <workflow>`
- Load checklists: read `references/pipeline-review-checklist.md`
Evaluation Dimensions
- Evaluate dimensions:
  - Security: secrets management, permissions scope, unpinned actions, script injection
  - Reliability: retry logic, timeout configuration, concurrency handling
  - Performance: caching, parallelization, selective triggers
  - Maintainability: DRY (reusable workflows/composite actions), readability, documentation
  - Cost: runner selection, unnecessary matrix combinations, artifact retention
- Present findings: categorized by severity (critical/warning/info) with fix recommendations
- Implement: apply approved fixes
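Findings from the review are presented by severity, with each finding tied to one of the five dimensions. A minimal sketch of that grouping follows. The `group_findings` function and the `(severity, dimension, message)` tuple shape are hypothetical; the severity and dimension names are taken from the steps above.

```python
SEVERITY_ORDER = ("critical", "warning", "info")
DIMENSIONS = {"security", "reliability", "performance", "maintainability", "cost"}


def group_findings(findings):
    """Group (severity, dimension, message) tuples by severity, most severe first."""
    grouped = {severity: [] for severity in SEVERITY_ORDER}
    for severity, dimension, message in findings:
        if severity not in grouped:
            raise ValueError(f"unknown severity: {severity}")
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        grouped[severity].append(f"[{dimension}] {message}")
    return grouped
```

Because the dictionary is built from `SEVERITY_ORDER`, iterating over the result yields critical findings before warnings and informational notes.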
Mode 6: Debug (debug)
Analyze CI failure logs to identify root causes and fixes.
- Ingest logs: read the provided log file or inline content. For large logs (>500 lines): truncate to last 200 lines + first 50 lines, then sample middle sections around error patterns
- Parse errors: run `uv run python skills/devops-engineer/scripts/log-parser.py <logfile>`
- Load triage protocol: read `references/ci-failure-triage.md`
- Classify failures by category:
| Category | Examples | Common Fixes |
|---|---|---|
| dependency | Version conflict, missing package, registry timeout | Pin versions, add retry, use cache |
| build | Compilation error, type error, out of memory | Fix code, increase runner memory |
| test | Assertion failure, flaky test, timeout | Fix test, add retry for flaky, increase timeout |
| lint | Format violation, rule violation | Run formatter, update config |
| deploy | Permission denied, health check fail, resource limit | Fix permissions, check config, scale resources |
- Trace root cause: follow the error chain to the originating failure
- Recommend fix: specific actionable steps with code/config changes
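The log sampling rule and the category table above can be sketched in a few lines. This is not the bundled `log-parser.py`: the function names, the error regex, the two-line context window, and the per-category keyword patterns are all assumptions chosen for illustration. The 500/50/200 line thresholds are the ones stated in the ingest step.

```python
import re

# Hypothetical error markers used to pick which middle sections to keep.
ERROR_RE = re.compile(r"error|failed|exception|fatal", re.IGNORECASE)

# Hypothetical keyword patterns loosely based on the examples in the category table.
CATEGORY_PATTERNS = [
    ("dependency", r"version conflict|missing package|registry timeout"),
    ("build", r"compilation error|type error|out of memory"),
    ("test", r"assertion|flaky|timed out"),
    ("lint", r"format violation|rule violation"),
    ("deploy", r"permission denied|health check|resource limit"),
]


def sample_log(lines, limit=500, head=50, tail=200, context=2):
    """Keep the first `head` and last `tail` lines plus windows around error lines."""
    if len(lines) <= limit:
        return list(lines)
    total = len(lines)
    keep = set(range(head)) | set(range(total - tail, total))
    for i in range(head, total - tail):
        if ERROR_RE.search(lines[i]):
            keep.update(range(max(0, i - context), min(total, i + context + 1)))
    return [lines[i] for i in sorted(keep)]


def classify(line: str) -> str:
    """Return the first category whose pattern matches, else "unknown"."""
    for category, pattern in CATEGORY_PATTERNS:
        if re.search(pattern, line, re.IGNORECASE):
            return category
    return "unknown"
```

Sampling by index keeps the surviving lines in their original order, so the error chain can still be followed from the first failure to the last.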
Reference Files
Load ONE reference at a time. Do not preload all references into context.
| File | Content | Read When |
|---|---|---|
| `references/github-actions-patterns.md` | Workflow patterns, reusable workflows, composite actions, security hardening | Generate, Action, Review modes |
| `references/gitlab-ci-patterns.md` | GitLab CI pipeline patterns, includes, rules, environments | Generate mode (GitLab) |
| `references/deployment-strategies.md` | Blue/green, canary, rolling strategies with comparison and rollback | Deploy mode |
| `references/pipeline-optimization.md` | Caching, parallelization, selective runs, matrix optimization | Optimize mode |
| `references/pipeline-review-checklist.md` | Security, reliability, performance, maintainability, cost checklists | Review mode |
| `references/ci-failure-triage.md` | Error category taxonomy, root cause patterns, fix recipes | Debug mode |
| `references/artifact-management.md` | Artifact passing, retention, environment promotion patterns | Generate, Deploy modes |
| Script | When to Run |
|---|---|
| `scripts/workflow-analyzer.py` | Analyze workflow structure, detect issues, find optimization opportunities |
| `scripts/pipeline-cost-estimator.py` | Estimate CI minutes and identify cost savings |
| `scripts/log-parser.py` | Extract actionable errors from CI failure logs |
| Template | When to Render |
|---|---|
| `templates/dashboard.html` | After analysis: inject pipeline health data into the dashboard |
| Field | Value |
|---|---|
| Source Type | repo-owned |
| Display Source | github:wyattowalsh/agents |
| Source Kind | repo |
| Installability | portable command |
| Review State | reviewed |
| Target Agents | antigravity, claude-code, codex, crush, cursor, gemini-cli, github-copilot, grok, opencode |
| Field | Value |
|---|---|
| Name | devops-engineer |
| License | MIT |
| Version | 1.0.0 |
| Author | wyattowalsh |
| Field | Value |
|---|---|
| Model | opus |
| Argument Hint | [mode] [target] |
View Full SKILL.md
---
name: devops-engineer
description: >-
  Design, optimize, and debug CI/CD pipelines. GitHub Actions and GitLab CI
  patterns. Use for pipeline work. NOT for infrastructure provisioning
  (infrastructure-coder) or app code.
argument-hint: "<mode> [target]"
model: opus
license: MIT
metadata:
  author: wyattowalsh
  version: "1.0.0"
---
# DevOps Engineer
CI/CD pipeline design, optimization, and deployment strategy. Six modes: generate workflows, generate GitHub Action steps or jobs, optimize build times, design deployment strategies, review existing pipelines, and debug CI failures.
**Scope:** CI/CD pipelines and deployment automation only. NOT for infrastructure provisioning (infrastructure-coder), application code, monitoring setup, or database migrations (database-architect).
## Canonical Vocabulary
Use these terms exactly throughout all modes:
| Term | Definition |
|------|------------|
| **workflow** | A CI/CD pipeline definition file (.github/workflows/*.yml, .gitlab-ci.yml) |
| **job** | A named unit of work within a workflow containing one or more steps |
| **step** | A single action within a job (run command, uses action) |
| **stage** | A logical grouping of jobs (build, test, deploy) |
| **artifact** | Build output passed between jobs or stages |
| **cache** | Dependency/build cache persisted across runs to reduce build time |
| **matrix** | Parameterized job expansion across multiple configurations |
| **concurrency group** | Mutual exclusion mechanism preventing parallel runs |
| **environment** | Deployment target with protection rules (staging, production) |
| **promotion** | Moving artifacts through environments (dev -> staging -> prod) |
| **rollback** | Reverting a deployment to a previous known-good state |
| **canary** | Incremental traffic shift to new version (1% -> 5% -> 25% -> 100%) |
| **blue/green** | Two identical environments with instant traffic switch |
| **rolling** | Gradual instance-by-instance replacement |
| **gate** | Manual or automated approval checkpoint before deployment proceeds |
| **runner** | Execution environment for CI/CD jobs (GitHub-hosted, self-hosted) |
| **reusable workflow** | Callable workflow template invoked from other workflows |
| **composite action** | Multi-step action packaged as a single reusable unit |
## Dispatch
| $ARGUMENTS | Mode |
|------------|------|
| `pipeline <requirements>` | Generate: new CI/CD workflow from requirements |
| `action <description>` | Action: GitHub Action step/job generation |
| `optimize <workflow>` | Optimize: pipeline build time optimization |
| `deploy <strategy>` | Deploy: deployment strategy design |
| `review <workflow>` | Review: audit existing pipeline |
| `debug <logs>` | Debug: analyze CI failure logs |
| Natural language about CI/CD | Auto-detect appropriate mode |
| Empty | Show mode menu with examples |
## Mode 1: Generate (`pipeline`)
Design and generate CI/CD workflow files from requirements.
### Steps
1. **Gather requirements** -- language, framework, test suite, deployment targets, branch strategy
2. **Select platform** -- GitHub Actions (default), GitLab CI, or both
3. **Load patterns** -- read `references/github-actions-patterns.md` or `references/gitlab-ci-patterns.md`
4. **Design structure** -- jobs, stages, dependencies, triggers, caching strategy
5. **Generate workflow** -- complete YAML file with inline comments explaining non-obvious choices
6. **Validate** -- run `uv run python skills/devops-engineer/scripts/workflow-analyzer.py <file>` on generated output
### Output
Complete workflow YAML file written to the appropriate location.
## Mode 2: Action (`action`)
Generate individual GitHub Action steps or jobs.
1. **Parse description** -- what the action should accomplish
2. **Load patterns** -- read `references/github-actions-patterns.md`
3. **Generate** -- step or job YAML with correct `uses`, `with`, `env` configuration
4. **Context check** -- if an existing workflow is referenced, read it and integrate the new action
Output: YAML snippet ready for insertion into a workflow file.
## Mode 3: Optimize (`optimize`)
Analyze and optimize pipeline build times.
### Analysis
1. **Analyze** -- run `uv run python skills/devops-engineer/scripts/workflow-analyzer.py <workflow>`
2. **Estimate costs** -- run `uv run python skills/devops-engineer/scripts/pipeline-cost-estimator.py <workflow>`
3. **Load techniques** -- read `references/pipeline-optimization.md`
### Optimization Opportunities
4. **Identify opportunities**:
   - Missing caches (dependency, build artifact, Docker layer)
   - Sequential jobs that could run in parallel
   - Missing matrix strategy for multi-version testing
   - Unnecessary full checkouts (use sparse-checkout or shallow clone)
   - Redundant steps across jobs
   - Missing path filters for selective runs
   - Oversized runner for lightweight tasks
5. **Present plan** -- ranked optimization recommendations with estimated time savings
6. **Implement** -- apply approved optimizations to the workflow file
## Mode 4: Deploy (`deploy`)
Design deployment strategies with rollback plans.
1. **Assess requirements** -- uptime SLA, rollback speed, traffic management capability
2. **Load strategies** -- read `references/deployment-strategies.md`
3. **Recommend strategy** -- blue/green, canary, or rolling based on requirements

| Factor | Blue/Green | Canary | Rolling |
|--------|-----------|--------|---------|
| Rollback speed | Instant | Fast | Slow |
| Resource cost | 2x | 1.1-1.5x | 1x |
| Risk exposure | None (pre-switch) | Gradual | Gradual |
| Complexity | Medium | High | Low |
| Best for | Critical services | High-traffic APIs | Cost-sensitive apps |

4. **Generate** -- deployment workflow with health checks, gates, and rollback triggers
5. **Document** -- runbook with rollback procedure and escalation path
## Mode 5: Review (`review`)
Audit an existing CI/CD pipeline for issues and improvements.
### Audit Process
1. **Read workflow** -- parse the target workflow file(s)
2. **Analyze** -- run `uv run python skills/devops-engineer/scripts/workflow-analyzer.py <workflow>`
3. **Load checklists** -- read `references/pipeline-review-checklist.md`
### Evaluation Dimensions
4. **Evaluate dimensions**:
   - **Security**: secrets management, permissions scope, unpinned actions, script injection
   - **Reliability**: retry logic, timeout configuration, concurrency handling
   - **Performance**: caching, parallelization, selective triggers
   - **Maintainability**: DRY (reusable workflows/composite actions), readability, documentation
   - **Cost**: runner selection, unnecessary matrix combinations, artifact retention
5. **Present findings** -- categorized by severity (critical/warning/info) with fix recommendations
6. **Implement** -- apply approved fixes
## Mode 6: Debug (`debug`)
Analyze CI failure logs to identify root causes and fixes.
1. **Ingest logs** -- read provided log file or inline content. For large logs (>500 lines): truncate to last 200 lines + first 50 lines, then sample middle sections around error patterns
2. **Parse errors** -- run `uv run python skills/devops-engineer/scripts/log-parser.py <logfile>`
3. **Load triage protocol** -- read `references/ci-failure-triage.md`
4. **Classify failures** by category:

| Category | Examples | Common Fixes |
|----------|----------|-------------|
| dependency | Version conflict, missing package, registry timeout | Pin versions, add retry, use cache |
| build | Compilation error, type error, out of memory | Fix code, increase runner memory |
| test | Assertion failure, flaky test, timeout | Fix test, add retry for flaky, increase timeout |
| lint | Format violation, rule violation | Run formatter, update config |
| deploy | Permission denied, health check fail, resource limit | Fix permissions, check config, scale resources |

5. **Trace root cause** -- follow error chain to the originating failure
6. **Recommend fix** -- specific actionable steps with code/config changes
## Reference Files
Load ONE reference at a time. Do not preload all references into context.
| File | Content | Read When |
|------|---------|-----------|
| `references/github-actions-patterns.md` | Workflow patterns, reusable workflows, composite actions, security hardening | Generate, Action, Review modes |
| `references/gitlab-ci-patterns.md` | GitLab CI pipeline patterns, includes, rules, environments | Generate mode (GitLab) |
| `references/deployment-strategies.md` | Blue/green, canary, rolling strategies with comparison and rollback | Deploy mode |
| `references/pipeline-optimization.md` | Caching, parallelization, selective runs, matrix optimization | Optimize mode |
| `references/pipeline-review-checklist.md` | Security, reliability, performance, maintainability, cost checklists | Review mode |
| `references/ci-failure-triage.md` | Error category taxonomy, root cause patterns, fix recipes | Debug mode |
| `references/artifact-management.md` | Artifact passing, retention, environment promotion patterns | Generate, Deploy modes |

| Script | When to Run |
|--------|-------------|
| `scripts/workflow-analyzer.py` | Analyze workflow structure, detect issues, find optimization opportunities |
| `scripts/pipeline-cost-estimator.py` | Estimate CI minutes and identify cost savings |
| `scripts/log-parser.py` | Extract actionable errors from CI failure logs |

| Template | When to Render |
|----------|----------------|
| `templates/dashboard.html` | After analysis -- inject pipeline health data into the dashboard |
## Critical Rules
1. Never generate workflows with unpinned third-party actions -- always use full SHA pins (`uses: actions/checkout@<sha>`)
2. Never use `pull_request_target` with `actions/checkout` of PR head -- script injection risk
3. Always set explicit `permissions` block -- never rely on default (overly broad) permissions
4. Never hardcode secrets in workflow files -- use `${{ secrets.NAME }}` or environment variables
5. Always include a `concurrency` group for deployment workflows to prevent parallel deploys
6. Always add `timeout-minutes` to every job -- prevent runaway jobs consuming quota
7. Never generate `runs-on: self-hosted` without explicit user request -- security implications
8. Always validate generated YAML by running `workflow-analyzer.py` before presenting
9. Deployment workflows must include health checks and rollback triggers
10. Debug mode must truncate/sample large logs (>500 lines) before analysis -- do not load entire CI logs into context
11. Review mode is read-only until user approves fixes (approval gate)
12. Load ONE reference file at a time -- do not preload all references into context
13. Every optimization recommendation must include estimated time savings
14. Generated workflows must include inline comments explaining non-obvious configuration choices