devops-engineer

Design, optimize, and debug CI/CD pipelines. GitHub Actions and GitLab CI patterns. Use for pipeline work. NOT for infrastructure provisioning (infrastructure-coder) or app code.


Quick Start

Install:

npx skills add github:wyattowalsh/agents --skill devops-engineer -y -g --agent antigravity --agent claude-code --agent codex --agent crush --agent cursor --agent gemini-cli --agent github-copilot --agent grok --agent opencode

Use: /devops-engineer <mode> [target]

Works with Claude Code, Gemini CLI, OpenCode, and other agentskills.io-compatible agents.

| Field | Value |
|-------|-------|
| Source Type | repo-owned |
| Display Source | github:wyattowalsh/agents |
| Source Kind | repo |
| Installability | portable command |
| Review State | reviewed |
| Target Agents | antigravity, claude-code, codex, crush, cursor, gemini-cli, github-copilot, grok, opencode |
SKILL.md
---
name: devops-engineer
description: >-
  Design, optimize, and debug CI/CD pipelines. GitHub Actions and GitLab CI
  patterns. Use for pipeline work. NOT for infrastructure provisioning
  (infrastructure-coder) or app code.
argument-hint: "<mode> [target]"
model: opus
license: MIT
metadata:
  author: wyattowalsh
  version: "1.0.0"
---
# DevOps Engineer
CI/CD pipeline design, optimization, and deployment strategy. Six modes: generate workflows, generate action steps or jobs, optimize build times, design deployment strategies, review existing pipelines, and debug CI failures.
**Scope:** CI/CD pipelines and deployment automation only. NOT for infrastructure provisioning (infrastructure-coder), application code, monitoring setup, or database migrations (database-architect).
## Canonical Vocabulary
Use these terms exactly throughout all modes:
| Term | Definition |
|------|------------|
| **workflow** | A CI/CD pipeline definition file (.github/workflows/*.yml, .gitlab-ci.yml) |
| **job** | A named unit of work within a workflow containing one or more steps |
| **step** | A single action within a job (run command, uses action) |
| **stage** | A logical grouping of jobs (build, test, deploy) |
| **artifact** | Build output passed between jobs or stages |
| **cache** | Dependency/build cache persisted across runs to reduce build time |
| **matrix** | Parameterized job expansion across multiple configurations |
| **concurrency group** | Mutual exclusion mechanism preventing parallel runs |
| **environment** | Deployment target with protection rules (staging, production) |
| **promotion** | Moving artifacts through environments (dev -> staging -> prod) |
| **rollback** | Reverting a deployment to a previous known-good state |
| **canary** | Incremental traffic shift to new version (1% -> 5% -> 25% -> 100%) |
| **blue/green** | Two identical environments with instant traffic switch |
| **rolling** | Gradual instance-by-instance replacement |
| **gate** | Manual or automated approval checkpoint before deployment proceeds |
| **runner** | Execution environment for CI/CD jobs (GitHub-hosted, self-hosted) |
| **reusable workflow** | Callable workflow template invoked from other workflows |
| **composite action** | Multi-step action packaged as a single reusable unit |
## Dispatch
| $ARGUMENTS | Mode |
|------------|------|
| `pipeline <requirements>` | Generate: new CI/CD workflow from requirements |
| `action <description>` | Action: GitHub Action step/job generation |
| `optimize <workflow>` | Optimize: pipeline build time optimization |
| `deploy <strategy>` | Deploy: deployment strategy design |
| `review <workflow>` | Review: audit existing pipeline |
| `debug <logs>` | Debug: analyze CI failure logs |
| Natural language about CI/CD | Auto-detect appropriate mode |
| Empty | Show mode menu with examples |
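
The dispatch table above can be read as a small lookup on the first word of `$ARGUMENTS`. The following is a minimal sketch under that reading; the `dispatch` function, the `MODES` mapping, and the returned tuple shape are illustrative assumptions, not part of the skill.

```python
# Illustrative dispatcher; the function name and return shape are assumptions.
MODES = {
    "pipeline": "Generate",
    "action": "Action",
    "optimize": "Optimize",
    "deploy": "Deploy",
    "review": "Review",
    "debug": "Debug",
}


def dispatch(arguments: str) -> tuple[str, str]:
    """Map a raw $ARGUMENTS string to (mode, target)."""
    text = arguments.strip()
    if not text:
        return ("Menu", "")  # empty input: show the mode menu with examples
    keyword, _, target = text.partition(" ")
    mode = MODES.get(keyword.lower())
    if mode is None:
        # No known keyword: treat the whole input as natural language.
        return ("Auto-detect", text)
    return (mode, target.strip())


print(dispatch("optimize .github/workflows/ci.yml"))
print(dispatch("why is my build so slow?"))
```

Natural-language input is passed through unchanged so that mode detection can look at the full request.
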
## Mode 1: Generate (`pipeline`)
Design and generate CI/CD workflow files from requirements.
### Steps
1. **Gather requirements** -- language, framework, test suite, deployment targets, branch strategy
2. **Select platform** -- GitHub Actions (default), GitLab CI, or both
3. **Load patterns** -- read `references/github-actions-patterns.md` or `references/gitlab-ci-patterns.md`
4. **Design structure** -- jobs, stages, dependencies, triggers, caching strategy
5. **Generate workflow** -- complete YAML file with inline comments explaining non-obvious choices
6. **Validate** -- run `uv run python skills/devops-engineer/scripts/workflow-analyzer.py <file>` on generated output
### Output
Complete workflow YAML file written to the appropriate location.
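
As a rough illustration of what a generated GitHub Actions workflow might look like, the sketch below renders a skeleton with an explicit `permissions` block, a `concurrency` group, a job timeout, and inline comments. The template, the `render_workflow` helper, the trigger, and the default timeout are all hypothetical; `<sha>` is left as a placeholder because the real commit SHA has to be looked up.

```python
# Hypothetical template; job name, trigger, and timeout are illustrative defaults.
WORKFLOW_TEMPLATE = """\
name: __NAME__
on:
  push:
    branches: [main]
  pull_request:

# Explicit least-privilege token scope instead of the repository default.
permissions:
  contents: read

# Cancel superseded runs of the same ref.
concurrency:
  group: __NAME__-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: __TIMEOUT__  # stop runaway jobs from consuming quota
    steps:
      # <sha> is a placeholder: substitute a full commit SHA before use.
      - uses: actions/checkout@<sha>
      - run: __TEST_COMMAND__
"""


def render_workflow(name: str, test_command: str, timeout_minutes: int = 15) -> str:
    """Fill the skeleton; the output still needs a real SHA before it can run."""
    return (
        WORKFLOW_TEMPLATE
        .replace("__NAME__", name)
        .replace("__TIMEOUT__", str(timeout_minutes))
        .replace("__TEST_COMMAND__", test_command)
    )


print(render_workflow("ci", "make test"))
```

Plain string replacement is used instead of `str.format` so that the `${{ ... }}` expression syntax passes through untouched. For a deployment workflow, cancelling an in-progress run is usually not what you want, so `cancel-in-progress` would likely be `false` there.
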
## Mode 2: Action (`action`)
Generate individual GitHub Action steps or jobs.
1. **Parse description** -- what the action should accomplish
2. **Load patterns** -- read `references/github-actions-patterns.md`
3. **Generate** -- step or job YAML with correct `uses`, `with`, `env` configuration
4. **Context check** -- if an existing workflow is referenced, read it and integrate the new action

Output: YAML snippet ready for insertion into a workflow file.
## Mode 3: Optimize (`optimize`)
Analyze and optimize pipeline build times.
### Analysis
1. **Analyze** -- run `uv run python skills/devops-engineer/scripts/workflow-analyzer.py <workflow>`
2. **Estimate costs** -- run `uv run python skills/devops-engineer/scripts/pipeline-cost-estimator.py <workflow>`
3. **Load techniques** -- read `references/pipeline-optimization.md`
### Optimization Opportunities
4. **Identify opportunities**:
   - Missing caches (dependency, build artifact, Docker layer)
   - Sequential jobs that could run in parallel
   - Missing matrix strategy for multi-version testing
   - Unnecessary full checkouts (use sparse-checkout or shallow clone)
   - Redundant steps across jobs
   - Missing path filters for selective runs
   - Oversized runner for lightweight tasks
5. **Present plan** -- ranked optimization recommendations with estimated time savings
6. **Implement** -- apply approved optimizations to the workflow file
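
The "ranked recommendations with estimated time savings" step can be sketched as a sort over estimates. The `Recommendation` type and the numbers below are made up for illustration; real estimates would come from the analyzer and cost-estimator output.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    title: str
    est_minutes_saved: float  # estimated saving per run; illustrative numbers below


def rank(recommendations: list[Recommendation]) -> list[Recommendation]:
    """Order recommendations by estimated time savings, largest first."""
    return sorted(recommendations, key=lambda r: r.est_minutes_saved, reverse=True)


# Made-up findings for a hypothetical workflow, not measured data.
plan = rank([
    Recommendation("Add path filters for docs-only changes", 1.0),
    Recommendation("Cache dependencies", 3.5),
    Recommendation("Run lint and test jobs in parallel", 2.0),
])
for position, rec in enumerate(plan, start=1):
    print(f"{position}. {rec.title} (~{rec.est_minutes_saved:g} min/run)")
```
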
## Mode 4: Deploy (`deploy`)
Design deployment strategies with rollback plans.
1. **Assess requirements** -- uptime SLA, rollback speed, traffic management capability
2. **Load strategies** -- read `references/deployment-strategies.md`
3. **Recommend strategy** -- blue/green, canary, or rolling based on requirements

| Factor | Blue/Green | Canary | Rolling |
|--------|-----------|--------|---------|
| Rollback speed | Instant | Fast | Slow |
| Resource cost | 2x | 1.1-1.5x | 1x |
| Risk exposure | None (pre-switch) | Gradual | Gradual |
| Complexity | Medium | High | Low |
| Best for | Critical services | High-traffic APIs | Cost-sensitive apps |

4. **Generate** -- deployment workflow with health checks, gates, and rollback triggers
5. **Document** -- runbook with rollback procedure and escalation path
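
One way to turn the comparison table into a shortlist is to filter on a rollback-speed floor and a resource budget. This is a sketch only: the values are copied from the table (using the upper bound of each cost range), while the `candidates` function and its filtering rule are assumptions rather than the skill's actual decision logic.

```python
# Rows copied from the comparison table; cost is the upper bound of each range.
STRATEGIES = {
    "blue/green": {"rollback": "instant", "max_cost": 2.0},
    "canary": {"rollback": "fast", "max_cost": 1.5},
    "rolling": {"rollback": "slow", "max_cost": 1.0},
}
ROLLBACK_RANK = {"slow": 0, "fast": 1, "instant": 2}


def candidates(min_rollback: str, cost_budget: float) -> list[str]:
    """Return strategies meeting a rollback-speed floor within a resource budget.

    The filtering rule is an assumption for illustration.
    """
    floor = ROLLBACK_RANK[min_rollback]
    return [
        name
        for name, row in STRATEGIES.items()
        if ROLLBACK_RANK[row["rollback"]] >= floor and row["max_cost"] <= cost_budget
    ]


print(candidates("fast", 1.5))
print(candidates("slow", 1.0))
```

An empty result (for example, instant rollback on a 1x budget) signals that the requirements conflict and need to be revisited with the user.
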
## Mode 5: Review (`review`)
Audit an existing CI/CD pipeline for issues and improvements.
### Audit Process
1. **Read workflow** -- parse the target workflow file(s)
2. **Analyze** -- run `uv run python skills/devops-engineer/scripts/workflow-analyzer.py <workflow>`
3. **Load checklists** -- read `references/pipeline-review-checklist.md`
### Evaluation Dimensions
4. **Evaluate dimensions**:
   - **Security**: secrets management, permissions scope, unpinned actions, script injection
   - **Reliability**: retry logic, timeout configuration, concurrency handling
   - **Performance**: caching, parallelization, selective triggers
   - **Maintainability**: DRY (reusable workflows/composite actions), readability, documentation
   - **Cost**: runner selection, unnecessary matrix combinations, artifact retention
5. **Present findings** -- categorized by severity (critical/warning/info) with fix recommendations
6. **Implement** -- apply approved fixes
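
Findings grouped by severity could be represented roughly as follows. The `Finding` type, the `report` helper, and the sample findings are hypothetical; only the severity levels and dimensions come from the steps above.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2}


@dataclass
class Finding:
    dimension: str   # security, reliability, performance, maintainability, cost
    severity: str    # critical, warning, or info
    message: str
    fix: str


def report(findings: list[Finding]) -> list[str]:
    """Format findings, most severe first; applying fixes stays behind user approval."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])
    return [f"[{f.severity}] {f.dimension}: {f.message} -> {f.fix}" for f in ordered]


# Hypothetical findings for illustration only.
lines = report([
    Finding("performance", "info", "no dependency cache", "add a cache step"),
    Finding("security", "critical", "action pinned by tag", "pin to a full commit SHA"),
    Finding("reliability", "warning", "job has no timeout", "set timeout-minutes"),
])
print("\n".join(lines))
```
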
## Mode 6: Debug (`debug`)
Analyze CI failure logs to identify root causes and fixes.
1. **Ingest logs** -- read the provided log file or inline content. For large logs (>500 lines), keep the first 50 and last 200 lines, then sample middle sections around error patterns
2. **Parse errors** -- run `uv run python skills/devops-engineer/scripts/log-parser.py <logfile>`
3. **Load triage protocol** -- read `references/ci-failure-triage.md`
4. **Classify failures** by category:

| Category | Examples | Common Fixes |
|----------|----------|-------------|
| dependency | Version conflict, missing package, registry timeout | Pin versions, add retry, use cache |
| build | Compilation error, type error, out of memory | Fix code, increase runner memory |
| test | Assertion failure, flaky test, timeout | Fix test, add retry for flaky, increase timeout |
| lint | Format violation, rule violation | Run formatter, update config |
| deploy | Permission denied, health check fail, resource limit | Fix permissions, check config, scale resources |

5. **Trace root cause** -- follow error chain to the originating failure
6. **Recommend fix** -- specific actionable steps with code/config changes
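
The log-ingestion rule in step 1 (first 50 lines, last 200 lines, plus samples around errors in the middle) might be sketched like this. It is not the implementation of `log-parser.py`; the `sample_log` function, the error pattern, and the three-line context window are assumptions.

```python
import re

# Assumed error markers; real CI logs may need a broader pattern.
ERROR_PATTERN = re.compile(r"error|failed|exception|fatal", re.IGNORECASE)


def sample_log(lines: list[str], head: int = 50, tail: int = 200,
               limit: int = 500, context: int = 3) -> list[str]:
    """Keep the head, the tail, and small windows around error lines in the middle."""
    if len(lines) <= limit:
        return list(lines)  # small logs are analyzed whole
    keep = set(range(head)) | set(range(len(lines) - tail, len(lines)))
    for i in range(head, len(lines) - tail):
        if ERROR_PATTERN.search(lines[i]):
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    sampled, previous = [], -1
    for i in sorted(keep):
        if i != previous + 1:
            # Mark each gap so the reader knows lines were dropped.
            sampled.append(f"... [{i - previous - 1} lines omitted] ...")
        sampled.append(lines[i])
        previous = i
    return sampled


log = [f"line {i}" for i in range(1000)]
log[400] = "ERROR: boom"
print(len(sample_log(log)))
```

Omission markers keep the sampled log honest about what was cut, which helps when tracing an error chain back to the originating failure.
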
## Reference Files
Load ONE reference at a time. Do not preload all references into context.
| File | Content | Read When |
|------|---------|-----------|
| `references/github-actions-patterns.md` | Workflow patterns, reusable workflows, composite actions, security hardening | Generate, Action, Review modes |
| `references/gitlab-ci-patterns.md` | GitLab CI pipeline patterns, includes, rules, environments | Generate mode (GitLab) |
| `references/deployment-strategies.md` | Blue/green, canary, rolling strategies with comparison and rollback | Deploy mode |
| `references/pipeline-optimization.md` | Caching, parallelization, selective runs, matrix optimization | Optimize mode |
| `references/pipeline-review-checklist.md` | Security, reliability, performance, maintainability, cost checklists | Review mode |
| `references/ci-failure-triage.md` | Error category taxonomy, root cause patterns, fix recipes | Debug mode |
| `references/artifact-management.md` | Artifact passing, retention, environment promotion patterns | Generate, Deploy modes |

| Script | When to Run |
|--------|-------------|
| `scripts/workflow-analyzer.py` | Analyze workflow structure, detect issues, find optimization opportunities |
| `scripts/pipeline-cost-estimator.py` | Estimate CI minutes and identify cost savings |
| `scripts/log-parser.py` | Extract actionable errors from CI failure logs |

| Template | When to Render |
|----------|----------------|
| `templates/dashboard.html` | After analysis -- inject pipeline health data into the dashboard |
## Critical Rules
1. Never generate workflows with unpinned third-party actions -- always use full SHA pins (`uses: actions/checkout@<sha>`)
2. Never use `pull_request_target` with `actions/checkout` of the PR head -- untrusted PR code can then run with access to repository secrets
3. Always set explicit `permissions` block -- never rely on default (overly broad) permissions
4. Never hardcode secrets in workflow files -- use `${{ secrets.NAME }}` or environment variables
5. Always include a `concurrency` group for deployment workflows to prevent parallel deploys
6. Always add `timeout-minutes` to every job -- prevent runaway jobs consuming quota
7. Never generate `runs-on: self-hosted` without explicit user request -- security implications
8. Always validate generated YAML by running `workflow-analyzer.py` before presenting
9. Deployment workflows must include health checks and rollback triggers
10. Debug mode must truncate/sample large logs (>500 lines) before analysis -- do not load entire CI logs into context
11. Review mode is read-only until user approves fixes (approval gate)
12. Load ONE reference file at a time -- do not preload all references into context
13. Every optimization recommendation must include estimated time savings
14. Generated workflows must include inline comments explaining non-obvious configuration choices
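
Rule 1 lends itself to a mechanical check. The sketch below scans workflow text for `uses:` references that are not pinned to a 40-character commit SHA. It is a line-based approximation and not the implementation of `workflow-analyzer.py`; the `unpinned_actions` function is hypothetical, `example-org/example-action` is a made-up action name, and the check flags every action rather than only third-party ones.

```python
import re

USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*(\S+)")
FULL_SHA = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return `uses:` references that are not pinned to a 40-character commit SHA."""
    flagged = []
    for line in workflow_text.splitlines():
        match = USES_LINE.match(line)
        if not match:
            continue
        ref = match.group(1).strip("\"'")
        # Local actions and container images are not pinned by commit SHA.
        if ref.startswith("./") or ref.startswith("docker://"):
            continue
        if not FULL_SHA.search(ref):
            flagged.append(ref)
    return flagged


# Made-up sample; example-org/example-action is not a real action.
SAMPLE = """\
steps:
  - uses: actions/checkout@v4
  - uses: example-org/example-action@0123456789abcdef0123456789abcdef01234567
  - uses: ./.github/actions/setup
"""
print(unpinned_actions(SAMPLE))
```

A tag such as `@v4` can be moved to a different commit later, which is why only the full SHA counts as pinned here.
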
