Verify before you ship: Skill for Codex

Runs the minimal set of checks—tests, builds, manual verifications, or environment-specific validations—that confirm a task is truly complete before it is marked done. This practice prevents the common pattern where 'done' means 'written' rather than 'working in production,' and it creates a shared definition of completion across the team.

Category: Operations

Platform: Codex

Published: 2026-03-28

Tags: verification, release, quality

Use cases

Wrapping up a feature ticket that has passed code review but has not been smoke-tested in the target environment
Completing a dependency upgrade or configuration change that could behave differently in staging versus locally
Handing off work to a teammate and wanting to ensure the integration points are actually working
Before merging a pull request that touches infrastructure or deployment configuration
After applying a hotfix directly to production and needing to confirm the fix resolved the incident

Key features

List all the verification commands or actions that would prove the task works end-to-end in its target environment
Run each verification step and capture the actual output, comparing it to the expected output rather than assuming success
Inspect log output and error traces for any unexpected warnings or degraded behavior even if the primary check passes
Mark the task complete only after every verification step, including those beyond the happy path, produces the expected result
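
The steps above can be sketched as a small script. This is a minimal illustration, not part of the skill itself: the names `Step`, `Result`, `run_step`, `verify`, and `EXAMPLE_STEPS` are hypothetical, the "expected output" is assumed to be a substring of stdout, and the warning scan is a naive keyword match. The placeholder commands only call the Python interpreter so the sketch runs anywhere; real steps would be your own test, build, or smoke-test commands.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    command: list[str]
    expected: str  # assumption: substring that must appear in stdout


@dataclass
class Result:
    step: Step
    passed: bool
    output: str          # actual captured output (stdout followed by stderr)
    warnings: list[str]  # lines that look like warnings, even if the step passed


def run_step(step: Step) -> Result:
    """Run one verification command and compare actual output to expected."""
    proc = subprocess.run(step.command, capture_output=True, text=True, timeout=60)
    output = proc.stdout + proc.stderr
    # A step passes only on a zero exit code AND the expected text in stdout.
    passed = proc.returncode == 0 and step.expected in proc.stdout
    # Naive scan for degraded behavior hidden behind a passing check.
    warnings = [line for line in output.splitlines() if "warn" in line.lower()]
    return Result(step, passed, output.strip(), warnings)


def verify(steps: list[Step]) -> tuple[list[Result], bool]:
    """Run every step; the task counts as done only if all steps pass."""
    results = [run_step(s) for s in steps]
    return results, all(r.passed for r in results)


# Hypothetical placeholder steps; replace with real commands for your task.
EXAMPLE_STEPS = [
    Step("prints ok", [sys.executable, "-c", "print('ok')"], "ok"),
    Step(
        "passes with a warning",
        [sys.executable, "-c",
         "import sys; print('ok'); print('WARNING: slow response', file=sys.stderr)"],
        "ok",
    ),
]

if __name__ == "__main__":
    results, done = verify(EXAMPLE_STEPS)
    for r in results:
        print("PASS" if r.passed else "FAIL", r.step.name, r.warnings)
    print("task complete" if done else "task NOT complete")
```

Keeping the captured output and the warning lines on each result, rather than only a boolean, is what lets a reviewer see the evidence instead of taking the pass on trust.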

When to Use This Skill

When a feature has passed code review but has not been tested in the target deployment environment
When handing off work to another engineer and wanting to ensure integration points are verified
When a change touches infrastructure, configuration, or deployment pipelines where local behavior may differ from production

Expected Output

A verification checklist with each step marked pass or fail, and the actual output or screenshot that confirms the task is complete.
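
One possible plain-text shape for such a checklist is sketched below. The function name `render_checklist`, the tuple layout `(step name, passed, evidence)`, and the sample values are all hypothetical, chosen only for illustration; the skill does not prescribe a format.

```python
def render_checklist(results):
    """Render (name, passed, evidence) tuples as a pass/fail checklist."""
    lines = []
    for name, passed, evidence in results:
        mark = "PASS" if passed else "FAIL"
        lines.append(f"[{mark}] {name}")
        # Record the actual output next to the verdict, not just the verdict.
        lines.append(f"       evidence: {evidence}")
    # An empty checklist never counts as complete.
    complete = bool(results) and all(passed for _, passed, _ in results)
    lines.append(f"Task status: {'complete' if complete else 'not complete'}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    print(render_checklist([
        ("unit tests", True, "all tests passed"),
        ("staging smoke test", False, "HTTP 502 from health endpoint"),
    ]))
```

A single failing step is enough to report the task as not complete, which mirrors the rule that every verification step must produce the expected result.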

Frequently Asked Questions

How do I define the right verification steps for a task?
Ask: what would a skeptical reviewer need to see to believe this is done? Include at least one step that runs the code end-to-end, not just unit tests or linter checks.

What if verification requires credentials or infrastructure I do not have access to?
Escalate before marking the task done. Having a different person verify is still better than shipping unverified work. Document any verification gaps as follow-up items.

Does this apply to small, low-risk changes?
Yes, but the verification set can be small. Even running the relevant unit tests and checking that the feature flag can be toggled counts as verification for low-risk changes.

3 Indexed items

Evaluation and benchmarking

Operations

Builds evaluation suites with ground-truth answers, automated scoring, and regression detection so you can measure whether model or prompt changes actually improve outcomes before shipping. Without systematic evaluation, teams ship changes that seem better anecdotally but may degrade specific edge cases silently.

Finishing a development branch

Operations

Systematically closes out a development branch by running verification, cleaning up the commit history, pushing with proper tracking, and making an explicit choice between merge, squash, or follow-up tickets. This prevents the common pattern of abandoned branches, stale PRs, and lost context when work is not deliberately concluded.

Observability baselines

Operations

Establishes golden signals (latency, traffic, errors, saturation), SLO windows, and dashboard checks before agents automate deployments so that 'healthy' and 'degraded' have measurable definitions rather than subjective interpretations. This is essential when AI agents are managing deploys because agents need objective metrics to make decisions, not human gut feelings.
