Spec-Driven Development With AI Agents: The Spec-to-Code Prompt
The philosophy is everywhere; the prompt isn't. Use spec-driven development with AI agents to turn a spec into ordered code with a per-step output contract.
Everyone agrees the spec should be the source of truth. Then they hand the agent a three-page spec and let it write the whole feature in one shot. The spec was good. The execution wasn't, because nothing turned the document into an ordered plan. Spec-driven development with AI agents only pays off when a prompt bridges the spec to the build, and that prompt is exactly what the popular writing leaves out.
The sources teach the philosophy beautifully. O'Reilly on writing a good spec for AI agents makes the case. The DeepLearning.AI course on spec-driven development goes deep. Addy Osmani's "good spec" post is sharp guidance. All of them stop at "the spec is the source of truth." None give the prompt that turns a spec into ordered, buildable code with a per-step output contract the agent follows.
That prompt is the missing link. Spec as document is necessary. Spec as execution plan is what ships.
What a spec-to-code prompt produces
A spec-to-code prompt is a harness that reads a written spec and emits an ordered build sequence, where each step names what it implements and the acceptance check that proves it's done. It converts intent into a plan the agent executes one step at a time.
The job it does:
- Parse the spec into discrete, ordered implementation steps
- Attach an acceptance check to every step, drawn from the spec's requirements
- Build in dependency order, so foundations land before the code that needs them
- Keep each step small enough to review and roll back alone
- Flag any spec ambiguity instead of guessing and building on the guess
- Stop at each step's acceptance check before moving on
- Run the same way across Claude, ChatGPT, and Gemini
The acceptance check per step is the part people skip. It's what makes the spec testable as you go, instead of a thing you compare against only at the very end when it's expensive to fix.
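Dependency order is the one item in that list that can be checked mechanically. The sketch below is only an illustration, assuming the plan has already been reduced to a mapping from each step id to the ids it depends on. The step names are hypothetical, and `graphlib` is the Python standard library module (3.9+), not part of any harness described here.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return step ids so every step comes after the steps it depends on."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"steps depend on each other in a cycle: {exc.args[1]}") from exc


# Hypothetical plan: each key maps to the steps that must land before it.
plan = {
    "schema": [],
    "repository": ["schema"],
    "api": ["repository"],
    "ui": ["api"],
}
order = build_order(plan)
print(order)
```

A cycle raises an error instead of producing an order, which is a useful signal that the plan (or the spec behind it) needs another look before anything gets built.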
The anatomy of the harness prompt
The prompt takes the spec and the stack, then emits ordered steps, each with an implementation note and an acceptance check pulled from the spec.
Variables → {{spec}}, {{stack}}, {{constraints}}
Prompt → role: harness that builds strictly from {{spec}}
         task: emit ordered steps with per-step acceptance checks
         rule: flag ambiguity; never invent requirements
Output → steps: id, implements, acceptance-check, files, depends-on
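One way to hold the agent to that output shape is to validate the plan before building anything. This is a minimal sketch, assuming the model's reply has already been parsed into a list of dictionaries keyed by the five field names above. The parsing itself and the `contract_errors` helper are assumptions for illustration, not something the harness is known to ship.

```python
REQUIRED_FIELDS = ("id", "implements", "acceptance-check", "files", "depends-on")


def contract_errors(steps: list[dict]) -> list[str]:
    """List every way a step plan breaks the per-step output contract."""
    errors = []
    seen = set()  # ids of steps that appear earlier in the plan
    for index, step in enumerate(steps, start=1):
        label = str(step.get("id", f"step #{index}"))
        for field in REQUIRED_FIELDS:
            if field not in step:
                errors.append(f"{label}: missing '{field}'")
        if "acceptance-check" in step and not str(step["acceptance-check"]).strip():
            errors.append(f"{label}: empty acceptance-check")
        # A step may only depend on steps that come before it.
        for dep in step.get("depends-on", []):
            if dep not in seen:
                errors.append(f"{label}: depends on '{dep}', which is not an earlier step")
        seen.add(label)
    return errors


# Hypothetical plan with two deliberate mistakes in its only step.
bad_plan = [
    {"id": "s1", "implements": "login form", "acceptance-check": " ",
     "files": ["login.py"], "depends-on": ["s2"]},
]
print(contract_errors(bad_plan))
```

An empty list means the plan has the right shape. It says nothing about whether the checks themselves are good, which is still a review job.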
The step contract goes at the very end of the prompt. A long {{spec}} pasted in the middle will outweigh a contract stated up top, and the model reverts to prose. Restate the per-step format on the final line and every step comes back in the same shape, which matters most on GPT-4o.
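The placement rule can be made concrete with a small assembly function. The sketch below assumes the prompt is built from plain strings. The section labels and the exact wording of `STEP_CONTRACT` are illustrative, not the text of any shipped prompt. The only point it demonstrates is ordering: the spec sits in the middle and the contract is the last line.

```python
STEP_CONTRACT = (
    "Return every step as: id, implements, acceptance-check, files, depends-on."
)


def assemble_prompt(spec: str, stack: str, constraints: str = "") -> str:
    """Assemble the harness prompt with the step contract restated last."""
    sections = [
        "Role: harness that builds strictly from the spec below.",
        "Task: emit ordered steps with per-step acceptance checks.",
        "Rule: flag ambiguity; never invent requirements.",
        STEP_CONTRACT,  # stated once up top
        f"Stack:\n{stack}",
    ]
    if constraints:
        sections.append(f"Constraints:\n{constraints}")
    sections.append(f"Spec:\n{spec}")  # the long part sits in the middle
    sections.append(STEP_CONTRACT)  # restated on the final line
    return "\n\n".join(sections)


prompt = assemble_prompt(
    spec="Goal: users can reset a password by email.",
    stack="Python, pytest",
)
print(prompt.splitlines()[-1])
```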
1. Write or paste the spec
The prompt is only as good as the spec. It should state the goal, the requirements, and what "done" means. Ambiguity in equals questions out, which is the correct behavior.
2. Fill the variables
Drop the spec into {{spec}}, the stack into {{stack}}, and any hard limits into {{constraints}}. Keep the spec the source of truth; don't smuggle requirements into the chat.
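Filling the variables is plain template substitution. A minimal sketch, assuming the double-brace placeholder syntax used in this guide and treating {{spec}} and {{stack}} as required and {{constraints}} as optional. The `fill_variables` helper and the sample template are hypothetical.

```python
import re

REQUIRED = {"spec", "stack"}
OPTIONAL = {"constraints"}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_variables(template: str, values: dict[str, str]) -> str:
    """Replace {{name}} placeholders, refusing to run without required values."""
    missing = sorted(name for name in REQUIRED if not values.get(name, "").strip())
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in REQUIRED | OPTIONAL:
            raise KeyError(f"unknown variable: {name}")
        return values.get(name, "")

    # re.sub makes a single pass, so braces inside the spec text are left alone.
    return PLACEHOLDER.sub(substitute, template)


template = "Build strictly from:\n{{spec}}\nStack: {{stack}}\nLimits: {{constraints}}"
filled = fill_variables(template, {"spec": "Goal: export CSV.", "stack": "Go, go test"})
print(filled)
```

Failing loudly on a missing {{spec}} is deliberate: an empty spec is the easiest way to end up with requirements smuggled in through the chat.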
3. Run and review the step plan
The prompt returns ordered steps with acceptance checks. Read them before building. If a step's acceptance check is vague, your spec was vague there. Fix the spec, not the step.
4. Build step by step
Have the agent implement one step, then verify against that step's acceptance check before the next. A failing check stops the line.
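The "stops the line" rule fits in a short loop. This is a sketch under assumptions: `implement` and `check` are hypothetical callbacks standing in for the agent call and the step's acceptance check, and steps are dictionaries with an `id` key.

```python
from typing import Callable, Optional


def build(
    steps: list[dict],
    implement: Callable[[dict], None],
    check: Callable[[dict], bool],
) -> tuple[list[str], Optional[str]]:
    """Implement steps in order and stop at the first failing acceptance check.

    Returns the ids that passed and the id that failed (None if all passed).
    """
    completed = []
    for step in steps:
        implement(step)
        if not check(step):
            return completed, step["id"]  # a failing check stops the line
        completed.append(step["id"])
    return completed, None


# Hypothetical run: step "b" fails its check, so "c" is never attempted.
attempted = []
steps = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
done, failed = build(
    steps,
    implement=lambda step: attempted.append(step["id"]),
    check=lambda step: step["id"] != "b",
)
print(done, failed, attempted)
```

In a real setup, `check` would likely run the test command recorded in {{stack}}, but that wiring depends on the project and is left out here.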
5. Update the spec, not the code, when intent changes
When requirements shift, change the spec and re-run the harness. The spec stays the source of truth. Patching code directly is how the spec and the build drift apart.
A spec without per-step acceptance checks is a wish list. The harness earns its keep by pulling a concrete, checkable condition out of the spec for every step, so "done" is observable at each stage instead of a judgment call at the end. That's the difference between spec-driven and spec-shaped.
Prompt-craft patterns for building from a spec
Two patterns keep the build honest, plus a stance.
The ambiguity flag. Make "the spec doesn't say" a first-class output the agent can return.
If the spec is ambiguous or silent on a needed decision, output an
OPEN QUESTION instead of guessing. Do not invent requirements.
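Making the flag first-class also means reading it back out of the reply. The sketch below assumes the model puts each question on its own line starting with OPEN QUESTION. That line format is an assumption, so adjust the parsing to whatever shape the prompt actually requests.

```python
MARKER = "OPEN QUESTION"


def open_questions(output: str) -> list[str]:
    """Collect the text of every line that starts with the OPEN QUESTION marker."""
    questions = []
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith(MARKER):
            # Drop the marker and any separator that follows it.
            questions.append(stripped[len(MARKER):].lstrip(" :-").strip())
    return questions


# Hypothetical model reply containing one flagged ambiguity.
reply = """step 1: add reset endpoint
OPEN QUESTION: How long should a reset link stay valid?
step 2: send the email"""

questions = open_questions(reply)
ready_to_build = not questions
print(questions, ready_to_build)
```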
Acceptance pulled from the spec. Force each check to trace back to the spec text.
For each step, the acceptance-check must be derivable from {{spec}}.
Quote the spec clause it comes from. No clause, no step.
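The "no clause, no step" rule can be spot-checked by confirming that each quoted clause really appears in the spec. A minimal sketch, assuming each step carries its quote in a `clause` field. That field name is hypothetical, and the match is a plain substring test after normalizing case and whitespace.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line wrapping does not break a match."""
    return " ".join(text.lower().split())


def untraceable_steps(steps: list[dict], spec: str) -> list[str]:
    """Return ids of steps whose quoted clause is missing or not found in the spec."""
    spec_text = normalize(spec)
    flagged = []
    for step in steps:
        clause = normalize(step.get("clause", ""))
        if not clause or clause not in spec_text:
            flagged.append(step["id"])
    return flagged


# Hypothetical spec and plan: s2 quotes a requirement the spec never states.
spec = "Users can reset a password by email.\nReset links expire after one hour."
steps = [
    {"id": "s1", "clause": "Reset links expire after one hour."},
    {"id": "s2", "clause": "Passwords must contain a symbol."},
    {"id": "s3"},
]
print(untraceable_steps(steps, spec))
```

A substring match only proves the quote exists. Whether the acceptance check follows from that clause is still a judgment for the reviewer.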
The opinionated take: write the spec, but keep it short and let the harness expand it. Long specs feel rigorous and produce worse builds, because the agent loses the thread and you stop reading carefully past page two. A tight spec, maybe one page, with crisp requirements and a clear definition of done, plus a harness that turns it into ordered steps with acceptance checks, beats a sprawling document every time. The spec's job is to be unambiguous, not exhaustive. Detail belongs in the steps the harness generates, where it's tied to a checkable condition.
Variables you'll set
| Variable | Required | What it is |
|---|---|---|
| {{spec}} | Yes | The written spec: goal, requirements, definition of done |
| {{stack}} | Yes | Language, framework, and test command the build uses |
| {{constraints}} | No | Hard limits: no new deps, behind a flag, public API frozen |
A trust note: the harness builds what the spec says, including the spec's mistakes. A flawed requirement produces flawed code with a passing acceptance check, because the check came from the same flawed clause. Review the step plan against intent, not just against the spec text. And re-confirm the build behavior after a model update, since step-following can shift between versions.
Getting started
- Write a tight one-page spec: goal, requirements, definition of done.
- Note your stack and the test command the agent will use.
- Paste the spec-to-code prompt and fill {{spec}}, {{stack}}, {{constraints}}.
- Review the ordered steps. Resolve every OPEN QUESTION before building.
- Build one step, verify its acceptance check, then move to the next.
- When intent changes, edit the spec and re-run the harness. Never patch the code alone.
- Save the prompt so every feature starts from a spec, not a chat. The Spec-to-Code Harness ships this step contract with acceptance checks built in.
The Spec-to-Code Harness does this end-to-end: a {{spec}} variable feeds a harness that emits ordered steps, each with a spec-derived acceptance check and an OPEN QUESTION escape for ambiguity, with the per-step format restated for GPT-4o so the contract holds on a long spec. It's part of The Complete AI Prompts Bundle, a one-time lifetime license to the whole catalog plus every pack added later, which is the value pick once you run more than one of these agent jobs.
A spec-to-code harness needs a plan to expand into, which is where the task decomposition prompt for coding agents overlaps: decomposition turns a goal into steps, the spec gives those steps their acceptance checks. And once the agent builds from the spec, you still grade the result, which is the job of verifying AI coding agent output. Weighing a saved pack against rolling your own? How to choose a reusable AI prompt pack covers the trade-offs.