Task Decomposition for Coding Agents: The Planner Prompt You Can Paste
Turn one coding goal into ordered, verifiable subtasks with a reusable AI coding agent task decomposition prompt. Copy the planner pattern and ship it.
Most coding agents fail the same way: you hand them a goal, they write a 600-line diff, and now you're reviewing a wall of changes with no safe place to stop. AI coding agent task decomposition is the fix that almost nobody ships as a reusable prompt. The architecture surveys describe it well. The pasteable artifact is missing.
Look at what currently ranks for this. The mgx.dev decomposition piece is an architecture survey. The CodeX deep dive on Medium walks the theory. The apxml course module teaches strategies. All useful. None hand you a planner prompt a working dev can paste and run today.
That's the gap. A decomposition prompt isn't an essay about planning. It's a saved prompt with a {{goal}} variable and a locked step contract that turns one goal into ordered, independently verifiable subtasks.
What a decomposition prompt actually produces
A task decomposition prompt is a planner that converts a single goal into a numbered build sequence, where each step names what it touches and how you'll know it's done. Not a vibe. A contract.
Here's the concrete buyer job. You want the agent to:
- Take a feature request or bug and return an ordered subtask list, not a diff
- Make each subtask small enough to verify alone (one file, one function, one migration)
- Surface dependencies so step 3 doesn't assume step 5 already ran
- Attach a done-test to every subtask ("the new endpoint returns 200 for a valid payload")
- Flag the risky step up front, so you review that one closely
- Pause for approval before any step that deletes or rewrites existing behavior
- Reuse the same plan format across Claude, ChatGPT, and Gemini
That last point matters more than it sounds. A plan you can't reread the same way every run is a plan you can't trust.
The anatomy of the planner prompt
A decomposition prompt has three parts: a tight role, the goal variable, and a step output contract that the model fills for every subtask.
Variables → {{goal}}, {{repo_facts}}, {{constraints}}
Prompt    → role: senior planner who decomposes, does not write code
            task: break {{goal}} into ordered, verifiable subtasks
            context: {{repo_facts}} and {{constraints}}
Output    → numbered list; per step: id, intent, files, done-test, depends-on
Note where the output contract sits: last. When the contract comes first and the pasted {{repo_facts}} comes last, the contract tends to get ignored on long inputs, because models generally weight the most recent tokens more heavily. Put the role first, the goal and context in the middle, and the step contract at the very end, and the format locks far more reliably.
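As a rough illustration of that ordering, here is a minimal Python sketch that assembles the prompt with the role first, the variables in the middle, and the step contract last. The function name `build_planner_prompt`, the section labels ("Goal:", "Repo facts:", "Constraints:"), and the default of "none" for constraints are assumptions made for this example, not part of any published prompt.

```python
import re

# Role and contract text taken from the patterns in this guide.
ROLE = (
    "You are a planning agent. You decompose goals into subtasks.\n"
    "You do NOT write implementation code. Output only the plan."
)

CONTRACT = (
    "For each subtask output exactly:\n"
    "- id: S1, S2, …\n"
    "- intent: one sentence\n"
    "- files: paths likely touched\n"
    "- done-test: the observable check that proves this step works\n"
    '- depends-on: subtask ids, or "none"'
)

# Section labels are illustrative. The point is the order: contract goes last.
TEMPLATE = "\n\n".join([
    ROLE,
    "Task: break the goal into ordered, verifiable subtasks.\nGoal: {{goal}}",
    "Repo facts:\n{{repo_facts}}",
    "Constraints:\n{{constraints}}",
    CONTRACT,
])


def build_planner_prompt(goal: str, repo_facts: str, constraints: str = "none") -> str:
    """Fill the {{...}} variables in a single pass and return the full prompt."""
    values = {"goal": goal, "repo_facts": repo_facts, "constraints": constraints}
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda match: values[match.group(1)].strip(),
        TEMPLATE,
    )
```

Filling the variables in one `re.sub` pass means a goal that happens to contain `{{repo_facts}}` as literal text is not substituted a second time.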
1. Gather inputs
Collect the goal in one sentence, the repo facts the planner needs (stack, test command, the directories in play), and any hard constraints (no schema changes, ship behind a flag). Vague goals make vague plans. "Add search" is weak. "Add a /search endpoint that filters orders by status and date, paginated" gives the planner edges to cut along.
2. Fill the variables
Drop your goal into {{goal}}, your stack notes into {{repo_facts}}, and your guardrails into {{constraints}}. Keep {{repo_facts}} short. The planner doesn't need the whole codebase, just the shape.
3. Run the prompt
Run it and read the plan before you let any agent execute. This is the whole point. You're reviewing a five-line-per-step plan, not a diff.
4. Hand steps to the executor
Feed subtasks to your coding agent one at a time, in dependency order. After each, check the done-test. If it fails, you've spent one subtask, not the whole feature.
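Dependency order can be computed rather than eyeballed. The sketch below uses Python's standard-library `graphlib.TopologicalSorter` to order step ids by their depends-on lists, then runs them one at a time and stops at the first failed done-test. The `execute` and `check_done` callbacks are hypothetical stand-ins for however you call your agent and verify a step.

```python
from graphlib import TopologicalSorter


def execution_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return step ids so that every step comes after the steps it depends on.

    depends_on maps each step id to the ids it depends on (empty list for "none").
    Raises ValueError for unknown ids; graphlib raises CycleError (a ValueError
    subclass) if the plan contains a dependency cycle.
    """
    known = set(depends_on)
    unknown = {dep for deps in depends_on.values() for dep in deps} - known
    if unknown:
        raise ValueError(f"depends-on references unknown steps: {sorted(unknown)}")
    return list(TopologicalSorter(depends_on).static_order())


def run_plan(depends_on, execute, check_done):
    """Run steps in order; stop at the first step whose done-test fails.

    execute(step_id) and check_done(step_id) are hypothetical callbacks.
    Returns (completed_ids, failed_id), where failed_id is None on success.
    """
    completed = []
    for step_id in execution_order(depends_on):
        execute(step_id)
        if not check_done(step_id):
            return completed, step_id
        completed.append(step_id)
    return completed, None
```

Stopping at the first failed done-test is what keeps the cost of a wrong turn to a single subtask.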
5. Iterate on the plan, not the code
When a step turns out wrong, rerun the planner with a note. Re-planning is cheap. Re-coding a tangled 600-line diff is not.
Decomposition isn't about making the agent smarter. It's about making failure cheap. When every step has its own done-test, a wrong turn costs one step. That's the actual return on a planner prompt, and it's why the plan should always come before the diff.
Prompt-craft patterns that make plans hold
Three patterns separate a planner that drifts from one that holds the line.
Role framing that forbids code. State plainly that the planner decomposes and does not write implementation. Without this, models start dumping code into step 1.
You are a planning agent. You decompose goals into subtasks.
You do NOT write implementation code. Output only the plan.
The per-step contract. Every subtask fills the same fields. Same shape, every step, every run.
For each subtask output exactly:
- id: S1, S2, …
- intent: one sentence
- files: paths likely touched
- done-test: the observable check that proves this step works
- depends-on: subtask ids, or "none"
The risk flag. Ask the planner to mark the single step most likely to break existing behavior. It changes how you review. You read one step closely instead of skimming twelve.
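Because every subtask fills the same five fields, the plan can be parsed and checked before anything executes. The following is a minimal sketch that assumes the model returns each field on its own line as `- field: value`, exactly as the contract above asks. The `Step` dataclass and `parse_plan` function are names invented for this example, and real model output may need a more forgiving parser.

```python
from dataclasses import dataclass

FIELDS = ("id", "intent", "files", "done-test", "depends-on")


@dataclass
class Step:
    id: str
    intent: str
    files: list[str]
    done_test: str
    depends_on: list[str]


def _to_step(fields: dict[str, str]) -> Step:
    """Build a Step, refusing any subtask that leaves a contract field empty."""
    missing = [name for name in FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"step {fields.get('id', '?')} is missing: {', '.join(missing)}")
    deps = fields["depends-on"].strip('"')
    return Step(
        id=fields["id"],
        intent=fields["intent"],
        files=[p.strip() for p in fields["files"].split(",") if p.strip()],
        done_test=fields["done-test"],
        depends_on=[] if deps.lower() == "none"
        else [d.strip() for d in deps.split(",") if d.strip()],
    )


def parse_plan(text: str) -> list[Step]:
    """Parse '- field: value' lines into Steps; a new 'id' line starts a new step."""
    steps, current = [], {}
    for raw in text.splitlines():
        line = raw.strip().lstrip("-").strip()
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if not sep or key not in FIELDS:
            continue  # ignore anything that is not a contract field
        if key == "id" and current:
            steps.append(_to_step(current))
            current = {}
        current[key] = value.strip()
    if current:
        steps.append(_to_step(current))
    return steps
```

A plan that fails to parse is a signal to rerun the planner, not to patch the output by hand.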
Here's an opinion the architecture write-ups won't give you: cap your plans at seven subtasks. If a goal needs more than seven, it's two goals wearing a trench coat. Split it and run the planner twice. Long plans look impressive and execute badly, because the dependency graph gets too tangled to verify by eye. Seven is roughly the limit a human reviewer holds in working memory anyway.
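The seven-step cap is easy to enforce in a small lint pass, along with a check for done-tests that say nothing. This is a sketch under stated assumptions: steps arrive as dicts with `id` and `done-test` keys, and the list of vague phrases is an illustrative guess you would tune for your own plans.

```python
MAX_STEPS = 7  # the cap suggested above

# Illustrative phrases only; extend this set for your own plans.
VAGUE_DONE_TESTS = {"it works", "works", "done", "looks good"}


def lint_plan(steps: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the plan passes."""
    problems = []
    if len(steps) > MAX_STEPS:
        problems.append(
            f"plan has {len(steps)} steps (cap is {MAX_STEPS}); split the goal and re-plan"
        )
    for step in steps:
        # Normalize case, surrounding quotes, and trailing periods before comparing.
        done_test = step["done-test"].strip().strip('."').lower()
        if done_test in VAGUE_DONE_TESTS:
            problems.append(f"{step['id']}: done-test is not an observable check")
    return problems
```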
Variables you'll set
| Variable | Required | What it is |
|---|---|---|
| {{goal}} | Yes | The single feature or fix, stated in one specific sentence |
| {{repo_facts}} | Yes | Stack, test command, and the directories the work touches |
| {{constraints}} | No | Hard limits: no schema change, behind a flag, no new deps |
Keep the values precise. {{goal}} should read like a ticket title, not a wish. The planner is only as sharp as the goal you feed it.
A quick caveat on trust. Plans drift after a model update the same way any prompt does. Pin the model version for anything you run on a schedule, and skim the first plan after any provider upgrade. The contract holds across versions far better than freehand planning, but it isn't magic.
Getting started
- Write your goal as one concrete sentence. If it has an "and" in it, consider splitting.
- Jot three to five repo facts: stack, test command, target directories.
- Paste the planner prompt, fill {{goal}} and {{repo_facts}}, run it.
- Read the plan. Confirm each step has a real done-test, not "it works."
- Hand subtasks to your agent in order, checking the done-test after each.
- Re-plan when a step is wrong. Don't patch a broken plan by hand.
- Save the prompt so the next feature starts from the same contract. The Agent Task Decomposition System Prompt ships this exact planner with the step contract built in.
The Agent Task Decomposition System Prompt does this end-to-end: a {{goal}} variable feeds a planner that emits ordered subtasks under a locked five-field step contract (id, intent, files, done-test, depends-on), tuned so Claude, ChatGPT, and Gemini all return the same plan shape. It's part of The Complete AI Prompts Bundle, a one-time lifetime license to the whole catalog plus every pack added later, which pays off fast if you run more than one of these agent jobs.
Once the plan holds, the next bottleneck is context: a long plan plus a fat codebase blows past the window. That's covered in context window budgeting for AI agents. And once steps start landing as diffs, you'll want a way to grade them, which is the job of verifying AI coding agent output. If you're still deciding whether a saved pack beats rolling your own, how to choose a reusable AI prompt pack lays out the trade-offs.