Context Window Budgeting for AI Agents: A Prompt, Not a Theory
Stop letting your agent's window fill with junk. Use a context window budgeting prompt for AI agents to decide what to load, summarize, and evict before a run.
The agent was fine for the first few turns. Then it started forgetting the constraint you set at the top, re-reading files it already saw, and producing slower, vaguer answers. The window didn't overflow. It just filled with junk. Context window budgeting for AI agents is the discipline that prevents this, and it's almost always treated as an architecture problem instead of a prompt you can run.
The good explainers stop at the diagnosis. Arize's piece on context management in harnesses covers compaction and subagents well. The DEV explainer on token budgets shows why the window fills fast. The machinelearningplus guide explains the concept cleanly. They all describe the ~130K effective-limit problem. None give you a prompt that decides what to load, summarize, and evict for a specific run.
That decision is the whole game. And it's promptable.
What a budgeting prompt decides
A context window budgeting prompt is a planner that takes a list of candidate inputs and labels each one load, summarize, or evict before the agent starts working. It's triage for tokens.
Here's what you actually want it to do:
- Sort candidate files and notes by relevance to the current task, not by recency
- Mark large-but-relevant inputs as "summarize" instead of loading them whole
- Evict anything the task doesn't touch (that test file from two features ago)
- Reserve headroom for the model's own output and reasoning
- Put the task constraints where the model weights them: near the end
- Estimate roughly how many tokens the plan will spend before you commit
- Produce the same triage logic across Claude, ChatGPT, and Gemini
The effective limit is lower than the advertised one. A model sold with a 200K window often works best under roughly 100–130K of actual content, because retrieval quality sags as the window fills. Budgeting buys back that headroom.
The anatomy of the harness prompt
The prompt takes the task, the candidate inputs, and the window size, then returns a budget: a labeled, ordered manifest of what enters the context.
Variables → {{task}}, {{candidate_inputs}}, {{window_tokens}}
Prompt → role: context budgeter for an agent run
task: classify each input as load / summarize / evict
rule: reserve output headroom; order loaded items by relevance
Output → manifest: per input → action, reason, est-tokens; total budget
The placement detail does real work here. Constraints and the output contract go last in the prompt. Models tend to weight recent tokens more heavily, so a budgeting rule stated at the very end gets honored more reliably than the same rule stated up top and then buried under a long {{candidate_inputs}} list.
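As a rough sketch of that ordering, the three variables can be filled into a template whose rules and output contract sit at the bottom. The template wording and the `render_budgeter_prompt` helper are illustrative assumptions, not the shipped harness prompt.

```python
# Hypothetical template: section names and wording are illustrative only.
BUDGETER_TEMPLATE = """\
Role: you are a context budgeter for an agent run.

Task:
{task}

Candidate inputs:
{candidate_inputs}

Rules (stated last on purpose):
- Classify each input as load, summarize, or evict.
- Reserve output headroom out of {window_tokens} tokens.
- Order loaded items by relevance to the task.
Output: a manifest with action, reason, and est-tokens per input, plus a total.
"""


def render_budgeter_prompt(task: str, candidate_inputs: list[str], window_tokens: int) -> str:
    """Fill the three variables; the rules and output contract stay at the end."""
    listing = "\n".join(f"- {item}" for item in candidate_inputs)
    return BUDGETER_TEMPLATE.format(
        task=task, candidate_inputs=listing, window_tokens=window_tokens
    )


prompt = render_budgeter_prompt(
    "Add pagination to the orders endpoint",
    ["orders.py", "schema.md", "old_feature_test.py"],
    120_000,
)
```

However long the candidate list grows, the rules still land after it, which is the point of putting them last.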
1. List the candidates
Write down everything you might feed the agent: files, prior decisions, the ticket, schema notes. Don't pre-filter. The point of the prompt is to filter for you.
2. Fill the variables
Drop the task into {{task}}, the candidate list into {{candidate_inputs}}, and your model's working size into {{window_tokens}}. Use the effective size, not the marketing number.
3. Run and read the manifest
The prompt returns a per-input verdict. Read it. If it wants to load a 4,000-line file whole, that's your cue to summarize that file first.
4. Load only what the manifest approves
Feed the agent the "load" items in the order given, the summaries for the "summarize" items, and nothing else. The evicted set stays out.
5. Re-budget at the halfway point
Long runs accumulate cruft. Re-run the budgeter mid-task to evict what's no longer relevant. This is the prompt equivalent of compaction, except you control what survives. Automatic compaction summarizes whatever's there, including the constraint you needed kept verbatim. A re-budget pass lets you say "the schema stays, the three files from step one can go." The difference shows up on tasks that run past a dozen turns, where the early context is mostly spent and the model is paying attention tax on tokens it no longer needs. Evict deliberately and the back half of the run stays as sharp as the front.
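A minimal sketch of a re-budget pass, assuming each item in the window carries a token count, a "pinned" flag for things that must survive verbatim, and a relevance flag you (or the budgeter) set at the midpoint. The `ContextItem` fields and the `rebudget` helper are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    tokens: int
    pinned: bool = False          # must survive verbatim (e.g. the schema)
    still_relevant: bool = True   # set to False once the task has moved on


def rebudget(items: list[ContextItem]) -> tuple[list[ContextItem], int]:
    """Keep pinned or still-relevant items; return survivors and tokens freed."""
    kept = [it for it in items if it.pinned or it.still_relevant]
    freed = sum(it.tokens for it in items) - sum(it.tokens for it in kept)
    return kept, freed


window = [
    ContextItem("schema.sql", 1200, pinned=True),
    ContextItem("step1_a.py", 3000, still_relevant=False),
    ContextItem("step1_b.py", 2500, still_relevant=False),
    ContextItem("current_module.py", 4000),
]
kept, freed = rebudget(window)
```

Pinning is what separates this from automatic compaction: the schema stays word for word, and only the spent files leave.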
Independent retrieval tests across long-context models show facts placed in the middle of a packed window get recalled worse than facts at the start or end. So budgeting isn't only about staying under a limit. It's about placement. The constraint you most need honored belongs near the end of the context, not buried at turn one.
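Loading by manifest and placing constraints last can be sketched together. This assumes the manifest arrives as a list of dictionaries with `name` and `action` keys, which is an illustrative shape and not a documented format; `assemble_context` is a hypothetical helper.

```python
def assemble_context(
    manifest: list[dict],
    contents: dict[str, str],
    summaries: dict[str, str],
    constraints: str,
) -> str:
    """Build the context in manifest order, with constraints as the final block."""
    parts = []
    for entry in manifest:
        name, action = entry["name"], entry["action"]
        if action == "load":
            parts.append(f"## {name}\n{contents[name]}")
        elif action == "summarize":
            parts.append(f"## {name} (summary)\n{summaries[name]}")
        # "evict" entries are skipped: they never enter the context.
    # Constraints go last so they sit nearest the model's next token.
    parts.append(f"## Constraints\n{constraints}")
    return "\n\n".join(parts)
```

Evicted items are never read from `contents`, so a stale file cannot leak in by accident.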
Prompt-craft patterns for context control
Two patterns do most of the work, plus one rule people get wrong.
The three-way verdict. Force every candidate into exactly one bucket. No "maybe."
For each input, output one of:
- load: include verbatim (small + directly used)
- summarize: include a 3-5 line digest (large + relevant)
- evict: exclude (not touched by this task)
Give a one-line reason for each.
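The "exactly one bucket" rule is easy to check mechanically before you trust a manifest. The sketch below assumes each manifest entry is a dictionary with `name`, `action`, and `reason` keys; that shape and the `check_verdicts` helper are assumptions for illustration.

```python
VALID_ACTIONS = {"load", "summarize", "evict"}


def check_verdicts(candidates: list[str], manifest: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the manifest passes."""
    problems = []
    seen: dict[str, int] = {}
    for entry in manifest:
        name = entry.get("name", "")
        seen[name] = seen.get(name, 0) + 1
        if entry.get("action") not in VALID_ACTIONS:
            problems.append(f"{name}: invalid action {entry.get('action')!r}")
        if not str(entry.get("reason", "")).strip():
            problems.append(f"{name}: missing reason")
    # Every candidate needs exactly one verdict: no gaps, no duplicates.
    for name in candidates:
        count = seen.get(name, 0)
        if count != 1:
            problems.append(f"{name}: expected 1 verdict, got {count}")
    return problems
```

If the list comes back non-empty, re-run the budgeter or fix the manifest by hand before loading anything.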
Output headroom reservation. Tell the budgeter to leave room for the model to think and write.
Reserve at least 20% of {{window_tokens}} for the model's reasoning
and output. Budget the rest for loaded + summarized inputs.
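The arithmetic behind that rule is small enough to sketch. The 20% figure comes from the rule above; the `input_budget` helper and the 120,000-token example are assumptions for illustration. Integer math avoids floating-point rounding surprises.

```python
def input_budget(window_tokens: int, headroom_percent: int = 20) -> int:
    """Tokens left for loaded + summarized inputs after reserving headroom."""
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in [0, 100)")
    # Round the reservation up so headroom is never less than requested.
    reserved = (window_tokens * headroom_percent + 99) // 100
    return window_tokens - reserved


budget = input_budget(120_000)  # 24,000 reserved, 96,000 left for inputs
```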
And the contrarian take: summarizing is usually better than retrieval-on-demand for a single focused task. Pulling chunks mid-run sounds elegant, but it adds latency and the agent often retrieves the wrong chunk. For one bounded task, a tight upfront summary of the few relevant files beats a vector search you have to babysit. Save retrieval for genuinely open-ended exploration where you can't predict what's needed.
Variables you'll set
| Variable | Required | What it is |
|---|---|---|
| {{task}} | Yes | The specific job the agent will run this session |
| {{candidate_inputs}} | Yes | Every file, note, or decision you might load, unfiltered |
| {{window_tokens}} | Yes | The model's effective working size, not the advertised max |
One trust note: token estimates from a prompt are approximate. Treat the budget as a plan, not a guarantee, and keep an eye on actual usage if cost matters. The value is the load/summarize/evict decision, which holds up regardless of whether the estimate is off by a few thousand tokens.
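To keep an eye on how far an estimate drifts, a crude local check can help. The sketch below uses the common rule of thumb of roughly four characters per token for English text. That ratio is an assumption, varies by tokenizer and language, and is no substitute for your provider's real token counts; both helper names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def drift(estimated: int, actual: int) -> float:
    """Relative error of the estimate against usage reported by the API."""
    if estimated <= 0:
        raise ValueError("estimated must be positive")
    return (actual - estimated) / estimated
```

A positive drift means the plan underestimated. If it stays large across runs, shrink {{window_tokens}} instead of trusting the estimate.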
Getting started
- List every candidate input for the task, including the stuff you're tempted to skip.
- Find your model's effective window size (assume it's well under the advertised max).
- Paste the budgeting prompt and fill {{task}}, {{candidate_inputs}}, and {{window_tokens}}.
- Read the manifest. Override any verdict that looks wrong, then lock it.
- Load only what's approved, in the order given, with constraints near the end.
- Re-budget halfway through long runs to evict accumulated noise.
- Save the prompt so every run starts from the same triage. The Context Window Budget Harness ships this manifest contract ready to fill.
The Context Window Budget Harness does this end-to-end: a {{candidate_inputs}} variable feeds a budgeter that returns a load/summarize/evict manifest with per-item token estimates and a reserved output headroom, so the window holds the facts the task needs instead of last week's files. It's part of The Complete AI Prompts Bundle, a one-time lifetime license to the whole catalog and every pack added later, which is the cheaper path if you run more than one of these agent jobs.
Budgeting pairs naturally with planning: a tight plan means fewer things compete for the window, which is exactly why the task decomposition prompt for coding agents is worth running first. And once you've protected the context, you still have to grade what the agent produces, which is the subject of verifying AI coding agent output. Not sure whether to buy a pack or assemble your own? How to choose a reusable AI prompt pack weighs it honestly.
See the Agent Task Decomposition System Prompt →
Common questions
What is context window budgeting for an AI agent?
Why does my agent get worse on long tasks even under the token limit?
Can a prompt really manage context, or do I need code?