The AI Prompt to Review a Pull Request (With a Findings Contract)
Most code-review prompts are one-off and vague. Use an AI security code review prompt with a diff variable and a severity-rated findings contract on every PR.
A pull request review prompt that you retype from scratch every time isn't a workflow. It's a habit you'll skip the moment you're busy. The reusable version, with a real AI security code review prompt at its core, is the difference between catching the SQL injection on line 40 and merging it.
The prompts that rank today are ad hoc. The DEV post on code-review AI prompts offers solid tips, and Graphite's prompt-engineering guide for code reviews gives good guidance. Jose Casanova's PR review prompt is usable as-is, but it has no variables, no severity tiers, and no locked output shape. Run it twice and you get two different formats.
What's missing is a packaged review with a {{diff}} variable and a findings contract that produces the same severity-rated output on every PR. That's what makes reviews comparable instead of running on vibes.
What a real review prompt produces
A pull request review prompt is a saved prompt that takes a diff and returns ranked findings in a fixed shape: severity, location, issue, fix. The format is the product.
What you want from it on every PR:
- Review only the changed lines, not the whole file the change lives in
- Rate each finding by severity so you triage high before nits
- Pin each finding to a file and line, so a teammate can jump straight there
- Flag security issues specifically: injection, secrets, auth gaps, unsafe deserialization
- Suggest a concrete fix, not "consider improving this"
- Stay quiet when the diff is clean instead of inventing problems
- Return the same columns on Claude, ChatGPT, and Gemini
The security framing isn't decoration. The highest-cost review misses are security ones, and a general "review this code" prompt buries them under formatting opinions. Naming security as a first-class severity tier pulls them to the top.
The anatomy of the review prompt
The prompt takes the diff and the review focus, then emits a findings table under a locked contract.
Variables → {{diff}}, {{review_focus}}, {{language}}
Prompt → role: senior reviewer scoped to the changed lines
task: find correctness, security, and clarity issues
rule: rate severity; cite file:line; propose a fix
Output → table: severity | file:line | issue | suggested fix
The output contract goes last for a reason. On a long {{diff}}, a contract stated only at the top gets outweighed by the diff tokens that follow it, and the model slides back into prose. Restate the columns on the final line and the table holds, especially on GPT-4o.
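A minimal Python sketch of that ordering, assuming the prompt is assembled as a plain string. The helper name `build_review_prompt` and the instruction wording are illustrative, not the pack's actual text:

```python
# Hypothetical helper: states the output contract before the diff and
# restates it as the very last line, after the (possibly long) diff.
CONTRACT = "Output a table with columns: severity | file:line | issue | suggested fix"


def build_review_prompt(diff: str, language: str, review_focus: str = "full") -> str:
    """Return a review prompt whose final line restates the output contract."""
    parts = [
        f"You are a senior {language} reviewer. Review only the changed lines.",
        f"Focus: {review_focus}. Find correctness, security, and clarity issues.",
        "Rate severity, cite file:line, and propose a concrete fix for each finding.",
        CONTRACT,
        "Diff:",
        diff,
        # The restatement sits closest to where the model starts generating.
        "Reminder: " + CONTRACT,
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = build_review_prompt("+x = 1", "Python", "security-first")
    print(prompt.splitlines()[-1])
```

Keeping the contract in one constant means the top statement and the final-line restatement can't drift apart when you edit the columns.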
1. Gather inputs
Grab the unified diff for the PR (git diff main...HEAD), note the language, and decide the focus: security-first, correctness, or a full pass.
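A small Python sketch for this step, assuming you run it from inside the repository and that counting file extensions is a good enough proxy for the primary language. `pr_diff`, `guess_language`, and the extension map are hypothetical helpers, not part of any tool:

```python
import subprocess
from collections import Counter
from pathlib import PurePosixPath

# Illustrative mapping; extend it for the languages your repo actually uses.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".ts": "TypeScript",
    ".js": "JavaScript",
    ".go": "Go",
    ".rs": "Rust",
    ".java": "Java",
}


def pr_diff(base: str = "main") -> str:
    """Return `git diff <base>...HEAD`: HEAD compared to its merge base with base."""
    result = subprocess.run(
        ["git", "diff", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def guess_language(diff: str) -> str:
    """Guess {{language}} from the most common extension among changed files."""
    counts: Counter[str] = Counter()
    for line in diff.splitlines():
        # New-side file headers look like "+++ b/src/app.py"; deletions use /dev/null.
        if line.startswith("+++ ") and not line.endswith("/dev/null"):
            suffix = PurePosixPath(line[4:]).suffix
            if suffix in EXTENSION_TO_LANGUAGE:
                counts[EXTENSION_TO_LANGUAGE[suffix]] += 1
    return counts.most_common(1)[0][0] if counts else "unknown"
```

The three-dot form matters: it diffs against the merge base, so commits that landed on main after you branched don't show up as part of your change.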
2. Fill the variables
Paste the diff into {{diff}}, set {{language}}, and set {{review_focus}}. Keep the diff to the actual change. Don't paste whole files; that's what makes models review unchanged code.
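If you fill the variables in code rather than by hand, a sketch like this enforces the variable table in this guide: {{diff}} and {{language}} required, {{review_focus}} defaulting to full. The function name and error messages are assumptions for illustration:

```python
import re

# Hypothetical variable spec mirroring this guide's table.
REQUIRED = ("diff", "language")
DEFAULTS = {"review_focus": "full"}
ALLOWED_FOCUS = {"security-first", "correctness", "full"}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_variables(template: str, values: dict[str, str]) -> str:
    """Substitute {{name}} placeholders, enforcing required variables."""
    merged = {**DEFAULTS, **values}
    missing = [name for name in REQUIRED if not merged.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    if merged["review_focus"] not in ALLOWED_FOCUS:
        raise ValueError(f"unknown review_focus: {merged['review_focus']}")

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in merged:
            raise KeyError(f"template uses undefined variable: {name}")
        # Returned from a function, the value is inserted literally, so
        # backslashes inside the diff are not treated as escape sequences.
        return merged[name]

    return PLACEHOLDER.sub(substitute, template)
```

Failing loudly on an empty {{diff}} is deliberate: a review prompt run with no diff tends to review nothing and still produce output.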
3. Run the prompt
Run it. You get a severity-sorted findings table, not an essay: critical and high at the top, style nits at the bottom. If the table comes back as prose instead, the contract has most likely slid out of the model's attention on a long diff. That's the cue to restate the columns on the final line and run again. On Claude this rarely happens. On GPT-4o it happens just often enough that the restatement is worth baking into the saved prompt rather than fixing by hand each time.
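To catch drift automatically, you can check whether the reply contains a table with exactly the contract's columns. A Python sketch, assuming the model emits a Markdown pipe table; `parse_findings` is a hypothetical helper:

```python
from typing import Optional

# Column names from the findings contract, compared case-insensitively.
EXPECTED_HEADER = ["severity", "file:line", "issue", "suggested fix"]


def _cells(line: str) -> list[str]:
    # "| a | b |" -> ["a", "b"]; a literal "|" inside a cell would split it.
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_findings(output: str) -> Optional[list[dict[str, str]]]:
    """Parse the findings table, or return None if no contract table is present."""
    rows = [line for line in output.splitlines() if line.strip().startswith("|")]
    if not rows or [c.lower() for c in _cells(rows[0])] != EXPECTED_HEADER:
        return None  # prose (or a different table): the contract drifted
    findings = []
    for line in rows[1:]:
        cells = _cells(line)
        if all(set(cell) <= set("-: ") for cell in cells):
            continue  # the |---|---| separator row
        if len(cells) != len(EXPECTED_HEADER):
            continue  # malformed row; a stricter version could fail the run instead
        findings.append(dict(zip(EXPECTED_HEADER, cells)))
    return findings
```

One caveat: a one-line "no findings" reply, which the empty-review escape hatch described later in this guide explicitly allows, also returns None here, so check for that answer before treating None as drift.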
4. Triage by severity
Work top-down. Fix or push back on each critical and high finding. Most style nits can wait or get auto-formatted away.
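Once the table is parsed, top-down triage is just a sort on the severity tiers this guide defines. A sketch, assuming each finding is a dict with a "severity" key; `triage` and `must_address` are illustrative names:

```python
# Severity order from the tiers defined in this guide; lower rank = fix first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}
UNKNOWN_RANK = len(SEVERITY_RANK)  # labels outside the scale sort last


def rank(finding: dict[str, str]) -> int:
    return SEVERITY_RANK.get(finding["severity"].strip().lower(), UNKNOWN_RANK)


def triage(findings: list[dict[str, str]]) -> list[dict[str, str]]:
    """Sort findings critical-first; sorted() is stable, so ties keep model order."""
    return sorted(findings, key=rank)


def must_address(findings: list[dict[str, str]]) -> list[dict[str, str]]:
    """The critical and high rows to fix or push back on before merging."""
    return [f for f in triage(findings) if rank(f) <= SEVERITY_RANK["high"]]
```

Sinking unknown labels to the bottom, rather than raising, keeps one off-scale row from blocking the whole review; it also makes a model that invented its own scale easy to spot.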
5. Re-run on the updated diff
After changes, re-run on the new diff. The fixed findings drop off; new ones surface. Same contract, comparable output. This is where the saved prompt pays for itself: because the format is locked, you can diff two review runs and see at a glance which findings the author actually resolved. An ad-hoc re-ask gives you a fresh essay every time, with no way to line up "what was flagged" against "what got fixed." Over a week of PRs, that consistency is what turns AI review from a novelty into something the team actually trusts and acts on.
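Lining up two runs can be sketched like this. The key deliberately drops the line number, because edits shift lines between runs; matching on file path plus normalized issue text is an assumption that only holds if the model words a repeated finding the same way, so treat the result as a first pass:

```python
def _key(finding: dict[str, str]) -> tuple[str, str]:
    # Use the path from "path:line" plus whitespace/case-normalized issue text.
    path = finding["file:line"].rsplit(":", 1)[0]
    issue = " ".join(finding["issue"].lower().split())
    return path, issue


def compare_runs(
    before: list[dict[str, str]], after: list[dict[str, str]]
) -> tuple[list[dict[str, str]], list[dict[str, str]], list[dict[str, str]]]:
    """Split findings into (resolved, still_open, new) across two review runs."""
    before_by_key = {_key(f): f for f in before}
    after_by_key = {_key(f): f for f in after}
    resolved = [f for k, f in before_by_key.items() if k not in after_by_key]
    still_open = [f for k, f in after_by_key.items() if k in before_by_key]
    new = [f for k, f in after_by_key.items() if k not in before_by_key]
    return resolved, still_open, new
```

Note that still_open reports the newer row, so its file:line points at the current location of the code rather than the stale one.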
The most common failure of a review prompt is reviewing code the PR never touched. Pass the diff, state explicitly that only changed lines are in scope, and the model stops flagging pre-existing issues the author didn't introduce. Reviews get shorter, sharper, and far more likely to be acted on.
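Stating the scope in the prompt does most of the work; a post-filter is a cheap backstop. This Python sketch reads the hunk headers of a unified diff to find which new-file lines the PR added, then drops findings that cite anything else. It assumes git's default "b/" path prefix and findings parsed into dicts with a "file:line" key; both helper names are hypothetical:

```python
import re

# Captures the new-file start line from headers like "@@ -38,3 +38,4 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_lines(diff: str) -> set[tuple[str, int]]:
    """Return (path, new-file line number) for every added line in a unified diff."""
    changed: set[tuple[str, int]] = set()
    path = None
    new_line = 0
    for line in diff.splitlines():
        if line.startswith("+++ "):
            target = line[4:]
            # Strip git's default "b/" prefix; a deleted file's target is /dev/null.
            path = None if target == "/dev/null" else target.removeprefix("b/")
            continue
        header = HUNK_HEADER.match(line)
        if header:
            new_line = int(header.group(1))
            continue
        if path is None:
            continue
        if line.startswith("+"):
            changed.add((path, new_line))
            new_line += 1
        elif line.startswith(" "):
            new_line += 1  # context lines exist on both sides of the diff
        # "-" (removed) lines and "\ No newline" markers don't advance new_line.
    return changed


def in_scope(findings: list[dict[str, str]], diff: str) -> list[dict[str, str]]:
    """Keep only findings that cite a line the PR actually added or changed."""
    scope = changed_lines(diff)
    kept = []
    for finding in findings:
        path, _, line_no = finding["file:line"].rpartition(":")
        if line_no.isdigit() and (path, int(line_no)) in scope:
            kept.append(finding)
    return kept
```

A trade-off to keep in mind: a finding about a deleted line has no new-file line number, so this filter drops it, which may or may not be what you want for removals.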
Prompt-craft patterns for sharper reviews
Three patterns turn a chatty reviewer into a useful one.
Severity tiers defined in the prompt. Don't let the model invent its own scale.
Severity tiers:
- critical: security hole, data loss, or broken core behavior
- high: likely bug or correctness risk
- medium: maintainability / unclear logic
- low: style / naming (group these; don't itemize)
The empty-review escape hatch. Tell it that "no findings" is a valid answer.
If the diff has no issues at or above medium severity, say so in one
line. Do not manufacture findings to fill the table.
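The instruction above is the real mechanism, but you can enforce the same rule on your side after parsing. A hypothetical `render` helper that keeps rows at or above a threshold (medium by default, which also drops the low-severity nits) and falls back to a one-line all-clear:

```python
# Same tiers as the prompt; lower rank = more severe.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}
HEADER = "| severity | file:line | issue | suggested fix |\n|---|---|---|---|"


def render(findings: list[dict[str, str]], threshold: str = "medium") -> str:
    """Render rows at or above `threshold`, or a one-line all-clear."""
    cutoff = SEVERITY_RANK[threshold]
    kept = [
        f
        for f in findings
        # Unknown labels get a rank past "low", so they are dropped too.
        if SEVERITY_RANK.get(f["severity"].strip().lower(), len(SEVERITY_RANK)) <= cutoff
    ]
    if not kept:
        return f"No findings at or above {threshold} severity."
    rows = [
        f"| {f['severity']} | {f['file:line']} | {f['issue']} | {f['suggested fix']} |"
        for f in kept
    ]
    return "\n".join([HEADER, *rows])
```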
File:line citation. Force a location on every row, or the finding is unactionable.
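If you parse the table programmatically, one cheap guard is to flag rows whose location cell doesn't look like path:line. A sketch, assuming findings are dicts keyed by column name; accepting a range such as "db.py:40-44" is an assumption about how your prompt phrases citations:

```python
import re

# Hypothetical citation format: "path/to/file.ext:42" or a range "path:42-45".
CITATION = re.compile(r"^[^\s:]+:\d+(?:-\d+)?$")


def uncited(findings: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return findings whose file:line cell isn't a usable location."""
    return [f for f in findings if not CITATION.match(f.get("file:line", "").strip())]
```

A non-empty result is a reasonable signal to re-run with the citation rule restated rather than patch the rows by hand.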
Here's the take most review-prompt posts won't commit to: turn off low-severity nitpicks for AI review entirely. Style is a linter's job, and a formatter settles it without an argument. When the model spends half its output on naming preferences, reviewers learn to ignore the whole table, including the critical row. Make the AI review do what linters can't: spot the logic bug and the injection. Let tooling handle the rest.
Variables you'll set
| Variable | Required | What it is |
|---|---|---|
| {{diff}} | Yes | The unified diff of the PR, changed lines only |
| {{language}} | Yes | Primary language, so the model applies the right idioms |
| {{review_focus}} | No | security-first, correctness, or full; defaults to full |
A trust note worth stating: an AI reviewer misses things and occasionally hallucinates a finding on code it misread. It augments human review; it doesn't replace it. Treat critical and high findings as leads to verify, not verdicts to merge on. And re-confirm behavior after any model update, since review sharpness can shift between versions.
Getting started
- Generate the PR diff with git diff main...HEAD and copy it.
- Decide the focus. For anything touching auth, payments, or input handling, go security-first.
- Paste the review prompt and fill {{diff}}, {{language}}, and {{review_focus}}.
- Read the table top-down. Address critical and high before anything else.
- Re-run on the updated diff to confirm the critical and high findings are gone.
- Keep the same prompt across the team so reviews are comparable.
- Save it so every PR starts from the same contract. The Pull Request Review Workflow Pack ships this {{diff}}-driven review with the severity table built in.
The Pull Request Review Workflow Pack does this end-to-end: a {{diff}} variable feeds a reviewer that returns a severity-rated findings table (critical through low) with file:line citations and a suggested fix per row, plus the GPT-4o final-line restatement baked in so the table doesn't drift. It's part of The Complete AI Prompts Bundle, a one-time lifetime license to the whole catalog and every future pack, which earns out quickly if you run more than one of these coding jobs.
A review prompt grades a diff against issues; a rubric grades an agent's whole change against an outcome, which is why verifying AI coding agent output is the natural next read. And many of the issues a reviewer catches are injection-shaped, so the defenses in prompt injection defense for coding agents are worth pairing with this. If you're weighing a saved pack against a homemade prompt, how to choose a reusable AI prompt pack covers the math.
See the Agent Code Output Verification Rubric →