
The AI Prompt to Review a Pull Request (With a Findings Contract)

Most code-review prompts are one-off and vague. Use an AI security code review prompt with a diff variable and a severity-rated findings contract on every PR.

PromptsCart Team · June 1, 2026 · Updated June 14, 2026 · 7 min read

A pull request review prompt that you retype from scratch every time isn't a workflow. It's a habit you'll skip the moment you're busy. The reusable version, with a real AI security code review prompt at its core, is the difference between catching the SQL injection on line 40 and merging it.

The prompts that rank today are ad-hoc. The DEV post on code-review AI prompts offers solid tips, and Graphite's prompt-engineering guide for code reviews gives good guidance. Jose Casanova's PR review prompt is a usable prompt, but it has no variables, no severity tiers, and no locked output shape. Run it twice and you get two different formats.

The gap is a packaged review: a {{diff}} variable plus a findings contract that produces the same severity-rated output on every PR. That's what makes reviews comparable instead of vibes.

What a real review prompt produces

A pull request review prompt is a saved prompt that takes a diff and returns ranked findings in a fixed shape: severity, location, issue, fix. The format is the product.

What you want from it on every PR:

Review only the changed lines, not the whole file the change lives in
Rate each finding by severity so you triage high before nits
Pin each finding to a file and line, so a teammate can jump straight there
Flag security issues specifically: injection, secrets, auth gaps, unsafe deserialization
Suggest a concrete fix, not "consider improving this"
Stay quiet when the diff is clean instead of inventing problems
Return the same columns on Claude, ChatGPT, and Gemini

The security framing isn't decoration. The highest-cost review misses are security ones, and a general "review this code" prompt buries them under formatting opinions. Naming security as a first-class severity tier pulls them to the top.

The anatomy of the review prompt

The prompt takes the diff and the review focus, then emits a findings table under a locked contract.

Variables → {{diff}}, {{review_focus}}, {{language}}
Prompt    → role: senior reviewer scoped to the changed lines
            task: find correctness, security, and clarity issues
            rule: rate severity; cite file:line; propose a fix
Output    → table: severity | file:line | issue | suggested fix

The output contract goes last for a reason. On a long {{diff}}, a contract stated only at the top gets outweighed by the more recent diff tokens, and the model slides back into prose. Restate the columns on the final line and the table holds, especially on GPT-4o.
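Here is a minimal sketch of that ordering in Python, assuming you assemble the prompt in code rather than pasting it by hand. The section wording and the `build_review_prompt` name are illustrative, not the pack's actual text; the point is that the column contract appears once before the diff and again as the very last line.

```python
# Hypothetical wording; adapt to your own saved prompt.
CONTRACT = "Output: a table with columns severity | file:line | issue | suggested fix"


def build_review_prompt(diff: str, language: str, review_focus: str = "full") -> str:
    """Assemble the review prompt with the output contract stated before the diff and restated last."""
    sections = [
        "Role: senior reviewer. Only the changed lines in the diff are in scope.",
        f"Language: {language}. Review focus: {review_focus}.",
        "Task: find correctness, security, and clarity issues.",
        "Rules: rate severity; cite file:line; propose a concrete fix.",
        CONTRACT,
        "Diff:",
        diff,
        # Restating the contract after the diff keeps it among the most recent tokens.
        "Reminder: " + CONTRACT,
    ]
    return "\n\n".join(sections)
```

Because the restatement is part of the function, every saved run gets it for free instead of someone remembering to add it after the table drifts.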

1. Gather inputs

Grab the unified diff for the PR (git diff main...HEAD), note the language, and decide the focus: security-first, correctness, or a full pass.

2. Fill the variables

Paste the diff into {{diff}}, set {{language}}, and set {{review_focus}}. Keep the diff to the actual change. Don't paste whole files; that's what makes models review unchanged code.
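If you script the fill step, a cheap guard catches the whole-file mistake before it reaches the model. This is a heuristic sketch, and `looks_like_unified_diff` is a hypothetical helper: it only checks for at least one `@@` hunk header and at least one added or removed line, which ordinary text diffs from `git diff` contain.

```python
import re

# Hunk header shape: "@@ -old_start[,old_len] +new_start[,new_len] @@"
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@", re.MULTILINE)


def looks_like_unified_diff(text: str) -> bool:
    """Heuristic: at least one hunk header and at least one added or removed line."""
    if not HUNK_HEADER.search(text):
        return False  # pasted source code has no hunk headers
    return any(
        line.startswith(("+", "-")) and not line.startswith(("+++ ", "--- "))
        for line in text.splitlines()
    )
```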

3. Run the prompt

Run it. You get a severity-sorted findings table, not an essay. High and critical at the top, style nits at the bottom. If the table comes back as prose instead, the contract slid out of the model's attention on a long diff; that's the cue to restate the columns on the final line and run again. On Claude this rarely happens. On GPT-4o it happens just often enough that the restatement is worth baking into the saved prompt rather than fixing by hand each time.
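You can automate the "did it come back as a table?" check. A sketch, assuming the model emits a pipe table whose header row matches the four contract columns, and that a clean review starts with the words "no findings"; both are assumptions about your saved prompt's wording, and `has_findings_table` is a hypothetical name.

```python
CONTRACT_HEADER = ["severity", "file:line", "issue", "suggested fix"]


def has_findings_table(response: str) -> bool:
    """True if the response honors the contract: a matching header row or a 'no findings' line."""
    if response.strip().lower().startswith("no findings"):
        return True  # the escape hatch for a clean diff
    for line in response.splitlines():
        cells = [c.strip().lower() for c in line.strip().strip("|").split("|")]
        if cells == CONTRACT_HEADER:
            return True
    return False  # prose drift: restate the columns on the final line and re-run
```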

4. Triage by severity

Work top-down. Fix or push back on each high finding. Most style nits can wait or get auto-formatted away.
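Once the table is parsed, triage is a sort. The `Finding` class, `parse_findings`, and `triage` below are hypothetical helpers; the parser assumes one finding per pipe-table row and will mis-split an issue whose text itself contains a `|` character.

```python
from dataclasses import dataclass

# Lower rank sorts first; tiers match the severity scale defined in the prompt.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    severity: str
    location: str  # "path:line"
    issue: str
    fix: str


def parse_findings(table: str) -> list[Finding]:
    """Turn pipe-table rows into Finding objects, skipping anything that isn't a finding row."""
    findings = []
    for line in table.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Header, separator (---), and prose lines fail one of these checks.
        if len(cells) != 4 or cells[0].lower() not in SEVERITY_RANK:
            continue
        findings.append(Finding(cells[0].lower(), cells[1], cells[2], cells[3]))
    return findings


def triage(findings: list[Finding]) -> list[Finding]:
    """Critical first, low last; sorted() is stable, so ties keep the model's order."""
    return sorted(findings, key=lambda f: SEVERITY_RANK[f.severity])
```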

5. Re-run on the updated diff

After changes, re-run on the new diff. The fixed findings drop off; new ones surface. Same contract, comparable output. This is where the saved prompt pays for itself: because the format is locked, you can diff two review runs and see at a glance which findings the author actually resolved. An ad-hoc re-ask gives you a fresh essay every time, with no way to line up "what was flagged" against "what got fixed." Over a week of PRs, that consistency is what turns AI review from a novelty into something the team actually trusts and acts on.
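Lining up two runs can be scripted too. A sketch under one loud assumption: findings are matched on file path plus normalized issue text, with the line number dropped because it shifts as the author edits. If the model rewords an issue between runs, it will show up as one resolved and one new finding, so treat the output as a starting point for the comparison, not a verdict. `finding_key` and `compare_runs` are hypothetical helpers.

```python
def finding_key(location: str, issue: str) -> tuple[str, str]:
    """Match on file path plus normalized issue text; line numbers shift between revisions."""
    path = location.rpartition(":")[0] or location
    return (path, " ".join(issue.lower().split()))


def compare_runs(before, after):
    """before/after: iterables of (location, issue) pairs from two review runs."""
    old = {finding_key(loc, issue) for loc, issue in before}
    new = {finding_key(loc, issue) for loc, issue in after}
    return {
        "resolved": sorted(old - new),
        "introduced": sorted(new - old),
        "still_open": sorted(old & new),
    }
```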

Scope the review to the diff, not the file

The most common failure of a review prompt is reviewing code the PR never touched. Pass the diff, state explicitly that only changed lines are in scope, and the model stops flagging pre-existing issues the author didn't introduce. Reviews get shorter, sharper, and far more likely to be acted on.
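To enforce the scope mechanically, you can compute which new-side lines a diff actually adds and flag any finding that cites a line outside them. A minimal sketch, assuming standard git unified-diff output; `changed_lines` and `in_scope` are hypothetical helpers, and edge cases such as renames are not handled.

```python
import re

# New-side start line of a hunk: "@@ -old_start,old_len +new_start,new_len @@"
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_lines(diff: str) -> dict[str, set[int]]:
    """Map each file in a unified diff to the new-side line numbers it adds."""
    result: dict[str, set[int]] = {}
    path, line_no = None, 0
    for line in diff.splitlines():
        if line.startswith("+++ "):
            target = line[4:].strip()
            path = target[2:] if target.startswith("b/") else target
            result.setdefault(path, set())
            continue
        match = HUNK.match(line)
        if match:
            line_no = int(match.group(1))
        elif path is None or line.startswith(("-", "\\")):
            continue  # file headers, removed lines, "\ No newline at end of file"
        elif line.startswith("+"):
            result[path].add(line_no)
            line_no += 1
        else:
            line_no += 1  # context line exists on both sides
    return result


def in_scope(location: str, changed: dict[str, set[int]]) -> bool:
    """True if a 'path:line' citation points at a line the diff added."""
    path, _, num = location.rpartition(":")
    return num.isdigit() and int(num) in changed.get(path, set())
```

One caveat on using this as a filter: a finding about deleted code (say, a removed auth check) has no new-side line, so review out-of-scope rows by hand rather than silently dropping them.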

Prompt-craft patterns for sharper reviews

Three patterns turn a chatty reviewer into a useful one.

Severity tiers defined in the prompt. Don't let the model invent its own scale.

Severity tiers:
- critical: security hole, data loss, or broken core behavior
- high: likely bug or correctness risk
- medium: maintainability / unclear logic
- low: style / naming (group these; don't itemize)

The empty-review escape hatch. Tell it that "no findings" is a valid answer.

If the diff has no issues at or above medium severity, say so in one
line. Do not manufacture findings to fill the table.

File:line citation. Force a location on every row, or the finding is unactionable.
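These patterns are also easy to check on the way out. A sketch that flags rows breaking the contract, assuming the tier names above and a `path:line` location format; `row_problems` is hypothetical, and the location regex is deliberately strict (no spaces in paths, no line ranges).

```python
import re

TIERS = {"critical", "high", "medium", "low"}
LOCATION = re.compile(r"^[\w./-]+:\d+$")  # e.g. src/db.py:40


def row_problems(severity: str, location: str) -> list[str]:
    """List the ways a finding row breaks the contract; an empty list means it passes."""
    problems = []
    if severity.strip().lower() not in TIERS:
        problems.append(f"severity {severity!r} is not one of the defined tiers")
    if not LOCATION.match(location.strip()):
        problems.append(f"location {location!r} is not a file:line citation")
    return problems
```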

Here's the take most review-prompt posts won't commit to: turn off low-severity nitpicks for AI review entirely. Style is a linter's job, and a formatter settles it without an argument. When the model spends half its output on naming preferences, reviewers learn to ignore the whole table, including the critical row. Make the AI review do what linters can't: spot the logic bug and the injection. Let tooling handle the rest.

Variables you'll set

Variable	Required	What it is
`{{diff}}`	Yes	The unified diff of the PR, changed lines only
`{{language}}`	Yes	Primary language, so the model applies the right idioms
`{{review_focus}}`	No	security-first, correctness, or full; defaults to full
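If you fill the variables in code instead of by hand, the table above translates into a small validation step. A sketch, assuming `{{name}}` placeholders and the three focus values listed; `fill` is a hypothetical helper, not part of any pack.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")
REQUIRED = {"diff", "language"}
DEFAULTS = {"review_focus": "full"}
FOCUS_VALUES = {"security-first", "correctness", "full"}


def fill(template: str, values: dict[str, str]) -> str:
    """Substitute {{name}} placeholders after checking required and optional variables."""
    merged = {**DEFAULTS, **values}
    missing = sorted(name for name in REQUIRED if not merged.get(name))
    if missing:
        raise ValueError(f"missing required variables: {missing}")
    if merged["review_focus"] not in FOCUS_VALUES:
        raise ValueError(f"unknown review_focus: {merged['review_focus']!r}")

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in merged:
            raise KeyError(f"template uses an unset variable: {name}")
        return merged[name]  # returned text is inserted literally, backslashes included

    return PLACEHOLDER.sub(substitute, template)
```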

A trust note worth stating: an AI reviewer misses things and occasionally hallucinates a finding on code it misread. It augments human review; it doesn't replace it. Treat critical and high findings as leads to verify, not verdicts to merge on. And re-confirm behavior after any model update, since review sharpness can shift between versions.

Getting started

Generate the PR diff with git diff main...HEAD and copy it.
Decide the focus. For anything touching auth, payments, or input handling, go security-first.
Paste the review prompt, fill {{diff}}, {{language}}, and {{review_focus}}.
Read the table top-down. Address critical and high before anything else.
Re-run on the updated diff to confirm the high findings are gone.
Keep the same prompt across the team so reviews are comparable.
Save it so every PR starts from the same contract. The Pull Request Review Workflow Pack ships this {{diff}}-driven review with the severity table built in.


Skip the setup

The Pull Request Review Workflow Pack does this end-to-end: a {{diff}} variable feeds a reviewer that returns a severity-rated findings table (critical through low) with file:line citations and a suggested fix per row, plus the GPT-4o final-line restatement baked in so the table doesn't drift. It's part of The Complete AI Prompts Bundle, a one-time lifetime license to the whole catalog and every future pack, which earns out quickly if you run more than one of these coding jobs.


A review prompt grades a diff against issues; a rubric grades an agent's whole change against an outcome, which is why verifying AI coding agent output is the natural next read. And many of the issues a reviewer catches are injection-shaped, so the defenses in prompt injection defense for coding agents are worth pairing with this. If you're weighing a saved pack against a homemade prompt, how to choose a reusable AI prompt pack covers the math.


FAQ


What makes a good AI security code review prompt?

A good AI security code review prompt takes the diff as a variable, scopes the review to changed lines, and returns findings in a fixed shape: severity, file and line, the issue, and a suggested fix. Without a severity-rated output contract, the model returns a wall of prose you still have to triage by hand.

Why use a saved prompt instead of just asking ChatGPT to review my code?

An ad-hoc ask gives you a different format every time and quietly reviews the whole file instead of the change. A saved prompt with a {{diff}} variable and a findings contract produces the same severity-rated output on every PR, so reviews are comparable and skimmable.

Does the review prompt behave the same on Claude and GPT-4o?

Close, with a couple of tweaks. Claude reliably honors a findings table under an output-format heading. GPT-4o needs the column contract restated on the final line of the prompt, or it drifts back into prose. Gemini tends to flag style nits as high severity, so define the severity tiers explicitly.
