
A Security Code Review Prompt Mapped to CWE

Build a security code review prompt that maps findings to CWE IDs and ranks them by risk: not a one-off research prompt, but a reusable, triaged output contract.

PromptsCart Team · December 27, 2025 · Updated June 14, 2026 · 7 min read

A security review prompt that says "this code might have vulnerabilities" has done nothing. Nobody can file that, fix it, or prove it's resolved. The research papers and one-off prompts that dominate this topic stop right there: they show that a model can find a flaw, then leave you with prose you can't act on.

The missing piece is structure. A security code review prompt earns its place when every finding carries a CWE ID, a severity, the exact location, and a fix, ranked so the exploitable thing is at the top. That's the difference between a demo and a tool you run on every diff.

This post is about building that prompt: mapped to CWE, triaged by risk, and reusable across files instead of rewritten each time.

Why "find the vulnerabilities" isn't enough

Point a model at code and ask for security issues, and you get a grab bag. Some real, some imagined, none ranked, none tied to anything you can track. Three failures repeat:

No taxonomy. Without CWE or OWASP categories, the model describes flaws in its own words, so the same bug gets named differently every run and you can't dedupe.
No severity anchor. A reflected XSS and a verbose error message read as equally urgent, so triage is impossible.
No remediation. "Sanitise the input" isn't a fix. Which input, which sink, what does safe look like here?

A mapped, triaged contract closes all three. CWE gives findings a stable identity. A severity scale ranks them. A required fix column forces the model from "this is risky" to "do this."

Definition first

A security code review prompt is a reusable prompt that scans code for exploitable weaknesses, maps each to a CWE ID, rates its severity and likelihood, and returns a triaged list with concrete remediation. The CWE mapping is what turns a vague warning into a trackable finding.

Anatomy of a CWE-mapped review prompt

Variables → {{code_or_diff}}, {{language}}, {{trust_boundaries}}
Role      → Application security reviewer.
Scope     → OWASP Top 10 plus the language's common CWE classes
Per finding → CWE-ID | TITLE | FILE:LINE | severity | likelihood | exploit path | fix
Triage    → sort by severity × likelihood (risk), highest first
Verdict   → counts by severity, and a ship/hold recommendation
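To hold the model to the per-finding row, you can parse its output and reject rows that break the contract. Below is a minimal Python sketch. It assumes the model emits one pipe-delimited row per finding in the column order above, and that no cell itself contains a `|`. The `Finding` class and `parse_finding` function are hypothetical names, not part of any tool.

```python
import re
from dataclasses import dataclass

# Hypothetical record for one row of the contract:
# CWE-ID | TITLE | FILE:LINE | severity | likelihood | exploit path | fix
@dataclass
class Finding:
    cwe: str
    title: str
    file: str
    line: int
    severity: str
    likelihood: str
    exploit_path: str
    fix: str

CWE_RE = re.compile(r"^CWE-\d+$")
LOCATION_RE = re.compile(r"^(?P<file>.+):(?P<line>\d+)$")

def parse_finding(row: str) -> Finding:
    """Parse one pipe-delimited row; raise ValueError if it breaks the contract."""
    cells = [cell.strip() for cell in row.split("|")]
    if len(cells) != 7:
        raise ValueError(f"expected 7 columns, got {len(cells)}: {row!r}")
    cwe, title, location, severity, likelihood, exploit_path, fix = cells
    if not CWE_RE.match(cwe):
        raise ValueError(f"not a CWE ID: {cwe!r}")
    loc = LOCATION_RE.match(location)
    if loc is None:
        raise ValueError(f"location must be FILE:LINE, got {location!r}")
    # Lower-case the rating columns so "High" and "high" compare equal downstream.
    return Finding(cwe, title, loc["file"], int(loc["line"]),
                   severity.lower(), likelihood.lower(), exploit_path, fix)
```

A row such as `CWE-89 | SQL injection | app/db.py:142 | High | high | id param reaches query | parameterize` parses cleanly. A row that says `SQLi` where the CWE ID belongs raises, so a drifting output format is caught immediately and never silently accepted.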

The {{trust_boundaries}} variable matters more than it looks. Most exploitable bugs live where untrusted input crosses into trusted code: a request body hitting a query, a filename reaching the filesystem. Tell the model where those boundaries are and it raises far fewer false positives. Leave it out and it tends to flag every string concatenation it sees.
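This is the kind of crossing `{{trust_boundaries}}` should point at: a request value reaching a SQL query. The sketch below is minimal and self-contained, using Python's built-in `sqlite3`. The `users` table and both function names are made up for illustration. The unsafe version is a CWE-89 finding. The safe version binds the value as a parameter.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    # Illustrative in-memory database with two users.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn

def find_user_unsafe(conn: sqlite3.Connection, name_from_request: str):
    # Trust boundary crossed: untrusted request data is spliced into SQL text (CWE-89).
    query = f"SELECT id, name FROM users WHERE name = '{name_from_request}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, name_from_request: str):
    # Same boundary, but the value is bound as a parameter and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name_from_request,)
    ).fetchall()
```

With the payload `x' OR '1'='1`, the unsafe function returns every row. The safe one returns nothing, because the quote is treated as data rather than SQL.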

Model behavior on security review

Security review punishes sloppy prompting harder than style review, because a missed finding is an open door.

Claude tends to hold the CWE mapping and the per-finding columns across a long file, and it's comfortable saying a stretch of code is clean. GPT-4o is stronger at spotting the obscure injection sink but more likely to inflate severity and to keep listing low-value nits unless you anchor the scale. Both improve dramatically when you name the language: a security review prompt that knows it's reading Python flags `pickle.loads` on untrusted data and `subprocess` calls with `shell=True`; the same prompt unaware of the language treats them as ordinary calls.
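One way to act on this is to pass the model a short, language-specific list of sinks alongside `{{language}}`. The sketch below assumes a hand-maintained hint table. `SINK_HINTS` and `sink_hint_block` are hypothetical names, and the entries are illustrative rather than a complete catalogue.

```python
from __future__ import annotations

# Hypothetical, deliberately incomplete hint table: sinks to treat as dangerous
# per language, each paired with the CWE it most often maps to.
SINK_HINTS: dict[str, list[tuple[str, str]]] = {
    "python": [
        ("pickle.loads on untrusted bytes", "CWE-502"),
        ("subprocess.* with shell=True", "CWE-78"),
        ("eval / exec on user input", "CWE-95"),
        ("SQL built with f-strings or %", "CWE-89"),
    ],
    "javascript": [
        ("child_process.exec with user input", "CWE-78"),
        ("eval / new Function on user input", "CWE-95"),
        ("element.innerHTML = user input", "CWE-79"),
    ],
}

def sink_hint_block(language: str) -> str:
    """Render the language-specific sink list to append to the review prompt."""
    hints = SINK_HINTS.get(language.lower())
    if not hints:
        # Unknown language: fall back to the generic instruction.
        return f"Language: {language}. Apply that language's common CWE classes."
    lines = [f"Language: {language}. Treat these sinks as dangerous:"]
    lines += [f"- {sink} ({cwe})" for sink, cwe in hints]
    return "\n".join(lines)
```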

Behavior	Claude	GPT-4o
Stable CWE mapping across a file	Reliable	Reliable when the contract is restated last
Severity discipline	Holds with anchors	Inflates without explicit anchors
Finding obscure sinks	Good	Often better
Calling clean code clean	Comfortable	Tends to pad with low findings

Opinion worth holding

Run the security pass separately from the general review. Bundling "check security" into a do-everything review prompt guarantees it gets the leftover attention. A dedicated pass with its own CWE contract finds more, because the model isn't also juggling style and architecture in the same breath.

Prompt-craft patterns for security work

Pattern 1: demand the exploit path

Require a one-line "how it's exploited" per finding. This single column kills most false positives, because the model has to actually trace input to sink. If it can't write the path, the finding usually isn't real.
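Enforced in code, the rule is simple: a finding with no real exploit path goes to a human for review instead of into the report. The sketch below is minimal. The placeholder strings in `NON_ANSWERS` are an assumption about what a model writes when it cannot trace a path, and `split_by_exploit_path` is a hypothetical name.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    cwe: str
    location: str
    exploit_path: str

# Assumed stand-ins a model writes when it cannot actually trace input to sink.
NON_ANSWERS = {"", "-", "n/a", "none", "unknown", "unclear", "tbd"}

def split_by_exploit_path(findings):
    """Return (kept, set_aside): findings with a traced path, and those without one."""
    kept, set_aside = [], []
    for f in findings:
        if f.exploit_path.strip().lower() in NON_ANSWERS:
            set_aside.append(f)  # usually not real; a human confirms before filing
        else:
            kept.append(f)
    return kept, set_aside
```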

Pattern 2: rank by risk, not by count

Risk is severity times likelihood. A critical flaw that needs admin access ranks below a medium flaw any anonymous user can hit. Make the model compute and sort on that, so the report leads with what an attacker reaches first.
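The arithmetic is also easy to do outside the model, which lets you check its ordering. The sketch below assumes ordinal scales of 1 to 4 for severity and 1 to 3 for likelihood. The post fixes only the formula, not the numbers. With these scales, a critical flaw behind admin access scores 4 × 1 = 4, and a medium flaw any anonymous user can reach scores 2 × 3 = 6, so the medium flaw leads.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed ordinal scales: risk = severity x likelihood, numbers chosen for illustration.
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}
LIKELIHOOD = {"low": 1, "medium": 2, "high": 3}

@dataclass
class Finding:
    cwe: str
    severity: str
    likelihood: str

    @property
    def risk(self) -> int:
        return SEVERITY[self.severity] * LIKELIHOOD[self.likelihood]

def triage(findings: list[Finding]) -> list[Finding]:
    """Highest risk first; on a tie, the more severe finding leads."""
    return sorted(findings, key=lambda f: (f.risk, SEVERITY[f.severity]), reverse=True)
```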

Pattern 3: anchor severity to impact

critical: remote code execution, auth bypass, or data exfiltration
high:     injection or access-control flaw with a clear exploit path
medium:   exploitable only with chaining or unusual preconditions
low:      hardening gap; not directly exploitable

Without anchors, "high" means whatever the model felt. With them, severity is comparable across runs and across reviewers.
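If you keep the anchors in a single table, the prompt text and the check on the model's output stay in sync. The same table can also drive the anatomy's Verdict line. In the sketch below, `ANCHORS`, `anchor_block`, and `verdict` are hypothetical names. The rule that any critical or high finding means hold is an assumed policy, not one the post prescribes.

```python
from __future__ import annotations

from collections import Counter

# The four severity anchors, kept in one place for prompt and validator alike.
ANCHORS = {
    "critical": "remote code execution, auth bypass, or data exfiltration",
    "high": "injection or access-control flaw with a clear exploit path",
    "medium": "exploitable only with chaining or unusual preconditions",
    "low": "hardening gap; not directly exploitable",
}

def anchor_block() -> str:
    """Render the anchors as the severity scale to paste into the prompt."""
    return "\n".join(f"{level}: {meaning}" for level, meaning in ANCHORS.items())

def verdict(severities: list[str]) -> tuple[dict[str, int], str]:
    """Count findings per anchored level and recommend ship or hold.

    Off-scale labels such as "severe" are rejected rather than guessed.
    Hold-on-any-critical-or-high is an assumed policy.
    """
    for s in severities:
        if s not in ANCHORS:
            raise ValueError(f"severity {s!r} is not one of {list(ANCHORS)}")
    counts = Counter(severities)
    summary = {level: counts[level] for level in ANCHORS}
    decision = "hold" if summary["critical"] or summary["high"] else "ship"
    return summary, decision
```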

Variables you'll set

Variable	Required	What it is
`{{code_or_diff}}`	Yes	The code or diff under review
`{{language}}`	Yes	Language and framework, so the model knows the dangerous sinks
`{{trust_boundaries}}`	No	Where untrusted input enters the code
`{{scope}}`	No	Narrow to a CWE class or OWASP category
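A small renderer can enforce the Required column before anything reaches the model. The sketch below assumes `{{name}}` placeholders. The default wording for the optional variables is illustrative, and `render` is a hypothetical helper.

```python
import re

REQUIRED = {"code_or_diff", "language"}
# Illustrative defaults for the optional variables (assumed wording).
OPTIONAL_DEFAULTS = {
    "trust_boundaries": "not specified; infer entry points from the code",
    "scope": "OWASP Top 10 plus the language's common CWE classes",
}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def render(template, values):
    """Fill {{name}} placeholders, failing fast on missing required variables."""
    missing = sorted(n for n in REQUIRED if not values.get(n, "").strip())
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    merged = {**OPTIONAL_DEFAULTS, **values}

    def fill(match):
        name = match.group(1)
        if name not in merged:
            raise KeyError(f"template uses unknown variable: {name}")
        return merged[name]

    # A function replacement is inserted literally and not rescanned, so braces
    # or backslashes inside the code under review are left untouched.
    return PLACEHOLDER.sub(fill, template)
```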

Getting started

Set the CWE classes in scope. The OWASP Top 10 plus your language's usual suspects is a strong start.
Write the per-finding row with a CWE column and an exploit-path column. Lock it.
Always fill {{language}}; it changes which sinks the model treats as dangerous.
Run it on code with a known CVE-style bug. Did it land the right CWE and rank it correctly?
Add "report clean files as clean; do not pad with low findings."
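Both the seeded-bug check and the clean-file rule can be scored automatically. The sketch below assumes you have already reduced each run to its CWE IDs in risk order. `check_case` is a hypothetical helper, and `None` marks a file known to be clean.

```python
from __future__ import annotations

def check_case(ranked_cwes: list[str], expected_top: str | None) -> list[str]:
    """Return the problems with one review run; an empty list means it passed."""
    problems = []
    if expected_top is None:
        # Known-clean file: anything reported is padding.
        if ranked_cwes:
            problems.append(f"clean file padded with {len(ranked_cwes)} finding(s)")
        return problems
    if expected_top not in ranked_cwes:
        problems.append(f"missed seeded {expected_top}")
    elif ranked_cwes[0] != expected_top:
        rank = ranked_cwes.index(expected_top) + 1
        problems.append(f"{expected_top} found but ranked #{rank}")
    return problems
```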
For a full assessment rather than a single file, the Application Security Audit Playbook runs a four-phase pass: map the threat surface, review against OWASP Top 10, prioritise by risk, and produce a phased remediation roadmap.

See the Application Security Audit Playbook →

If the code you're reviewing is an AI agent rather than a web app, the relevant weakness class shifts. The Agent Prompt-Injection Defense Harness covers the injection surface that lives in repo content, issues, and tool output.

Skip the setup

The Application Security Audit Playbook does this end-to-end. It maps every attack vector and trust boundary first, then reviews against OWASP Top 10 with a structured checklist, and ends in a prioritised remediation roadmap with effort estimates, so findings turn into a plan instead of a wall of warnings. It's part of The Complete AI Prompts Bundle, a one-time lifetime license to the whole catalog plus future packs, which pays off the moment you're securing more than one service.

Get the Application Security Audit Playbook →

Security is one dimension of a complete review, not the whole thing. For the broader correctness-and-severity layer, see the AI code review prompt that actually finds bugs. For the agent-specific threat that a normal security review won't catch, read prompt injection defense for AI agents.

Browse the developer prompt packs →

FAQ

Common questions

What is a security code review prompt?

A security code review prompt is a reusable prompt that scans a diff or file for vulnerabilities, maps each finding to a CWE ID, rates severity, and outputs a triaged remediation list. It differs from a general review prompt by focusing only on exploitable weaknesses.

Why map findings to CWE?

CWE IDs give every finding a stable, searchable identity. 'Possible injection' is an opinion; 'CWE-89 SQL Injection at line 142' is a ticket. Mapping to CWE also lets you dedupe, trend, and prove coverage across the OWASP Top 10.
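Because the CWE ID plus location gives each finding a stable identity, deduping and trending take only a few lines. The sketch below assumes findings arrive as dicts with `cwe`, `file`, and `line` keys, which is a hypothetical shape. Two reports of the same bug under different titles collapse into one.

```python
from __future__ import annotations

from collections import Counter

def dedupe(findings: list[dict]) -> list[dict]:
    """Collapse repeat reports of the same CWE at the same file:line (first one wins)."""
    seen = set()
    unique = []
    for f in findings:
        key = (f["cwe"], f["file"], f["line"])
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique

def trend(runs: list[list[dict]]) -> list[Counter]:
    """Per-run counts of distinct findings by CWE ID, oldest run first."""
    return [Counter(f["cwe"] for f in dedupe(run)) for run in runs]
```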

Can an AI security review replace a pentest?

No. An AI security review prompt catches common, code-visible weaknesses fast and cheap, which is most of what slips through. It won't find logic flaws that depend on runtime context, or exploits that chain several weaknesses together. Treat it as triage before a specialist, not a replacement for one.

