Turn an API Design Review Checklist into a Scored Prompt

A static API design review checklist tells you what to look at. A scored prompt rubric grades the design and emits pass/fail per dimension with evidence from the spec.

PromptsCart Team · May 22, 2026 · Updated June 14, 2026 · 6 min read

A team ships an API, consumers integrate, and six months later everyone's stuck with getUserData, fetchUserInfo, and user_details as three endpoints that return overlapping shapes. The design review was a meeting where someone skimmed the spec and said "looks fine." There was a checklist somewhere. Nobody scored against it.

That's the weakness of a prose API design review checklist: it lists what to look at and leaves the judgment to whoever's in the room that day. The same spec passes on Monday and fails on Friday depending on who's reviewing. The fix is to make the checklist executable: a scored prompt rubric that grades each dimension, cites evidence from the spec, and ranks fixes by how much pain they remove for consumers.

Why a static checklist drifts

A checklist is a memory aid, not a standard. It tells you to "check error handling" but not what good looks like or how much it matters. Three problems follow:

No weights. A naming nit and a missing pagination contract sit as equal checkboxes, so reviewers spend the same energy on both.
No evidence. "Versioning: OK" records an opinion with nothing behind it. Six months on, nobody knows why it was OK.
No ranking. The output is a flat list of ticks and crosses, so the team fixes the easy things and ships the painful ones.

A scored rubric closes each gap. Weighted dimensions force priority. An evidence requirement ties every score to a line in the spec. A risk-ranked fix list puts consumer pain first.

Definition first

An API design review rubric is a prompt that scores a spec across weighted dimensions — consistency, usability, versioning, scalability, operations — and returns a pass/fail per dimension with cited evidence and a prioritised fix list. The weights and evidence are what make it repeatable instead of subjective.

Anatomy of the scored rubric prompt

Variables → {{api_spec}}, {{consumer_scenarios}}, {{weights}}, {{standard}}
Role      → API reviewer applying a fixed weighted rubric.
Dimensions→ consistency, usability, versioning, scalability, operations
Per dimension → score 1-5, weight, evidence (quote the spec), pass/fail
Composite → weighted score and an overall verdict
Fix list  → ordered by consumer pain, each with the dimension it lifts

The {{consumer_scenarios}} variable is the one teams forget. A design that scores well in the abstract can still be miserable for the actual integration paths. Feed the model "a mobile client paginating 10k records on a slow connection" and pagination problems that looked theoretical become blocking. Evidence beats taste, and consumer scenarios are where the evidence lives.
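
The per-dimension and composite lines in the anatomy above can be sketched in Python. This is a minimal sketch under stated assumptions, not the rubric's actual implementation: the `DimensionScore` class, the pass mark of 3, the "every dimension must pass" verdict policy, and the example scores and weights are all invented for illustration.

```python
from dataclasses import dataclass

# Assumption: a dimension passes at 3 or above on the 1-5 scale.
PASS_MARK = 3


@dataclass
class DimensionScore:
    name: str
    score: int       # 1-5
    weight: float    # relative importance; need not sum to 1
    evidence: str    # quote from the spec backing the score

    @property
    def passed(self) -> bool:
        return self.score >= PASS_MARK


def composite(scores: list[DimensionScore]) -> float:
    """Weighted mean of the dimension scores, on the same 1-5 scale."""
    total_weight = sum(s.weight for s in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(s.score * s.weight for s in scores) / total_weight


def verdict(scores: list[DimensionScore]) -> str:
    """Assumed policy: the design passes only if every dimension passes."""
    return "pass" if all(s.passed for s in scores) else "fail"


# Invented example values, with placeholder evidence strings.
example = [
    DimensionScore("consistency", 2, 30, "<spec quote>"),
    DimensionScore("usability", 4, 25, "<spec quote>"),
    DimensionScore("versioning", 3, 15, "<spec quote>"),
    DimensionScore("scalability", 2, 20, "<spec quote>"),
    DimensionScore("operations", 4, 10, "<spec quote>"),
]
print(composite(example), verdict(example))  # prints: 2.85 fail
```

A team could just as reasonably derive the verdict from a composite threshold. The all-dimensions policy is used here because it stops a strong score in one dimension from hiding a failing one.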

Model behavior when scoring a spec

Scoring is a different task from free-form review, and models handle it differently.

Claude is steady at holding a 1-to-5 scale with anchors and at quoting the spec as evidence rather than paraphrasing. GPT-4o scores fluently too, but without anchored criteria it clusters everything around 3-4, which makes the rubric useless. Both models will invent spec details that aren't there if the spec is incomplete, so an explicit "if the spec doesn't say, score it as a gap, don't assume" instruction keeps them honest. That one line prevents the most common failure: a generous score for behavior the API never actually documents.

Behavior	Claude	GPT-4o
Holds an anchored 1-5 scale	Reliable	Clusters mid-scale without anchors
Quotes spec as evidence	Strong	Paraphrases unless told to quote
Invents missing detail	Rare with the "score gaps as gaps" rule	Needs the rule stated explicitly
Ranks fixes by impact	Good with consumer scenarios	Good with consumer scenarios
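
Both failure modes described above can be guarded against mechanically. The sketch below is illustrative only: `GAP_RULE`, `with_gap_rule`, and `looks_clustered` are hypothetical names, and the 3-4 band is taken from the clustering behavior described in this section. A run that lands entirely in that band is a prompt to check the anchors, not proof the scores are wrong.

```python
# Wording adapted from the "score gaps as gaps" instruction in this article.
GAP_RULE = (
    "If the spec does not document a behavior, score it as a gap. "
    "Do not assume intent."
)


def with_gap_rule(prompt: str) -> str:
    """Append the gap rule to a prompt unless it is already present."""
    if GAP_RULE in prompt:
        return prompt
    return prompt.rstrip() + "\n\n" + GAP_RULE


def looks_clustered(scores: dict[str, int], band: tuple[int, int] = (3, 4)) -> bool:
    """True when every dimension score falls inside the mid-scale band."""
    low, high = band
    return bool(scores) and all(low <= s <= high for s in scores.values())
```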

Opinion worth holding

Rank the fix list by consumer pain, never by reviewer effort. The temptation is to lead with quick wins, but a quick win that no consumer feels is theater. The pagination contract that's annoying to add but saves every mobile client belongs at the top, even though it's the hard one.
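
As a minimal sketch of that ordering rule: the `Fix` fields and the 1-5 pain and effort scales below are assumptions for illustration. Effort is recorded but deliberately left out of the sort key.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    title: str
    dimension: str       # the dimension this fix lifts
    consumer_pain: int   # 1-5, how much consumers hurt without the fix (assumed scale)
    effort: int          # 1-5, kept for planning, never used for ranking


def rank_fixes(fixes: list[Fix]) -> list[Fix]:
    """Order fixes by consumer pain, highest first.

    sorted() is stable, so fixes with equal pain keep their input order.
    """
    return sorted(fixes, key=lambda f: f.consumer_pain, reverse=True)
```

With a cheap rename at pain 2 and a pagination contract at pain 5, the pagination contract comes out first even though its effort is higher.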

Prompt-craft patterns for design rubrics

Pattern 1: anchor every score

5: meets REST/HTTP best practice with no gaps a consumer would hit
3: usable but with a documented rough edge
1: actively misleading or guaranteed to break a common client

Without anchors, a 3 means nothing. With them, two reviewers (or two runs) land in the same place.
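
One way to keep the anchors fixed across runs is to store them as data and render them into the prompt, then compare two runs dimension by dimension. A sketch, assuming scores arrive as a plain dimension-to-score mapping; the one-point tolerance and the function names are hypothetical.

```python
# Anchor text copied from the pattern above; levels 2 and 4 are left
# undefined here, as they are in the article.
ANCHORS = {
    5: "meets REST/HTTP best practice with no gaps a consumer would hit",
    3: "usable but with a documented rough edge",
    1: "actively misleading or guaranteed to break a common client",
}


def anchor_block(anchors: dict[int, str]) -> str:
    """Render the anchors highest-first, ready to paste into the rubric prompt."""
    ordered = sorted(anchors.items(), reverse=True)
    return "\n".join(f"{score}: {text}" for score, text in ordered)


def disagreements(run_a: dict[str, int], run_b: dict[str, int], tolerance: int = 1) -> list[str]:
    """Dimensions scored in both runs that differ by more than `tolerance` points."""
    shared = run_a.keys() & run_b.keys()
    return sorted(d for d in shared if abs(run_a[d] - run_b[d]) > tolerance)
```

If `disagreements` keeps returning the same dimension, that dimension's anchors are probably still adjectives and not concrete descriptions.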

Pattern 2: require a spec quote per finding

Make the model paste the offending line. "Inconsistent error format" is an opinion; the two different error bodies quoted side by side is proof. Quotes also make the review auditable later.
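
Requiring quotes also makes invented evidence detectable: a quote that does not appear in the spec is a red flag. The check below is a rough sketch, not a full validator. It assumes whitespace-insensitive, case-insensitive substring matching is good enough, and `unverified_quotes` is a hypothetical helper name.

```python
import re


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def unverified_quotes(spec: str, quotes: list[str]) -> list[str]:
    """Return the quotes that cannot be found verbatim in the spec.

    Empty quotes are reported too, since an empty string matches anything.
    """
    spec_norm = _normalize(spec)
    return [q for q in quotes if not q.strip() or _normalize(q) not in spec_norm]
```

Any quote this returns deserves a manual look before the score it supports is trusted.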

Pattern 3: separate score from fix

Score the design as it is. Then, separately, list what would raise each failing dimension. Mixing the two produces hedged scores ("it's a 3, but if you fixed X it'd be a 5"), which defeats the grading.
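
If the model is asked to return scores and fixes as two separate lists, the separation can be checked after the fact. The JSON shape below (`scores` and `fixes` arrays, each entry carrying a `dimension`) is an assumed output format, not one the article specifies, and the pass mark of 3 is likewise an assumption.

```python
import json


def check_separation(raw: str, pass_mark: int = 3) -> dict[str, list[str]]:
    """Cross-check a review whose scores and fixes arrive as separate lists."""
    review = json.loads(raw)
    scores = {item["dimension"]: item["score"] for item in review["scores"]}
    failing = {name for name, score in scores.items() if score < pass_mark}
    fix_targets = {fix["dimension"] for fix in review["fixes"]}
    return {
        "failing": sorted(failing),
        # Fixes aimed at dimensions that already pass.
        "fixes_for_passing_dimensions": sorted(fix_targets - failing),
        # Failing dimensions the fix list never addresses.
        "failing_without_fix": sorted(failing - fix_targets),
    }
```

A non-empty `failing_without_fix` means the grading found a problem that the fix list ignores.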

Variables you'll set

Variable	Required	What it is
`{{api_spec}}`	Yes	The OpenAPI doc, schema, or endpoint definitions
`{{consumer_scenarios}}`	No	Real integration paths to score usability against
`{{weights}}`	No	Per-dimension weights if the defaults don't fit
`{{standard}}`	No	House style or REST conventions to enforce
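
Filling the variables can be done with a small template function that enforces the Required column. This is a sketch, not the pack's tooling: the default strings for the optional variables and the `fill` function are assumptions.

```python
import re

REQUIRED = {"api_spec"}

# Assumed fallbacks for the optional variables; adjust to your own rubric.
DEFAULTS = {
    "consumer_scenarios": "(none provided)",
    "weights": "equal weights across all five dimensions",
    "standard": "general REST conventions",
}


def fill(template: str, values: dict[str, str]) -> str:
    """Replace {{name}} placeholders, failing loudly on missing or unknown names."""
    provided = {name for name, value in values.items() if value.strip()}
    missing = REQUIRED - provided
    if missing:
        raise ValueError(f"missing required variables: {sorted(missing)}")
    merged = {**DEFAULTS, **values}

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in merged:
            raise KeyError(f"unknown variable: {name}")
        return merged[name]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)
```

For example, `fill("Spec:\n{{api_spec}}\nScenarios: {{consumer_scenarios}}", {"api_spec": spec_text})` substitutes the spec and falls back to the default scenario text.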

Getting started

Fix your five dimensions and their weights before you read a single spec.
Anchor the 1-to-5 scale with concrete descriptions, not adjectives.
Paste the spec into {{api_spec}} and the real integration paths into {{consumer_scenarios}}.
Add "score undocumented behavior as a gap; never assume intent."
Read the fix list. Is the highest-consumer-pain item first, even if it's the hard one?
Save the rubric so every API clears the same bar. The API Design Review Evaluation Rubric ships this: five scored dimensions with weights and a pass/fail verdict, each backed by cited spec evidence, ending in a fix list ordered by consumer pain.

A design rubric checks the shape of the API. To guard against breaking that shape later, the API Contract Test Harness Pack generates tests that fail the moment a breaking change reaches the producer.

Skip the setup

The API Design Review Evaluation Rubric does this end-to-end. It scores consistency, usability, versioning, scalability, and operations against explicit criteria with weights, cites the spec for every score, and outputs a prioritised fix list — so a review is a graded artifact, not a meeting opinion. It's part of The Complete AI Prompts Bundle, a one-time lifetime license to the whole catalog plus future packs, worth it once you're reviewing more than one service's API.

A weighted rubric is the same machine whether you're grading an API or a whole repository. For the codebase-level version, see the repo health scorecard prompt. And if your API review is one gate inside a larger PR flow, the AI PR review prompt template shows how the verdict slots in.

FAQ

Common questions

What should an API design review checklist cover?

A useful API design review checklist covers consistency (naming, errors, status codes), usability, versioning, scalability (pagination, rate limits), and operational impact (caching, observability). A scored prompt rubric turns each into a graded dimension instead of a yes/no box.

Why turn a checklist into a prompt rubric?

A prose checklist relies on the reviewer's judgment and mood. A prompt rubric applies the same weighted criteria every time, cites evidence from the spec for each score, and ends in a prioritised fix list ordered by consumer pain rather than reviewer taste.

Can AI review an OpenAPI spec for design quality?

Yes. Paste the OpenAPI doc into the prompt and the model scores naming, error contracts, versioning, and pagination against explicit criteria. It won't know your business context, so feed it the consumer scenarios alongside the spec.
