Skip to main content
API REVIEWAI PROMPTSPROMPT RUBRICEVALUATION

Turn an API Design Review Checklist into a Scored Prompt

A static api design review checklist tells you what to look at. A scored prompt rubric grades the design and emits pass/fail per dimension with evidence from the spec.

PPromptsCart Team·May 22, 2026·Updated June 14, 2026·6 min read

A team ships an API, consumers integrate, and six months later everyone's stuck with getUserData, fetchUserInfo, and user_details as three endpoints that return overlapping shapes. The design review was a meeting where someone skimmed the spec and said "looks fine." There was a checklist somewhere. Nobody scored against it.

That's the weakness of a prose api design review checklist: it lists what to look at and leaves the judgment to whoever's in the room that day. The same spec passes on Monday and fails on Friday depending on who's reviewing. The fix is to make the checklist executable — a scored prompt rubric that grades each dimension, cites evidence from the spec, and ranks fixes by how much they'll hurt consumers.

Why a static checklist drifts

A checklist is a memory aid, not a standard. It tells you to "check error handling" but not what good looks like or how much it matters. Three problems follow:

  1. No weights. A naming nit and a missing pagination contract sit as equal checkboxes, so reviewers spend the same energy on both.
  2. No evidence. "Versioning: OK" records an opinion with nothing behind it. Six months on, nobody knows why it was OK.
  3. No ranking. The output is a flat list of ticks and crosses, so the team fixes the easy things and ships the painful ones.

A scored rubric closes each gap. Weighted dimensions force priority. An evidence requirement ties every score to a line in the spec. A risk-ranked fix list puts consumer pain first.

Definition first

An API design review rubric is a prompt that scores a spec across weighted dimensions — consistency, usability, versioning, scalability, operations — and returns a pass/fail per dimension with cited evidence and a prioritised fix list. The weights and evidence are what make it repeatable instead of subjective.

Anatomy of the scored rubric prompt

Variables → {{api_spec}}, {{consumer_scenarios}}, {{weights}}
Role      → API reviewer applying a fixed weighted rubric.
Dimensions→ consistency, usability, versioning, scalability, operations
Per dimension → score 1-5, weight, evidence (quote the spec), pass/fail
Composite → weighted score and an overall verdict
Fix list  → ordered by consumer pain, each with the dimension it lifts

The {{consumer_scenarios}} variable is the one teams forget. A design that scores well in the abstract can still be miserable for the actual integration paths. Feed the model "a mobile client paginating 10k records on a slow connection" and pagination problems that looked theoretical become blocking. Evidence beats taste, and consumer scenarios are where the evidence lives.

Model behavior when scoring a spec

Scoring is a different task than free-form review, and models handle it differently.

Claude is steady at holding a 1-to-5 scale with anchors and at quoting the spec as evidence rather than paraphrasing. GPT-4o scores fluently too, but without anchored criteria it clusters everything around 3-4, which makes the rubric useless. Both models will invent spec details that aren't there if the spec is incomplete, so an explicit "if the spec doesn't say, score it as a gap, don't assume" instruction keeps them honest. That one line prevents the most common failure: a generous score for behavior the API never actually documents.

BehaviorClaudeGPT-4o
Holds an anchored 1-5 scaleReliableClusters mid-scale without anchors
Quotes spec as evidenceStrongParaphrases unless told to quote
Invents missing detailRare with the "score gaps as gaps" ruleNeeds the rule stated explicitly
Ranks fixes by impactGood with consumer scenariosGood with consumer scenarios
Opinion worth holding

Rank the fix list by consumer pain, never by reviewer effort. The temptation is to lead with quick wins, but a quick win that no consumer feels is theater. The pagination contract that's annoying to add but saves every mobile client belongs at the top, even though it's the hard one.

Prompt-craft patterns for design rubrics

Pattern 1: anchor every score

5: meets REST/HTTP best practice with no gaps a consumer would hit
3: usable but with a documented rough edge
1: actively misleading or guaranteed to break a common client

Without anchors, a 3 means nothing. With them, two reviewers (or two runs) land in the same place.

Pattern 2: require a spec quote per finding

Make the model paste the offending line. "Inconsistent error format" is an opinion; the two different error bodies quoted side by side is proof. Quotes also make the review auditable later.

Pattern 3: separate score from fix

Score the design as it is. Then, separately, list what would raise each failing dimension. Mixing the two produces hedged scores ("it's a 3, but if you fixed X it'd be a 5"), which defeats the grading.

Variables you'll set

VariableRequiredWhat it is
{{api_spec}}YesThe OpenAPI doc, schema, or endpoint definitions
{{consumer_scenarios}}NoReal integration paths to score usability against
{{weights}}NoPer-dimension weights if the defaults don't fit
{{standard}}NoHouse style or REST conventions to enforce

Getting started

  1. Fix your five dimensions and their weights before you read a single spec.
  2. Anchor the 1-to-5 scale with concrete descriptions, not adjectives.
  3. Paste the spec into {{api_spec}} and the real integration paths into {{consumer_scenarios}}.
  4. Add "score undocumented behavior as a gap; never assume intent."
  5. Read the fix list. Is the highest-consumer-pain item first, even if it's the hard one?
  6. Save the rubric so every API clears the same bar. The API Design Review Evaluation Rubric ships this: five scored dimensions with weights and a pass/fail verdict, each backed by cited spec evidence, ending in a fix list ordered by consumer pain.
See the API Design Review Rubric

A design rubric checks the shape of the API. To guard against breaking that shape later, the API Contract Test Harness Pack generates tests that fail the moment a breaking change reaches the producer.

Skip the setup

The API Design Review Evaluation Rubric does this end-to-end. It scores consistency, usability, versioning, scalability, and operations against explicit criteria with weights, cites the spec for every score, and outputs a prioritised fix list — so a review is a graded artifact, not a meeting opinion. It's part of The Complete AI Prompts Bundle, a one-time lifetime license to the whole catalog plus future packs, worth it once you're reviewing more than one service's API.

Get the API Design Review Rubric

A weighted rubric is the same machine whether you're grading an API or a whole repository. For the codebase-level version, see the repo health scorecard prompt. And if your API review is one gate inside a larger PR flow, the AI PR review prompt template shows how the verdict slots in.

Browse the developer prompt packs
FAQ

Common questions

What should an API design review checklist cover?
A useful api design review checklist covers consistency (naming, errors, status codes), usability, versioning, scalability (pagination, rate limits), and operational impact (caching, observability). A scored prompt rubric turns each into a graded dimension instead of a yes/no box.
Why turn a checklist into a prompt rubric?
A prose checklist relies on the reviewer's judgment and mood. A prompt rubric applies the same weighted criteria every time, cites evidence from the spec for each score, and ends in a prioritised fix list ordered by consumer pain rather than reviewer taste.
Can AI review an OpenAPI spec for design quality?
Yes. Paste the OpenAPI doc into the prompt and the model scores naming, error contracts, versioning, and pagination against explicit criteria. It won't know your business context, so feed it the consumer scenarios alongside the spec.
Stop reading. Start shipping.

Get the prompt packs this guide is built on

Ready-to-paste prompts with documented variables and worked examples for ChatGPT, Claude, and Gemini. One-time payment, own it forever.