
A Repo Health Scorecard Prompt for Any Codebase

What a repo health scorecard prompt is and how to build one that scores tests, docs, CI, and dependencies — then turns the score into a fundable improvement plan.

PromptsCart Team · January 12, 2026 · Updated June 14, 2026 · 7 min read

A new engineer joins, clones the repo, and spends two days just figuring out how to run it. The tests pass locally but not in CI, the README documents a setup that changed a year ago, and three dependencies are two majors behind. None of this is in a ticket. It's the ambient cost of an unhealthy repo, and nobody measured it until it slowed the team down.

A repo health scorecard prompt measures it on purpose. It scores the repository across the dimensions that actually predict friction — tests, docs, CI, dependencies, structure — and turns the result into a plan a lead can fund. The tools and metric definitions you'll find searching this topic stop at "here's what a health score is." They don't give you a prompt that reads your repo and produces one.

What a repo health scorecard prompt is

It's a reusable prompt that applies a weighted rubric to a codebase and returns a score per dimension, a composite, and a ranked list of what to fix first. The rubric is fixed; the repo is the variable. That's what lets the same prompt grade a Python service and a TypeScript monorepo without rewriting it.

The dimensions that matter aren't mysterious:

- Tests. Do they exist, run, and cover the risky paths?
- Docs. Can someone set up and contribute without asking?
- CI. Is it green, fast, and actually gating?
- Dependencies. Current, pinned, and free of known risk?
- Structure. Can a human or an agent navigate it?

Definition first

A repo health scorecard prompt scores a repository across weighted dimensions and outputs a number per dimension plus a sequenced improvement plan. It works on any codebase because the standard lives in the rubric, not in the repo, so the same prompt grades wildly different projects on the same scale.
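To make the arithmetic behind "a number per dimension plus a composite" concrete, here is a minimal sketch. The weights, the 0 to 5 scale, and the function names are assumptions chosen for illustration; the article does not prescribe specific values.

```python
# Hypothetical weights (they sum to 100); adjust to what unblocks your team.
DEFAULT_WEIGHTS = {
    "tests": 30,
    "docs": 20,
    "ci": 20,
    "dependencies": 15,
    "structure": 15,
}
MAX_SCORE = 5  # assumed top of the scale


def composite_score(scores, weights=DEFAULT_WEIGHTS):
    """Weighted average of the per-dimension scores."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight


def fix_first(scores, weights=DEFAULT_WEIGHTS):
    """Rank dimensions by weighted shortfall from the top of the scale."""
    shortfall = {dim: (MAX_SCORE - scores[dim]) * w for dim, w in weights.items()}
    return sorted(shortfall, key=shortfall.get, reverse=True)


# Illustrative scores for one repo, not real data.
example = {"tests": 2, "docs": 4, "ci": 3, "dependencies": 1, "structure": 4}
print(composite_score(example))  # 2.75
print(fix_first(example))        # tests first, then dependencies
```

Because the weights live outside the repo, the same two functions apply unchanged to a Python service or a TypeScript monorepo; only the `scores` dictionary differs.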

Why a scanning tool isn't the whole answer

Tools like OpenSSF Scorecard are genuinely useful, and they measure things a prompt shouldn't try to: branch protection, signed releases, pinned dependencies. But they only see what's mechanically checkable. They can't read your README and judge whether a stranger could follow it. They can't tell you the test suite is technically present but covers only the happy path.

That subjective layer is where a scorecard prompt earns its keep. It reads the structure and the docs the way a reviewer would, scores the soft dimensions, and writes a sentence per score that a manager can act on. Run both: the scanner for the mechanical signals, the prompt for the judgment ones.

Agent-readiness: the dimension nobody scores

Here's the one most health checks miss. Agent-readiness scores how well an AI coding assistant can work in the repo today. It looks for clear module boundaries, documented conventions, a build that runs from a clean clone, and tests an agent can execute and read. It sounds futuristic, but it's just hygiene with a new name. A repo an agent can navigate is almost always one a new hire can navigate too, because both are defeated by the same things: hidden setup steps, undocumented conventions, and a test suite that only the original author can run.

Model behavior when scoring a repo

Scoring a whole repo stresses the model's tendency to guess at things it can't see.

Claude is comfortable saying "this dimension can't be scored from what's provided" rather than inventing a number, which keeps the scorecard honest. GPT-4o scores fluently but will confidently rate test coverage it never actually saw unless you require evidence per score. Both behave far better when you feed them the repo tree and key files rather than asking them to imagine a typical repo. The fix is the same as for any rubric: anchor the scale, demand evidence, and explicitly allow "insufficient information" as a result so the model stops bluffing.

| Behavior | Claude | GPT-4o |
| --- | --- | --- |
| Admits when it can't score a dimension | Comfortable | Bluffs unless evidence is required |
| Holds an anchored scale | Reliable | Clusters mid-range without anchors |
| Scores agent-readiness coherently | Strong | Strong with the dimension defined |
| Produces a sequenced plan | Good with effort-vs-impact framing | Good with the framing |
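One way to enforce "demand evidence, allow insufficient information" is to check the model's reply before trusting it. The sketch below assumes you asked for JSON with a `score` (an integer from 1 to 5, or `null`) and an `evidence` string per dimension; that reply shape and the function name are hypothetical, not a format any model returns by default.

```python
import json

SCALE = range(1, 6)  # assumed anchored scale of 1 to 5


def validate_scorecard(raw):
    """Parse a model reply and reject any score that arrives without evidence."""
    card = json.loads(raw)
    problems = []
    for dim, entry in card.items():
        score = entry.get("score")
        evidence = (entry.get("evidence") or "").strip()
        if score is None:
            # "Insufficient information" is a valid result, but it must say what was missing.
            if not evidence:
                problems.append(f"{dim}: unscored without stating what was missing")
        elif not isinstance(score, int) or score not in SCALE:
            problems.append(f"{dim}: score {score!r} is off the scale")
        elif not evidence:
            problems.append(f"{dim}: score {score} has no evidence")
    if problems:
        raise ValueError("; ".join(problems))
    return card


# Illustrative reply, not real model output.
reply = json.dumps({
    "tests": {"score": 2, "evidence": "tests/ exists but covers only utils"},
    "ci": {"score": None, "evidence": "no CI config was provided"},
})
card = validate_scorecard(reply)
print(sorted(card))  # ['ci', 'tests']
```

A reply that fails validation can be sent back with the error text, which tends to be cheaper than reviewing an unsupported score by hand.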

Opinion worth holding

A health score without a funded plan is a vanity metric. The number's only job is to justify the work that follows. So weight the rubric toward what unblocks the team, sequence the fixes by impact over effort, and attach owners. A scorecard that ends in "you're a 6/10" and nothing else gets admired once and ignored forever.

Prompt-craft patterns for repo scoring

Pattern 1: feed structure, don't ask for imagination

Give the model the directory tree, the README, the CI config, and the dependency manifest. A repo scored from these is grounded. A repo scored from "assume a normal project" is fiction.

Pattern 2: require evidence per dimension

Each score cites what it's based on: "tests scored 2 — tests/ exists but covers only utils, no integration tests for the API layer." That sentence is what makes the score defensible and the fix obvious.

Pattern 3: end in a sequenced plan, not a number

The composite score is the headline; the plan is the product. Order fixes by impact over effort, suggest an owner, and give each an acceptance criterion. Now it's fundable.
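Ordering by impact over effort is a simple sort once each fix carries both numbers. This is a minimal sketch: the `Fix` fields, the 1 to 5 scales, and the example fixes are illustrative assumptions, not output from a real scorecard.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    title: str
    impact: int      # assumed 1-5; higher unblocks more of the team
    effort: int      # assumed 1-5; higher costs more time
    owner: str       # suggested owner
    acceptance: str  # how you know the fix is done


def sequence(fixes):
    """Highest impact-per-effort first; ties go to the cheaper fix."""
    return sorted(fixes, key=lambda f: (-f.impact / f.effort, f.effort))


# Hypothetical fixes for illustration only.
FIXES = [
    Fix("Add API integration tests", 5, 4, "backend lead", "API layer covered in CI"),
    Fix("Make CI gate merges", 5, 2, "platform", "red builds block merge"),
    Fix("Rewrite README setup section", 4, 1, "newest hire", "clean clone runs in one pass"),
    Fix("Upgrade stale dependencies", 3, 3, "backend lead", "no dependency two majors behind"),
]

for rank, fix in enumerate(sequence(FIXES), start=1):
    print(rank, fix.title, "->", fix.owner)
```

With these numbers the README rewrite comes first (ratio 4.0), then CI gating (2.5), integration tests (1.25), and the dependency upgrade (1.0).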

Variables you'll set

| Variable | Required | What it is |
| --- | --- | --- |
| `{{repo_tree}}` | Yes | The directory structure and key file list |
| `{{key_files}}` | Yes | README, CI config, dependency manifest, sample tests |
| `{{weights}}` | No | Per-dimension weights if the defaults don't fit |
| `{{context}}` | No | Team size or stage, to calibrate what "healthy" means |
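Filling those placeholders can be done with a small renderer that refuses to run without the required variables. This is a sketch under stated assumptions: the template wording, the default text for the optional variables, and the `render` function are all hypothetical, not the packaged rubric.

```python
import re

REQUIRED = {"repo_tree", "key_files"}
# Hypothetical fallbacks for the optional variables.
OPTIONAL_DEFAULTS = {
    "weights": "Use equal weights for every dimension.",
    "context": "No team context provided.",
}
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

# Illustrative template text only.
TEMPLATE = """Score this repository on tests, docs, CI, dependencies, and structure.
Weights: {{weights}}
Context: {{context}}

Directory tree:
{{repo_tree}}

Key files:
{{key_files}}
"""


def render(template, values):
    """Substitute {{name}} placeholders, failing loudly on missing required input."""
    missing = sorted(name for name in REQUIRED if not values.get(name, "").strip())
    if missing:
        raise ValueError(f"missing required variables: {missing}")
    merged = {**OPTIONAL_DEFAULTS, **values}

    def substitute(match):
        name = match.group(1)
        if name not in merged:
            raise KeyError(f"unknown placeholder: {name}")
        return merged[name]

    return PLACEHOLDER.sub(substitute, template)


prompt = render(TEMPLATE, {"repo_tree": "README.md\ntests/", "key_files": "README.md: # demo"})
print("{{" in prompt)  # False
```

Failing on an empty `repo_tree` is deliberate: a prompt sent without the tree invites the model to score an imagined repo.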

Getting started

1. Pick five weighted dimensions. Tests, docs, CI, dependencies, structure is a strong default; add agent-readiness if AI assistants touch the repo.
2. Anchor the scale and require an evidence sentence per score.
3. Feed the model the tree and key files; don't make it guess.
4. Allow "insufficient information" so it stops bluffing about missing data.
5. Demand a sequenced plan with owners and acceptance criteria, not just a number.
6. Save the rubric so every repo gets scored the same way. The Repo Health Scorecard Rubric ships this: five weighted dimensions including agent-readiness, gaps ranked by business impact and effort, and a sequenced improvement plan with owner suggestions and acceptance criteria.


Before a service goes live, the related Production Readiness Review Rubric scores reliability, observability, scalability, and security with the same weighted-rubric approach and a pass/fail verdict.

Skip the setup

The Repo Health Scorecard Rubric does this end-to-end. It scores five weighted dimensions in one consistent pass, includes the agent-readiness dimension most checks skip, and turns the result into a prioritised, fundable plan with owners and acceptance criteria, so the score actually drives work. It's part of The Complete AI Prompts Bundle, a one-time lifetime license to the whole catalog and every pack added later, which pays off once you're grading more than one repo.


The weighted-rubric pattern here is the same engine that grades an API surface — see the API design review checklist as a scored prompt for that sibling. And if you're weighing whether a packaged rubric beats writing your own, how to choose a reusable AI prompt pack lays out what to look for.


FAQ

Common questions

What is a repo health scorecard prompt?

A repo health scorecard prompt is a reusable prompt that scores a repository across weighted dimensions — tests, docs, CI, dependencies, structure — and outputs a number per dimension plus a prioritised improvement plan. It works on any codebase because the rubric, not the repo, defines the standard.

How is this different from a tool like OpenSSF Scorecard?

Scanning tools measure fixed signals like branch protection or pinned dependencies. A repo health scorecard prompt reads the actual structure and docs, scores subjective dimensions a scanner can't, and explains each score in prose you can hand to a manager.

What is agent-readiness in a repo health score?

Agent-readiness scores how well an AI coding assistant can work in the repo: clear structure, documented conventions, runnable tests, and a sane build. A repo an agent can navigate is usually one a new human hire can navigate too.
