Field Notes

Release

Quality is the New Currency

Why “technically correct” is the lowest form of success in the agentic era.

By Jeremy Wayland and Sidney Gathrid

Krv-Labs/topos is open source: star the repository, or jump straight to installation.

A New Breed of Fatigue

Every engineer who has used Claude Code or let an agent loose on their repo knows the exact flavor of dread that comes next: a notification pops up, and you are staring at an 800-line diff spanning 14 files, generated in under five minutes.

The CI pipeline is blindingly green. The micro-syntax is flawless. But as you start scrolling, reality hits you: we are now approving code we don't actually understand.

The cognitive load is asymmetrical. The agent solved the immediate ticket, sure, but it did so by hallucinating a slightly worse copy of a utility module that already existed three folders away, duct-taping a bizarre abstraction around a simple function. By line 200, your eyes glaze over. Exhausted by the sheer volume, you succumb to green-light fatigue, trust the tests, and hit Squash and Merge.

“We are generating code at machine speed but reviewing it at human speed.”

The binary pass/fail of a unit test only proves the code ran; it tells you absolutely nothing about whether the agent actually grokked your system. Without structural guardrails, agentic speed is automated entropy: more code, less clarity, until review collapses under its own weight.

Topos was built to break this cycle. Instead of reviewing 60-file diffs line by line, you review structural properties—and how they evolve. We've already changed how we write code. It's time to change how we review it. By parsing programs into their underlying mathematical graphs—control flow, module dependency, and data flow—Topos measures the true cost of a diff. It turns vague demands to “clean this up” into concrete, verifiable targets your agents can actually aim for.

Beyond Pass/Fail

Agents have made code cheap. That is useful, but it changes the bottleneck. The scarce resource is no longer a first draft of the implementation; it is judgment about whether the draft belongs in the codebase.

The default evaluation loop is still binary: tests either pass or fail. That only describes two states. Production software needs a richer vocabulary: readable but tightly coupled, secure but unmaintainable, locally correct but globally awkward, and everything in between.

“Correctness is table stakes.” Passing tests tells you the code ran. It does not tell you whether the agent understood the repo.

This is the agentic spray problem: when generation outruns codebase understanding, you get more code than clarity. Duplicate modules, fragile linkages, extra wrappers, and review cycles that burn tokens explaining context the agent should have respected. Topos gives that review a measurable object: the structure of the program itself.

How Topos Works

Topos measures code quality and turns it into something you can aim at: a Code Quality Medal per file. Concrete tiers with defined structural criteria—not a vague directive to “make it better.”

Each file is scored on three independent pillars. You can pass any combination; Gold means all three pass.

  • GOLD: Passes all 3 (Simple + Composable + Secure)
  • SILVER: Passes 2 of 3 (e.g., Simple & Secure)
  • BRONZE: Passes 1 of 3 (e.g., Composable only)
  • SLOP: Passes 0 (or fails to parse)

Set preferences to tell agents which medal (or pillar mix) matters most under time and token budgets. If Gold isn't reachable, they take the next-best medal on your priority list—always a defined next step, never a dead end.
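The medal tiers and the "next-best medal" rule above can be sketched in a few lines. This is a minimal illustration, not Topos's implementation: the function names (`medal`, `relaxation_order`, `next_target`), the lowercase pillar labels, and the idea of passing in a `feasible` set are all assumptions made for the example.

```python
from itertools import combinations

PILLARS = ("simple", "composable", "secure")
MEDALS = {3: "GOLD", 2: "SILVER", 1: "BRONZE", 0: "SLOP"}


def medal(passed):
    """Medal for a set of passed pillars (a file that fails to parse passes none)."""
    return MEDALS[len(set(passed) & set(PILLARS))]


def relaxation_order(priority):
    """Every pillar combination, best first.

    Larger combinations come first; among equals, combinations that keep
    higher-priority pillars come first (combinations() preserves input order).
    """
    return [
        frozenset(combo)
        for size in range(len(priority), -1, -1)
        for combo in combinations(priority, size)
    ]


def next_target(priority, feasible):
    """First target in the relaxation order that uses only feasible pillars."""
    for target in relaxation_order(priority):
        if target <= set(feasible):
            return target
    return frozenset()  # not reached: the empty set is always a subset


# Hypothetical preference: security first, then simplicity, then composability.
priority = ("secure", "simple", "composable")
# Hypothetical budget verdict: "simple" cannot be reached in this pass.
target = next_target(priority, feasible={"secure", "composable"})
print(sorted(target), medal(target))  # ['composable', 'secure'] SILVER
```

With Gold out of reach, the walk skips the preferred Silver (Secure + Simple) because Simple is infeasible and settles on the next Silver in priority order.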

Why Structure?

Agents choke on “make it better.” They thrive on structured feedback and achievable next steps. A map of quality states—not a single pass/fail bit—is what makes the difference. That map is the quality lattice behind the medals.

Most analysis tools scan syntax—style violations, known anti-patterns, missing semicolons. Topos analyzes the shape of the program itself: its structure, independent of what language it is written in. That distinction is what makes structural medals something agents can genuinely optimize toward rather than superficially satisfy.

Time and tokens are finite. Ideal code is not always achievable in one pass—and it shouldn't have to be. Topos lets you set pillar priorities so agents pursue the best program structure you can afford without burning cycles on dimensions you don't care about yet. Those preferences induce an order on the lattice: concrete instructions for exactly how to relax, medal by medal, instead of vague “polish everything” prompts.

  • The relaxation walk: Stuck on Gold? Your preference ranking plus the lattice always defines the next medal to aim for—the highest feasible step along the path you chose.
  • Anti-gaming: Because we measure graph structure, padding comments and shuffling lines don't fool the scorer—only genuine structural improvements move the needle.
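The anti-gaming point can be illustrated with Python's standard `ast` module as a stand-in. This is only a sketch: the post does not show Topos's scorer, and `structural_fingerprint` is a hypothetical helper. It demonstrates the narrower claim that comments and blank lines never reach the syntax tree, so padding them leaves the structure untouched, while a genuine rewrite changes it.

```python
import ast


def structural_fingerprint(source: str) -> str:
    """Dump the AST without line/column info; comments and spacing are not in the tree."""
    return ast.dump(ast.parse(source), include_attributes=False)


original = '''
def total(xs):
    s = 0
    for x in xs:
        if x > 0:
            s += x
    return s
'''

# Same logic, padded with comments and blank lines.
padded = '''
# helper: sums the positive values


def total(xs):
    # accumulator
    s = 0
    for x in xs:  # loop
        if x > 0:
            s += x

    return s
'''

# A genuine structural change: the loop and branch collapse into one expression.
refactored = '''
def total(xs):
    return sum(x for x in xs if x > 0)
'''

print(structural_fingerprint(original) == structural_fingerprint(padded))      # True
print(structural_fingerprint(original) == structural_fingerprint(refactored))  # False
```

An exact AST match is stricter than a graph-based measure would be: reordering statements changes this fingerprint, so it covers comment and whitespace padding only.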

The framework draws on category theory—a branch of mathematics built to specify what we mean in messy, multi-dimensional domains. For a gentler introduction, see Getting Started with Category Theory; for the formal construction, the deep dive is below.

What We Measure

Topos parses code into graph representations of program structure (abstract syntax trees, control-flow graphs, module dependency graphs, and code property graphs) and runs structural probes on those graphs. For v0.3.0 we launch with three independent pillars; the framework extends to new ones over time.

Each pillar maps to a specific graph family:

  • SIMPLE (AST + CFG): Code complexity. Cyclomatic complexity and token entropy on the abstract syntax tree and control-flow graph.
  • COMPOSABLE (MDG): Module coupling. Martin instability and fan-out on the module dependency graph (via GitNexus).
  • SECURE (CPG): Data-flow safety. Dangerous API reachability and taint paths from the code property graph.

These are separate dimensions of quality—like stats on a character sheet, not one blended GPA.
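To make the two SIMPLE metrics concrete, here is a rough sketch using only Python's standard library. It is an approximation under stated assumptions, not Topos's probe: complexity is counted as one plus the decision points found in the AST (a common shortcut for McCabe's measure), and "token entropy" is taken to mean the Shannon entropy of the token-string distribution. Both function names are hypothetical.

```python
import ast
import io
import math
import tokenize
from collections import Counter

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points in the AST."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # each extra and/or operand adds a path
        elif isinstance(node, ast.comprehension):
            score += 1 + len(node.ifs)     # the loop itself plus each filter
    return score


def token_entropy(source: str) -> float:
    """Shannon entropy, in bits, of the distribution of token strings."""
    skip = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}
    tokens = [t.string
              for t in tokenize.generate_tokens(io.StringIO(source).readline)
              if t.type not in skip]
    if not tokens:
        return 0.0
    total = len(tokens)
    return -sum(c / total * math.log2(c / total) for c in Counter(tokens).values())


sample = '''
def classify(n):
    if n < 0 and n % 2:
        return "neg-odd"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"
'''

print(cyclomatic_complexity(sample))  # 5: base 1 + if + and + for + if
print(round(token_entropy(sample), 2))
```

Both numbers are raw measurements; turning them into a pass or fail on the pillar would need thresholds, which the post says were calibrated separately.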

Three pass/fail pillars give 2 × 2 × 2 = eight structural quality states. Medals group them into tiers agents and humans can act on:

Fig 1. Topos quality lattice: eight structural states grouped into Gold / Silver / Bronze / Slop tiers.
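The lattice in the figure can be enumerated directly. The sketch below assumes each state is simply the set of pillars a file passes, ordered by set inclusion; that reading follows from the medal definitions but is our illustration, not code from Topos.

```python
from itertools import combinations

PILLARS = ("Simple", "Composable", "Secure")
TIERS = {3: "GOLD", 2: "SILVER", 1: "BRONZE", 0: "SLOP"}

# Every subset of pillars is one structural state: 2**3 = 8 in total.
states = [frozenset(c)
          for size in range(len(PILLARS) + 1)
          for c in combinations(PILLARS, size)]

# Group states into medal tiers by how many pillars they pass.
tiers = {}
for state in states:
    tiers.setdefault(TIERS[len(state)], []).append(state)

# Cover relation: one state sits directly below another when the upper
# state passes exactly one additional pillar.
covers = [(low, high) for low in states for high in states
          if low < high and len(high) == len(low) + 1]

for name in ("GOLD", "SILVER", "BRONZE", "SLOP"):
    labels = [" + ".join(sorted(s)) or "none" for s in tiers[name]]
    print(f"{name:6} {labels}")
print(len(states), "states,", len(covers), "single-pillar steps")
```

The counts come out as one Gold state, three Silver, three Bronze, and one Slop, connected by twelve single-pillar steps.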

The Topos Architecture & Case Study

Topos separates structural analysis into three layers—from parsing code into graphs, to scoring a single file, to comparing versions:

  • Representations: Programs aren't just text. Topos parses source into graph representations (AST, CFG, CPG, and MDG), the same structures behind the SIMPLE, COMPOSABLE, and SECURE pillars.
  • Probes: Point-in-time metrics on a single graph—cyclomatic complexity on a CFG, Martin instability on an MDG. They answer “how is this file doing right now?”
  • Profunctors (comparisons): Relational checks between two programs, or between code and tests. They flag cosmetic refactors (near-zero structural change) and power Structural Test Coverage—aligning test shape with production logic, not just line counts.
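As one concrete instance of a probe from the list above, Martin instability can be computed on a toy module dependency graph. The graph and module names are hypothetical, and this sketch does not use GitNexus or Topos; it only applies the standard definition, instability = fan-out / (fan-in + fan-out).

```python
from collections import defaultdict

# Hypothetical module dependency graph: module -> modules it imports.
deps = {
    "api": {"services", "utils"},
    "services": {"db", "utils"},
    "db": {"utils"},
    "utils": set(),
}


def instability(graph):
    """Martin instability per module: Ce / (Ca + Ce), or 0.0 for isolated modules."""
    fan_in = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            fan_in[target] += 1

    scores = {}
    for module, targets in graph.items():
        ce = len(targets)     # efferent coupling: modules this one depends on
        ca = fan_in[module]   # afferent coupling: modules that depend on this one
        scores[module] = ce / (ca + ce) if (ca + ce) else 0.0
    return scores


for module, score in instability(deps).items():
    print(f"{module:9} fan-out={len(deps[module])} instability={score:.2f}")
```

A score near 1.0 (here `api`) marks a module that depends on others but has no dependents, so it is cheap to change; a score of 0.0 (here `utils`) marks a module everything leans on.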

Case Study: Grading Code You Already Trust

A scorer is only believable once you watch it run on code you already have opinions about. So treat this as a case study, not a calibration. The pillar thresholds themselves were set on a much larger, multi-language corpus—we'll detail that work in a future, more technical post. Here we simply point Topos v0.3.0 at three libraries developers reach for every day—requests, numpy, and pandas—and score every parseable file to see how familiar code lands against the same bar.

  • 1,927 Python files scored
  • 3 trusted libraries
  • 3 structural pillars

Each file earns three independent scores from 0 to 100 (higher is better)—one per pillar:

  • Simplicity — Is complexity under control (branching, function size, structure)?
  • Secure — Does it avoid risky API patterns?
  • Composable — Are module dependencies clean, or tangled? (requires a one-time repo map via GitNexus)

Read the scores below as a sanity check: how does code we already trust land against thresholds calibrated elsewhere? The answer is reassuring, and revealing about where even well-loved libraries leave structural room to improve.

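| Library  | Simplicity | Secure | Composable | Files scanned |
|----------|------------|--------|------------|---------------|
| requests | 46         | 90     | 69         | 19            |
| numpy    | 41         | 90     | 26         | 487           |
| pandas   | 35         | 98     | 15         | 1,421         |

Fig 2. Average structural scores by library.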

Case study: Topos v0.3.0 scored across 1,927 parseable files in requests, numpy, and pandas. Pillar thresholds were calibrated separately on a larger multi-language corpus.

Try It On Your Codebase

Topos meets you where you review code—start in your editor, drop to the terminal, or wire it into an agent. Each path stands on its own.

Recommended: VS Code Extension (MCP Server). Get quality medals and pillar scores inline in VS Code, inside agents and chat, as you review. Install extension

Coding agents — the MCP server exposes Topos quality targets to any coding agent. Setup guide

Terminal — install the CLI, then evaluate any repo.

curl -fsSL https://docs.krv.ai/topos/install.sh | sh
topos evaluate . -r

The Judgment Gap

The agentic era doesn't have a code shortage. It has a judgment gap. Agents ship faster than understanding travels—and the distance between generation and comprehension is exactly where debt accumulates.

Topos closes that gap concretely. An agent generates a file and Topos returns a verdict: GOLD, SILVER, BRONZE, or SLOP. Not a list of vague complaints, but the same structural scorecard we just ran across production-grade libraries like numpy, pandas, and requests. If a file is held at BRONZE because it fails the Simple pillar on cyclomatic complexity (too many branches), the relaxation walk names SILVER as the next target and the failing metric says what to fix: split the function, reduce nesting, or extract a helper. Every step is measurable, every target is real.

Concrete structural targets let agents build toward maintainability, not just a green test suite. You can't prompt away bad architecture—but you can measure it.

The code your agents write today is the system your team will diagnose at 2 a.m. Measure it now.

Topos v0.3.0 is live and open for exploration. Follow @krv_labs for updates.

References & Artifacts

Sources, code, data, and demos.

  • Open source framework for defining and optimizing code generation priorities using finite lattices.
  • Core concepts, API reference, and guides for managing agentic code generation quality.
  • Bring Topos quality metrics directly into your editor with our dedicated VS Code extension.
  • A practical primer on category theory fundamentals (objects, morphisms, functors, and natural transformations) for readers new to the formal side of Topos.
  • J. Wayland's thesis on strategic voting under uncertainty and the commutative monoidal structure of elections, via category theory (UC Berkeley Mathematics honors thesis, 2019).
  • Sridhar Mahadevan's work on topos causal models: sheaves, subobject classifiers, and intuitionistic logic for specifying causal structure beyond Boolean SCMs (NeurIPS 2025).