20 May 2026
The constitutional stack: why we replaced markdown with JSON
Every team that works with AI agents eventually writes the rules down: a style guide, a runbook, a CLAUDE.md, a page in a wiki. Then the agent ignores them, because prose is not a contract. It is a suggestion the model is free to interpret, and it will. We hit this wall building NOMARK's own harness. The fix wasn't a better prompt; it was to stop writing the constitution as prose and start writing it as data a runtime can enforce.
Start with the failure, because it is specific. You write “read the file before you edit it” into a rules document. The agent reads the rule, agrees with it, and edits a file it never opened: not out of defiance, but because under context pressure the rule was one sentence among thousands, and the model summarised it away. You write “never claim a task is done without running the test.” Same outcome. Prose rules degrade exactly when you need them most: late in a long session, when the window is full and the model is reconstructing intent from a compression of everything it has seen.
The problem is not that the rules are badly written. The problem is that nothing checks them. A prose rule is read by the same system it is meant to constrain. That is not a control; it is a hope.
A constitution something else can read
The fix is to move the rule out of the model's reading and into a runtime's. A constraint a separate process enforces does not depend on the model remembering it. We split NOMARK's own constitution along exactly that line. The principles humans ratify and amend stayed as prose: the Charter, the immutable rules about fabrication and false completion. Humans read those; humans change those. Everything operational became structured data, with something other than the model doing the checking.
The clearest example is the lifecycle manifest. Work moves through five stages (discover, plan, build, verify, ship), and each stage carries entry criteria, exit criteria, and a trust gate as data, not description. The ship stage has a trust gate of 0.8. A model cannot talk its way past 0.8 the way it can reinterpret “be careful before deploying.” The number is compared. Either the score clears the gate or the stage does not open.
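To make that concrete, here is a minimal sketch of a stage gate read from a manifest. It is not NOMARK's actual schema: only the five stage names and the 0.8 ship gate come from the description above. The field names (`trust_gate`, `entry_criteria`), the criterion strings, the 0.0 placeholder gates, and the choice to treat a score equal to the gate as passing are all assumptions made for illustration.

```python
import json

# Hypothetical manifest. Only the five stage names and the 0.8 ship gate come
# from the article; every field name, criterion, and the 0.0 placeholder gates
# are illustrative assumptions.
MANIFEST = json.loads("""
{
  "stages": [
    {"name": "discover", "trust_gate": 0.0, "entry_criteria": []},
    {"name": "plan",     "trust_gate": 0.0, "entry_criteria": ["scope_agreed"]},
    {"name": "build",    "trust_gate": 0.0, "entry_criteria": ["plan_approved"]},
    {"name": "verify",   "trust_gate": 0.0, "entry_criteria": ["build_complete"]},
    {"name": "ship",     "trust_gate": 0.8, "entry_criteria": ["tests_passed"]}
  ]
}
""")


def can_enter(stage_name: str, trust_score: float, satisfied: set[str]) -> bool:
    """Return True only if every entry criterion is met and the score clears the gate.

    Raises StopIteration if the stage name is not in the manifest.
    """
    stage = next(s for s in MANIFEST["stages"] if s["name"] == stage_name)
    criteria_met = all(c in satisfied for c in stage["entry_criteria"])
    # Assumption: a score equal to the gate counts as clearing it.
    return criteria_met and trust_score >= stage["trust_gate"]
```

The check is a comparison against a parsed number, so there is nothing in it for the model to reinterpret: `can_enter("ship", 0.79, {"tests_passed"})` is false no matter how the request is phrased.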
Trust itself is a number for the same reason. “Operate reliably” is prose. A trust score that starts at 1.0, drops by a fixed amount for each class of breach, and maps to an autonomy level is data. Below 0.5 the agent loses autonomous dispatch: not because it judged itself untrustworthy, but because a runtime read the score and closed the capability.
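A rough sketch of that arithmetic, under stated assumptions: the starting score of 1.0 and the 0.5 floor for autonomous dispatch are from the text, while the breach class names and penalty sizes below are hypothetical values chosen only so the example runs.

```python
# Hypothetical penalty table: the class names and amounts are assumptions,
# not NOMARK's real values. Each class of breach costs a fixed amount.
PENALTIES = {
    "fabrication": 0.25,
    "false_completion": 0.25,
    "unread_edit": 0.125,
}

STARTING_SCORE = 1.0             # from the article
AUTONOMOUS_DISPATCH_FLOOR = 0.5  # from the article: below this, dispatch closes


def apply_breaches(breaches: list[str], start: float = STARTING_SCORE) -> float:
    """Subtract the fixed penalty for each recorded breach, never going below 0."""
    score = start
    for breach in breaches:
        score = max(0.0, score - PENALTIES[breach])
    return score


def may_dispatch_autonomously(score: float) -> bool:
    """The runtime, not the model, decides: the capability closes below the floor."""
    return score >= AUTONOMOUS_DISPATCH_FLOOR
```

Nothing here asks the agent for an opinion about its own reliability. The ledger of breaches goes in, a score comes out, and the capability is open or closed accordingly.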
The resource graph makes the point most sharply. Agents hallucinate infrastructure (a database name, an endpoint, an environment variable) and write it down with complete confidence. “Don't guess resource names” as a prose rule is ignored under exactly the pressure that produces the guess. As data, it holds without the model having to carry it: a graph of verified resources, checked by a hook before any write that touches infrastructure. The runtime remembers so the model doesn't have to.
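A pre-write hook of that kind might look roughly like the sketch below. The graph is flattened into a lookup table for brevity (a real graph would also carry the edges between resources), and every resource name, the `(kind, name)` reference shape, and the function names are hypothetical.

```python
# Hypothetical verified resources, flattened to kind -> set of names.
# All entries are made-up examples, not real infrastructure.
VERIFIED_RESOURCES = {
    "database": {"orders_db"},
    "endpoint": {"https://api.example.com/v1/orders"},
    "env_var": {"ORDERS_DB_URL"},
}


def unverified_references(references: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every (kind, name) pair that is not in the verified set."""
    return [
        (kind, name)
        for kind, name in references
        if name not in VERIFIED_RESOURCES.get(kind, set())
    ]


def allow_write(references: list[tuple[str, str]]) -> bool:
    """Hook decision: block the write if it mentions any unverified resource."""
    return not unverified_references(references)
```

A write that references `("database", "orders_database")` is blocked because that name is not in the table, however confident the agent was when it wrote it. Returning the list of unverified references, rather than a bare refusal, gives the agent something to look up instead of something to guess again.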
Prose tells the model what good looks like. Data makes good the only path the runtime allows.
Why not just write better markdown
Because the two formats are doing different jobs, and asking one to do both is the original mistake. Markdown is for the humans who ratify and amend the constitution; it has to be read, argued over, signed. JSON and YAML are for the runtime that enforces it; they have to be parsed and checked. We kept the Charter in prose precisely because its audience is human. We moved the manifests, the trust score, the resource graph, and the signal ledger into structured data because their audience is a hook that runs before a tool call.
None of this removes the prose. You still need a document that states, in plain language, what the system values, or nobody can amend it well. But the document is not the control. The control is the part a runtime can read, and a runtime cannot read intent. It reads data. So the rules that actually have to hold became data, and the gap between what we wrote down and what the agent did narrowed to whatever the runtime does not check.