Field Report — Phase 2 Closed
The AI plane is up. Six routes, one contract, every model signed. Here's what made it through, what we shrank to fit the lab, and what's still on the Phase 3 list.
Two and a half weeks after Phase 1 closed, the AI plane is live. Six endpoints, one OpenAPI contract, every model running on the same hardened cluster the foundation work delivered. This is the entry where Writ stops being a platform you read about and starts being one a developer can call.
It is also still pre-release. That part hasn’t changed.
The short version
Phase 2 closed today.
The platform now serves six AI capabilities behind a single API: chat, embeddings, retrieval-augmented search, vision detection, tabular prediction, and agent execution. All six pass contract tests at the close commit. Every model that answers a request was signed before it was loaded, and every request that hits the gateway is recorded into the same hash-chained audit log that Phase 1 stood up.
A developer can now point a Python or TypeScript SDK at one URL and use any of the six capabilities without learning six different service shapes.
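For readers who have not met a hash-chained log before, here is a minimal sketch of the idea. It is not Writ's audit format: the record layout, the `GENESIS` value, and the function names are all assumptions made for illustration, and the signing step is left out entirely.

```python
import hashlib
import json

# Assumed placeholder standing in for "the hash before the first record".
GENESIS = "0" * 64


def record_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous record's hash together with a canonical payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append a record that commits to everything before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev": prev_hash,
        "payload": payload,
        "hash": record_hash(prev_hash, payload),
    })


def verify(chain: list) -> bool:
    """Walk the chain from the start and recompute every link."""
    prev_hash = GENESIS
    for rec in chain:
        if rec["prev"] != prev_hash:
            return False
        if rec["hash"] != record_hash(prev_hash, rec["payload"]):
            return False
        prev_hash = rec["hash"]
    return True


log = []
append(log, {"route": "/v1/chat", "tenant": "demo"})    # hypothetical fields
append(log, {"route": "/v1/embeddings", "tenant": "demo"})
print(verify(log))
```

Editing any earlier payload changes its hash, which breaks every link after it. That property is what lets a verifier detect tampering by walking the chain once.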
What that actually means, in plain English
Phase 1 was the foundation — identity, supply chain, encryption, audit, the boring infrastructure that has to exist before anything AI-shaped works. Phase 2 was the part most people recognize as “the product.” It went on top of that foundation without compromising any of it.
- Six capabilities, one contract. Chat, embeddings, RAG, vision, predict, agents. One OpenAPI document describes all of them. The Python and TypeScript SDKs are generated from that document — not written by hand and kept in sync with prayers.
- Open-source models with known origins. Llama 3.1 8B Instruct and Granite 20B Code Instruct for chat. MiniLM for embeddings. YOLOv11-X for vision detection. XGBoost for tabular prediction. Each model has a model card, a signature, and a documented provenance chain from training run to running pod.
- Retrieval that fans out and re-ranks. The RAG path embeds the query, runs it against both a vector index and a full-text index, fuses the two with reciprocal rank fusion, and re-ranks the top hits before answering. It is not a single-call shortcut — it is the path a real mission application would actually use.
- Agents are scripted, not autonomous. The agent runtime is LangGraph driving a deterministic graph against two demo tools — a private metasearch and a read-only Postgres query. That is on purpose. Phase 2 proves the wiring; Phase 3 is when tools get sandboxed, policy-gated, and trusted with anything sensitive.
- The audit chain absorbs every new route for free. Audit emission lives in the gateway middleware, not per-route. Adding vision and predict at the close cost zero audit-chain code. Every request, every prompt, every model call still lands in the signed log.
- The contract test suite is the gate. A route doesn’t ship until the contract suite agrees its shape matches the published OpenAPI. The drift check is a CI step, not a quarterly cleanup.
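The fusion step in the retrieval bullet above can be sketched in a few lines. This is the textbook form of reciprocal rank fusion, not the platform's code: the function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions, and the re-ranking stage that follows fusion is omitted.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. Higher total score ranks first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties on doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical hits from the vector index and the full-text index.
vector_hits = ["doc_a", "doc_b", "doc_c"]
fulltext_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear in both lists rise to the top even when neither index ranked them first, which is the point of fusing rather than picking one index.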
What we shrank to get here
Two pieces of Phase 2 are smaller than the original plan called for.
The chat model. The plan called for Llama 3.1 70B Instruct as the default chat backend. The lab GPU posture — a single L40S/H100-class node already carrying Triton and the agent runtime — would not give a 70B model the cold-start time and memory headroom the rest of the platform needs. We swapped to Llama 3.1 8B Instruct plus Granite 20B Code Instruct, deleted a 1B placeholder we’d been carrying, and filed Phase 3 quantization work — AWQ on L40S, FP8 on H100 — as the bridge back to the 70B target. This decision should have been made up-front in an architecture review, not discovered under sprint pressure. Lesson taken.
The mission frontend. The plan named a dedicated Next.js + TypeScript-SDK reference application. Phase 2 ships Open WebUI deployed under OIDC against the gateway instead. That covers the “a human can talk to the cluster” demo bullet without committing to a frontend framework before the developer portal arc gets designed. The dedicated reference frontend is now Phase 3 portal scope. The demo outcome is the same on paper; the deliverable that actually shipped is a different one, and we are recording it as such.
Neither swap was a quiet one. Both are written down in the Phase 2 retrospective with a Jira trail.
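The arithmetic behind the chat-model swap is worth seeing once. The sketch below counts weight memory only, as parameters times bits per weight. It ignores KV cache, activations, quantization metadata, and whatever Triton and the agent runtime already hold, so real requirements are higher. The per-GPU capacities are the nominal published figures for a single L40S and a single H100; how many GPUs the lab node carries is not stated here, and the helper name is made up for illustration.

```python
def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight footprint in GB (weights only, no runtime overhead)."""
    return params_billion * bits_per_weight / 8


# Nominal memory per GPU in GB (published capacities, one card each).
GPU_GB = {"L40S": 48, "H100": 80}

cases = [
    ("Llama 3.1 70B, FP16", 70, 16),
    ("Llama 3.1 70B, FP8", 70, 8),
    ("Llama 3.1 70B, 4-bit (AWQ-style)", 70, 4),
    ("Llama 3.1 8B, FP16", 8, 16),
]

for name, params, bits in cases:
    need = weight_gb(params, bits)
    fits = [gpu for gpu, cap in GPU_GB.items() if need < cap]
    print(f"{name}: ~{need:.0f} GB of weights; under capacity on: {fits or 'none'}")
```

At FP16 the 70B weights alone come to roughly 140 GB, more than either card holds. FP8 brings that to about 70 GB and 4-bit to about 35 GB, which is why the quantization work is framed as the bridge back to the 70B target. Fitting the weights under capacity is still not the same as having headroom for everything else on the node.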
Where it hurt
Three honest scars from the phase:
- A ticket got closed before the work shipped. The KServe-plus-Triton serving epic was marked Done before the vision and predict routes were actually wired through to it. The gateway’s backend interface was missing the methods those routes needed; calls returned a 501 with a header pointing back at the closed ticket. We caught it at the parent epic close, shipped the missing abstraction and handlers in a single commit, and called the silent drift out in the close-out comment so the audit trail reflects what happened. Phase 3 will require the close-out comment and the ticket transition in the same commit — no two-step closes.
- No fresh-laptop SDK survey. Phase 2’s exit criteria included a 90% quickstart-success-rate target on a fresh workstation. We never ran the survey. The SDKs install, the quickstarts pass the OpenAPI drift gate, and the docs build clean, but “a developer who has never seen this platform can build a working app in fifteen minutes” is unverified, and unverified means we don’t get to claim it. The Phase 3 portal arc carries the survey.
- Model-size architecture decision happened too late. The 70B-to-8B swap should have been an architecture review record at the start of the phase, not a discovery during a sprint. The fix is procedural: any model that gates an exit metric gets an architecture decision record (ADR) before its sprint opens.
Writing these down is the same discipline as last time. A clean story would be a fiction. The accreditors will ask, and “we don’t have the receipt” is worse than “yes, here’s what we found and here’s what we did.”
What’s next
Phase 3 is governance — the controls that decide whether Writ is demo-grade or accreditable. The deliverables that matter most:
- A CUI classifier in the request path. A small ONNX classifier inspects every input and output, with three modes per tenant: block, redact, or tag-only. Decisions land in the audit chain. Today we have the placeholder; Phase 3 wires it to the gateway as a pre-route and post-route call.
- OPA/ABAC enforcing tenant, role, and classification policies on every request. Policy bundles published over OCI, signed, pulled by the gateway, deny-by-default, with structured decisions in the audit log. Conftest unit tests for every rule.
- MCP tools running inside gVisor sandboxes. Five built-in tools — private search, sandboxed Python, tenant-scoped object-store reads, allowlisted HTTP fetch, read-only Postgres against a tenant schema. Each tool gets a network policy and an egress proxy with a tool-scoped allowlist. The agent runtime invokes tools through a catalog that respects OPA decisions per call.
- Model cards generated and enforced at deploy time. A Kyverno admission policy refuses to load a model into the cluster without a card. Cards are emitted from the MLflow run record plus the dataset card plus the eval results, in Markdown and JSON, OSCAL-linkable.
- Audit chain hardened to ML-DSA signatures. The Phase 1/2 chain works; Phase 3 brings it up to the post-quantum signature standard, adds a verifier CLI that walks the chain end-to-end, and exposes a signed /v1/audit/verify endpoint as cited evidence in the SSP.
- Provenance attestations end-to-end. in-toto layouts that bind training data → training run → registered model → image → deployed inference service, signed at each step. Today the model image and weights are signed; Phase 3 closes the loop back to the dataset.
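To make the three per-tenant classifier modes from the list above concrete, here is a small sketch of how block, redact, and tag-only could differ in effect. Everything in it is hypothetical: the regex stands in for the ONNX classifier, and the `Mode`, `Decision`, and `apply_mode` names are invented for illustration. The real pre-route and post-route wiring is Phase 3 work and is not shown.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    BLOCK = "block"
    REDACT = "redact"
    TAG_ONLY = "tag-only"


@dataclass(frozen=True)
class Decision:
    mode: Mode
    flagged: bool   # did the classifier find anything?
    allowed: bool   # may the request or response proceed?
    text: str       # the text to pass along (possibly altered)


# Toy stand-in for the classifier: flags tokens like "CUI-1234".
_TOY_PATTERN = re.compile(r"CUI-\d+")


def apply_mode(text: str, mode: Mode) -> Decision:
    flagged = _TOY_PATTERN.search(text) is not None
    if not flagged or mode is Mode.TAG_ONLY:
        # Tag-only never alters or stops traffic; it only records the flag.
        return Decision(mode, flagged, True, text)
    if mode is Mode.BLOCK:
        return Decision(mode, True, False, "")
    # Mode.REDACT: let the text through with flagged spans masked.
    return Decision(mode, True, True, _TOY_PATTERN.sub("[REDACTED]", text))


sample = "Summarize report CUI-1234 for the team."
for mode in Mode:
    print(mode.value, apply_mode(sample, mode))
```

In each case the resulting `Decision` is the kind of structured record that would land in the audit chain, whichever mode the tenant has chosen.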
The Phase 3 exit is the moment a C3PAO assessor can ask “show me, on this running cluster, how a CUI input gets classified, how the policy gate decides, what the agent did with the answer, and prove no one tampered with the trail” — and the answer is six terminal commands, not a meeting.
What is honestly not happening yet
- Still no customer deployments. Zero. Nothing has changed on this since the last entry.
- No SCIF briefings, no program-office demos, no sales team, no early-access list. Field presence remains a blog post and a tunnel.
- No inline LLM safety filtering. Garak runs in the eval pipeline, Presidio scrubs PII at rest, but neither is in the request path yet. That is a real production-safety gap and Phase 3 closes it as part of the CUI classifier and OPA work.
- No dedicated developer portal. Open WebUI under OIDC is the only first-party interface today. The portal — including the documentation site, SDK quickstarts, and the reference Next.js app — is Phase 3 scope.
- No relational policy across the RAG corpus. OPA gates requests today. Per-document, per-tenant ACLs that compose across a fan-out (the shape that a proper multi-tenant retrieval layer needs) are on the Phase 3 federation backlog, not landed.
- No air-gap delivery bundle. Tracked, scoped, not built.
These are not surprises and they are not embarrassments. They are the difference between a six-route AI plane that works on a single hardened cluster and a system an authorizing official will sign for. We’re between those two states, on purpose, in sequence.
What to watch for
The next entry lands when the policy gate is enforcing on every request and the first MCP tool runs inside a gVisor sandbox under an OPA decision. That is the moment Writ stops being a high-quality demo and starts being something an accreditor can actually evaluate. If Phase 3 slips, that entry runs anyway. Pre-release means honest. It still does not mean quiet.