Platform Architecture Reference
The end-to-end technical architecture of the Writ platform — control plane, data plane, runtime, supply chain, and deployment profiles.
Platform engineers · Security architects · Cloud operators
A full technical reference covering every tier of the Writ platform: the OpenAPI control plane, the inference runtime (vLLM, Triton, KServe), the data plane (PostgreSQL + pgvector, OpenSearch, Valkey, MinIO, NATS), the supply-chain lane (Tekton, Chains, Sigstore, Harbor), and the three deployment profiles (dev, edge, cluster). Intended for the reader who needs to evaluate, install, or operate the platform.
This paper is the canonical architecture reference for the Writ platform. It documents each tier, names the upstream components, enumerates the substitutions we made to avoid license-trapped components, and describes how the three deployment profiles map onto real hardware.
▸ View as list
Applications you build. Your mobile app, your web app, your Python or TypeScript service. They import the Writ toolkit and call the platform over the network.
Claude Desktop, Cursor, Windsurf, and other tools that speak the Model Context Protocol. A developer runs one command and their editor can use every Writ capability.
The front door. Terminates encrypted traffic, enforces rate limits, and passes validated requests to the control plane. Speaks hybrid future-proof encryption (X25519 + ML-KEM-768).
The single OpenAPI 3.1 surface at /v1/*. Every capability — chat, prediction, search, speech, agents — lives here. One auth model, one error model, one audit schema, one rate-limit policy.
The tool plane. Exposes every Writ capability as a Model Context Protocol tool. One endpoint; namespaced tools; OAuth 2.1 scopes; everything audited.
OIDC identity broker. Handles CAC / PIV / enterprise SSO, federation to external IdPs, and token issuance.
Workload identity. Every service in the cluster gets a SPIFFE identity issued by SPIRE; internal mTLS is verified against it.
Secrets and key management. HSM-backed. Replaces HashiCorp Vault under a permissive license.
Policy decisions. For every request, OPA evaluates tenant, classification, purpose, and release markings against the rules your admins wrote.
Generative model server. Runs large language models with efficient batching, streaming, and multi-tenant isolation.
Inference server for predictive, classical-ML, and computer-vision models. Scales to zero when idle; spins up when work arrives.
High-throughput inference for vision and multimodal models, with TensorRT-LLM backends for NVIDIA GPUs.
Agent runtime. Implements plan / act / reflect loops, tool calling, and multi-step workflows with persistent checkpoints.
The main database. Stores conversations, agent checkpoints, user records, and vector embeddings for AI search (via the pgvector extension).
Full-text and hybrid search. Documents with classification tags; policy-enforced retrieval.
Object storage. Model weights, training data, evidence bundles. Works offline, synchronizes when a link opens.
Messaging and event streaming. Feeds audit records, model-request queues, and cross-service events.
Signing and transparency. Every container image, every release, every audit bundle is signed with post-quantum-safe keys and logged to a public record.
Metrics, logs, and traces — the operations view of the platform. Dashboards your team already knows how to read. Every service reports; nothing is black-boxed.
1. Control plane
The control plane is a single OpenAPI 3.1 surface exposed at /v1/*. In Phase 2 (AI MVP) it is a Python / FastAPI implementation called the Unified Control Plane API; migration to a Go / Rust v1 is pinned to Phase 5 (Scale) per ADR-0010.
1.1 Unified endpoints
Every paradigm is addressable by a predictable endpoint naming convention — /v1/{paradigm}/{verb}:
/v1/chat/completions— generative, OpenAI-shape compatible/v1/predict— tabular / predictive/v1/vision/detect— computer vision (detection, segmentation, classification)/v1/rag/query·/v1/rag/graph— retrieval-augmented generation, including GraphRAG/v1/speech/asr·/v1/speech/tts— automatic speech recognition and text-to-speech/v1/multimodal/infer— vision-language inference/v1/timeseries/forecast·/v1/timeseries/anomaly— forecasting and anomaly detection/v1/rank— recommender and learn-to-rank/v1/rl/act— reinforcement learning and RLHF-style preference feedback/v1/sim/run— simulation and digital-twin/v1/agents/run— agentic workflows via LangGraph
Every call uses one auth model (OIDC via Keycloak), one audit schema (NIST AU-12 compatible), one rate-limit policy, and one error model (RFC 7807).
1.2 MCP Director
A single Model Context Protocol endpoint fronts every Writ API as namespaced tools — writ.identity.*, writ.data.*, writ.embed.*, writ.fhe.*, writ.llm.*, writ.agents.*, writ.k8s.*, writ.obs.*, writ.compliance.*, writ.federation.*, writ.deploy.*. OAuth 2.1 scopes map 1:1 onto namespaces. Every tool invocation is OPA-gated, mTLS-wrapped with hybrid X25519+ML-KEM-768, and audited.
2. Identity and policy
- Keycloak — OIDC broker. CAC / PIV certificate authentication. Device-flow for headless environments. External IdP trust via federation endpoints.
- OpenBao (MPL 2.0) — secrets and key management, HSM-backed. Replaces HashiCorp Vault (BSL).
- Kyverno + OPA — admission-chain ClusterPolicies and data-path policy. Policy-as-code.
- SPIFFE / SPIRE — workload identity issuance per enclave. Each workload gets a SPIFFE ID scoped to its trust domain.
3. Inference runtime
- vLLM + Triton — LLM serving. Paged attention, continuous batching, tensor parallelism, speculative decoding. TensorRT-LLM backend for NVIDIA GPUs.
- KServe — InferenceService abstraction for predictive, computer vision, and classical ML workloads. Scales to zero when idle.
- Ray — distributed compute for batch, training, and tuning.
- LangGraph — agent runtime. Plan / act / reflect primitives. Checkpointing to PostgreSQL.
The runtime is hardware-abstracted: x86, ARM, and Apple Silicon are all first-class for the dev and edge profiles. GPU classes supported include NVIDIA A100 / H100 / L40S, AMD MI250 / MI300, and Apple Silicon via MLX for development.
4. Data plane
Every data-plane component is OSI-approved OSS. The substitutions below avoid SSPL, BSL, ELv2, and RSALv2 licensing traps:
| Original (trapped) | License | Writ uses | Governance |
|---|---|---|---|
| MongoDB | SSPL | PostgreSQL + pgvector | PostgreSQL License |
| Elasticsearch | ELv2 / SSPL | OpenSearch | Apache 2.0 |
| Redis 7.4+ | RSALv2 | Valkey | BSD-3 · Linux Foundation |
| CockroachDB | BSL | PostgreSQL + Patroni | Apache 2.0 |
| HashiCorp Vault | BSL | OpenBao | MPL 2.0 · Linux Foundation |
| HashiCorp Terraform | BSL | OpenTofu | MPL 2.0 · Linux Foundation |
| Kafka (heavy) | Apache 2.0 | NATS JetStream | Apache 2.0 · CNCF |
Object storage is MinIO. Analytics and cross-enclave federation query is Trino with Iceberg catalogs.
5. Supply chain
The supply-chain lane is SLSA Level 3 compliant end-to-end:
- Tekton Pipelines + Triggers + Chains — pipeline engine, Git-triggered, SLSA provenance emitted per PipelineRun.
- Sigstore — Rekor transparency log, Fulcio certificate authority, cosign for signing. We run cosign v3 with ML-DSA-65 keys.
- Syft + Grype + VEX — SBOM generation (CycloneDX 1.6), vulnerability scanning, vulnerability exceptions documented and cosign-attested.
- Harbor + Iron Bank — container registry with an Iron Bank mirror. All base images are STIG-hardened Iron Bank images.
- Argo Workflows — long-running build jobs and AI training pipelines where Tekton’s ephemeral model is wrong.
Every artifact ships with:
- A cosign signature using ML-DSA-65 (post-quantum)
- An SLSA v1.0 L3 provenance attestation logged to Rekor
- A CycloneDX 1.6 SBOM
- A CBOM (Crypto Bill of Materials, CycloneDX 1.6 crypto profile)
- An OSCAL evidence snapshot
6. Service mesh and network
- Cilium — eBPF data path, L7 network policy, ClusterMesh for cross-enclave federation.
- Istio 1.24 (ambient mode) — mTLS mesh. Each pod gets a SPIFFE-issued identity. mTLS peer verification is mandatory; defaults deny.
- Envoy + oqs-provider — north-south ingress with hybrid X25519+ML-KEM-768 TLS, ML-DSA-65 signing on ingress certificates.
Network policy is default-deny. Every allowed flow is declared in GitOps-managed NetworkPolicy and CiliumNetworkPolicy resources, reviewed in the same PR as the application that needs them.
7. Observability
- Prometheus + Grafana — metrics, SLOs, burn-rate alerts.
- Loki + Vector — log aggregation.
- Jaeger + Tempo — distributed tracing.
- OpenSearch Security — SIEM-compatible audit store.
All telemetry is tagged with the SPIFFE identity of the emitting workload, the tenant, and the classification level.
8. Runtime security
- Falco — syscall-level runtime detection.
- NeuVector — runtime workload firewall, CIS benchmark automation, L7 DLP.
- Kyverno (runtime policy) — continuous admission checks for drift.
9. Backup and recovery
- Velero — cluster-state backup.
- CNPG (CloudNativePG) — PostgreSQL backup schedules with WAL archival to MinIO.
10. Deployment profiles
Three profiles, one Helm chart (charts/writ). The profile selects which subsystems deploy and at what replica counts.
10.1 dev
Single-binary. Works offline with synthetic responses. k3d or kind. CPU inference. 2 vCPU / 8 GB RAM minimum. Day-one smoke-test, no GPUs required. Seed data and reference apps included.
10.2 edge
Single-node. K3s + Big Bang Lite. 8–16 vCPU. 1–2 GPUs optional (an L4 or L40 is plenty). x86, ARM, or Apple Silicon. Air-gap ready. Full /v1/* gateway surface, smaller model fleet, smaller retention.
10.3 cluster
Full RKE2 with FIPS mode + Big Bang. 24+ cores per node. 4–16 GPUs. Iron Bank images only. IL4 / IL5 target. Multi-tenant. Cross-enclave federation via MCP peering.
10.4 Profile progression
A typical customer installs dev in an afternoon, advances to edge in the first week, and spends 6–12 weeks bringing up cluster with their accreditor in parallel with mission-frontend development.
11. What ships per release
Every release produces, in addition to the container images and Helm charts:
- Signed offline bundle (OCI image tar, cosign ML-DSA attested)
- Full OSCAL evidence pack (SSP, component definitions, continuous control snapshots)
- CycloneDX 1.6 SBOM and CBOM
- SLSA v1.0 L3 provenance attestation
- Release notes and upgrade runbook
- Iron Bank mirror update
12. References
- ADR-0010 — Control plane v1 migration to Go / Rust pinned to Phase 5 Scale
- SSDF narrative —
/docs/02-components/supply-chain-narrative.md(in customer bundle) - Federation architecture — see Federation Architecture white paper (WP-004)
- PQC posture — see Post-Quantum Cryptography white paper (WP-002)
- Accreditation pathway — see Accreditation Dossier white paper (WP-003)