Diagram 01

System Topology — Zero-Egress Architecture

The complete system showing the Customer VPC boundary, all components running inside it, what crosses the boundary, and the SaaS Control Plane. The critical invariant: zero raw data exits the VPC — only CAS scores, violation flags, and trace topology egress over the network.

CAS Framework — Zero-Egress System Topology

*Architecture overview (diagram).* Inside the Customer Secure VPC — the zero-egress zone:

- AI Agent — ADK / LangGraph / CrewAI, or any OTel-emitting agent
- K8s Mutating Webhook — auto-injects the sidecar on one label
- OTel Collector — inline PII strip + span buffer
- Eval Engine — FastAPI + DSPy + Redis state; receives spans with no PII
- LLM Cascade Router — Llama3-8B (90%) / Gemini Pro (10%)
- DSPy Signatures — 4 default ADK signatures + custom per-agent
- Hatchet Worker + Presidio NLP — offline redaction batch
- Dynamic Signature Sync Engine (gRPC / SSE) — receives compiled policy updates from the SaaS in <5ms, zero pod restarts

Zero-egress boundary — only scores cross this line. Math only: score + flags. Zero raw data. Ever.

In the CAS SaaS Control Plane:

- Go / Rust Ingestion API — high-throughput score ingest
- ClickHouse — time-series analytics
- Next.js Dashboard — DAG viewer + leadership KPIs
- LLM Meta-Compiler — natural-language rule → DSPy signature JSON DSL
- Policy Store + ADR Archive — versioned signatures + ADRs, compiled signatures pushed in <5ms
🔒 VPC Boundary is Inviolable
Every evaluation — DSPy scoring, Presidio NLP, LLM Cascade — runs inside your VPC. The boundary block above shows exactly what crosses it: CAS scores, violation flags, and trace topology only.
Sidecar is Non-Blocking
The K8s Mutating Webhook injects the OTel Collector and Eval sidecar into your Pod automatically. If the sidecar crashes, your agent continues running. Evaluation is never in the critical path.
🔄 Policy Updates Flow Inward Only
The LLM Meta-Compiler in the SaaS compiles new policy signatures and pushes them to the Dynamic Sync Engine via gRPC/SSE. Raw data never flows outward.
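The Collector's inline strip step can be sketched in Python. This is purely illustrative — production deployments would express this as OTel Collector transform-processor rules rather than application code, and the patterns below (bearer tokens, API-key-shaped strings, email addresses) are assumptions, not the shipped rule set:

```python
import re

# Hypothetical redaction patterns approximating the Collector's inline strip:
# sensitive substrings are removed before spans ever leave the agent's pod.
PATTERNS = [
    re.compile(r"Bearer\s+[A-Za-z0-9._\-]+"),       # auth tokens
    re.compile(r"\b(?:sk|pk)-[A-Za-z0-9]{16,}\b"),  # API-key-shaped secrets
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),     # email addresses
]

def strip_span_attributes(attributes: dict) -> dict:
    """Return a copy of span attributes with sensitive substrings redacted."""
    cleaned = {}
    for key, value in attributes.items():
        if isinstance(value, str):
            for pattern in PATTERNS:
                value = pattern.sub("[REDACTED]", value)
        cleaned[key] = value
    return cleaned
```

Non-string attributes (durations, counts) pass through untouched — the eval engine only ever sees span text that has already been through this pass.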
Diagram 02

OTel Span → DAG → CAS Score
Full Evaluation Pipeline

The complete sequence from your agent emitting an OTel span through PII stripping, DAG reconstruction, LLM Cascade routing, Presidio offline redaction, and final CAS score computation — showing every component handoff and what data travels between them.

Evaluation Pipeline — Sequence Diagram

```mermaid
sequenceDiagram
    participant A as AI Agent
    participant OC as OTel Collector + PII Strip
    participant EE as Eval Engine (FastAPI + DSPy)
    participant LC as LLM Cascade Router
    participant LM as Local LLM (Llama3-8B)
    participant FM as Frontier LLM (Gemini Pro)
    participant PR as Presidio Offline NLP
    participant CH as ClickHouse
    A->>OC: OTel spans (raw trace, may contain PII)
    Note over OC: Inline strip: remove tokens,<br/>cleartext PII, API keys
    OC->>EE: Batched spans (no PII present)
    EE->>EE: Reconstruct execution DAG<br/>from span parent/child IDs
    EE->>LC: Route eval request<br/>context + selected DSPy signature
    LC->>LM: 90% of evals (low-medium complexity)
    LM-->>LC: DSPy verdict + reasoning chain
    LC->>FM: 10% of evals (high complexity / edge cases)
    FM-->>LC: DSPy verdict + reasoning chain
    LC-->>EE: Merged evaluation results
    EE->>PR: Async: deep NLP redaction pass
    PR-->>EE: Entity-replaced payload [PERSON_NAME] [US_SSN]
    EE->>EE: Compute CAS score per node<br/>Compliance + Policy + Patterns dimensions
    EE->>CH: Egress-safe payload: scores, flags, topology
    Note over CH: Sub-ms aggregation<br/>DAG + per-node CAS scores stored
    Note over A,CH: Zero raw data exits the VPC at any point in this flow.
```
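The 90/10 routing decision in the sequence above can be sketched as a complexity score over a few span features. Everything here — the feature set, the weights, and the threshold — is a hypothetical illustration of the routing idea, not the Cascade Router's actual heuristics:

```python
from dataclasses import dataclass

@dataclass
class EvalRequest:
    child_count: int    # fan-out of the DAG node under evaluation
    payload_bytes: int  # size of the (PII-stripped) context
    prior_warn: bool    # an earlier pipeline stage flagged this node

COMPLEXITY_THRESHOLD = 0.7  # assumed cut-off between local and frontier

def complexity(req: EvalRequest) -> float:
    """Cheap structural features stand in for real complexity scoring."""
    score = 0.0
    score += min(req.child_count / 10, 0.4)        # wide fan-out is harder
    score += min(req.payload_bytes / 50_000, 0.3)  # long contexts are harder
    score += 0.4 if req.prior_warn else 0.0        # escalate flagged nodes
    return score

def route(req: EvalRequest) -> str:
    """Most evals stay on the local 8B model; hard cases go to the frontier."""
    return "gemini-pro" if complexity(req) >= COMPLEXITY_THRESHOLD else "llama3-8b"
```

The economics follow directly: if 90% of requests score below the threshold, frontier spend is confined to the 10% of evaluations that genuinely need it.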
Diagram 03

OTel Trace → DAG Reconstruction
→ Per-Node CAS Scoring

How CAS Framework converts a flat list of OTel spans into a visualised execution graph with CAS scores on every node. The parent-child span relationships already encode the full execution topology — we reconstruct it, assign evaluation dimensions per node, select the right DSPy signature, and score each node independently.

DAG Reconstruction — Flow Diagram

```mermaid
flowchart LR
    subgraph INPUT["OTel Trace Input — flat span list"]
        S1["Span: SupervisorAgent\nparent: root · 12ms"]
        S2["Span: MCPTool.github\nparent: supervisor · 87ms"]
        S3["Span: PII_Scanner\nparent: supervisor · 6ms"]
        S4["Span: DataAnalysisAgent\nparent: supervisor · 340ms"]
        S5["Span: Slack_Notification\nparent: data_analysis · 22ms"]
    end
    subgraph RECON["DAG Reconstruction"]
        direction TB
        R1["Build adjacency graph\nfrom parent_id chain"]
        R2["Assign eval dimension\nper span.attributes"]
        R3["Select DSPy signature\nper node type"]
        R1 --> R2 --> R3
    end
    subgraph EVAL["Per-Node Evaluation (parallel)"]
        direction TB
        E1["SupervisorAgent\nWorkflowAdherence\n0.96 PASS"]
        E2["MCPTool.github\nToolCallbackSafety\n0.91 PASS"]
        E3["PII_Scanner\nPII_Density default\n0.99 PASS"]
        E4["DataAnalysisAgent\nCodeExecSandbox\n0.74 WARN"]
        E5["Slack_Notification\nPolicyAdherence\n0.51 BLOCKED"]
    end
    subgraph OUTPUT["CAS Output"]
        direction TB
        O1["Global CAS: 0.82\nDAG rendered with\nper-node scores"]
        O2["1 ADR auto-generated\nfor blocked node"]
        O3["Zero-egress payload\nscores + topology only"]
    end
    S1 & S2 & S3 & S4 & S5 --> RECON
    RECON --> EVAL
    E1 & E2 & E3 & E4 & E5 --> OUTPUT
```
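A minimal sketch of the reconstruction step: the parent_id links already encode the topology, so the DAG falls out of one pass over the flat span list. The scores below are the example values from the diagram; aggregating them with an unweighted mean reproduces the 0.82 global CAS shown, though the real aggregation may weight nodes or dimensions differently:

```python
from collections import defaultdict

def build_dag(spans: list[dict]) -> dict[str, list[str]]:
    """Rebuild the execution DAG from a flat span list via parent_id links."""
    children = defaultdict(list)
    for span in spans:
        if span["parent_id"] is not None:
            children[span["parent_id"]].append(span["name"])
    return dict(children)

def global_cas(node_scores: dict[str, float]) -> float:
    """Unweighted mean over per-node scores (an assumption, not the spec)."""
    return round(sum(node_scores.values()) / len(node_scores), 2)
```

Each node is then evaluated independently — the per-node scores have no data dependency on each other, which is what makes the parallel evaluation stage in the diagram possible.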
Diagram 04

Dynamic Signature Sync Engine

How a natural language policy rule written in the SaaS console gets compiled into a validated DSPy Signature, backtested in shadow mode against 30 days of history, and propagated globally to every running Sidecar in under 5 milliseconds — without restarting a single Kubernetes pod.

Dynamic Signature Sync — Policy Lifecycle

```mermaid
flowchart TD
    A([Policy Author writes natural language rule]) --> B[/CAS Policy Commander — SaaS Console/]
    B --> C{Shadow Mode or Enforce?}
    C -->|Shadow first| D[Backtest on 30-day trace history]
    D --> E[Review: what would have been caught?]
    E --> F{Approve for enforcement?}
    F -->|No — refine| B
    F -->|Yes| G
    C -->|Enforce directly| G
    G[LLM Meta-Compiler translates to DSPy Signature JSON DSL]
    G --> H[Validate signature against OTel span schema]
    H --> I{Valid?}
    I -->|No| J[Return error to Policy Author]
    I -->|Yes| K
    K[gRPC / SSE Sync Engine pushes compiled sig to all Sidecars]
    K --> L[Sidecar 1 — ADK Agent]
    K --> M[Sidecar 2 — LangGraph Agent]
    K --> N[Sidecar N — all others]
    L --> O([Policy live globally in under 5ms. Zero pod restarts.])
    M --> O
    N --> O
    O --> P[First enforcement hit auto-generates compliance ADR]
```
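The Validate gate in the lifecycle can be sketched as a schema check. The DSL field names and the attribute whitelist below are hypothetical — the real signature schema is internal to the Meta-Compiler — but the shape of the check is the point: nothing malformed reaches a Sidecar:

```python
# Assumed attribute whitelist; a real deployment would derive this from the
# OTel semantic conventions actually present in the customer's spans.
OTEL_SPAN_ATTRIBUTES = {"span.name", "span.kind", "gen_ai.prompt",
                        "gen_ai.completion", "tool.name"}

# Hypothetical required fields of the compiled signature JSON DSL.
REQUIRED_FIELDS = {"signature_id", "version", "inputs", "verdict_type"}

def validate_signature(sig: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the
    signature may be pushed to Sidecars."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - sig.keys()]
    for attr in sig.get("inputs", []):
        if attr not in OTEL_SPAN_ATTRIBUTES:
            errors.append(f"unknown span attribute: {attr}")
    return errors
```

On any error the compile loop returns to the Policy Author rather than propagating — the same "No" branch shown in the diagram.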
Reference 05

Component Reference

Every technology in the CAS Framework stack, why it was chosen, and what it owns in the system.

| Component | Technology | Owns | Why This Choice |
| --- | --- | --- | --- |
| Sidecar Injection | Kubernetes Mutating Webhook | Auto-injects OTel Collector + Eval sidecar into labelled Pods. Zero developer action beyond one label. | The only K8s-native way to achieve zero-code-change instrumentation across an entire cluster without SDK proliferation. |
| Span Collection | OTel Collector (CNCF) | Receives spans from agent process via gRPC, strips PII inline using transform processors, buffers and batches before eval engine. | CNCF standard — works with every major agent framework natively. No vendor lock-in. Transform processors enable inline PII removal before data moves. |
| Evaluation Engine | FastAPI + DSPy | Reconstructs DAG from span parent/child IDs. Routes spans to the correct DSPy signature per node type. Computes per-node and global CAS scores. | FastAPI for high-throughput async span processing. DSPy for structured LLM evaluation — signatures are strongly typed, reproducible, and version-controlled. |
| LLM Cascade Router | Custom Python + DSPy | Routes each evaluation to local (Llama3-8B, 90%) or frontier (Gemini Pro, 10%) model based on complexity scoring. Achieves 90% cost reduction. | Frontier models are expensive at agent fleet scale. Most evaluations are structurally straightforward — local 8B models handle them accurately at a fraction of the cost. |
| NLP Redaction | Microsoft Presidio | Offline async NLP pass replacing entities (names, SSNs, card numbers) with typed placeholders. Runs as Hatchet background worker. | Presidio is purpose-built for PII recognition and de-identification. Running offline (Hatchet worker) means it never blocks the evaluation path — latency stays low. |
| Job Queue | Hatchet | Manages async Presidio redaction jobs, retry logic, and priority queuing for heavy NLP workloads. | Hatchet provides durable, observable background job execution with built-in retry and dead-letter queues — essential for heavy NLP workloads at scale. |
| Analytics Storage | ClickHouse | Stores egress-safe payloads (scores, flags, topology). Serves DAG queries, per-agent CAS aggregations, 30-day backtesting, and 90-day projections. | ClickHouse achieves sub-millisecond aggregation on time-series data at hundreds of millions of spans/day. PostgreSQL cannot serve DAG queries at agent fleet scale without degrading. |
| Policy Sync | gRPC / SSE persistent connections | Maintains long-lived connections from SaaS to all Sidecars. Pushes compiled DSPy signatures globally in under 5ms without pod restarts. | Long-lived gRPC streams eliminate polling overhead and achieve near-instant propagation. SSE fallback for environments where gRPC is restricted. |
| Meta-Compiler | Internal LLM (hosted in SaaS) | Translates natural language policy rules into validated DSPy Signature JSON DSL. Validates output against OTel span schema before propagation. | Natural language input dramatically reduces the policy authoring barrier — compliance teams can write rules without DSPy knowledge. Schema validation ensures no malformed signatures reach Sidecars. |
| Dashboard | Next.js | DAG visualiser with per-node CAS scores, fleet health KPIs, agent risk posture, 30-day trend charts, leadership projections, ADR archive. | Server-rendered for fast initial load. React component model enables the interactive DAG canvas and real-time score updates via WebSocket. |
| Ingest API | Go / Rust | High-throughput ingestion of zero-egress payloads from Sidecars. Handles burst traffic from large agent fleets without dropping spans. | Go/Rust for the ingestion hot path because Python cannot sustain the throughput required for large enterprise agent fleets at sub-millisecond latency targets. |
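One way to make the zero-egress invariant concrete: define the egress payload so that raw content has no field to live in. The class below is an illustrative sketch, not the Ingest API's actual wire format:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class EgressPayload:
    """Hypothetical shape of what crosses the VPC boundary: scores, flags,
    and topology — there is structurally nowhere to put raw span content."""
    trace_id: str
    node_scores: dict     # node name -> CAS score (numbers only)
    violation_flags: list # e.g. ["PII_DENSITY", "POLICY_BLOCK"]
    topology: list        # (parent, child) edges — names only, no payloads

    def __post_init__(self):
        # Defensive check: reject any non-numeric score, so text can't
        # be smuggled out through the scores dict.
        for v in self.node_scores.values():
            if not isinstance(v, (int, float)):
                raise ValueError("node_scores must contain numbers only")
```

Enforcing the invariant in the payload type, rather than by convention, is what lets a procurement reviewer verify it by reading one schema.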
Reference 06

Architecture Decision Log

The key trade-offs made during design — what alternatives were considered and why we chose what we did. These are the decisions that define the system's character.

DECISION 01
Evaluate locally inside the VPC, not in SaaS
All DSPy signature evaluation runs inside the customer's VPC using their own LLM keys. Only CAS scores, violation flags, and topology egress to the SaaS. This is the architectural invariant that makes enterprise procurement reviews tractable — there is nothing sensitive in the data flow to review.
Alternative: Cloud-hosted eval (rejected — CISO veto in regulated industries; creates data residency violations in EU/APAC)
DECISION 02
K8s Mutating Webhook for sidecar injection over SDK distribution
Platform teams install one Helm chart. AI engineers add one Kubernetes label. The Webhook handles everything else. No SDK version pinned in any agent codebase, no release coordination, no blast radius if evaluation infra has issues.
Alternative: Proprietary SDK (rejected — creates permanent dependency in customer codebases, release coupling, upgrade toil)
DECISION 03
DSPy for structured evaluation over regex rule engines
DSPy Signatures produce structured, typed evaluation verdicts with explicit reasoning chains. This means evaluation output is auditable, reproducible, and versioned — exactly what compliance requires. Regex rule engines cannot reason about intent, context, or complex agentic patterns.
Alternative: Regex rule engine (rejected — no semantic understanding, brittle to prompt variation, cannot evaluate agentic patterns)
DECISION 04
ClickHouse over PostgreSQL for analytics storage
Enterprise agent fleets emit hundreds of millions of spans per day. ClickHouse achieves sub-millisecond aggregation on this volume and enables real-time DAG queries, 30-day backtesting, and 90-day projections without degrading. PostgreSQL cannot serve these query patterns at agent fleet scale.
Alternative: PostgreSQL (rejected — aggregation latency unacceptable above ~50M spans/day; DAG topology queries require columnar storage)
DECISION 05
gRPC/SSE persistent connections for policy propagation over polling
Policy updates must reach every Sidecar globally in under 5 milliseconds without restarting pods. Long-lived gRPC streams eliminate polling overhead and achieve near-instant push. SSE provides a fallback for environments where gRPC is restricted by network policy.
Alternative: Polling / webhook (rejected — polling latency unacceptable for security-critical policy updates; webhook delivery not guaranteed under load)
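The SSE fallback framing is simple enough to sketch. The parser below handles only the `event:` and `data:` fields — the full SSE spec also defines `id:`, `retry:`, comments, and multi-line data — and the `signature_update` event name is a hypothetical example, not a documented wire event:

```python
import json

def parse_sse(stream_lines):
    """Yield (event, data) pairs from Server-Sent Events framing.

    A Sidecar consuming this stream swaps the compiled signature into its
    in-process registry — which is why no pod restart is needed.
    """
    event, data = "message", []
    for line in stream_lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":  # a blank line terminates one event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
```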
DECISION 06
OTel as the universal integration standard
By anchoring the entire integration surface on OpenTelemetry — a CNCF standard with native support in Google ADK, LangGraph, CrewAI, and every major agent framework — CAS Framework achieves true framework agnosticism. The complete integration burden is two lines: import openlit and call openlit.init(). No proprietary protocol, no framework-specific adapters, no lock-in.
Alternative: Custom span protocol (rejected — would require framework-specific adapters, defeats zero-code-change goal)
Next Steps

Ready to Deploy?

One Helm chart. One Kubernetes label. Your agent fleet DAG with per-node CAS scores appears automatically.

Request Early Access →