Diagram 01

System Topology — Zero-Egress Architecture

The complete system showing the Customer VPC boundary, all components running inside it, what crosses the boundary, and the SaaS Control Plane. The critical invariant: zero raw data exits the VPC — only CAS scores, violation flags, and trace topology egress over the network.

CAS Framework — Zero-Egress System Topology

*Architecture overview (diagram).* Inside the Customer Secure VPC — the zero-egress zone:

- AI Agent — ADK / LangGraph / CrewAI, or any OTel-emitting agent
- K8s Mutating Webhook — auto-injects the sidecar on one label
- OTel Collector — inline PII strip + span buffer
- Eval Engine — FastAPI + DSPy + Redis state; receives spans with no PII
- LLM Cascade Router — Llama3-8B (90%) / Gemini Pro (10%)
- DSPy Signatures — 4 default ADK signatures + custom per-agent
- Hatchet Worker + Presidio NLP — offline redaction batch
- Dynamic Signature Sync Engine (gRPC / SSE) — receives compiled policy updates from the SaaS in <5ms, zero pod restarts

Zero-egress boundary — only scores cross this line. Math only: score + flags. Zero raw data. Ever.

In the CAS SaaS Control Plane:

- Go / Rust Ingestion API — high-throughput score ingest
- ClickHouse — time-series analytics
- Next.js Dashboard — DAG viewer + leadership KPIs
- LLM Meta-Compiler — natural-language rule → DSPy signature JSON DSL
- Policy Store + ADR Archive — versioned signatures + ADRs, compiled signatures pushed in <5ms
🔒 VPC Boundary is Inviolable
Every evaluation — DSPy scoring, Presidio NLP, LLM Cascade — runs inside your VPC. The boundary block above shows exactly what crosses it: CAS scores, violation flags, and trace topology only.
Sidecar is Non-Blocking
The K8s Mutating Webhook injects the OTel Collector and Eval sidecar into your Pod automatically. If the sidecar crashes, your agent continues running. Evaluation is never in the critical path.
🔄 Policy Updates Flow Inward Only
The LLM Meta-Compiler in the SaaS compiles new policy signatures and pushes them to the Dynamic Sync Engine via gRPC/SSE. Raw data never flows outward.
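The Collector's inline strip step can be sketched in Python. This is purely illustrative — production deployments would express this as OTel Collector transform-processor rules rather than application code, and the patterns below (bearer tokens, API-key-shaped strings, email addresses) are assumptions, not the shipped rule set:

```python
import re

# Hypothetical redaction patterns approximating the Collector's inline strip:
# sensitive substrings are removed before spans ever leave the agent's pod.
PATTERNS = [
    re.compile(r"Bearer\s+[A-Za-z0-9._\-]+"),       # auth tokens
    re.compile(r"\b(?:sk|pk)-[A-Za-z0-9]{16,}\b"),  # API-key-shaped secrets
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),     # email addresses
]

def strip_span_attributes(attributes: dict) -> dict:
    """Return a copy of span attributes with sensitive substrings redacted."""
    cleaned = {}
    for key, value in attributes.items():
        if isinstance(value, str):
            for pattern in PATTERNS:
                value = pattern.sub("[REDACTED]", value)
        cleaned[key] = value
    return cleaned
```

Non-string attributes (durations, counts) pass through untouched — the eval engine only ever sees span text that has already been through this pass.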
Diagram 02

OTel Span → DAG → CAS Score
Full Evaluation Pipeline

The complete sequence from your agent emitting an OTel span through PII stripping, DAG reconstruction, LLM Cascade routing, Presidio offline redaction, and final CAS score computation — showing every component handoff and what data travels between them.

Evaluation Pipeline — Sequence Diagram

```mermaid
sequenceDiagram
    participant A as AI Agent
    participant OC as OTel Collector + PII Strip
    participant EE as Eval Engine (FastAPI + DSPy)
    participant LC as LLM Cascade Router
    participant LM as Local LLM (Llama3-8B)
    participant FM as Frontier LLM (Gemini Pro)
    participant PR as Presidio Offline NLP
    participant CH as ClickHouse
    A->>OC: OTel spans (raw trace, may contain PII)
    Note over OC: Inline strip: remove tokens,<br/>cleartext PII, API keys
    OC->>EE: Batched spans (no PII present)
    EE->>EE: Reconstruct execution DAG<br/>from span parent/child IDs
    EE->>LC: Route eval request<br/>context + selected DSPy signature
    LC->>LM: 90% of evals (low-medium complexity)
    LM-->>LC: DSPy verdict + reasoning chain
    LC->>FM: 10% of evals (high complexity / edge cases)
    FM-->>LC: DSPy verdict + reasoning chain
    LC-->>EE: Merged evaluation results
    EE->>PR: Async: deep NLP redaction pass
    PR-->>EE: Entity-replaced payload [PERSON_NAME] [US_SSN]
    EE->>EE: Compute CAS score per node<br/>Compliance + Policy + Patterns dimensions
    EE->>CH: Egress-safe payload: scores, flags, topology
    Note over CH: Sub-ms aggregation<br/>DAG + per-node CAS scores stored
    Note over A,CH: Zero raw data exits the VPC at any point in this flow.
```
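The 90/10 routing decision in the sequence above can be sketched as a complexity score over a few span features. Everything here — the feature set, the weights, and the threshold — is a hypothetical illustration of the routing idea, not the Cascade Router's actual heuristics:

```python
from dataclasses import dataclass

@dataclass
class EvalRequest:
    child_count: int    # fan-out of the DAG node under evaluation
    payload_bytes: int  # size of the (PII-stripped) context
    prior_warn: bool    # an earlier pipeline stage flagged this node

COMPLEXITY_THRESHOLD = 0.7  # assumed cut-off between local and frontier

def complexity(req: EvalRequest) -> float:
    """Cheap structural features stand in for real complexity scoring."""
    score = 0.0
    score += min(req.child_count / 10, 0.4)        # wide fan-out is harder
    score += min(req.payload_bytes / 50_000, 0.3)  # long contexts are harder
    score += 0.4 if req.prior_warn else 0.0        # escalate flagged nodes
    return score

def route(req: EvalRequest) -> str:
    """Most evals stay on the local 8B model; hard cases go to the frontier."""
    return "gemini-pro" if complexity(req) >= COMPLEXITY_THRESHOLD else "llama3-8b"
```

The economics follow directly: if 90% of requests score below the threshold, frontier spend is confined to the 10% of evaluations that genuinely need it.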
Diagram 03

OTel Trace → DAG Reconstruction
→ Per-Node CAS Scoring

How CAS Framework converts a flat list of OTel spans into a visualised execution graph with CAS scores on every node. The parent-child span relationships already encode the full execution topology — we reconstruct it, assign evaluation dimensions per node, select the right DSPy signature, and score each node independently.

DAG Reconstruction — Flow Diagram

```mermaid
flowchart LR
    subgraph INPUT["OTel Trace Input — flat span list"]
        S1["Span: SupervisorAgent\nparent: root · 12ms"]
        S2["Span: MCPTool.github\nparent: supervisor · 87ms"]
        S3["Span: PII_Scanner\nparent: supervisor · 6ms"]
        S4["Span: DataAnalysisAgent\nparent: supervisor · 340ms"]
        S5["Span: Slack_Notification\nparent: data_analysis · 22ms"]
    end
    subgraph RECON["DAG Reconstruction"]
        direction TB
        R1["Build adjacency graph\nfrom parent_id chain"]
        R2["Assign eval dimension\nper span.attributes"]
        R3["Select DSPy signature\nper node type"]
        R1 --> R2 --> R3
    end
    subgraph EVAL["Per-Node Evaluation (parallel)"]
        direction TB
        E1["SupervisorAgent\nWorkflowAdherence\n0.96 PASS"]
        E2["MCPTool.github\nToolCallbackSafety\n0.91 PASS"]
        E3["PII_Scanner\nPII_Density default\n0.99 PASS"]
        E4["DataAnalysisAgent\nCodeExecSandbox\n0.74 WARN"]
        E5["Slack_Notification\nPolicyAdherence\n0.51 BLOCKED"]
    end
    subgraph OUTPUT["CAS Output"]
        direction TB
        O1["Global CAS: 0.82\nDAG rendered with\nper-node scores"]
        O2["1 ADR auto-generated\nfor blocked node"]
        O3["Zero-egress payload\nscores + topology only"]
    end
    S1 & S2 & S3 & S4 & S5 --> RECON
    RECON --> EVAL
    E1 & E2 & E3 & E4 & E5 --> OUTPUT
```
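A minimal sketch of the reconstruction step: the parent_id links already encode the topology, so the DAG falls out of one pass over the flat span list. The scores below are the example values from the diagram; aggregating them with an unweighted mean reproduces the 0.82 global CAS shown, though the real aggregation may weight nodes or dimensions differently:

```python
from collections import defaultdict

def build_dag(spans: list[dict]) -> dict[str, list[str]]:
    """Rebuild the execution DAG from a flat span list via parent_id links."""
    children = defaultdict(list)
    for span in spans:
        if span["parent_id"] is not None:
            children[span["parent_id"]].append(span["name"])
    return dict(children)

def global_cas(node_scores: dict[str, float]) -> float:
    """Unweighted mean over per-node scores (an assumption, not the spec)."""
    return round(sum(node_scores.values()) / len(node_scores), 2)
```

Each node is then evaluated independently — the per-node scores have no data dependency on each other, which is what makes the parallel evaluation stage in the diagram possible.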
Diagram 04

Dynamic Signature Sync Engine

How a natural language policy rule written in the SaaS console gets compiled into a validated DSPy Signature, backtested in shadow mode against 30 days of history, and propagated globally to every running Sidecar in under 5 milliseconds — without restarting a single Kubernetes pod.

Dynamic Signature Sync — Policy Lifecycle

```mermaid
flowchart TD
    A([Policy Author writes natural language rule]) --> B[/CAS Policy Commander — SaaS Console/]
    B --> C{Shadow Mode or Enforce?}
    C -->|Shadow first| D[Backtest on 30-day trace history]
    D --> E[Review: what would have been caught?]
    E --> F{Approve for enforcement?}
    F -->|No — refine| B
    F -->|Yes| G
    C -->|Enforce directly| G
    G[LLM Meta-Compiler translates to DSPy Signature JSON DSL]
    G --> H[Validate signature against OTel span schema]
    H --> I{Valid?}
    I -->|No| J[Return error to Policy Author]
    I -->|Yes| K
    K[gRPC / SSE Sync Engine pushes compiled sig to all Sidecars]
    K --> L[Sidecar 1 — ADK Agent]
    K --> M[Sidecar 2 — LangGraph Agent]
    K --> N[Sidecar N — all others]
    L --> O([Policy live globally in under 5ms. Zero pod restarts.])
    M --> O
    N --> O
    O --> P[First enforcement hit auto-generates compliance ADR]
```
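The Validate gate in the lifecycle can be sketched as a schema check. The DSL field names and the attribute whitelist below are hypothetical — the real signature schema is internal to the Meta-Compiler — but the shape of the check is the point: nothing malformed reaches a Sidecar:

```python
# Assumed attribute whitelist; a real deployment would derive this from the
# OTel semantic conventions actually present in the customer's spans.
OTEL_SPAN_ATTRIBUTES = {"span.name", "span.kind", "gen_ai.prompt",
                        "gen_ai.completion", "tool.name"}

# Hypothetical required fields of the compiled signature JSON DSL.
REQUIRED_FIELDS = {"signature_id", "version", "inputs", "verdict_type"}

def validate_signature(sig: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the
    signature may be pushed to Sidecars."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - sig.keys()]
    for attr in sig.get("inputs", []):
        if attr not in OTEL_SPAN_ATTRIBUTES:
            errors.append(f"unknown span attribute: {attr}")
    return errors
```

On any error the compile loop returns to the Policy Author rather than propagating — the same "No" branch shown in the diagram.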
Reference 05

Component Reference

Every technology in the CAS Framework stack, why it was chosen, and what it owns in the system.

| Component | Technology | Owns | Why This Choice |
| --- | --- | --- | --- |
| Sidecar Injection | Kubernetes Mutating Webhook | Auto-injects OTel Collector + Eval sidecar into labelled Pods. Zero developer action beyond one label. | The only K8s-native way to achieve zero-code-change instrumentation across an entire cluster without SDK proliferation. |
| Span Collection | OTel Collector (CNCF) | Receives spans from agent process via gRPC, strips PII inline using transform processors, buffers and batches before eval engine. | CNCF standard — works with every major agent framework natively. No vendor lock-in. Transform processors enable inline PII removal before data moves. |
| Evaluation Engine | FastAPI + DSPy | Reconstructs DAG from span parent/child IDs. Routes spans to the correct DSPy signature per node type. Computes per-node and global CAS scores. | FastAPI for high-throughput async span processing. DSPy for structured LLM evaluation — signatures are strongly typed, reproducible, and version-controlled. |
| LLM Cascade Router | Custom Python + DSPy | Routes each evaluation to local (Llama3-8B, 90%) or frontier (Gemini Pro, 10%) model based on complexity scoring. Achieves 90% cost reduction. | Frontier models are expensive at agent fleet scale. Most evaluations are structurally straightforward — local 8B models handle them accurately at a fraction of the cost. |
| NLP Redaction | Microsoft Presidio | Offline async NLP pass replacing entities (names, SSNs, card numbers) with typed placeholders. Runs as Hatchet background worker. | Presidio is purpose-built for PII recognition and de-identification. Running offline (Hatchet worker) means it never blocks the evaluation path — latency stays low. |
| Job Queue | Hatchet | Manages async Presidio redaction jobs, retry logic, and priority queuing for heavy NLP workloads. | Hatchet provides durable, observable background job execution with built-in retry and dead-letter queues — essential for heavy NLP workloads at scale. |
| Analytics Storage | ClickHouse | Stores egress-safe payloads (scores, flags, topology). Serves DAG queries, per-agent CAS aggregations, 30-day backtesting, and 90-day projections. | ClickHouse achieves sub-millisecond aggregation on time-series data at hundreds of millions of spans/day. PostgreSQL cannot serve DAG queries at agent fleet scale without degrading. |
| Policy Sync | gRPC / SSE persistent connections | Maintains long-lived connections from SaaS to all Sidecars. Pushes compiled DSPy signatures globally in under 5ms without pod restarts. | Long-lived gRPC streams eliminate polling overhead and achieve near-instant propagation. SSE fallback for environments where gRPC is restricted. |
| Meta-Compiler | Internal LLM (hosted in SaaS) | Translates natural language policy rules into validated DSPy Signature JSON DSL. Validates output against OTel span schema before propagation. | Natural language input dramatically reduces the policy authoring barrier — compliance teams can write rules without DSPy knowledge. Schema validation ensures no malformed signatures reach Sidecars. |
| Dashboard | Next.js | DAG visualiser with per-node CAS scores, fleet health KPIs, agent risk posture, 30-day trend charts, leadership projections, ADR archive. | Server-rendered for fast initial load. React component model enables the interactive DAG canvas and real-time score updates via WebSocket. |
| Ingest API | Go / Rust | High-throughput ingestion of zero-egress payloads from Sidecars. Handles burst traffic from large agent fleets without dropping spans. | Go/Rust for the ingestion hot path because Python cannot sustain the throughput required for large enterprise agent fleets at sub-millisecond latency targets. |
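One way to make the zero-egress invariant concrete: define the egress payload so that raw content has no field to live in. The class below is an illustrative sketch, not the Ingest API's actual wire format:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class EgressPayload:
    """Hypothetical shape of what crosses the VPC boundary: scores, flags,
    and topology — there is structurally nowhere to put raw span content."""
    trace_id: str
    node_scores: dict     # node name -> CAS score (numbers only)
    violation_flags: list # e.g. ["PII_DENSITY", "POLICY_BLOCK"]
    topology: list        # (parent, child) edges — names only, no payloads

    def __post_init__(self):
        # Defensive check: reject any non-numeric score, so text can't
        # be smuggled out through the scores dict.
        for v in self.node_scores.values():
            if not isinstance(v, (int, float)):
                raise ValueError("node_scores must contain numbers only")
```

Enforcing the invariant in the payload type, rather than by convention, is what lets a procurement reviewer verify it by reading one schema.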
Reference 06

Architecture Decision Log

The key trade-offs made during design — what alternatives were considered and why we chose what we did. These are the decisions that define the system's character.

DECISION 01
Evaluate locally inside the VPC, not in SaaS
All DSPy signature evaluation runs inside the customer's VPC using their own LLM keys. Only CAS scores, violation flags, and topology egress to the SaaS. This is the architectural invariant that makes enterprise procurement reviews tractable — there is nothing sensitive in the data flow to review.
Alternative: Cloud-hosted eval (rejected — CISO veto in regulated industries; creates data residency violations in EU/APAC)
DECISION 02
K8s Mutating Webhook for sidecar injection over SDK distribution
Platform teams install one Helm chart. AI engineers add one Kubernetes label. The Webhook handles everything else. No SDK version pinned in any agent codebase, no release coordination, no blast radius if evaluation infra has issues.
Alternative: Proprietary SDK (rejected — creates permanent dependency in customer codebases, release coupling, upgrade toil)
DECISION 03
DSPy for structured evaluation over regex rule engines
DSPy Signatures produce structured, typed evaluation verdicts with explicit reasoning chains. This means evaluation output is auditable, reproducible, and versioned — exactly what compliance requires. Regex rule engines cannot reason about intent, context, or complex agentic patterns.
Alternative: Regex rule engine (rejected — no semantic understanding, brittle to prompt variation, cannot evaluate agentic patterns)
DECISION 04
ClickHouse over PostgreSQL for analytics storage
Enterprise agent fleets emit hundreds of millions of spans per day. ClickHouse achieves sub-millisecond aggregation on this volume and enables real-time DAG queries, 30-day backtesting, and 90-day projections without degrading. PostgreSQL cannot serve these query patterns at agent fleet scale.
Alternative: PostgreSQL (rejected — aggregation latency unacceptable above ~50M spans/day; DAG topology queries require columnar storage)
DECISION 05
gRPC/SSE persistent connections for policy propagation over polling
Policy updates must reach every Sidecar globally in under 5 milliseconds without restarting pods. Long-lived gRPC streams eliminate polling overhead and achieve near-instant push. SSE provides a fallback for environments where gRPC is restricted by network policy.
Alternative: Polling / webhook (rejected — polling latency unacceptable for security-critical policy updates; webhook delivery not guaranteed under load)
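The SSE fallback framing is simple enough to sketch. The parser below handles only the `event:` and `data:` fields — the full SSE spec also defines `id:`, `retry:`, comments, and multi-line data — and the `signature_update` event name is a hypothetical example, not a documented wire event:

```python
import json

def parse_sse(stream_lines):
    """Yield (event, data) pairs from Server-Sent Events framing.

    A Sidecar consuming this stream swaps the compiled signature into its
    in-process registry — which is why no pod restart is needed.
    """
    event, data = "message", []
    for line in stream_lines:
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":  # a blank line terminates one event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
```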
DECISION 06
OTel as the universal integration standard
By anchoring the entire integration surface on OpenTelemetry — a CNCF standard with native support in Google ADK, LangGraph, CrewAI, and every major agent framework — CAS Framework achieves true framework agnosticism. The complete integration burden is two lines: import openlit and call openlit.init(). No proprietary protocol, no framework-specific adapters, no lock-in.
Alternative: Custom span protocol (rejected — would require framework-specific adapters, defeats zero-code-change goal)
Next Steps

Ready to Deploy?

One Helm chart. One Kubernetes label. Your agent fleet DAG with per-node CAS scores appears automatically.

Request Early Access →