Linh Truong  ·  MA (Harvard), MBA  ·  LinhTruong.com  ·  Linh@Alumni.Harvard.edu

Agentic AI System Architecture

I designed this reference architecture to map the full structural anatomy of autonomous, tool-using, multi-agent AI systems — from the user & interaction boundary through perception, orchestration, reasoning, memory, tools, knowledge retrieval, multi-agent collaboration, action, reflection, safety & governance, and infrastructure. Twelve layers, each with its own detailed diagram.

Overview · Master Architecture Diagram

A reference architecture for autonomous, tool-using, multi-agent AI systems — perception, reasoning, memory, action, reflection, and governance.

Research Paper Diagram  ·  Updated 2026  ·  v1.0

1 · User & Interaction Layer
Human users, applications, and channels that issue goals and receive results
Human User: Goals · Preferences · Feedback
Chat / Voice / Multimodal UI: Text · Speech · Images · Video
IDE / CLI / SDK: Claude Code · API · Agent SDK
Application Channels: Web · Mobile · Email · Slack
Autonomous Triggers: Cron · Webhooks · Events
Other Agents (A2A / MCP-Client): Inter-agent requests & delegations

2 · Perception & Input Processing
Normalize, parse, and ground incoming signals into a structured task representation
Intent & Goal Extraction
Multimodal Encoders (V/A/T)
Context Assembly & Grounding
Prompt Compilation & Caching
Input Guardrails / PII Scrub
Session & Identity Context

3 · Orchestration, Planning & Control
Decompose goals into plans, route work to agents/tools, and manage the agent loop
Agent Orchestrator: ReAct · Plan-and-Execute · Tree/Graph-of-Thought; State Machine · Loop Control · Budget & Timeouts; LangGraph / Agent SDK / custom controllers
Task Decomposer: HTN · Hierarchical Plans
Planner / Re-planner: CoT · Self-Ask · ToT
Router / Dispatcher: Skill & agent selection
Policy Engine: Permissions · Action gating
Scheduler & Queue: Async · Priorities · Retries
Concurrency Manager: Parallel sub-agents · Forks
Cost / Token / Latency Budgeter: Per-task budgets · Stop conditions
Human-in-the-Loop Gateway: Approvals · Clarifications · Overrides

4 · Reasoning Core — Foundation Models & Cognition
The cognitive engine: LLMs/LMMs with extended thinking, tool-use, and structured output
Foundation Model(s): Claude Opus 4.7 · Sonnet 4.6 · Haiku 4.5; GPT · Gemini · Llama · Mistral · Qwen; Routed by task complexity & cost; SLMs for tools / classification
Extended Thinking: Reasoning tokens · Scratchpad
Self-Reflection / Critic: Reflexion · Self-Refine · Debate
Tool-Use / Function Calling: Structured args · Parallel calls
Structured Output: JSON schema · Pydantic · Grammars
Multimodal Reasoning: Vision · Audio · Code · Docs
In-Context Learning: Few-shot · Skills · Examples
Adaptation Layer: Fine-tune · LoRA · DPO · RLHF · RLAIF
Inference Controls: Sampling · Constrained decoding · Prompt caching

5 · Memory Subsystem
Multi-tier memory enabling continuity, learning, and personalization
Working / Context Memory: Live conversation buffer; Compaction · Summarization
Episodic Memory: Past sessions & trajectories; Time-stamped events
Semantic Memory: Facts · Entities · Concepts; User profile · Project memory
Procedural Memory: Skills · Workflows · Recipes; Learned tool sequences
Vector / Embedding Store: pgvector · Pinecone · Weaviate; Hybrid & semantic search
Knowledge Graph: Entities · Relations · Provenance; Neo4j · RDF · GraphRAG
Memory Manager: Read · Write · Update · Forget · Consolidate · Re-rank · Privacy & TTL · Conflict resolution

6 · Tools, Skills & Capabilities
Composable actions the agent can invoke through standardized interfaces
Web Browsing: Search · Fetch; Computer use
Code Execution: Sandboxed runtime; Bash / Python / JS
File & Repo Ops: Read · Write · Diff; Git · FS · S3
External APIs: REST · GraphQL; SaaS · Webhooks
Databases: SQL · NoSQL; Warehouses
Communication: Email · Slack; Calendar · Meet
Workflow Tools: CI/CD · Jira; Notion · Linear
Domain Models: Vision · ASR · TTS; Specialist SLMs
Tool Gateway · MCP Servers · Skill Registry: Schema validation · Auth · Rate-limits · Idempotency · Caching · Retries · Sandboxing

7 · Knowledge & Retrieval (RAG)
Grounding the agent in fresh, verifiable knowledge from internal & external sources
Retrievers: BM25 · Dense · Hybrid; Multi-query · Fusion
Re-rankers & Filters: Cross-encoder · LLM-rerank; Recency · ACL filters
Advanced RAG: GraphRAG · HyDE · Self-RAG; Agentic / Corrective RAG
Document Pipelines: Parse · Chunk · Embed; OCR · Layout · Tables
Knowledge Sources: Wiki · Docs · Tickets; Code · Web · Live data
Citation & Provenance: Inline citations; Source attribution

8 · Multi-Agent Collaboration
Specialized agents cooperating, debating, and verifying each other's work
Researcher: Search · Read; Synthesize
Coder: Edit · Run · Test; Debug · Refactor
Critic / Reviewer: Verify · Score; Red-team
Domain Experts: Legal · Medical; Finance · DevOps
Coordination Patterns: Supervisor · Hierarchical · Swarm · Debate · Blackboard; CrewAI · AutoGen · LangGraph · Magentic
Inter-Agent Protocols: A2A · MCP · ACP · Shared scratchpad; Message bus · Contract net · Voting

9 · Action & Environment Interface
Where agents take real-world effects — through digital and physical environments
Computer Use: GUI control · Screen + keyboard
Browser Agents: DOM · Forms · Navigation
Code Sandboxes: Containers · VMs · Firecracker
Enterprise Systems: CRM · ERP · ITSM · Data lake
Physical / IoT: Robotics · Sensors · Actuators
Output Channels & Side-Effect Bus: Notifications · Commits · Tickets · Reports

10 · Reflection, Evaluation & Continual Learning
Closed-loop self-improvement — evaluate trajectories, learn skills, refine prompts & models
Trajectory Evaluator: LLM-as-Judge · Rubrics; Pass/fail · Quality scores
Reward / Verifier: Tests · Constraints · Goals; Process & outcome rewards
Self-Reflection Loop: Reflexion · Self-Refine; Lessons & corrections
Skill / Recipe Distiller: Voyager-style libraries; Reusable workflows
Eval Harness: Benchmarks · Regression; Online & offline eval
Continual Training: SFT · DPO · RLAIF; Prompt & tool tuning

11 · Safety, Governance, Trust & Observability
Cross-cutting controls — guardrails, policy, monitoring, security, and compliance
Input/Output Guardrails: Toxicity · Jailbreak · Schema
Prompt-Injection Defense: Trust boundaries · Confirmation
PII / DLP: Redaction · Tokenization
AuthN / AuthZ: OAuth · RBAC · Scoped tokens
Action Approval: HITL · Risky-action gating
Compliance & Audit: SOC 2 · GDPR · HIPAA · EU AI Act
Observability & Tracing: OpenTelemetry · LangSmith · Langfuse · Helicone
Cost & Performance Monitoring: Tokens · Latency · Tool errors · SLOs
Red-Teaming & Safety Evals: Adversarial probes · Capability gating
Model & Tool Governance: Versioning · Allow-lists · Kill-switches · Explainability

12 · Infrastructure & Platform
The substrate — compute, serving, storage, and networking that make agents run reliably at scale
Model Serving: vLLM · TGI · TensorRT-LLM · SGLang
Compute: GPU · TPU · Inference accelerators
Agent Runtimes: LangGraph · Agent SDK · CrewAI
Container & Sandbox Layer: Docker · Kubernetes · Firecracker
Storage: Object · Vector · Graph · OLTP/OLAP
Event Bus & Networking: Kafka · Pub/Sub · gRPC · Service mesh
Secrets · Identity · Key Management: Vault · KMS · OAuth providers · Workload identity
Deployment Topologies: Cloud · On-prem · Hybrid · Edge · Multi-region failover

Cross-cutting Governance, Safety & Observability
Legend: User & Interaction · Perception & Orchestration · Reasoning Core / Reflection · Memory · Tools & Capabilities · Knowledge / Multi-Agent · Action / Infrastructure · Safety & Governance
Arrows: Forward data flow · Feedback / learning
Reference architecture for the research paper “Agentic AI System Architecture”. Layers are conceptual — concrete deployments may merge, split, or substitute components.

Layer 1 · User & Interaction Layer

The boundary between humans, applications, and other agents and the agentic system — channels, modalities, sessions, identity, presentation, and the contract that hands a well-formed request to the Perception layer.

Detailed Diagram  ·  v1.0  ·  2026

A · Initiators — Who or What Issues a Request
Humans, applications, autonomous schedules, and other agents — every interaction begins here
End User: Consumer of agent outcomes · Goals, preferences, feedback · Approvals & clarifications · Implicit signals (clicks, dwell)
Power User / Operator: Configures & supervises agents · Skill / tool authoring · Prompt / persona tuning · Slash commands · CLAUDE.md
Developer / Builder: Integrates the agent into systems · SDK / API consumers · Hooks · MCP servers · Custom UIs & workflows
Admin / Governance: Sets policy & entitlements · RBAC / ABAC roles · Quotas · Allow-lists · Audit & compliance review
Automation / System: Non-human triggers · Cron · Schedulers · Webhooks · Event bus · Sensors / IoT triggers
Other Agents: Inter-agent delegation · A2A protocol · MCP-client agents · Sub-agent callbacks

B · Channels & Surfaces — Where Interaction Happens
Concrete touchpoints that capture intent and render output across human, developer, app, and machine surfaces
Conversational UIs: Synchronous & streaming; Web chat · Mobile chat; In-product copilot panels; Inline assist (autocomplete); Threaded long-running runs; Artifact & canvas surfaces
Voice & Telephony: Real-time speech I/O; Smart speakers · Phone bots; Streaming ASR + TTS; Barge-in · VAD · diarization; SIP / WebRTC bridges; Multi-language detection
Developer Surfaces: Programmatic & tool-driven; CLI (Claude Code, custom); IDE plugins (VS Code, JetBrains); SDKs · REST · gRPC · WebSocket; Notebook / REPL · Terminal; Slash commands · /skills
Embedded App Channels: Asynchronous workflows; Email inboxes · SMS; Slack · Teams · Discord; CRM / ITSM in-app widgets; Document & sheet sidebars; Browser extensions
Autonomous Triggers: No human in the request path; Cron / schedules; Webhooks · Event topics; File / DB change feeds; Alert / threshold triggers; Loop / self-paced runs
Agent ↔ Agent: Federated invocation; A2A protocol; MCP client requests; RPC · message bus; Capability discovery; Signed handoffs

C · Input Modalities & Capture
Each surface produces typed signals that the layer normalizes into a unified request envelope
Text: Chat · Email · Markdown
Voice / Audio: Mic stream · Audio files
Image / Vision: Photos · Screenshots · OCR
Video / Screen: Capture · Screencast · Frames
Documents / Files: PDF · DOCX · Spreadsheets
Code / Diffs: Repo · Patches · Snippets
Structured Data: JSON · CSV · Forms · Schemas
Sensor / Telemetry: IoT · Logs · Metrics · Geo

D · Interaction Patterns & UX Affordances
How users steer, supervise, and recover during long-running, tool-using agent runs
Streaming & Stop: Token stream · Cancel · Pause
Approvals & HITL: Risky-action confirmations
Clarifying Questions: Slot-fill · Disambiguation
Plan / Step Preview: Plan mode · Diff before write
Feedback Capture: 👍 / 👎 · Comments · Ratings
Citations & Trace UI: Sources · Tool-call timeline
Undo / Rollback: Compensating actions
Personalization: Themes · Locale · A11y

E · Identity, Session & Context Management
Stable identity per actor, durable conversation state, and context that travels with every request
Authentication: SSO · OAuth · OIDC · SAML; Passkeys · MFA · API keys; Service-account / workload ID; Token refresh & revocation
Authorization: RBAC · ABAC · scopes; Tool / skill entitlements; Tenant & project isolation; Delegated & on-behalf-of
Session State: Conversation thread & turns; Resumable runs · checkpoints; Attached files & artifacts; Multi-device continuity
User & Org Context: Profile · preferences · locale; Org / workspace · project; Memory references · CLAUDE.md; Persona & tone bindings
Device & Environment: UA · OS · IDE · viewport; Network class · time zone; Geo · accessibility settings; Capability flags · feature gates
Consent & Privacy: Data-use scopes · ToS; Memory opt-in / opt-out; Recording & training flags; Data residency policy

F · Edge & API Gateway — Reliability and Safety on the Wire
All channels converge through a hardened gateway before requests reach Perception
TLS / Edge Termination: CDN · WAF · DDoS shield; mTLS for service callers; Bot / abuse detection; Geo & IP policy
Protocol Adapters: REST · GraphQL · gRPC; WebSocket / SSE streams; Webhook receiver; Email / SMS bridge
Rate & Quota: Per-user / org / token quotas; Concurrency caps; Burst smoothing · backoff; Fair scheduling
Idempotency & Retry: Idempotency-Key header; Request de-dup window; Replay protection (nonce); At-least-once delivery
Schema Validation: OpenAPI / JSON Schema; Size / type / depth limits; MIME & encoding checks; Versioning & compatibility
Trust Boundaries: User-vs-tool-vs-content tagging; Prompt-injection pre-filter; Origin / referer enforcement; Data-classification labels

G · Unified Request Envelope
The contract handed to the Perception layer — one shape for every channel
Request Envelope (canonical)
Identity: principal · tenant · org · auth_method · scopes · consent_flags
Session: thread_id · turn_id · run_id · resume_token · checkpoint · trace_id (OTel)
Channel: surface · device · locale · tz
Intent & Content: goal / message · attachments · modality · MIME · size · references (doc, repo, URL)
Controls: model preference · tools allow-list · budget (tokens, time, $) · stream · response_format
Policy: data_class · retention · region

H · Output, Rendering & Delivery
How agent results are returned, rendered, and made interactive on each surface
Streaming Renderer: Token / event stream; Markdown · code · math; Live tool-call updates
Rich Artifacts: Canvas · diagrams · charts; Tables · interactive HTML; Generated files (PDF, XLSX)
Voice / Audio Out: Streaming TTS; Voice persona; Captions / transcripts
Interactive UI Cards: Buttons · forms · pickers; Slack blocks / Adaptive Cards; Confirm / approve / cancel
Citations & Provenance: Inline source links; Tool-call timeline; Confidence & caveats
Notifications: Push · email · SMS; Run-completed events; Digest summaries
Output Guardrails & Compliance: PII redaction · safety filters; Watermarking · content tags; Schema-conformant responses
Accessibility & i18n: WCAG · screen-reader semantics; RTL · locale formatting; Translation & transliteration

I · Cross-Cutting — Safety, Telemetry & Feedback Loops
Always-on concerns that wrap every interaction in this layer
Input Guardrails: Toxicity · jailbreak · injection
PII / DLP Pre-filter: Detect · redact · tokenize
Abuse & Bot Defense: CAPTCHA · velocity · anomaly
Telemetry & Tracing: OTel spans · structured logs
Analytics & A/B: Funnels · retention · experiments
Audit Log: Immutable, signed events
Feedback & Signals → Memory / Eval: 👍 / 👎 · edits · regenerate · session ratings · escalations
Incident & Recovery Hooks: Kill-switch · graceful degrade · fallback model · status page
Compliance & Residency: GDPR · CCPA · HIPAA · SOC 2 · EU AI Act · regional routing

J · Handoff to Layer 2 · Perception & Input Processing
The Interaction Layer's output: a validated, classified, traceable envelope ready for grounding
Validated Request: Schema-checked envelope; Identity & scopes attached
Trust Labels: user · tool · external content; Data classification tags
Trace Context: trace_id · span · baggage; SLO & budget hints
Attached Context: Files · history · references; Memory / project pointers
Output Contract: Response shape · streaming; Tool / channel callbacks
Policy Hints: HITL · risk class; Region · retention

Cross-cutting Safety, Identity & Telemetry
Legend: Initiators / Identity · Channels / Handoff · Modalities / UX · Session & Context · Edge / Gateway · Output / Presentation · Safety / Governance
Arrows: Inbound request · Outbound delivery · Feedback signal
Detailed view of Layer 1 — User & Interaction Layer from the Agentic AI System Architecture reference. All channels are normalized into a canonical request envelope and handed off to Layer 2 (Perception). Outputs flow back through the same surfaces with streaming, citations, and policy-aware rendering.
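To make the "one shape for every channel" idea concrete, here is a minimal Python sketch of a canonical request envelope. The field names follow the envelope groups listed in the diagram (identity, session, channel, content, controls, policy); the `normalize` helper, the raw payload keys (`user`, `text`, `body`), and every default value are assumptions made for illustration, not part of the reference architecture.

```python
from dataclasses import dataclass
from typing import Any
import uuid


@dataclass(frozen=True)
class Identity:
    principal: str
    tenant: str
    scopes: tuple[str, ...] = ()


@dataclass(frozen=True)
class Session:
    thread_id: str
    run_id: str
    trace_id: str


@dataclass(frozen=True)
class RequestEnvelope:
    identity: Identity
    session: Session
    surface: str                       # e.g. "web_chat", "cli", "webhook"
    message: str
    attachments: tuple[str, ...] = ()
    tools_allow_list: tuple[str, ...] = ()
    token_budget: int = 4096           # hypothetical default
    stream: bool = True
    data_class: str = "internal"       # hypothetical label
    region: str = "us"                 # hypothetical default


def normalize(surface: str, raw: dict[str, Any]) -> RequestEnvelope:
    """Map a channel-specific payload onto the canonical envelope.

    The payload keys used here are assumed; each real channel adapter
    would have its own mapping.
    """
    message = str(raw.get("text") or raw.get("body") or "").strip()
    if not message:
        raise ValueError("envelope requires a non-empty goal / message")
    principal = raw.get("user")
    if not principal:
        raise ValueError("envelope requires an authenticated principal")
    identity = Identity(
        principal=str(principal),
        tenant=str(raw.get("tenant", "default")),
        scopes=tuple(raw.get("scopes", ())),
    )
    session = Session(
        thread_id=str(raw.get("thread_id") or uuid.uuid4().hex),
        run_id=uuid.uuid4().hex,  # a fresh run per request
        trace_id=str(raw.get("trace_id") or uuid.uuid4().hex),
    )
    return RequestEnvelope(
        identity=identity,
        session=session,
        surface=surface,
        message=message,
        attachments=tuple(raw.get("attachments", ())),
    )
```

A Slack event and an email would both pass through `normalize` and come out as the same `RequestEnvelope` type, so the Perception layer never needs to branch on channel.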

Layer 2 · Perception & Input Processing

Transform the validated request envelope from Layer 1 into a grounded, structured task representation — parsing modalities, extracting intent and entities, assembling context, enforcing safety, and compiling the prompt that Layer 3 will plan against.

Detailed Diagram  ·  v1.0  ·  2026

⇣ Inbound — Validated Request Envelope from Layer 1 (User & Interaction)
Identity · Session · Channel · Intent · Controls · Policy hints · Trust labels · Trace context · Attached context
principal · scopes · tenant; thread_id · run_id · trace_id; message · attachments · MIME; model pref · tools allow-list; budget · stream · format; data class · retention · region · trust labels

A · Ingestion & Normalization
Demultiplex incoming payloads, normalize encodings, sanitize, and enforce size/shape limits
Payload Demuxer: Split by part / modality · Multipart / form-data · JSON message blocks · File attachments · Inline URIs & data: URLs
Encoding & Charset Norm.: Stable canonical form · UTF-8 NFC normalization · Newline / whitespace fix · Strip control / zero-width · Bidi & homoglyph guard
Sanitization: Reduce attack surface · HTML / Markdown clean · Script / event handler strip · File type sniff & verify · Anti-virus / malware scan
Limits & Quotas: Bound work and cost · Max tokens / chars · Max files / total size · Max audio / video duration · Per-tenant byte quotas
Language & Locale: Detect & route correctly · Language ID (per segment) · Script / dialect detection · Locale formatting hints · Optional MT pre-translation
Caching & Dedup: Avoid re-processing · Content-hash cache · Idempotent re-entry · Embedding / parse reuse · CDN-cached artifacts

B · Multimodal Encoders & Parsers
Convert each modality into structured tokens, embeddings, and document trees the reasoner can consume
Text Pipeline: Tokens · structure · meta · Sentence / paragraph split · Tokenization (BPE / SP) · Markdown / HTML AST · Code-block tagging · Math / LaTeX detection · Embeddings (BGE / E5) · Token-count budget
Vision Pipeline: Images · screenshots · UI · Decode · resize · color norm · EXIF / orientation strip · OCR (Tesseract / docTR) · Object & layout detection · Captioning / VQA model · CLIP / SigLIP embeddings · NSFW / safety classifier
Audio / Speech Pipeline: Voice · music · environment · Resample · denoise · VAD · ASR (Whisper / streaming) · Speaker diarization & ID · Language / dialect detect · Prosody & emotion cues · Audio embeddings · Transcript timestamps
Video / Screen Pipeline: Frames · scenes · UI graphs · Demux + transcode · Keyframe / shot detection · Frame sampling strategy · Action / event detection · Audio track → ASR · Screen DOM / a11y tree · Temporal embeddings
Document Pipeline: PDF · DOCX · XLSX · slides · Layout-aware parsing · Heading / section tree · Table extraction · Figure & chart capture · Footnote / citation linkage · Form-field extraction · Chunking + embeddings
Code · Structured · Sensor: Programmatic inputs · Tree-sitter AST parse · LSP symbols / refs · Diff / hunk extraction · JSON · CSV · schema infer · Time-series resample · Geo / spatial indexing · Unit / dimension normalize

C · Language Understanding & Intent
Convert raw signals into a structured task — what the user wants and what's needed to act
Intent Classifier: Task type · domain · urgency; Multi-label · confidence scores
Entity / Slot Extraction: NER · dates · amounts · IDs; Pydantic / JSON-schema slots
Coreference & Anaphora: "it" · "that PR" · "the file"; Mention → entity linking
Goal Decomposition: Top-level objective; Sub-goals · constraints · DoD
Disambiguation: Ambiguity detector; Triggers HITL clarification
Sentiment / Tone: Frustration · urgency; Style hints for response
Task Schema (structured representation): objective · constraints · slots · entities · success criteria · risk class · suggested skills

D · Grounding & Reference Resolution
Bind language to real-world entities, files, repos, and prior context
Entity Linking: KG · directory · Wikidata; Org-internal canonical IDs
Resource Resolution: URLs · file paths · repos; PR / ticket / doc IDs
Time & Date Norm.: Relative → absolute; TZ-aware ISO-8601
Geospatial Grounding: Geocoding · POI lookup; User-locale defaults
Quantity / Unit Norm.: Currency · SI units; FX-rate & precision rules
Cross-Modal Align: Caption ↔ region; Transcript ↔ frame
Grounded Reference Graph: Mentions · entities · resources · times · places — emitted with provenance & confidence

E · Context Assembly & Retrieval
Pull just-enough context from memory, knowledge, and session — pack within budget, with provenance
Session History Selector: Recent turns · pinned items · Salience scoring · Compaction summaries · Tool-call traces · Run checkpoints · Conversation graph
Memory Reader: Episodic · semantic · procedural · User profile · preferences · Project memory · CLAUDE.md · Learned skills / recipes · Past trajectories · Privacy & TTL filtering
Knowledge Retrieval (RAG): Hybrid search across stores · BM25 + dense fusion · Multi-query expansion / HyDE · KG / GraphRAG hops · ACL-aware filtering · Recency & freshness boost
Re-rank & Compress: Pick the highest-value tokens · Cross-encoder reranker · LLM-based reranker · Extractive snippeting · Map-reduce summarization · Diversity / dedup (MMR)
Tool / Capability Hints: Which skills are likely · Skill / tool retriever · MCP server discovery · Few-shot example pull · Schema / signature attach · Cost & latency profile
Context Budgeter: Token / latency / $ caps · Per-section quotas · Lossy vs lossless drop · Cache-aware ordering · Prompt-cache key plan · Overflow → tool offload

F · Safety, Trust & Privacy Filters
Defend the reasoner from hostile or unsafe inputs and protect user data before context leaves this layer
Prompt-Injection Detection: Quarantine untrusted text · Heuristic + classifier · Embedded-instruction scan · Tool-result wrapping · Spotlighting / delimiters
PII / DLP Scrubber: Detect, redact, tokenize · Names · IDs · phones · Cards · accounts · keys · Health / financial data · Reversible vault tokens
Content Safety: Block harmful inputs early · Toxicity · hate · violence · CSAM & abuse hashing · Dangerous-capability cues · Policy lookup & routing
Trust-Boundary Tagger: Provenance per token block · user · system · tool · retrieved content (untrusted) · Per-source confidence · ACL / sensitivity labels
Adversarial Defense: Resist obfuscated attacks · Hidden / steganographic text · Image / OCR injections · Audio whisper attacks · Encoded payload decoder
Consent & Residency: Honor user / tenant policy · Train-on-data flags · Region pinning · Retention TTL · Right-to-be-forgotten

G · Prompt Compilation & Caching
Assemble the final messages: layered, schema-aware, cache-friendly, and provenance-preserving
Template Engine: Layered system / persona; Skill prompts · few-shot; Per-tenant overrides
Tool / Schema Binder: JSON-schema · grammars; Function signatures; Argument hints & types
Cache-Key Planner: Stable prefix layout; cache_control breakpoints; TTL · invalidation rules
Multimodal Packer: Interleave text · img · audio; Captions for non-text blocks; Inline vs reference attach
Token Budgeter / Truncator: Section-aware truncation; Lossy summary fallback; Reserve for completion
Provenance Annotator: Source IDs per snippet; Trust labels carried; Citation hooks

H · Routing Hints & Quality Signals
Annotate the task with hints the Orchestrator can use to choose models, agents, and policies
Complexity Estimator: Easy / standard / hard; Reasoning depth hint; Multi-step likelihood
Risk & Sensitivity Class: Reversibility · scope; Regulated-data flag; HITL recommendation
Model Routing Hint: Haiku / Sonnet / Opus; Specialist vs generalist; Cost / latency target
Confidence Scoring: Per slot / entity; Calibrated thresholds; Trigger clarification
Locale & Persona Hint: Output language; Tone / formality; Domain persona
SLA & Budget Hints: Latency target; Token / $ ceiling; Stop conditions

I · Observability, Telemetry & Feedback
Every step emits traces, metrics, and signals consumed by Layer 11 (Governance) and the Reflection loop
OTel spans · per-stage; Latency / token / cost meters; Classifier confidence logs; Drift / anomaly detection; Audit log · signed; Eval & Reflection feedback

⇣ Handoff — Structured Task Bundle to Layer 3 · Orchestration & Planning
Compiled prompt · tool catalog · task schema · grounded references · routing & risk hints · context budget · trace / provenance
objective & sub-goals; grounded entities; retrieved context (provenance); candidate tools / skills; model / risk / SLA hints; trust-tagged compiled prompt + cache plan
Legend: Ingestion / Handoff · Encoders / Compilation · Understanding / Routing · Grounding · Context Assembly · Safety & Privacy · Observability
Arrows: Forward flow · Clarification back to user · Feedback / drift signal
Detailed view of Layer 2 — Perception & Input Processing from the Agentic AI System Architecture reference. Inputs flow top-down from Layer 1's request envelope through ingestion, multimodal encoding, language understanding, grounding, context assembly, safety filtering, prompt compilation, and routing-hint generation, before being handed off as a structured task bundle to Layer 3 (Orchestration & Planning).
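The top-down flow described above can be sketched as a chain of stage functions that each enrich a shared task bundle. This is a toy illustration only: the `TaskBundle` fields, the four stages shown (a subset of the layer's stages), the `#`-prefixed entity rule, and the `<untrusted>` delimiter format are all assumptions chosen to keep the example short, not the architecture's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskBundle:
    message: str
    objective: str = ""
    entities: list[str] = field(default_factory=list)
    # Retrieved context as (source_id, snippet) pairs, so provenance travels along.
    context: list[tuple[str, str]] = field(default_factory=list)
    trust_labels: dict[str, str] = field(default_factory=dict)
    prompt: str = ""
    trace: list[str] = field(default_factory=list)


Stage = Callable[[TaskBundle], TaskBundle]


def ingest(b: TaskBundle) -> TaskBundle:
    # Normalization stand-in: collapse runs of whitespace.
    b.message = " ".join(b.message.split())
    return b


def understand(b: TaskBundle) -> TaskBundle:
    # Toy intent and entity extraction: "#"-prefixed tokens count as entity IDs.
    b.objective = b.message
    b.entities = [tok for tok in b.message.split() if tok.startswith("#")]
    return b


def tag_trust(b: TaskBundle) -> TaskBundle:
    # The user's own message is trusted as user input; retrieved content is not.
    b.trust_labels = {"message": "user"}
    for source_id, _ in b.context:
        b.trust_labels[source_id] = "untrusted"
    return b


def compile_prompt(b: TaskBundle) -> TaskBundle:
    # Wrap untrusted snippets in explicit delimiters so they read as data.
    parts = [f"Objective: {b.objective}"]
    for source_id, snippet in b.context:
        parts.append(f'<untrusted source="{source_id}">{snippet}</untrusted>')
    b.prompt = "\n".join(parts)
    return b


def run_pipeline(bundle: TaskBundle, stages: tuple[Stage, ...] = (
        ingest, understand, tag_trust, compile_prompt)) -> TaskBundle:
    for stage in stages:
        bundle = stage(bundle)
        bundle.trace.append(stage.__name__)  # per-stage trace record
    return bundle
```

Keeping each stage as a plain function over one bundle type makes it easy to insert, reorder, or skip stages per deployment, which matches the note that concrete systems may merge or split components.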

Layer 3 · Orchestration, Planning & Control

The control plane of the agent — turns the structured task into an executable plan, routes work to models, tools, and sub-agents, manages state and concurrency, enforces budgets and policy, and drives the agent loop until the goal is met or escalated.

Detailed Diagram  ·  v1.0  ·  2026

⇣ Inbound — Structured Task Bundle from Layer 2 (Perception & Input Processing)
objective · sub-goals · grounded entities · retrieved context · candidate tools · risk & SLA hints · trust-tagged compiled prompt + cache plan
Task Schema · Reference Graph · Tool Catalog · Routing Hints · Policy Constraints · Trace Context · Budget Envelope

A · Plan Generation & Decomposition
Translate the goal into a structured, executable plan — hierarchical, costed, and revisable
Goal Reasoner: Analyze objective & constraints · Definition of Done · Acceptance criteria · Hard / soft constraints · Implicit assumptions
Hierarchical Decomposer: Goal → tasks → steps · HTN-style decomposition · Dependency DAG · Parallel vs serial annotate · Per-step success checks
Plan Synthesizer: LLM-drafted & validated plan · Schema-constrained output · Tool / agent assignment · Pre/post conditions · Plan Mode preview to user
Plan Critic / Verifier: Sanity-check before execute · Self-critique pass · Policy / risk lookup · Cost & latency estimate · Counterfactual / what-if
Re-planner: Adapt plan during execution · On error / observation · Belief revision · Partial-plan repair · Backtrack / abandon
Plan Repository: Reusable workflow library · Skill / recipe registry · Versioned templates · Distilled from past runs · Org-shared playbooks

B · Reasoning & Control Strategies
Strategy library the orchestrator selects from based on task class, risk, and budget
ReAct: Thought → Act → Observe; Interleaved reasoning; Best for tool-use loops
Plan-and-Execute: Plan once, execute steps; Re-plan on failure; Predictable for long jobs
Tree / Graph of Thought: Branching exploration; Beam / MCTS · scoring; Hard reasoning problems
Reflexion / Self-Refine: Critic + retry loop; Lessons captured per run; Quality-sensitive tasks
Debate / Multi-Agent: Proposer vs critic; Voting / arbitration; High-stakes decisions
Direct / CoT / Skill-Triggered: Single-shot for simple tasks; CoT for medium reasoning; Pre-built skill / sub-graph fast-path

C · Agent Orchestrator — The Control Loop
Central state machine that drives the agent through observe → think → act → reflect cycles
Agent Orchestrator (Controller): Finite-state / graph-based loop · LangGraph · Agent SDK · custom controllers; OBSERVE → THINK → DECIDE → ACT → REFLECT; Step / iteration counter · Stop conditions · Run-state checkpoints · Resume tokens
Run / Trajectory Store: Step log · tool I/O · scratchpad · checkpoints · resume token
Working / Scratchpad Memory: Live thought stream · intermediate facts · action history
Belief / World State: Known facts · pending unknowns · environment snapshot
Loop Controller: Max iterations · timeouts · stop / continue conditions
Stop Criteria Evaluator: DoD met · budget exhausted · escalate · user cancel
Checkpoint & Resume: Pause · serialize · long-running runs · cross-host resume

D · Router, Dispatcher & Tool Selection
Decide WHAT to call next: model, tool, sub-agent — and bind arguments
Skill / Tool Retriever: Top-k by intent + history; MCP server discovery; Skill cards loaded JIT
Model Router: Haiku / Sonnet / Opus tiers; Specialist SLMs · vendors; Quality / cost / latency mix
Sub-Agent Dispatcher: Researcher · Coder · Critic; A2A / MCP-client calls; Capability matching
Argument Binder: Schema-conformant args; Type coercion · defaults; Reference resolution
Pre-flight Validator: JSON-schema check; Dry-run / what-if; Side-effect prediction
Fallback Strategy: Alternate tool / model; Degraded-mode path; Ask-user fallback

E · Policy Engine & Action Gating
Decide if a chosen action is allowed, requires approval, or must be blocked
Permission Manager: RBAC / ABAC / scopes; Tool allow / deny lists; Per-tenant entitlements
Risk Classifier: Reversible · destructive; Blast radius estimate; Regulated-data flag
Prompt-Injection Guard: Confirm tool-driven actions from content; Untrusted-source check
Policy-as-Code: OPA / Rego rules; Versioned · auditable; Tenant overrides
Action Approval: Auto · HITL · admin; Step-up authentication; Two-person rule
Compliance Filter: Region · residency; PII handling rules; Sector regulations

F · Multi-Agent Coordination & Concurrency
When the plan requires multiple agents — coordination patterns, communication, and consensus
Coordination Patterns: Topology selector · Supervisor / hierarchical · Swarm / blackboard · Pipeline / staged · Debate / proposer-critic · Contract net
Agent Spawn Manager: Lifecycle · isolation · Sub-agent factory · Sandboxed contexts · Inherited permissions · Per-agent token budget · Deadline propagation
Inter-Agent Bus: Messages & shared state · A2A · MCP · ACP · Shared scratchpad / KV · Pub/sub topics · Signed handoff envelopes · Trace propagation
Concurrency Manager: Parallel · fork / join · DAG runner · Map-reduce / fan-out · Race & first-win · Cancellation propagation · Deadlock detection
Consensus & Arbitration: Aggregate sub-agent output · Voting / majority · Weighted by confidence · Judge / referee agent · Tie-breakers · fallbacks · Conflict resolution
Roles & Personas: Specialist agent registry · Researcher · Planner · Coder · Reviewer · Critic · Verifier · Domain experts · Tool persona templates

G · Scheduling, Budget & Resilience
Make agent runs predictable, bounded, and recoverable under load and failure
Scheduler & Queue: Priorities · fair-share; Delayed · cron-driven; Per-tenant queues
Budget Manager: Tokens · steps · $ · time; Per-task & per-run caps; Soft / hard limits
Rate & Concurrency Limiter: Per model / tool / org; Token-bucket backoff; Adaptive throttling
Retry & Backoff: Exponential · jittered; Idempotency keys; Poison-message handling
Circuit Breakers: Per tool / model / agent; Open · half-open · closed; Health-check probes
Cost Optimizer: Cache-aware ordering; Cheaper-model first; Early-stop heuristics
Error & Recovery Manager: Classify (retryable · permanent · policy) · compensating actions; Saga / rollback · transactional groups · poisoning detection; Failure → re-plan · escalate · graceful degrade
Loop Safety: Max steps · max depth · runaway detection; Cycle detection (revisited state) · diversity bonus; Watchdog · liveness probes · hard kill
Durable Execution: Workflow engines (Temporal · Cadence · Restate); Replay-safe steps · deterministic checkpoints; Long-running runs · cross-host failover

H · Human-in-the-Loop & Steering
Pause, ask, approve, redirect — keep humans in control of risky or ambiguous moves
Approval Gate: Risky / irreversible action; Step-up auth · two-person
Clarification Manager: Ask follow-up questions; Slot-fill · disambiguation
Steering & Override: Pause · cancel · redirect; Modify plan mid-run
Plan Mode Preview: Show plan before execute; Diff before write
Escalation Router: Tier 1 / 2 / human expert; SLA-driven routing
Feedback Capture: Inline edits · ratings; Routes to Memory / Eval

I · Observability, Trace & Cross-Cutting
Every decision is traced, costed, and auditable; signals feed Layer 10 (Reflection) and 11 (Governance)
Trace & Span Emission: OTel · LangSmith · Langfuse; Per step / tool / agent
Cost & Token Meters: Per task / org / model; Streaming cost gauges
Decision Logs: Why this tool · why now; Plan diff history
Replay & Time-Travel: Re-run from checkpoint; Counterfactual debug
Anomaly & Drift: Tool-error spikes; Plan-shape regressions
Audit Log: Signed · immutable; Compliance evidence

⇣ Outbound — Coordinated Calls to Downstream Layers
The orchestrator dispatches typed calls to Reasoning, Memory, Tools, Knowledge, and Multi-Agent layers
→ Layer 4 · Reasoning: Compiled prompt · model · params; Tool catalog · stop tokens
→ Layer 5 · Memory: Read · write · update; Episodic / semantic deltas
→ Layer 6 · Tools: Schema-validated calls; Idempotency · deadlines
→ Layer 7 · RAG: Targeted retrievals; Citations required
→ Layer 8 · Multi-Agent: Sub-agent dispatch; A2A / MCP envelopes
↑ Layer 1 · User: Approvals · clarifications; Streaming partial output

Cross-cutting Policy, Safety & Telemetry
Legend: Inbound / Orchestrator · Planning / Multi-agent · Strategies / HITL · Routing · Policy · Scheduling / Observability · Forward control flow · Re-plan / reflection loop · HITL back to user
Detailed view of Layer 3 — Orchestration, Planning & Control from the Agentic AI System Architecture reference. The orchestrator drives the OBSERVE → THINK → DECIDE → ACT → REFLECT loop; planning, routing, policy, multi-agent coordination, and scheduling are coordinated services around it. All decisions emit traces and feed Reflection (Layer 10) and Governance (Layer 11).
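The loop and the Loop Safety controls named in the diagram (max steps, cycle detection on revisited state) can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: `run_agent_loop`, `LoopResult`, and the `decide` / `act` callables are hypothetical names, and the state is assumed to be hashable so revisits can be detected.

```python
from dataclasses import dataclass, field
from typing import Callable, Hashable, List, Optional, Set, Tuple


@dataclass
class LoopResult:
    status: str  # "done", "max_steps", or "cycle"
    steps: int   # number of actions actually executed
    trace: List[Tuple[str, str]] = field(default_factory=list)


def run_agent_loop(
    state: Hashable,
    decide: Callable[[Hashable], Optional[str]],
    act: Callable[[Hashable, str], Hashable],
    max_steps: int = 10,
) -> LoopResult:
    """Drive decide -> act until the goal is met, the budget runs out, or a state repeats."""
    seen: Set[Hashable] = {state}
    trace: List[Tuple[str, str]] = []
    for step in range(1, max_steps + 1):
        action = decide(state)               # THINK / DECIDE on the observed state
        if action is None:                   # the policy reports the goal as met
            return LoopResult("done", step - 1, trace)
        state = act(state, action)           # ACT
        trace.append((action, repr(state)))  # record the outcome for later reflection
        if state in seen:                    # cycle detection: this state was visited before
            return LoopResult("cycle", step, trace)
        seen.add(state)
    return LoopResult("max_steps", max_steps, trace)  # hard step budget exhausted


if __name__ == "__main__":
    result = run_agent_loop(0, lambda s: "inc" if s < 3 else None, lambda s, a: s + 1)
    print(result.status, result.steps)
```

A production controller would add token, cost, and wall-clock budgets next to the step cap; the structure stays the same, with one more stop condition per budget.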

Layer 4 · Reasoning Core — Foundation Models & Cognition

Reasoning Core — Foundation Models & Cognition

The cognitive engine of the agent — foundation models, extended thinking, tool-use, structured output, multimodal reasoning, self-reflection, adaptation, and the inference fabric that makes them fast, cheap, and reliable.

Detailed Diagram  ·  v1.0  ·  2026

⇣ Inbound — Inference Request from Layer 3 (Orchestration & Planning)
Compiled prompt · model preference · tool catalog · sampling params · structured-output schema · stop tokens · budget · trace context
messages[] · system · tools[] · response_format · temperature · max_tokens · cache_control · thinking_budget · stream

A · Model Selection & Routing Fabric
Pick the right model for the job — by capability, latency, cost, region, and trust
Capability Matcher: Intent → required skills · Reasoning depth · vision · Long-context · code · math · Tool-use · structured output
Tier Router: Right-size by complexity · Haiku → fast / cheap · Sonnet → workhorse · Opus → hardest reasoning
Cost / Latency Optimizer: SLA-aware selection · $/1k token meter · P50 / P95 latency targets · Cache-hit aware
Region & Residency: Data-locality routing · EU · US · APAC pinning · On-prem / private VPC · Sovereign-cloud routing
Vendor & Failover: Multi-provider abstraction · Anthropic · OpenAI · Google · Self-hosted OSS · Health-check failover
Model Cascade: Cheap-first, escalate · SLM → LLM escalation · Confidence-gated retry · Mixture-of-experts router

B · Foundation Model Pool
A heterogeneous fleet — frontier LLMs, multimodal LMMs, and small specialist models
Anthropic Claude Family: Frontier reasoning · agentic tool-use · long context · Claude Opus 4.7 — deepest reasoning · Claude Sonnet 4.6 — balanced workhorse · Claude Haiku 4.5 — fast / low cost; Extended thinking · vision · tool-use · 200k+ context · prompt caching
Other Frontier LLMs: Multi-vendor coverage · OpenAI GPT-5 / o-series · Google Gemini 2.x · xAI Grok · DeepSeek · Qwen · Mistral Large
Open-Weights / Self-Host: On-prem & sovereign · Llama 4 / 5 · Qwen3 · DeepSeek-R · Mistral · Mixtral · Gemma · Phi · Domain-tuned variants
Multimodal Models: Vision · audio · video · VLMs (image + text) · Speech-to-speech models · Video-understanding LLMs · Image-generation models · TTS / ASR specialists
Specialist Small Models (SLMs): Cheap, fast, narrow · Embedders (BGE · E5 · Voyage) · Re-rankers (cross-encoder) · Classifiers (intent · safety · PII) · Code models (Codex-style) · Math / theorem provers

C · Cognitive Capabilities — How the Model Thinks
First-class capabilities the orchestrator can compose: reasoning, tool-use, reflection, and learning in-context
Extended Thinking: Private reasoning tokens · Reasoning scratchpad · thinking_budget control · Visible vs hidden CoT · Plan-before-act · Multi-step decomposition · Self-consistency / voting · Long-horizon arithmetic
Tool Use / Function Calling: Bridge to the world · Schema-constrained args · Parallel tool calls · Tool selection & chaining · Tool-result integration · Computer-use actions · MCP-server tool calls · Function-call streaming
Structured Output: Reliable machine-readable · JSON Schema enforcement · Pydantic / Zod models · Regex / grammar-guided · Type-safe SDK responses · Citations & spans · Field-level validation · Retry on parse failure
Multimodal Reasoning: Beyond text · Vision: docs · charts · UI · Audio: transcribe + reason · Video frame reasoning · Code: AST + repo context · Tables & spreadsheets · Cross-modal grounding · Generation across modes
Self-Reflection / Critic: Inner verification · Reflexion · Self-Refine · Generator + critic split · LLM-as-Judge scoring · Self-consistency vote · Confidence calibration · Hallucination probes · Verifier tool calls
In-Context Learning: Adapt without training · Few-shot exemplars · Skills / system cards · Persona / style transfer · Negative examples · Long-context recall · Demonstration learning · Test-time compute scale

D · Inference & Decoding Controls
Knobs that shape the distribution and shape of generated tokens
Sampling Parameters: temperature · top-p · top-k; min-p · repetition penalty; seed for reproducibility
Constrained Decoding: Grammar / regex / GBNF; JSON-schema masking; Outlines · LMQL · XGrammar
Logit Biasing: Boost / suppress tokens; Stop sequences; Banned-phrase enforcement
Streaming & Stop Logic: SSE / event stream; Stop tokens · max_tokens; Mid-stream cancel
Speculative Decoding: Draft model + verify; Medusa / EAGLE heads; 2-3× faster decode
Test-Time Compute: Best-of-N · majority vote; Tree search · MCTS; Verifier-guided search

E · Context & Caching Subsystem
Make long contexts fast and cheap — KV reuse, prompt caching, and attention efficiency
Prompt Cache: cache_control breakpoints; 5-min TTL · 1-hour TTL; 90% cost / latency cut
KV-Cache Manager: PagedAttention (vLLM); Prefix sharing across reqs; Eviction · radix tree
Long-Context Handling: 200k–2M tokens; Chunk · map-reduce · skim; Needle-in-haystack tuning
Attention Efficiency: FlashAttention 3; Sliding-window · sparse; Linear / state-space hybrids
Compaction / Summarize: Auto-compact threshold; Recursive summarization; Token-budget reclaim
Response Cache: Semantic cache (embed); Idempotent re-runs; Read-through · TTL

F · Adaptation & Customization
Specialize the base model to your domain — prompts, parameter-efficient tuning, full fine-tunes, and preference optimization
Prompt Engineering: Lightweight, no training · System / persona design · Few-shot exemplar library · Skill / sub-prompt files · Auto-prompt optimization · DSPy compilation
PEFT — LoRA / Adapters: Parameter-efficient tuning · LoRA · QLoRA · DoRA · Prefix / prompt tuning · Adapter fusion · Per-tenant / per-task · Hot-swap at inference
Supervised Fine-Tune: SFT on curated data · Instruction tuning · Domain-corpus continued · Tool-use distillation · Rejection-sampled SFT · Curriculum & staging
Preference Optimization: Align to human / AI prefs · RLHF · PPO · DPO · IPO · KTO · RLAIF (constitutional AI) · Reward modeling · GRPO · process rewards
Distillation & Compression: Smaller, cheaper, faster · Teacher → student · Quantization (INT8/4) · Pruning · sparsity · Speculative draft training · Edge-deploy variants
Continual Learning: Improve from production · Trace mining · Feedback → SFT data · Self-play / synthetic · Catastrophic-forget guard · Online eval gating

G · Inference Engine & Serving Fabric
High-throughput, low-latency execution: schedulers, batching, kernels, accelerators
Serving Runtimes: Production model servers · vLLM · SGLang · TensorRT-LLM · TGI · llama.cpp · Triton Inference Server · Hosted (Anthropic / OpenAI)
Batching & Scheduling: Throughput optimization · Continuous batching · Chunked prefill · Disaggregated P/D · Priority queues · SLO-aware scheduling
Compute & Accelerators: Hardware substrate · NVIDIA H100 / B200 · Google TPU v5p / v6 · AMD MI300 · Trainium · Groq · Cerebras · SambaNova · Edge / mobile NPUs
Distributed Inference: Scale beyond one node · Tensor parallelism · Pipeline parallelism · Expert parallelism (MoE) · Sequence parallelism · NCCL / RDMA fabric
Optimized Kernels: Squeeze more per token · FlashAttention / FA3 · Fused MLP / RMSNorm · Triton / CUDA kernels · FP8 / INT4 GEMM · Compiler stacks (XLA, Mojo)
Quantization & Deployment: Tradeoff quality vs cost · FP16 · BF16 · FP8 · INT8 · INT4 · AWQ · GPTQ · Weight streaming · Multi-tenant serving · Cold-start optimization

H · Output Processing & Validation
Parse, validate, and certify the model's response before returning to the orchestrator
Token / Logprob Stream: SSE chunks · partials; Confidence per token
Tool-Call Parser: Extract function calls; Schema-validate args
Structured Output Verifier: JSON / Pydantic check; Auto-repair on failure
Citation & Span Extractor: Source links · char ranges; Provenance carryover
Hallucination Probes: NLI · entailment · self-check; Cross-source verifier
Confidence Calibrator: Temperature scaling; Score normalization

I · Safety, Telemetry & Governance
Cross-cutting controls — model-level guardrails, traces, evals, and lifecycle
Output Safety Filters: Toxicity · jailbreak · PII; Refusal classifier
Watermarking & Provenance: SynthID · token traces; C2PA content credentials
Inference Telemetry: TTFT · TPOT · tokens/s; Cache hit-rate · cost
Eval & Regression Suite: Offline + online evals; Capability benchmarks
Model Lifecycle: Versioning · canary · rollback; Deprecation policy
Capability Gating: RSP · ASL tiers; Red-team gated release

⇣ Outbound — Inference Result Bundle to Layer 3 (Orchestration)
Returned in a single shape regardless of model — text · tool calls · structured object · citations · usage · trace
Generated Text: Streamed or batched; Stop reason annotated
Tool Calls: Validated args; Parallel-call list
Structured Object: JSON / Pydantic; Schema-conformant
Reasoning Trace: Thinking tokens; Self-critique notes
Citations & Confidence: Source spans · scores; Calibrated uncertainty
Usage & Trace: Tokens · cost · latency; Cache stats · trace_id

Cross-cutting Safety, Eval & Lifecycle
Legend: Inbound / Outbound · Routing · Foundation Models · Cognitive Capabilities · Decoding / Output · Caching / Inference Engine · Adaptation · Safety & Lifecycle · Forward inference flow · Reflection / continual learning
Detailed view of Layer 4 — Reasoning Core: Foundation Models & Cognition from the Agentic AI System Architecture reference. Inference requests flow from Layer 3 through model routing, the foundation-model pool, cognitive capabilities (extended thinking, tool-use, structured output, multimodal, reflection, ICL), decoding/cache controls, the adaptation stack, and the inference engine, returning a typed result bundle to the orchestrator. Telemetry & reflection signals feed Layers 10 (Reflection) and 11 (Governance).
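The "cheap-first, escalate" cascade with confidence-gated retry from the routing fabric can be sketched as follows. This is an illustration under stated assumptions: `Tier`, `cascade`, the 0.8 threshold, and the stand-in generators are all hypothetical, and each tier is assumed to return a confidence score in [0, 1] alongside its answer. No real model API or price is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str
    cost_per_call: float                          # illustrative cost unit, not a real price
    generate: Callable[[str], Tuple[str, float]]  # returns (answer, confidence in [0, 1])


def cascade(prompt: str, tiers: List[Tier], min_confidence: float = 0.8) -> Tuple[str, str, float]:
    """Try tiers cheapest-first; stop at the first answer that clears the confidence gate."""
    if not tiers:
        raise ValueError("at least one tier is required")
    spent = 0.0
    answer, used = "", ""
    for tier in sorted(tiers, key=lambda t: t.cost_per_call):
        answer, confidence = tier.generate(prompt)
        spent += tier.cost_per_call
        used = tier.name
        if confidence >= min_confidence:
            break  # good enough, so do not escalate further
    # If no tier clears the gate, the most capable tier's answer is returned.
    return answer, used, spent


# Stand-in models: the small one is only confident on short prompts.
small = Tier("small", 1.0, lambda p: ("small:" + p, 0.9 if len(p) < 20 else 0.4))
large = Tier("large", 10.0, lambda p: ("large:" + p, 0.95))

if __name__ == "__main__":
    print(cascade("2 + 2?", [large, small]))
    print(cascade("prove the following lemma in detail", [large, small]))
```

The returned cost includes every tier that was tried, which is what a cost meter would need to report: an escalated request pays for the failed cheap attempt as well.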

Layer 5 · Memory Subsystem

Memory Subsystem

Multi-tier memory that gives the agent continuity, personalization, and learning across turns, sessions, and lifetimes — working, episodic, semantic, and procedural memory backed by vector, graph, key-value, and document stores, with a memory manager that reads, writes, consolidates, and forgets.

Detailed Diagram  ·  v1.0  ·  2026

⇄ Memory Operations from Layer 3 (Orchestrator) and Layer 4 (Reasoning Core)
read · write · upsert · update · forget · consolidate · search · subscribe — scoped to user, project, tenant, agent
read(query, scope, k); write(item, type, scope, ttl); update(id, patch, evidence); forget(scope · subject · GDPR); consolidate(window); subscribe(event)

A · Memory Manager — The Memory Control Plane
A unified API on top of heterogeneous stores — handles routing, scoping, consistency, and lifecycle
Memory Manager (Controller): Single entry-point · scope resolution · ACL · transactions · cross-store fan-out
Operations: READ · WRITE · UPDATE · FORGET · CONSOLIDATE
Scope Resolver: user · project · org · agent · global
Access Control: RBAC / ABAC · row-level · ACL filters
Routing & Sharding: Pick store by type / size / region
Consistency & Txn: eventual · read-your-write · 2PC
Conflict Resolver: Recency · evidence-weighted merge
Versioning & Audit: Provenance · immutable history

B · Memory Types — A Cognitive-Inspired Taxonomy
Specialized memory tiers, each with its own write triggers, retrieval pattern, and lifetime
Working / Context Memory: Live conversation buffer · Current turn / run state · Tool I/O scratchpad · Compaction summaries · Pinned items · Lifetime: minutes–hours · Storage: in-context · KV; Volatile · session-scoped
Episodic Memory: "What happened when" · Past sessions / runs · Trajectories & outcomes · Time-stamped events · User interactions log · Lifetime: weeks–years · Storage: vector + KV; Persistent · timeline-ordered
Semantic Memory: Facts & concepts · User profile · preferences · Project / domain knowledge · Entities · relations · taxonomy · Distilled from episodes · Lifetime: long / permanent · Storage: KG + vector; Persistent · timeless
Procedural Memory: Skills · workflows · "how" · Reusable tool sequences · Skill / recipe library · Plan templates · Voyager-style distillation · Lifetime: long · versioned · Storage: doc + repo; Executable artifacts
Affective / Persona Memory: User mood · style · trust · Communication style · Tone · formality · Frustration / engagement · Relationship trust score · Lifetime: rolling · Storage: KV / profile; Personalization layer
Shared / Org Memory: Cross-user knowledge · Team playbooks · CLAUDE.md / repo notes · Curated FAQs · Lessons learned · Lifetime: long · governed · Storage: docs + KG; Org-shared knowledge

C · Storage Backends — Polyglot Persistence
Use the right database for each access pattern; the manager hides which is which
Vector Stores: Semantic similarity search · pgvector · Pinecone · Weaviate · Qdrant · Milvus · Chroma · LanceDB · HNSW · IVF · DiskANN · Quantization · binary
Knowledge Graphs: Entities · relations · paths · Neo4j · ArangoDB · Memgraph · NebulaGraph · RDF · SPARQL stores · GraphRAG-friendly · Property + temporal edges
KV / Cache: Hot, fast, simple · Redis · KeyDB · Dragonfly · DynamoDB · Cosmos DB · Memcached · TTL · LRU eviction · Pub/sub for invalidation
Document Stores: Rich nested objects · MongoDB · Couchbase · Firestore · Elastic · OpenSearch (BM25) · JSONB / Postgres · Object storage (S3, R2)
Relational / OLTP: Strong consistency, joins · Postgres · MySQL · CockroachDB · Spanner · Schema-validated facts · Audit / version tables · Row-level security
Time-Series & Event: Append-only timelines · TimescaleDB · InfluxDB · Kafka · Pulsar · NATS · Event-sourced runs · CDC streams · Replay-able trajectories

D · Encoding & Indexing Pipeline (Write Path)
Turn raw events into searchable, structured, deduplicated memory items
Capture & Normalize: From traces · turns · tools; Schema-canonical events; Stable IDs · timestamps
Chunking & Summarize: Semantic / sliding windows; Hierarchical summaries; Headline + body + facts
Importance Scorer: Should we remember this?; Surprise · novelty · utility; User-flagged · pinned
Embedding Pipeline: Dense + sparse vectors; BGE · E5 · Voyage · OpenAI; Multi-vector / ColBERT
Entity & Relation Extractor: Triples for the KG; Linker · canonicalizer; Coreference resolution
Index Builder: HNSW · IVF · BM25; Field metadata indexes; Async + batch builds

E · Retrieval & Recall (Read Path)
Surface the right memories at the right time, with provenance and freshness
Query Planner: Decide which stores / types; Multi-query expansion · HyDE; Query rewriting
Hybrid Search: BM25 + dense + KG hops; Reciprocal-rank fusion; Field filters · ACL filter
Re-ranker: Cross-encoder · LLM-rerank; Recency / freshness boost; Diversity (MMR)
Salience Scorer: Relevance · importance; Decay function (Ebbinghaus); Per-user weighting
Provenance & Citation: Source IDs · timestamps; Confidence per item; Trust labels carried
Read Cache: Semantic / exact; TTL · invalidation hooks; Per-scope keys

F · Memory Lifecycle — Consolidation, Update, Forgetting
Memory must change: episodes get distilled into facts, stale knowledge gets revised, and what shouldn't persist must be removed
Consolidation: Episodic → Semantic · Periodic distillation jobs · LLM-based summarizer · Pattern → general fact · Sleep-cycle inspired · Hot → warm → cold tiers
Update & Belief Revision: Keep facts current · Newer evidence wins · Contradiction detector · Soft / hard updates · Provenance preservation · Conflict resolution policy
Forgetting / Deletion: Bounded growth, compliance · TTL expiration · Decay curves · LRU · Right-to-be-forgotten · User opt-out / opt-in · Cascading delete (KG)
Reflection & Skill Distill: Episodes → procedures · Voyager-style skills · Lessons learned · Reflexion notes · Recipe extraction · Promote to org memory
De-duplication: Avoid memory bloat · Near-duplicate detection · SimHash / embedding sim · Merge duplicates · Canonicalize entities · Compaction passes
Tiering & Archival: Cost-optimized storage · Hot RAM / NVMe · Warm SSD · Cold object store · Glacier / deep archive · Promote on access

G · Privacy, Security & Compliance
Memory holds the most sensitive long-lived data — protect, scope, and prove control
Encryption & Keys: At-rest · in-transit · in-use · KMS / HSM-managed keys · Per-tenant key isolation · BYOK / HYOK options · Confidential compute
Access Control & Scoping: Least-privilege everywhere · Row / namespace ACL · Tenant isolation · Per-agent token scopes · Cross-tenant leakage tests
PII & DLP: Detect, redact, vault · PII classifier on write · Tokenization vault · Differential privacy · Sensitive-field masking
Consent & Residency: Honor user intent & law · Memory opt-in / opt-out · Region pinning (EU/US/APAC) · Train-on-data flags · Retention policy enforcement
Right-to-Be-Forgotten: GDPR / CCPA / CPRA · Subject-erasure request · Cascade across stores · Retraining-aware deletion · Tombstones & receipts
Audit & Compliance: Every read & write traced · Signed, immutable log · SOC 2 · HIPAA · ISO 27001 · Data lineage graph · Compliance dashboards

H · Operations, Observability & Quality
Make memory measurable, debuggable, and reliable in production
Telemetry: Read / write / hit-rate; Latency P50 / P95
Memory Health: Drift · staleness · bloat; Index integrity checks
Backup & DR: Snapshots · PITR; Cross-region replicas
Quality Evals: Recall@K · MRR · NDCG; A/B retrieval experiments
Cost Monitoring: Storage / IO / embedding $; Per-tenant chargeback
Schema Migration: Embedding model upgrades; Re-indexing pipelines

I · Personalization & Memory APIs
How other layers consume memory — typed, scoped, and traceable
Profile API: User · org · agent profile; Get / patch · merge logic
Search API: Semantic / hybrid query; Filtered & scoped
Skill / Recipe API: Procedural memory access; Versioned look-ups
Event Stream: Memory-changed events; Subscribe · webhook
Admin / DSAR API: Export · erase · audit; User self-service portal
Personalization Hooks: Inject context per request; Style · preferences · history

⇄ Cross-Layer Integrations
Memory is consumed by, and feeds, every neighboring layer
↔ Layer 2 · Perception: Profile · history selector; Few-shot retrieval
↔ Layer 3 · Orchestrator: Plan repository; Run trajectories
↔ Layer 4 · Reasoning: Working / scratchpad; Persona & style cues
↔ Layer 7 · RAG: Shared vector / KG indexes; Curated knowledge facts
↔ Layer 10 · Reflection: Lessons in · skills out; Trajectory mining
↔ Layer 11 · Governance: Audit · DSAR · policy; Compliance evidence
All exchanges are scoped, ACL-checked, traced, and logged through the Memory Manager.

Cross-cutting Privacy, Audit & Lifecycle
Legend: Memory Manager / Lifecycle · Memory Types · Storage Backends · Write Pipeline · Read / APIs · Privacy & Compliance · Ops & Observability · Forward flow · Read path · Reflection / skill loop
Detailed view of Layer 5 — Memory Subsystem from the Agentic AI System Architecture reference. All memory operations flow through a single Memory Manager that fans out to typed memory tiers (working, episodic, semantic, procedural, affective, shared) backed by polyglot stores. Write & read pipelines, a lifecycle for consolidation/update/forgetting, privacy & compliance controls, and observability surround the manager. Skill distillation feeds procedural memory back into the Reasoning Core (Layer 4) and Reflection (Layer 10).
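To make the single-entry-point idea concrete, here is a toy in-memory sketch of the `read` / `write` / `forget` operations with scope filtering and TTL expiry. It is an assumption-laden illustration: the class and field names are hypothetical, `mem_type` stands in for the diagram's `type` argument, ranking is plain keyword overlap rather than vector search, and a real manager would fan out to the stores the diagram lists.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class MemoryItem:
    text: str
    mem_type: str                # e.g. "episodic" or "semantic"
    scope: str                   # e.g. "user:1"
    expires_at: Optional[float]  # None means no TTL


class MemoryManager:
    """Single entry point; every operation is scoped and honours TTLs."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._items: Dict[int, MemoryItem] = {}
        self._next_id = 1
        self._clock = clock  # injectable so expiry can be tested deterministically

    def write(self, text: str, mem_type: str, scope: str, ttl: Optional[float] = None) -> int:
        expires_at = None if ttl is None else self._clock() + ttl
        item_id = self._next_id
        self._next_id += 1
        self._items[item_id] = MemoryItem(text, mem_type, scope, expires_at)
        return item_id

    def read(self, query: str, scope: str, k: int = 5) -> List[MemoryItem]:
        now = self._clock()
        terms = set(query.lower().split())
        scored = []
        for item_id, item in self._items.items():
            if item.scope != scope:
                continue  # scope isolation: never return another scope's memory
            if item.expires_at is not None and item.expires_at <= now:
                continue  # expired items are invisible to reads
            overlap = len(terms & set(item.text.lower().split()))
            if overlap:
                scored.append((-overlap, item_id, item))
        scored.sort(key=lambda entry: entry[:2])  # best overlap first, then oldest
        return [item for _, _, item in scored[:k]]

    def forget(self, scope: str) -> int:
        """Delete everything in a scope; returns how many items were removed."""
        doomed = [i for i, item in self._items.items() if item.scope == scope]
        for i in doomed:
            del self._items[i]
        return len(doomed)
```

Passing a fake clock (for example `MemoryManager(clock=lambda: 0.0)`) makes TTL behaviour reproducible in tests without sleeping.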

Layer 6 · Tools, Skills & Capabilities

Tools, Skills & Capabilities

Composable actions the agent can invoke through standardized interfaces — a registry of tools, MCP servers, and skills, fronted by a hardened gateway that handles auth, validation, sandboxing, retries, and observability for every external call.

Detailed Diagram  ·  v1.0  ·  2026

⇣ Inbound — Tool Call from Layer 3 (Orchestrator) / Layer 4 (Reasoning)
tool_name · arguments · principal · scopes · trace_id · idempotency_key · deadline · retry_policy · trust labels
{ tool: "get_pull_request", arguments: {...}, ctx: { trace_id, principal, scopes, deadline, idem_key, trust: "tool" } }

A · Tool Gateway — The Universal Adapter
Every tool invocation is normalized, authorized, validated, executed, and traced through this gateway
Tool Gateway / Skill Runtime: Single entry-point · spec resolution · auth · validation · invocation · result normalization
Stages: RESOLVE spec / schema; AUTHORIZE scopes · policy; VALIDATE args · types; INVOKE execute; SHAPE result · trace
Schema Validation: JSON Schema · Pydantic · Zod
Auth & Scope Check: OAuth · OIDC · token vault
Policy Pre-flight: OPA / Rego · risk & HITL
Idempotency · de-dup · request signing
Result Normalizer: Stable schema · trim · redact
Retry & Backoff: Exponential · circuit breaker
Trace & Cost Emit: OTel spans · token / $ meters
Streaming results · pagination · partial outputs

B · Tool Registry, Specs & Discovery
A versioned catalog of available tools, MCP servers, and skills — what they do, how to call them, and who can use them
Spec Registry: Source of truth · OpenAPI · JSON Schema · MCP tool descriptors · Examples · cost hints · Side-effect labels
Discovery / Indexing: Find right tool fast · Embedding-based retrieval · Tag · category · capability · MCP server enumeration · JIT spec injection
Versioning & Lifecycle: Evolve safely · SemVer per tool · Canary · rollback · Deprecation windows · Backwards-compat tests
Capability Matcher: Plan → tools mapping · Required vs optional · Pre/post conditions · Cost / latency profile · Substitutable equivalents
Permissions Matrix: Who can call what · Allow / deny lists · Per-tenant overrides · Risk-tier gating · HITL-required flag
Marketplace: Distribution & sharing · Internal tool hub · Public MCP registry · Signed publisher · Reviews · ratings

C · Tool Categories — The Capability Surface
A catalog of what an agent can do — grouped by domain, each adapter conforming to the gateway's contract
Web & Browsing: Read the open internet · Web search (Bing · Google · Brave) · URL fetch · readability extract · Browser agent (Playwright) · Computer-use UI control · Form fill · click · navigate · Screenshot & DOM capture · Headless · headful modes · Crawl + sitemap traversal · robots.txt & ToS aware
Code Execution: Compute · transform · test · Python / Node / Bash REPL · Code interpreter · Notebook (Jupyter) · Compiler / linter / formatter · Test runner · fuzzers · Build & package tools · Container exec · SSH · Static analysis · SAST · Math / symbolic (SymPy)
File & Repo: Code & document operations · Read · Write · Edit · Glob · grep · ripgrep · ast-grep · git · diff · patch · blame · GitHub / GitLab / Bitbucket · PR / commit / branch ops · LSP symbols · tree-sitter · Object storage (S3 · GCS · R2) · File conversion (PDF · DOCX) · Diff & merge tooling
External APIs: SaaS & partner systems · REST · GraphQL · gRPC · Webhooks (in & out) · Stripe · Twilio · SendGrid · Salesforce · HubSpot · Google / Microsoft Graph · OpenAPI auto-clients · OAuth flow handler · SDK adapters (Python · TS) · Mock / sandbox endpoints
Data & Databases: Read & write structured data · SQL (Postgres · MySQL) · NoSQL (Mongo · Dynamo) · Warehouse (BigQuery · Snowflake) · Vector / KG queries · Read-only safe-mode · DDL gated by approval · Query plan inspector · dbt · Airflow runs · CSV / Excel I/O
Communication: Reach humans & teams · Email · SMS · Push · Slack · Teams · Discord · Calendar / meeting invite · Voice call (Twilio) · Pager / on-call (PagerDuty) · Templates · approvals · Localization aware · Quiet-hours respect · Send-rate caps

C · Tool Categories — Continued
Workflow integrations, AI specialists, knowledge access, computer use, and physical-world adapters
Workflow & PM: Tickets · docs · planning · Jira · Linear · Asana · Notion · Confluence · Google Docs · Office 365 · CI/CD (GitHub Actions) · Terraform · Ansible · Status pages · runbooks
AI Specialist Models: Models exposed as tools · Vision (OCR · detection) · Speech (ASR · TTS · diarize) · Image / video generation · Translation · summarize · Embedders · re-rankers · Classifiers · NER · safety
Knowledge & Search: Internal & curated knowledge · Vector / hybrid retrievers · KG / GraphRAG queries · Wikipedia · Wolfram · Research (arXiv · PubMed) · Internal wikis · runbooks · Maps · weather · finance
Computer Use: Operate desktop / mobile · Screen + keyboard + mouse · OS-level automation · Accessibility tree access · VNC / RDP isolated VM · Mobile emulator control · Action recorder & replay
Enterprise Systems: Systems of record · CRM · ERP · ITSM · HRIS · billing · payroll · Identity / IDM (Okta · AAD) · Data lake / lakehouse · SOC / SIEM · monitoring · EHR · LIS (regulated)
Physical / IoT: Real-world actuation · Robotics control APIs · Sensor read · actuator · Smart-home (Matter) · Industrial PLC / SCADA · Drone / vehicle telemetry · Edge / on-device runtime

D · Skills — Composed, Reusable Capabilities
Higher-level building blocks: prompts + tools + sub-flows packaged as named, versioned skills
Skill Definition: SKILL.md · system prompt; Triggers · examples · args; Required tools manifest
Skill Composition: Sub-graphs · pipelines; Sequenced tool calls; Conditional branches
Trigger Engine: Auto-load on intent; Slash commands; Path / context match
Skill Library: Built-in · org · personal; Marketplace import; Versioned & signed
Distillation: Promote successful runs; Voyager-style learning; From procedural memory
Runtime Sandbox: Scoped tool subset; Per-skill budget; Isolated state

E · MCP Servers — The Standardized Tool Protocol
Model Context Protocol — open standard for exposing tools, resources, and prompts to any agent
Server Registry: Discover available servers; Local · remote · cloud; Capability negotiation
Transport & Session: stdio · SSE · WebSocket; JSON-RPC 2.0 framing; Bi-di streaming
Resources & Prompts: Files · URIs · templates; Sampling requests; Subscribe / notify
Server Catalog: GitHub · GitLab · Slack; Filesystem · DB · Search; Custom enterprise servers
Trust & Sandboxing: Per-server permissions; Signed publishers; Capability review
SDK & Hosting: Python · TS · Rust SDKs; Docker · serverless; Multi-tenant gateways

F · Execution Environment & Sandboxing
Where tools actually run — isolated, limited, observable, and recoverable
Sandbox Runtimes: Hard isolation per call · Containers (Docker · OCI) · MicroVMs (Firecracker) · gVisor · Kata · WASM · Browser-based VMs (E2B) · Ephemeral · per-task
Resource Limits: Bound blast radius · CPU · RAM · disk caps · Wall-clock timeouts · Egress allow / deny · File-system quotas · Process count limits
Network Policy: Egress & DNS control · Domain allow-list · No-egress mode · Outbound proxy & logs · Service-mesh mTLS · Rate-limit per host
State & Persistence: Workspace lifecycle · Scratch FS per run · Persistent volumes · Snapshot & restore · Worktree isolation (git) · Auto-cleanup TTL
Concurrency & Pooling: Throughput & warm starts · Sandbox warm pool · Per-tool concurrency cap · Connection pooling · Backpressure signaling · Cold-start optimization
Adapters & Drivers: Speak each tool's protocol · HTTP / gRPC / WS clients · DB drivers · ODBC / JDBC · SDK wrappers · Protocol bridges · Mock / replay drivers

G · Security, Trust & Risk Controls
Tool calls are the highest-risk surface — defend against injection, exfiltration, and over-permission
Secret & Token Vault: Short-lived credentials · HashiCorp Vault · KMS · OAuth token exchange · Just-in-time credentials · Rotate & revoke
Risky-Action Gate: Reversibility check · Destructive ops require HITL · Two-person rule · Dry-run / what-if · Step-up auth
Injection Defense: Trust-boundary aware · Quarantine tool results · No instruction-following · Spotlighting / delimiters · SSRF / SQLi guards
DLP & Egress Filter: Block exfiltration · Outbound PII scan · Secret pattern detection · Tenant-data scoping · URL allow-listing
Anti-Abuse: Detect bad behavior · Anomaly detection · Quota / spike alarms · Honeypot tools · Auto-disable rogue agent
Compliance Hooks: Regulated tool use · SOC 2 · HIPAA · PCI · Region-bound tool routing · Tool-level audit evidence · Data-residency proofs

H · Reliability & Observability
Make every tool call diagnosable, replayable, and within SLO
Idempotency Keys: De-dup retried calls
Retries & Backoff: Jittered exponential
Circuit Breakers: Per tool / endpoint
Tracing & Spans: OTel · LangSmith
Caching: Result · semantic
Cost & Latency Meters: P50 / P95 · $ per call
Replay & Debug: Recorded I/O
SLO Tracking: Error budget burn

⇣ Outbound — Tool Result & Effect to Layer 9 (Action / Environment)
Normalized result · side-effect record · provenance · latency & cost · trust label
Result Object: Schema-conformant
Side-Effect Log: What changed
Provenance: Source · timestamp
Trust Label: "tool" untrusted
Compensations: Rollback hooks
Usage Stats: Tokens · $ · ms
Citations: URLs · refs
Trace ID: For replay

Cross-cutting Auth, Sandbox & Audit
Legend: Tool Gateway · Registry / MCP · Tool Categories · Skills · Sandboxing · Security & Trust · Reliability / Observability · Forward call flow · Tool result return · Skill distillation
Detailed view of Layer 6 — Tools, Skills & Capabilities from the Agentic AI System Architecture reference. All tool invocations flow through a single Tool Gateway that resolves specs from a versioned registry, enforces auth and policy, validates arguments, executes inside hardened sandboxes, and emits a normalized result with traces, costs, and side-effect logs. Skills and MCP servers extend the catalog with composable capabilities; security and observability wrap every call.
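The gateway's resolve, authorize, validate, invoke, and shape stages can be sketched as a single `call` method. This is a minimal sketch under assumptions: `ToolSpec`, `ToolGateway`, the error strings, and the simple type-map validation are all hypothetical stand-ins for a real spec registry, policy engine, and JSON Schema validator. Successful results are cached by idempotency key so a retried call does not execute twice.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Set, Tuple


@dataclass
class ToolSpec:
    func: Callable[..., Any]
    required_scope: str
    arg_types: Dict[str, type]  # stand-in for a JSON Schema


class ToolGateway:
    def __init__(self) -> None:
        self._registry: Dict[str, ToolSpec] = {}
        self._results: Dict[Tuple[str, str], Dict[str, Any]] = {}  # idempotency cache

    def register(self, name: str, spec: ToolSpec) -> None:
        self._registry[name] = spec

    def call(self, tool: str, arguments: Dict[str, Any], scopes: Set[str], idem_key: str) -> Dict[str, Any]:
        spec = self._registry.get(tool)                      # RESOLVE
        if spec is None:
            return {"ok": False, "error": "unknown_tool", "trust": "tool"}
        if spec.required_scope not in scopes:                # AUTHORIZE
            return {"ok": False, "error": "forbidden", "trust": "tool"}
        same_keys = set(arguments) == set(spec.arg_types)    # VALIDATE
        if not same_keys or any(
            not isinstance(arguments[name], expected)
            for name, expected in spec.arg_types.items()
        ):
            return {"ok": False, "error": "invalid_arguments", "trust": "tool"}
        key = (tool, idem_key)
        if key in self._results:                             # de-dup a retried call
            return self._results[key]
        try:
            value = spec.func(**arguments)                   # INVOKE
        except Exception as exc:                             # surface failures as data
            return {"ok": False, "error": f"tool_failed: {exc}", "trust": "tool"}
        result = {"ok": True, "value": value, "trust": "tool"}  # SHAPE
        self._results[key] = result
        return result
```

Every result carries the `"tool"` trust label, matching the outbound contract in which tool output is treated as untrusted input by the layers that consume it.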

Layer 7 · Knowledge & Retrieval (RAG)

Knowledge & Retrieval (RAG)

Ground the agent in fresh, verifiable knowledge — connectors, ingestion, embeddings, indexes, hybrid retrieval, advanced RAG patterns, faithfulness checking, and citation-aware delivery — turning raw sources into trusted, traceable context.
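Hybrid retrieval needs a way to merge rankings from different indexes. One common option is reciprocal-rank fusion, which the Layer 5 diagram lists under Hybrid Search. The sketch below is an assumption rather than the architecture's prescribed method: the function name is hypothetical, `k = 60` is a conventional default, and each document scores the sum of 1 / (k + rank) over the lists it appears in.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # earlier ranks contribute more
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    lexical = ["a", "b", "c"]  # e.g. a BM25 ranking
    dense = ["b", "d", "a"]    # e.g. a vector-search ranking
    print(reciprocal_rank_fusion([lexical, dense]))  # ['b', 'a', 'd', 'c']
```

Because the fusion uses ranks only, it does not require lexical and dense scores to be on comparable scales, which is the usual reason for choosing it over a weighted score sum.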

Detailed Diagram  ·  v1.0  ·  2026

⇣ Inbound — Retrieval Request from Layer 2 (Perception) / Layer 3 (Orchestrator) / Layer 4 (Reasoning)
query · intent · scopes (user/tenant/project) · ACLs · k · filters · freshness · trace_id · budget · response shape
retrieve(query, scope, k=20, filters={recency, ACL, source}, mode=hybrid, freshness=24h, with_citations=true)

A · Knowledge Sources & Connectors
All authoritative knowledge surfaces — internal, external, structured, unstructured — flow in through governed connectors
Internal Docs & Wiki: Tribal knowledge · Confluence · Notion · SharePoint · Coda · Google Docs · Drive · Quip · Bear · Obsidian · Internal handbooks · Onboarding guides
Code & Repos: Source-grounded answers · GitHub · GitLab · Bitbucket · Source files + symbols · README · CLAUDE.md · PRs · issues · discussions · Commit history · API docstrings
Tickets & Runbooks: Operational know-how · Jira · Linear · ServiceNow · Zendesk · Freshdesk · HelpScout · PagerDuty post-mortems · Runbooks · playbooks · Incident timelines · Change requests
Communications: Conversational record · Slack · Teams · Discord · Email (Gmail · O365) · Meeting transcripts · Chat threads · Customer call notes · Forum posts
Structured Data: Systems of record · OLTP / SQL DBs · Warehouses (Snowflake · BQ) · CRM · ERP · ITSM · Data lakes / lakehouses · APIs (Text-to-SQL, NL2API) · CSV / spreadsheets
External / Public Web: Open knowledge · Live web search · Crawled domains · Wikipedia · Wikidata · arXiv · PubMed · SSRN · News · regulatory feeds · Industry datasets

B · Ingestion & Document Processing Pipeline
From raw source to clean, chunked, enriched documents — the foundation of retrieval quality
Connectors & Loaders: Ingest from each source · OAuth-scoped access · Full crawl + delta sync · CDC / change feeds · Webhook event push · Permission propagation · Source provenance tags
Parsing & Extraction: Get the structure right · Layout-aware PDF (Unstructured) · DOCX · PPTX · XLSX · HTML cleanup · readability · OCR (Tesseract · docTR) · Table & figure extraction · Audio / video transcription
Cleaning & Dedup: Reduce noise & bloat · Boilerplate stripping · Near-duplicate (SimHash) · Encoding normalization · Language detection · Quality score · spam filter · Translation (optional)
Chunking Strategies: Right-size for retrieval · Fixed token / overlap · Semantic / sentence split · Hierarchical (parent / child) · Markdown heading-aware · Code: AST-based split · Late-chunking with context
Metadata Enrichment: Make filtering precise · Source · author · date · ACL · sensitivity tags · LLM-generated summary · Auto-generated questions · Entities · keywords · topics · Section path · breadcrumbs
Sync & Freshness: Keep the index live · Incremental updates · Tombstones · soft delete · Re-index on schema change · Embedding-model upgrade · Backfill orchestration · DLQ · failed-doc replay

C · Embeddings & Index Construction
Multiple complementary indexes — dense, sparse, graph — for hybrid retrieval
Embedding Models: Multi-model strategy · OpenAI · Voyage · Cohere · BGE · E5 · GTE · Jina · Multimodal (CLIP · SigLIP) · Multi-vector (ColBERT · Late) · Matryoshka (truncatable) · Domain fine-tunes
Vector Indexes: Scalable ANN search · pgvector · Pinecone · Weaviate · Qdrant · Milvus · Vespa · Turbopuffer · HNSW · IVF · DiskANN · PQ · binary quantization · Per-tenant namespacing
Sparse / Lexical Index: Exact-match recall · BM25 / Okapi · Elasticsearch · OpenSearch · Tantivy · Quickwit · SPLADE · uniCOIL learned-sparse · Token n-grams · synonyms · Field boosts · phrase
Knowledge Graph: Relations & multi-hop · Entity / relation extraction · Neo4j · ArangoDB · Memgraph · RDF · SPARQL stores · Community detection (GraphRAG) · Schema · ontology · Temporal & provenance edges
Metadata & Filter Index: Pre-filter at scale · ACL bitmap / posting list · Date / numeric ranges · Source / type facets · Geospatial (R-tree · S2) · Tenant / workspace shard · Field-level tokenizers
Index Operations: Build · update · evolve · Async batch builds ·
Streaming upserts · Blue-green re-index · Snapshot & restore · Compaction · vacuum · Per-store sharding D · Query Understanding & Expansion Turn the raw user / agent query into multiple, well-formed search inputs that hit the right indexes Query Rewriting Make queries searchable · Pronoun resolution · History-aware rewriting · Synonym expansion · Acronym expansion · Spell / typo correction Multi-Query & HyDE Cover the answer space · LLM generates N variants · Sub-question decomposition · HyDE: hypothetical doc · Step-back prompting · Translate query to index lang Routing & Filtering Pick the right haystacks · Source classifier · Index router (per query) · Metadata filters (ACL · date) · Tenant / project scope · Mode select (text · KG · SQL) E · Retrieval Engine — Hybrid Search & Re-ranking Run parallel searches across stores, fuse, re-rank, and diversify into a final candidate set Hybrid Searcher Dense + sparse + KG · Parallel index queries · KG multi-hop expansion · Score normalization · Reciprocal-Rank Fusion (RRF) · Per-source weights Re-ranker Boost real relevance · Cross-encoder (BGE · Cohere) · LLM-as-rerank (listwise) · Recency / freshness boost · Authority / source weight · Click / engagement signals Diversify & Compress Pack the best signal · MMR diversification · Cluster & pick · Extractive snippeting · Map-reduce summarize · Token-budget enforce F · Advanced RAG Patterns Move beyond single-shot RAG: agentic, corrective, graph, and multi-hop strategies Agentic RAG Retrieval as a tool · Iterative retrieve + reason · Decide when / what to fetch · Multi-step exploration · ReAct-style retrieval · Tool-using sub-agents Self-RAG / Self-Critique Retrieve only when needed · Need-retrieval classifier · Self-reflection tokens · Critique & revise · Score support per claim · Skip when high confidence Corrective RAG (CRAG) Recover from bad retrieval · Retrieval-quality grader · Fall back to web search · Re-query with new terms · Decompose & recombine · Abstain when 
unsure GraphRAG Multi-hop & community · Entity graph from corpus · Community summaries · Local + global search · Path / hop reasoning · Schema-guided retrieval Multi-Hop & Decomposition Answer compound queries · Sub-question retrieval · Iterative refinement · Evidence chaining · Plan-and-retrieve · FLARE active retrieval Hierarchical & RAPTOR Tree-of-summaries · Cluster & summarize tree · Parent-child retrieval · Coarse-to-fine drill-down · Section · doc · corpus levels · Long-corpus efficient G · Faithfulness, Citations & Hallucination Control Make answers verifiable — every claim grounded in a source the user can check Provenance Tracker Lineage end-to-end · Doc · chunk · char-span IDs · Author · timestamp · Source URL · version · Trust label per source Citation Generator Inline source links · Sentence-level cites · Span-level highlighting · Click-through-able URLs · Bibliography assembly Faithfulness Verifier Does answer match sources? · NLI / entailment scoring · Claim → evidence map · LLM-as-judge faithfulness · Refuse on low support Hallucination Probes Catch ungrounded claims · Cross-source verifier · Self-check QA · Numeric / fact extractor · Confidence calibration Conflict & Recency When sources disagree · Newer source preference · Authority weighting · Surface conflicts to user · Disagreement marker Abstention & Refusal Know when to say "I don't know" · No-evidence threshold · Out-of-scope detector · Suggest follow-up · Human escalation hook H · Governance, ACL, Privacy & Compliance Retrieved knowledge inherits source permissions and policy — never expose more than the user is allowed to see ACL Propagation Source perms → index · User · group · folder ACL · Live permission lookup · Pre-filter at search time PII / DLP Detect & redact · Sensitive-field masking · Tokenization vault · Sector-specific policies Residency & Sovereignty Region-bound data · Index per region · Geo-sharded retrieval · Sovereign-cloud routing Source Trust Tags Untrusted by default · 
"external content" label · No-instruction-follow rule · Spotlighting / delimiters Audit & Lineage Who saw what, when · Query logs (signed) · Result lineage graph · DSAR · evidence pack Retention & TTL Bounded shelf-life · Per-source TTL · Right-to-be-forgotten · Tombstones cascade I · Operations & Retrieval Quality Make RAG measurable, debuggable, and continuously improving Retrieval Evals Recall@K · MRR · NDCG RAG Metrics Faithfulness · context P/R A/B Experiments Embedding · chunk · prompt Drift & Anomaly Stale index · empty results Cost & Latency P50 · P95 · $ per query Caching Embed · query · result Feedback Loop 👍 / 👎 · click-through Eval Datasets Golden Q/A · regression ⇣ Outbound — Grounded Context to Layer 4 (Reasoning) / Layer 3 (Orchestrator) Ranked passages · citations · faithfulness scores · trust labels · usage stats — ready for prompt assembly Passages[] id · text · score Citations URLs · spans · authors Faithfulness Per-claim score Trust Labels Untrusted content Coverage Report Gaps · conflicts Usage Stats Tokens · ms · $ Trace ID Replay key Abstain Flag Low coverage signal Cross-cutting ACL, Audit & Compliance Cross-cutting ACL, Audit & Compliance
Legend: Knowledge Sources · Ingestion / Operations · Embeddings / Indexes · Query Understanding · Retrieval / Faithfulness · Advanced RAG Patterns · Governance & Compliance
Flows: Forward retrieval flow · Grounded context return · Corrective & feedback loops
Detailed view of Layer 7 — Knowledge & Retrieval (RAG) from the Agentic AI System Architecture reference. Knowledge flows top-down from heterogeneous sources through ingestion, polyglot indexes, query understanding, hybrid retrieval, advanced RAG patterns, and faithfulness checking, returning trust-labeled grounded context with citations to the Reasoning Core. Corrective & feedback loops continuously improve retrieval quality.
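The Hybrid Searcher in block E lists Reciprocal-Rank Fusion (RRF) with per-source weights as the way dense, sparse, and graph results are merged. The sketch below shows one way that fusion step could look. The function name, the source names, the document IDs, and the smoothing constant `k=60` (a commonly used default) are assumptions for illustration, not details taken from the diagram.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, weights=None):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: dict mapping a source name to a list of doc IDs, best first.
    k: smoothing constant (60 is a common default; an assumption here).
    weights: optional per-source weights; missing sources default to 1.0.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for source, ranked_ids in rankings.items():
        weight = weights.get(source, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            # Each list contributes weight / (k + rank) for every doc it returns.
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from three indexes.
rankings = {
    "dense": ["d1", "d2", "d3"],
    "sparse": ["d2", "d4", "d1"],
    "graph": ["d2", "d3"],
}
fused = reciprocal_rank_fusion(rankings)
top_ids = [doc_id for doc_id, _ in fused]
```

Because RRF uses only ranks, it sidesteps the problem of comparing raw scores from BM25, cosine similarity, and graph traversal. In practice a re-ranker would then reorder the fused candidates.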

Layer 8 · Multi-Agent Collaboration

Multi-Agent Collaboration

Specialized agents cooperating, debating, and verifying each other's work — coordination patterns, agent roles, communication protocols, lifecycle management, consensus, and trust controls that turn a swarm of agents into a reliable team.
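To make the communication and trust controls concrete, here is a minimal sketch of a message envelope carrying sender, recipient, trace ID, signature, trust label, and TTL, with tamper detection and replay protection. It uses a shared-secret HMAC from the Python standard library as a stand-in for whatever signing scheme a real deployment uses. The field names, the `sign_envelope` and `verify_envelope` helpers, and the demo key are all hypothetical.

```python
import hashlib
import hmac
import json
import time
import uuid
from typing import Optional


def _mac(key: bytes, envelope: dict) -> str:
    # Canonical JSON over every field except the signature itself.
    unsigned = {k: v for k, v in envelope.items() if k != "sig"}
    payload = json.dumps(unsigned, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


def sign_envelope(key: bytes, sender: str, recipient: str, body: dict,
                  trace_id: str, ttl_s: int = 60,
                  now: Optional[float] = None) -> dict:
    envelope = {
        "from": sender,
        "to": recipient,
        "trace": trace_id,
        "nonce": uuid.uuid4().hex,   # unique per message, used to detect replays
        "sent_at": time.time() if now is None else now,
        "ttl": ttl_s,
        "trust": "untrusted",        # peer output is treated as data, not instructions
        "body": body,
    }
    envelope["sig"] = _mac(key, envelope)
    return envelope


def verify_envelope(key: bytes, envelope: dict, seen_nonces: set,
                    now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    if not hmac.compare_digest(envelope.get("sig", ""), _mac(key, envelope)):
        return False                 # tampered or signed with another key
    if now > envelope["sent_at"] + envelope["ttl"]:
        return False                 # expired
    if envelope["nonce"] in seen_nonces:
        return False                 # replayed
    seen_nonces.add(envelope["nonce"])
    return True


key = b"demo-shared-secret"          # assumption: real systems use managed keys
seen = set()
msg = sign_envelope(key, "researcher", "critic", {"claim": "X"},
                    trace_id="t-1", now=1000.0)
ok_first = verify_envelope(key, msg, seen, now=1010.0)
ok_replay = verify_envelope(key, msg, seen, now=1011.0)
```

A shared secret keeps the example self-contained. Asymmetric signatures would additionally let a receiver verify a message without being able to forge one.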

Detailed Diagram  ·  v1.0  ·  2026

⇣ Inbound — Delegation Request from Layer 3 (Orchestrator) complex goal · sub-task DAG · required roles · budget · deadline · trust scope · coordination preference · result schema spawn_team({ goal, pattern: "supervisor", roles: ["researcher", "coder", "critic"], budget, deadline, trust_scope }) A · Coordination Patterns & Topologies Pick the right organizational shape for the task — each pattern trades autonomy, parallelism, and quality differently Supervisor / Hierarchical Top-down delegation · Single supervisor agent · Specialist sub-agents · Predictable accountability · Easy to debug · Best for clear goals Pipeline / Sequential Stage-by-stage handoff · Each agent → next stage · Strict ordering · Schema-checked transitions · Ideal for ETL / workflows · Easy retry per stage Debate / Dialectic Adversarial verification · Proposer vs critic · Multi-round argument · Judge / referee agent · Reduces hallucinations · High-stakes decisions Swarm / Peer Decentralized cooperation · No central authority · Hand-off-capable peers · Emergent specialization · Resilient · scalable · Open-ended exploration Blackboard Shared workspace · Common knowledge state · Agents read & write · Trigger on changes · Loose coupling · Heterogeneous experts Contract Net Bid-based assignment · Manager broadcasts task · Workers bid · capability · Best bid wins · contract · Marketplace dynamics · Cross-org agents B · Supervisor / Coordinator — The Team Leader Spawns the team, distributes work, monitors progress, aggregates results, and decides when the team is done Multi-Agent Coordinator Pattern selector · agent factory · message broker · result aggregator PLAN team & tasks DELEGATE assign work MONITOR progress AGGREGATE merge results CONCLUDE finalize Pattern Selector Choose topology by task Team Composer Pick roles · models · skills Budget Allocator Tokens · time · $ per agent Progress Tracker Per-agent status & SLA Termination Manager When the team is "done" Escalation Hooks HITL · admin · 
stop-the-team C · Agent Roles & Specialist Personas A library of role templates — each with its own system prompt, tools, model, memory scope, and quality bar Planner / Architect High-level decomposition · Goal → sub-goals · DAG / step graph · Risk & budget plan · Strong reasoning model · Plan repository access · Re-plan on failure Researcher Find & synthesize info · Web · KB · RAG access · Citation discipline · Multi-source synthesis · Long-context model · Read-only tool scope · Coverage / gap reports Coder / Builder Write & test code · Edit · run · test · Debug · refactor · Sandboxed exec scope · Repo & PR tools · Code-tuned model · Writes worktree branch Critic / Reviewer Verify & challenge · Independent context · Rubric · checklist · Score · pass / fail · LLM-as-Judge persona · No-write tool scope · Red-team variant Domain Experts Vertical-specific knowledge · Legal · medical · finance · DevOps · security · QA · Domain-tuned models · Curated knowledge base · Compliance-aware · Regulated tool sets Operator / Executor Take action in the world · Computer-use tools · Browser / GUI control · Enterprise app actions · HITL gating · Compensating actions · Receipts & logs D · Inter-Agent Communication Protocols Standardized message formats and transports — typed, signed, traceable, and replay-safe A2A Protocol Agent-to-Agent Capability-card discovery Cross-vendor / org agents MCP Client Calls Tool-style sub-agent Sub-agent as MCP server Schema-typed I/O ACP / Custom RPC Internal team protocol gRPC · JSON-RPC Strong typing · streaming Message Bus Pub/sub · queues Kafka · NATS · Redis Streams Topic-per-conversation Shared Scratchpad Blackboard / KV CRDT for concurrent edits Subscribe to changes Message Envelope Standard wrapper from · to · trace · sig trust · replyTo · ttl E · Agent Lifecycle & State Management Spawn safely, scope tightly, checkpoint reliably, and tear down cleanly Spawn Factory Instantiate sub-agent Role · model · tools · prompt Inherited / scoped 
context Permission Inheritance Least-privilege subset Token attenuation Tool / data scope clamp Isolated Runtime Sandbox · worktree · VM No leak between agents Per-agent secrets Checkpoint & Resume Durable per-agent state Pause · serialize · re-hydrate Cross-host failover Health & Watchdog Liveness probes Stuck-loop detection Auto-respawn / abort Graceful Termination Drain · finalize · cleanup Hand-off in-flight tasks Audit log per-agent F · Coordination Mechanics — How Work Gets Done Decompose, assign, run in parallel, aggregate, and resolve conflicts — the core team plumbing Task Decomposition Break the goal apart · HTN-style sub-tasks · Dependency DAG · Parallel-safe units · Per-task DoD · Roll-up criteria Task Assignment Match work to agent · Capability matching · Bidding (contract net) · Load balancing · Sticky / affinity routing · Reassignment on failure Concurrency Control Run agents in parallel · Fork / join · barriers · Map-reduce / fan-out · Race · first-win · Cancellation propagation · Deadlock detection Result Aggregation Merge sub-agent outputs · Schema-aware merge · Citation preservation · De-dup · canonicalize · LLM synthesizer agent · Coverage report Consensus & Voting When agents disagree · Majority / weighted vote · Confidence-weighted · Judge / referee agent · Self-consistency check · Tie-breakers · HITL Conflict Resolution Reconcile contradictions · Recency · authority · Evidence weighting · Surface to user · Escalate to expert · Compensating undo G · Trust, Identity & Security Across Agents Multi-agent systems amplify both capability and attack surface — every message and handoff must be authenticated and bounded Agent Identity Signed, attestable agents · Cryptographic agent ID · Workload identity (SPIFFE) · Signed capability cards · Agent provenance ledger Message Authentication Tamper-evident envelopes · Signed JWT / DPoP · Replay-protection nonce · Origin agent verified · mTLS on transport Cross-Agent Injection Treat peer output as untrusted 
· No instruction-following · Quarantine peer messages · Trust labels carried · Spotlighting / delimiters Permission Attenuation Sub-agents get less, never more · Token down-scoping · Read-only by default · Tool allow-lists · Data-scope clamping Rogue-Agent Defense Catch & isolate misbehavior · Anomaly detection · Loop / abuse detector · Auto-quarantine · Kill-switch · admin alert Privacy & Data Boundaries Don't leak between agents · PII redaction in handoffs · Tenant-scoped contexts · Need-to-know filtering · Cross-tenant blocks H · Multi-Agent Frameworks & Runtimes Production-ready stacks for orchestrating teams of agents LangGraph / Agent SDK Graph-based agent flows Stateful · checkpointable Anthropic / LangChain CrewAI Role-based crews Tasks · processes · tools Hierarchical / sequential AutoGen / Magentic Conversation-driven Group chat patterns Microsoft OpenAI Swarm / Agents Lightweight handoffs Tool-driven routing Stateless agents Durable Execution Temporal · Cadence · Restate Replay-safe orchestration Long-running multi-agent Custom Runtimes Bespoke controllers Actor model · Akka · Ray Ray Serve · Dapr I · Observability & Multi-Agent Telemetry Trace every agent, every message, every handoff — across the whole team Distributed Tracing OTel spans across agents Conversation graph view Per-agent sub-trace Cost & Token Roll-up Per agent · per team · total Budget burn tracking Top-spender attribution Conversation Replay Step through messages Time-travel debug Counterfactual re-runs Team Health Metrics Throughput · success rate Per-role error rates Idle / loop detection Decision Logs Why this agent · this task Vote / consensus history Plan diffs across team Audit & Compliance Signed team transcripts Per-agent attribution Evidence for review ⇣ Outbound — Aggregated Team Result to Layer 3 (Orchestrator) Synthesized answer · per-agent contributions · consensus / dissent · citations · cost · trace · escalations Final Answer Schema-conformant synthesis Per-Agent 
Contributions Attribution · diffs Consensus / Dissent Vote tallies · open issues Citations & Evidence Source spans · trust labels Team Trace Conversation graph · spans Cost & Escalations Tokens · $ · HITL flags All artifacts are signed, traced, and attributable to the originating agent. Cross-cutting Identity, Trust & Telemetry
Legend: Coordination Patterns / Frameworks · Supervisor / Coordinator · Agent Roles · Communication Protocols · Lifecycle / Mechanics · Trust & Security · Observability
Flows: Forward team flow · Inter-agent messaging · Critique / re-plan loop · Aggregated result return
Detailed view of Layer 8 — Multi-Agent Collaboration from the Agentic AI System Architecture reference. A single Coordinator picks a topology, spawns specialist roles, brokers signed messages over typed protocols, manages lifecycle & permissions, runs concurrent tasks, aggregates results with consensus, and emits a single team artifact back to the orchestrator. Trust controls and observability span every agent and every handoff.
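The consensus step in that aggregation can be sketched as a confidence-weighted vote that escalates to a human when the margin is too thin. The `weighted_vote` function, the `min_margin` threshold, and the vote tuples are illustrative assumptions. The diagram names the mechanisms (majority / weighted vote, tie-breakers, HITL) but does not prescribe an algorithm.

```python
from collections import defaultdict


def weighted_vote(votes, min_margin=0.1):
    """Pick an answer from (agent, answer, confidence) votes.

    Confidence values are summed per answer. If the lead over the
    runner-up, as a share of all confidence cast, is below min_margin,
    no winner is declared and the result is flagged for escalation.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    totals = defaultdict(float)
    for _agent, answer, confidence in votes:
        totals[answer] += confidence
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], str(kv[0])))
    leader, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    total = sum(totals.values())
    margin = (top - runner_up) / total if total > 0 else 0.0
    decided = margin >= min_margin
    return {
        "winner": leader if decided else None,
        "escalate": not decided,     # hand off to a judge agent or a human
        "margin": margin,
        "dissent": [agent for agent, answer, _ in votes if answer != leader],
        "tally": dict(totals),
    }


# Hypothetical votes from three roles.
votes = [("researcher", "A", 0.9), ("coder", "B", 0.6), ("critic", "A", 0.7)]
result = weighted_vote(votes)
tie = weighted_vote([("researcher", "A", 0.5), ("coder", "B", 0.5)])
```

Returning the dissenting agents alongside the winner keeps the consensus / dissent record that the outbound team result expects.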

Layer 9 · Action & Environment Interface

Action & Environment Interface

Where agents produce real-world effects in digital and physical environments: pre-flight validation, isolated execution, side-effect tracking, compensating actions, receipts, and a record of every change the agent makes, kept so that changes can be reviewed and rolled back.
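Compensating actions are easiest to see in a small saga-style runner: each step carries its own undo, and a failure triggers the undos of the completed steps in reverse order. This is a sketch under assumptions. The `run_saga` function, the step names, and the in-memory `state` are hypothetical, and a production system would more likely delegate this to a durable workflow engine.

```python
def run_saga(steps):
    """Run (name, action, compensate) steps in order.

    On the first failure, compensations for the steps that already
    completed are run in reverse order. Status is "success", "failed"
    (everything undone), or "partial" (some undo itself failed and
    needs escalation).
    """
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"failed:{name}:{exc}")
            status = "failed"
            for done_name, undo in reversed(completed):
                try:
                    undo()
                    log.append(f"undone:{done_name}")
                except Exception as undo_exc:
                    log.append(f"undo_failed:{done_name}:{undo_exc}")
                    status = "partial"
            return {"status": status, "log": log}
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return {"status": "success", "log": log}


# Hypothetical two-step action: open a ticket, then send an email that fails.
state = {"ticket": None}


def create_ticket():
    state["ticket"] = "T-1"


def delete_ticket():
    state["ticket"] = None


def send_email():
    raise RuntimeError("smtp unavailable")


def nothing_to_undo():
    pass


result = run_saga([
    ("create_ticket", create_ticket, delete_ticket),
    ("send_email", send_email, nothing_to_undo),
])
```

The "partial" status matters: a compensation that fails leaves dangling state, which is the case to route to a human rather than retry blindly.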

Detailed Diagram  ·  v1.0  ·  2026

⇣ Inbound — Action Request from Layer 6 (Tools) / Layer 3 (Orchestrator) action_type · target environment · arguments · principal · scopes · idempotency_key · deadline · risk_class · approval_token · trace_id execute({ env: "browser", action: "submit_form", args: {...}, risk: "medium", reversible: true, approval: "auto", deadline: 30s }) A · Pre-Flight Gate — Decide Whether the Action May Proceed Verify scope, classify risk, dry-run side effects, secure approvals — refuse early when in doubt Risk Classifier How dangerous is this? · Reversible vs destructive · Blast radius estimate · Public · private · regulated · Money / identity impact Authorization & Scope Right to act here? · OAuth / OIDC scope check · Tenant / project boundary · Token attenuation · Per-environment ACL Dry-Run / Simulation What would happen? · What-if effect preview · Plan mode (read-only) · Diff before write · Sandbox replay Approval Gate (HITL) Human in the loop · Auto-approve · prompt user · Two-person rule for high-risk · Step-up auth (MFA) · Approval token signing Idempotency & Dedup Don't apply twice · Idempotency-Key header · Request fingerprint · Replay-detection window · De-dup ledger Compliance Pre-Check Regulatory & policy · Region / residency · GDPR / HIPAA / PCI · Quiet-hours respect · Quota / budget caps B · Effector — Universal Action Dispatcher A single, audited entry-point that translates intent into environment-specific commands Action Effector / Environment Bridge Resolve env adapter · acquire lease · execute · capture effect · emit receipt RESOLVE env adapter LEASE workspace · slot EXECUTE command CAPTURE effects RECEIPT sign · log Adapter Registry Per-environment drivers Action Schema Validator Typed args · invariants Lease & Concurrency Per-resource locks · queues Result Normalizer Stable shape · trim · redact Receipt Signer Cryptographic evidence Telemetry Emitter OTel span · cost · latency C · Environment Targets — Where Actions Land The catalog of environments the agent 
can manipulate — each with its own driver, capabilities, and risk profile Computer Use Operate desktop / mobile · Mouse · keyboard · screen · Click · type · scroll · drag · Accessibility tree · OS shortcuts · clipboard · Window / app focus · VNC / RDP isolated VM · Mobile emulator (iOS · Android) · Action recorder & replay · Visual grounding / OCR Browser Agents Operate the web · Navigate · click · fill · DOM & ARIA selectors · Form auto-fill · File upload · download · Headless · headful · Playwright · Puppeteer · Cookie · session vault · robots.txt & ToS aware · CAPTCHA detection (refuse) Code Sandboxes Execute & build · Python · Node · Bash · Containers (Docker · OCI) · MicroVMs (Firecracker) · gVisor · Kata · WASM · Browser-side VMs (E2B) · Notebook (Jupyter) · Build pipelines · CI runs · Test suites · benchmarks · Ephemeral & persistent Enterprise Systems Systems of record · CRM · ERP · ITSM · HRIS · billing · finance · Salesforce · ServiceNow · SAP · Workday · NetSuite · Data lake / lakehouse · DevOps (CI/CD · IaC) · SOC / SIEM · monitoring · Identity / IDM (Okta · AAD) · EHR · LIS (regulated) Physical / IoT Real-world actuation · Robotics control APIs · Sensors · actuators · Smart-home (Matter · HA) · Industrial PLC / SCADA · Drone / vehicle telemetry · Edge / on-device runtime · ROS / OPC-UA bridges · Safety interlocks · e-stops · Geo-fenced operation Output Channels Reach humans & teams · Notifications · push · Email · SMS · voice · Slack · Teams posts · Git commits · PRs · Tickets · Jira · Linear · Reports · dashboards · Pager / on-call · Status pages · Templates · approvals D · Isolation, Sandboxing & Resource Governance Bound the blast radius — every action runs in a constrained environment with enforced limits Sandbox Runtimes Hard isolation per action · Containers (Docker · OCI) · MicroVMs (Firecracker · QEMU) · gVisor · Kata · WASM · Browser-tab VMs (E2B) · Ephemeral · per-task Resource Limits CPU · RAM · disk · time · cgroups · ulimits · Wall-clock 
timeouts · File-system quotas · Process count limits · Per-tenant capacity Network Policy Egress & DNS control · Domain allow-list · No-egress mode · Outbound proxy & logs · Service-mesh mTLS · Rate-limit per host Workspace State Per-action persistence · Scratch FS per run · Persistent volumes · Snapshot & restore · Worktree isolation (git) · Auto-cleanup TTL Secrets & Credentials Just-in-time injection · Vault / KMS / HSM · Short-lived tokens · OAuth on-behalf-of · Rotated · scoped · Memory-only injection Concurrency & Pooling Throughput & warm starts · Sandbox warm pool · Per-env concurrency cap · Connection pooling · Backpressure signals · Cold-start optimization E · Side-Effect Capture & Causality Tracking Record exactly what changed in the world — for review, replay, and rollback Effect Recorder What did it do? · Before / after diff · Resource IDs touched · DOM mutations · API calls · FS writes · DB rows · Network egress log Causality & Lineage Why did it happen? · Trigger trace_id · Plan step → action map · Agent attribution · Approval evidence · Causal chain graph Receipts & Evidence Tamper-evident proof · Signed action receipt · Hash-chained log · External system IDs · Screenshot · DOM snapshot · External provider receipt Streaming Output Live progress to user · Stdout / stderr stream · Progress events · Partial-result emit · Cancel signal listener · Live screen capture Output Sanitization Make output safe · PII / secret redact · Size truncation · Trust label tagging · Schema-conformant · Encoding normalize Notification Hook Tell who needs to know · Action-completed event · Failure / alert webhook · Audit topic publish · User receipt UI · Status-page hook F · Reversibility, Compensation & Recovery Plan for "undo" before you act — rollback, compensate, or escalate when the world doesn't cooperate Compensation Registry "Undo" recipes per action · Inverse-action mapping · Soft-delete patterns · Restore-from-snapshot · Manual-undo runbooks Saga Coordinator 
Multi-step transactions · Forward + compensate steps · Per-step idempotency · Failure → cascade undo · Temporal · Cadence engines Snapshot & Rollback Time-travel state · FS / VM snapshots · DB point-in-time recovery · Git revert · branch · Worktree restore Failure Classifier What went wrong? · Retryable transient · Permanent / policy reject · Partial-success / dangling · Decide retry / undo / abort Retry & Backoff Recover gracefully · Exponential · jittered · Idempotency-key reuse · Circuit breaker per env · Poison-message DLQ Escalation Path When automation isn't enough · Page on-call · Open ticket / runbook · Pause & ask user · Manual-step inventory G · Safety Interlocks & Hard Stops Independent safety controls that no agent can override Hard Limits Forbidden ops list Geo · sector · scope blocks Never auto-allow Kill-Switch Stop all actions Per-agent / per-env / global Operator-controlled Velocity Caps Per-min / hour rate Anomaly auto-pause Spike detection Tripwires Auto-trigger conditions Honeypot resources Forbidden domain hits Manual Override Operator pause / cancel Approve · reject · modify Real-time intervention Physical E-Stops For robotics / IoT Hardware interlocks Geo-fence violations H · Observability, Audit & Forensics Every action is traced, signed, and replayable — for debugging, compliance, and incident response Action Tracing OTel spans per call env · agent · trace_id Latency / cost meters Audit Log (signed) Hash-chained · immutable Who · what · when · why Compliance evidence Replay & Forensics Reconstruct any run Recorded I/O · screen Counterfactual replay Anomaly Detection Drift · spikes · errors Per-env baselines Auto-quarantine triggers Cost & SLO Tracking Per-env $ · success rate Error budget burn Top-spender attribution User Receipt UI Show what was done Effect timeline Undo / inspect controls ⇣ Outbound — Action Outcome to Layer 3 (Orchestrator) / Layer 10 (Reflection) / Layer 11 (Governance) Status · effect record · receipt · compensation 
handle · trace · cost · escalation flags: a complete account of what happened Status success · partial · failed · refused Effect Record Resources changed · diffs Signed Receipt Hash · external IDs · proof Compensation Handle Inverse-action token Trace & Telemetry Spans · cost · latency Escalations HITL · alerts · pages All outcomes are signed, traced, and reversible (or marked as one-way); never silently applied. Cross-cutting Approval, Audit & Reversibility
Legend: Pre-Flight / Safety · Effector / Dispatch · Environment Targets · Isolation / Sandbox · Side-Effect Capture · Reversibility · Observability
Flows: Forward action flow · Effect on environment · Rollback / hard-stop loop · Outcome / approval return
Detailed view of Layer 9 (Action & Environment Interface) from the Agentic AI System Architecture reference. Every action passes a pre-flight gate, runs through a unified Effector into a sandboxed environment adapter, captures its side effects with signed receipts, and emits an outcome bundle plus a compensation handle. Independent safety interlocks and observability surround the whole pipeline, so no change to the world is silent, and every change is either reversible or explicitly marked as one-way.
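The hash-chained audit log named in the diagram can be illustrated with a few lines of standard-library Python: each receipt stores the hash of the previous one, so editing any earlier record breaks verification of everything after it. The record fields and helper names are assumptions. Note that a hash chain alone gives tamper evidence, not a signature: a real receipt signer would also sign each entry with a managed key.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode()).hexdigest()


def append_receipt(chain: list, record: dict) -> dict:
    """Append a record, linking it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {
        "record": record,
        "prev": prev_hash,
        "hash": _entry_hash(prev_hash, record),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash and link; False if anything was altered."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


# Hypothetical receipts for two actions in one trace.
chain = []
append_receipt(chain, {"trace_id": "t-1", "env": "browser",
                       "action": "submit_form", "status": "success"})
append_receipt(chain, {"trace_id": "t-1", "env": "email",
                       "action": "send", "status": "success"})
chain_ok = verify_chain(chain)
```

Publishing the latest hash to an external system periodically would make it harder to rewrite the whole chain unnoticed.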

Layer 10 · Reflection, Evaluation & Continual Learning

Reflection, Evaluation & Continual Learning

The closed-loop self-improvement layer — collect trajectories, evaluate quality, reflect on lessons, distill skills, run benchmarks, retrain, and ship improvements safely back into prompts, models, and tools.
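Shipping improvements safely implies some form of release gate that compares a candidate's evaluation scores with the current baseline. The sketch below shows one possible no-regression rule: ordinary dimensions may drop by a small tolerance, while safety dimensions may not drop at all. The dimension names, the tolerance, and the `eval_gate` function are hypothetical, and the scores are invented for illustration.

```python
def eval_gate(baseline, candidate, tolerance=0.01, safety_dims=("safety",)):
    """Decide whether a candidate may ship.

    baseline / candidate: dict of dimension -> score in [0, 1], higher is better.
    A dimension regresses if the candidate is missing it, or scores below
    the baseline by more than the allowed drop (zero for safety dimensions).
    Returns (passed, regressions).
    """
    regressions = {}
    for dim, base_score in baseline.items():
        cand_score = candidate.get(dim)
        allowed_drop = 0.0 if dim in safety_dims else tolerance
        if cand_score is None or cand_score < base_score - allowed_drop:
            regressions[dim] = (base_score, cand_score)
    return (not regressions, regressions)


# Invented scores for illustration only.
baseline = {"faithfulness": 0.90, "task_success": 0.80, "safety": 0.99}
candidate_ok = {"faithfulness": 0.93, "task_success": 0.795, "safety": 0.99}
candidate_bad = {"faithfulness": 0.95, "task_success": 0.85, "safety": 0.98}

passed_ok, regressions_ok = eval_gate(baseline, candidate_ok)
passed_bad, regressions_bad = eval_gate(baseline, candidate_bad)
```

The second candidate improves two dimensions yet still fails, which is the point of gating per dimension rather than on an average.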

Detailed Diagram  ·  v1.0  ·  2026

⇣ Inbound — Signals from Every Layer (1–9 & 11) trajectories · tool I/O · effects & receipts · user feedback · ratings · escalations · safety violations · cost / latency · audit events { run_id, trace, plan, prompts, tools[], outputs, effects[], feedback{thumbs, edits, regenerate}, slo, errors[] } A · Trajectory & Feedback Collection Build the canonical record of every agent run — the raw material for every downstream improvement Trace Ingestor Stream every run · OTel spans · structured logs · LangSmith / Langfuse · Per-step metadata · Tool I/O · model calls · Replay-safe serialization Explicit Feedback User-stated signal · 👍 / 👎 votes · Star / scale ratings · Free-text comments · Survey forms · Bug / issue reports Implicit Signals Behavior tells the story · Edit / accept rate · Regenerate clicks · Abandonment · stop · Dwell · scroll · revisit · Follow-up question rate System Outcomes Did it actually work? · Test pass / fail · Tool error rates · Goal attainment · Side-effect reversals · HITL approval rates Cost / SLO Telemetry Operational fitness · Tokens · $ per run · Latency P50 / P95 · Cache hit-rate · Tool retry counts · Error budget burn Trajectory Store Durable archive · Object storage (S3) · Indexed (vector + KV) · PII-redacted variant · Versioned · TTL'd · Replay-able B · Evaluation Engine — Scoring Trajectories Multiple scoring strategies — programmatic, model-based, human — combined into a final quality signal LLM-as-Judge Model-based scoring · Pairwise comparison · Rubric scoring (1–5) · Single / multi-judge · Calibration vs humans · Bias mitigation · Reasoning traces logged Programmatic Verifiers Ground-truth checks · Unit / integration tests · Property-based checks · Schema · invariants · Numeric / string equality · Constraint satisfaction · Linters · formatters Reward Models Learned quality scorer · Outcome reward (ORM) · Process reward (PRM) · Per-step reward dense · Trained from prefs · Calibration audit · Reward-hacking probes Faithfulness & Safety 
Eval Truthful & safe · NLI / entailment grader · Citation grounding check · Hallucination detector · Toxicity / harm classifier · Jailbreak resistance · PII leakage probe
Human Annotation: Gold-standard labels · Expert review queue · Pairwise preferences · Inter-rater agreement · Active learning (uncertain) · SME consult for domain · Calibrate LLM judges
Score Aggregator: Combine signals · Weighted ensemble · Pass / fail thresholds · Confidence intervals · Per-dimension breakdown · Outlier flagging · Trend tracking

C · Self-Reflection & Learning In-the-Loop
Mid-run improvement: critique, revise, and capture lessons that transfer to the next attempt
Reflexion / Self-Refine: Critique then retry · Generator + critic split · Verbal-feedback memory · N-round revision loop · Stop on quality plateau · Reduces hallucination
Lessons-Learned Capture: Episode → insight · "What went wrong" notes · Retry strategy hints · Failure-mode taxonomy · Stored in episodic memory · Retrieved next attempt
Inner-Monologue Critic: Built-in challenge · "Does this look right?" · Confidence calibration · Self-consistency vote · Pre-commit review · Confess uncertainty

D · Skill & Recipe Distillation
Promote successful patterns into reusable, named, versioned skills
Pattern Miner: Find recurring success · Cluster successful runs · Extract common steps · Identify pre/post conditions · Tool-sequence subgraphs · Cost / latency profile
Voyager-style Skills: Self-grown library · Auto-curated repertoire · Compositional reuse · Trigger-condition tags · Refactor on improvement · Personal & org-shared
Skill Promotion: Validate & publish · Eval gate (offline) · Canary in production · Versioned · signed · Push to skill registry · Auto-deprecate weaker

E · Eval Harness — Offline, Online & Capability
A continuous quality bar — golden datasets, A/B tests, regressions, capability suites, and red-team evals
Golden Datasets: Source of truth · Curated Q&A · scenarios · Domain-specific suites · Synthetic + real mix · Edge-case coverage · Adversarial probes · Versioned · maintained
Offline Benchmarks: Repeatable scoring · Public (MMLU · HELM · BIG) · Agent (SWE-bench · WebArena) · Tool-use evals (BFCL) · Custom domain suites · Cost / latency budgets · Statistical significance
Online A/B & Shadow: Live experimentation · % traffic split · Shadow / dark-launch · Power analysis · Guardrail metrics · Auto-stop on regression · Holdout cohorts
Regression & CI Eval: No silent quality drops · Per-PR eval gate · Snapshot diff vs baseline · Per-dimension regression · Win/loss flake guard · Eval flakiness tracker · Weekly trend reports
Capability & Red-Team: Push limits safely · Dangerous-capability eval · Jailbreak / adversarial · Bias / fairness audits · Privacy / leakage probes · RSP / ASL gating · Scheduled red-team runs
Eval Reports: Decision-grade artifacts · Dashboards · scorecards · Per-cohort breakdown · Failure-case galleries · Approval evidence packs · Release-readiness signal · Distributed to stakeholders

F · Reflection & Improvement Hub — The Closed-Loop Engine
Synthesize scored trajectories into prioritized improvements: prompts, data, models, tools, policies
Improvement Synthesizer: Triage failures · cluster · prioritize · propose intervention · track to completion
CLUSTER · DIAGNOSE · PROPOSE · PRIORITIZE · SHIP
Failure Triage: Cluster · taxonomy · root cause · Bucket by error type
Intervention Picker: Prompt · data · model · tool · policy · Choose the right lever
Backlog & Tracker: Prioritized improvement queue · Owner · due-date · impact
Closure Verification: Re-eval after fix lands · Confirm metric moved

G · Continual Training & Adaptation
Turn production trajectories into better prompts, fine-tunes, and reward signals
Trace Mining & Curation: From logs → datasets · Extract good trajectories · Rejection sampling · De-dup & balance · PII scrub before training · Synthetic augmentation · Consent & opt-in only
Prompt Optimization: Cheap, fast wins · Auto-prompt search · DSPy compilation · Few-shot exemplar mining · Persona / system tuning · Skill / SKILL.md updates · Rubric-guided rewriting
Supervised Fine-Tune: SFT on curated data · Instruction tuning · Tool-use distillation · Rejection-sampled SFT · LoRA / QLoRA adapters · Per-tenant adapters · Curriculum staging
Preference Optimization: Align to human / AI prefs · DPO · IPO · KTO · RLHF · PPO · RLAIF (Constitutional AI) · GRPO · process rewards · Reward-model training · KL-controlled updates
Tool / Retrieval Tuning: Beyond the model · Embedding fine-tunes · Re-ranker training · Tool-spec rewrites · Chunking strategy tuning · Skill cards optimization · Router / cascade tuning
Self-Play / Synthetic: Augment scarce data · Self-play trajectories · LLM-generated tasks · Adversarial generation · Verifier-filtered output · Counterfactual replays · Quality watermarking

H · Quality Control & Learning Safety
Don't make the system worse — guard against drift, forgetting, poisoning, and reward hacking
Drift Detection: Distribution shift · Input · output drift · Auto-alert & pause
Catastrophic-Forget Guard: Don't lose old skills · Replay buffer · EWC / KL constraints
Reward Hacking Probes: Specification gaming · Multi-judge cross-check · Process & outcome both
Data-Poisoning Defense: Untrusted-trace screening · Anomaly filtering · Provenance required
Eval Gating: No-regress release rule · Capability + safety pass · Rollback on fail
Privacy & Consent: Train-on-data flags · DP / k-anonymity · RTBF cascade

I · Safe Deployment & Rollout
Ship improvements with the same rigor as code — versioned, canaried, monitored, reversible
Versioning & Registry: Prompts · models · skills · SemVer · signed artifacts · Provenance ledger
Canary Rollout: Gradual % traffic · Health-gate auto-promote · Auto-rollback on regression
Feature Flags: Per-tenant gates · Kill-switch flags · Dynamic config
Post-Deploy Monitor: Live metric watch · Quality · cost · safety · SLO breach → rollback
Change Log & Audit: What changed · who · when · Eval-evidence linked · Reviewer sign-off
User Communication: Release notes · Behavior-change alerts · Known-issue digest

⇣ Outbound — Improvements Pushed Back into the Stack
Updated artifacts deployed across every layer they affect — closing the agentic improvement loop
→ Layer 4 · Reasoning: Adapters · prompts · model versions
→ Layer 3 · Orchestrator: Plan templates · routing rules
→ Layer 5 · Memory: Lessons · skills · profiles
→ Layer 6 · Tools / Skills: New & revised skill cards
→ Layer 7 · RAG: Embeddings · re-rankers · chunking
→ Layer 11 · Governance: Eval evidence · safety reports
Every push is versioned, eval-gated, canaried, and reversible — no silent drift.

Cross-cutting Eval, Safety & Lifecycle
Legend: Collection / Hub · Evaluation · Reflection · Skill Distillation / Deploy · Eval Harness · Continual Training · Quality / Safety · Forward learning flow · Deployed improvement · Closed-loop / reflection
Detailed view of Layer 10 — Reflection, Evaluation & Continual Learning from the Agentic AI System Architecture reference. Trajectories and feedback flow in from every layer, are scored by a multi-method evaluation engine, fuel mid-run reflection and offline distillation, run through a comprehensive eval harness, and converge on an Improvement Synthesizer that triages failures into prioritized interventions. Continual-training and safe-deployment pipelines push versioned, canaried, eval-gated improvements back across the stack — closing the agentic loop.
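
The Score Aggregator box lists "Weighted ensemble", "Pass / fail thresholds", and "Per-dimension breakdown". A minimal Python sketch of that idea follows. The `Signal` type, the `aggregate` function, the evaluator names, the weights, and the 0.7 threshold are all hypothetical and are not taken from the diagram; scores are assumed to be normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str      # hypothetical evaluator label, e.g. "llm_judge"
    score: float   # assumed normalized to [0, 1]
    weight: float  # relative trust placed in this evaluator


def aggregate(signals: list[Signal], threshold: float = 0.7) -> dict:
    """Combine evaluator scores into one weighted mean and a pass/fail flag."""
    total_weight = sum(s.weight for s in signals)
    if total_weight <= 0:
        raise ValueError("at least one signal needs a positive weight")
    combined = sum(s.score * s.weight for s in signals) / total_weight
    return {
        "score": combined,
        "passed": combined >= threshold,
        # Keep the per-evaluator scores so a failure can be traced to its source.
        "breakdown": {s.name: s.score for s in signals},
    }


report = aggregate([
    Signal("llm_judge", 0.8, 2.0),
    Signal("rule_check", 1.0, 1.0),
    Signal("human_label", 0.5, 1.0),
])
print(report["passed"], report["breakdown"])
```

In this example the judge is weighted twice as heavily as the other two signals, so the combined score is (1.6 + 1.0 + 0.5) / 4. A real aggregator would presumably add the confidence intervals and outlier flagging that the diagram also lists.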

Layer 11 · Safety, Governance, Trust & Observability

The cross-cutting control plane that wraps every other layer — guardrails, identity, policy-as-code, compliance, observability, red-teaming, and incident response — making the agent system safe, accountable, and operable in production.

Detailed Diagram  ·  v1.0  ·  2026

⇄ Cross-Cutting Signals — Wraps Layers 1–10 (every request, action, and effect crosses this layer)
requests · responses · tool calls · effects · trajectories · feedback · cost · errors · audit events · safety incidents
L1 User · L2 Perception · L3 Orchestrator · L4 Reasoning · L5 Memory · L6 Tools · L7 RAG · L8 Multi-Agent · L9 Action · L10 Reflection

A · Guardrails — Input, Output & Behavior Filters
Defend the model and the user — block harmful inputs and unsafe outputs at every boundary
Input Filters: First line of defense · Toxicity · hate · violence · CSAM hash matching · Self-harm classifier · Dangerous-cap probe · Schema / size limits · Bidi / homoglyph guard
Prompt-Injection Defense: Trust-boundary enforcement · Quarantine tool results · No-instruction-follow rule · Spotlighting / delimiters · Hidden-text decoder · Multimodal probe · Confirm sensitive ops
PII / DLP: Detect & redact · Names · IDs · phones · Cards · accounts · keys · Health / financial data · Tokenization vault · Differential privacy · Egress DLP scan
Output Filters: Last-mile safety · Refusal classifier · Hallucination probes · Schema-conformant · Watermarking · C2PA · Content tags · Bias / fairness checks
Behavioral Guardrails: Constrain agent action · Topic allow / deny · Persona & tone constraints · Refusal templates · Off-task detection · Loop / runaway breaker · Action allow-list scope
Frameworks: Standardized stacks · NeMo Guardrails · LMRails · Llama Guard · Granite · Azure Content Safety · OpenAI Moderation · Custom rules engine · Versioned · A/B-tested

B · Identity, Access & Secrets
Verify who is acting, what they're allowed to do, and protect every credential along the way
Authentication: Prove identity · SSO · OAuth · OIDC · SAML · Passkeys · MFA · step-up · Service-account / SPIFFE · Workload identity · Token refresh / revoke
Authorization: Decide what's allowed · RBAC · ABAC · ReBAC · Scopes · entitlements · Tool allow / deny lists · Tenant / project isolation · Delegated / on-behalf-of
Agent Identity: First-class principals · Cryptographic agent ID · Capability cards (signed) · Agent provenance ledger · Per-agent token scopes · Sub-agent attenuation
Secrets & Keys: JIT, scoped, rotated · HashiCorp Vault · KMS / HSM-managed keys · OAuth token exchange · Short-lived tokens · Memory-only injection
Network & Boundary: Zero-trust transport · mTLS service-to-service · Service mesh (Istio · Linkerd) · Egress proxy & allow-list · Private VPC · sovereign cloud · WAF / DDoS shield
Encryption: Data confidentiality · At-rest (AES-GCM) · In-transit (TLS 1.3) · In-use (confidential VM) · BYOK / HYOK · Per-tenant key isolation

C · Policy-as-Code & Action Gating
Encode rules once, enforce them everywhere — versioned, auditable, testable
Policy Engine: Decision point · Open Policy Agent (OPA) · Cedar · Rego rules · Versioned · signed bundles · Tenant overrides
Action Approval: Gate risky operations · Auto · HITL · admin · Two-person rule · Step-up auth · Signed approval token
Risk Classifier: How dangerous is this? · Reversible vs destructive · Blast radius estimate · Data sensitivity tier · Money / identity impact
Quotas & Rate Limits: Bound consumption · Token / $ caps · Per-tenant quotas · Velocity / spike caps · Fair-share scheduling
Policy Authoring & Test: Treat policy as code · Code review · CI tests · Canary policy rollout · Counterfactual evaluation · Rollback on regression
Decision Log: Why allowed / denied · Per-decision evidence · Rule version applied · Counterexample queries · Appeals workflow

D · Compliance, Audit & Regulatory Mapping
Demonstrate trustworthy operation to regulators, auditors, and customers — with evidence
Regulatory Mapping: Frameworks & standards · EU AI Act · NIST AI RMF · ISO/IEC 42001 (AI MS) · GDPR · CCPA / CPRA · HIPAA · PCI · SOX · SOC 2 · ISO 27001
Audit Log: Tamper-evident record · Hash-chained · signed · Immutable storage (WORM) · Who · what · when · why · Cross-layer correlation · Long-term retention
Data Residency & Sovereignty: Where data lives matters · EU · US · APAC pinning · Sovereign-cloud routing · Cross-border guards · On-prem deployment · Air-gapped envs
DSAR & Subject Rights: User data rights · Access · export · portability · Right-to-be-forgotten · Cascade across stores · Self-service portal · Erasure receipts
AI-Specific Disclosures: Be transparent · Model cards · system cards · Datasheets · data lineage · "Talking to AI" disclosure · Synthetic-content labeling · Risk-tier reporting
Evidence & Reporting: Audit-ready exports · Auto-generated evidence · Control-mapping matrix · Regulator-ready packs · Drata · Vanta · Secureframe · Customer trust portal

E · Trust & Safety Operations Hub
Central console where humans monitor, intervene, investigate, and escalate
Trust & Safety Console: Live dashboards · alerts · approvals · investigations · kill-switches
DETECT · TRIAGE · CONTAIN · RECOVER · REPORT
SOC for AI: 24/7 monitoring · on-call · Alert routing & escalation
Approval / HITL Inbox: Pending high-risk actions · SLA-driven decisions
Kill-Switch Console: Per-agent · env · global · Operator-only authority
Investigation Workbench: Replay · search · evidence · Forensic timeline

F · Observability — Tracing, Metrics, Logs & Cost
See everything the agent does — with replay, attribution, and SLO accountability
Distributed Tracing: End-to-end view · OpenTelemetry spans · LangSmith · Langfuse · Helicone · Phoenix · W&B · Per-step / tool / agent · Conversation graph view · Cross-layer trace_id
Metrics & SLOs: Quantified health · Latency P50 / P95 / P99 · Success / abandon rate · Tool error rate · Cache hit-rate · SLO & error-budget burn · Prometheus · Datadog
Structured Logging: Forensic detail · Event-sourced runs · Per-step input / output · Tool I/O recorded · PII-redacted variant · Search · query · alert · Retention policy
Cost Observability: $ accountability · Tokens · $ per call / run · Per-tenant chargeback · Top-spender attribution · Budget burn dashboards · Cache-hit savings · Cost-anomaly alerts
Replay & Time-Travel: Reconstruct any run · Recorded I/O · Counterfactual debug · Step-through inspector · Screenshot / DOM cap · Reproducible re-runs · Diff vs golden
Anomaly & Alerting: Catch issues fast · Drift detection · Tool-error spikes · Refusal-rate jumps · Cost / latency spikes · Auto-page on-call · PagerDuty · OpsGenie

G · Red-Team & Capability Gating
Stress-test the system before adversaries do — and gate dangerous capabilities responsibly
Adversarial Red-Team: Find the failure modes · Jailbreak attempts · Prompt-injection probes · Tool-abuse scenarios · Multi-step exploit chains · Continuous + scheduled
Capability Evals: Measure dangerous skills · CBRN · cyber · autonomy · Persuasion · manipulation · Self-replication probes · Long-horizon planning · Independent evals
RSP / ASL Gating: Tiered release controls · Responsible Scaling Policy · AI Safety Levels (ASL) · Deployment thresholds · If/then commitments · Public reporting

H · Model & Tool Lifecycle · Incident Response
Every AI artifact is versioned, monitored, and recoverable — drills keep the team sharp
Model / Tool Governance: Versioned & controlled · Model registry · cards · Tool allow-list / deny · Canary · rollback · Deprecation policy · Provenance ledger
Incident Response: When things go wrong · Runbooks & on-call · Containment · isolate agent · User & regulator notify · Root-cause analysis · Blameless post-mortem
Drills & Game-Days: Practice for crises · Chaos exercises · Tabletop simulations · Kill-switch drill · DSAR rehearsal · Recovery time targets

I · Transparency, Explainability & User Trust
Help users understand what the agent did and give them meaningful control
Decision Explanations: Why did it do that? · Reasoning trace UI · Tool-call timeline · Citation panels · Confidence indicators
User Controls: Stay in charge · Memory opt-in / opt-out · Train-on-data flags · Tool / scope toggles · Cancel · undo · redo
Disclosures & Receipts: Set expectations · "AI-generated" labels · Action receipts · Limitations notice · Customer trust portal

⇄ Enforcement & Signals to Every Layer
Governance is bidirectional — signals collected from layers, enforcement decisions sent back
→ L3 Orchestrator: allow / deny decisions · HITL · risk class
→ L4 Reasoning: model allow-list · capability flags
→ L6 Tools: tool allow / deny · scope & quota
→ L9 Action: approval tokens · kill-switch state
→ L5 / L7 Memory · RAG: ACL · DSAR · TTL · residency rules
→ L10 Reflection: eval gates · release approval

⇣ External Outputs — Stakeholders, Regulators & Public Trust
Audit packs · model / system cards · transparency reports · safety disclosures · DSAR fulfillment · breach notifications · customer trust portal

Cross-cutting · Wraps All Layers (1–10)
Legend: Guardrails · Identity / Transparency · Policy · Compliance · Trust Hub / Observability · Red-Team / Capability · Lifecycle / Incident Response · Forward governance flow · Enforcement / disclosure · Live override / kill-switch
Detailed view of Layer 11 — Safety, Governance, Trust & Observability from the Agentic AI System Architecture reference. This layer is cross-cutting: it wraps Layers 1–10. Signals from every layer flow in; guardrails, identity, policy, compliance, observability, red-teaming, lifecycle, and transparency controls flow out as enforcement decisions, audit evidence, and stakeholder disclosures. The Trust & Safety Hub provides a live console for humans to detect, contain, and recover from incidents — and the kill-switch path lets operators stop the system at any time.
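
The Audit Log box describes a "Tamper-evident record" that is "Hash-chained". One way to sketch that property in Python is shown here, under stated assumptions: the entry layout, the `append` and `verify` helpers, the all-zero genesis value, and the sample events are hypothetical. Signing and WORM storage, which the diagram also lists, are out of scope for this sketch.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same way.
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list = []
append(audit_log, {"who": "agent-7", "what": "tool_call", "why": "user request"})
append(audit_log, {"who": "operator", "what": "kill_switch", "why": "drill"})
print(verify(audit_log))
```

Because each hash covers the previous hash, changing any earlier event invalidates every later link. A plain hash chain only detects edits against a trusted copy of the latest hash, which is presumably why the diagram pairs it with signing and immutable storage.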

Layer 12 · Infrastructure & Platform

The substrate beneath every agent — compute, accelerators, model serving, runtimes, storage, networking, deployment topologies, and the SRE / FinOps machinery that keeps it all running reliably, securely, and economically at scale.

Detailed Diagram  ·  v1.0  ·  2026

⇣ Workload Demand — Every Other Layer Runs on This Substrate
model inference · agent runs · tool execution · vector / graph queries · multi-agent coordination · evaluation jobs · training
L1–L11 workloads · sync / async · batch / stream · long-running · global / regional · per-tenant SLOs

A · Compute & Accelerator Fleet
A heterogeneous, capacity-managed fleet — right hardware for training, inference, agents, and tools
NVIDIA GPUs: Workhorse for training & inference · H100 · H200 · B100 · B200 · GB200 NVL72 racks · NVLink · NVSwitch fabric · FP8 / FP16 / BF16 · MIG partitioning · CUDA · cuDNN · NCCL
Cloud Accelerators: Hyperscaler-native silicon · Google TPU v5p · v6 (Trillium) · AWS Trainium · Inferentia · Azure Maia · Cobalt · AMD MI300X / MI350 · Intel Gaudi 3 · OCI / Lambda / CoreWeave
Specialty Accelerators: Ultra-low-latency inference · Groq LPU · Cerebras WSE-3 · SambaNova SN40L · Tenstorrent Wormhole · d-Matrix · etched.ai · FPGA / ASIC fast paths
CPU & General Compute: Agents · tools · orchestration · x86 · ARM (Graviton · Ampere) · High-mem · high-CPU SKUs · Spot / preemptible · Confidential VMs (SEV · TDX) · Burstable instances · Dedicated tenancy
Edge & Device: On-device inference · Apple Neural Engine · Qualcomm Hexagon NPU · NVIDIA Jetson · Orin · Coral TPU · Hailo · WebGPU / WASM · Quantized SLM models
Capacity Management: Right-size, right-time · Reservations · commitments · Spot · preemptible mix · Cluster autoscaler · Multi-cloud burst · Per-tenant quotas · Forecast-driven planning

B · Model Serving, Training & ML Platform
From research to production — high-throughput inference, distributed training, and the MLOps glue around them
Inference Servers: High throughput · low TTFT · vLLM · SGLang · TensorRT-LLM · TGI · Triton Inference Server · llama.cpp · MLX (edge) · Continuous batching
Hosted Model APIs: Provider-managed · Anthropic · OpenAI · Google · Mistral · xAI · Bedrock · Vertex AI · Azure OpenAI Service · Together · Fireworks · Replicate
Distributed Training: Pre-train · fine-tune · DPO · PyTorch · JAX · DeepSpeed · Megatron · NeMo · Axolotl · FSDP · ZeRO · TP / PP / EP · Slurm · Ray · Kubeflow · Checkpoint · resume
Compiler & Kernels: Squeeze every flop · FlashAttention 3 · FA-decoder · Triton · CUDA · ROCm · torch.compile · XLA · TVM · Mojo · IREE · FP8 / INT4 GEMM
Model Registry & MLOps: Lifecycle of every artifact · MLflow · W&B · Comet · Hugging Face Hub · Versioning · provenance · Approval · canary · rollback · Signed artifacts (SLSA)
Optimization & Deploy: From weights to traffic · Quantize · prune · distill · AWQ · GPTQ · SmoothQuant · Speculative decoding · Multi-tenant serving · Cold-start optimization

C · Agent & Workflow Runtimes
Stateful execution engines that drive long-running, resumable agent loops
Agent Frameworks: Build & run agents · Anthropic Agent SDK · LangGraph · LangChain · CrewAI · AutoGen · Magentic · LlamaIndex · Haystack
Durable Execution: Replay-safe orchestration · Temporal · Cadence · Restate · Inngest · DBOS · Trigger.dev · Long-running runs
Distributed Compute: Map-reduce / actors · Ray · Ray Serve · Spark · Dask · Modal · Akka · Erlang OTP · Dapr · Restate
Sandbox & Tool Runtime: Per-action isolation · Firecracker · Kata · gVisor · WASM · WASI runtimes · E2B · Daytona · CodeSandbox · Browser-tab VMs
MCP Hosting: Tool-server platform · Local stdio servers · Remote SSE / WebSocket · Multi-tenant gateway · Capability negotiation
Schedulers / Queues: Async & cron · Celery · Sidekiq · BullMQ · Argo Workflows · Airflow · Kubernetes Jobs · CronJobs · Priority & fair-share

D · Container, Orchestration & Cluster Platform
The unified runtime — schedule, isolate, scale, and recover every workload
Kubernetes & Container Platform: Schedules pods · GPU operator · autoscaling · service discovery · secrets · multi-tenant namespaces
SCHEDULE · SCALE · ISOLATE · HEAL · UPGRADE
Container Runtimes: containerd · CRI-O · Docker · OCI images · BuildKit · Buildpacks
GPU / Accelerator Operators: NVIDIA GPU operator · device plugins · Topology-aware · MIG · time-slicing
Autoscaling: HPA · VPA · KEDA · Karpenter · Predictive · queue-driven scale
Multi-Tenant Isolation: Namespaces · NetworkPolicy · OPA · vCluster · gVisor / Kata sandboxing

E · Storage & Data Platform
Polyglot persistence — choose the right database for each agent workload
Object & Block: Bulk & durable · S3 · GCS · Azure Blob · R2 · EBS · Persistent Disk · MinIO · Ceph (on-prem) · Lifecycle · tiering · glacier · Object lock · WORM
Vector Stores: Semantic search · pgvector · Pinecone · Weaviate · Qdrant · Milvus · Vespa · Turbopuffer · LanceDB · ChromaDB · HNSW · IVF · DiskANN
Knowledge Graphs: Relations & paths · Neo4j · ArangoDB · Memgraph · NebulaGraph · TigerGraph · Amazon Neptune · RDF · SPARQL stores · Property + temporal edges
OLTP / OLAP: Transactional & analytical · Postgres · MySQL · Aurora · Spanner · CockroachDB · Snowflake · BigQuery · Databricks · Iceberg · DuckDB · ClickHouse
KV / Cache / Doc: Hot & flexible · Redis · KeyDB · Dragonfly · DynamoDB · Cosmos · Bigtable · MongoDB · Couchbase · Firestore · Elastic · OpenSearch · Memcached · Hazelcast
Time-Series & Stream: Append-only timelines · TimescaleDB · InfluxDB · QuestDB · Prometheus TSDB · Event-sourced state · CDC streams (Debezium) · Replay-able trajectories

F · Networking, Messaging & Edge
Move bytes safely and quickly — between users, services, agents, and tools
Edge & CDN: Closer to users · Cloudflare · Fastly · Akamai · AWS CloudFront · GCP CDN · Edge functions / Workers · WAF · DDoS shield · Bot / abuse detection
Load Balancing & Ingress: Route & balance · L4 / L7 LBs · Envoy · NGINX · HAProxy · Traefik · K8s Ingress / Gateway API · Sticky sessions · health · Global anycast
Service Mesh: Zero-trust east-west · Istio · Linkerd · Cilium · mTLS service-to-service · Retries · timeouts · circuit · Traffic shifting · canary · eBPF data plane
RPC & Streaming: Inter-service calls · gRPC · Connect · Twirp · REST / OpenAPI · GraphQL Federation · Server-sent events (SSE) · WebSocket · WebRTC
Event Bus / Queueing: Async & pub/sub · Kafka · Redpanda · NATS · Pulsar · RabbitMQ · AWS SQS / SNS / EventBridge · GCP Pub/Sub · Azure Service Bus · DLQ · ordered delivery
High-Perf Fabrics: Training / inference net · InfiniBand · NDR / XDR · RDMA · RoCE · NVLink · NVSwitch · UCX · NCCL · libfabric · Topology-aware routing

G · Identity, Secrets & Platform Security
Workload identity, secrets, supply-chain integrity, and confidential compute
Workload Identity: Service-to-service auth · SPIFFE / SPIRE · Cloud IAM (IRSA · WIF) · OIDC trust federation · Per-pod / per-agent identity · Short-lived certs
Secrets & Key Management: Centralized · rotated · HashiCorp Vault · Infisical · AWS / GCP / Azure KMS · HSM · CloudHSM · External Secrets Operator · Just-in-time injection
Supply-Chain & Confidential: Trust the binaries · SBOM · SLSA levels · Sigstore · cosign signing · Image scanning (Trivy · Snyk) · Confidential VMs (SEV · TDX) · TEE · attestation

H · Deployment, CI/CD & Infrastructure-as-Code
Reproducible, auditable, GitOps-driven delivery for every artifact in the stack
CI/CD Pipelines: Build · test · deploy · GitHub Actions · GitLab CI · Buildkite · CircleCI · Jenkins · Eval gates · safety gates · Reproducible builds · Promotion across envs
GitOps & Continuous Delivery: Declarative · auditable · Argo CD · Flux · Helm · Kustomize · Progressive delivery (Argo Rollouts) · Canary · blue-green · feature flags · Auto-rollback on regression
Infrastructure-as-Code: Codify the stack · Terraform · OpenTofu · Pulumi · Crossplane · CDK · Bicep · ARM · Policy as code (OPA · Sentinel) · Drift detection · plan reviews

I · Deployment Topologies · Observability · SRE & FinOps
Where the stack runs, how to keep it up, and how to keep it affordable
Deployment Topologies: Where the stack lives · Public cloud (AWS · GCP · Azure) · On-prem · co-location · Hybrid · private cloud · Edge · device · air-gapped · Sovereign cloud
Multi-Region & HA: Resilience & locality · Active / active · A/P · Cross-region replication · Failover & DR drills · Backup · point-in-time restore · Data-residency routing
Observability Stack: Metrics · logs · traces · OpenTelemetry collectors · Prometheus · Grafana · Loki · Datadog · New Relic · Honeycomb · AI-specific (Langfuse · LangSmith) · Profiling (pprof · Pyroscope)
SRE & Reliability: Run it like production · SLO / SLI / error budgets · On-call · runbooks · Chaos engineering · Post-mortems & lessons · PagerDuty · OpsGenie
FinOps & Cost Control: $ accountability · Token / GPU / $ meters · Per-tenant chargeback · Reserved + spot blending · Anomaly & budget alerts · Rightsizing recommendations
Sustainability: Energy & carbon-aware · Carbon-aware scheduling · PUE / WUE tracking · Renewable-region routing · Per-token energy meters · Sustainability reporting

⇣ Platform Outputs — Capacity, SLOs, Cost & Compliance Evidence
SLO dashboards · capacity forecasts · cost & carbon reports · compliance evidence · DR & failover posture · supply-chain attestations

Cross-cutting Reliability, Cost & Sustainability
Legend: Compute & Accelerators · Serving / Identity · Runtimes / Deploy · Containers / K8s · Storage · Networking · SRE / FinOps / Sustainability · Stack dependency · Platform reports · Auto-scale / FinOps loop
Detailed view of Layer 12 — Infrastructure & Platform from the Agentic AI System Architecture reference. Workloads from L1–L11 land on a heterogeneous compute fleet, are served via inference engines and agent runtimes, scheduled on Kubernetes, backed by polyglot storage, and connected through service-mesh networking. Identity, secrets, supply-chain integrity, deployment automation, and SRE / FinOps / sustainability practices keep the substrate trustworthy, available, and economical at scale.
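
The SRE & Reliability box mentions "SLO / SLI / error budgets". As a rough illustration of the arithmetic behind an error budget, here is a small Python sketch. The `error_budget` function, its field names, and the 99.9% target with its request counts are hypothetical example values, not figures from the architecture.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Report how much of the error budget a window of traffic has consumed.

    slo_target is the required success ratio, e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_error_rate = 1.0 - slo_target
    observed_error_rate = failed_requests / total_requests
    # Burn rate 1.0 means failures arrive exactly as fast as the SLO permits.
    burn_rate = observed_error_rate / allowed_error_rate
    return {
        "allowed_failures": allowed_error_rate * total_requests,
        "burn_rate": burn_rate,
        "budget_remaining": 1.0 - burn_rate,
        "slo_met": observed_error_rate <= allowed_error_rate,
    }


window = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
print(window["slo_met"], round(window["burn_rate"], 3))
```

With a 99.9% target, one million requests allow about 1,000 failures, so 250 failures consume roughly a quarter of the budget. Alerting on burn rate rather than on raw error counts is a common SRE practice, though the thresholds used in any given deployment are a local choice.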