Jump to
Overview
Orchestration
Reasoning
Memory
Tools
RAG
Safety
Linh Truong
·
MA (Harvard), MBA
·
LinhTruong.com
·
Linh@Alumni.Harvard.edu
Agentic AI System Architecture
I designed this reference architecture to map the full structural anatomy of autonomous, tool-using, multi-agent AI systems — from the user & interaction boundary through perception, orchestration, reasoning, memory, tools, knowledge retrieval, multi-agent collaboration, action, reflection, safety & governance, and infrastructure. Twelve layers, each with its own detailed diagram.
Overview Master Architecture Diagram
Agentic AI System Architecture
A reference architecture for autonomous, tool-using, multi-agent AI systems — perception, reasoning, memory, action, reflection, and governance.
Research Paper Diagram · Updated 2026 · v1.0
1 · User & Interaction Layer
Human users, applications, and channels that issue goals and receive results
Human User
Goals · Preferences · Feedback
Chat / Voice / Multimodal UI
Text · Speech · Images · Video
IDE / CLI / SDK
Claude Code · API · Agent SDK
Application Channels
Web · Mobile · Email · Slack
Autonomous Triggers
Cron · Webhooks · Events
Other Agents (A2A / MCP-Client)
Inter-agent requests & delegations
2 · Perception & Input Processing
Normalize, parse, and ground incoming signals into a structured task representation
Intent & Goal Extraction
Multimodal Encoders (V/A/T)
Context Assembly & Grounding
Prompt Compilation & Caching
Input Guardrails / PII Scrub
Session & Identity Context
3 · Orchestration, Planning & Control
Decompose goals into plans, route work to agents/tools, and manage the agent loop
Agent Orchestrator
ReAct · Plan-and-Execute · Tree/Graph-of-Thought
State Machine · Loop Control · Budget & Timeouts
LangGraph / Agent SDK / custom controllers
Task Decomposer
HTN · Hierarchical Plans
Planner / Re-planner
CoT · Self-Ask · ToT
Router / Dispatcher
Skill & agent selection
Policy Engine
Permissions · Action gating
Scheduler & Queue
Async · Priorities · Retries
Concurrency Manager
Parallel sub-agents · Forks
Cost / Token / Latency Budgeter
Per-task budgets · Stop conditions
Human-in-the-Loop Gateway
Approvals · Clarifications · Overrides
4 · Reasoning Core — Foundation Models & Cognition
The cognitive engine: LLMs/LMMs with extended thinking, tool-use, and structured output
Foundation Model(s)
Claude Opus 4.7 · Sonnet 4.6 · Haiku 4.5
GPT · Gemini · Llama · Mistral · Qwen
Routed by task complexity & cost
SLMs for tools / classification
Extended Thinking
Reasoning tokens · Scratchpad
Self-Reflection / Critic
Reflexion · Self-Refine · Debate
Tool-Use / Function Calling
Structured args · Parallel calls
Structured Output
JSON schema · Pydantic · Grammars
Multimodal Reasoning
Vision · Audio · Code · Docs
In-Context Learning
Few-shot · Skills · Examples
Adaptation Layer
Fine-tune · LoRA · DPO · RLHF · RLAIF
Inference Controls
Sampling · Constrained decoding · Prompt caching
5 · Memory Subsystem
Multi-tier memory enabling continuity, learning, and personalization
Working / Context Memory
Live conversation buffer
Compaction · Summarization
Episodic Memory
Past sessions & trajectories
Time-stamped events
Semantic Memory
Facts · Entities · Concepts
User profile · Project memory
Procedural Memory
Skills · Workflows · Recipes
Learned tool sequences
Vector / Embedding Store
pgvector · Pinecone · Weaviate
Hybrid & semantic search
Knowledge Graph
Entities · Relations · Provenance
Neo4j · RDF · GraphRAG
Memory Manager
Read · Write · Update · Forget · Consolidate · Re-rank · Privacy & TTL · Conflict resolution
6 · Tools, Skills & Capabilities
Composable actions the agent can invoke through standardized interfaces
Web Browsing
Search · Fetch
Computer use
Code Execution
Sandboxed runtime
Bash / Python / JS
File & Repo Ops
Read · Write · Diff
Git · FS · S3
External APIs
REST · GraphQL
SaaS · Webhooks
Databases
SQL · NoSQL
Warehouses
Communication
Email · Slack
Calendar · Meet
Workflow Tools
CI/CD · Jira
Notion · Linear
Domain Models
Vision · ASR · TTS
Specialist SLMs
Tool Gateway · MCP Servers · Skill Registry
Schema validation · Auth · Rate-limits · Idempotency · Caching · Retries · Sandboxing
7 · Knowledge & Retrieval (RAG)
Grounding the agent in fresh, verifiable knowledge from internal & external sources
Retrievers
BM25 · Dense · Hybrid
Multi-query · Fusion
Re-rankers & Filters
Cross-encoder · LLM-rerank
Recency · ACL filters
Advanced RAG
GraphRAG · HyDE · Self-RAG
Agentic / Corrective RAG
Document Pipelines
Parse · Chunk · Embed
OCR · Layout · Tables
Knowledge Sources
Wiki · Docs · Tickets
Code · Web · Live data
Citation & Provenance
Inline citations
Source attribution
8 · Multi-Agent Collaboration
Specialized agents cooperating, debating, and verifying each other's work
Researcher
Search · Read
Synthesize
Coder
Edit · Run · Test
Debug · Refactor
Critic / Reviewer
Verify · Score
Red-team
Domain Experts
Legal · Medical
Finance · DevOps
Coordination Patterns
Supervisor · Hierarchical · Swarm · Debate · Blackboard
CrewAI · AutoGen · LangGraph · Magentic
Inter-Agent Protocols
A2A · MCP · ACP · Shared scratchpad
Message bus · Contract net · Voting
9 · Action & Environment Interface
Where agents take real-world effects — through digital and physical environments
Computer Use
GUI control · Screen + keyboard
Browser Agents
DOM · Forms · Navigation
Code Sandboxes
Containers · VMs · Firecracker
Enterprise Systems
CRM · ERP · ITSM · Data lake
Physical / IoT
Robotics · Sensors · Actuators
Output Channels & Side-Effect Bus
Notifications · Commits · Tickets · Reports
10 · Reflection, Evaluation & Continual Learning
Closed-loop self-improvement — evaluate trajectories, learn skills, refine prompts & models
Trajectory Evaluator
LLM-as-Judge · Rubrics
Pass/fail · Quality scores
Reward / Verifier
Tests · Constraints · Goals
Process & outcome rewards
Self-Reflection Loop
Reflexion · Self-Refine
Lessons & corrections
Skill / Recipe Distiller
Voyager-style libraries
Reusable workflows
Eval Harness
Benchmarks · Regression
Online & offline eval
Continual Training
SFT · DPO · RLAIF
Prompt & tool tuning
11 · Safety, Governance, Trust & Observability
Cross-cutting controls — guardrails, policy, monitoring, security, and compliance
Input/Output Guardrails
Toxicity · Jailbreak · Schema
Prompt-Injection Defense
Trust boundaries · Confirmation
PII / DLP
Redaction · Tokenization
AuthN / AuthZ
OAuth · RBAC · Scoped tokens
Action Approval
HITL · Risky-action gating
Compliance & Audit
SOC 2 · GDPR · HIPAA · EU AI Act
Observability & Tracing
OpenTelemetry · LangSmith · Langfuse · Helicone
Cost & Performance Monitoring
Tokens · Latency · Tool errors · SLOs
Red-Teaming & Safety Evals
Adversarial probes · Capability gating
Model & Tool Governance
Versioning · Allow-lists · Kill-switches · Explainability
12 · Infrastructure & Platform
The substrate — compute, serving, storage, and networking that make agents run reliably at scale
Model Serving
vLLM · TGI · TensorRT-LLM · SGLang
Compute
GPU · TPU · Inference accelerators
Agent Runtimes
LangGraph · Agent SDK · CrewAI
Container & Sandbox Layer
Docker · Kubernetes · Firecracker
Storage
Object · Vector · Graph · OLTP/OLAP
Event Bus & Networking
Kafka · Pub/Sub · gRPC · Service mesh
Secrets · Identity · Key Management
Vault · KMS · OAuth providers · Workload identity
Deployment Topologies
Cloud · On-prem · Hybrid · Edge · Multi-region failover
Cross-cutting Governance, Safety & Observability
Cross-cutting Governance, Safety & Observability
User & Interaction
Perception & Orchestration
Reasoning Core / Reflection
Memory
Tools & Capabilities
Knowledge / Multi-Agent
Action / Infrastructure
Safety & Governance
Forward data flow
Feedback / learning
Reference architecture for the research paper “Agentic AI System Architecture” .
Layers are conceptual — concrete deployments may merge, split, or substitute components.
Layer 1 User & Interaction Layer
Agentic AI System Architecture › Layer 1 Detail
User & Interaction Layer
The boundary between humans, applications, and other agents and the agentic system — channels, modalities, sessions, identity, presentation, and the contract that hands a well-formed request to the Perception layer.
Detailed Diagram · v1.0 · 2026
A · Initiators — Who or What Issues a Request
Humans, applications, autonomous schedules, and other agents — every interaction begins here
End User
Consumer of agent outcomes
· Goals, preferences, feedback
· Approvals & clarifications
· Implicit signals (clicks, dwell)
Power User / Operator
Configures & supervises agents
· Skill / tool authoring
· Prompt / persona tuning
· Slash commands · CLAUDE.md
Developer / Builder
Integrates the agent into systems
· SDK / API consumers
· Hooks · MCP servers
· Custom UIs & workflows
Admin / Governance
Sets policy & entitlements
· RBAC / ABAC roles
· Quotas · Allow-lists
· Audit & compliance review
Automation / System
Non-human triggers
· Cron · Schedulers
· Webhooks · Event bus
· Sensors / IoT triggers
Other Agents
Inter-agent delegation
· A2A protocol
· MCP-client agents
· Sub-agent callbacks
B · Channels & Surfaces — Where Interaction Happens
Concrete touchpoints that capture intent and render output across human, developer, app, and machine surfaces
Conversational UIs
Synchronous & streaming
Web chat · Mobile chat
In-product copilot panels
Inline assist (autocomplete)
Threaded long-running runs
Artifact & canvas surfaces
Voice & Telephony
Real-time speech I/O
Smart speakers · Phone bots
Streaming ASR + TTS
Barge-in · VAD · diarization
SIP / WebRTC bridges
Multi-language detection
Developer Surfaces
Programmatic & tool-driven
CLI (Claude Code, custom)
IDE plugins (VS Code, JetBrains)
SDKs · REST · gRPC · WebSocket
Notebook / REPL · Terminal
Slash commands · /skills
Embedded App Channels
Asynchronous workflows
Email inboxes · SMS
Slack · Teams · Discord
CRM / ITSM in-app widgets
Document & sheet sidebars
Browser extensions
Autonomous Triggers
No human in the request path
Cron / schedules
Webhooks · Event topics
File / DB change feeds
Alert / threshold triggers
Loop / self-paced runs
Agent ↔ Agent
Federated invocation
A2A protocol
MCP client requests
RPC · message bus
Capability discovery
Signed handoffs
C · Input Modalities & Capture
Each surface produces typed signals that the layer normalizes into a unified request envelope
Text
Chat · Email · Markdown
Voice / Audio
Mic stream · Audio files
Image / Vision
Photos · Screenshots · OCR
Video / Screen
Capture · Screencast · Frames
Documents / Files
PDF · DOCX · Spreadsheets
Code / Diffs
Repo · Patches · Snippets
Structured Data
JSON · CSV · Forms · Schemas
Sensor / Telemetry
IoT · Logs · Metrics · Geo
D · Interaction Patterns & UX Affordances
How users steer, supervise, and recover during long-running, tool-using agent runs
Streaming & Stop
Token stream · Cancel · Pause
Approvals & HITL
Risky-action confirmations
Clarifying Questions
Slot-fill · Disambiguation
Plan / Step Preview
Plan mode · Diff before write
Feedback Capture
👍 / 👎 · Comments · Ratings
Citations & Trace UI
Sources · Tool-call timeline
Undo / Rollback
Compensating actions
Personalization
Themes · Locale · A11y
E · Identity, Session & Context Management
Stable identity per actor, durable conversation state, and context that travels with every request
Authentication
SSO · OAuth · OIDC · SAML
Passkeys · MFA · API keys
Service-account / workload ID
Token refresh & revocation
Authorization
RBAC · ABAC · scopes
Tool / skill entitlements
Tenant & project isolation
Delegated & on-behalf-of
Session State
Conversation thread & turns
Resumable runs · checkpoints
Attached files & artifacts
Multi-device continuity
User & Org Context
Profile · preferences · locale
Org / workspace · project
Memory references · CLAUDE.md
Persona & tone bindings
Device & Environment
UA · OS · IDE · viewport
Network class · time zone
Geo · accessibility settings
Capability flags · feature gates
Consent & Privacy
Data-use scopes · ToS
Memory opt-in / opt-out
Recording & training flags
Data residency policy
F · Edge & API Gateway — Reliability and Safety on the Wire
All channels converge through a hardened gateway before requests reach Perception
TLS / Edge Termination
CDN · WAF · DDoS shield
mTLS for service callers
Bot / abuse detection
Geo & IP policy
Protocol Adapters
REST · GraphQL · gRPC
WebSocket / SSE streams
Webhook receiver
Email / SMS bridge
Rate & Quota
Per-user / org / token quotas
Concurrency caps
Burst smoothing · backoff
Fair scheduling
Idempotency & Retry
Idempotency-Key header
Request de-dup window
Replay protection (nonce)
At-least-once delivery
Schema Validation
OpenAPI / JSON Schema
Size / type / depth limits
MIME & encoding checks
Versioning & compatibility
Trust Boundaries
User-vs-tool-vs-content tagging
Prompt-injection pre-filter
Origin / referer enforcement
Data-classification labels
G · Unified Request Envelope
The contract handed to the Perception layer — one shape for every channel
Request Envelope (canonical)
Identity
· principal · tenant · org
· auth_method · scopes
· consent_flags
Session
· thread_id · turn_id · run_id
· resume_token · checkpoint
· trace_id (OTel)
Channel
· surface · device · locale · tz
Intent & Content
· goal / message · attachments
· modality · MIME · size
· references (doc, repo, URL)
Controls
· model preference · tools allow-list
· budget (tokens, time, $)
· stream · response_format
Policy
· data_class · retention · region
H · Output, Rendering & Delivery
How agent results are returned, rendered, and made interactive on each surface
Streaming Renderer
Token / event stream
Markdown · code · math
Live tool-call updates
Rich Artifacts
Canvas · diagrams · charts
Tables · interactive HTML
Generated files (PDF, XLSX)
Voice / Audio Out
Streaming TTS
Voice persona
Captions / transcripts
Interactive UI Cards
Buttons · forms · pickers
Slack blocks / Adaptive Cards
Confirm / approve / cancel
Citations & Provenance
Inline source links
Tool-call timeline
Confidence & caveats
Notifications
Push · email · SMS
Run-completed events
Digest summaries
Output Guardrails & Compliance
PII redaction · safety filters
Watermarking · content tags
Schema-conformant responses
Accessibility & i18n
WCAG · screen-reader semantics
RTL · locale formatting
Translation & transliteration
I · Cross-Cutting — Safety, Telemetry & Feedback Loops
Always-on concerns that wrap every interaction in this layer
Input Guardrails
Toxicity · jailbreak · injection
PII / DLP Pre-filter
Detect · redact · tokenize
Abuse & Bot Defense
CAPTCHA · velocity · anomaly
Telemetry & Tracing
OTel spans · structured logs
Analytics & A/B
Funnels · retention · experiments
Audit Log
Immutable, signed events
Feedback & Signals → Memory / Eval
👍 / 👎 · edits · regenerate · session ratings · escalations
Incident & Recovery Hooks
Kill-switch · graceful degrade · fallback model · status page
Compliance & Residency
GDPR · CCPA · HIPAA · SOC 2 · EU AI Act · regional routing
J · Handoff to Layer 2 · Perception & Input Processing
The Interaction Layer's output: a validated, classified, traceable envelope ready for grounding
Validated Request
Schema-checked envelope
Identity & scopes attached
Trust Labels
user · tool · external content
Data classification tags
Trace Context
trace_id · span · baggage
SLO & budget hints
Attached Context
Files · history · references
Memory / project pointers
Output Contract
Response shape · streaming
Tool / channel callbacks
Policy Hints
HITL · risk class
Region · retention
Cross-cutting Safety, Identity & Telemetry
Cross-cutting Safety, Identity & Telemetry
Initiators / Identity
Channels / Handoff
Modalities / UX
Session & Context
Edge / Gateway
Output / Presentation
Safety / Governance
Inbound request
Outbound delivery
Feedback signal
Detailed view of Layer 1 — User & Interaction Layer from the Agentic AI System Architecture reference.
All channels are normalized into a canonical request envelope and handed off to Layer 2 (Perception). Outputs flow back through the same surfaces with streaming, citations, and policy-aware rendering.
Layer 2 Perception & Input Processing
Agentic AI System Architecture › Layer 2 Detail
Perception & Input Processing
Transform the validated request envelope from Layer 1 into a grounded, structured task representation — parsing modalities, extracting intent and entities, assembling context, enforcing safety, and compiling the prompt that Layer 3 will plan against.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Validated Request Envelope from Layer 1 (User & Interaction)
Identity · Session · Channel · Intent · Controls · Policy hints · Trust labels · Trace context · Attached context
principal · scopes · tenant
thread_id · run_id · trace_id
message · attachments · MIME
model pref · tools allow-list
budget · stream · format
data class · retention · region · trust labels
A · Ingestion & Normalization
Demultiplex incoming payloads, normalize encodings, sanitize, and enforce size/shape limits
Payload Demuxer
Split by part / modality
· Multipart / form-data
· JSON message blocks
· File attachments
· Inline URIs & data: URLs
Encoding & Charset Norm.
Stable canonical form
· UTF-8 NFC normalization
· Newline / whitespace fix
· Strip control / zero-width
· Bidi & homoglyph guard
Sanitization
Reduce attack surface
· HTML / Markdown clean
· Script / event handler strip
· File type sniff & verify
· Anti-virus / malware scan
Limits & Quotas
Bound work and cost
· Max tokens / chars
· Max files / total size
· Max audio / video duration
· Per-tenant byte quotas
Language & Locale
Detect & route correctly
· Language ID (per segment)
· Script / dialect detection
· Locale formatting hints
· Optional MT pre-translation
Caching & Dedup
Avoid re-processing
· Content-hash cache
· Idempotent re-entry
· Embedding / parse reuse
· CDN-cached artifacts
B · Multimodal Encoders & Parsers
Convert each modality into structured tokens, embeddings, and document trees the reasoner can consume
Text Pipeline
Tokens · structure · meta
· Sentence / paragraph split
· Tokenization (BPE / SP)
· Markdown / HTML AST
· Code-block tagging
· Math / LaTeX detection
· Embeddings (BGE / E5)
· Token-count budget
Vision Pipeline
Images · screenshots · UI
· Decode · resize · color norm
· EXIF / orientation strip
· OCR (Tesseract / docTR)
· Object & layout detection
· Captioning / VQA model
· CLIP / SigLIP embeddings
· NSFW / safety classifier
Audio / Speech Pipeline
Voice · music · environment
· Resample · denoise · VAD
· ASR (Whisper / streaming)
· Speaker diarization & ID
· Language / dialect detect
· Prosody & emotion cues
· Audio embeddings
· Transcript timestamps
Video / Screen Pipeline
Frames · scenes · UI graphs
· Demux + transcode
· Keyframe / shot detection
· Frame sampling strategy
· Action / event detection
· Audio track → ASR
· Screen DOM / a11y tree
· Temporal embeddings
Document Pipeline
PDF · DOCX · XLSX · slides
· Layout-aware parsing
· Heading / section tree
· Table extraction
· Figure & chart capture
· Footnote / citation linkage
· Form-field extraction
· Chunking + embeddings
Code · Structured · Sensor
Programmatic inputs
· Tree-sitter AST parse
· LSP symbols / refs
· Diff / hunk extraction
· JSON · CSV · schema infer
· Time-series resample
· Geo / spatial indexing
· Unit / dimension normalize
C · Language Understanding & Intent
Convert raw signals into a structured task — what the user wants and what's needed to act
Intent Classifier
Task type · domain · urgency
Multi-label · confidence scores
Entity / Slot Extraction
NER · dates · amounts · IDs
Pydantic / JSON-schema slots
Coreference & Anaphora
"it" · "that PR" · "the file"
Mention → entity linking
Goal Decomposition
Top-level objective
Sub-goals · constraints · DoD
Disambiguation
Ambiguity detector
Triggers HITL clarification
Sentiment / Tone
Frustration · urgency
Style hints for response
Task Schema (structured representation)
objective · constraints · slots · entities · success criteria · risk class · suggested skills
D · Grounding & Reference Resolution
Bind language to real-world entities, files, repos, and prior context
Entity Linking
KG · directory · Wikidata
Org-internal canonical IDs
Resource Resolution
URLs · file paths · repos
PR / ticket / doc IDs
Time & Date Norm.
Relative → absolute
TZ-aware ISO-8601
Geospatial Grounding
Geocoding · POI lookup
User-locale defaults
Quantity / Unit Norm.
Currency · SI units
FX-rate & precision rules
Cross-Modal Align
Caption ↔ region
Transcript ↔ frame
Grounded Reference Graph
Mentions · entities · resources · times · places — emitted with provenance & confidence
E · Context Assembly & Retrieval
Pull just-enough context from memory, knowledge, and session — pack within budget, with provenance
Session History Selector
Recent turns · pinned items
· Salience scoring
· Compaction summaries
· Tool-call traces
· Run checkpoints
· Conversation graph
Memory Reader
Episodic · semantic · procedural
· User profile · preferences
· Project memory · CLAUDE.md
· Learned skills / recipes
· Past trajectories
· Privacy & TTL filtering
Knowledge Retrieval (RAG)
Hybrid search across stores
· BM25 + dense fusion
· Multi-query expansion / HyDE
· KG / GraphRAG hops
· ACL-aware filtering
· Recency & freshness boost
Re-rank & Compress
Pick the highest-value tokens
· Cross-encoder reranker
· LLM-based reranker
· Extractive snippeting
· Map-reduce summarization
· Diversity / dedup (MMR)
Tool / Capability Hints
Which skills are likely
· Skill / tool retriever
· MCP server discovery
· Few-shot example pull
· Schema / signature attach
· Cost & latency profile
Context Budgeter
Token / latency / $ caps
· Per-section quotas
· Lossy vs lossless drop
· Cache-aware ordering
· Prompt-cache key plan
· Overflow → tool offload
F · Safety, Trust & Privacy Filters
Defend the reasoner from hostile or unsafe inputs and protect user data before context leaves this layer
Prompt-Injection Detection
Quarantine untrusted text
· Heuristic + classifier
· Embedded-instruction scan
· Tool-result wrapping
· Spotlighting / delimiters
PII / DLP Scrubber
Detect, redact, tokenize
· Names · IDs · phones
· Cards · accounts · keys
· Health / financial data
· Reversible vault tokens
Content Safety
Block harmful inputs early
· Toxicity · hate · violence
· CSAM & abuse hashing
· Dangerous-capability cues
· Policy lookup & routing
Trust-Boundary Tagger
Provenance per token block
· user · system · tool
· retrieved content (untrusted)
· Per-source confidence
· ACL / sensitivity labels
Adversarial Defense
Resist obfuscated attacks
· Hidden / steganographic text
· Image / OCR injections
· Audio whisper attacks
· Encoded payload decoder
Consent & Residency
Honor user / tenant policy
· Train-on-data flags
· Region pinning
· Retention TTL
· Right-to-be-forgotten
G · Prompt Compilation & Caching
Assemble the final messages: layered, schema-aware, cache-friendly, and provenance-preserving
Template Engine
Layered system / persona
Skill prompts · few-shot
Per-tenant overrides
Tool / Schema Binder
JSON-schema · grammars
Function signatures
Argument hints & types
Cache-Key Planner
Stable prefix layout
cache_control breakpoints
TTL · invalidation rules
Multimodal Packer
Interleave text · img · audio
Captions for non-text blocks
Inline vs reference attach
Token Budgeter / Truncator
Section-aware truncation
Lossy summary fallback
Reserve for completion
Provenance Annotator
Source IDs per snippet
Trust labels carried
Citation hooks
H · Routing Hints & Quality Signals
Annotate the task with hints the Orchestrator can use to choose models, agents, and policies
Complexity Estimator
Easy / standard / hard
Reasoning depth hint
Multi-step likelihood
Risk & Sensitivity Class
Reversibility · scope
Regulated-data flag
HITL recommendation
Model Routing Hint
Haiku / Sonnet / Opus
Specialist vs generalist
Cost / latency target
Confidence Scoring
Per slot / entity
Calibrated thresholds
Trigger clarification
Locale & Persona Hint
Output language
Tone / formality
Domain persona
SLA & Budget Hints
Latency target
Token / $ ceiling
Stop conditions
I · Observability, Telemetry & Feedback
Every step emits traces, metrics, and signals consumed by Layer 11 (Governance) and the Reflection loop
OTel spans · per-stage
Latency / token / cost meters
Classifier confidence logs
Drift / anomaly detection
Audit log · signed
Eval & Reflection feedback
⇣ Handoff — Structured Task Bundle to Layer 3 · Orchestration & Planning
Compiled prompt · tool catalog · task schema · grounded references · routing & risk hints · context budget · trace / provenance
objective & sub-goals
grounded entities
retrieved context (provenance)
candidate tools / skills
model / risk / SLA hints
trust-tagged compiled prompt + cache plan
Ingestion / Handoff
Encoders / Compilation
Understanding / Routing
Grounding
Context Assembly
Safety & Privacy
Observability
Forward flow
Clarification back to user
Feedback / drift signal
Detailed view of Layer 2 — Perception & Input Processing from the Agentic AI System Architecture reference.
Inputs flow top-down from Layer 1's request envelope through ingestion, multimodal encoding, language understanding, grounding, context assembly, safety filtering, prompt compilation, and routing-hint generation, before being handed off as a structured task bundle to Layer 3 (Orchestration & Planning).
Layer 3 Orchestration, Planning & Control
Agentic AI System Architecture › Layer 3 Detail
Orchestration, Planning & Control
The control plane of the agent — turns the structured task into an executable plan, routes work to models, tools, and sub-agents, manages state and concurrency, enforces budgets and policy, and drives the agent loop until the goal is met or escalated.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Structured Task Bundle from Layer 2 (Perception & Input Processing)
objective · sub-goals · grounded entities · retrieved context · candidate tools · risk & SLA hints · trust-tagged compiled prompt + cache plan
Task Schema · Reference Graph · Tool Catalog · Routing Hints · Policy Constraints · Trace Context · Budget Envelope
A · Plan Generation & Decomposition
Translate the goal into a structured, executable plan — hierarchical, costed, and revisable
Goal Reasoner
Analyze objective & constraints
· Definition of Done
· Acceptance criteria
· Hard / soft constraints
· Implicit assumptions
Hierarchical Decomposer
Goal → tasks → steps
· HTN-style decomposition
· Dependency DAG
· Parallel vs serial annotate
· Per-step success checks
Plan Synthesizer
LLM-drafted & validated plan
· Schema-constrained output
· Tool / agent assignment
· Pre/post conditions
· Plan Mode preview to user
Plan Critic / Verifier
Sanity-check before execute
· Self-critique pass
· Policy / risk lookup
· Cost & latency estimate
· Counterfactual / what-if
Re-planner
Adapt plan during execution
· On error / observation
· Belief revision
· Partial-plan repair
· Backtrack / abandon
Plan Repository
Reusable workflow library
· Skill / recipe registry
· Versioned templates
· Distilled from past runs
· Org-shared playbooks
B · Reasoning & Control Strategies
Strategy library the orchestrator selects from based on task class, risk, and budget
ReAct
Thought → Act → Observe
Interleaved reasoning
Best for tool-use loops
Plan-and-Execute
Plan once, execute steps
Re-plan on failure
Predictable for long jobs
Tree / Graph of Thought
Branching exploration
Beam / MCTS · scoring
Hard reasoning problems
Reflexion / Self-Refine
Critic + retry loop
Lessons captured per run
Quality-sensitive tasks
Debate / Multi-Agent
Proposer vs critic
Voting / arbitration
High-stakes decisions
Direct / CoT / Skill-Triggered
Single-shot for simple tasks
CoT for medium reasoning
Pre-built skill / sub-graph fast-path
C · Agent Orchestrator — The Control Loop
Central state machine that drives the agent through observe → think → act → reflect cycles
Agent Orchestrator (Controller)
Finite-state / graph-based loop · LangGraph · Agent SDK · custom controllers
OBSERVE
THINK
DECIDE
ACT
REFLECT
Step / iteration counter · Stop conditions · Run-state checkpoints · Resume tokens
Run / Trajectory Store
Step log · tool I/O · scratchpad · checkpoints · resume token
Working / Scratchpad Memory
Live thought stream · intermediate facts · action history
Belief / World State
Known facts · pending unknowns · environment snapshot
Loop Controller
Max iterations · timeouts · stop / continue conditions
Stop Criteria Evaluator
DoD met · budget exhausted · escalate · user cancel
Checkpoint & Resume
Pause · serialize · long-running runs · cross-host resume
D · Router, Dispatcher & Tool Selection
Decide WHAT to call next: model, tool, sub-agent — and bind arguments
Skill / Tool Retriever
Top-k by intent + history
MCP server discovery
Skill cards loaded JIT
Model Router
Haiku / Sonnet / Opus tiers
Specialist SLMs · vendors
Quality / cost / latency mix
Sub-Agent Dispatcher
Researcher · Coder · Critic
A2A / MCP-client calls
Capability matching
Argument Binder
Schema-conformant args
Type coercion · defaults
Reference resolution
Pre-flight Validator
JSON-schema check
Dry-run / what-if
Side-effect prediction
Fallback Strategy
Alternate tool / model
Degraded-mode path
Ask-user fallback
E · Policy Engine & Action Gating
Decide if a chosen action is allowed, requires approval, or must be blocked
Permission Manager
RBAC / ABAC / scopes
Tool allow / deny lists
Per-tenant entitlements
Risk Classifier
Reversible · destructive
Blast radius estimate
Regulated-data flag
Prompt-Injection Guard
Confirm tool-driven
actions from content
Untrusted-source check
Policy-as-Code
OPA / Rego rules
Versioned · auditable
Tenant overrides
Action Approval
Auto · HITL · admin
Step-up authentication
Two-person rule
Compliance Filter
Region · residency
PII handling rules
Sector regulations
F · Multi-Agent Coordination & Concurrency
When the plan requires multiple agents — coordination patterns, communication, and consensus
Coordination Patterns
Topology selector
· Supervisor / hierarchical
· Swarm / blackboard
· Pipeline / staged
· Debate / proposer-critic
· Contract net
Agent Spawn Manager
Lifecycle · isolation
· Sub-agent factory
· Sandboxed contexts
· Inherited permissions
· Per-agent token budget
· Deadline propagation
Inter-Agent Bus
Messages & shared state
· A2A · MCP · ACP
· Shared scratchpad / KV
· Pub/sub topics
· Signed handoff envelopes
· Trace propagation
Concurrency Manager
Parallel · fork / join
· DAG runner
· Map-reduce / fan-out
· Race & first-win
· Cancellation propagation
· Deadlock detection
Consensus & Arbitration
Aggregate sub-agent output
· Voting / majority
· Weighted by confidence
· Judge / referee agent
· Tie-breakers · fallbacks
· Conflict resolution
Roles & Personas
Specialist agent registry
· Researcher · Planner
· Coder · Reviewer
· Critic · Verifier
· Domain experts
· Tool persona templates
G · Scheduling, Budget & Resilience
Make agent runs predictable, bounded, and recoverable under load and failure
Scheduler & Queue
Priorities · fair-share
Delayed · cron-driven
Per-tenant queues
Budget Manager
Tokens · steps · $ · time
Per-task & per-run caps
Soft / hard limits
Rate & Concurrency Limiter
Per model / tool / org
Token-bucket backoff
Adaptive throttling
Retry & Backoff
Exponential · jittered
Idempotency keys
Poison-message handling
Circuit Breakers
Per tool / model / agent
Open · half-open · closed
Health-check probes
Cost Optimizer
Cache-aware ordering
Cheaper-model first
Early-stop heuristics
Error & Recovery Manager
Classify (retryable · permanent · policy) · compensating actions
Saga / rollback · transactional groups · poisoning detection
Failure → re-plan · escalate · graceful degrade
Loop Safety
Max steps · max depth · runaway detection
Cycle detection (revisited state) · diversity bonus
Watchdog · liveness probes · hard kill
Durable Execution
Workflow engines (Temporal · Cadence · Restate)
Replay-safe steps · deterministic checkpoints
Long-running runs · cross-host failover
H · Human-in-the-Loop & Steering
Pause, ask, approve, redirect — keep humans in control of risky or ambiguous moves
Approval Gate
Risky / irreversible action
Step-up auth · two-person
Clarification Manager
Ask follow-up questions
Slot-fill · disambiguation
Steering & Override
Pause · cancel · redirect
Modify plan mid-run
Plan Mode Preview
Show plan before execute
Diff before write
Escalation Router
Tier 1 / 2 / human expert
SLA-driven routing
Feedback Capture
Inline edits · ratings
Routes to Memory / Eval
I · Observability, Trace & Cross-Cutting
Every decision is traced, costed, and auditable; signals feed Layer 10 (Reflection) and 11 (Governance)
Trace & Span Emission
OTel · LangSmith · Langfuse
Per step / tool / agent
Cost & Token Meters
Per task / org / model
Streaming cost gauges
Decision Logs
Why this tool · why now
Plan diff history
Replay & Time-Travel
Re-run from checkpoint
Counterfactual debug
Anomaly & Drift
Tool-error spikes
Plan-shape regressions
Audit Log
Signed · immutable
Compliance evidence
⇣ Outbound — Coordinated Calls to Downstream Layers
The orchestrator dispatches typed calls to Reasoning, Memory, Tools, Knowledge, and Multi-Agent layers
→ Layer 4 · Reasoning
Compiled prompt · model · params
Tool catalog · stop tokens
→ Layer 5 · Memory
Read · write · update
Episodic / semantic deltas
→ Layer 6 · Tools
Schema-validated calls
Idempotency · deadlines
→ Layer 7 · RAG
Targeted retrievals
Citations required
→ Layer 8 · Multi-Agent
Sub-agent dispatch
A2A / MCP envelopes
↑ Layer 1 · User
Approvals · clarifications
Streaming partial output
Cross-cutting Policy, Safety & Telemetry
Cross-cutting Policy, Safety & Telemetry
Inbound / Orchestrator
Planning / Multi-agent
Strategies / HITL
Routing
Policy
Scheduling / Observability
Forward control flow
Re-plan / reflection loop
HITL back to user
Detailed view of Layer 3 — Orchestration, Planning & Control from the Agentic AI System Architecture reference.
The orchestrator drives the OBSERVE → THINK → DECIDE → ACT → REFLECT loop; planning, routing, policy, multi-agent coordination, and scheduling are coordinated services around it. All decisions emit traces and feed Reflection (Layer 10) and Governance (Layer 11).
Layer 4 Reasoning Core — Foundation Models & Cognition
Agentic AI System Architecture › Layer 4 Detail
Reasoning Core — Foundation Models & Cognition
The cognitive engine of the agent — foundation models, extended thinking, tool-use, structured output, multimodal reasoning, self-reflection, adaptation, and the inference fabric that makes them fast, cheap, and reliable.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Inference Request from Layer 3 (Orchestration & Planning)
Compiled prompt · model preference · tool catalog · sampling params · structured-output schema · stop tokens · budget · trace context
messages[] · system · tools[] · response_format · temperature · max_tokens · cache_control · thinking_budget · stream
A · Model Selection & Routing Fabric
Pick the right model for the job — by capability, latency, cost, region, and trust
Capability Matcher
Intent → required skills
· Reasoning depth · vision
· Long-context · code · math
· Tool-use · structured output
Tier Router
Right-size by complexity
· Haiku → fast / cheap
· Sonnet → workhorse
· Opus → hardest reasoning
Cost / Latency Optimizer
SLA-aware selection
· $/1k token meter
· P50 / P95 latency targets
· Cache-hit aware
Region & Residency
Data-locality routing
· EU · US · APAC pinning
· On-prem / private VPC
· Sovereign-cloud routing
Vendor & Failover
Multi-provider abstraction
· Anthropic · OpenAI · Google
· Self-hosted OSS
· Health-check failover
Model Cascade
Cheap-first, escalate
· SLM → LLM escalation
· Confidence-gated retry
· Mixture-of-experts router
B · Foundation Model Pool
A heterogeneous fleet — frontier LLMs, multimodal LMMs, and small specialist models
Anthropic Claude Family
Frontier reasoning · agentic tool-use · long context
· Claude Opus 4.7 — deepest reasoning
· Claude Sonnet 4.6 — balanced workhorse
· Claude Haiku 4.5 — fast / low cost
Extended thinking · vision · tool-use · 200k+ context · prompt caching
Other Frontier LLMs
Multi-vendor coverage
· OpenAI GPT-5 / o-series
· Google Gemini 2.x
· xAI Grok
· DeepSeek · Qwen
· Mistral Large
Open-Weights / Self-Host
On-prem & sovereign
· Llama 4 / 5
· Qwen3 · DeepSeek-R
· Mistral · Mixtral
· Gemma · Phi
· Domain-tuned variants
Multimodal Models
Vision · audio · video
· VLMs (image + text)
· Speech-to-speech models
· Video-understanding LLMs
· Image-generation models
· TTS / ASR specialists
Specialist Small Models (SLMs)
Cheap, fast, narrow
· Embedders (BGE · E5 · Voyage)
· Re-rankers (cross-encoder)
· Classifiers (intent · safety · PII)
· Code models (Codex-style)
· Math / theorem provers
C · Cognitive Capabilities — How the Model Thinks
First-class capabilities the orchestrator can compose: reasoning, tool-use, reflection, and learning in-context
Extended Thinking
Private reasoning tokens
· Reasoning scratchpad
· thinking_budget control
· Visible vs hidden CoT
· Plan-before-act
· Multi-step decomposition
· Self-consistency / voting
· Long-horizon arithmetic
Tool Use / Function Calling
Bridge to the world
· Schema-constrained args
· Parallel tool calls
· Tool selection & chaining
· Tool-result integration
· Computer-use actions
· MCP-server tool calls
· Function-call streaming
Structured Output
Reliable machine-readable
· JSON Schema enforcement
· Pydantic / Zod models
· Regex / grammar-guided
· Type-safe SDK responses
· Citations & spans
· Field-level validation
· Retry on parse failure
Multimodal Reasoning
Beyond text
· Vision: docs · charts · UI
· Audio: transcribe + reason
· Video frame reasoning
· Code: AST + repo context
· Tables & spreadsheets
· Cross-modal grounding
· Generation across modes
Self-Reflection / Critic
Inner verification
· Reflexion · Self-Refine
· Generator + critic split
· LLM-as-Judge scoring
· Self-consistency vote
· Confidence calibration
· Hallucination probes
· Verifier tool calls
In-Context Learning
Adapt without training
· Few-shot exemplars
· Skills / system cards
· Persona / style transfer
· Negative examples
· Long-context recall
· Demonstration learning
· Test-time compute scale
D · Inference & Decoding Controls
Knobs that shape the distribution and shape of generated tokens
Sampling Parameters
temperature · top-p · top-k
min-p · repetition penalty
seed for reproducibility
Constrained Decoding
Grammar / regex / GBNF
JSON-schema masking
Outlines · LMQL · XGrammar
Logit Biasing
Boost / suppress tokens
Stop sequences
Banned-phrase enforcement
Streaming & Stop Logic
SSE / event stream
Stop tokens · max_tokens
Mid-stream cancel
Speculative Decoding
Draft model + verify
Medusa / EAGLE heads
2-3× faster decode
Test-Time Compute
Best-of-N · majority vote
Tree search · MCTS
Verifier-guided search
E · Context & Caching Subsystem
Make long contexts fast and cheap — KV reuse, prompt caching, and attention efficiency
Prompt Cache
cache_control breakpoints
5-min TTL · 1-hour TTL
90% cost / latency cut
KV-Cache Manager
PagedAttention (vLLM)
Prefix sharing across reqs
Eviction · radix tree
Long-Context Handling
200k–2M tokens
Chunk · map-reduce · skim
Needle-in-haystack tuning
Attention Efficiency
FlashAttention 3
Sliding-window · sparse
Linear / state-space hybrids
Compaction / Summarize
Auto-compact threshold
Recursive summarization
Token-budget reclaim
Response Cache
Semantic cache (embed)
Idempotent re-runs
Read-through · TTL
F · Adaptation & Customization
Specialize the base model to your domain — prompts, parameter-efficient tuning, full fine-tunes, and preference optimization
Prompt Engineering
Lightweight, no training
· System / persona design
· Few-shot exemplar library
· Skill / sub-prompt files
· Auto-prompt optimization
· DSPy compilation
PEFT — LoRA / Adapters
Parameter-efficient tuning
· LoRA · QLoRA · DoRA
· Prefix / prompt tuning
· Adapter fusion
· Per-tenant / per-task
· Hot-swap at inference
Supervised Fine-Tune
SFT on curated data
· Instruction tuning
· Domain-corpus continued
· Tool-use distillation
· Rejection-sampled SFT
· Curriculum & staging
Preference Optimization
Align to human / AI prefs
· RLHF · PPO
· DPO · IPO · KTO
· RLAIF (constitutional AI)
· Reward modeling
· GRPO · process rewards
Distillation & Compression
Smaller, cheaper, faster
· Teacher → student
· Quantization (INT8/4)
· Pruning · sparsity
· Speculative draft training
· Edge-deploy variants
Continual Learning
Improve from production
· Trace mining
· Feedback → SFT data
· Self-play / synthetic
· Catastrophic-forget guard
· Online eval gating
G · Inference Engine & Serving Fabric
High-throughput, low-latency execution: schedulers, batching, kernels, accelerators
Serving Runtimes
Production model servers
· vLLM · SGLang
· TensorRT-LLM
· TGI · llama.cpp
· Triton Inference Server
· Hosted (Anthropic / OpenAI)
Batching & Scheduling
Throughput optimization
· Continuous batching
· Chunked prefill
· Disaggregated P/D
· Priority queues
· SLO-aware scheduling
Compute & Accelerators
Hardware substrate
· NVIDIA H100 / B200
· Google TPU v5p / v6
· AMD MI300 · Trainium
· Groq · Cerebras · SambaNova
· Edge / mobile NPUs
Distributed Inference
Scale beyond one node
· Tensor parallelism
· Pipeline parallelism
· Expert parallelism (MoE)
· Sequence parallelism
· NCCL / RDMA fabric
Optimized Kernels
Squeeze more per token
· FlashAttention / FA3
· Fused MLP / RMSNorm
· Triton / CUDA kernels
· FP8 / INT4 GEMM
· Compiler stacks (XLA, Mojo)
Quantization & Deployment
Tradeoff quality vs cost
· FP16 · BF16 · FP8
· INT8 · INT4 · AWQ · GPTQ
· Weight streaming
· Multi-tenant serving
· Cold-start optimization
H · Output Processing & Validation
Parse, validate, and certify the model's response before returning to the orchestrator
Token / Logprob Stream
SSE chunks · partials
Confidence per token
Tool-Call Parser
Extract function calls
Schema-validate args
Structured Output Verifier
JSON / Pydantic check
Auto-repair on failure
Citation & Span Extractor
Source links · char ranges
Provenance carryover
Hallucination Probes
NLI · entailment · self-check
Cross-source verifier
Confidence Calibrator
Temperature scaling
Score normalization
I · Safety, Telemetry & Governance
Cross-cutting controls — model-level guardrails, traces, evals, and lifecycle
Output Safety Filters
Toxicity · jailbreak · PII
Refusal classifier
Watermarking & Provenance
SynthID · token traces
C2PA content credentials
Inference Telemetry
TTFT · TPOT · tokens/s
Cache hit-rate · cost
Eval & Regression Suite
Offline + online evals
Capability benchmarks
Model Lifecycle
Versioning · canary · rollback
Deprecation policy
Capability Gating
RSP · ASL tiers
Red-team gated release
⇣ Outbound — Inference Result Bundle to Layer 3 (Orchestration)
Returned in a single shape regardless of model — text · tool calls · structured object · citations · usage · trace
Generated Text
Streamed or batched
Stop reason annotated
Tool Calls
Validated args
Parallel-call list
Structured Object
JSON / Pydantic
Schema-conformant
Reasoning Trace
Thinking tokens
Self-critique notes
Citations & Confidence
Source spans · scores
Calibrated uncertainty
Usage & Trace
Tokens · cost · latency
Cache stats · trace_id
Cross-cutting Safety, Eval & Lifecycle
Cross-cutting Safety, Eval & Lifecycle
Inbound / Outbound · Routing
Foundation Models
Cognitive Capabilities
Decoding / Output
Caching / Inference Engine
Adaptation
Safety & Lifecycle
Forward inference flow
Reflection / continual learning
Detailed view of Layer 4 — Reasoning Core: Foundation Models & Cognition from the Agentic AI System Architecture reference.
Inference requests flow from Layer 3 through model routing, the foundation-model pool, cognitive capabilities (extended thinking, tool-use, structured output, multimodal, reflection, ICL), decoding/cache controls, the adaptation stack, and the inference engine, returning a typed result bundle to the orchestrator. Telemetry & reflection signals feed Layers 10 (Reflection) and 11 (Governance).
Agentic AI System Architecture › Layer 5 Detail
Memory Subsystem
Multi-tier memory that gives the agent continuity, personalization, and learning across turns, sessions, and lifetimes — working, episodic, semantic, and procedural memory backed by vector, graph, key-value, and document stores, with a memory manager that reads, writes, consolidates, and forgets.
Detailed Diagram · v1.0 · 2026
⇄ Memory Operations from Layer 3 (Orchestrator) and Layer 4 (Reasoning Core)
read · write · upsert · update · forget · consolidate · search · subscribe — scoped to user, project, tenant, agent
read(query, scope, k)
write(item, type, scope, ttl)
update(id, patch, evidence)
forget(scope · subject · GDPR)
consolidate(window)
subscribe(event)
A · Memory Manager — The Memory Control Plane
A unified API on top of heterogeneous stores — handles routing, scoping, consistency, and lifecycle
Memory Manager (Controller)
Single entry-point · scope resolution · ACL · transactions · cross-store fan-out
READ
WRITE
UPDATE
FORGET
CONSOLIDATE
Scope Resolver
user · project · org · agent · global
Access Control
RBAC / ABAC · row-level · ACL filters
Routing & Sharding
Pick store by type / size / region
Consistency & Txn
eventual · read-your-write · 2PC
Conflict Resolver
Recency · evidence-weighted merge
Versioning & Audit
Provenance · immutable history
B · Memory Types — A Cognitive-Inspired Taxonomy
Specialized memory tiers, each with its own write triggers, retrieval pattern, and lifetime
Working / Context Memory
Live conversation buffer
· Current turn / run state
· Tool I/O scratchpad
· Compaction summaries
· Pinned items
· Lifetime: minutes–hours
· Storage: in-context · KV
Volatile · session-scoped
Episodic Memory
"What happened when"
· Past sessions / runs
· Trajectories & outcomes
· Time-stamped events
· User interactions log
· Lifetime: weeks–years
· Storage: vector + KV
Persistent · timeline-ordered
Semantic Memory
Facts & concepts
· User profile · preferences
· Project / domain knowledge
· Entities · relations · taxonomy
· Distilled from episodes
· Lifetime: long / permanent
· Storage: KG + vector
Persistent · timeless
Procedural Memory
Skills · workflows · "how"
· Reusable tool sequences
· Skill / recipe library
· Plan templates
· Voyager-style distillation
· Lifetime: long · versioned
· Storage: doc + repo
Executable artifacts
Affective / Persona Memory
User mood · style · trust
· Communication style
· Tone · formality
· Frustration / engagement
· Relationship trust score
· Lifetime: rolling
· Storage: KV / profile
Personalization layer
Shared / Org Memory
Cross-user knowledge
· Team playbooks
· CLAUDE.md / repo notes
· Curated FAQs
· Lessons learned
· Lifetime: long · governed
· Storage: docs + KG
Org-shared knowledge
C · Storage Backends — Polyglot Persistence
Use the right database for each access pattern; the manager hides which is which
Vector Stores
Semantic similarity search
· pgvector · Pinecone
· Weaviate · Qdrant
· Milvus · Chroma · LanceDB
· HNSW · IVF · DiskANN
· Quantization · binary
Knowledge Graphs
Entities · relations · paths
· Neo4j · ArangoDB
· Memgraph · NebulaGraph
· RDF · SPARQL stores
· GraphRAG-friendly
· Property + temporal edges
KV / Cache
Hot, fast, simple
· Redis · KeyDB · Dragonfly
· DynamoDB · Cosmos DB
· Memcached
· TTL · LRU eviction
· Pub/sub for invalidation
Document Stores
Rich nested objects
· MongoDB · Couchbase
· Firestore · Elastic
· OpenSearch (BM25)
· JSONB / Postgres
· Object storage (S3, R2)
Relational / OLTP
Strong consistency, joins
· Postgres · MySQL
· CockroachDB · Spanner
· Schema-validated facts
· Audit / version tables
· Row-level security
Time-Series & Event
Append-only timelines
· TimescaleDB · InfluxDB
· Kafka · Pulsar · NATS
· Event-sourced runs
· CDC streams
· Replay-able trajectories
D · Encoding & Indexing Pipeline (Write Path)
Turn raw events into searchable, structured, deduplicated memory items
Capture & Normalize
From traces · turns · tools
Schema-canonical events
Stable IDs · timestamps
Chunking & Summarize
Semantic / sliding windows
Hierarchical summaries
Headline + body + facts
Importance Scorer
Should we remember this?
Surprise · novelty · utility
User-flagged · pinned
Embedding Pipeline
Dense + sparse vectors
BGE · E5 · Voyage · OpenAI
Multi-vector / ColBERT
Entity & Relation Extractor
Triples for the KG
Linker · canonicalizer
Coreference resolution
Index Builder
HNSW · IVF · BM25
Field metadata indexes
Async + batch builds
E · Retrieval & Recall (Read Path)
Surface the right memories at the right time, with provenance and freshness
Query Planner
Decide which stores / types
Multi-query expansion · HyDE
Query rewriting
Hybrid Search
BM25 + dense + KG hops
Reciprocal-rank fusion
Field filters · ACL filter
Re-ranker
Cross-encoder · LLM-rerank
Recency / freshness boost
Diversity (MMR)
Salience Scorer
Relevance · importance
Decay function (Ebbinghaus)
Per-user weighting
Provenance & Citation
Source IDs · timestamps
Confidence per item
Trust labels carried
Read Cache
Semantic / exact
TTL · invalidation hooks
Per-scope keys
F · Memory Lifecycle — Consolidation, Update, Forgetting
Memory must change: episodes get distilled into facts, stale knowledge gets revised, and what shouldn't persist must be removed
Consolidation
Episodic → Semantic
· Periodic distillation jobs
· LLM-based summarizer
· Pattern → general fact
· Sleep-cycle inspired
· Hot → warm → cold tiers
Update & Belief Revision
Keep facts current
· Newer evidence wins
· Contradiction detector
· Soft / hard updates
· Provenance preservation
· Conflict resolution policy
Forgetting / Deletion
Bounded growth, compliance
· TTL expiration
· Decay curves · LRU
· Right-to-be-forgotten
· User opt-out / opt-in
· Cascading delete (KG)
Reflection & Skill Distill
Episodes → procedures
· Voyager-style skills
· Lessons learned
· Reflexion notes
· Recipe extraction
· Promote to org memory
De-duplication
Avoid memory bloat
· Near-duplicate detection
· SimHash / embedding sim
· Merge duplicates
· Canonicalize entities
· Compaction passes
Tiering & Archival
Cost-optimized storage
· Hot RAM / NVMe
· Warm SSD
· Cold object store
· Glacier / deep archive
· Promote on access
G · Privacy, Security & Compliance
Memory holds the most sensitive long-lived data — protect, scope, and prove control
Encryption & Keys
At-rest · in-transit · in-use
· KMS / HSM-managed keys
· Per-tenant key isolation
· BYOK / HYOK options
· Confidential compute
Access Control & Scoping
Least-privilege everywhere
· Row / namespace ACL
· Tenant isolation
· Per-agent token scopes
· Cross-tenant leakage tests
PII & DLP
Detect, redact, vault
· PII classifier on write
· Tokenization vault
· Differential privacy
· Sensitive-field masking
Consent & Residency
Honor user intent & law
· Memory opt-in / opt-out
· Region pinning (EU/US/APAC)
· Train-on-data flags
· Retention policy enforcement
Right-to-Be-Forgotten
GDPR / CCPA / CPRA
· Subject-erasure request
· Cascade across stores
· Retraining-aware deletion
· Tombstones & receipts
Audit & Compliance
Every read & write traced
· Signed, immutable log
· SOC 2 · HIPAA · ISO 27001
· Data lineage graph
· Compliance dashboards
H · Operations, Observability & Quality
Make memory measurable, debuggable, and reliable in production
Telemetry
Read / write / hit-rate
Latency P50 / P95
Memory Health
Drift · staleness · bloat
Index integrity checks
Backup & DR
Snapshots · PITR
Cross-region replicas
Quality Evals
Recall@K · MRR · NDCG
A/B retrieval experiments
Cost Monitoring
Storage / IO / embedding $
Per-tenant chargeback
Schema Migration
Embedding model upgrades
Re-indexing pipelines
I · Personalization & Memory APIs
How other layers consume memory — typed, scoped, and traceable
Profile API
User · org · agent profile
Get / patch · merge logic
Search API
Semantic / hybrid query
Filtered & scoped
Skill / Recipe API
Procedural memory access
Versioned look-ups
Event Stream
Memory-changed events
Subscribe · webhook
Admin / DSAR API
Export · erase · audit
User self-service portal
Personalization Hooks
Inject context per request
Style · preferences · history
⇄ Cross-Layer Integrations
Memory is consumed by, and feeds, every neighboring layer
↔ Layer 2 · Perception
Profile · history selector
Few-shot retrieval
↔ Layer 3 · Orchestrator
Plan repository
Run trajectories
↔ Layer 4 · Reasoning
Working / scratchpad
Persona & style cues
↔ Layer 7 · RAG
Shared vector / KG indexes
Curated knowledge facts
↔ Layer 10 · Reflection
Lessons in · skills out
Trajectory mining
↔ Layer 11 · Governance
Audit · DSAR · policy
Compliance evidence
All exchanges are scoped, ACL-checked, traced, and logged through the Memory Manager.
Cross-cutting Privacy, Audit & Lifecycle
Cross-cutting Privacy, Audit & Lifecycle
Memory Manager / Lifecycle
Memory Types
Storage Backends
Write Pipeline
Read / APIs
Privacy & Compliance
Ops & Observability
Forward flow
Read path
Reflection / skill loop
Detailed view of Layer 5 — Memory Subsystem from the Agentic AI System Architecture reference.
All memory operations flow through a single Memory Manager that fans out to typed memory tiers (working, episodic, semantic, procedural, affective, shared) backed by polyglot stores. Write & read pipelines, a lifecycle for consolidation/update/forgetting, privacy & compliance controls, and observability surround the manager. Skill distillation feeds procedural memory back into the Reasoning Core (Layer 4) and Reflection (Layer 10).
Layer 6 Tools, Skills & Capabilities
Agentic AI System Architecture › Layer 6 Detail
Tools, Skills & Capabilities
Composable actions the agent can invoke through standardized interfaces — a registry of tools, MCP servers, and skills, fronted by a hardened gateway that handles auth, validation, sandboxing, retries, and observability for every external call.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Tool Call from Layer 3 (Orchestrator) / Layer 4 (Reasoning)
tool_name · arguments · principal · scopes · trace_id · idempotency_key · deadline · retry_policy · trust labels
{ tool: "get_pull_request", arguments: {...}, ctx: { trace_id, principal, scopes, deadline, idem_key, trust: "tool" } }
A · Tool Gateway — The Universal Adapter
Every tool invocation is normalized, authorized, validated, executed, and traced through this gateway
Tool Gateway / Skill Runtime
Single entry-point · spec resolution · auth · validation · invocation · result normalization
RESOLVE
spec / schema
AUTHORIZE
scopes · policy
VALIDATE
args · types
INVOKE
execute
SHAPE
result · trace
Schema Validation
JSON Schema · Pydantic · Zod
Auth & Scope Check
OAuth · OIDC · token vault
Policy Pre-flight
OPA / Rego · risk & HITL
Idempotency · de-dup · request signing
Result Normalizer
Stable schema · trim · redact
Retry & Backoff
Exponential · circuit breaker
Trace & Cost Emit
OTel spans · token / $ meters
Streaming results · pagination · partial outputs
B · Tool Registry, Specs & Discovery
A versioned catalog of available tools, MCP servers, and skills — what they do, how to call them, and who can use them
Spec Registry
Source of truth
· OpenAPI · JSON Schema
· MCP tool descriptors
· Examples · cost hints
· Side-effect labels
Discovery / Indexing
Find right tool fast
· Embedding-based retrieval
· Tag · category · capability
· MCP server enumeration
· JIT spec injection
Versioning & Lifecycle
Evolve safely
· SemVer per tool
· Canary · rollback
· Deprecation windows
· Backwards-compat tests
Capability Matcher
Plan → tools mapping
· Required vs optional
· Pre/post conditions
· Cost / latency profile
· Substitutable equivalents
Permissions Matrix
Who can call what
· Allow / deny lists
· Per-tenant overrides
· Risk-tier gating
· HITL-required flag
Marketplace
Distribution & sharing
· Internal tool hub
· Public MCP registry
· Signed publisher
· Reviews · ratings
C · Tool Categories — The Capability Surface
A catalog of what an agent can do — grouped by domain, each adapter conforming to the gateway's contract
Web & Browsing
Read the open internet
· Web search (Bing · Google · Brave)
· URL fetch · readability extract
· Browser agent (Playwright)
· Computer-use UI control
· Form fill · click · navigate
· Screenshot & DOM capture
· Headless · headful modes
· Crawl + sitemap traversal
· robots.txt & ToS aware
Code Execution
Compute · transform · test
· Python / Node / Bash REPL
· Code interpreter
· Notebook (Jupyter)
· Compiler / linter / formatter
· Test runner · fuzzers
· Build & package tools
· Container exec · SSH
· Static analysis · SAST
· Math / symbolic (SymPy)
File & Repo
Code & document operations
· Read · Write · Edit · Glob
· grep · ripgrep · ast-grep
· git · diff · patch · blame
· GitHub / GitLab / Bitbucket
· PR / commit / branch ops
· LSP symbols · tree-sitter
· Object storage (S3 · GCS · R2)
· File conversion (PDF · DOCX)
· Diff & merge tooling
External APIs
SaaS & partner systems
· REST · GraphQL · gRPC
· Webhooks (in & out)
· Stripe · Twilio · SendGrid
· Salesforce · HubSpot
· Google / Microsoft Graph
· OpenAPI auto-clients
· OAuth flow handler
· SDK adapters (Python · TS)
· Mock / sandbox endpoints
Data & Databases
Read & write structured data
· SQL (Postgres · MySQL)
· NoSQL (Mongo · Dynamo)
· Warehouse (BigQuery · Snowflake)
· Vector / KG queries
· Read-only safe-mode
· DDL gated by approval
· Query plan inspector
· dbt · Airflow runs
· CSV / Excel I/O
Communication
Reach humans & teams
· Email · SMS · Push
· Slack · Teams · Discord
· Calendar / meeting invite
· Voice call (Twilio)
· Pager / on-call (PagerDuty)
· Templates · approvals
· Localization aware
· Quiet-hours respect
· Send-rate caps
C · Tool Categories — Continued
Workflow integrations, AI specialists, knowledge access, computer use, and physical-world adapters
Workflow & PM
Tickets · docs · planning
· Jira · Linear · Asana
· Notion · Confluence
· Google Docs · Office 365
· CI/CD (GitHub Actions)
· Terraform · Ansible
· Status pages · runbooks
AI Specialist Models
Models exposed as tools
· Vision (OCR · detection)
· Speech (ASR · TTS · diarize)
· Image / video generation
· Translation · summarize
· Embedders · re-rankers
· Classifiers · NER · safety
Knowledge & Search
Internal & curated knowledge
· Vector / hybrid retrievers
· KG / GraphRAG queries
· Wikipedia · Wolfram
· Research (arXiv · PubMed)
· Internal wikis · runbooks
· Maps · weather · finance
Computer Use
Operate desktop / mobile
· Screen + keyboard + mouse
· OS-level automation
· Accessibility tree access
· VNC / RDP isolated VM
· Mobile emulator control
· Action recorder & replay
Enterprise Systems
Systems of record
· CRM · ERP · ITSM
· HRIS · billing · payroll
· Identity / IDM (Okta · AAD)
· Data lake / lakehouse
· SOC / SIEM · monitoring
· EHR · LIS (regulated)
Physical / IoT
Real-world actuation
· Robotics control APIs
· Sensor read · actuator
· Smart-home (Matter)
· Industrial PLC / SCADA
· Drone / vehicle telemetry
· Edge / on-device runtime
D · Skills — Composed, Reusable Capabilities
Higher-level building blocks: prompts + tools + sub-flows packaged as named, versioned skills
Skill Definition
SKILL.md · system prompt
Triggers · examples · args
Required tools manifest
Skill Composition
Sub-graphs · pipelines
Sequenced tool calls
Conditional branches
Trigger Engine
Auto-load on intent
Slash commands
Path / context match
Skill Library
Built-in · org · personal
Marketplace import
Versioned & signed
Distillation
Promote successful runs
Voyager-style learning
From procedural memory
Runtime Sandbox
Scoped tool subset
Per-skill budget
Isolated state
E · MCP Servers — The Standardized Tool Protocol
Model Context Protocol — open standard for exposing tools, resources, and prompts to any agent
Server Registry
Discover available servers
Local · remote · cloud
Capability negotiation
Transport & Session
stdio · SSE · WebSocket
JSON-RPC 2.0 framing
Bi-di streaming
Resources & Prompts
Files · URIs · templates
Sampling requests
Subscribe / notify
Server Catalog
GitHub · GitLab · Slack
Filesystem · DB · Search
Custom enterprise servers
Trust & Sandboxing
Per-server permissions
Signed publishers
Capability review
SDK & Hosting
Python · TS · Rust SDKs
Docker · serverless
Multi-tenant gateways
F · Execution Environment & Sandboxing
Where tools actually run — isolated, limited, observable, and recoverable
Sandbox Runtimes
Hard isolation per call
· Containers (Docker · OCI)
· MicroVMs (Firecracker)
· gVisor · Kata · WASM
· Browser-based VMs (E2B)
· Ephemeral · per-task
Resource Limits
Bound blast radius
· CPU · RAM · disk caps
· Wall-clock timeouts
· Egress allow / deny
· File-system quotas
· Process count limits
Network Policy
Egress & DNS control
· Domain allow-list
· No-egress mode
· Outbound proxy & logs
· Service-mesh mTLS
· Rate-limit per host
State & Persistence
Workspace lifecycle
· Scratch FS per run
· Persistent volumes
· Snapshot & restore
· Worktree isolation (git)
· Auto-cleanup TTL
Concurrency & Pooling
Throughput & warm starts
· Sandbox warm pool
· Per-tool concurrency cap
· Connection pooling
· Backpressure signaling
· Cold-start optimization
Adapters & Drivers
Speak each tool's protocol
· HTTP / gRPC / WS clients
· DB drivers · ODBC / JDBC
· SDK wrappers
· Protocol bridges
· Mock / replay drivers
G · Security, Trust & Risk Controls
Tool calls are the highest-risk surface — defend against injection, exfiltration, and over-permission
Secret & Token Vault
Short-lived credentials
· HashiCorp Vault · KMS
· OAuth token exchange
· Just-in-time credentials
· Rotate & revoke
Risky-Action Gate
Reversibility check
· Destructive ops require HITL
· Two-person rule
· Dry-run / what-if
· Step-up auth
Injection Defense
Trust-boundary aware
· Quarantine tool results
· No instruction-following
· Spotlighting / delimiters
· SSRF / SQLi guards
DLP & Egress Filter
Block exfiltration
· Outbound PII scan
· Secret pattern detection
· Tenant-data scoping
· URL allow-listing
Anti-Abuse
Detect bad behavior
· Anomaly detection
· Quota / spike alarms
· Honeypot tools
· Auto-disable rogue agent
Compliance Hooks
Regulated tool use
· SOC 2 · HIPAA · PCI
· Region-bound tool routing
· Tool-level audit evidence
· Data-residency proofs
H · Reliability & Observability
Make every tool call diagnosable, replayable, and within SLO
Idempotency Keys
De-dup retried calls
Retries & Backoff
Jittered exponential
Circuit Breakers
Per tool / endpoint
Tracing & Spans
OTel · LangSmith
Caching
Result · semantic
Cost & Latency Meters
P50 / P95 · $ per call
Replay & Debug
Recorded I/O
SLO Tracking
Error budget burn
⇣ Outbound — Tool Result & Effect to Layer 9 (Action / Environment)
Normalized result · side-effect record · provenance · latency & cost · trust label
Result Object
Schema-conformant
Side-Effect Log
What changed
Provenance
Source · timestamp
Trust Label
"tool" untrusted
Compensations
Rollback hooks
Usage Stats
Tokens · $ · ms
Citations
URLs · refs
Trace ID
For replay
Cross-cutting Auth, Sandbox & Audit
Cross-cutting Auth, Sandbox & Audit
Tool Gateway
Registry / MCP
Tool Categories
Skills
Sandboxing
Security & Trust
Reliability / Observability
Forward call flow
Tool result return
Skill distillation
Detailed view of Layer 6 — Tools, Skills & Capabilities from the Agentic AI System Architecture reference.
All tool invocations flow through a single Tool Gateway that resolves specs from a versioned registry, enforces auth and policy, validates arguments, executes inside hardened sandboxes, and emits a normalized result with traces, costs, and side-effect logs. Skills and MCP servers extend the catalog with composable capabilities; security and observability wrap every call.
Layer 7 Knowledge & Retrieval (RAG)
Agentic AI System Architecture › Layer 7 Detail
Knowledge & Retrieval (RAG)
Ground the agent in fresh, verifiable knowledge — connectors, ingestion, embeddings, indexes, hybrid retrieval, advanced RAG patterns, faithfulness checking, and citation-aware delivery — turning raw sources into trusted, traceable context.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Retrieval Request from Layer 2 (Perception) / Layer 3 (Orchestrator) / Layer 4 (Reasoning)
query · intent · scopes (user/tenant/project) · ACLs · k · filters · freshness · trace_id · budget · response shape
retrieve(query, scope, k=20, filters={recency, ACL, source}, mode=hybrid, freshness=24h, with_citations=true)
A · Knowledge Sources & Connectors
All authoritative knowledge surfaces — internal, external, structured, unstructured — flow in through governed connectors
Internal Docs & Wiki
Tribal knowledge
· Confluence · Notion
· SharePoint · Coda
· Google Docs · Drive
· Quip · Bear · Obsidian
· Internal handbooks
· Onboarding guides
Code & Repos
Source-grounded answers
· GitHub · GitLab · Bitbucket
· Source files + symbols
· README · CLAUDE.md
· PRs · issues · discussions
· Commit history
· API docstrings
Tickets & Runbooks
Operational know-how
· Jira · Linear · ServiceNow
· Zendesk · Freshdesk · HelpScout
· PagerDuty post-mortems
· Runbooks · playbooks
· Incident timelines
· Change requests
Communications
Conversational record
· Slack · Teams · Discord
· Email (Gmail · O365)
· Meeting transcripts
· Chat threads
· Customer call notes
· Forum posts
Structured Data
Systems of record
· OLTP / SQL DBs
· Warehouses (Snowflake · BQ)
· CRM · ERP · ITSM
· Data lakes / lakehouses
· APIs (Text-to-SQL, NL2API)
· CSV / spreadsheets
External / Public Web
Open knowledge
· Live web search
· Crawled domains
· Wikipedia · Wikidata
· arXiv · PubMed · SSRN
· News · regulatory feeds
· Industry datasets
B · Ingestion & Document Processing Pipeline
From raw source to clean, chunked, enriched documents — the foundation of retrieval quality
Connectors & Loaders
Ingest from each source
· OAuth-scoped access
· Full crawl + delta sync
· CDC / change feeds
· Webhook event push
· Permission propagation
· Source provenance tags
Parsing & Extraction
Get the structure right
· Layout-aware PDF (Unstructured)
· DOCX · PPTX · XLSX
· HTML cleanup · readability
· OCR (Tesseract · docTR)
· Table & figure extraction
· Audio / video transcription
Cleaning & Dedup
Reduce noise & bloat
· Boilerplate stripping
· Near-duplicate (SimHash)
· Encoding normalization
· Language detection
· Quality score · spam filter
· Translation (optional)
Chunking Strategies
Right-size for retrieval
· Fixed token / overlap
· Semantic / sentence split
· Hierarchical (parent / child)
· Markdown heading-aware
· Code: AST-based split
· Late-chunking with context
Metadata Enrichment
Make filtering precise
· Source · author · date
· ACL · sensitivity tags
· LLM-generated summary
· Auto-generated questions
· Entities · keywords · topics
· Section path · breadcrumbs
Sync & Freshness
Keep the index live
· Incremental updates
· Tombstones · soft delete
· Re-index on schema change
· Embedding-model upgrade
· Backfill orchestration
· DLQ · failed-doc replay
C · Embeddings & Index Construction
Multiple complementary indexes — dense, sparse, graph — for hybrid retrieval
Embedding Models
Multi-model strategy
· OpenAI · Voyage · Cohere
· BGE · E5 · GTE · Jina
· Multimodal (CLIP · SigLIP)
· Multi-vector (ColBERT · Late)
· Matryoshka (truncatable)
· Domain fine-tunes
Vector Indexes
Scalable ANN search
· pgvector · Pinecone
· Weaviate · Qdrant · Milvus
· Vespa · Turbopuffer
· HNSW · IVF · DiskANN
· PQ · binary quantization
· Per-tenant namespacing
Sparse / Lexical Index
Exact-match recall
· BM25 / Okapi
· Elasticsearch · OpenSearch
· Tantivy · Quickwit
· SPLADE · uniCOIL learned-sparse
· Token n-grams · synonyms
· Field boosts · phrase
Knowledge Graph
Relations & multi-hop
· Entity / relation extraction
· Neo4j · ArangoDB · Memgraph
· RDF · SPARQL stores
· Community detection (GraphRAG)
· Schema · ontology
· Temporal & provenance edges
Metadata & Filter Index
Pre-filter at scale
· ACL bitmap / posting list
· Date / numeric ranges
· Source / type facets
· Geospatial (R-tree · S2)
· Tenant / workspace shard
· Field-level tokenizers
Index Operations
Build · update · evolve
· Async batch builds
· Streaming upserts
· Blue-green re-index
· Snapshot & restore
· Compaction · vacuum
· Per-store sharding
D · Query Understanding & Expansion
Turn the raw user / agent query into multiple, well-formed search inputs that hit the right indexes
Query Rewriting
Make queries searchable
· Pronoun resolution
· History-aware rewriting
· Synonym expansion
· Acronym expansion
· Spell / typo correction
Multi-Query & HyDE
Cover the answer space
· LLM generates N variants
· Sub-question decomposition
· HyDE: hypothetical doc
· Step-back prompting
· Translate query to index lang
Routing & Filtering
Pick the right haystacks
· Source classifier
· Index router (per query)
· Metadata filters (ACL · date)
· Tenant / project scope
· Mode select (text · KG · SQL)
E · Retrieval Engine — Hybrid Search & Re-ranking
Run parallel searches across stores, fuse, re-rank, and diversify into a final candidate set
Hybrid Searcher
Dense + sparse + KG
· Parallel index queries
· KG multi-hop expansion
· Score normalization
· Reciprocal-Rank Fusion (RRF)
· Per-source weights
Re-ranker
Boost real relevance
· Cross-encoder (BGE · Cohere)
· LLM-as-rerank (listwise)
· Recency / freshness boost
· Authority / source weight
· Click / engagement signals
Diversify & Compress
Pack the best signal
· MMR diversification
· Cluster & pick
· Extractive snippeting
· Map-reduce summarize
· Token-budget enforce
F · Advanced RAG Patterns
Move beyond single-shot RAG: agentic, corrective, graph, and multi-hop strategies
Agentic RAG
Retrieval as a tool
· Iterative retrieve + reason
· Decide when / what to fetch
· Multi-step exploration
· ReAct-style retrieval
· Tool-using sub-agents
Self-RAG / Self-Critique
Retrieve only when needed
· Need-retrieval classifier
· Self-reflection tokens
· Critique & revise
· Score support per claim
· Skip when high confidence
Corrective RAG (CRAG)
Recover from bad retrieval
· Retrieval-quality grader
· Fall back to web search
· Re-query with new terms
· Decompose & recombine
· Abstain when unsure
GraphRAG
Multi-hop & community
· Entity graph from corpus
· Community summaries
· Local + global search
· Path / hop reasoning
· Schema-guided retrieval
Multi-Hop & Decomposition
Answer compound queries
· Sub-question retrieval
· Iterative refinement
· Evidence chaining
· Plan-and-retrieve
· FLARE active retrieval
Hierarchical & RAPTOR
Tree-of-summaries
· Cluster & summarize tree
· Parent-child retrieval
· Coarse-to-fine drill-down
· Section · doc · corpus levels
· Long-corpus efficient
G · Faithfulness, Citations & Hallucination Control
Make answers verifiable — every claim grounded in a source the user can check
Provenance Tracker
Lineage end-to-end
· Doc · chunk · char-span IDs
· Author · timestamp
· Source URL · version
· Trust label per source
Citation Generator
Inline source links
· Sentence-level cites
· Span-level highlighting
· Click-through-able URLs
· Bibliography assembly
Faithfulness Verifier
Does answer match sources?
· NLI / entailment scoring
· Claim → evidence map
· LLM-as-judge faithfulness
· Refuse on low support
Hallucination Probes
Catch ungrounded claims
· Cross-source verifier
· Self-check QA
· Numeric / fact extractor
· Confidence calibration
Conflict & Recency
When sources disagree
· Newer source preference
· Authority weighting
· Surface conflicts to user
· Disagreement marker
Abstention & Refusal
Know when to say "I don't know"
· No-evidence threshold
· Out-of-scope detector
· Suggest follow-up
· Human escalation hook
H · Governance, ACL, Privacy & Compliance
Retrieved knowledge inherits source permissions and policy — never expose more than the user is allowed to see
ACL Propagation
Source perms → index
· User · group · folder ACL
· Live permission lookup
· Pre-filter at search time
PII / DLP
Detect & redact
· Sensitive-field masking
· Tokenization vault
· Sector-specific policies
Residency & Sovereignty
Region-bound data
· Index per region
· Geo-sharded retrieval
· Sovereign-cloud routing
Source Trust Tags
Untrusted by default
· "external content" label
· No-instruction-follow rule
· Spotlighting / delimiters
Audit & Lineage
Who saw what, when
· Query logs (signed)
· Result lineage graph
· DSAR · evidence pack
Retention & TTL
Bounded shelf-life
· Per-source TTL
· Right-to-be-forgotten
· Tombstones cascade
I · Operations & Retrieval Quality
Make RAG measurable, debuggable, and continuously improving
Retrieval Evals
Recall@K · MRR · NDCG
RAG Metrics
Faithfulness · context P/R
A/B Experiments
Embedding · chunk · prompt
Drift & Anomaly
Stale index · empty results
Cost & Latency
P50 · P95 · $ per query
Caching
Embed · query · result
Feedback Loop
👍 / 👎 · click-through
Eval Datasets
Golden Q/A · regression
⇣ Outbound — Grounded Context to Layer 4 (Reasoning) / Layer 3 (Orchestrator)
Ranked passages · citations · faithfulness scores · trust labels · usage stats — ready for prompt assembly
Passages[]
id · text · score
Citations
URLs · spans · authors
Faithfulness
Per-claim score
Trust Labels
Untrusted content
Coverage Report
Gaps · conflicts
Usage Stats
Tokens · ms · $
Trace ID
Replay key
Abstain Flag
Low coverage signal
Cross-cutting ACL, Audit & Compliance
Cross-cutting ACL, Audit & Compliance
Knowledge Sources
Ingestion / Operations
Embeddings / Indexes
Query Understanding
Retrieval / Faithfulness
Advanced RAG Patterns
Governance & Compliance
Forward retrieval flow
Grounded context return
Corrective & feedback loops
Detailed view of Layer 7 — Knowledge & Retrieval (RAG) from the Agentic AI System Architecture reference.
Knowledge flows top-down from heterogeneous sources through ingestion, polyglot indexes, query understanding, hybrid retrieval, advanced RAG patterns, and faithfulness checking, returning trust-labeled grounded context with citations to the Reasoning Core. Corrective & feedback loops continuously improve retrieval quality.
Layer 8 Multi-Agent Collaboration
Agentic AI System Architecture › Layer 8 Detail
Multi-Agent Collaboration
Specialized agents cooperating, debating, and verifying each other's work — coordination patterns, agent roles, communication protocols, lifecycle management, consensus, and trust controls that turn a swarm of agents into a reliable team.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Delegation Request from Layer 3 (Orchestrator)
complex goal · sub-task DAG · required roles · budget · deadline · trust scope · coordination preference · result schema
spawn_team({ goal, pattern: "supervisor", roles: ["researcher", "coder", "critic"], budget, deadline, trust_scope })
A · Coordination Patterns & Topologies
Pick the right organizational shape for the task — each pattern trades autonomy, parallelism, and quality differently
Supervisor / Hierarchical
Top-down delegation
· Single supervisor agent
· Specialist sub-agents
· Predictable accountability
· Easy to debug
· Best for clear goals
Pipeline / Sequential
Stage-by-stage handoff
· Each agent → next stage
· Strict ordering
· Schema-checked transitions
· Ideal for ETL / workflows
· Easy retry per stage
Debate / Dialectic
Adversarial verification
· Proposer vs critic
· Multi-round argument
· Judge / referee agent
· Reduces hallucinations
· High-stakes decisions
Swarm / Peer
Decentralized cooperation
· No central authority
· Hand-off-capable peers
· Emergent specialization
· Resilient · scalable
· Open-ended exploration
Blackboard
Shared workspace
· Common knowledge state
· Agents read & write
· Trigger on changes
· Loose coupling
· Heterogeneous experts
Contract Net
Bid-based assignment
· Manager broadcasts task
· Workers bid · capability
· Best bid wins · contract
· Marketplace dynamics
· Cross-org agents
B · Supervisor / Coordinator — The Team Leader
Spawns the team, distributes work, monitors progress, aggregates results, and decides when the team is done
Multi-Agent Coordinator
Pattern selector · agent factory · message broker · result aggregator
PLAN
team & tasks
DELEGATE
assign work
MONITOR
progress
AGGREGATE
merge results
CONCLUDE
finalize
Pattern Selector
Choose topology by task
Team Composer
Pick roles · models · skills
Budget Allocator
Tokens · time · $ per agent
Progress Tracker
Per-agent status & SLA
Termination Manager
When the team is "done"
Escalation Hooks
HITL · admin · stop-the-team
C · Agent Roles & Specialist Personas
A library of role templates — each with its own system prompt, tools, model, memory scope, and quality bar
Planner / Architect
High-level decomposition
· Goal → sub-goals
· DAG / step graph
· Risk & budget plan
· Strong reasoning model
· Plan repository access
· Re-plan on failure
Researcher
Find & synthesize info
· Web · KB · RAG access
· Citation discipline
· Multi-source synthesis
· Long-context model
· Read-only tool scope
· Coverage / gap reports
Coder / Builder
Write & test code
· Edit · run · test
· Debug · refactor
· Sandboxed exec scope
· Repo & PR tools
· Code-tuned model
· Writes worktree branch
Critic / Reviewer
Verify & challenge
· Independent context
· Rubric · checklist
· Score · pass / fail
· LLM-as-Judge persona
· No-write tool scope
· Red-team variant
Domain Experts
Vertical-specific knowledge
· Legal · medical · finance
· DevOps · security · QA
· Domain-tuned models
· Curated knowledge base
· Compliance-aware
· Regulated tool sets
Operator / Executor
Take action in the world
· Computer-use tools
· Browser / GUI control
· Enterprise app actions
· HITL gating
· Compensating actions
· Receipts & logs
D · Inter-Agent Communication Protocols
Standardized message formats and transports — typed, signed, traceable, and replay-safe
A2A Protocol
Agent-to-Agent
Capability-card discovery
Cross-vendor / org agents
MCP Client Calls
Tool-style sub-agent
Sub-agent as MCP server
Schema-typed I/O
ACP / Custom RPC
Internal team protocol
gRPC · JSON-RPC
Strong typing · streaming
Message Bus
Pub/sub · queues
Kafka · NATS · Redis Streams
Topic-per-conversation
Shared Scratchpad
Blackboard / KV
CRDT for concurrent edits
Subscribe to changes
Message Envelope
Standard wrapper
from · to · trace · sig
trust · replyTo · ttl
E · Agent Lifecycle & State Management
Spawn safely, scope tightly, checkpoint reliably, and tear down cleanly
Spawn Factory
Instantiate sub-agent
Role · model · tools · prompt
Inherited / scoped context
Permission Inheritance
Least-privilege subset
Token attenuation
Tool / data scope clamp
Isolated Runtime
Sandbox · worktree · VM
No leak between agents
Per-agent secrets
Checkpoint & Resume
Durable per-agent state
Pause · serialize · re-hydrate
Cross-host failover
Health & Watchdog
Liveness probes
Stuck-loop detection
Auto-respawn / abort
Graceful Termination
Drain · finalize · cleanup
Hand-off in-flight tasks
Audit log per-agent
F · Coordination Mechanics — How Work Gets Done
Decompose, assign, run in parallel, aggregate, and resolve conflicts — the core team plumbing
Task Decomposition
Break the goal apart
· HTN-style sub-tasks
· Dependency DAG
· Parallel-safe units
· Per-task DoD
· Roll-up criteria
Task Assignment
Match work to agent
· Capability matching
· Bidding (contract net)
· Load balancing
· Sticky / affinity routing
· Reassignment on failure
Concurrency Control
Run agents in parallel
· Fork / join · barriers
· Map-reduce / fan-out
· Race · first-win
· Cancellation propagation
· Deadlock detection
Result Aggregation
Merge sub-agent outputs
· Schema-aware merge
· Citation preservation
· De-dup · canonicalize
· LLM synthesizer agent
· Coverage report
Consensus & Voting
When agents disagree
· Majority / weighted vote
· Confidence-weighted
· Judge / referee agent
· Self-consistency check
· Tie-breakers · HITL
Conflict Resolution
Reconcile contradictions
· Recency · authority
· Evidence weighting
· Surface to user
· Escalate to expert
· Compensating undo
G · Trust, Identity & Security Across Agents
Multi-agent systems amplify both capability and attack surface — every message and handoff must be authenticated and bounded
Agent Identity
Signed, attestable agents
· Cryptographic agent ID
· Workload identity (SPIFFE)
· Signed capability cards
· Agent provenance ledger
Message Authentication
Tamper-evident envelopes
· Signed JWT / DPoP
· Replay-protection nonce
· Origin agent verified
· mTLS on transport
Cross-Agent Injection
Treat peer output as untrusted
· No instruction-following
· Quarantine peer messages
· Trust labels carried
· Spotlighting / delimiters
Permission Attenuation
Sub-agents get less, never more
· Token down-scoping
· Read-only by default
· Tool allow-lists
· Data-scope clamping
Rogue-Agent Defense
Catch & isolate misbehavior
· Anomaly detection
· Loop / abuse detector
· Auto-quarantine
· Kill-switch · admin alert
Privacy & Data Boundaries
Don't leak between agents
· PII redaction in handoffs
· Tenant-scoped contexts
· Need-to-know filtering
· Cross-tenant blocks
H · Multi-Agent Frameworks & Runtimes
Production-ready stacks for orchestrating teams of agents
LangGraph / Agent SDK
Graph-based agent flows
Stateful · checkpointable
Anthropic / LangChain
CrewAI
Role-based crews
Tasks · processes · tools
Hierarchical / sequential
AutoGen / Magentic
Conversation-driven
Group chat patterns
Microsoft
OpenAI Swarm / Agents
Lightweight handoffs
Tool-driven routing
Stateless agents
Durable Execution
Temporal · Cadence · Restate
Replay-safe orchestration
Long-running multi-agent
Custom Runtimes
Bespoke controllers
Actor model · Akka · Ray
Ray Serve · Dapr
I · Observability & Multi-Agent Telemetry
Trace every agent, every message, every handoff — across the whole team
Distributed Tracing
OTel spans across agents
Conversation graph view
Per-agent sub-trace
Cost & Token Roll-up
Per agent · per team · total
Budget burn tracking
Top-spender attribution
Conversation Replay
Step through messages
Time-travel debug
Counterfactual re-runs
Team Health Metrics
Throughput · success rate
Per-role error rates
Idle / loop detection
Decision Logs
Why this agent · this task
Vote / consensus history
Plan diffs across team
Audit & Compliance
Signed team transcripts
Per-agent attribution
Evidence for review
⇣ Outbound — Aggregated Team Result to Layer 3 (Orchestrator)
Synthesized answer · per-agent contributions · consensus / dissent · citations · cost · trace · escalations
Final Answer
Schema-conformant synthesis
Per-Agent Contributions
Attribution · diffs
Consensus / Dissent
Vote tallies · open issues
Citations & Evidence
Source spans · trust labels
Team Trace
Conversation graph · spans
Cost & Escalations
Tokens · $ · HITL flags
All artifacts are signed, traced, and attributable to the originating agent.
Cross-cutting Identity, Trust & Telemetry
Cross-cutting Identity, Trust & Telemetry
Coordination Patterns / Frameworks
Supervisor / Coordinator
Agent Roles
Communication Protocols
Lifecycle / Mechanics
Trust & Security
Observability
Forward team flow
Inter-agent messaging
Critique / re-plan loop
Aggregated result return
Detailed view of Layer 8 — Multi-Agent Collaboration from the Agentic AI System Architecture reference.
A single Coordinator picks a topology, spawns specialist roles, brokers signed messages over typed protocols, manages lifecycle & permissions, runs concurrent tasks, aggregates results with consensus, and emits a single team artifact back to the orchestrator. Trust controls and observability span every agent and every handoff.
Layer 9 Action & Environment Interface
Agentic AI System Architecture › Layer 9 Detail
Action & Environment Interface
Where agents take real-world effects — through digital and physical environments. Pre-flight validation, isolated execution, side-effect tracking, compensating actions, receipts, and a reversible record of every change the agent makes.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Action Request from Layer 6 (Tools) / Layer 3 (Orchestrator)
action_type · target environment · arguments · principal · scopes · idempotency_key · deadline · risk_class · approval_token · trace_id
execute({ env: "browser", action: "submit_form", args: {...}, risk: "medium", reversible: true, approval: "auto", deadline: 30s })
A · Pre-Flight Gate — Decide Whether the Action May Proceed
Verify scope, classify risk, dry-run side effects, secure approvals — refuse early when in doubt
Risk Classifier
How dangerous is this?
· Reversible vs destructive
· Blast radius estimate
· Public · private · regulated
· Money / identity impact
Authorization & Scope
Right to act here?
· OAuth / OIDC scope check
· Tenant / project boundary
· Token attenuation
· Per-environment ACL
Dry-Run / Simulation
What would happen?
· What-if effect preview
· Plan mode (read-only)
· Diff before write
· Sandbox replay
Approval Gate (HITL)
Human in the loop
· Auto-approve · prompt user
· Two-person rule for high-risk
· Step-up auth (MFA)
· Approval token signing
Idempotency & Dedup
Don't apply twice
· Idempotency-Key header
· Request fingerprint
· Replay-detection window
· De-dup ledger
Compliance Pre-Check
Regulatory & policy
· Region / residency
· GDPR / HIPAA / PCI
· Quiet-hours respect
· Quota / budget caps
B · Effector — Universal Action Dispatcher
A single, audited entry-point that translates intent into environment-specific commands
Action Effector / Environment Bridge
Resolve env adapter · acquire lease · execute · capture effect · emit receipt
RESOLVE
env adapter
LEASE
workspace · slot
EXECUTE
command
CAPTURE
effects
RECEIPT
sign · log
Adapter Registry
Per-environment drivers
Action Schema Validator
Typed args · invariants
Lease & Concurrency
Per-resource locks · queues
Result Normalizer
Stable shape · trim · redact
Receipt Signer
Cryptographic evidence
Telemetry Emitter
OTel span · cost · latency
C · Environment Targets — Where Actions Land
The catalog of environments the agent can manipulate — each with its own driver, capabilities, and risk profile
Computer Use
Operate desktop / mobile
· Mouse · keyboard · screen
· Click · type · scroll · drag
· Accessibility tree
· OS shortcuts · clipboard
· Window / app focus
· VNC / RDP isolated VM
· Mobile emulator (iOS · Android)
· Action recorder & replay
· Visual grounding / OCR
Browser Agents
Operate the web
· Navigate · click · fill
· DOM & ARIA selectors
· Form auto-fill
· File upload · download
· Headless · headful
· Playwright · Puppeteer
· Cookie · session vault
· robots.txt & ToS aware
· CAPTCHA detection (refuse)
Code Sandboxes
Execute & build
· Python · Node · Bash
· Containers (Docker · OCI)
· MicroVMs (Firecracker)
· gVisor · Kata · WASM
· Browser-side VMs (E2B)
· Notebook (Jupyter)
· Build pipelines · CI runs
· Test suites · benchmarks
· Ephemeral & persistent
Enterprise Systems
Systems of record
· CRM · ERP · ITSM
· HRIS · billing · finance
· Salesforce · ServiceNow
· SAP · Workday · NetSuite
· Data lake / lakehouse
· DevOps (CI/CD · IaC)
· SOC / SIEM · monitoring
· Identity / IDM (Okta · AAD)
· EHR · LIS (regulated)
Physical / IoT
Real-world actuation
· Robotics control APIs
· Sensors · actuators
· Smart-home (Matter · HA)
· Industrial PLC / SCADA
· Drone / vehicle telemetry
· Edge / on-device runtime
· ROS / OPC-UA bridges
· Safety interlocks · e-stops
· Geo-fenced operation
Output Channels
Reach humans & teams
· Notifications · push
· Email · SMS · voice
· Slack · Teams posts
· Git commits · PRs
· Tickets · Jira · Linear
· Reports · dashboards
· Pager / on-call
· Status pages
· Templates · approvals
D · Isolation, Sandboxing & Resource Governance
Bound the blast radius — every action runs in a constrained environment with enforced limits
Sandbox Runtimes
Hard isolation per action
· Containers (Docker · OCI)
· MicroVMs (Firecracker · QEMU)
· gVisor · Kata · WASM
· Browser-tab VMs (E2B)
· Ephemeral · per-task
Resource Limits
CPU · RAM · disk · time
· cgroups · ulimits
· Wall-clock timeouts
· File-system quotas
· Process count limits
· Per-tenant capacity
Network Policy
Egress & DNS control
· Domain allow-list
· No-egress mode
· Outbound proxy & logs
· Service-mesh mTLS
· Rate-limit per host
Workspace State
Per-action persistence
· Scratch FS per run
· Persistent volumes
· Snapshot & restore
· Worktree isolation (git)
· Auto-cleanup TTL
Secrets & Credentials
Just-in-time injection
· Vault / KMS / HSM
· Short-lived tokens
· OAuth on-behalf-of
· Rotated · scoped
· Memory-only injection
Concurrency & Pooling
Throughput & warm starts
· Sandbox warm pool
· Per-env concurrency cap
· Connection pooling
· Backpressure signals
· Cold-start optimization
E · Side-Effect Capture & Causality Tracking
Record exactly what changed in the world — for review, replay, and rollback
Effect Recorder
What did it do?
· Before / after diff
· Resource IDs touched
· DOM mutations · API calls
· FS writes · DB rows
· Network egress log
Causality & Lineage
Why did it happen?
· Trigger trace_id
· Plan step → action map
· Agent attribution
· Approval evidence
· Causal chain graph
Receipts & Evidence
Tamper-evident proof
· Signed action receipt
· Hash-chained log
· External system IDs
· Screenshot · DOM snapshot
· External provider receipt
Streaming Output
Live progress to user
· Stdout / stderr stream
· Progress events
· Partial-result emit
· Cancel signal listener
· Live screen capture
Output Sanitization
Make output safe
· PII / secret redact
· Size truncation
· Trust label tagging
· Schema-conformant
· Encoding normalize
Notification Hook
Tell who needs to know
· Action-completed event
· Failure / alert webhook
· Audit topic publish
· User receipt UI
· Status-page hook
F · Reversibility, Compensation & Recovery
Plan for "undo" before you act — rollback, compensate, or escalate when the world doesn't cooperate
Compensation Registry
"Undo" recipes per action
· Inverse-action mapping
· Soft-delete patterns
· Restore-from-snapshot
· Manual-undo runbooks
Saga Coordinator
Multi-step transactions
· Forward + compensate steps
· Per-step idempotency
· Failure → cascade undo
· Temporal · Cadence engines
Snapshot & Rollback
Time-travel state
· FS / VM snapshots
· DB point-in-time recovery
· Git revert · branch
· Worktree restore
Failure Classifier
What went wrong?
· Retryable transient
· Permanent / policy reject
· Partial-success / dangling
· Decide retry / undo / abort
Retry & Backoff
Recover gracefully
· Exponential · jittered
· Idempotency-key reuse
· Circuit breaker per env
· Poison-message DLQ
Escalation Path
When automation isn't enough
· Page on-call
· Open ticket / runbook
· Pause & ask user
· Manual-step inventory
G · Safety Interlocks & Hard Stops
Independent safety controls that no agent can override
Hard Limits
Forbidden ops list
Geo · sector · scope blocks
Never auto-allow
Kill-Switch
Stop all actions
Per-agent / per-env / global
Operator-controlled
Velocity Caps
Per-min / hour rate
Anomaly auto-pause
Spike detection
Tripwires
Auto-trigger conditions
Honeypot resources
Forbidden domain hits
Manual Override
Operator pause / cancel
Approve · reject · modify
Real-time intervention
Physical E-Stops
For robotics / IoT
Hardware interlocks
Geo-fence violations
H · Observability, Audit & Forensics
Every action is traced, signed, and replayable — for debugging, compliance, and incident response
Action Tracing
OTel spans per call
env · agent · trace_id
Latency / cost meters
Audit Log (signed)
Hash-chained · immutable
Who · what · when · why
Compliance evidence
Replay & Forensics
Reconstruct any run
Recorded I/O · screen
Counterfactual replay
Anomaly Detection
Drift · spikes · errors
Per-env baselines
Auto-quarantine triggers
Cost & SLO Tracking
Per-env $ · success rate
Error budget burn
Top-spender attribution
User Receipt UI
Show what was done
Effect timeline
Undo / inspect controls
⇣ Outbound — Action Outcome to Layer 3 (Orchestrator) / Layer 10 (Reflection) / Layer 11 (Governance)
Status · effect record · receipt · compensation handle · trace · cost · escalation flags — a complete account of what happened
Status
success · partial · failed · refused
Effect Record
Resources changed · diffs
Signed Receipt
Hash · external IDs · proof
Compensation Handle
Inverse-action token
Trace & Telemetry
Spans · cost · latency
Escalations
HITL · alerts · pages
All outcomes are signed, traced, and reversible (or marked as one-way) — never silently applied.
Cross-cutting Approval, Audit & Reversibility
Cross-cutting Approval, Audit & Reversibility
Pre-Flight / Safety
Effector / Dispatch
Environment Targets
Isolation / Sandbox
Side-Effect Capture
Reversibility
Observability
Forward action flow
Effect on environment
Rollback / hard-stop loop
Outcome / approval return
Detailed view of Layer 9 — Action & Environment Interface from the Agentic AI System Architecture reference.
Every action passes a pre-flight gate, runs through a unified Effector into a sandboxed environment adapter, captures its side-effects with signed receipts, and emits an outcome bundle plus a compensation handle. Independent safety interlocks and observability surround the whole pipeline so no change to the world is silent or irreversible.
Layer 10 Reflection, Evaluation & Continual Learning
Agentic AI System Architecture › Layer 10 Detail
Reflection, Evaluation & Continual Learning
The closed-loop self-improvement layer — collect trajectories, evaluate quality, reflect on lessons, distill skills, run benchmarks, retrain, and ship improvements safely back into prompts, models, and tools.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Signals from Every Layer (1–9 & 11)
trajectories · tool I/O · effects & receipts · user feedback · ratings · escalations · safety violations · cost / latency · audit events
{ run_id, trace, plan, prompts, tools[], outputs, effects[], feedback{thumbs, edits, regenerate}, slo, errors[] }
A · Trajectory & Feedback Collection
Build the canonical record of every agent run — the raw material for every downstream improvement
Trace Ingestor
Stream every run
· OTel spans · structured logs
· LangSmith / Langfuse
· Per-step metadata
· Tool I/O · model calls
· Replay-safe serialization
Explicit Feedback
User-stated signal
· 👍 / 👎 votes
· Star / scale ratings
· Free-text comments
· Survey forms
· Bug / issue reports
Implicit Signals
Behavior tells the story
· Edit / accept rate
· Regenerate clicks
· Abandonment · stop
· Dwell · scroll · revisit
· Follow-up question rate
System Outcomes
Did it actually work?
· Test pass / fail
· Tool error rates
· Goal attainment
· Side-effect reversals
· HITL approval rates
Cost / SLO Telemetry
Operational fitness
· Tokens · $ per run
· Latency P50 / P95
· Cache hit-rate
· Tool retry counts
· Error budget burn
Trajectory Store
Durable archive
· Object storage (S3)
· Indexed (vector + KV)
· PII-redacted variant
· Versioned · TTL'd
· Replay-able
B · Evaluation Engine — Scoring Trajectories
Multiple scoring strategies — programmatic, model-based, human — combined into a final quality signal
LLM-as-Judge
Model-based scoring
· Pairwise comparison
· Rubric scoring (1–5)
· Single / multi-judge
· Calibration vs humans
· Bias mitigation
· Reasoning traces logged
Programmatic Verifiers
Ground-truth checks
· Unit / integration tests
· Property-based checks
· Schema · invariants
· Numeric / string equality
· Constraint satisfaction
· Linters · formatters
Reward Models
Learned quality scorer
· Outcome reward (ORM)
· Process reward (PRM)
· Per-step reward dense
· Trained from prefs
· Calibration audit
· Reward-hacking probes
Faithfulness & Safety Eval
Truthful & safe
· NLI / entailment grader
· Citation grounding check
· Hallucination detector
· Toxicity / harm classifier
· Jailbreak resistance
· PII leakage probe
Human Annotation
Gold-standard labels
· Expert review queue
· Pairwise preferences
· Inter-rater agreement
· Active learning (uncertain)
· SME consult for domain
· Calibrate LLM judges
Score Aggregator
Combine signals
· Weighted ensemble
· Pass / fail thresholds
· Confidence intervals
· Per-dimension breakdown
· Outlier flagging
· Trend tracking
C · Self-Reflection & Learning In-the-Loop
Mid-run improvement: critique, revise, and capture lessons that transfer to the next attempt
Reflexion / Self-Refine
Critique then retry
· Generator + critic split
· Verbal-feedback memory
· N-round revision loop
· Stop on quality plateau
· Reduces hallucination
Lessons-Learned Capture
Episode → insight
· "What went wrong" notes
· Retry strategy hints
· Failure-mode taxonomy
· Stored in episodic memory
· Retrieved next attempt
Inner-Monologue Critic
Built-in challenge
· "Does this look right?"
· Confidence calibration
· Self-consistency vote
· Pre-commit review
· Confess uncertainty
D · Skill & Recipe Distillation
Promote successful patterns into reusable, named, versioned skills
Pattern Miner
Find recurring success
· Cluster successful runs
· Extract common steps
· Identify pre/post conditions
· Tool-sequence subgraphs
· Cost / latency profile
Voyager-style Skills
Self-grown library
· Auto-curated repertoire
· Compositional reuse
· Trigger-condition tags
· Refactor on improvement
· Personal & org-shared
Skill Promotion
Validate & publish
· Eval gate (offline)
· Canary in production
· Versioned · signed
· Push to skill registry
· Auto-deprecate weaker
E · Eval Harness — Offline, Online & Capability
A continuous quality bar — golden datasets, A/B tests, regressions, capability suites, and red-team evals
Golden Datasets
Source of truth
· Curated Q&A · scenarios
· Domain-specific suites
· Synthetic + real mix
· Edge-case coverage
· Adversarial probes
· Versioned · maintained
Offline Benchmarks
Repeatable scoring
· Public (MMLU · HELM · BIG)
· Agent (SWE-bench · WebArena)
· Tool-use evals (BFCL)
· Custom domain suites
· Cost / latency budgets
· Statistical significance
Online A/B & Shadow
Live experimentation
· % traffic split
· Shadow / dark-launch
· Power analysis
· Guardrail metrics
· Auto-stop on regression
· Holdout cohorts
Regression & CI Eval
No silent quality drops
· Per-PR eval gate
· Snapshot diff vs baseline
· Per-dimension regression
· Win/loss flake guard
· Eval flakiness tracker
· Weekly trend reports
Capability & Red-Team
Push limits safely
· Dangerous-capability eval
· Jailbreak / adversarial
· Bias / fairness audits
· Privacy / leakage probes
· RSP / ASL gating
· Scheduled red-team runs
Eval Reports
Decision-grade artifacts
· Dashboards · scorecards
· Per-cohort breakdown
· Failure-case galleries
· Approval evidence packs
· Release-readiness signal
· Distributed to stakeholders
F · Reflection & Improvement Hub — The Closed-Loop Engine
Synthesize scored trajectories into prioritized improvements: prompts, data, models, tools, policies
Improvement Synthesizer
Triage failures · cluster · prioritize · propose intervention · track to completion
CLUSTER
DIAGNOSE
PROPOSE
PRIORITIZE
SHIP
Failure Triage
Cluster · taxonomy · root cause
Bucket by error type
Intervention Picker
Prompt · data · model · tool · policy
Choose the right lever
Backlog & Tracker
Prioritized improvement queue
Owner · due-date · impact
Closure Verification
Re-eval after fix lands
Confirm metric moved
G · Continual Training & Adaptation
Turn production trajectories into better prompts, fine-tunes, and reward signals
Trace Mining & Curation
From logs → datasets
· Extract good trajectories
· Rejection sampling
· De-dup & balance
· PII scrub before training
· Synthetic augmentation
· Consent & opt-in only
Prompt Optimization
Cheap, fast wins
· Auto-prompt search
· DSPy compilation
· Few-shot exemplar mining
· Persona / system tuning
· Skill / SKILL.md updates
· Rubric-guided rewriting
Supervised Fine-Tune
SFT on curated data
· Instruction tuning
· Tool-use distillation
· Rejection-sampled SFT
· LoRA / QLoRA adapters
· Per-tenant adapters
· Curriculum staging
Preference Optimization
Align to human / AI prefs
· DPO · IPO · KTO
· RLHF · PPO
· RLAIF (Constitutional AI)
· GRPO · process rewards
· Reward-model training
· KL-controlled updates
Tool / Retrieval Tuning
Beyond the model
· Embedding fine-tunes
· Re-ranker training
· Tool-spec rewrites
· Chunking strategy tuning
· Skill cards optimization
· Router / cascade tuning
Self-Play / Synthetic
Augment scarce data
· Self-play trajectories
· LLM-generated tasks
· Adversarial generation
· Verifier-filtered output
· Counterfactual replays
· Quality watermarking
H · Quality Control & Learning Safety
Don't make the system worse — guard against drift, forgetting, poisoning, and reward hacking
Drift Detection
Distribution shift
Input · output drift
Auto-alert & pause
Catastrophic-Forget Guard
Don't lose old skills
Replay buffer
EWC / KL constraints
Reward Hacking Probes
Specification gaming
Multi-judge cross-check
Process & outcome both
Data-Poisoning Defense
Untrusted-trace screening
Anomaly filtering
Provenance required
Eval Gating
No-regress release rule
Capability + safety pass
Rollback on fail
Privacy & Consent
Train-on-data flags
DP / k-anonymity
RTBF cascade
I · Safe Deployment & Rollout
Ship improvements with the same rigor as code — versioned, canaried, monitored, reversible
Versioning & Registry
Prompts · models · skills
SemVer · signed artifacts
Provenance ledger
Canary Rollout
Gradual % traffic
Health-gate auto-promote
Auto-rollback on regression
Feature Flags
Per-tenant gates
Kill-switch flags
Dynamic config
Post-Deploy Monitor
Live metric watch
Quality · cost · safety
SLO breach → rollback
Change Log & Audit
What changed · who · when
Eval-evidence linked
Reviewer sign-off
User Communication
Release notes
Behavior-change alerts
Known-issue digest
⇣ Outbound — Improvements Pushed Back into the Stack
Updated artifacts deployed across every layer they affect — closing the agentic improvement loop
→ Layer 4 · Reasoning
Adapters · prompts · model versions
→ Layer 3 · Orchestrator
Plan templates · routing rules
→ Layer 5 · Memory
Lessons · skills · profiles
→ Layer 6 · Tools / Skills
New & revised skill cards
→ Layer 7 · RAG
Embeddings · re-rankers · chunking
→ Layer 11 · Governance
Eval evidence · safety reports
Every push is versioned, eval-gated, canaried, and reversible — no silent drift.
Cross-cutting Eval, Safety & Lifecycle
Cross-cutting Eval, Safety & Lifecycle
Collection / Hub
Evaluation
Reflection
Skill Distillation / Deploy
Eval Harness
Continual Training
Quality / Safety
Forward learning flow
Deployed improvement
Closed-loop / reflection
Detailed view of Layer 10 — Reflection, Evaluation & Continual Learning from the Agentic AI System Architecture reference.
Trajectories and feedback flow in from every layer, are scored by a multi-method evaluation engine, fuel mid-run reflection and offline distillation, run through a comprehensive eval harness, and converge on an Improvement Synthesizer that triages failures into prioritized interventions. Continual-training and safe-deployment pipelines push versioned, canaried, eval-gated improvements back across the stack — closing the agentic loop.
Layer 11 Safety, Governance, Trust & Observability
Agentic AI System Architecture › Layer 11 Detail
Safety, Governance, Trust & Observability
The cross-cutting control plane that wraps every other layer — guardrails, identity, policy-as-code, compliance, observability, red-teaming, and incident response — making the agent system safe, accountable, and operable in production.
Detailed Diagram · v1.0 · 2026
⇄ Cross-Cutting Signals — Wraps Layers 1-10 (every request, action, and effect crosses this layer)
requests · responses · tool calls · effects · trajectories · feedback · cost · errors · audit events · safety incidents
L1 User · L2 Perception · L3 Orchestrator · L4 Reasoning · L5 Memory · L6 Tools · L7 RAG · L8 Multi-Agent · L9 Action · L10 Reflection
A · Guardrails — Input, Output & Behavior Filters
Defend the model and the user — block harmful inputs and unsafe outputs at every boundary
Input Filters
First line of defense
· Toxicity · hate · violence
· CSAM hash matching
· Self-harm classifier
· Dangerous-cap probe
· Schema / size limits
· Bidi / homoglyph guard
Prompt-Injection Defense
Trust-boundary enforcement
· Quarantine tool results
· No-instruction-follow rule
· Spotlighting / delimiters
· Hidden-text decoder
· Multimodal probe
· Confirm sensitive ops
PII / DLP
Detect & redact
· Names · IDs · phones
· Cards · accounts · keys
· Health / financial data
· Tokenization vault
· Differential privacy
· Egress DLP scan
Output Filters
Last-mile safety
· Refusal classifier
· Hallucination probes
· Schema-conformant
· Watermarking · C2PA
· Content tags
· Bias / fairness checks
Behavioral Guardrails
Constrain agent action
· Topic allow / deny
· Persona & tone constraints
· Refusal templates
· Off-task detection
· Loop / runaway breaker
· Action allow-list scope
Frameworks
Standardized stacks
· NeMo Guardrails · LMRails
· Llama Guard · Granite
· Azure Content Safety
· OpenAI Moderation
· Custom rules engine
· Versioned · A/B-tested
B · Identity, Access & Secrets
Verify who is acting, what they're allowed to do, and protect every credential along the way
Authentication
Prove identity
· SSO · OAuth · OIDC · SAML
· Passkeys · MFA · step-up
· Service-account / SPIFFE
· Workload identity
· Token refresh / revoke
Authorization
Decide what's allowed
· RBAC · ABAC · ReBAC
· Scopes · entitlements
· Tool allow / deny lists
· Tenant / project isolation
· Delegated / on-behalf-of
Agent Identity
First-class principals
· Cryptographic agent ID
· Capability cards (signed)
· Agent provenance ledger
· Per-agent token scopes
· Sub-agent attenuation
Secrets & Keys
JIT, scoped, rotated
· HashiCorp Vault
· KMS / HSM-managed keys
· OAuth token exchange
· Short-lived tokens
· Memory-only injection
Network & Boundary
Zero-trust transport
· mTLS service-to-service
· Service mesh (Istio · Linkerd)
· Egress proxy & allow-list
· Private VPC · sovereign cloud
· WAF / DDoS shield
Encryption
Data confidentiality
· At-rest (AES-GCM)
· In-transit (TLS 1.3)
· In-use (confidential VM)
· BYOK / HYOK
· Per-tenant key isolation
C · Policy-as-Code & Action Gating
Encode rules once, enforce them everywhere — versioned, auditable, testable
Policy Engine
Decision point
· Open Policy Agent (OPA)
· Cedar · Rego rules
· Versioned · signed bundles
· Tenant overrides
Action Approval
Gate risky operations
· Auto · HITL · admin
· Two-person rule
· Step-up auth
· Signed approval token
Risk Classifier
How dangerous is this?
· Reversible vs destructive
· Blast radius estimate
· Data sensitivity tier
· Money / identity impact
Quotas & Rate Limits
Bound consumption
· Token / $ caps
· Per-tenant quotas
· Velocity / spike caps
· Fair-share scheduling
Policy Authoring & Test
Treat policy as code
· Code review · CI tests
· Canary policy rollout
· Counterfactual evaluation
· Rollback on regression
Decision Log
Why allowed / denied
· Per-decision evidence
· Rule version applied
· Counterexample queries
· Appeals workflow
D · Compliance, Audit & Regulatory Mapping
Demonstrate trustworthy operation to regulators, auditors, and customers — with evidence
Regulatory Mapping
Frameworks & standards
· EU AI Act · NIST AI RMF
· ISO/IEC 42001 (AI MS)
· GDPR · CCPA / CPRA
· HIPAA · PCI · SOX
· SOC 2 · ISO 27001
Audit Log
Tamper-evident record
· Hash-chained · signed
· Immutable storage (WORM)
· Who · what · when · why
· Cross-layer correlation
· Long-term retention
Data Residency & Sovereignty
Where data lives matters
· EU · US · APAC pinning
· Sovereign-cloud routing
· Cross-border guards
· On-prem deployment
· Air-gapped envs
DSAR & Subject Rights
User data rights
· Access · export · portability
· Right-to-be-forgotten
· Cascade across stores
· Self-service portal
· Erasure receipts
AI-Specific Disclosures
Be transparent
· Model cards · system cards
· Datasheets · data lineage
· "Talking to AI" disclosure
· Synthetic-content labeling
· Risk-tier reporting
Evidence & Reporting
Audit-ready exports
· Auto-generated evidence
· Control-mapping matrix
· Regulator-ready packs
· Drata · Vanta · Secureframe
· Customer trust portal
E · Trust & Safety Operations Hub
Central console where humans monitor, intervene, investigate, and escalate
Trust & Safety Console
Live dashboards · alerts · approvals · investigations · kill-switches
DETECT
TRIAGE
CONTAIN
RECOVER
REPORT
SOC for AI
24/7 monitoring · on-call
Alert routing & escalation
Approval / HITL Inbox
Pending high-risk actions
SLA-driven decisions
Kill-Switch Console
Per-agent · env · global
Operator-only authority
Investigation Workbench
Replay · search · evidence
Forensic timeline
F · Observability — Tracing, Metrics, Logs & Cost
See everything the agent does — with replay, attribution, and SLO accountability
Distributed Tracing
End-to-end view
· OpenTelemetry spans
· LangSmith · Langfuse
· Helicone · Phoenix · W&B
· Per-step / tool / agent
· Conversation graph view
· Cross-layer trace_id
Metrics & SLOs
Quantified health
· Latency P50 / P95 / P99
· Success / abandon rate
· Tool error rate
· Cache hit-rate
· SLO & error-budget burn
· Prometheus · Datadog
Structured Logging
Forensic detail
· Event-sourced runs
· Per-step input / output
· Tool I/O recorded
· PII-redacted variant
· Search · query · alert
· Retention policy
Cost Observability
$ accountability
· Tokens · $ per call / run
· Per-tenant chargeback
· Top-spender attribution
· Budget burn dashboards
· Cache-hit savings
· Cost-anomaly alerts
Replay & Time-Travel
Reconstruct any run
· Recorded I/O
· Counterfactual debug
· Step-through inspector
· Screenshot / DOM cap
· Reproducible re-runs
· Diff vs golden
Anomaly & Alerting
Catch issues fast
· Drift detection
· Tool-error spikes
· Refusal-rate jumps
· Cost / latency spikes
· Auto-page on-call
· PagerDuty · OpsGenie
G · Red-Team & Capability Gating
Stress-test the system before adversaries do — and gate dangerous capabilities responsibly
Adversarial Red-Team
Find the failure modes
· Jailbreak attempts
· Prompt-injection probes
· Tool-abuse scenarios
· Multi-step exploit chains
· Continuous + scheduled
Capability Evals
Measure dangerous skills
· CBRN · cyber · autonomy
· Persuasion · manipulation
· Self-replication probes
· Long-horizon planning
· Independent evals
RSP / ASL Gating
Tiered release controls
· Responsible Scaling Policy
· AI Safety Levels (ASL)
· Deployment thresholds
· If/then commitments
· Public reporting
H · Model & Tool Lifecycle · Incident Response
Every AI artifact is versioned, monitored, and recoverable — drills keep the team sharp
Model / Tool Governance
Versioned & controlled
· Model registry · cards
· Tool allow-list / deny
· Canary · rollback
· Deprecation policy
· Provenance ledger
Incident Response
When things go wrong
· Runbooks & on-call
· Containment · isolate agent
· User & regulator notify
· Root-cause analysis
· Blameless post-mortem
Drills & Game-Days
Practice for crises
· Chaos exercises
· Tabletop simulations
· Kill-switch drill
· DSAR rehearsal
· Recovery time targets
I · Transparency, Explainability & User Trust
Help users understand what the agent did and give them meaningful control
Decision Explanations
Why did it do that?
· Reasoning trace UI
· Tool-call timeline
· Citation panels
· Confidence indicators
User Controls
Stay in charge
· Memory opt-in / opt-out
· Train-on-data flags
· Tool / scope toggles
· Cancel · undo · redo
Disclosures & Receipts
Set expectations
· "AI-generated" labels
· Action receipts
· Limitations notice
· Customer trust portal
⇄ Enforcement & Signals to Every Layer
Governance is bidirectional — signals collected from layers, enforcement decisions sent back
→ L3 Orchestrator
allow / deny decisions
HITL · risk class
→ L4 Reasoning
model allow-list
capability flags
→ L6 Tools
tool allow / deny
scope & quota
→ L9 Action
approval tokens
kill-switch state
→ L5 / L7 Memory · RAG
ACL · DSAR · TTL
residency rules
→ L10 Reflection
eval gates
release approval
⇣ External Outputs — Stakeholders, Regulators & Public Trust
Audit packs · model / system cards · transparency reports · safety disclosures · DSAR fulfillment · breach notifications · customer trust portal
Cross-cutting · Wraps All Layers (1–10)
Cross-cutting · Wraps All Layers (1–10)
Guardrails
Identity / Transparency
Policy
Compliance
Trust Hub / Observability
Red-Team / Capability
Lifecycle / Incident Response
Forward governance flow
Enforcement / disclosure
Live override / kill-switch
Detailed view of Layer 11 — Safety, Governance, Trust & Observability from the Agentic AI System Architecture reference.
This layer is cross-cutting: it wraps Layers 1–10. Signals from every layer flow in; guardrails, identity, policy, compliance, observability, red-teaming, lifecycle, and transparency controls flow out as enforcement decisions, audit evidence, and stakeholder disclosures. The Trust & Safety Hub provides a live console for humans to detect, contain, and recover from incidents — and the kill-switch path lets operators stop the system at any time.
Layer 12 Infrastructure & Platform
Agentic AI System Architecture › Layer 12 Detail
Infrastructure & Platform
The substrate beneath every agent — compute, accelerators, model serving, runtimes, storage, networking, deployment topologies, and the SRE / FinOps machinery that keeps it all running reliably, securely, and economically at scale.
Detailed Diagram · v1.0 · 2026
⇣ Workload Demand — Every Other Layer Runs on This Substrate
model inference · agent runs · tool execution · vector / graph queries · multi-agent coordination · evaluation jobs · training
L1–L11 workloads · sync / async · batch / stream · long-running · global / regional · per-tenant SLOs
A · Compute & Accelerator Fleet
A heterogeneous, capacity-managed fleet — right hardware for training, inference, agents, and tools
NVIDIA GPUs
Workhorse for training & inference
· H100 · H200 · B100 · B200
· GB200 NVL72 racks
· NVLink · NVSwitch fabric
· FP8 / FP16 / BF16
· MIG partitioning
· CUDA · cuDNN · NCCL
Cloud Accelerators
Hyperscaler-native silicon
· Google TPU v5p · v6 (Trillium)
· AWS Trainium · Inferentia
· Azure Maia · Cobalt
· AMD MI300X / MI350
· Intel Gaudi 3
· OCI / Lambda / CoreWeave
Specialty Accelerators
Ultra-low-latency inference
· Groq LPU
· Cerebras WSE-3
· SambaNova SN40L
· Tenstorrent Wormhole
· d-Matrix · etched.ai
· FPGA / ASIC fast paths
CPU & General Compute
Agents · tools · orchestration
· x86 · ARM (Graviton · Ampere)
· High-mem · high-CPU SKUs
· Spot / preemptible
· Confidential VMs (SEV · TDX)
· Burstable instances
· Dedicated tenancy
Edge & Device
On-device inference
· Apple Neural Engine
· Qualcomm Hexagon NPU
· NVIDIA Jetson · Orin
· Coral TPU · Hailo
· WebGPU / WASM
· Quantized SLM models
Capacity Management
Right-size, right-time
· Reservations · commitments
· Spot · preemptible mix
· Cluster autoscaler
· Multi-cloud burst
· Per-tenant quotas
· Forecast-driven planning
B · Model Serving, Training & ML Platform
From research to production — high-throughput inference, distributed training, and the MLOps glue around them
Inference Servers
High throughput · low TTFT
· vLLM · SGLang
· TensorRT-LLM · TGI
· Triton Inference Server
· llama.cpp · MLX (edge)
· Continuous batching
Hosted Model APIs
Provider-managed
· Anthropic · OpenAI
· Google · Mistral · xAI
· Bedrock · Vertex AI
· Azure OpenAI Service
· Together · Fireworks · Replicate
Distributed Training
Pre-train · fine-tune · DPO
· PyTorch · JAX · DeepSpeed
· Megatron · NeMo · Axolotl
· FSDP · ZeRO · TP / PP / EP
· Slurm · Ray · Kubeflow
· Checkpoint · resume
Compiler & Kernels
Squeeze every flop
· FlashAttention 3 · FA-decoder
· Triton · CUDA · ROCm
· torch.compile · XLA
· TVM · Mojo · IREE
· FP8 / INT4 GEMM
Model Registry & MLOps
Lifecycle of every artifact
· MLflow · W&B · Comet
· Hugging Face Hub
· Versioning · provenance
· Approval · canary · rollback
· Signed artifacts (SLSA)
Optimization & Deploy
From weights to traffic
· Quantize · prune · distill
· AWQ · GPTQ · SmoothQuant
· Speculative decoding
· Multi-tenant serving
· Cold-start optimization
C · Agent & Workflow Runtimes
Stateful execution engines that drive long-running, resumable agent loops
Agent Frameworks
Build & run agents
· Anthropic Agent SDK
· LangGraph · LangChain
· CrewAI · AutoGen · Magentic
· LlamaIndex · Haystack
Durable Execution
Replay-safe orchestration
· Temporal · Cadence
· Restate · Inngest
· DBOS · Trigger.dev
· Long-running runs
Distributed Compute
Map-reduce / actors
· Ray · Ray Serve
· Spark · Dask · Modal
· Akka · Erlang OTP
· Dapr · Restate
Sandbox & Tool Runtime
Per-action isolation
· Firecracker · Kata · gVisor
· WASM · WASI runtimes
· E2B · Daytona · CodeSandbox
· Browser-tab VMs
MCP Hosting
Tool-server platform
· Local stdio servers
· Remote SSE / WebSocket
· Multi-tenant gateway
· Capability negotiation
Schedulers / Queues
Async & cron
· Celery · Sidekiq · BullMQ
· Argo Workflows · Airflow
· Kubernetes Jobs · CronJobs
· Priority & fair-share
D · Container, Orchestration & Cluster Platform
The unified runtime — schedule, isolate, scale, and recover every workload
Kubernetes & Container Platform
Schedules pods · GPU operator · autoscaling · service discovery · secrets · multi-tenant namespaces
SCHEDULE
SCALE
ISOLATE
HEAL
UPGRADE
Container Runtimes
containerd · CRI-O · Docker
OCI images · BuildKit · Buildpacks
GPU / Accelerator Operators
NVIDIA GPU operator · device plugins
Topology-aware · MIG · time-slicing
Autoscaling
HPA · VPA · KEDA · Karpenter
Predictive · queue-driven scale
Multi-Tenant Isolation
Namespaces · NetworkPolicy · OPA
vCluster · gVisor / Kata sandboxing
E · Storage & Data Platform
Polyglot persistence — choose the right database for each agent workload
Object & Block
Bulk & durable
· S3 · GCS · Azure Blob · R2
· EBS · Persistent Disk
· MinIO · Ceph (on-prem)
· Lifecycle · tiering · glacier
· Object lock · WORM
Vector Stores
Semantic search
· pgvector · Pinecone
· Weaviate · Qdrant
· Milvus · Vespa · Turbopuffer
· LanceDB · ChromaDB
· HNSW · IVF · DiskANN
Knowledge Graphs
Relations & paths
· Neo4j · ArangoDB
· Memgraph · NebulaGraph
· TigerGraph · Amazon Neptune
· RDF · SPARQL stores
· Property + temporal edges
OLTP / OLAP
Transactional & analytical
· Postgres · MySQL · Aurora
· Spanner · CockroachDB
· Snowflake · BigQuery
· Databricks · Iceberg
· DuckDB · ClickHouse
KV / Cache / Doc
Hot & flexible
· Redis · KeyDB · Dragonfly
· DynamoDB · Cosmos · Bigtable
· MongoDB · Couchbase · Firestore
· Elastic · OpenSearch
· Memcached · Hazelcast
Time-Series & Stream
Append-only timelines
· TimescaleDB · InfluxDB
· QuestDB · Prometheus TSDB
· Event-sourced state
· CDC streams (Debezium)
· Replay-able trajectories
F · Networking, Messaging & Edge
Move bytes safely and quickly — between users, services, agents, and tools
Edge & CDN
Closer to users
· Cloudflare · Fastly · Akamai
· AWS CloudFront · GCP CDN
· Edge functions / Workers
· WAF · DDoS shield
· Bot / abuse detection
Load Balancing & Ingress
Route & balance
· L4 / L7 LBs · Envoy
· NGINX · HAProxy · Traefik
· K8s Ingress / Gateway API
· Sticky sessions · health
· Global anycast
Service Mesh
Zero-trust east-west
· Istio · Linkerd · Cilium
· mTLS service-to-service
· Retries · timeouts · circuit
· Traffic shifting · canary
· eBPF data plane
RPC & Streaming
Inter-service calls
· gRPC · Connect · Twirp
· REST / OpenAPI
· GraphQL Federation
· Server-sent events (SSE)
· WebSocket · WebRTC
Event Bus / Queueing
Async & pub/sub
· Kafka · Redpanda
· NATS · Pulsar · RabbitMQ
· AWS SQS / SNS / EventBridge
· GCP Pub/Sub · Azure Service Bus
· DLQ · ordered delivery
High-Perf Fabrics
Training / inference net
· InfiniBand · NDR / XDR
· RDMA · RoCE
· NVLink · NVSwitch
· UCX · NCCL · libfabric
· Topology-aware routing
G · Identity, Secrets & Platform Security
Workload identity, secrets, supply-chain integrity, and confidential compute
Workload Identity
Service-to-service auth
· SPIFFE / SPIRE
· Cloud IAM (IRSA · WIF)
· OIDC trust federation
· Per-pod / per-agent identity
· Short-lived certs
Secrets & Key Management
Centralized · rotated
· HashiCorp Vault · Infisical
· AWS / GCP / Azure KMS
· HSM · CloudHSM
· External Secrets Operator
· Just-in-time injection
Supply-Chain & Confidential
Trust the binaries
· SBOM · SLSA levels
· Sigstore · cosign signing
· Image scanning (Trivy · Snyk)
· Confidential VMs (SEV · TDX)
· TEE · attestation
H · Deployment, CI/CD & Infrastructure-as-Code
Reproducible, auditable, GitOps-driven delivery for every artifact in the stack
CI/CD Pipelines
Build · test · deploy
· GitHub Actions · GitLab CI
· Buildkite · CircleCI · Jenkins
· Eval gates · safety gates
· Reproducible builds
· Promotion across envs
GitOps & Continuous Delivery
Declarative · auditable
· Argo CD · Flux
· Helm · Kustomize
· Progressive delivery (Argo Rollouts)
· Canary · blue-green · feature flags
· Auto-rollback on regression
Infrastructure-as-Code
Codify the stack
· Terraform · OpenTofu
· Pulumi · Crossplane
· CDK · Bicep · ARM
· Policy as code (OPA · Sentinel)
· Drift detection · plan reviews
I · Deployment Topologies · Observability · SRE & FinOps
Where the stack runs, how to keep it up, and how to keep it affordable
Deployment Topologies
Where the stack lives
· Public cloud (AWS · GCP · Azure)
· On-prem · co-location
· Hybrid · private cloud
· Edge · device · air-gapped
· Sovereign cloud
Multi-Region & HA
Resilience & locality
· Active / active · A/P
· Cross-region replication
· Failover & DR drills
· Backup · point-in-time restore
· Data-residency routing
Observability Stack
Metrics · logs · traces
· OpenTelemetry collectors
· Prometheus · Grafana · Loki
· Datadog · New Relic · Honeycomb
· AI-specific (Langfuse · LangSmith)
· Profiling (pprof · Pyroscope)
SRE & Reliability
Run it like production
· SLO / SLI / error budgets
· On-call · runbooks
· Chaos engineering
· Post-mortems & lessons
· PagerDuty · OpsGenie
FinOps & Cost Control
$ accountability
· Token / GPU / $ meters
· Per-tenant chargeback
· Reserved + spot blending
· Anomaly & budget alerts
· Rightsizing recommendations
Sustainability
Energy & carbon-aware
· Carbon-aware scheduling
· PUE / WUE tracking
· Renewable-region routing
· Per-token energy meters
· Sustainability reporting
⇣ Platform Outputs — Capacity, SLOs, Cost & Compliance Evidence
SLO dashboards · capacity forecasts · cost & carbon reports · compliance evidence · DR & failover posture · supply-chain attestations
Cross-cutting Reliability, Cost & Sustainability
Cross-cutting Reliability, Cost & Sustainability
Compute & Accelerators
Serving / Identity
Runtimes / Deploy
Containers / K8s
Storage
Networking
SRE / FinOps / Sustainability
Stack dependency
Platform reports
Auto-scale / FinOps loop
Detailed view of Layer 12 — Infrastructure & Platform from the Agentic AI System Architecture reference.
Workloads from L1–L11 land on a heterogeneous compute fleet, are served via inference engines and agent runtimes, scheduled on Kubernetes, backed by polyglot storage, and connected through service-mesh networking. Identity, secrets, supply-chain integrity, deployment automation, and SRE / FinOps / sustainability practices keep the substrate trustworthy, available, and economical at scale.