Linh Truong · MA (Harvard), MBA · LinhTruong.com · Linh@Alumni.Harvard.edu
Agentic AI System Architecture
I designed this reference architecture to map the full structural anatomy of autonomous, tool-using, multi-agent AI systems — from the user & interaction boundary through perception, orchestration, reasoning, memory, tools, knowledge retrieval, multi-agent collaboration, action, reflection, safety & governance, and infrastructure. Twelve layers, each with its own detailed diagram.
Overview Master Architecture Diagram
A reference architecture for autonomous, tool-using, multi-agent AI systems — perception, reasoning, memory, action, reflection, and governance.
Research Paper Diagram · Updated 2026 · v1.0
1 · User & Interaction Layer
Human users, applications, and channels that issue goals and receive results
Human User
Goals · Preferences · Feedback
Chat / Voice / Multimodal UI
Text · Speech · Images · Video
IDE / CLI / SDK
Claude Code · API · Agent SDK
Application Channels
Web · Mobile · Email · Slack
Autonomous Triggers
Cron · Webhooks · Events
Other Agents (A2A / MCP-Client)
Inter-agent requests & delegations
2 · Perception & Input Processing
Normalize, parse, and ground incoming signals into a structured task representation
Intent & Goal Extraction
Multimodal Encoders (V/A/T)
Context Assembly & Grounding
Prompt Compilation & Caching
Input Guardrails / PII Scrub
Session & Identity Context
3 · Orchestration, Planning & Control
Decompose goals into plans, route work to agents/tools, and manage the agent loop
Agent Orchestrator
ReAct · Plan-and-Execute · Tree/Graph-of-Thought
State Machine · Loop Control · Budget & Timeouts
LangGraph / Agent SDK / custom controllers
Task Decomposer
HTN · Hierarchical Plans
Planner / Re-planner
CoT · Self-Ask · ToT
Router / Dispatcher
Skill & agent selection
Policy Engine
Permissions · Action gating
Scheduler & Queue
Async · Priorities · Retries
Concurrency Manager
Parallel sub-agents · Forks
Cost / Token / Latency Budgeter
Per-task budgets · Stop conditions
Human-in-the-Loop Gateway
Approvals · Clarifications · Overrides
4 · Reasoning Core — Foundation Models & Cognition
The cognitive engine: LLMs/LMMs with extended thinking, tool-use, and structured output
Foundation Model(s)
Claude Opus 4.7 · Sonnet 4.6 · Haiku 4.5
GPT · Gemini · Llama · Mistral · Qwen
Routed by task complexity & cost
SLMs for tools / classification
Extended Thinking
Reasoning tokens · Scratchpad
Self-Reflection / Critic
Reflexion · Self-Refine · Debate
Tool-Use / Function Calling
Structured args · Parallel calls
Structured Output
JSON schema · Pydantic · Grammars
Multimodal Reasoning
Vision · Audio · Code · Docs
In-Context Learning
Few-shot · Skills · Examples
Adaptation Layer
Fine-tune · LoRA · DPO · RLHF · RLAIF
Inference Controls
Sampling · Constrained decoding · Prompt caching
5 · Memory Subsystem
Multi-tier memory enabling continuity, learning, and personalization
Working / Context Memory
Live conversation buffer
Compaction · Summarization
Episodic Memory
Past sessions & trajectories
Time-stamped events
Semantic Memory
Facts · Entities · Concepts
User profile · Project memory
Procedural Memory
Skills · Workflows · Recipes
Learned tool sequences
Vector / Embedding Store
pgvector · Pinecone · Weaviate
Hybrid & semantic search
Knowledge Graph
Entities · Relations · Provenance
Neo4j · RDF · GraphRAG
Memory Manager
Read · Write · Update · Forget · Consolidate · Re-rank · Privacy & TTL · Conflict resolution
6 · Tools, Skills & Capabilities
Composable actions the agent can invoke through standardized interfaces
Web Browsing
Search · Fetch
Computer use
Code Execution
Sandboxed runtime
Bash / Python / JS
File & Repo Ops
Read · Write · Diff
Git · FS · S3
External APIs
REST · GraphQL
SaaS · Webhooks
Databases
SQL · NoSQL
Warehouses
Communication
Email · Slack
Calendar · Meet
Workflow Tools
CI/CD · Jira
Notion · Linear
Domain Models
Vision · ASR · TTS
Specialist SLMs
Tool Gateway · MCP Servers · Skill Registry
Schema validation · Auth · Rate-limits · Idempotency · Caching · Retries · Sandboxing
7 · Knowledge & Retrieval (RAG)
Grounding the agent in fresh, verifiable knowledge from internal & external sources
Retrievers
BM25 · Dense · Hybrid
Multi-query · Fusion
Re-rankers & Filters
Cross-encoder · LLM-rerank
Recency · ACL filters
Advanced RAG
GraphRAG · HyDE · Self-RAG
Agentic / Corrective RAG
Document Pipelines
Parse · Chunk · Embed
OCR · Layout · Tables
Knowledge Sources
Wiki · Docs · Tickets
Code · Web · Live data
Citation & Provenance
Inline citations
Source attribution
8 · Multi-Agent Collaboration
Specialized agents cooperating, debating, and verifying each other's work
Researcher
Search · Read
Synthesize
Coder
Edit · Run · Test
Debug · Refactor
Critic / Reviewer
Verify · Score
Red-team
Domain Experts
Legal · Medical
Finance · DevOps
Coordination Patterns
Supervisor · Hierarchical · Swarm · Debate · Blackboard
CrewAI · AutoGen · LangGraph · Magentic
Inter-Agent Protocols
A2A · MCP · ACP · Shared scratchpad
Message bus · Contract net · Voting
9 · Action & Environment Interface
Where agents produce real-world effects, in both digital and physical environments
Computer Use
GUI control · Screen + keyboard
Browser Agents
DOM · Forms · Navigation
Code Sandboxes
Containers · VMs · Firecracker
Enterprise Systems
CRM · ERP · ITSM · Data lake
Physical / IoT
Robotics · Sensors · Actuators
Output Channels & Side-Effect Bus
Notifications · Commits · Tickets · Reports
10 · Reflection, Evaluation & Continual Learning
Closed-loop self-improvement — evaluate trajectories, learn skills, refine prompts & models
Trajectory Evaluator
LLM-as-Judge · Rubrics
Pass/fail · Quality scores
Reward / Verifier
Tests · Constraints · Goals
Process & outcome rewards
Self-Reflection Loop
Reflexion · Self-Refine
Lessons & corrections
Skill / Recipe Distiller
Voyager-style libraries
Reusable workflows
Eval Harness
Benchmarks · Regression
Online & offline eval
Continual Training
SFT · DPO · RLAIF
Prompt & tool tuning
11 · Safety, Governance, Trust & Observability
Cross-cutting controls — guardrails, policy, monitoring, security, and compliance
Input/Output Guardrails
Toxicity · Jailbreak · Schema
Prompt-Injection Defense
Trust boundaries · Confirmation
PII / DLP
Redaction · Tokenization
AuthN / AuthZ
OAuth · RBAC · Scoped tokens
Action Approval
HITL · Risky-action gating
Compliance & Audit
SOC 2 · GDPR · HIPAA · EU AI Act
Observability & Tracing
OpenTelemetry · LangSmith · Langfuse · Helicone
Cost & Performance Monitoring
Tokens · Latency · Tool errors · SLOs
Red-Teaming & Safety Evals
Adversarial probes · Capability gating
Model & Tool Governance
Versioning · Allow-lists · Kill-switches · Explainability
12 · Infrastructure & Platform
The substrate — compute, serving, storage, and networking that make agents run reliably at scale
Model Serving
vLLM · TGI · TensorRT-LLM · SGLang
Compute
GPU · TPU · Inference accelerators
Agent Runtimes
LangGraph · Agent SDK · CrewAI
Container & Sandbox Layer
Docker · Kubernetes · Firecracker
Storage
Object · Vector · Graph · OLTP/OLAP
Event Bus & Networking
Kafka · Pub/Sub · gRPC · Service mesh
Secrets · Identity · Key Management
Vault · KMS · OAuth providers · Workload identity
Deployment Topologies
Cloud · On-prem · Hybrid · Edge · Multi-region failover
Cross-cutting Governance, Safety & Observability
Legend (color key): User & Interaction · Perception & Orchestration · Reasoning Core / Reflection · Memory · Tools & Capabilities · Knowledge / Multi-Agent · Action / Infrastructure · Safety & Governance
Legend (arrows): Forward data flow · Feedback / learning
Reference architecture for the research paper “Agentic AI System Architecture”.
Layers are conceptual — concrete deployments may merge, split, or substitute components.
Layer 1 User & Interaction Layer
The boundary between the agentic system and the humans, applications, and other agents that use it: channels, modalities, sessions, identity, presentation, and the contract that hands a well-formed request to the Perception layer.
Detailed Diagram · v1.0 · 2026
A · Initiators — Who or What Issues a Request
Humans, applications, autonomous schedules, and other agents — every interaction begins here
End User
Consumer of agent outcomes
· Goals, preferences, feedback
· Approvals & clarifications
· Implicit signals (clicks, dwell)
Power User / Operator
Configures & supervises agents
· Skill / tool authoring
· Prompt / persona tuning
· Slash commands · CLAUDE.md
Developer / Builder
Integrates the agent into systems
· SDK / API consumers
· Hooks · MCP servers
· Custom UIs & workflows
Admin / Governance
Sets policy & entitlements
· RBAC / ABAC roles
· Quotas · Allow-lists
· Audit & compliance review
Automation / System
Non-human triggers
· Cron · Schedulers
· Webhooks · Event bus
· Sensors / IoT triggers
Other Agents
Inter-agent delegation
· A2A protocol
· MCP-client agents
· Sub-agent callbacks
B · Channels & Surfaces — Where Interaction Happens
Concrete touchpoints that capture intent and render output across human, developer, app, and machine surfaces
Conversational UIs
Synchronous & streaming
Web chat · Mobile chat
In-product copilot panels
Inline assist (autocomplete)
Threaded long-running runs
Artifact & canvas surfaces
Voice & Telephony
Real-time speech I/O
Smart speakers · Phone bots
Streaming ASR + TTS
Barge-in · VAD · diarization
SIP / WebRTC bridges
Multi-language detection
Developer Surfaces
Programmatic & tool-driven
CLI (Claude Code, custom)
IDE plugins (VS Code, JetBrains)
SDKs · REST · gRPC · WebSocket
Notebook / REPL · Terminal
Slash commands · /skills
Embedded App Channels
Asynchronous workflows
Email inboxes · SMS
Slack · Teams · Discord
CRM / ITSM in-app widgets
Document & sheet sidebars
Browser extensions
Autonomous Triggers
No human in the request path
Cron / schedules
Webhooks · Event topics
File / DB change feeds
Alert / threshold triggers
Loop / self-paced runs
Agent ↔ Agent
Federated invocation
A2A protocol
MCP client requests
RPC · message bus
Capability discovery
Signed handoffs
C · Input Modalities & Capture
Each surface produces typed signals that the layer normalizes into a unified request envelope
Text
Chat · Email · Markdown
Voice / Audio
Mic stream · Audio files
Image / Vision
Photos · Screenshots · OCR
Video / Screen
Capture · Screencast · Frames
Documents / Files
PDF · DOCX · Spreadsheets
Code / Diffs
Repo · Patches · Snippets
Structured Data
JSON · CSV · Forms · Schemas
Sensor / Telemetry
IoT · Logs · Metrics · Geo
D · Interaction Patterns & UX Affordances
How users steer, supervise, and recover during long-running, tool-using agent runs
Streaming & Stop
Token stream · Cancel · Pause
Approvals & HITL
Risky-action confirmations
Clarifying Questions
Slot-fill · Disambiguation
Plan / Step Preview
Plan mode · Diff before write
Feedback Capture
👍 / 👎 · Comments · Ratings
Citations & Trace UI
Sources · Tool-call timeline
Undo / Rollback
Compensating actions
Personalization
Themes · Locale · A11y
E · Identity, Session & Context Management
Stable identity per actor, durable conversation state, and context that travels with every request
Authentication
SSO · OAuth · OIDC · SAML
Passkeys · MFA · API keys
Service-account / workload ID
Token refresh & revocation
Authorization
RBAC · ABAC · scopes
Tool / skill entitlements
Tenant & project isolation
Delegated & on-behalf-of
Session State
Conversation thread & turns
Resumable runs · checkpoints
Attached files & artifacts
Multi-device continuity
User & Org Context
Profile · preferences · locale
Org / workspace · project
Memory references · CLAUDE.md
Persona & tone bindings
Device & Environment
UA · OS · IDE · viewport
Network class · time zone
Geo · accessibility settings
Capability flags · feature gates
Consent & Privacy
Data-use scopes · ToS
Memory opt-in / opt-out
Recording & training flags
Data residency policy
F · Edge & API Gateway — Reliability and Safety on the Wire
All channels converge through a hardened gateway before requests reach Perception
TLS / Edge Termination
CDN · WAF · DDoS shield
mTLS for service callers
Bot / abuse detection
Geo & IP policy
Protocol Adapters
REST · GraphQL · gRPC
WebSocket / SSE streams
Webhook receiver
Email / SMS bridge
Rate & Quota
Per-user / org / token quotas
Concurrency caps
Burst smoothing · backoff
Fair scheduling
Idempotency & Retry
Idempotency-Key header
Request de-dup window
Replay protection (nonce)
At-least-once delivery
Schema Validation
OpenAPI / JSON Schema
Size / type / depth limits
MIME & encoding checks
Versioning & compatibility
Trust Boundaries
User-vs-tool-vs-content tagging
Prompt-injection pre-filter
Origin / referer enforcement
Data-classification labels
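The Idempotency & Retry box above pairs an Idempotency-Key header with a request de-dup window. A minimal sketch of that pairing follows, assuming an in-memory store and a 300-second window. The class name `IdempotencyStore`, the `execute` method, and the injectable `clock` are hypothetical names chosen for illustration, not part of the reference architecture.

```python
import time
from typing import Callable


class IdempotencyStore:
    """Replays the stored result when the same key arrives inside the window."""

    def __init__(self, window_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        self._seen: dict[str, tuple[float, object]] = {}

    def execute(self, key: str, handler: Callable[[], object]) -> object:
        now = self.clock()
        # Evict entries that have aged out of the de-dup window.
        self._seen = {k: v for k, v in self._seen.items()
                      if now - v[0] < self.window_seconds}
        if key in self._seen:
            return self._seen[key][1]  # duplicate delivery: do not re-run
        result = handler()
        self._seen[key] = (now, result)
        return result
```

With at-least-once delivery, a retried request carrying the same key returns the first result instead of repeating the side effect. A production gateway would likely use a shared store rather than process memory.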
G · Unified Request Envelope
The contract handed to the Perception layer — one shape for every channel
Request Envelope (canonical)
Identity
· principal · tenant · org
· auth_method · scopes
· consent_flags
Session
· thread_id · turn_id · run_id
· resume_token · checkpoint
· trace_id (OTel)
Channel
· surface · device · locale · tz
Intent & Content
· goal / message · attachments
· modality · MIME · size
· references (doc, repo, URL)
Controls
· model preference · tools allow-list
· budget (tokens, time, $)
· stream · response_format
Policy
· data_class · retention · region
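The envelope fields listed above can be sketched as nested Python dataclasses. The field names come from the diagram; the types, defaults, the `Content` class name, and the `validate` helper are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Identity:
    principal: str
    tenant: str
    org: str
    auth_method: str
    scopes: list[str] = field(default_factory=list)
    consent_flags: dict[str, bool] = field(default_factory=dict)


@dataclass
class Session:
    thread_id: str
    turn_id: str
    run_id: str
    trace_id: str  # OTel trace id
    resume_token: Optional[str] = None
    checkpoint: Optional[str] = None


@dataclass
class Channel:
    surface: str
    device: str
    locale: str
    tz: str


@dataclass
class Content:
    goal: str
    modality: str = "text"
    attachments: list[dict] = field(default_factory=list)  # assumed: {"mime", "size"}
    references: list[str] = field(default_factory=list)    # doc, repo, URL


@dataclass
class Controls:
    model_preference: Optional[str] = None
    tools_allow_list: list[str] = field(default_factory=list)
    budget: dict[str, float] = field(default_factory=dict)  # tokens, time, $
    stream: bool = True
    response_format: str = "text"


@dataclass
class Policy:
    data_class: str
    retention: str
    region: str


@dataclass
class RequestEnvelope:
    identity: Identity
    session: Session
    channel: Channel
    content: Content
    controls: Controls
    policy: Policy


def validate(envelope: RequestEnvelope) -> list[str]:
    """Return a list of problems; an empty list means the envelope is well-formed."""
    problems = []
    if not envelope.identity.principal:
        problems.append("identity.principal is required")
    if not envelope.content.goal.strip():
        problems.append("content.goal must not be empty")
    if any(value < 0 for value in envelope.controls.budget.values()):
        problems.append("controls.budget values must be non-negative")
    return problems


# Hypothetical example values, for illustration only.
example = RequestEnvelope(
    identity=Identity("user-1", "tenant-a", "org-a", "oidc", scopes=["chat"]),
    session=Session("t1", "turn-1", "run-1", trace_id="trace-1"),
    channel=Channel("web_chat", "desktop", "en-US", "UTC"),
    content=Content(goal="Summarize the attached report"),
    controls=Controls(budget={"tokens": 4000.0}),
    policy=Policy("internal", "30d", "eu"),
)
```

Keeping one shape for every channel means Layer 2 can validate and route without knowing which surface produced the request.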
H · Output, Rendering & Delivery
How agent results are returned, rendered, and made interactive on each surface
Streaming Renderer
Token / event stream
Markdown · code · math
Live tool-call updates
Rich Artifacts
Canvas · diagrams · charts
Tables · interactive HTML
Generated files (PDF, XLSX)
Voice / Audio Out
Streaming TTS
Voice persona
Captions / transcripts
Interactive UI Cards
Buttons · forms · pickers
Slack blocks / Adaptive Cards
Confirm / approve / cancel
Citations & Provenance
Inline source links
Tool-call timeline
Confidence & caveats
Notifications
Push · email · SMS
Run-completed events
Digest summaries
Output Guardrails & Compliance
PII redaction · safety filters
Watermarking · content tags
Schema-conformant responses
Accessibility & i18n
WCAG · screen-reader semantics
RTL · locale formatting
Translation & transliteration
I · Cross-Cutting — Safety, Telemetry & Feedback Loops
Always-on concerns that wrap every interaction in this layer
Input Guardrails
Toxicity · jailbreak · injection
PII / DLP Pre-filter
Detect · redact · tokenize
Abuse & Bot Defense
CAPTCHA · velocity · anomaly
Telemetry & Tracing
OTel spans · structured logs
Analytics & A/B
Funnels · retention · experiments
Audit Log
Immutable, signed events
Feedback & Signals → Memory / Eval
👍 / 👎 · edits · regenerate · session ratings · escalations
Incident & Recovery Hooks
Kill-switch · graceful degrade · fallback model · status page
Compliance & Residency
GDPR · CCPA · HIPAA · SOC 2 · EU AI Act · regional routing
J · Handoff to Layer 2 · Perception & Input Processing
The Interaction Layer's output: a validated, classified, traceable envelope ready for grounding
Validated Request
Schema-checked envelope
Identity & scopes attached
Trust Labels
user · tool · external content
Data classification tags
Trace Context
trace_id · span · baggage
SLO & budget hints
Attached Context
Files · history · references
Memory / project pointers
Output Contract
Response shape · streaming
Tool / channel callbacks
Policy Hints
HITL · risk class
Region · retention
Cross-cutting Safety, Identity & Telemetry
Legend (color key): Initiators / Identity · Channels / Handoff · Modalities / UX · Session & Context · Edge / Gateway · Output / Presentation · Safety / Governance
Legend (arrows): Inbound request · Outbound delivery · Feedback signal
Detailed view of Layer 1 — User & Interaction Layer from the Agentic AI System Architecture reference.
All channels are normalized into a canonical request envelope and handed off to Layer 2 (Perception). Outputs flow back through the same surfaces with streaming, citations, and policy-aware rendering.
Layer 2 Perception & Input Processing
Transform the validated request envelope from Layer 1 into a grounded, structured task representation — parsing modalities, extracting intent and entities, assembling context, enforcing safety, and compiling the prompt that Layer 3 will plan against.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Validated Request Envelope from Layer 1 (User & Interaction)
Identity · Session · Channel · Intent · Controls · Policy hints · Trust labels · Trace context · Attached context
principal · scopes · tenant
thread_id · run_id · trace_id
message · attachments · MIME
model pref · tools allow-list
budget · stream · format
data class · retention · region · trust labels
A · Ingestion & Normalization
Demultiplex incoming payloads, normalize encodings, sanitize, and enforce size/shape limits
Payload Demuxer
Split by part / modality
· Multipart / form-data
· JSON message blocks
· File attachments
· Inline URIs & data: URLs
Encoding & Charset Norm.
Stable canonical form
· UTF-8 NFC normalization
· Newline / whitespace fix
· Strip control / zero-width
· Bidi & homoglyph guard
Sanitization
Reduce attack surface
· HTML / Markdown clean
· Script / event handler strip
· File type sniff & verify
· Anti-virus / malware scan
Limits & Quotas
Bound work and cost
· Max tokens / chars
· Max files / total size
· Max audio / video duration
· Per-tenant byte quotas
Language & Locale
Detect & route correctly
· Language ID (per segment)
· Script / dialect detection
· Locale formatting hints
· Optional MT pre-translation
Caching & Dedup
Avoid re-processing
· Content-hash cache
· Idempotent re-entry
· Embedding / parse reuse
· CDN-cached artifacts
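The Encoding & Charset Norm. and Limits & Quotas boxes above can be illustrated with the Python standard library. This is a minimal sketch assuming UTF-8 input; the function name `normalize_text`, the `max_chars` default, and the exact set of stripped code points are assumptions. It does not cover homoglyph detection.

```python
import re
import unicodedata

# Zero-width characters, bidi controls, and the byte-order mark (assumed set).
_INVISIBLE = re.compile("[\u200b-\u200f\u202a-\u202e\u2060-\u2069\ufeff]")


def normalize_text(raw: bytes, max_chars: int = 100_000) -> str:
    """Decode, apply NFC, unify newlines, and strip invisible/control characters."""
    text = raw.decode("utf-8", errors="replace")
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = _INVISIBLE.sub("", text)
    # Drop remaining control characters, keeping newline and tab.
    text = "".join(ch for ch in text
                   if ch in "\n\t" or unicodedata.category(ch) != "Cc")
    if len(text) > max_chars:
        raise ValueError(f"input exceeds {max_chars} characters")
    return text
```

Note that removing zero-width joiners can alter rendering in some scripts and emoji sequences, so a real pipeline may need a narrower set per language.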
B · Multimodal Encoders & Parsers
Convert each modality into structured tokens, embeddings, and document trees the reasoner can consume
Text Pipeline
Tokens · structure · meta
· Sentence / paragraph split
· Tokenization (BPE / SP)
· Markdown / HTML AST
· Code-block tagging
· Math / LaTeX detection
· Embeddings (BGE / E5)
· Token-count budget
Vision Pipeline
Images · screenshots · UI
· Decode · resize · color norm
· EXIF / orientation strip
· OCR (Tesseract / docTR)
· Object & layout detection
· Captioning / VQA model
· CLIP / SigLIP embeddings
· NSFW / safety classifier
Audio / Speech Pipeline
Voice · music · environment
· Resample · denoise · VAD
· ASR (Whisper / streaming)
· Speaker diarization & ID
· Language / dialect detect
· Prosody & emotion cues
· Audio embeddings
· Transcript timestamps
Video / Screen Pipeline
Frames · scenes · UI graphs
· Demux + transcode
· Keyframe / shot detection
· Frame sampling strategy
· Action / event detection
· Audio track → ASR
· Screen DOM / a11y tree
· Temporal embeddings
Document Pipeline
PDF · DOCX · XLSX · slides
· Layout-aware parsing
· Heading / section tree
· Table extraction
· Figure & chart capture
· Footnote / citation linkage
· Form-field extraction
· Chunking + embeddings
Code · Structured · Sensor
Programmatic inputs
· Tree-sitter AST parse
· LSP symbols / refs
· Diff / hunk extraction
· JSON · CSV · schema infer
· Time-series resample
· Geo / spatial indexing
· Unit / dimension normalize
C · Language Understanding & Intent
Convert raw signals into a structured task — what the user wants and what's needed to act
Intent Classifier
Task type · domain · urgency
Multi-label · confidence scores
Entity / Slot Extraction
NER · dates · amounts · IDs
Pydantic / JSON-schema slots
Coreference & Anaphora
"it" · "that PR" · "the file"
Mention → entity linking
Goal Decomposition
Top-level objective
Sub-goals · constraints · DoD
Disambiguation
Ambiguity detector
Triggers HITL clarification
Sentiment / Tone
Frustration · urgency
Style hints for response
Task Schema (structured representation)
objective · constraints · slots · entities · success criteria · risk class · suggested skills
D · Grounding & Reference Resolution
Bind language to real-world entities, files, repos, and prior context
Entity Linking
KG · directory · Wikidata
Org-internal canonical IDs
Resource Resolution
URLs · file paths · repos
PR / ticket / doc IDs
Time & Date Norm.
Relative → absolute
TZ-aware ISO-8601
Geospatial Grounding
Geocoding · POI lookup
User-locale defaults
Quantity / Unit Norm.
Currency · SI units
FX-rate & precision rules
Cross-Modal Align
Caption ↔ region
Transcript ↔ frame
Grounded Reference Graph
Mentions · entities · resources · times · places — emitted with provenance & confidence
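The Time & Date Norm. box above turns relative expressions into TZ-aware ISO-8601 values. A minimal sketch, assuming only three day-level phrases and a caller-supplied timezone-aware clock. The function name `resolve_relative_day` and the phrase table are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed phrase table; a real resolver would cover far more expressions.
_DAY_OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def resolve_relative_day(phrase: str, now: datetime) -> str:
    """Map a relative day phrase to the start of that day, as ISO-8601 with offset."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    key = phrase.strip().lower()
    if key not in _DAY_OFFSETS:
        raise KeyError(f"unsupported phrase: {phrase!r}")
    day = now + timedelta(days=_DAY_OFFSETS[key])
    day = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return day.isoformat()
```

Resolving against the user's own offset, not the server's, is what keeps "tomorrow" correct near midnight. Fixed offsets are used here for simplicity; named zones with daylight-saving rules would need a timezone database.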
E · Context Assembly & Retrieval
Pull just-enough context from memory, knowledge, and session — pack within budget, with provenance
Session History Selector
Recent turns · pinned items
· Salience scoring
· Compaction summaries
· Tool-call traces
· Run checkpoints
· Conversation graph
Memory Reader
Episodic · semantic · procedural
· User profile · preferences
· Project memory · CLAUDE.md
· Learned skills / recipes
· Past trajectories
· Privacy & TTL filtering
Knowledge Retrieval (RAG)
Hybrid search across stores
· BM25 + dense fusion
· Multi-query expansion / HyDE
· KG / GraphRAG hops
· ACL-aware filtering
· Recency & freshness boost
Re-rank & Compress
Pick the highest-value tokens
· Cross-encoder reranker
· LLM-based reranker
· Extractive snippeting
· Map-reduce summarization
· Diversity / dedup (MMR)
Tool / Capability Hints
Which skills are likely
· Skill / tool retriever
· MCP server discovery
· Few-shot example pull
· Schema / signature attach
· Cost & latency profile
Context Budgeter
Token / latency / $ caps
· Per-section quotas
· Lossy vs lossless drop
· Cache-aware ordering
· Prompt-cache key plan
· Overflow → tool offload
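The Knowledge Retrieval box above lists "BM25 + dense fusion" without naming a fusion method. One common choice is reciprocal rank fusion (RRF), sketched here as an assumption and not as the method this architecture prescribes. The function name and the constant `k = 60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document id for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["a", "b", "c"]     # hypothetical lexical results
dense_hits = ["b", "c", "d"]    # hypothetical embedding results
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
# fused == ["b", "c", "a", "d"]
```

RRF uses only ranks, so it avoids having to calibrate BM25 scores against cosine similarities.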
F · Safety, Trust & Privacy Filters
Defend the reasoner from hostile or unsafe inputs and protect user data before context leaves this layer
Prompt-Injection Detection
Quarantine untrusted text
· Heuristic + classifier
· Embedded-instruction scan
· Tool-result wrapping
· Spotlighting / delimiters
PII / DLP Scrubber
Detect, redact, tokenize
· Names · IDs · phones
· Cards · accounts · keys
· Health / financial data
· Reversible vault tokens
Content Safety
Block harmful inputs early
· Toxicity · hate · violence
· CSAM & abuse hashing
· Dangerous-capability cues
· Policy lookup & routing
Trust-Boundary Tagger
Provenance per token block
· user · system · tool
· retrieved content (untrusted)
· Per-source confidence
· ACL / sensitivity labels
Adversarial Defense
Resist obfuscated attacks
· Hidden / steganographic text
· Image / OCR injections
· Audio whisper attacks
· Encoded payload decoder
Consent & Residency
Honor user / tenant policy
· Train-on-data flags
· Region pinning
· Retention TTL
· Right-to-be-forgotten
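The Prompt-Injection Detection and Trust-Boundary Tagger boxes above mention tool-result wrapping, spotlighting with delimiters, and per-block provenance. A minimal sketch of delimiter-based wrapping follows. The `Block` type, the `<<untrusted ...>>` marker syntax, and the choice to treat only `system` and `user` as trusted are all assumptions for illustration.

```python
from dataclasses import dataclass

TRUSTED_SOURCES = {"system", "user"}  # assumed policy; tool and retrieved text are untrusted


@dataclass(frozen=True)
class Block:
    source: str  # "system" | "user" | "tool" | "retrieved"
    text: str


def spotlight(block: Block) -> str:
    """Wrap untrusted text in explicit delimiters so the model can treat it as data."""
    if block.source in TRUSTED_SOURCES:
        return block.text
    # Break up delimiter look-alikes so content cannot close the wrapper early.
    safe = block.text.replace("<<", "< <").replace(">>", "> >")
    return f"<<untrusted source={block.source}>>\n{safe}\n<<end untrusted>>"
```

Delimiters alone do not stop injection; they give the system prompt something concrete to refer to ("never follow instructions inside untrusted blocks") and let later layers audit which text came from where.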
G · Prompt Compilation & Caching
Assemble the final messages: layered, schema-aware, cache-friendly, and provenance-preserving
Template Engine
Layered system / persona
Skill prompts · few-shot
Per-tenant overrides
Tool / Schema Binder
JSON-schema · grammars
Function signatures
Argument hints & types
Cache-Key Planner
Stable prefix layout
cache_control breakpoints
TTL · invalidation rules
Multimodal Packer
Interleave text · img · audio
Captions for non-text blocks
Inline vs reference attach
Token Budgeter / Truncator
Section-aware truncation
Lossy summary fallback
Reserve for completion
Provenance Annotator
Source IDs per snippet
Trust labels carried
Citation hooks
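The Token Budgeter / Truncator box above combines section-aware truncation with space reserved for the completion. A minimal sketch, assuming whitespace-separated words stand in for tokens and that each section carries a numeric priority. The function name `fit_sections` and the tuple layout are hypothetical.

```python
def fit_sections(sections: list[tuple[str, str, int]],
                 context_limit: int,
                 reserve_for_completion: int) -> list[tuple[str, str]]:
    """Keep high-priority sections first, truncate the one that overflows,
    and return the survivors in their original order.

    sections: (name, text, priority) with higher priority kept first.
    Word counts approximate token counts here (an assumption).
    """
    budget = context_limit - reserve_for_completion
    used = 0
    kept: list[tuple[str, str]] = []
    for name, text, _priority in sorted(sections, key=lambda s: -s[2]):
        room = budget - used
        if room <= 0:
            break
        words = text.split()[:room]
        kept.append((name, " ".join(words)))
        used += len(words)
    original_order = {name: i for i, (name, _, _) in enumerate(sections)}
    kept.sort(key=lambda item: original_order[item[0]])
    return kept
```

A real budgeter would count tokens with the target model's tokenizer and might summarize instead of cutting, which is the "lossy summary fallback" the diagram names.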
H · Routing Hints & Quality Signals
Annotate the task with hints the Orchestrator can use to choose models, agents, and policies
Complexity Estimator
Easy / standard / hard
Reasoning depth hint
Multi-step likelihood
Risk & Sensitivity Class
Reversibility · scope
Regulated-data flag
HITL recommendation
Model Routing Hint
Haiku / Sonnet / Opus
Specialist vs generalist
Cost / latency target
Confidence Scoring
Per slot / entity
Calibrated thresholds
Trigger clarification
Locale & Persona Hint
Output language
Tone / formality
Domain persona
SLA & Budget Hints
Latency target
Token / $ ceiling
Stop conditions
I · Observability, Telemetry & Feedback
Every step emits traces, metrics, and signals consumed by Layer 11 (Governance) and the Reflection loop
OTel spans · per-stage
Latency / token / cost meters
Classifier confidence logs
Drift / anomaly detection
Audit log · signed
Eval & Reflection feedback
⇣ Handoff — Structured Task Bundle to Layer 3 · Orchestration & Planning
Compiled prompt · tool catalog · task schema · grounded references · routing & risk hints · context budget · trace / provenance
objective & sub-goals
grounded entities
retrieved context (provenance)
candidate tools / skills
model / risk / SLA hints
trust-tagged compiled prompt + cache plan
Legend (color key): Ingestion / Handoff · Encoders / Compilation · Understanding / Routing · Grounding · Context Assembly · Safety & Privacy · Observability
Legend (arrows): Forward flow · Clarification back to user · Feedback / drift signal
Detailed view of Layer 2 — Perception & Input Processing from the Agentic AI System Architecture reference.
Inputs flow top-down from Layer 1's request envelope through ingestion, multimodal encoding, language understanding, grounding, context assembly, safety filtering, prompt compilation, and routing-hint generation, before being handed off as a structured task bundle to Layer 3 (Orchestration & Planning).
Layer 3 Orchestration, Planning & Control
The control plane of the agent — turns the structured task into an executable plan, routes work to models, tools, and sub-agents, manages state and concurrency, enforces budgets and policy, and drives the agent loop until the goal is met or escalated.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Structured Task Bundle from Layer 2 (Perception & Input Processing)
objective · sub-goals · grounded entities · retrieved context · candidate tools · risk & SLA hints · trust-tagged compiled prompt + cache plan
Task Schema · Reference Graph · Tool Catalog · Routing Hints · Policy Constraints · Trace Context · Budget Envelope
A · Plan Generation & Decomposition
Translate the goal into a structured, executable plan — hierarchical, costed, and revisable
Goal Reasoner
Analyze objective & constraints
· Definition of Done
· Acceptance criteria
· Hard / soft constraints
· Implicit assumptions
Hierarchical Decomposer
Goal → tasks → steps
· HTN-style decomposition
· Dependency DAG
· Parallel vs serial annotate
· Per-step success checks
Plan Synthesizer
LLM-drafted & validated plan
· Schema-constrained output
· Tool / agent assignment
· Pre/post conditions
· Plan Mode preview to user
Plan Critic / Verifier
Sanity-check before execute
· Self-critique pass
· Policy / risk lookup
· Cost & latency estimate
· Counterfactual / what-if
Re-planner
Adapt plan during execution
· On error / observation
· Belief revision
· Partial-plan repair
· Backtrack / abandon
Plan Repository
Reusable workflow library
· Skill / recipe registry
· Versioned templates
· Distilled from past runs
· Org-shared playbooks
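The Hierarchical Decomposer above produces a dependency DAG annotated for parallel versus serial execution. A minimal sketch using the standard-library `graphlib` module follows. The function name `execution_waves` and the step names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group plan steps into waves; steps in one wave have no unmet dependencies
    and could run in parallel. Raises graphlib.CycleError on a cyclic plan."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical plan: each step maps to the steps it depends on.
plan = {
    "search_web": set(),
    "query_db": set(),
    "draft_report": {"search_web", "query_db"},
    "review": {"draft_report"},
}
waves = execution_waves(plan)
# waves == [["query_db", "search_web"], ["draft_report"], ["review"]]
```

Rejecting cyclic plans at this stage is cheap and keeps the Plan Critic from approving something the runner could never finish.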
B · Reasoning & Control Strategies
Strategy library the orchestrator selects from based on task class, risk, and budget
ReAct
Thought → Act → Observe
Interleaved reasoning
Best for tool-use loops
Plan-and-Execute
Plan once, execute steps
Re-plan on failure
Predictable for long jobs
Tree / Graph of Thought
Branching exploration
Beam / MCTS · scoring
Hard reasoning problems
Reflexion / Self-Refine
Critic + retry loop
Lessons captured per run
Quality-sensitive tasks
Debate / Multi-Agent
Proposer vs critic
Voting / arbitration
High-stakes decisions
Direct / CoT / Skill-Triggered
Single-shot for simple tasks
CoT for medium reasoning
Pre-built skill / sub-graph fast-path
C · Agent Orchestrator — The Control Loop
Central state machine that drives the agent through observe → think → act → reflect cycles
Agent Orchestrator (Controller)
Finite-state / graph-based loop · LangGraph · Agent SDK · custom controllers
OBSERVE
THINK
DECIDE
ACT
REFLECT
Step / iteration counter · Stop conditions · Run-state checkpoints · Resume tokens
Run / Trajectory Store
Step log · tool I/O · scratchpad · checkpoints · resume token
Working / Scratchpad Memory
Live thought stream · intermediate facts · action history
Belief / World State
Known facts · pending unknowns · environment snapshot
Loop Controller
Max iterations · timeouts · stop / continue conditions
Stop Criteria Evaluator
DoD met · budget exhausted · escalate · user cancel
Checkpoint & Resume
Pause · serialize · long-running runs · cross-host resume
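The control loop described above can be sketched as a bounded loop with explicit stop reasons. This is an illustrative skeleton under stated assumptions, not the API of LangGraph or any Agent SDK: `RunState`, `run_loop`, and the `think`, `act`, and `is_done` callables are hypothetical names, and THINK and DECIDE are merged into one call for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunState:
    goal: str
    observations: list[str] = field(default_factory=list)
    steps: int = 0
    tokens_used: int = 0
    stop_reason: Optional[str] = None


def run_loop(state: RunState,
             think: Callable[[RunState], tuple[str, int]],
             act: Callable[[str], str],
             is_done: Callable[[RunState], bool],
             max_steps: int = 8,
             token_budget: int = 1000) -> RunState:
    """Drive observe -> think/decide -> act -> reflect until a stop condition fires."""
    while state.stop_reason is None:
        # Loop controller: check hard limits before doing more work.
        if state.steps >= max_steps:
            state.stop_reason = "max_steps"
            break
        if state.tokens_used >= token_budget:
            state.stop_reason = "budget_exhausted"
            break
        # OBSERVE is implicit: state.observations holds everything seen so far.
        action, token_cost = think(state)     # THINK + DECIDE
        state.tokens_used += token_cost
        observation = act(action)             # ACT
        state.observations.append(observation)
        state.steps += 1
        if is_done(state):                    # REFLECT: definition of done met?
            state.stop_reason = "done"
    return state
```

Because every exit path records a `stop_reason`, the trajectory store can distinguish a finished run from one that hit its iteration or budget cap.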
D · Router, Dispatcher & Tool Selection
Decide WHAT to call next: model, tool, sub-agent — and bind arguments
Skill / Tool Retriever
Top-k by intent + history
MCP server discovery
Skill cards loaded JIT
Model Router
Haiku / Sonnet / Opus tiers
Specialist SLMs · vendors
Quality / cost / latency mix
Sub-Agent Dispatcher
Researcher · Coder · Critic
A2A / MCP-client calls
Capability matching
Argument Binder
Schema-conformant args
Type coercion · defaults
Reference resolution
Pre-flight Validator
JSON-schema check
Dry-run / what-if
Side-effect prediction
Fallback Strategy
Alternate tool / model
Degraded-mode path
Ask-user fallback
E · Policy Engine & Action Gating
Decide if a chosen action is allowed, requires approval, or must be blocked
Permission Manager
RBAC / ABAC / scopes
Tool allow / deny lists
Per-tenant entitlements
Risk Classifier
Reversible · destructive
Blast radius estimate
Regulated-data flag
Prompt-Injection Guard
Confirm tool-driven actions from content
Untrusted-source check
Policy-as-Code
OPA / Rego rules
Versioned · auditable
Tenant overrides
Action Approval
Auto · HITL · admin
Step-up authentication
Two-person rule
Compliance Filter
Region · residency
PII handling rules
Sector regulations
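The section above describes three outcomes for a proposed action: allowed, requires approval, or blocked. A minimal sketch of that decision follows, assuming a tool allow-list, a destructive flag from the risk classifier, and a flag for actions triggered by untrusted content. All names (`Decision`, `ProposedAction`, `gate`) are hypothetical, and real deployments may express these rules in a policy language such as Rego instead.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    destructive: bool                      # from the risk classifier
    triggered_by_untrusted_content: bool   # from trust-boundary labels


def gate(action: ProposedAction, allowed_tools: set[str]) -> Decision:
    """Block tools outside the allow-list; send risky actions to a human."""
    if action.tool not in allowed_tools:
        return Decision.BLOCK
    if action.destructive or action.triggered_by_untrusted_content:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

Checking the allow-list first means an injected instruction can never escalate a forbidden tool into an approval prompt.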
F · Multi-Agent Coordination & Concurrency
When the plan requires multiple agents — coordination patterns, communication, and consensus
Coordination Patterns
Topology selector
· Supervisor / hierarchical
· Swarm / blackboard
· Pipeline / staged
· Debate / proposer-critic
· Contract net
Agent Spawn Manager
Lifecycle · isolation
· Sub-agent factory
· Sandboxed contexts
· Inherited permissions
· Per-agent token budget
· Deadline propagation
Inter-Agent Bus
Messages & shared state
· A2A · MCP · ACP
· Shared scratchpad / KV
· Pub/sub topics
· Signed handoff envelopes
· Trace propagation
Concurrency Manager
Parallel · fork / join
· DAG runner
· Map-reduce / fan-out
· Race & first-win
· Cancellation propagation
· Deadlock detection
Consensus & Arbitration
Aggregate sub-agent output
· Voting / majority
· Weighted by confidence
· Judge / referee agent
· Tie-breakers · fallbacks
· Conflict resolution
Roles & Personas
Specialist agent registry
· Researcher · Planner
· Coder · Reviewer
· Critic · Verifier
· Domain experts
· Tool persona templates
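The Consensus & Arbitration box above lists voting weighted by confidence. A minimal sketch, assuming each sub-agent returns an answer string with a positive confidence score. The function name `weighted_vote` and the alphabetical tie-break are assumptions.

```python
from collections import defaultdict


def weighted_vote(answers: list[tuple[str, float]]) -> tuple[str, float]:
    """Return the answer with the largest total confidence and its share of the total.

    Ties go to the alphabetically first answer (an arbitrary, deterministic choice).
    """
    if not answers:
        raise ValueError("no answers to aggregate")
    totals: dict[str, float] = defaultdict(float)
    for answer, confidence in answers:
        if confidence <= 0:
            raise ValueError("confidence must be positive")
        totals[answer] += confidence
    winner = max(sorted(totals), key=lambda a: totals[a])
    share = totals[winner] / sum(totals.values())
    return winner, share
```

A low winning share is a natural trigger for the judge / referee agent or for escalation to a human.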
G · Scheduling, Budget & Resilience
Make agent runs predictable, bounded, and recoverable under load and failure
Scheduler & Queue
Priorities · fair-share
Delayed · cron-driven
Per-tenant queues
Budget Manager
Tokens · steps · $ · time
Per-task & per-run caps
Soft / hard limits
Rate & Concurrency Limiter
Per model / tool / org
Token-bucket backoff
Adaptive throttling
Retry & Backoff
Exponential · jittered
Idempotency keys
Poison-message handling
Circuit Breakers
Per tool / model / agent
Open · half-open · closed
Health-check probes
Cost Optimizer
Cache-aware ordering
Cheaper-model first
Early-stop heuristics
Error & Recovery Manager
Classify (retryable · permanent · policy) · compensating actions
Saga / rollback · transactional groups · poisoning detection
Failure → re-plan · escalate · graceful degrade
Loop Safety
Max steps · max depth · runaway detection
Cycle detection (revisited state) · diversity bonus
Watchdog · liveness probes · hard kill
Durable Execution
Workflow engines (Temporal · Cadence · Restate)
Replay-safe steps · deterministic checkpoints
Long-running runs · cross-host failover
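The Circuit Breakers box above names three states: open, half-open, and closed. A minimal sketch of those transitions follows, assuming a consecutive-failure threshold and a fixed cool-down. The class, its parameters, and the injectable `clock` are hypothetical and not taken from any particular resilience library.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Skip calls to a failing tool or model until a cool-down has passed."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after:
            return "half-open"  # allow one trial call
        return "open"

    def call(self, fn: Callable[[], object]) -> object:
        if self.state == "open":
            raise RuntimeError("circuit open: call skipped")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Keeping one breaker per tool, model, or agent, as the diagram suggests, lets the fallback strategy route around a single failing dependency without pausing the whole run.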
H · Human-in-the-Loop & Steering
Pause, ask, approve, redirect — keep humans in control of risky or ambiguous moves
Approval Gate
Risky / irreversible action
Step-up auth · two-person
Clarification Manager
Ask follow-up questions
Slot-fill · disambiguation
Steering & Override
Pause · cancel · redirect
Modify plan mid-run
Plan Mode Preview
Show plan before execute
Diff before write
Escalation Router
Tier 1 / 2 / human expert
SLA-driven routing
Feedback Capture
Inline edits · ratings
Routes to Memory / Eval
I · Observability, Trace & Cross-Cutting
Every decision is traced, costed, and auditable; signals feed Layers 10 (Reflection) and 11 (Governance)
Trace & Span Emission
OTel · LangSmith · Langfuse
Per step / tool / agent
Cost & Token Meters
Per task / org / model
Streaming cost gauges
Decision Logs
Why this tool · why now
Plan diff history
Replay & Time-Travel
Re-run from checkpoint
Counterfactual debug
Anomaly & Drift
Tool-error spikes
Plan-shape regressions
Audit Log
Signed · immutable
Compliance evidence
⇣ Outbound — Coordinated Calls to Downstream Layers
The orchestrator dispatches typed calls to Reasoning, Memory, Tools, Knowledge, and Multi-Agent layers
→ Layer 4 · Reasoning
Compiled prompt · model · params
Tool catalog · stop tokens
→ Layer 5 · Memory
Read · write · update
Episodic / semantic deltas
→ Layer 6 · Tools
Schema-validated calls
Idempotency · deadlines
→ Layer 7 · RAG
Targeted retrievals
Citations required
→ Layer 8 · Multi-Agent
Sub-agent dispatch
A2A / MCP envelopes
↑ Layer 1 · User
Approvals · clarifications
Streaming partial output
Cross-cutting Policy, Safety & Telemetry
Inbound / Orchestrator
Planning / Multi-agent
Strategies / HITL
Routing
Policy
Scheduling / Observability
Forward control flow
Re-plan / reflection loop
HITL back to user
Detailed view of Layer 3 — Orchestration, Planning & Control from the Agentic AI System Architecture reference.
The orchestrator drives the OBSERVE → THINK → DECIDE → ACT → REFLECT loop; planning, routing, policy, multi-agent coordination, and scheduling are coordinated services around it. All decisions emit traces and feed Reflection (Layer 10) and Governance (Layer 11).
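As a rough illustration of the loop described above, the sketch below drives a bounded decide/act cycle with a step budget and a crude repeated-action check standing in for reflection and cycle detection. `RunState`, `run_agent`, `decide`, and `act` are hypothetical names; a real orchestrator would add planning, policy checks, and tracing around each step.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RunState:
    goal: str
    observations: list[str] = field(default_factory=list)
    done: bool = False


def run_agent(goal: str,
              decide: Callable[[RunState], Optional[str]],
              act: Callable[[str], str],
              max_steps: int = 8) -> RunState:
    """Drive a bounded loop.

    `decide` returns the next action, or None when the goal is met;
    `act` executes an action and returns an observation.
    """
    state = RunState(goal=goal)
    seen_actions: set[str] = set()
    for _ in range(max_steps):            # hard step budget (loop safety)
        action = decide(state)            # THINK + DECIDE
        if action is None:
            state.done = True
            break
        if action in seen_actions:        # crude cycle detection
            state.observations.append(f"skipped repeated action: {action}")
            break
        seen_actions.add(action)
        state.observations.append(act(action))  # ACT, then record the observation
    return state
```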
Layer 4 Reasoning Core — Foundation Models & Cognition
The cognitive engine of the agent — foundation models, extended thinking, tool-use, structured output, multimodal reasoning, self-reflection, adaptation, and the inference fabric that makes them fast, cheap, and reliable.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Inference Request from Layer 3 (Orchestration & Planning)
Compiled prompt · model preference · tool catalog · sampling params · structured-output schema · stop tokens · budget · trace context
messages[] · system · tools[] · response_format · temperature · max_tokens · cache_control · thinking_budget · stream
A · Model Selection & Routing Fabric
Pick the right model for the job — by capability, latency, cost, region, and trust
Capability Matcher
Intent → required skills
· Reasoning depth · vision
· Long-context · code · math
· Tool-use · structured output
Tier Router
Right-size by complexity
· Haiku → fast / cheap
· Sonnet → workhorse
· Opus → hardest reasoning
Cost / Latency Optimizer
SLA-aware selection
· $/1k token meter
· P50 / P95 latency targets
· Cache-hit aware
Region & Residency
Data-locality routing
· EU · US · APAC pinning
· On-prem / private VPC
· Sovereign-cloud routing
Vendor & Failover
Multi-provider abstraction
· Anthropic · OpenAI · Google
· Self-hosted OSS
· Health-check failover
Model Cascade
Cheap-first, escalate
· SLM → LLM escalation
· Confidence-gated retry
· Mixture-of-experts router
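The Model Cascade card above describes a cheap-first strategy with confidence-gated escalation. A minimal sketch follows, assuming each tier is a callable that returns an answer together with a confidence score. The `Tier` alias, the `cascade` function, and the 0.8 threshold are illustrative, not a specific vendor API.

```python
from typing import Callable

# Each tier is (name, callable returning (answer, confidence in [0, 1])).
Tier = tuple[str, Callable[[str], tuple[str, float]]]


def cascade(prompt: str, tiers: list[Tier], threshold: float = 0.8) -> tuple[str, str]:
    """Try tiers cheapest-first and return (tier_name, answer).

    The first answer whose confidence clears the threshold wins; if none
    does, the last (strongest) tier's answer is returned.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    for name, model in tiers:
        answer, confidence = model(prompt)
        if confidence >= threshold:
            return name, answer
    return name, answer  # no tier was confident: keep the strongest tier's output
```

How the confidence score is produced (a verifier, log-probabilities, a classifier) is left open here; the cascade only needs a comparable number per tier.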
B · Foundation Model Pool
A heterogeneous fleet — frontier LLMs, multimodal LMMs, and small specialist models
Anthropic Claude Family
Frontier reasoning · agentic tool-use · long context
· Claude Opus 4.7 — deepest reasoning
· Claude Sonnet 4.6 — balanced workhorse
· Claude Haiku 4.5 — fast / low cost
Extended thinking · vision · tool-use · 200k+ context · prompt caching
Other Frontier LLMs
Multi-vendor coverage
· OpenAI GPT-5 / o-series
· Google Gemini 2.x
· xAI Grok
· DeepSeek · Qwen
· Mistral Large
Open-Weights / Self-Host
On-prem & sovereign
· Llama 4 / 5
· Qwen3 · DeepSeek-R
· Mistral · Mixtral
· Gemma · Phi
· Domain-tuned variants
Multimodal Models
Vision · audio · video
· VLMs (image + text)
· Speech-to-speech models
· Video-understanding LLMs
· Image-generation models
· TTS / ASR specialists
Specialist Small Models (SLMs)
Cheap, fast, narrow
· Embedders (BGE · E5 · Voyage)
· Re-rankers (cross-encoder)
· Classifiers (intent · safety · PII)
· Code models (Codex-style)
· Math / theorem provers
C · Cognitive Capabilities — How the Model Thinks
First-class capabilities the orchestrator can compose: reasoning, tool-use, reflection, and learning in-context
Extended Thinking
Private reasoning tokens
· Reasoning scratchpad
· thinking_budget control
· Visible vs hidden CoT
· Plan-before-act
· Multi-step decomposition
· Self-consistency / voting
· Long-horizon arithmetic
Tool Use / Function Calling
Bridge to the world
· Schema-constrained args
· Parallel tool calls
· Tool selection & chaining
· Tool-result integration
· Computer-use actions
· MCP-server tool calls
· Function-call streaming
Structured Output
Reliable machine-readable
· JSON Schema enforcement
· Pydantic / Zod models
· Regex / grammar-guided
· Type-safe SDK responses
· Citations & spans
· Field-level validation
· Retry on parse failure
Multimodal Reasoning
Beyond text
· Vision: docs · charts · UI
· Audio: transcribe + reason
· Video frame reasoning
· Code: AST + repo context
· Tables & spreadsheets
· Cross-modal grounding
· Generation across modes
Self-Reflection / Critic
Inner verification
· Reflexion · Self-Refine
· Generator + critic split
· LLM-as-Judge scoring
· Self-consistency vote
· Confidence calibration
· Hallucination probes
· Verifier tool calls
In-Context Learning
Adapt without training
· Few-shot exemplars
· Skills / system cards
· Persona / style transfer
· Negative examples
· Long-context recall
· Demonstration learning
· Test-time compute scale
D · Inference & Decoding Controls
Knobs that control the sampling distribution and the form of generated tokens
Sampling Parameters
temperature · top-p · top-k
min-p · repetition penalty
seed for reproducibility
Constrained Decoding
Grammar / regex / GBNF
JSON-schema masking
Outlines · LMQL · XGrammar
Logit Biasing
Boost / suppress tokens
Stop sequences
Banned-phrase enforcement
Streaming & Stop Logic
SSE / event stream
Stop tokens · max_tokens
Mid-stream cancel
Speculative Decoding
Draft model + verify
Medusa / EAGLE heads
Roughly 2-3× faster decode (workload-dependent)
Test-Time Compute
Best-of-N · majority vote
Tree search · MCTS
Verifier-guided search
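To make the sampling parameters above concrete, the sketch below applies temperature, then top-k, then top-p (nucleus) filtering to a small logit table and samples from the result with a seeded generator. It is a pure-Python illustration over a toy vocabulary; the function names are assumptions, and production decoders apply the same steps to tensors.

```python
import math
import random


def sampling_distribution(logits: dict[str, float], temperature: float = 1.0,
                          top_k: int = 0, top_p: float = 1.0) -> dict[str, float]:
    """Apply temperature, top-k, and top-p filtering; return renormalized probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())                     # subtract the max for numerical stability
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((w / total, tok) for tok, w in weights.items()), reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]                     # keep only the k most likely tokens
    kept, cumulative = [], 0.0
    for prob, tok in ranked:                        # smallest prefix whose mass reaches top_p
        kept.append((prob, tok))
        cumulative += prob
        if cumulative >= top_p:
            break
    norm = sum(prob for prob, _ in kept)
    return {tok: prob / norm for prob, tok in kept}


def sample(logits: dict[str, float], rng: random.Random, **controls: float) -> str:
    """Draw one token; passing a seeded `rng` makes the draw reproducible."""
    dist = sampling_distribution(logits, **controls)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
```

Lower temperatures concentrate probability on the top token, while top-k and top-p remove the low-probability tail before sampling.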
E · Context & Caching Subsystem
Make long contexts fast and cheap — KV reuse, prompt caching, and attention efficiency
Prompt Cache
cache_control breakpoints
5-min TTL · 1-hour TTL
Up to 90% cost cut · lower latency on cache hits
KV-Cache Manager
PagedAttention (vLLM)
Prefix sharing across reqs
Eviction · radix tree
Long-Context Handling
200k–2M tokens
Chunk · map-reduce · skim
Needle-in-haystack tuning
Attention Efficiency
FlashAttention 3
Sliding-window · sparse
Linear / state-space hybrids
Compaction / Summarize
Auto-compact threshold
Recursive summarization
Token-budget reclaim
Response Cache
Semantic cache (embed)
Idempotent re-runs
Read-through · TTL
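The Response Cache card above mentions read-through behavior with a TTL. The sketch below shows an exact-match version of that idea. `ReadThroughCache`, the `loader` callable (which would wrap a model call), and the 300-second default are assumptions for illustration; a semantic cache would replace the dictionary lookup with an embedding similarity search.

```python
import time
from typing import Callable


class ReadThroughCache:
    """Exact-match response cache with per-entry TTL and read-through loading."""

    def __init__(self, loader: Callable[[str], str], ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.loader = loader
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: dict[str, tuple[float, str]] = {}  # key -> (expiry, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        if entry is not None and self.clock() < entry[0]:
            self.hits += 1
            return entry[1]
        # Missing or expired: load from the source and store with a fresh expiry.
        self.misses += 1
        value = self.loader(key)
        self._entries[key] = (self.clock() + self.ttl_s, value)
        return value
```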
F · Adaptation & Customization
Specialize the base model to your domain — prompts, parameter-efficient tuning, full fine-tunes, and preference optimization
Prompt Engineering
Lightweight, no training
· System / persona design
· Few-shot exemplar library
· Skill / sub-prompt files
· Auto-prompt optimization
· DSPy compilation
PEFT — LoRA / Adapters
Parameter-efficient tuning
· LoRA · QLoRA · DoRA
· Prefix / prompt tuning
· Adapter fusion
· Per-tenant / per-task
· Hot-swap at inference
Supervised Fine-Tune
SFT on curated data
· Instruction tuning
· Domain-corpus continued pretraining
· Tool-use distillation
· Rejection-sampled SFT
· Curriculum & staging
Preference Optimization
Align to human / AI prefs
· RLHF · PPO
· DPO · IPO · KTO
· RLAIF (constitutional AI)
· Reward modeling
· GRPO · process rewards
Distillation & Compression
Smaller, cheaper, faster
· Teacher → student
· Quantization (INT8/4)
· Pruning · sparsity
· Speculative draft training
· Edge-deploy variants
Continual Learning
Improve from production
· Trace mining
· Feedback → SFT data
· Self-play / synthetic
· Catastrophic-forget guard
· Online eval gating
G · Inference Engine & Serving Fabric
High-throughput, low-latency execution: schedulers, batching, kernels, accelerators
Serving Runtimes
Production model servers
· vLLM · SGLang
· TensorRT-LLM
· TGI · llama.cpp
· Triton Inference Server
· Hosted (Anthropic / OpenAI)
Batching & Scheduling
Throughput optimization
· Continuous batching
· Chunked prefill
· Disaggregated prefill / decode
· Priority queues
· SLO-aware scheduling
Compute & Accelerators
Hardware substrate
· NVIDIA H100 / B200
· Google TPU v5p / v6
· AMD MI300 · Trainium
· Groq · Cerebras · SambaNova
· Edge / mobile NPUs
Distributed Inference
Scale beyond one node
· Tensor parallelism
· Pipeline parallelism
· Expert parallelism (MoE)
· Sequence parallelism
· NCCL / RDMA fabric
Optimized Kernels
Squeeze more per token
· FlashAttention / FA3
· Fused MLP / RMSNorm
· Triton / CUDA kernels
· FP8 / INT4 GEMM
· Compiler stacks (XLA, Mojo)
Quantization & Deployment
Trade off quality vs cost
· FP16 · BF16 · FP8
· INT8 · INT4 · AWQ · GPTQ
· Weight streaming
· Multi-tenant serving
· Cold-start optimization
H · Output Processing & Validation
Parse, validate, and certify the model's response before returning to the orchestrator
Token / Logprob Stream
SSE chunks · partials
Confidence per token
Tool-Call Parser
Extract function calls
Schema-validate args
Structured Output Verifier
JSON / Pydantic check
Auto-repair on failure
Citation & Span Extractor
Source links · char ranges
Provenance carryover
Hallucination Probes
NLI · entailment · self-check
Cross-source verifier
Confidence Calibrator
Temperature scaling
Score normalization
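The Structured Output Verifier above pairs a JSON check with auto-repair on failure. One simple repair strategy is to re-prompt with the validation error, sketched below. The `generate` callable stands in for a model call, and the flat `required` field map is a deliberately small substitute for a full JSON Schema or Pydantic model; both are assumptions.

```python
import json
from typing import Callable


def parse_with_retry(generate: Callable[[str], str], prompt: str,
                     required: dict[str, type], max_attempts: int = 3) -> dict:
    """Parse model output as JSON, check required fields, and re-prompt on failure."""
    feedback = ""
    for _ in range(max_attempts):
        raw = generate(prompt + feedback)
        try:
            data = json.loads(raw)
            if not isinstance(data, dict):
                raise ValueError("top-level value must be a JSON object")
            for name, expected in required.items():
                if not isinstance(data.get(name), expected):
                    raise ValueError(f"field {name!r} must be {expected.__name__}")
            return data
        except ValueError as err:   # json.JSONDecodeError is a ValueError subclass
            feedback = f"\nPrevious output was invalid ({err}). Return only valid JSON."
    raise ValueError(f"no valid output after {max_attempts} attempts")
```

Bounding the attempts keeps a persistently malformed response from looping; the final error can then be classified by the orchestrator's recovery logic.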
I · Safety, Telemetry & Governance
Cross-cutting controls — model-level guardrails, traces, evals, and lifecycle
Output Safety Filters
Toxicity · jailbreak · PII
Refusal classifier
Watermarking & Provenance
SynthID · token traces
C2PA content credentials
Inference Telemetry
TTFT · TPOT · tokens/s
Cache hit-rate · cost
Eval & Regression Suite
Offline + online evals
Capability benchmarks
Model Lifecycle
Versioning · canary · rollback
Deprecation policy
Capability Gating
RSP · ASL tiers
Red-team gated release
⇣ Outbound — Inference Result Bundle to Layer 3 (Orchestration)
Returned in a single shape regardless of model — text · tool calls · structured object · citations · usage · trace
Generated Text
Streamed or batched
Stop reason annotated
Tool Calls
Validated args
Parallel-call list
Structured Object
JSON / Pydantic
Schema-conformant
Reasoning Trace
Thinking tokens
Self-critique notes
Citations & Confidence
Source spans · scores
Calibrated uncertainty
Usage & Trace
Tokens · cost · latency
Cache stats · trace_id
Cross-cutting Safety, Eval & Lifecycle
Inbound / Outbound · Routing
Foundation Models
Cognitive Capabilities
Decoding / Output
Caching / Inference Engine
Adaptation
Safety & Lifecycle
Forward inference flow
Reflection / continual learning
Detailed view of Layer 4 — Reasoning Core: Foundation Models & Cognition from the Agentic AI System Architecture reference.
Inference requests flow from Layer 3 through model routing, the foundation-model pool, cognitive capabilities (extended thinking, tool-use, structured output, multimodal, reflection, ICL), decoding/cache controls, the adaptation stack, and the inference engine, returning a typed result bundle to the orchestrator. Telemetry & reflection signals feed Layers 10 (Reflection) and 11 (Governance).
Layer 5 Memory Subsystem
Multi-tier memory that gives the agent continuity, personalization, and learning across turns, sessions, and lifetimes — working, episodic, semantic, and procedural memory backed by vector, graph, key-value, and document stores, with a memory manager that reads, writes, consolidates, and forgets.
Detailed Diagram · v1.0 · 2026
⇄ Memory Operations from Layer 3 (Orchestrator) and Layer 4 (Reasoning Core)
read · write · upsert · update · forget · consolidate · search · subscribe — scoped to user, project, tenant, agent
read(query, scope, k)
write(item, type, scope, ttl)
update(id, patch, evidence)
forget(scope · subject · GDPR)
consolidate(window)
subscribe(event)
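The operation signatures above can be pictured with a toy in-memory store that implements `read`, `write`, and `forget` with scope filtering and TTL expiry. Everything beyond those three signatures is assumed for illustration: the `MemoryItem` fields, the word-overlap ranking (a stand-in for vector search), and the injectable `clock`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MemoryItem:
    text: str
    type: str                     # e.g. "episodic" or "semantic"
    scope: str                    # e.g. "user:42"
    expires_at: Optional[float]   # None means no TTL


class InMemoryStore:
    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self._items: list[MemoryItem] = []

    def write(self, item: str, type: str, scope: str, ttl: Optional[float] = None) -> None:
        expires = None if ttl is None else self.clock() + ttl
        self._items.append(MemoryItem(item, type, scope, expires))

    def read(self, query: str, scope: str, k: int = 5) -> list[str]:
        """Return up to k live items in scope, ranked by shared-word count."""
        words = set(query.lower().split())
        now = self.clock()
        live = [m for m in self._items
                if m.scope == scope and (m.expires_at is None or m.expires_at > now)]
        scored = [(len(words & set(m.text.lower().split())), m.text) for m in live]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep write order
        return [text for _, text in scored[:k]]

    def forget(self, scope: str) -> int:
        """Delete everything in a scope; return the number of removed items."""
        before = len(self._items)
        self._items = [m for m in self._items if m.scope != scope]
        return before - len(self._items)
```

Scope is checked on every read, so one user's items never surface in another user's results even though they share a store.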
A · Memory Manager — The Memory Control Plane
A unified API on top of heterogeneous stores — handles routing, scoping, consistency, and lifecycle
Memory Manager (Controller)
Single entry-point · scope resolution · ACL · transactions · cross-store fan-out
READ
WRITE
UPDATE
FORGET
CONSOLIDATE
Scope Resolver
user · project · org · agent · global
Access Control
RBAC / ABAC · row-level · ACL filters
Routing & Sharding
Pick store by type / size / region
Consistency & Txn
eventual · read-your-write · 2PC
Conflict Resolver
Recency · evidence-weighted merge
Versioning & Audit
Provenance · immutable history
B · Memory Types — A Cognitive-Inspired Taxonomy
Specialized memory tiers, each with its own write triggers, retrieval pattern, and lifetime
Working / Context Memory
Live conversation buffer
· Current turn / run state
· Tool I/O scratchpad
· Compaction summaries
· Pinned items
· Lifetime: minutes–hours
· Storage: in-context · KV
Volatile · session-scoped
Episodic Memory
"What happened when"
· Past sessions / runs
· Trajectories & outcomes
· Time-stamped events
· User interactions log
· Lifetime: weeks–years
· Storage: vector + KV
Persistent · timeline-ordered
Semantic Memory
Facts & concepts
· User profile · preferences
· Project / domain knowledge
· Entities · relations · taxonomy
· Distilled from episodes
· Lifetime: long / permanent
· Storage: KG + vector
Persistent · timeless
Procedural Memory
Skills · workflows · "how"
· Reusable tool sequences
· Skill / recipe library
· Plan templates
· Voyager-style distillation
· Lifetime: long · versioned
· Storage: doc + repo
Executable artifacts
Affective / Persona Memory
User mood · style · trust
· Communication style
· Tone · formality
· Frustration / engagement
· Relationship trust score
· Lifetime: rolling
· Storage: KV / profile
Personalization layer
Shared / Org Memory
Cross-user knowledge
· Team playbooks
· CLAUDE.md / repo notes
· Curated FAQs
· Lessons learned
· Lifetime: long · governed
· Storage: docs + KG
Org-shared knowledge
C · Storage Backends — Polyglot Persistence
Use the right database for each access pattern; the manager hides which is which
Vector Stores
Semantic similarity search
· pgvector · Pinecone
· Weaviate · Qdrant
· Milvus · Chroma · LanceDB
· HNSW · IVF · DiskANN
· Quantization · binary
Knowledge Graphs
Entities · relations · paths
· Neo4j · ArangoDB
· Memgraph · NebulaGraph
· RDF · SPARQL stores
· GraphRAG-friendly
· Property + temporal edges
KV / Cache
Hot, fast, simple
· Redis · KeyDB · Dragonfly
· DynamoDB · Cosmos DB
· Memcached
· TTL · LRU eviction
· Pub/sub for invalidation
Document Stores
Rich nested objects
· MongoDB · Couchbase
· Firestore · Elastic
· OpenSearch (BM25)
· JSONB / Postgres
· Object storage (S3, R2)
Relational / OLTP
Strong consistency, joins
· Postgres · MySQL
· CockroachDB · Spanner
· Schema-validated facts
· Audit / version tables
· Row-level security
Time-Series & Event
Append-only timelines
· TimescaleDB · InfluxDB
· Kafka · Pulsar · NATS
· Event-sourced runs
· CDC streams
· Replay-able trajectories
D · Encoding & Indexing Pipeline (Write Path)
Turn raw events into searchable, structured, deduplicated memory items
Capture & Normalize
From traces · turns · tools
Schema-canonical events
Stable IDs · timestamps
Chunking & Summarize
Semantic / sliding windows
Hierarchical summaries
Headline + body + facts
Importance Scorer
Should we remember this?
Surprise · novelty · utility
User-flagged · pinned
Embedding Pipeline
Dense + sparse vectors
BGE · E5 · Voyage · OpenAI
Multi-vector / ColBERT
Entity & Relation Extractor
Triples for the KG
Linker · canonicalizer
Coreference resolution
Index Builder
HNSW · IVF · BM25
Field metadata indexes
Async + batch builds
E · Retrieval & Recall (Read Path)
Surface the right memories at the right time, with provenance and freshness
Query Planner
Decide which stores / types
Multi-query expansion · HyDE
Query rewriting
Hybrid Search
BM25 + dense + KG hops
Reciprocal-rank fusion
Field filters · ACL filter
Re-ranker
Cross-encoder · LLM-rerank
Recency / freshness boost
Diversity (MMR)
Salience Scorer
Relevance · importance
Decay function (Ebbinghaus)
Per-user weighting
Provenance & Citation
Source IDs · timestamps
Confidence per item
Trust labels carried
Read Cache
Semantic / exact
TTL · invalidation hooks
Per-scope keys
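Hybrid Search above combines BM25, dense, and graph results with reciprocal-rank fusion. The sketch below implements the standard RRF score, the sum over result lists of 1 / (k + rank), with the commonly used constant k = 60. The function name and the alphabetical tie-break are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Because RRF uses only ranks, it needs no score normalization across retrievers whose raw scores are on different scales.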
F · Memory Lifecycle — Consolidation, Update, Forgetting
Memory must change: episodes get distilled into facts, stale knowledge gets revised, and what shouldn't persist must be removed
Consolidation
Episodic → Semantic
· Periodic distillation jobs
· LLM-based summarizer
· Pattern → general fact
· Sleep-cycle inspired
· Hot → warm → cold tiers
Update & Belief Revision
Keep facts current
· Newer evidence wins
· Contradiction detector
· Soft / hard updates
· Provenance preservation
· Conflict resolution policy
Forgetting / Deletion
Bounded growth, compliance
· TTL expiration
· Decay curves · LRU
· Right-to-be-forgotten
· User opt-out / opt-in
· Cascading delete (KG)
Reflection & Skill Distill
Episodes → procedures
· Voyager-style skills
· Lessons learned
· Reflexion notes
· Recipe extraction
· Promote to org memory
De-duplication
Avoid memory bloat
· Near-duplicate detection
· SimHash / embedding sim
· Merge duplicates
· Canonicalize entities
· Compaction passes
Tiering & Archival
Cost-optimized storage
· Hot RAM / NVMe
· Warm SSD
· Cold object store
· Glacier / deep archive
· Promote on access
G · Privacy, Security & Compliance
Memory holds the most sensitive long-lived data — protect, scope, and prove control
Encryption & Keys
At-rest · in-transit · in-use
· KMS / HSM-managed keys
· Per-tenant key isolation
· BYOK / HYOK options
· Confidential compute
Access Control & Scoping
Least-privilege everywhere
· Row / namespace ACL
· Tenant isolation
· Per-agent token scopes
· Cross-tenant leakage tests
PII & DLP
Detect, redact, vault
· PII classifier on write
· Tokenization vault
· Differential privacy
· Sensitive-field masking
Consent & Residency
Honor user intent & law
· Memory opt-in / opt-out
· Region pinning (EU/US/APAC)
· Train-on-data flags
· Retention policy enforcement
Right-to-Be-Forgotten
GDPR / CCPA / CPRA
· Subject-erasure request
· Cascade across stores
· Retraining-aware deletion
· Tombstones & receipts
Audit & Compliance
Every read & write traced
· Signed, immutable log
· SOC 2 · HIPAA · ISO 27001
· Data lineage graph
· Compliance dashboards
H · Operations, Observability & Quality
Make memory measurable, debuggable, and reliable in production
Telemetry
Read / write / hit-rate
Latency P50 / P95
Memory Health
Drift · staleness · bloat
Index integrity checks
Backup & DR
Snapshots · PITR
Cross-region replicas
Quality Evals
Recall@K · MRR · NDCG
A/B retrieval experiments
Cost Monitoring
Storage / IO / embedding $
Per-tenant chargeback
Schema Migration
Embedding model upgrades
Re-indexing pipelines
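The Quality Evals card above names Recall@K, MRR, and NDCG. The sketch below gives textbook definitions for a single query; averaging `reciprocal_rank` over a query set yields MRR. Function names and the graded `gains` mapping are illustrative assumptions.

```python
import math


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(retrieved: list[str], gains: dict[str, float], k: int) -> float:
    """Normalized discounted cumulative gain with graded relevance `gains`."""
    def dcg(values: list[float]) -> float:
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(v / math.log2(i + 2) for i, v in enumerate(values))

    actual = dcg([gains.get(d, 0.0) for d in retrieved[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```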
I · Personalization & Memory APIs
How other layers consume memory — typed, scoped, and traceable
Profile API
User · org · agent profile
Get / patch · merge logic
Search API
Semantic / hybrid query
Filtered & scoped
Skill / Recipe API
Procedural memory access
Versioned look-ups
Event Stream
Memory-changed events
Subscribe · webhook
Admin / DSAR API
Export · erase · audit
User self-service portal
Personalization Hooks
Inject context per request
Style · preferences · history
⇄ Cross-Layer Integrations
Memory is consumed by, and feeds, every neighboring layer
↔ Layer 2 · Perception
Profile · history selector
Few-shot retrieval
↔ Layer 3 · Orchestrator
Plan repository
Run trajectories
↔ Layer 4 · Reasoning
Working / scratchpad
Persona & style cues
↔ Layer 7 · RAG
Shared vector / KG indexes
Curated knowledge facts
↔ Layer 10 · Reflection
Lessons in · skills out
Trajectory mining
↔ Layer 11 · Governance
Audit · DSAR · policy
Compliance evidence
All exchanges are scoped, ACL-checked, traced, and logged through the Memory Manager.
Cross-cutting Privacy, Audit & Lifecycle
Memory Manager / Lifecycle
Memory Types
Storage Backends
Write Pipeline
Read / APIs
Privacy & Compliance
Ops & Observability
Forward flow
Read path
Reflection / skill loop
Detailed view of Layer 5 — Memory Subsystem from the Agentic AI System Architecture reference.
All memory operations flow through a single Memory Manager that fans out to typed memory tiers (working, episodic, semantic, procedural, affective, shared) backed by polyglot stores. Write & read pipelines, a lifecycle for consolidation/update/forgetting, privacy & compliance controls, and observability surround the manager. Skill distillation feeds procedural memory back into the Reasoning Core (Layer 4) and Reflection (Layer 10).
Layer 6 Tools, Skills & Capabilities
Composable actions the agent can invoke through standardized interfaces — a registry of tools, MCP servers, and skills, fronted by a hardened gateway that handles auth, validation, sandboxing, retries, and observability for every external call.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Tool Call from Layer 3 (Orchestrator) / Layer 4 (Reasoning)
tool_name · arguments · principal · scopes · trace_id · idempotency_key · deadline · retry_policy · trust labels
{ tool: "get_pull_request", arguments: {...}, ctx: { trace_id, principal, scopes, deadline, idem_key, trust: "tool" } }
A · Tool Gateway — The Universal Adapter
Every tool invocation is normalized, authorized, validated, executed, and traced through this gateway
Tool Gateway / Skill Runtime
Single entry-point · spec resolution · auth · validation · invocation · result normalization
RESOLVE
spec / schema
AUTHORIZE
scopes · policy
VALIDATE
args · types
INVOKE
execute
SHAPE
result · trace
Schema Validation
JSON Schema · Pydantic · Zod
Auth & Scope Check
OAuth · OIDC · token vault
Policy Pre-flight
OPA / Rego · risk & HITL
Idempotency · de-dup · request signing
Result Normalizer
Stable schema · trim · redact
Retry & Backoff
Exponential · circuit breaker
Trace & Cost Emit
OTel spans · token / $ meters
Streaming results · pagination · partial outputs
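The five gateway stages above (RESOLVE, AUTHORIZE, VALIDATE, INVOKE, SHAPE) can be sketched as a single call path. `ToolSpec`, `ToolGateway`, the single-scope check, and the result field names are hypothetical; a real gateway would use JSON Schema validation, a policy engine, and sandboxed execution.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolSpec:
    name: str
    required_scope: str
    arg_types: dict[str, type]
    handler: Callable[..., Any]


class ToolGateway:
    def __init__(self) -> None:
        self._registry: dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        self._registry[spec.name] = spec

    def call(self, tool: str, arguments: dict[str, Any], scopes: set[str],
             trace_id: str) -> dict[str, Any]:
        spec = self._registry.get(tool)                        # RESOLVE
        if spec is None:
            return self._shape(trace_id, error=f"unknown tool: {tool}")
        if spec.required_scope not in scopes:                  # AUTHORIZE
            return self._shape(trace_id, error=f"missing scope: {spec.required_scope}")
        for name, expected in spec.arg_types.items():          # VALIDATE
            if not isinstance(arguments.get(name), expected):
                return self._shape(trace_id, error=f"bad argument: {name}")
        try:
            result = spec.handler(**arguments)                 # INVOKE
        except Exception as err:
            return self._shape(trace_id, error=f"tool failed: {err}")
        return self._shape(trace_id, result=result)            # SHAPE

    @staticmethod
    def _shape(trace_id: str, result: Any = None,
               error: Optional[str] = None) -> dict[str, Any]:
        # Every outcome uses one envelope and is labeled as untrusted tool output.
        return {"ok": error is None, "result": result, "error": error,
                "trace_id": trace_id, "trust": "tool"}
```

Returning errors in the same envelope as results lets the orchestrator handle failures as data rather than as exceptions crossing the layer boundary.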
B · Tool Registry, Specs & Discovery
A versioned catalog of available tools, MCP servers, and skills — what they do, how to call them, and who can use them
Spec Registry
Source of truth
· OpenAPI · JSON Schema
· MCP tool descriptors
· Examples · cost hints
· Side-effect labels
Discovery / Indexing
Find right tool fast
· Embedding-based retrieval
· Tag · category · capability
· MCP server enumeration
· JIT spec injection
Versioning & Lifecycle
Evolve safely
· SemVer per tool
· Canary · rollback
· Deprecation windows
· Backwards-compat tests
Capability Matcher
Plan → tools mapping
· Required vs optional
· Pre/post conditions
· Cost / latency profile
· Substitutable equivalents
Permissions Matrix
Who can call what
· Allow / deny lists
· Per-tenant overrides
· Risk-tier gating
· HITL-required flag
Marketplace
Distribution & sharing
· Internal tool hub
· Public MCP registry
· Signed publisher
· Reviews · ratings
C · Tool Categories — The Capability Surface
A catalog of what an agent can do — grouped by domain, each adapter conforming to the gateway's contract
Web & Browsing
Read the open internet
· Web search (Bing · Google · Brave)
· URL fetch · readability extract
· Browser agent (Playwright)
· Computer-use UI control
· Form fill · click · navigate
· Screenshot & DOM capture
· Headless · headful modes
· Crawl + sitemap traversal
· robots.txt & ToS aware
Code Execution
Compute · transform · test
· Python / Node / Bash REPL
· Code interpreter
· Notebook (Jupyter)
· Compiler / linter / formatter
· Test runner · fuzzers
· Build & package tools
· Container exec · SSH
· Static analysis · SAST
· Math / symbolic (SymPy)
File & Repo
Code & document operations
· Read · Write · Edit · Glob
· grep · ripgrep · ast-grep
· git · diff · patch · blame
· GitHub / GitLab / Bitbucket
· PR / commit / branch ops
· LSP symbols · tree-sitter
· Object storage (S3 · GCS · R2)
· File conversion (PDF · DOCX)
· Diff & merge tooling
External APIs
SaaS & partner systems
· REST · GraphQL · gRPC
· Webhooks (in & out)
· Stripe · Twilio · SendGrid
· Salesforce · HubSpot
· Google / Microsoft Graph
· OpenAPI auto-clients
· OAuth flow handler
· SDK adapters (Python · TS)
· Mock / sandbox endpoints
Data & Databases
Read & write structured data
· SQL (Postgres · MySQL)
· NoSQL (Mongo · Dynamo)
· Warehouse (BigQuery · Snowflake)
· Vector / KG queries
· Read-only safe-mode
· DDL gated by approval
· Query plan inspector
· dbt · Airflow runs
· CSV / Excel I/O
Communication
Reach humans & teams
· Email · SMS · Push
· Slack · Teams · Discord
· Calendar / meeting invite
· Voice call (Twilio)
· Pager / on-call (PagerDuty)
· Templates · approvals
· Localization aware
· Quiet-hours respect
· Send-rate caps
C · Tool Categories — Continued
Workflow integrations, AI specialists, knowledge access, computer use, and physical-world adapters
Workflow & PM
Tickets · docs · planning
· Jira · Linear · Asana
· Notion · Confluence
· Google Docs · Office 365
· CI/CD (GitHub Actions)
· Terraform · Ansible
· Status pages · runbooks
AI Specialist Models
Models exposed as tools
· Vision (OCR · detection)
· Speech (ASR · TTS · diarize)
· Image / video generation
· Translation · summarize
· Embedders · re-rankers
· Classifiers · NER · safety
Knowledge & Search
Internal & curated knowledge
· Vector / hybrid retrievers
· KG / GraphRAG queries
· Wikipedia · Wolfram
· Research (arXiv · PubMed)
· Internal wikis · runbooks
· Maps · weather · finance
Computer Use
Operate desktop / mobile
· Screen + keyboard + mouse
· OS-level automation
· Accessibility tree access
· VNC / RDP isolated VM
· Mobile emulator control
· Action recorder & replay
Enterprise Systems
Systems of record
· CRM · ERP · ITSM
· HRIS · billing · payroll
· Identity / IDM (Okta · AAD)
· Data lake / lakehouse
· SOC / SIEM · monitoring
· EHR · LIS (regulated)
Physical / IoT
Real-world actuation
· Robotics control APIs
· Sensor read · actuator
· Smart-home (Matter)
· Industrial PLC / SCADA
· Drone / vehicle telemetry
· Edge / on-device runtime
D · Skills — Composed, Reusable Capabilities
Higher-level building blocks: prompts + tools + sub-flows packaged as named, versioned skills
Skill Definition
SKILL.md · system prompt
Triggers · examples · args
Required tools manifest
Skill Composition
Sub-graphs · pipelines
Sequenced tool calls
Conditional branches
Trigger Engine
Auto-load on intent
Slash commands
Path / context match
Skill Library
Built-in · org · personal
Marketplace import
Versioned & signed
Distillation
Promote successful runs
Voyager-style learning
From procedural memory
Runtime Sandbox
Scoped tool subset
Per-skill budget
Isolated state
E · MCP Servers — The Standardized Tool Protocol
Model Context Protocol — open standard for exposing tools, resources, and prompts to any agent
Server Registry
Discover available servers
Local · remote · cloud
Capability negotiation
Transport & Session
stdio · Streamable HTTP (SSE) · custom transports (e.g. WebSocket)
JSON-RPC 2.0 framing
Bi-di streaming
Resources & Prompts
Files · URIs · templates
Sampling requests
Subscribe / notify
Server Catalog
GitHub · GitLab · Slack
Filesystem · DB · Search
Custom enterprise servers
Trust & Sandboxing
Per-server permissions
Signed publishers
Capability review
SDK & Hosting
Python · TS · Rust SDKs
Docker · serverless
Multi-tenant gateways
F · Execution Environment & Sandboxing
Where tools actually run — isolated, limited, observable, and recoverable
Sandbox Runtimes
Hard isolation per call
· Containers (Docker · OCI)
· MicroVMs (Firecracker)
· gVisor · Kata · WASM
· Cloud sandboxes (E2B)
· Ephemeral · per-task
Resource Limits
Bound blast radius
· CPU · RAM · disk caps
· Wall-clock timeouts
· Egress allow / deny
· File-system quotas
· Process count limits
Network Policy
Egress & DNS control
· Domain allow-list
· No-egress mode
· Outbound proxy & logs
· Service-mesh mTLS
· Rate-limit per host
State & Persistence
Workspace lifecycle
· Scratch FS per run
· Persistent volumes
· Snapshot & restore
· Worktree isolation (git)
· Auto-cleanup TTL
Concurrency & Pooling
Throughput & warm starts
· Sandbox warm pool
· Per-tool concurrency cap
· Connection pooling
· Backpressure signaling
· Cold-start optimization
Adapters & Drivers
Speak each tool's protocol
· HTTP / gRPC / WS clients
· DB drivers · ODBC / JDBC
· SDK wrappers
· Protocol bridges
· Mock / replay drivers
G · Security, Trust & Risk Controls
Tool calls are the highest-risk surface — defend against injection, exfiltration, and over-permission
Secret & Token Vault
Short-lived credentials
· HashiCorp Vault · KMS
· OAuth token exchange
· Just-in-time credentials
· Rotate & revoke
Risky-Action Gate
Reversibility check
· Destructive ops require HITL
· Two-person rule
· Dry-run / what-if
· Step-up auth
Injection Defense
Trust-boundary aware
· Quarantine tool results
· Never follow instructions found in tool output
· Spotlighting / delimiters
· SSRF / SQLi guards
DLP & Egress Filter
Block exfiltration
· Outbound PII scan
· Secret pattern detection
· Tenant-data scoping
· URL allow-listing
Anti-Abuse
Detect bad behavior
· Anomaly detection
· Quota / spike alarms
· Honeypot tools
· Auto-disable rogue agent
Compliance Hooks
Regulated tool use
· SOC 2 · HIPAA · PCI
· Region-bound tool routing
· Tool-level audit evidence
· Data-residency proofs
H · Reliability & Observability
Make every tool call diagnosable, replayable, and within SLO
Idempotency Keys
De-dup retried calls
Retries & Backoff
Jittered exponential
Circuit Breakers
Per tool / endpoint
Tracing & Spans
OTel · LangSmith
Caching
Result · semantic
Cost & Latency Meters
P50 / P95 · $ per call
Replay & Debug
Recorded I/O
SLO Tracking
Error budget burn
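Idempotency keys and jittered exponential backoff, both listed above, work together: retries are only safe when a repeated call cannot repeat its side effect. The sketch below combines the two. `IdempotentExecutor`, `retry_with_backoff`, and the default delays are assumptions; the in-process dictionary stands in for a durable key store.

```python
import random
import time
from typing import Any, Callable, Optional


class IdempotentExecutor:
    """Run a side-effecting call at most once per idempotency key."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}

    def execute(self, idem_key: str, fn: Callable[[], Any]) -> Any:
        if idem_key in self._results:
            return self._results[idem_key]   # retried call: replay the stored result
        result = fn()                        # failures are not stored, so they can be retried
        self._results[idem_key] = result
        return result


def retry_with_backoff(fn: Callable[[], Any], attempts: int = 4,
                       base_s: float = 0.5, cap_s: float = 8.0,
                       rng: Optional[random.Random] = None,
                       sleep: Callable[[float], None] = time.sleep) -> Any:
    """Retry `fn`, sleeping a random time up to an exponentially growing cap."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise                        # retry budget exhausted
            # "Full jitter": uniform in [0, min(cap, base * 2^attempt)].
            sleep(rng.uniform(0, min(cap_s, base_s * 2 ** attempt)))
```

A typical call wraps the executor inside the retry helper, for example `retry_with_backoff(lambda: executor.execute(idem_key, send_email))`, where `send_email` is a placeholder for the tool call.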
⇣ Outbound — Tool Result & Effect to Layer 9 (Action / Environment)
Normalized result · side-effect record · provenance · latency & cost · trust label
Result Object
Schema-conformant
Side-Effect Log
What changed
Provenance
Source · timestamp
Trust Label
"tool" · untrusted
Compensations
Rollback hooks
Usage Stats
Tokens · $ · ms
Citations
URLs · refs
Trace ID
For replay
Cross-cutting Auth, Sandbox & Audit
Tool Gateway
Registry / MCP
Tool Categories
Skills
Sandboxing
Security & Trust
Reliability / Observability
Forward call flow
Tool result return
Skill distillation
Detailed view of Layer 6 — Tools, Skills & Capabilities from the Agentic AI System Architecture reference.
All tool invocations flow through a single Tool Gateway that resolves specs from a versioned registry, enforces auth and policy, validates arguments, executes inside hardened sandboxes, and emits a normalized result with traces, costs, and side-effect logs. Skills and MCP servers extend the catalog with composable capabilities; security and observability wrap every call.
Layer 7 Knowledge & Retrieval (RAG)
Ground the agent in fresh, verifiable knowledge — connectors, ingestion, embeddings, indexes, hybrid retrieval, advanced RAG patterns, faithfulness checking, and citation-aware delivery — turning raw sources into trusted, traceable context.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Retrieval Request from Layer 2 (Perception) / Layer 3 (Orchestrator) / Layer 4 (Reasoning)
query · intent · scopes (user/tenant/project) · ACLs · k · filters · freshness · trace_id · budget · response shape
retrieve(query, scope, k=20, filters={recency, ACL, source}, mode=hybrid, freshness=24h, with_citations=true)
A · Knowledge Sources & Connectors
All authoritative knowledge surfaces — internal, external, structured, unstructured — flow in through governed connectors
Internal Docs & Wiki
Tribal knowledge
· Confluence · Notion
· SharePoint · Coda
· Google Docs · Drive
· Quip · Bear · Obsidian
· Internal handbooks
· Onboarding guides
Code & Repos
Source-grounded answers
· GitHub · GitLab · Bitbucket
· Source files + symbols
· README · CLAUDE.md
· PRs · issues · discussions
· Commit history
· API docstrings
Tickets & Runbooks
Operational know-how
· Jira · Linear · ServiceNow
· Zendesk · Freshdesk · HelpScout
· PagerDuty post-mortems
· Runbooks · playbooks
· Incident timelines
· Change requests
Communications
Conversational record
· Slack · Teams · Discord
· Email (Gmail · O365)
· Meeting transcripts
· Chat threads
· Customer call notes
· Forum posts
Structured Data
Systems of record
· OLTP / SQL DBs
· Warehouses (Snowflake · BQ)
· CRM · ERP · ITSM
· Data lakes / lakehouses
· APIs (Text-to-SQL, NL2API)
· CSV / spreadsheets
External / Public Web
Open knowledge
· Live web search
· Crawled domains
· Wikipedia · Wikidata
· arXiv · PubMed · SSRN
· News · regulatory feeds
· Industry datasets
B · Ingestion & Document Processing Pipeline
From raw source to clean, chunked, enriched documents — the foundation of retrieval quality
Connectors & Loaders
Ingest from each source
· OAuth-scoped access
· Full crawl + delta sync
· CDC / change feeds
· Webhook event push
· Permission propagation
· Source provenance tags
Parsing & Extraction
Get the structure right
· Layout-aware PDF (Unstructured)
· DOCX · PPTX · XLSX
· HTML cleanup · readability
· OCR (Tesseract · docTR)
· Table & figure extraction
· Audio / video transcription
Cleaning & Dedup
Reduce noise & bloat
· Boilerplate stripping
· Near-duplicate (SimHash)
· Encoding normalization
· Language detection
· Quality score · spam filter
· Translation (optional)
Chunking Strategies
Right-size for retrieval
· Fixed token / overlap
· Semantic / sentence split
· Hierarchical (parent / child)
· Markdown heading-aware
· Code: AST-based split
· Late-chunking with context
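As an illustration of the "Fixed token / overlap" strategy in the Chunking Strategies card, here is a minimal sketch. The function name `chunk_tokens` and the default sizes are assumptions made for this example, not values prescribed by the architecture, and the tokens are treated as an already-tokenized list.

```python
def chunk_tokens(tokens, size=200, overlap=40):
    """Split a token list into fixed-size windows that share `overlap` tokens.

    Hypothetical helper for illustration; `size` and `overlap` are assumed defaults.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of indexing some tokens twice.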
Metadata Enrichment
Make filtering precise
· Source · author · date
· ACL · sensitivity tags
· LLM-generated summary
· Auto-generated questions
· Entities · keywords · topics
· Section path · breadcrumbs
Sync & Freshness
Keep the index live
· Incremental updates
· Tombstones · soft delete
· Re-index on schema change
· Embedding-model upgrade
· Backfill orchestration
· DLQ · failed-doc replay
C · Embeddings & Index Construction
Multiple complementary indexes — dense, sparse, graph — for hybrid retrieval
Embedding Models
Multi-model strategy
· OpenAI · Voyage · Cohere
· BGE · E5 · GTE · Jina
· Multimodal (CLIP · SigLIP)
· Multi-vector (ColBERT · late interaction)
· Matryoshka (truncatable)
· Domain fine-tunes
Vector Indexes
Scalable ANN search
· pgvector · Pinecone
· Weaviate · Qdrant · Milvus
· Vespa · Turbopuffer
· HNSW · IVF · DiskANN
· PQ · binary quantization
· Per-tenant namespacing
Sparse / Lexical Index
Exact-match recall
· BM25 / Okapi
· Elasticsearch · OpenSearch
· Tantivy · Quickwit
· SPLADE · uniCOIL learned-sparse
· Token n-grams · synonyms
· Field boosts · phrase
Knowledge Graph
Relations & multi-hop
· Entity / relation extraction
· Neo4j · ArangoDB · Memgraph
· RDF · SPARQL stores
· Community detection (GraphRAG)
· Schema · ontology
· Temporal & provenance edges
Metadata & Filter Index
Pre-filter at scale
· ACL bitmap / posting list
· Date / numeric ranges
· Source / type facets
· Geospatial (R-tree · S2)
· Tenant / workspace shard
· Field-level tokenizers
Index Operations
Build · update · evolve
· Async batch builds
· Streaming upserts
· Blue-green re-index
· Snapshot & restore
· Compaction · vacuum
· Per-store sharding
D · Query Understanding & Expansion
Turn the raw user / agent query into multiple, well-formed search inputs that hit the right indexes
Query Rewriting
Make queries searchable
· Pronoun resolution
· History-aware rewriting
· Synonym expansion
· Acronym expansion
· Spell / typo correction
Multi-Query & HyDE
Cover the answer space
· LLM generates N variants
· Sub-question decomposition
· HyDE: hypothetical doc
· Step-back prompting
· Translate query to index language
Routing & Filtering
Pick the right haystacks
· Source classifier
· Index router (per query)
· Metadata filters (ACL · date)
· Tenant / project scope
· Mode select (text · KG · SQL)
E · Retrieval Engine — Hybrid Search & Re-ranking
Run parallel searches across stores, fuse, re-rank, and diversify into a final candidate set
Hybrid Searcher
Dense + sparse + KG
· Parallel index queries
· KG multi-hop expansion
· Score normalization
· Reciprocal-Rank Fusion (RRF)
· Per-source weights
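The Reciprocal-Rank Fusion step listed in the Hybrid Searcher card can be sketched as follows. This is a minimal version that assumes each index returns an ordered list of document IDs. The function name `rrf` and the optional per-source `weights` argument are illustrative choices, and `k=60` is a commonly used default rather than a value given by this architecture.

```python
from collections import defaultdict


def rrf(rankings, k=60, weights=None):
    """Fuse several ranked ID lists; each hit contributes weight / (k + rank)."""
    weights = weights or [1.0] * len(rankings)
    scores = defaultdict(float)
    for weight, ranking in zip(weights, rankings):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]   # e.g. from a vector index
sparse_hits = ["b", "c", "d"]  # e.g. from BM25
fused = rrf([dense_hits, sparse_hits])
```

Because RRF uses only ranks, it sidesteps the problem of dense and sparse scores living on different scales. Score normalization is the alternative when raw scores must be preserved.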
Re-ranker
Boost real relevance
· Cross-encoder (BGE · Cohere)
· LLM-as-rerank (listwise)
· Recency / freshness boost
· Authority / source weight
· Click / engagement signals
Diversify & Compress
Pack the best signal
· MMR diversification
· Cluster & pick
· Extractive snippeting
· Map-reduce summarize
· Token-budget enforcement
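To make the MMR diversification bullet concrete, here is a small sketch of greedy Maximal Marginal Relevance selection over embedding vectors. The names `mmr`, `cosine`, and the trade-off parameter `lam` are assumptions for this example. A production system would operate on the re-ranker's candidate set and likely use a vector library rather than plain lists.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr(query_vec, doc_vecs, k=3, lam=0.7):
    """Greedily pick k doc IDs, trading relevance against redundancy.

    doc_vecs maps doc ID -> vector; lam=1.0 means pure relevance.
    """
    selected = []
    candidates = list(doc_vecs)

    def score(doc_id):
        relevance = cosine(query_vec, doc_vecs[doc_id])
        redundancy = max(
            (cosine(doc_vecs[doc_id], doc_vecs[s]) for s in selected), default=0.0
        )
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Lower values of `lam` push the selection toward passages that differ from those already chosen, which helps when near-duplicate chunks would otherwise fill the token budget.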
F · Advanced RAG Patterns
Move beyond single-shot RAG: agentic, corrective, graph, and multi-hop strategies
Agentic RAG
Retrieval as a tool
· Iterative retrieve + reason
· Decide when / what to fetch
· Multi-step exploration
· ReAct-style retrieval
· Tool-using sub-agents
Self-RAG / Self-Critique
Retrieve only when needed
· Need-retrieval classifier
· Self-reflection tokens
· Critique & revise
· Score support per claim
· Skip when high confidence
Corrective RAG (CRAG)
Recover from bad retrieval
· Retrieval-quality grader
· Fall back to web search
· Re-query with new terms
· Decompose & recombine
· Abstain when unsure
GraphRAG
Multi-hop & community
· Entity graph from corpus
· Community summaries
· Local + global search
· Path / hop reasoning
· Schema-guided retrieval
Multi-Hop & Decomposition
Answer compound queries
· Sub-question retrieval
· Iterative refinement
· Evidence chaining
· Plan-and-retrieve
· FLARE active retrieval
Hierarchical & RAPTOR
Tree-of-summaries
· Cluster & summarize tree
· Parent-child retrieval
· Coarse-to-fine drill-down
· Section · doc · corpus levels
· Efficient on long corpora
G · Faithfulness, Citations & Hallucination Control
Make answers verifiable — every claim grounded in a source the user can check
Provenance Tracker
Lineage end-to-end
· Doc · chunk · char-span IDs
· Author · timestamp
· Source URL · version
· Trust label per source
Citation Generator
Inline source links
· Sentence-level cites
· Span-level highlighting
· Clickable URLs
· Bibliography assembly
Faithfulness Verifier
Does answer match sources?
· NLI / entailment scoring
· Claim → evidence map
· LLM-as-judge faithfulness
· Refuse on low support
Hallucination Probes
Catch ungrounded claims
· Cross-source verifier
· Self-check QA
· Numeric / fact extractor
· Confidence calibration
Conflict & Recency
When sources disagree
· Newer source preference
· Authority weighting
· Surface conflicts to user
· Disagreement marker
Abstention & Refusal
Know when to say "I don't know"
· No-evidence threshold
· Out-of-scope detector
· Suggest follow-up
· Human escalation hook
H · Governance, ACL, Privacy & Compliance
Retrieved knowledge inherits source permissions and policy — never expose more than the user is allowed to see
ACL Propagation
Source perms → index
· User · group · folder ACL
· Live permission lookup
· Pre-filter at search time
PII / DLP
Detect & redact
· Sensitive-field masking
· Tokenization vault
· Sector-specific policies
Residency & Sovereignty
Region-bound data
· Index per region
· Geo-sharded retrieval
· Sovereign-cloud routing
Source Trust Tags
Untrusted by default
· "external content" label
· No-instruction-follow rule
· Spotlighting / delimiters
Audit & Lineage
Who saw what, when
· Query logs (signed)
· Result lineage graph
· DSAR · evidence pack
Retention & TTL
Bounded shelf-life
· Per-source TTL
· Right-to-be-forgotten
· Tombstones cascade
I · Operations & Retrieval Quality
Make RAG measurable, debuggable, and continuously improving
Retrieval Evals
Recall@K · MRR · NDCG
RAG Metrics
Faithfulness · context P/R
A/B Experiments
Embedding · chunk · prompt
Drift & Anomaly
Stale index · empty results
Cost & Latency
P50 · P95 · $ per query
Caching
Embed · query · result
Feedback Loop
👍 / 👎 · click-through
Eval Datasets
Golden Q/A · regression
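The Retrieval Evals card names Recall@K, MRR, and NDCG. A minimal per-query sketch of each follows, assuming `ranked` is the list of retrieved document IDs in order and relevance labels come from a golden dataset. The function names are illustrative. MRR is the mean of `reciprocal_rank` over all evaluation queries.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant hit, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, gains, k):
    """NDCG with graded gains (doc ID -> gain) and a log2 position discount."""
    dcg = sum(
        gains.get(doc_id, 0.0) / math.log2(pos + 1)
        for pos, doc_id in enumerate(ranked[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(pos + 1) for pos, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0
```

Running these over a versioned golden set on every index or embedding change gives the regression signal the Eval Datasets card calls for.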
⇣ Outbound — Grounded Context to Layer 4 (Reasoning) / Layer 3 (Orchestrator)
Ranked passages · citations · faithfulness scores · trust labels · usage stats — ready for prompt assembly
Passages[]
id · text · score
Citations
URLs · spans · authors
Faithfulness
Per-claim score
Trust Labels
Untrusted content
Coverage Report
Gaps · conflicts
Usage Stats
Tokens · ms · $
Trace ID
Replay key
Abstain Flag
Low coverage signal
Cross-cutting ACL, Audit & Compliance
Knowledge Sources
Ingestion / Operations
Embeddings / Indexes
Query Understanding
Retrieval / Faithfulness
Advanced RAG Patterns
Governance & Compliance
Forward retrieval flow
Grounded context return
Corrective & feedback loops
Detailed view of Layer 7 — Knowledge & Retrieval (RAG) from the Agentic AI System Architecture reference.
Knowledge flows top-down from heterogeneous sources through ingestion, polyglot indexes, query understanding, hybrid retrieval, advanced RAG patterns, and faithfulness checking, returning trust-labeled grounded context with citations to the Reasoning Core. Corrective & feedback loops continuously improve retrieval quality.
Layer 8 Multi-Agent Collaboration
Multi-Agent Collaboration
Specialized agents cooperating, debating, and verifying each other's work — coordination patterns, agent roles, communication protocols, lifecycle management, consensus, and trust controls that turn a swarm of agents into a reliable team.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Delegation Request from Layer 3 (Orchestrator)
complex goal · sub-task DAG · required roles · budget · deadline · trust scope · coordination preference · result schema
spawn_team({ goal, pattern: "supervisor", roles: ["researcher", "coder", "critic"], budget, deadline, trust_scope })
A · Coordination Patterns & Topologies
Pick the right organizational shape for the task — each pattern trades autonomy, parallelism, and quality differently
Supervisor / Hierarchical
Top-down delegation
· Single supervisor agent
· Specialist sub-agents
· Predictable accountability
· Easy to debug
· Best for clear goals
Pipeline / Sequential
Stage-by-stage handoff
· Each agent → next stage
· Strict ordering
· Schema-checked transitions
· Ideal for ETL / workflows
· Easy retry per stage
Debate / Dialectic
Adversarial verification
· Proposer vs critic
· Multi-round argument
· Judge / referee agent
· Reduces hallucinations
· High-stakes decisions
Swarm / Peer
Decentralized cooperation
· No central authority
· Hand-off-capable peers
· Emergent specialization
· Resilient · scalable
· Open-ended exploration
Blackboard
Shared workspace
· Common knowledge state
· Agents read & write
· Trigger on changes
· Loose coupling
· Heterogeneous experts
Contract Net
Bid-based assignment
· Manager broadcasts task
· Workers bid · capability
· Best bid wins · contract
· Marketplace dynamics
· Cross-org agents
B · Supervisor / Coordinator — The Team Leader
Spawns the team, distributes work, monitors progress, aggregates results, and decides when the team is done
Multi-Agent Coordinator
Pattern selector · agent factory · message broker · result aggregator
PLAN
team & tasks
DELEGATE
assign work
MONITOR
progress
AGGREGATE
merge results
CONCLUDE
finalize
Pattern Selector
Choose topology by task
Team Composer
Pick roles · models · skills
Budget Allocator
Tokens · time · $ per agent
Progress Tracker
Per-agent status & SLA
Termination Manager
When the team is "done"
Escalation Hooks
HITL · admin · stop-the-team
C · Agent Roles & Specialist Personas
A library of role templates — each with its own system prompt, tools, model, memory scope, and quality bar
Planner / Architect
High-level decomposition
· Goal → sub-goals
· DAG / step graph
· Risk & budget plan
· Strong reasoning model
· Plan repository access
· Re-plan on failure
Researcher
Find & synthesize info
· Web · KB · RAG access
· Citation discipline
· Multi-source synthesis
· Long-context model
· Read-only tool scope
· Coverage / gap reports
Coder / Builder
Write & test code
· Edit · run · test
· Debug · refactor
· Sandboxed exec scope
· Repo & PR tools
· Code-tuned model
· Writes to a worktree branch
Critic / Reviewer
Verify & challenge
· Independent context
· Rubric · checklist
· Score · pass / fail
· LLM-as-Judge persona
· No-write tool scope
· Red-team variant
Domain Experts
Vertical-specific knowledge
· Legal · medical · finance
· DevOps · security · QA
· Domain-tuned models
· Curated knowledge base
· Compliance-aware
· Regulated tool sets
Operator / Executor
Take action in the world
· Computer-use tools
· Browser / GUI control
· Enterprise app actions
· HITL gating
· Compensating actions
· Receipts & logs
D · Inter-Agent Communication Protocols
Standardized message formats and transports — typed, signed, traceable, and replay-safe
A2A Protocol
Agent-to-Agent
Capability-card discovery
Cross-vendor / org agents
MCP Client Calls
Tool-style sub-agent
Sub-agent as MCP server
Schema-typed I/O
ACP / Custom RPC
Internal team protocol
gRPC · JSON-RPC
Strong typing · streaming
Message Bus
Pub/sub · queues
Kafka · NATS · Redis Streams
Topic-per-conversation
Shared Scratchpad
Blackboard / KV
CRDT for concurrent edits
Subscribe to changes
Message Envelope
Standard wrapper
from · to · trace · sig
trust · replyTo · ttl
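A sketch of the Message Envelope fields (from · to · trace · sig · trust · replyTo · ttl) follows, using only the Python standard library. It assumes a shared secret and HMAC-SHA256 as a stand-in for the signed JWT / DPoP schemes a real deployment might choose. The `sent_at` and `body` fields and both function names are hypothetical additions for this example.

```python
import hashlib
import hmac
import json
import time


def _mac(key, envelope):
    """HMAC over every field except the signature, with a canonical JSON encoding."""
    unsigned = {k: v for k, v in envelope.items() if k != "sig"}
    payload = json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def sign_envelope(key, sender, recipient, trace_id, body,
                  trust="untrusted", reply_to=None, ttl_s=60, now=None):
    """Wrap a message body in an envelope and attach the signature."""
    envelope = {
        "from": sender, "to": recipient, "trace": trace_id,
        "trust": trust, "replyTo": reply_to, "ttl": ttl_s,
        "sent_at": time.time() if now is None else now,  # assumed field
        "body": body,
    }
    envelope["sig"] = _mac(key, envelope)
    return envelope


def verify_envelope(key, envelope, now=None):
    """Reject expired envelopes and any envelope whose contents were altered."""
    now = time.time() if now is None else now
    if now - envelope["sent_at"] > envelope["ttl"]:
        return False
    return hmac.compare_digest(envelope.get("sig", ""), _mac(key, envelope))
```

Defaulting `trust` to "untrusted" mirrors the rule elsewhere in this layer that peer output is treated as data, not as instructions.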
E · Agent Lifecycle & State Management
Spawn safely, scope tightly, checkpoint reliably, and tear down cleanly
Spawn Factory
Instantiate sub-agent
Role · model · tools · prompt
Inherited / scoped context
Permission Inheritance
Least-privilege subset
Token attenuation
Tool / data scope clamp
Isolated Runtime
Sandbox · worktree · VM
No leak between agents
Per-agent secrets
Checkpoint & Resume
Durable per-agent state
Pause · serialize · re-hydrate
Cross-host failover
Health & Watchdog
Liveness probes
Stuck-loop detection
Auto-respawn / abort
Graceful Termination
Drain · finalize · cleanup
Hand-off in-flight tasks
Audit log per-agent
F · Coordination Mechanics — How Work Gets Done
Decompose, assign, run in parallel, aggregate, and resolve conflicts — the core team plumbing
Task Decomposition
Break the goal apart
· HTN-style sub-tasks
· Dependency DAG
· Parallel-safe units
· Per-task DoD
· Roll-up criteria
Task Assignment
Match work to agent
· Capability matching
· Bidding (contract net)
· Load balancing
· Sticky / affinity routing
· Reassignment on failure
Concurrency Control
Run agents in parallel
· Fork / join · barriers
· Map-reduce / fan-out
· Race · first-win
· Cancellation propagation
· Deadlock detection
Result Aggregation
Merge sub-agent outputs
· Schema-aware merge
· Citation preservation
· De-dup · canonicalize
· LLM synthesizer agent
· Coverage report
Consensus & Voting
When agents disagree
· Majority / weighted vote
· Confidence-weighted
· Judge / referee agent
· Self-consistency check
· Tie-breakers · HITL
Conflict Resolution
Reconcile contradictions
· Recency · authority
· Evidence weighting
· Surface to user
· Escalate to expert
· Compensating undo
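The confidence-weighted vote and HITL tie-breaker from the Consensus & Voting card can be sketched as below. The ballot shape `(agent, answer, confidence)`, the `min_margin` parameter, and the returned dictionary are assumptions made for this example.

```python
from collections import defaultdict


def weighted_vote(ballots, min_margin=0.0):
    """Sum confidence per answer; escalate when the lead is not decisive.

    ballots: iterable of (agent, answer, confidence in [0, 1]).
    """
    totals = defaultdict(float)
    for _agent, answer, confidence in ballots:
        totals[answer] += confidence
    ranked = sorted(totals.items(), key=lambda item: -item[1])
    if not ranked:
        return {"decision": None, "escalate": True, "tally": {}}
    top_score = ranked[0][1]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # A tie, or a lead within min_margin, goes to a human or a judge agent.
    escalate = (top_score - runner_up) <= min_margin
    return {
        "decision": None if escalate else ranked[0][0],
        "escalate": escalate,
        "tally": dict(totals),  # kept for the consensus / dissent record
    }
```

Returning the full tally rather than only the winner lets the outbound result report dissent alongside the decision.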
G · Trust, Identity & Security Across Agents
Multi-agent systems amplify both capability and attack surface — every message and handoff must be authenticated and bounded
Agent Identity
Signed, attestable agents
· Cryptographic agent ID
· Workload identity (SPIFFE)
· Signed capability cards
· Agent provenance ledger
Message Authentication
Tamper-evident envelopes
· Signed JWT / DPoP
· Replay-protection nonce
· Origin agent verified
· mTLS on transport
Cross-Agent Injection
Treat peer output as untrusted
· No instruction-following
· Quarantine peer messages
· Trust labels carried
· Spotlighting / delimiters
Permission Attenuation
Sub-agents get less, never more
· Token down-scoping
· Read-only by default
· Tool allow-lists
· Data-scope clamping
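The rule "sub-agents get less, never more" can be expressed as a set intersection. The sketch below is illustrative only. The `Grant` type, its fields, and `attenuate` are hypothetical names, and real token down-scoping would be enforced by the identity provider rather than by in-process objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """What an agent may use: tool allow-list, data scopes, and write access."""
    tools: frozenset
    data_scopes: frozenset
    read_only: bool = True


def attenuate(parent, requested_tools, requested_scopes, want_write=False):
    """Derive a child grant that can never exceed the parent's grant."""
    return Grant(
        tools=parent.tools & frozenset(requested_tools),
        data_scopes=parent.data_scopes & frozenset(requested_scopes),
        # Write access needs both a writable parent and an explicit request.
        read_only=parent.read_only or not want_write,
    )


supervisor = Grant(frozenset({"search", "git", "shell"}),
                   frozenset({"repo:app"}), read_only=False)
researcher = attenuate(supervisor, {"search", "deploy"}, {"repo:app", "repo:infra"})
```

Here `researcher` ends up with only the `search` tool and the `repo:app` scope, read-only, because `deploy` and `repo:infra` were never held by the supervisor.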
Rogue-Agent Defense
Catch & isolate misbehavior
· Anomaly detection
· Loop / abuse detector
· Auto-quarantine
· Kill-switch · admin alert
Privacy & Data Boundaries
Don't leak between agents
· PII redaction in handoffs
· Tenant-scoped contexts
· Need-to-know filtering
· Cross-tenant blocks
H · Multi-Agent Frameworks & Runtimes
Production-ready stacks for orchestrating teams of agents
LangGraph / Agent SDK
Graph-based agent flows
Stateful · checkpointable
LangChain / Anthropic
CrewAI
Role-based crews
Tasks · processes · tools
Hierarchical / sequential
AutoGen / Magentic
Conversation-driven
Group chat patterns
Microsoft
OpenAI Swarm / Agents
Lightweight handoffs
Tool-driven routing
Stateless agents
Durable Execution
Temporal · Cadence · Restate
Replay-safe orchestration
Long-running multi-agent
Custom Runtimes
Bespoke controllers
Actor model · Akka · Ray
Ray Serve · Dapr
I · Observability & Multi-Agent Telemetry
Trace every agent, every message, every handoff — across the whole team
Distributed Tracing
OTel spans across agents
Conversation graph view
Per-agent sub-trace
Cost & Token Roll-up
Per agent · per team · total
Budget burn tracking
Top-spender attribution
Conversation Replay
Step through messages
Time-travel debug
Counterfactual re-runs
Team Health Metrics
Throughput · success rate
Per-role error rates
Idle / loop detection
Decision Logs
Why this agent · this task
Vote / consensus history
Plan diffs across team
Audit & Compliance
Signed team transcripts
Per-agent attribution
Evidence for review
⇣ Outbound — Aggregated Team Result to Layer 3 (Orchestrator)
Synthesized answer · per-agent contributions · consensus / dissent · citations · cost · trace · escalations
Final Answer
Schema-conformant synthesis
Per-Agent Contributions
Attribution · diffs
Consensus / Dissent
Vote tallies · open issues
Citations & Evidence
Source spans · trust labels
Team Trace
Conversation graph · spans
Cost & Escalations
Tokens · $ · HITL flags
All artifacts are signed, traced, and attributable to the originating agent.
Cross-cutting Identity, Trust & Telemetry
Coordination Patterns / Frameworks
Supervisor / Coordinator
Agent Roles
Communication Protocols
Lifecycle / Mechanics
Trust & Security
Observability
Forward team flow
Inter-agent messaging
Critique / re-plan loop
Aggregated result return
Detailed view of Layer 8 — Multi-Agent Collaboration from the Agentic AI System Architecture reference.
A single Coordinator picks a topology, spawns specialist roles, brokers signed messages over typed protocols, manages lifecycle & permissions, runs concurrent tasks, aggregates results with consensus, and emits a single team artifact back to the orchestrator. Trust controls and observability span every agent and every handoff.
Layer 9 Action & Environment Interface
Action & Environment Interface
Where agents take real-world effects — through digital and physical environments. Pre-flight validation, isolated execution, side-effect tracking, compensating actions, receipts, and a reversible record of every change the agent makes.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Action Request from Layer 6 (Tools) / Layer 3 (Orchestrator)
action_type · target environment · arguments · principal · scopes · idempotency_key · deadline · risk_class · approval_token · trace_id
execute({ env: "browser", action: "submit_form", args: {...}, risk: "medium", reversible: true, approval: "auto", deadline: 30s })
A · Pre-Flight Gate — Decide Whether the Action May Proceed
Verify scope, classify risk, dry-run side effects, secure approvals — refuse early when in doubt
Risk Classifier
How dangerous is this?
· Reversible vs destructive
· Blast radius estimate
· Public · private · regulated
· Money / identity impact
Authorization & Scope
Right to act here?
· OAuth / OIDC scope check
· Tenant / project boundary
· Token attenuation
· Per-environment ACL
Dry-Run / Simulation
What would happen?
· What-if effect preview
· Plan mode (read-only)
· Diff before write
· Sandbox replay
Approval Gate (HITL)
Human in the loop
· Auto-approve · prompt user
· Two-person rule for high-risk
· Step-up auth (MFA)
· Approval token signing
Idempotency & Dedup
Don't apply twice
· Idempotency-Key header
· Request fingerprint
· Replay-detection window
· De-dup ledger
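A minimal in-memory sketch of the Idempotency & Dedup card follows: a request fingerprint serves as the key, and a ledger replays the stored result inside a replay-detection window instead of applying the action twice. `DedupLedger`, its methods, and the explicit `now` argument are assumptions for this example. A production ledger would need durable, shared storage.

```python
import hashlib
import json


class DedupLedger:
    """Remember recent action results so a repeated request is not re-applied."""

    def __init__(self, window_s=300):
        self.window_s = window_s
        self._seen = {}  # key -> (timestamp, result)

    @staticmethod
    def fingerprint(action, args):
        """Stable hash of the action and its arguments (key order ignored)."""
        payload = json.dumps({"action": action, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def run_once(self, key, fn, now):
        """Return (result, replayed). fn runs only if key is new or expired."""
        entry = self._seen.get(key)
        if entry is not None and now - entry[0] <= self.window_s:
            return entry[1], True
        result = fn()
        self._seen[key] = (now, result)
        return result, False
```

A caller-supplied Idempotency-Key can be passed as `key` directly. The fingerprint is a fallback for requests that arrive without one.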
Compliance Pre-Check
Regulatory & policy
· Region / residency
· GDPR / HIPAA / PCI
· Quiet-hours respect
· Quota / budget caps
B · Effector — Universal Action Dispatcher
A single, audited entry-point that translates intent into environment-specific commands
Action Effector / Environment Bridge
Resolve env adapter · acquire lease · execute · capture effect · emit receipt
RESOLVE
env adapter
LEASE
workspace · slot
EXECUTE
command
CAPTURE
effects
RECEIPT
sign · log
Adapter Registry
Per-environment drivers
Action Schema Validator
Typed args · invariants
Lease & Concurrency
Per-resource locks · queues
Result Normalizer
Stable shape · trim · redact
Receipt Signer
Cryptographic evidence
Telemetry Emitter
OTel span · cost · latency
C · Environment Targets — Where Actions Land
The catalog of environments the agent can manipulate — each with its own driver, capabilities, and risk profile
Computer Use
Operate desktop / mobile
· Mouse · keyboard · screen
· Click · type · scroll · drag
· Accessibility tree
· OS shortcuts · clipboard
· Window / app focus
· VNC / RDP isolated VM
· Mobile emulator (iOS · Android)
· Action recorder & replay
· Visual grounding / OCR
Browser Agents
Operate the web
· Navigate · click · fill
· DOM & ARIA selectors
· Form auto-fill
· File upload · download
· Headless · headful
· Playwright · Puppeteer
· Cookie · session vault
· robots.txt & ToS aware
· CAPTCHA detection (refuse)
Code Sandboxes
Execute & build
· Python · Node · Bash
· Containers (Docker · OCI)
· MicroVMs (Firecracker)
· gVisor · Kata · WASM
· Cloud sandboxes (E2B)
· Notebook (Jupyter)
· Build pipelines · CI runs
· Test suites · benchmarks
· Ephemeral & persistent
Enterprise Systems
Systems of record
· CRM · ERP · ITSM
· HRIS · billing · finance
· Salesforce · ServiceNow
· SAP · Workday · NetSuite
· Data lake / lakehouse
· DevOps (CI/CD · IaC)
· SOC / SIEM · monitoring
· Identity / IDM (Okta · AAD)
· EHR · LIS (regulated)
Physical / IoT
Real-world actuation
· Robotics control APIs
· Sensors · actuators
· Smart-home (Matter · HA)
· Industrial PLC / SCADA
· Drone / vehicle telemetry
· Edge / on-device runtime
· ROS / OPC-UA bridges
· Safety interlocks · e-stops
· Geo-fenced operation
Output Channels
Reach humans & teams
· Notifications · push
· Email · SMS · voice
· Slack · Teams posts
· Git commits · PRs
· Tickets · Jira · Linear
· Reports · dashboards
· Pager / on-call
· Status pages
· Templates · approvals
D · Isolation, Sandboxing & Resource Governance
Bound the blast radius — every action runs in a constrained environment with enforced limits
Sandbox Runtimes
Hard isolation per action
· Containers (Docker · OCI)
· MicroVMs (Firecracker · QEMU)
· gVisor · Kata · WASM
· Cloud sandboxes (E2B)
· Ephemeral · per-task
Resource Limits
CPU · RAM · disk · time
· cgroups · ulimits
· Wall-clock timeouts
· File-system quotas
· Process count limits
· Per-tenant capacity
Network Policy
Egress & DNS control
· Domain allow-list
· No-egress mode
· Outbound proxy & logs
· Service-mesh mTLS
· Rate-limit per host
Workspace State
Per-action persistence
· Scratch FS per run
· Persistent volumes
· Snapshot & restore
· Worktree isolation (git)
· Auto-cleanup TTL
Secrets & Credentials
Just-in-time injection
· Vault / KMS / HSM
· Short-lived tokens
· OAuth on-behalf-of
· Rotated · scoped
· Memory-only injection
Concurrency & Pooling
Throughput & warm starts
· Sandbox warm pool
· Per-env concurrency cap
· Connection pooling
· Backpressure signals
· Cold-start optimization
E · Side-Effect Capture & Causality Tracking
Record exactly what changed in the world — for review, replay, and rollback
Effect Recorder
What did it do?
· Before / after diff
· Resource IDs touched
· DOM mutations · API calls
· FS writes · DB rows
· Network egress log
Causality & Lineage
Why did it happen?
· Trigger trace_id
· Plan step → action map
· Agent attribution
· Approval evidence
· Causal chain graph
Receipts & Evidence
Tamper-evident proof
· Signed action receipt
· Hash-chained log
· External system IDs
· Screenshot · DOM snapshot
· External provider receipt
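The hash-chained log named in the Receipts & Evidence card can be illustrated with a short sketch in which each entry's hash covers the previous entry's hash, so editing any earlier record invalidates everything after it. The function names and entry layout are assumptions. A hash chain alone gives tamper evidence, not authorship, so a real receipt would also carry a cryptographic signature.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash, record):
    body = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append_receipt(chain, record):
    """Append a record whose hash commits to the whole history before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev_hash,
                  "hash": _entry_hash(prev_hash, record)})
    return chain


def verify_chain(chain):
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```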
Streaming Output
Live progress to user
· Stdout / stderr stream
· Progress events
· Partial-result emit
· Cancel signal listener
· Live screen capture
Output Sanitization
Make output safe
· PII / secret redaction
· Size truncation
· Trust label tagging
· Schema-conformant
· Encoding normalization
Notification Hook
Tell who needs to know
· Action-completed event
· Failure / alert webhook
· Audit topic publish
· User receipt UI
· Status-page hook
F · Reversibility, Compensation & Recovery
Plan for "undo" before you act — rollback, compensate, or escalate when the world doesn't cooperate
Compensation Registry
"Undo" recipes per action
· Inverse-action mapping
· Soft-delete patterns
· Restore-from-snapshot
· Manual-undo runbooks
Saga Coordinator
Multi-step transactions
· Forward + compensate steps
· Per-step idempotency
· Failure → cascade undo
· Temporal · Cadence engines
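The Saga Coordinator's "forward + compensate" pattern with cascade undo can be sketched in a few lines. This is an in-process illustration under simplifying assumptions: `run_saga` and the `(name, do, undo)` step shape are hypothetical, and a durable engine would persist progress so compensation survives a crash.

```python
def run_saga(steps):
    """Run (name, do, undo) steps in order; on failure undo completed steps in reverse.

    Returns (ok, log) where log records what was done, failed, and undone.
    """
    completed, log = [], []
    for name, do, undo in steps:
        try:
            do()
        except Exception as exc:
            log.append(f"failed:{name}:{exc}")
            # Cascade undo: most recent successful step is compensated first.
            for prev_name, prev_undo in reversed(completed):
                prev_undo()
                log.append(f"undone:{prev_name}")
            return False, log
        completed.append((name, undo))
        log.append(f"done:{name}")
    return True, log
```

Each `undo` should itself be idempotent, since a coordinator that crashes mid-compensation may run it again on recovery.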
Snapshot & Rollback
Time-travel state
· FS / VM snapshots
· DB point-in-time recovery
· Git revert · branch
· Worktree restore
Failure Classifier
What went wrong?
· Retryable transient
· Permanent / policy reject
· Partial-success / dangling
· Decide retry / undo / abort
Retry & Backoff
Recover gracefully
· Exponential · jittered
· Idempotency-key reuse
· Circuit breaker per env
· Poison-message DLQ
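For the "Exponential · jittered" bullet in the Retry & Backoff card, here is a minimal sketch using full jitter, where each delay is drawn uniformly between zero and a capped exponential bound. `TransientError`, `call_with_retry`, and the default values are assumptions for this example. The `sleep` and `rng` parameters exist so the behavior can be tested without waiting.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for whatever the Failure Classifier labels as retryable."""


def call_with_retry(fn, attempts=5, base=0.5, cap=30.0, sleep=time.sleep, rng=None):
    """Call fn, retrying transient failures with capped, jittered exponential delays."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise  # retries exhausted: let the caller undo or escalate
            bound = min(cap, base * 2 ** attempt)
            sleep(rng.uniform(0, bound))
```

Only errors classified as transient are retried. Permanent or policy rejections propagate immediately, and retried calls should reuse the same idempotency key so a late success is not applied twice.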
Escalation Path
When automation isn't enough
· Page on-call
· Open ticket / runbook
· Pause & ask user
· Manual-step inventory
G · Safety Interlocks & Hard Stops
Independent safety controls that no agent can override
Hard Limits
Forbidden ops list
Geo · sector · scope blocks
Never auto-allow
Kill-Switch
Stop all actions
Per-agent / per-env / global
Operator-controlled
Velocity Caps
Per-min / hour rate
Anomaly auto-pause
Spike detection
Tripwires
Auto-trigger conditions
Honeypot resources
Forbidden domain hits
Manual Override
Operator pause / cancel
Approve · reject · modify
Real-time intervention
Physical E-Stops
For robotics / IoT
Hardware interlocks
Geo-fence violations
H · Observability, Audit & Forensics
Every action is traced, signed, and replayable — for debugging, compliance, and incident response
Action Tracing
OTel spans per call
env · agent · trace_id
Latency / cost meters
Audit Log (signed)
Hash-chained · immutable
Who · what · when · why
Compliance evidence
Replay & Forensics
Reconstruct any run
Recorded I/O · screen
Counterfactual replay
Anomaly Detection
Drift · spikes · errors
Per-env baselines
Auto-quarantine triggers
Cost & SLO Tracking
Per-env $ · success rate
Error budget burn
Top-spender attribution
User Receipt UI
Show what was done
Effect timeline
Undo / inspect controls
⇣ Outbound — Action Outcome to Layer 3 (Orchestrator) / Layer 10 (Reflection) / Layer 11 (Governance)
Status · effect record · receipt · compensation handle · trace · cost · escalation flags — a complete account of what happened
Status
success · partial · failed · refused
Effect Record
Resources changed · diffs
Signed Receipt
Hash · external IDs · proof
Compensation Handle
Inverse-action token
Trace & Telemetry
Spans · cost · latency
Escalations
HITL · alerts · pages
All outcomes are signed, traced, and reversible (or marked as one-way) — never silently applied.
Cross-cutting Approval, Audit & Reversibility
Pre-Flight / Safety
Effector / Dispatch
Environment Targets
Isolation / Sandbox
Side-Effect Capture
Reversibility
Observability
Forward action flow
Effect on environment
Rollback / hard-stop loop
Outcome / approval return
Detailed view of Layer 9 — Action & Environment Interface from the Agentic AI System Architecture reference.
Every action passes a pre-flight gate, runs through a unified Effector into a sandboxed environment adapter, captures its side-effects with signed receipts, and emits an outcome bundle plus a compensation handle. Independent safety interlocks and observability surround the whole pipeline so that no change to the world is silent, and any change that cannot be undone is explicitly marked as one-way.
Layer 10 Reflection, Evaluation & Continual Learning
Reflection, Evaluation & Continual Learning
The closed-loop self-improvement layer — collect trajectories, evaluate quality, reflect on lessons, distill skills, run benchmarks, retrain, and ship improvements safely back into prompts, models, and tools.
Detailed Diagram · v1.0 · 2026
⇣ Inbound — Signals from Every Layer (1–9 & 11)
trajectories · tool I/O · effects & receipts · user feedback · ratings · escalations · safety violations · cost / latency · audit events
{ run_id, trace, plan, prompts, tools[], outputs, effects[], feedback{thumbs, edits, regenerate}, slo, errors[] }
A · Trajectory & Feedback Collection
Build the canonical record of every agent run — the raw material for every downstream improvement
Trace Ingestor
Stream every run
· OTel spans · structured logs
· LangSmith / Langfuse
· Per-step metadata
· Tool I/O · model calls
· Replay-safe serialization
Explicit Feedback
User-stated signal
· 👍 / 👎 votes
· Star / scale ratings
· Free-text comments
· Survey forms
· Bug / issue reports
Implicit Signals
Behavior tells the story
· Edit / accept rate
· Regenerate clicks
· Abandonment · stop
· Dwell · scroll · revisit
· Follow-up question rate
System Outcomes
Did it actually work?
· Test pass / fail
· Tool error rates
· Goal attainment
· Side-effect reversals
· HITL approval rates
Cost / SLO Telemetry
Operational fitness
· Tokens · $ per run
· Latency P50 / P95
· Cache hit-rate
· Tool retry counts
· Error budget burn
Trajectory Store
Durable archive
· Object storage (S3)
· Indexed (vector + KV)
· PII-redacted variant
· Versioned · TTL'd
· Replay-able
B · Evaluation Engine — Scoring Trajectories
Multiple scoring strategies — programmatic, model-based, human — combined into a final quality signal
LLM-as-Judge
Model-based scoring
· Pairwise comparison
· Rubric scoring (1–5)
· Single / multi-judge
· Calibration vs humans
· Bias mitigation
· Reasoning traces logged
Programmatic Verifiers
Ground-truth checks
· Unit / integration tests
· Property-based checks
· Schema · invariants
· Numeric / string equality
· Constraint satisfaction
· Linters · formatters
Reward Models
Learned quality scorer
· Outcome reward (ORM)
· Process reward (PRM)
· Dense per-step reward
· Trained from prefs
· Calibration audit
· Reward-hacking probes
Faithfulness & Safety Eval
Truthful & safe
· NLI / entailment grader
· Citation grounding check
· Hallucination detector
· Toxicity / harm classifier
· Jailbreak resistance
· PII leakage probe
Human Annotation
Gold-standard labels
· Expert review queue
· Pairwise preferences
· Inter-rater agreement
· Active learning (uncertain)
· SME consult for domain
· Calibrate LLM judges
Score Aggregator
Combine signals
· Weighted ensemble
· Pass / fail thresholds
· Confidence intervals
· Per-dimension breakdown
· Outlier flagging
· Trend tracking
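The Score Aggregator's weighted ensemble, pass / fail thresholds, and per-dimension breakdown can be sketched as below. The function name, the `[0, 1]` score range, and the optional per-dimension `floors` are assumptions for this example. Confidence intervals and outlier flagging are left out.

```python
def aggregate_scores(scores, weights, pass_threshold=0.7, floors=None):
    """Combine per-dimension scores in [0, 1] into one pass / fail decision.

    scores and weights map dimension name -> value; floors optionally sets a
    minimum per dimension that no weighted average can compensate for.
    """
    floors = floors or {}
    total_weight = sum(weights[dim] for dim in scores)
    if total_weight <= 0:
        raise ValueError("weights for the scored dimensions must sum to > 0")
    overall = sum(scores[dim] * weights[dim] for dim in scores) / total_weight
    failed = [dim for dim, value in scores.items() if value < floors.get(dim, 0.0)]
    return {
        "overall": overall,
        "passed": overall >= pass_threshold and not failed,
        "failed_dimensions": failed,
        "breakdown": dict(scores),
    }
```

Per-dimension floors matter because a weighted average can hide a failing safety or faithfulness score behind strong results elsewhere.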
C · Self-Reflection & Learning In-the-Loop
Mid-run improvement: critique, revise, and capture lessons that transfer to the next attempt
Reflexion / Self-Refine
Critique then retry
· Generator + critic split
· Verbal-feedback memory
· N-round revision loop
· Stop on quality plateau
· Reduces hallucination
Lessons-Learned Capture
Episode → insight
· "What went wrong" notes
· Retry strategy hints
· Failure-mode taxonomy
· Stored in episodic memory
· Retrieved next attempt
Inner-Monologue Critic
Built-in challenge
· "Does this look right?"
· Confidence calibration
· Self-consistency vote
· Pre-commit review
· Confess uncertainty
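The critique-then-retry loop described in this section, with an N-round cap and a stop on quality plateau, can be sketched generically. The `critique` and `revise` callables stand in for model calls and are assumptions of this example, as are the name `refine` and the `min_gain` threshold.

```python
def refine(draft, critique, revise, max_rounds=3, min_gain=0.01):
    """Revise a draft until the critic's score stops improving.

    critique(draft) -> (score, feedback); revise(draft, feedback) -> new draft.
    Returns (best_draft, best_score, score_history).
    """
    best = draft
    best_score, feedback = critique(best)
    history = [best_score]
    for _ in range(max_rounds):
        candidate = revise(best, feedback)
        score, new_feedback = critique(candidate)
        history.append(score)
        if score - best_score < min_gain:
            break  # plateau: keep the previous best and stop spending tokens
        best, best_score, feedback = candidate, score, new_feedback
    return best, best_score, history
```

Keeping the generator and the critic as separate callables reflects the generator + critic split, and the returned history is the kind of record a lessons-learned store could retain.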
D · Skill & Recipe Distillation
Promote successful patterns into reusable, named, versioned skills
Pattern Miner
Find recurring success
· Cluster successful runs
· Extract common steps
· Identify pre/post conditions
· Tool-sequence subgraphs
· Cost / latency profile
Voyager-style Skills
Self-grown library
· Auto-curated repertoire
· Compositional reuse
· Trigger-condition tags
· Refactor on improvement
· Personal & org-shared
Skill Promotion
Validate & publish
· Eval gate (offline)
· Canary in production
· Versioned · signed
· Push to skill registry
· Auto-deprecate weaker

E · Eval Harness — Offline, Online & Capability
A continuous quality bar — golden datasets, A/B tests, regressions, capability suites, and red-team evals
Golden Datasets
Source of truth
· Curated Q&A · scenarios
· Domain-specific suites
· Synthetic + real mix
· Edge-case coverage
· Adversarial probes
· Versioned · maintained
Offline Benchmarks
Repeatable scoring
· Public (MMLU · HELM · BIG-bench)
· Agent (SWE-bench · WebArena)
· Tool-use evals (BFCL)
· Custom domain suites
· Cost / latency budgets
· Statistical significance
Online A/B & Shadow
Live experimentation
· % traffic split
· Shadow / dark-launch
· Power analysis
· Guardrail metrics
· Auto-stop on regression
· Holdout cohorts
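Power analysis for an A/B split answers how much traffic each arm needs before a difference in success rate is detectable. A minimal sketch using the standard normal approximation for two proportions follows; the default significance level and power are conventional choices, not values from this architecture.

```python
import math
from statistics import NormalDist


def samples_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-sided two-proportion test."""
    effect = p_treatment - p_control
    if effect == 0:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the Bernoulli variances of the two arms.
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)
```

Detecting a lift from an 80% to an 85% task success rate needs roughly nine hundred runs per arm under these defaults, and halving the effect size roughly quadruples that.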
Regression & CI Eval
No silent quality drops
· Per-PR eval gate
· Snapshot diff vs baseline
· Per-dimension regression
· Win/loss flake guard
· Eval flakiness tracker
· Weekly trend reports
Capability & Red-Team
Push limits safely
· Dangerous-capability eval
· Jailbreak / adversarial
· Bias / fairness audits
· Privacy / leakage probes
· RSP / ASL gating
· Scheduled red-team runs
Eval Reports
Decision-grade artifacts
· Dashboards · scorecards
· Per-cohort breakdown
· Failure-case galleries
· Approval evidence packs
· Release-readiness signal
· Distributed to stakeholders
F · Reflection & Improvement Hub — The Closed-Loop Engine
Synthesize scored trajectories into prioritized improvements: prompts, data, models, tools, policies
Improvement Synthesizer
Triage failures · cluster · prioritize · propose intervention · track to completion
CLUSTER
DIAGNOSE
PROPOSE
PRIORITIZE
SHIP
Failure Triage
Cluster · taxonomy · root cause
Bucket by error type
Intervention Picker
Prompt · data · model · tool · policy
Choose the right lever
Backlog & Tracker
Prioritized improvement queue
Owner · due-date · impact
Closure Verification
Re-eval after fix lands
Confirm metric moved
G · Continual Training & Adaptation
Turn production trajectories into better prompts, fine-tunes, and reward signals
Trace Mining & Curation
From logs → datasets
· Extract good trajectories
· Rejection sampling
· De-dup & balance
· PII scrub before training
· Synthetic augmentation
· Consent & opt-in only
Prompt Optimization
Cheap, fast wins
· Auto-prompt search
· DSPy compilation
· Few-shot exemplar mining
· Persona / system tuning
· Skill / SKILL.md updates
· Rubric-guided rewriting
Supervised Fine-Tune
SFT on curated data
· Instruction tuning
· Tool-use distillation
· Rejection-sampled SFT
· LoRA / QLoRA adapters
· Per-tenant adapters
· Curriculum staging
Preference Optimization
Align to human / AI prefs
· DPO · IPO · KTO
· RLHF · PPO
· RLAIF (Constitutional AI)
· GRPO · process rewards
· Reward-model training
· KL-controlled updates
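For a single preference pair, the DPO objective listed above reduces to a logistic loss on the policy's log-probability margin relative to a frozen reference model. The sketch below works on scalar sequence log-probabilities; the argument names and the `beta` default are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    # How much more the policy prefers the chosen answer than the reference does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), written to avoid overflow.
    if x > 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

Here `beta` plays the role of the KL control: larger values penalize drifting from the reference model more strongly.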
Tool / Retrieval Tuning
Beyond the model
· Embedding fine-tunes
· Re-ranker training
· Tool-spec rewrites
· Chunking strategy tuning
· Skill cards optimization
· Router / cascade tuning
Self-Play / Synthetic
Augment scarce data
· Self-play trajectories
· LLM-generated tasks
· Adversarial generation
· Verifier-filtered output
· Counterfactual replays
· Quality watermarking
H · Quality Control & Learning Safety
Don't make the system worse — guard against drift, forgetting, poisoning, and reward hacking
Drift Detection
Distribution shift
Input · output drift
Auto-alert & pause
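One common way to quantify input or output drift is the Population Stability Index (PSI) over binned counts. The sketch below is an assumption about how such a check might look; the 0.25 pause threshold is a widely used rule of thumb rather than something this architecture specifies.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Clamp empty bins so the logarithm stays defined.
        e_frac = max(e / e_total, eps)
        a_frac = max(a / a_total, eps)
        score += (a_frac - e_frac) * math.log(a_frac / e_frac)
    return score


def should_pause(expected_counts, actual_counts, threshold=0.25):
    """Hypothetical auto-pause rule: pause learning when drift is large."""
    return psi(expected_counts, actual_counts) > threshold
```

The bins could be intent categories for input drift or refusal / success / error buckets for output drift.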
Catastrophic-Forget Guard
Don't lose old skills
Replay buffer
EWC / KL constraints
Reward Hacking Probes
Specification gaming
Multi-judge cross-check
Process & outcome both
Data-Poisoning Defense
Untrusted-trace screening
Anomaly filtering
Provenance required
Eval Gating
No-regress release rule
Capability + safety pass
Rollback on fail
Privacy & Consent
Train-on-data flags
DP / k-anonymity
RTBF cascade
I · Safe Deployment & Rollout
Ship improvements with the same rigor as code — versioned, canaried, monitored, reversible
Versioning & Registry
Prompts · models · skills
SemVer · signed artifacts
Provenance ledger
Canary Rollout
Gradual % traffic
Health-gate auto-promote
Auto-rollback on regression
Feature Flags
Per-tenant gates
Kill-switch flags
Dynamic config
Post-Deploy Monitor
Live metric watch
Quality · cost · safety
SLO breach → rollback
Change Log & Audit
What changed · who · when
Eval-evidence linked
Reviewer sign-off
User Communication
Release notes
Behavior-change alerts
Known-issue digest
⇣ Outbound — Improvements Pushed Back into the Stack
Updated artifacts deployed across every layer they affect — closing the agentic improvement loop
→ Layer 4 · Reasoning
Adapters · prompts · model versions
→ Layer 3 · Orchestrator
Plan templates · routing rules
→ Layer 5 · Memory
Lessons · skills · profiles
→ Layer 6 · Tools / Skills
New & revised skill cards
→ Layer 7 · RAG
Embeddings · re-rankers · chunking
→ Layer 11 · Governance
Eval evidence · safety reports
Every push is versioned, eval-gated, canaried, and reversible — no silent drift.
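The health-gate behind "canaried and reversible" can be pictured as a three-way decision over baseline and canary metrics. This is a sketch only: the metric names, thresholds, and minimum sample size are hypothetical and would come from the team's own SLOs.

```python
def canary_decision(baseline, canary, max_quality_drop=0.02,
                    max_error_increase=0.01, min_samples=500):
    """Return 'hold', 'rollback', or 'promote' for a canary release.

    baseline, canary: dicts with 'quality', 'error_rate', and (for the
    canary) 'samples' keys. All names are illustrative.
    """
    if canary["samples"] < min_samples:
        return "hold"  # not enough traffic yet to judge
    quality_drop = baseline["quality"] - canary["quality"]
    error_increase = canary["error_rate"] - baseline["error_rate"]
    if quality_drop > max_quality_drop or error_increase > max_error_increase:
        return "rollback"
    return "promote"
```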
Cross-cutting Eval, Safety & Lifecycle
Collection / Hub
Evaluation
Reflection
Skill Distillation / Deploy
Eval Harness
Continual Training
Quality / Safety
Forward learning flow
Deployed improvement
Closed-loop / reflection
Detailed view of Layer 10 — Reflection, Evaluation & Continual Learning from the Agentic AI System Architecture reference.
Trajectories and feedback flow in from every layer, are scored by a multi-method evaluation engine, fuel mid-run reflection and offline distillation, run through a comprehensive eval harness, and converge on an Improvement Synthesizer that triages failures into prioritized interventions. Continual-training and safe-deployment pipelines push versioned, canaried, eval-gated improvements back across the stack — closing the agentic loop.
Layer 11 Safety, Governance, Trust & Observability
Safety, Governance, Trust & Observability
The cross-cutting control plane that wraps every other layer — guardrails, identity, policy-as-code, compliance, observability, red-teaming, and incident response — making the agent system safe, accountable, and operable in production.
Detailed Diagram · v1.0 · 2026
⇄ Cross-Cutting Signals — Wraps Layers 1-10 (every request, action, and effect crosses this layer)
requests · responses · tool calls · effects · trajectories · feedback · cost · errors · audit events · safety incidents
L1 User · L2 Perception · L3 Orchestrator · L4 Reasoning · L5 Memory · L6 Tools · L7 RAG · L8 Multi-Agent · L9 Action · L10 Reflection
A · Guardrails — Input, Output & Behavior Filters
Defend the model and the user — block harmful inputs and unsafe outputs at every boundary
Input Filters
First line of defense
· Toxicity · hate · violence
· CSAM hash matching
· Self-harm classifier
· Dangerous-capability probe
· Schema / size limits
· Bidi / homoglyph guard
Prompt-Injection Defense
Trust-boundary enforcement
· Quarantine tool results
· No-instruction-follow rule
· Spotlighting / delimiters
· Hidden-text decoder
· Multimodal probe
· Confirm sensitive ops
PII / DLP
Detect & redact
· Names · IDs · phones
· Cards · accounts · keys
· Health / financial data
· Tokenization vault
· Differential privacy
· Egress DLP scan
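Detect-and-redact can start with pattern matching before any learned detector is involved. The sketch below covers only e-mail addresses and one US-style phone format; the patterns and placeholder labels are deliberately simplified assumptions, and a production DLP stack would use far broader detectors.

```python
import re

# Simplified illustrative patterns, not an exhaustive PII catalogue.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text):
    """Replace matches with a typed placeholder and count what was removed."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn("[" + label + "]", text)
        counts[label] = n
    return text, counts
```

The per-type counts are useful as an audit signal even when the redacted text itself is discarded.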
Output Filters
Last-mile safety
· Refusal classifier
· Hallucination probes
· Schema-conformant
· Watermarking · C2PA
· Content tags
· Bias / fairness checks
Behavioral Guardrails
Constrain agent action
· Topic allow / deny
· Persona & tone constraints
· Refusal templates
· Off-task detection
· Loop / runaway breaker
· Action allow-list scope
Frameworks
Standardized stacks
· NeMo Guardrails · LMRails
· Llama Guard · Granite Guardian
· Azure Content Safety
· OpenAI Moderation
· Custom rules engine
· Versioned · A/B-tested
B · Identity, Access & Secrets
Verify who is acting, what they're allowed to do, and protect every credential along the way
Authentication
Prove identity
· SSO · OAuth · OIDC · SAML
· Passkeys · MFA · step-up
· Service-account / SPIFFE
· Workload identity
· Token refresh / revoke
Authorization
Decide what's allowed
· RBAC · ABAC · ReBAC
· Scopes · entitlements
· Tool allow / deny lists
· Tenant / project isolation
· Delegated / on-behalf-of
Agent Identity
First-class principals
· Cryptographic agent ID
· Capability cards (signed)
· Agent provenance ledger
· Per-agent token scopes
· Sub-agent attenuation
Secrets & Keys
JIT, scoped, rotated
· HashiCorp Vault
· KMS / HSM-managed keys
· OAuth token exchange
· Short-lived tokens
· Memory-only injection
Network & Boundary
Zero-trust transport
· mTLS service-to-service
· Service mesh (Istio · Linkerd)
· Egress proxy & allow-list
· Private VPC · sovereign cloud
· WAF / DDoS shield
Encryption
Data confidentiality
· At-rest (AES-GCM)
· In-transit (TLS 1.3)
· In-use (confidential VM)
· BYOK / HYOK
· Per-tenant key isolation
C · Policy-as-Code & Action Gating
Encode rules once, enforce them everywhere — versioned, auditable, testable
Policy Engine
Decision point
· Open Policy Agent (OPA)
· Cedar · Rego rules
· Versioned · signed bundles
· Tenant overrides
Action Approval
Gate risky operations
· Auto · HITL · admin
· Two-person rule
· Step-up auth
· Signed approval token
Risk Classifier
How dangerous is this?
· Reversible vs destructive
· Blast radius estimate
· Data sensitivity tier
· Money / identity impact
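The four Risk Classifier signals can feed the Action Approval modes (auto, HITL, admin) through a simple additive score. The weights and cut-offs below are hypothetical placeholders chosen for illustration; a real deployment would encode them in the policy engine.

```python
def approval_mode(reversible, blast_radius, data_tier, moves_money):
    """Map risk signals to an approval mode: 'auto', 'hitl', or 'admin'.

    blast_radius: estimated number of affected records or users.
    data_tier: 0 (public) to 2 (highly sensitive); an assumed scale.
    """
    score = 0
    score += 0 if reversible else 2
    if blast_radius > 1000:
        score += 2
    elif blast_radius > 10:
        score += 1
    score += data_tier
    score += 2 if moves_money else 0
    if score >= 5:
        return "admin"
    if score >= 2:
        return "hitl"
    return "auto"
```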
Quotas & Rate Limits
Bound consumption
· Token / $ caps
· Per-tenant quotas
· Velocity / spike caps
· Fair-share scheduling
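Velocity and spike caps are often implemented as a token bucket, where "token" here means a rate-limit credit rather than an LLM token. A minimal single-process sketch follows; the injectable `clock` parameter is an assumption added to make the behavior testable.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False
```

One bucket per tenant gives per-tenant quotas; passing the estimated LLM token count as `cost` turns the same structure into a spend cap.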
Policy Authoring & Test
Treat policy as code
· Code review · CI tests
· Canary policy rollout
· Counterfactual evaluation
· Rollback on regression
Decision Log
Why allowed / denied
· Per-decision evidence
· Rule version applied
· Counterexample queries
· Appeals workflow
D · Compliance, Audit & Regulatory Mapping
Demonstrate trustworthy operation to regulators, auditors, and customers — with evidence
Regulatory Mapping
Frameworks & standards
· EU AI Act · NIST AI RMF
· ISO/IEC 42001 (AI MS)
· GDPR · CCPA / CPRA
· HIPAA · PCI · SOX
· SOC 2 · ISO 27001
Audit Log
Tamper-evident record
· Hash-chained · signed
· Immutable storage (WORM)
· Who · what · when · why
· Cross-layer correlation
· Long-term retention
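A hash-chained log makes tampering evident because each entry commits to the hash of the one before it. The sketch below shows the chaining and verification only; signing and WORM storage are omitted, and the entry layout is an illustrative assumption.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def _digest(prev_hash, event):
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log, event):
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev_hash": prev_hash,
                "hash": _digest(prev_hash, event)})


def verify(log):
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```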
Data Residency & Sovereignty
Where data lives matters
· EU · US · APAC pinning
· Sovereign-cloud routing
· Cross-border guards
· On-prem deployment
· Air-gapped envs
DSAR & Subject Rights
User data rights
· Access · export · portability
· Right-to-be-forgotten
· Cascade across stores
· Self-service portal
· Erasure receipts
AI-Specific Disclosures
Be transparent
· Model cards · system cards
· Datasheets · data lineage
· "Talking to AI" disclosure
· Synthetic-content labeling
· Risk-tier reporting
Evidence & Reporting
Audit-ready exports
· Auto-generated evidence
· Control-mapping matrix
· Regulator-ready packs
· Drata · Vanta · Secureframe
· Customer trust portal
E · Trust & Safety Operations Hub
Central console where humans monitor, intervene, investigate, and escalate
Trust & Safety Console
Live dashboards · alerts · approvals · investigations · kill-switches
DETECT
TRIAGE
CONTAIN
RECOVER
REPORT
SOC for AI
24/7 monitoring · on-call
Alert routing & escalation
Approval / HITL Inbox
Pending high-risk actions
SLA-driven decisions
Kill-Switch Console
Per-agent · env · global
Operator-only authority
Investigation Workbench
Replay · search · evidence
Forensic timeline
F · Observability — Tracing, Metrics, Logs & Cost
See everything the agent does — with replay, attribution, and SLO accountability
Distributed Tracing
End-to-end view
· OpenTelemetry spans
· LangSmith · Langfuse
· Helicone · Phoenix · W&B
· Per-step / tool / agent
· Conversation graph view
· Cross-layer trace_id
Metrics & SLOs
Quantified health
· Latency P50 / P95 / P99
· Success / abandon rate
· Tool error rate
· Cache hit-rate
· SLO & error-budget burn
· Prometheus · Datadog
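Error-budget burn is usually expressed as a burn rate: the observed error rate divided by the error rate the SLO allows. A minimal sketch, with hypothetical argument names:

```python
def burn_rate(failed, total, slo_target):
    """Ratio of observed error rate to the SLO's allowed error rate.

    A value of 1.0 spends the budget exactly over the SLO window;
    values above 1.0 exhaust it early.
    """
    if total == 0:
        return 0.0
    error_budget = 1.0 - slo_target
    return (failed / total) / error_budget
```

For example, 5 failed runs out of 1,000 against a 99.9% success SLO burns budget about five times faster than the SLO allows.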
Structured Logging
Forensic detail
· Event-sourced runs
· Per-step input / output
· Tool I/O recorded
· PII-redacted variant
· Search · query · alert
· Retention policy
Cost Observability
$ accountability
· Tokens · $ per call / run
· Per-tenant chargeback
· Top-spender attribution
· Budget burn dashboards
· Cache-hit savings
· Cost-anomaly alerts
Replay & Time-Travel
Reconstruct any run
· Recorded I/O
· Counterfactual debug
· Step-through inspector
· Screenshot / DOM capture
· Reproducible re-runs
· Diff vs golden
Anomaly & Alerting
Catch issues fast
· Drift detection
· Tool-error spikes
· Refusal-rate jumps
· Cost / latency spikes
· Auto-page on-call
· PagerDuty · OpsGenie
G · Red-Team & Capability Gating
Stress-test the system before adversaries do — and gate dangerous capabilities responsibly
Adversarial Red-Team
Find the failure modes
· Jailbreak attempts
· Prompt-injection probes
· Tool-abuse scenarios
· Multi-step exploit chains
· Continuous + scheduled
Capability Evals
Measure dangerous skills
· CBRN · cyber · autonomy
· Persuasion · manipulation
· Self-replication probes
· Long-horizon planning
· Independent evals
RSP / ASL Gating
Tiered release controls
· Responsible Scaling Policy
· AI Safety Levels (ASL)
· Deployment thresholds
· If/then commitments
· Public reporting
H · Model & Tool Lifecycle · Incident Response
Every AI artifact is versioned, monitored, and recoverable — drills keep the team sharp
Model / Tool Governance
Versioned & controlled
· Model registry · cards
· Tool allow-list / deny
· Canary · rollback
· Deprecation policy
· Provenance ledger
Incident Response
When things go wrong
· Runbooks & on-call
· Containment · isolate agent
· User & regulator notify
· Root-cause analysis
· Blameless post-mortem
Drills & Game-Days
Practice for crises
· Chaos exercises
· Tabletop simulations
· Kill-switch drill
· DSAR rehearsal
· Recovery time targets
I · Transparency, Explainability & User Trust
Help users understand what the agent did and give them meaningful control
Decision Explanations
Why did it do that?
· Reasoning trace UI
· Tool-call timeline
· Citation panels
· Confidence indicators
User Controls
Stay in charge
· Memory opt-in / opt-out
· Train-on-data flags
· Tool / scope toggles
· Cancel · undo · redo
Disclosures & Receipts
Set expectations
· "AI-generated" labels
· Action receipts
· Limitations notice
· Customer trust portal
⇄ Enforcement & Signals to Every Layer
Governance is bidirectional — signals collected from layers, enforcement decisions sent back
→ L3 Orchestrator
allow / deny decisions
HITL · risk class
→ L4 Reasoning
model allow-list
capability flags
→ L6 Tools
tool allow / deny
scope & quota
→ L9 Action
approval tokens
kill-switch state
→ L5 / L7 Memory · RAG
ACL · DSAR · TTL
residency rules
→ L10 Reflection
eval gates
release approval
⇣ External Outputs — Stakeholders, Regulators & Public Trust
Audit packs · model / system cards · transparency reports · safety disclosures · DSAR fulfillment · breach notifications · customer trust portal
Cross-cutting · Wraps All Layers (1–10)
Guardrails
Identity / Transparency
Policy
Compliance
Trust Hub / Observability
Red-Team / Capability
Lifecycle / Incident Response
Forward governance flow
Enforcement / disclosure
Live override / kill-switch
Detailed view of Layer 11 — Safety, Governance, Trust & Observability from the Agentic AI System Architecture reference.
This layer is cross-cutting: it wraps Layers 1–10. Signals from every layer flow in; guardrails, identity, policy, compliance, observability, red-teaming, lifecycle, and transparency controls flow out as enforcement decisions, audit evidence, and stakeholder disclosures. The Trust & Safety Hub provides a live console for humans to detect, contain, and recover from incidents — and the kill-switch path lets operators stop the system at any time.
Layer 12 Infrastructure & Platform
Infrastructure & Platform
The substrate beneath every agent — compute, accelerators, model serving, runtimes, storage, networking, deployment topologies, and the SRE / FinOps machinery that keeps it all running reliably, securely, and economically at scale.
Detailed Diagram · v1.0 · 2026
⇣ Workload Demand — Every Other Layer Runs on This Substrate
model inference · agent runs · tool execution · vector / graph queries · multi-agent coordination · evaluation jobs · training
L1–L11 workloads · sync / async · batch / stream · long-running · global / regional · per-tenant SLOs
A · Compute & Accelerator Fleet
A heterogeneous, capacity-managed fleet — right hardware for training, inference, agents, and tools
NVIDIA GPUs
Workhorse for training & inference
· H100 · H200 · B100 · B200
· GB200 NVL72 racks
· NVLink · NVSwitch fabric
· FP8 / FP16 / BF16
· MIG partitioning
· CUDA · cuDNN · NCCL
Cloud Accelerators
Hyperscaler-native silicon
· Google TPU v5p · v6 (Trillium)
· AWS Trainium · Inferentia
· Azure Maia · Cobalt
· AMD MI300X / MI350
· Intel Gaudi 3
· OCI / Lambda / CoreWeave
Specialty Accelerators
Ultra-low-latency inference
· Groq LPU
· Cerebras WSE-3
· SambaNova SN40L
· Tenstorrent Wormhole
· d-Matrix · etched.ai
· FPGA / ASIC fast paths
CPU & General Compute
Agents · tools · orchestration
· x86 · ARM (Graviton · Ampere)
· High-mem · high-CPU SKUs
· Spot / preemptible
· Confidential VMs (SEV · TDX)
· Burstable instances
· Dedicated tenancy
Edge & Device
On-device inference
· Apple Neural Engine
· Qualcomm Hexagon NPU
· NVIDIA Jetson · Orin
· Coral TPU · Hailo
· WebGPU / WASM
· Quantized SLMs
Capacity Management
Right-size, right-time
· Reservations · commitments
· Spot · preemptible mix
· Cluster autoscaler
· Multi-cloud burst
· Per-tenant quotas
· Forecast-driven planning
B · Model Serving, Training & ML Platform
From research to production — high-throughput inference, distributed training, and the MLOps glue around them
Inference Servers
High throughput · low TTFT
· vLLM · SGLang
· TensorRT-LLM · TGI
· Triton Inference Server
· llama.cpp · MLX (edge)
· Continuous batching
Hosted Model APIs
Provider-managed
· Anthropic · OpenAI
· Google · Mistral · xAI
· Bedrock · Vertex AI
· Azure OpenAI Service
· Together · Fireworks · Replicate
Distributed Training
Pre-train · fine-tune · DPO
· PyTorch · JAX · DeepSpeed
· Megatron · NeMo · Axolotl
· FSDP · ZeRO · TP / PP / EP
· Slurm · Ray · Kubeflow
· Checkpoint · resume
Compiler & Kernels
Squeeze every flop
· FlashAttention 3 · FA-decoder
· Triton · CUDA · ROCm
· torch.compile · XLA
· TVM · Mojo · IREE
· FP8 / INT4 GEMM
Model Registry & MLOps
Lifecycle of every artifact
· MLflow · W&B · Comet
· Hugging Face Hub
· Versioning · provenance
· Approval · canary · rollback
· Signed artifacts (SLSA)
Optimization & Deploy
From weights to traffic
· Quantize · prune · distill
· AWQ · GPTQ · SmoothQuant
· Speculative decoding
· Multi-tenant serving
· Cold-start optimization
C · Agent & Workflow Runtimes
Stateful execution engines that drive long-running, resumable agent loops
Agent Frameworks
Build & run agents
· Anthropic Agent SDK
· LangGraph · LangChain
· CrewAI · AutoGen · Magentic
· LlamaIndex · Haystack
Durable Execution
Replay-safe orchestration
· Temporal · Cadence
· Restate · Inngest
· DBOS · Trigger.dev
· Long-running runs
Distributed Compute
Map-reduce / actors
· Ray · Ray Serve
· Spark · Dask · Modal
· Akka · Erlang OTP
· Dapr · Restate
Sandbox & Tool Runtime
Per-action isolation
· Firecracker · Kata · gVisor
· WASM · WASI runtimes
· E2B · Daytona · CodeSandbox
· Browser-tab VMs
MCP Hosting
Tool-server platform
· Local stdio servers
· Remote Streamable HTTP / SSE
· Multi-tenant gateway
· Capability negotiation
Schedulers / Queues
Async & cron
· Celery · Sidekiq · BullMQ
· Argo Workflows · Airflow
· Kubernetes Jobs · CronJobs
· Priority & fair-share
D · Container, Orchestration & Cluster Platform
The unified runtime — schedule, isolate, scale, and recover every workload
Kubernetes & Container Platform
Schedules pods · GPU operator · autoscaling · service discovery · secrets · multi-tenant namespaces
SCHEDULE
SCALE
ISOLATE
HEAL
UPGRADE
Container Runtimes
containerd · CRI-O · Docker
OCI images · BuildKit · Buildpacks
GPU / Accelerator Operators
NVIDIA GPU operator · device plugins
Topology-aware · MIG · time-slicing
Autoscaling
HPA · VPA · KEDA · Karpenter
Predictive · queue-driven scale
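The Kubernetes Horizontal Pod Autoscaler documents its core rule as desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance. The sketch below mirrors that rule in plain Python; the min / max defaults are illustrative, and the metric could be queue depth per replica for queue-driven scaling.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """HPA-style scaling decision for one metric."""
    ratio = current_metric / target_metric
    # Close enough to target: avoid flapping by leaving the count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))
```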
Multi-Tenant Isolation
Namespaces · NetworkPolicy · OPA
vCluster · gVisor / Kata sandboxing
E · Storage & Data Platform
Polyglot persistence — choose the right database for each agent workload
Object & Block
Bulk & durable
· S3 · GCS · Azure Blob · R2
· EBS · Persistent Disk
· MinIO · Ceph (on-prem)
· Lifecycle · tiering · glacier
· Object lock · WORM
Vector Stores
Semantic search
· pgvector · Pinecone
· Weaviate · Qdrant
· Milvus · Vespa · Turbopuffer
· LanceDB · ChromaDB
· HNSW · IVF · DiskANN
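Index structures such as HNSW, IVF, and DiskANN approximate the result of an exact nearest-neighbor scan. As a reference point, the exact scan with cosine similarity fits in a few lines; the in-memory dict index is a stand-in assumption for a real vector store.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, index, k=2):
    """Exact top-k search over {doc_id: vector}; O(n) per query."""
    scored = sorted(
        ((cosine(query, vec), doc_id) for doc_id, vec in index.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]
```

An exact scan like this is also a handy ground truth when measuring the recall of an approximate index.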
Knowledge Graphs
Relations & paths
· Neo4j · ArangoDB
· Memgraph · NebulaGraph
· TigerGraph · Amazon Neptune
· RDF · SPARQL stores
· Property + temporal edges
OLTP / OLAP
Transactional & analytical
· Postgres · MySQL · Aurora
· Spanner · CockroachDB
· Snowflake · BigQuery
· Databricks · Iceberg
· DuckDB · ClickHouse
KV / Cache / Doc
Hot & flexible
· Redis · KeyDB · Dragonfly
· DynamoDB · Cosmos · Bigtable
· MongoDB · Couchbase · Firestore
· Elastic · OpenSearch
· Memcached · Hazelcast
Time-Series & Stream
Append-only timelines
· TimescaleDB · InfluxDB
· QuestDB · Prometheus TSDB
· Event-sourced state
· CDC streams (Debezium)
· Replay-able trajectories
F · Networking, Messaging & Edge
Move bytes safely and quickly — between users, services, agents, and tools
Edge & CDN
Closer to users
· Cloudflare · Fastly · Akamai
· AWS CloudFront · GCP CDN
· Edge functions / Workers
· WAF · DDoS shield
· Bot / abuse detection
Load Balancing & Ingress
Route & balance
· L4 / L7 LBs · Envoy
· NGINX · HAProxy · Traefik
· K8s Ingress / Gateway API
· Sticky sessions · health checks
· Global anycast
Service Mesh
Zero-trust east-west
· Istio · Linkerd · Cilium
· mTLS service-to-service
· Retries · timeouts · circuit breaking
· Traffic shifting · canary
· eBPF data plane
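Service meshes apply circuit breaking at the proxy, but the idea is easiest to see in application code. The following is a minimal sketch, not how Istio or Linkerd implement it; the thresholds and the injectable `clock` are assumptions.

```python
import time


class CircuitBreaker:
    """Fail fast after repeated errors, then allow a trial call later."""

    def __init__(self, failure_threshold=3, reset_after=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success closes the circuit fully
        return result
```

Wrapping tool calls this way keeps an agent loop from spending its retry budget on a dependency that is already down.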
RPC & Streaming
Inter-service calls
· gRPC · Connect · Twirp
· REST / OpenAPI
· GraphQL Federation
· Server-sent events (SSE)
· WebSocket · WebRTC
Event Bus / Queueing
Async & pub/sub
· Kafka · Redpanda
· NATS · Pulsar · RabbitMQ
· AWS SQS / SNS / EventBridge
· GCP Pub/Sub · Azure Service Bus
· DLQ · ordered delivery
High-Perf Fabrics
Training / inference net
· InfiniBand · NDR / XDR
· RDMA · RoCE
· NVLink · NVSwitch
· UCX · NCCL · libfabric
· Topology-aware routing
G · Identity, Secrets & Platform Security
Workload identity, secrets, supply-chain integrity, and confidential compute
Workload Identity
Service-to-service auth
· SPIFFE / SPIRE
· Cloud IAM (IRSA · WIF)
· OIDC trust federation
· Per-pod / per-agent identity
· Short-lived certs
Secrets & Key Management
Centralized · rotated
· HashiCorp Vault · Infisical
· AWS / GCP / Azure KMS
· HSM · CloudHSM
· External Secrets Operator
· Just-in-time injection
Supply-Chain & Confidential
Trust the binaries
· SBOM · SLSA levels
· Sigstore · cosign signing
· Image scanning (Trivy · Snyk)
· Confidential VMs (SEV · TDX)
· TEE · attestation
H · Deployment, CI/CD & Infrastructure-as-Code
Reproducible, auditable, GitOps-driven delivery for every artifact in the stack
CI/CD Pipelines
Build · test · deploy
· GitHub Actions · GitLab CI
· Buildkite · CircleCI · Jenkins
· Eval gates · safety gates
· Reproducible builds
· Promotion across envs
GitOps & Continuous Delivery
Declarative · auditable
· Argo CD · Flux
· Helm · Kustomize
· Progressive delivery (Argo Rollouts)
· Canary · blue-green · feature flags
· Auto-rollback on regression
Infrastructure-as-Code
Codify the stack
· Terraform · OpenTofu
· Pulumi · Crossplane
· CDK · Bicep · ARM
· Policy as code (OPA · Sentinel)
· Drift detection · plan reviews
I · Deployment Topologies · Observability · SRE & FinOps
Where the stack runs, how to keep it up, and how to keep it affordable
Deployment Topologies
Where the stack lives
· Public cloud (AWS · GCP · Azure)
· On-prem · co-location
· Hybrid · private cloud
· Edge · device · air-gapped
· Sovereign cloud
Multi-Region & HA
Resilience & locality
· Active / active · A/P
· Cross-region replication
· Failover & DR drills
· Backup · point-in-time restore
· Data-residency routing
Observability Stack
Metrics · logs · traces
· OpenTelemetry collectors
· Prometheus · Grafana · Loki
· Datadog · New Relic · Honeycomb
· AI-specific (Langfuse · LangSmith)
· Profiling (pprof · Pyroscope)
SRE & Reliability
Run it like production
· SLO / SLI / error budgets
· On-call · runbooks
· Chaos engineering
· Post-mortems & lessons
· PagerDuty · OpsGenie
FinOps & Cost Control
$ accountability
· Token / GPU / $ meters
· Per-tenant chargeback
· Reserved + spot blending
· Anomaly & budget alerts
· Rightsizing recommendations
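Per-tenant chargeback amounts to rolling metered usage up by tenant. The sketch below assumes a flat record shape and made-up prices; real meters would come from the serving layer and the cloud bill.

```python
from collections import defaultdict


def chargeback(usage_records, price_per_1k_tokens, gpu_hour_price):
    """Sum token and GPU cost per tenant.

    usage_records: dicts with 'tenant', 'model', 'tokens', and optional
    'gpu_hours' keys (an assumed schema).
    price_per_1k_tokens: {model_name: price}, hypothetical prices.
    """
    totals = defaultdict(float)
    for record in usage_records:
        token_cost = (record["tokens"] / 1000
                      * price_per_1k_tokens[record["model"]])
        gpu_cost = record.get("gpu_hours", 0.0) * gpu_hour_price
        totals[record["tenant"]] += token_cost + gpu_cost
    return dict(totals)
```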
Sustainability
Energy & carbon-aware
· Carbon-aware scheduling
· PUE / WUE tracking
· Renewable-region routing
· Per-token energy meters
· Sustainability reporting
⇣ Platform Outputs — Capacity, SLOs, Cost & Compliance Evidence
SLO dashboards · capacity forecasts · cost & carbon reports · compliance evidence · DR & failover posture · supply-chain attestations
Cross-cutting Reliability, Cost & Sustainability
Compute & Accelerators
Serving / Identity
Runtimes / Deploy
Containers / K8s
Storage
Networking
SRE / FinOps / Sustainability
Stack dependency
Platform reports
Auto-scale / FinOps loop
Detailed view of Layer 12 — Infrastructure & Platform from the Agentic AI System Architecture reference.
Workloads from L1–L11 land on a heterogeneous compute fleet, are served via inference engines and agent runtimes, scheduled on Kubernetes, backed by polyglot storage, and connected through service-mesh networking. Identity, secrets, supply-chain integrity, deployment automation, and SRE / FinOps / sustainability practices keep the substrate trustworthy, available, and economical at scale.
Layer diagrams and decomposition in this document are original synthesis for teaching. The entries below anchor technical claims about agents, tools, retrieval, evaluation, safety, and platform practice in peer-reviewed research, official standards, and widely adopted specifications. Verify instrument texts and vendor docs before compliance or production design work.
Agentic reasoning, planning, and reflection
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. “ReAct: Synergizing Reasoning and Acting in Language Models.” arXiv preprint arXiv:2210.03629, 2022. https://arxiv.org/abs/2210.03629
Wei, J., Wang, X., Schuurmans, D., et al. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” Advances in Neural Information Processing Systems (NeurIPS), 2022. https://arxiv.org/abs/2201.11903
Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., & Yao, S. “Reflexion: Language Agents with Verbal Reinforcement Learning.” arXiv preprint arXiv:2303.11366, 2023. https://arxiv.org/abs/2303.11366
Tool use, skills, and environment interfaces
Schick, T., Dwivedi-Yu, J., Dessì, R., et al. “Toolformer: Language Models Can Teach Themselves to Use Tools.” arXiv preprint arXiv:2302.04761, 2023. https://arxiv.org/abs/2302.04761
Anthropic et al. Model Context Protocol (MCP): open standard for model-to-tool and model-to-data integration. https://modelcontextprotocol.io/ (specification hosted on GitHub)
Paranjape, B., Lundberg, S., Singh, S., et al. “ART: Automatic multi-step reasoning and tool-use for large language models.” arXiv preprint arXiv:2303.09014, 2023. https://arxiv.org/abs/2303.09014
Retrieval, knowledge, and memory architectures
Lewis, P., Perez, E., Piktus, A., et al. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” NeurIPS, 2020. arXiv:2005.11401. https://arxiv.org/abs/2005.11401
Packer, C., Fang, V., Patil, S. G., Lin, K., Stoica, I., & Gonzalez, J. E. “MemGPT: Towards LLMs as Operating Systems.” arXiv preprint arXiv:2310.08560, 2023. https://arxiv.org/abs/2310.08560
Gao, Y., Xiong, Y., Gao, X., et al. “Retrieval-Augmented Generation for Large Language Models: A Survey.” arXiv preprint arXiv:2312.10997, 2023–2024. https://arxiv.org/abs/2312.10997
Multi-agent systems and coordination
Wu, Q., Bansal, S., Zhang, J., et al. “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation.” arXiv preprint arXiv:2308.08155, 2023. https://arxiv.org/abs/2308.08155
Hong, S., Zheng, X., Chen, J., et al. “MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework.” arXiv preprint arXiv:2308.00352, 2023. https://arxiv.org/abs/2308.00352
Wooldridge, M. J. An Introduction to MultiAgent Systems (2nd ed.). Wiley, 2009. — classical foundations for coordination, negotiation, and distributed decision-making.
Evaluation, benchmarks, and holistic model assessment
Liang, P., Bommasani, R., Lee, T., et al. “Holistic Evaluation of Language Models (HELM).” arXiv preprint arXiv:2211.09110, 2022. https://arxiv.org/abs/2211.09110
Hendrycks, D., Burns, C., Basart, S., et al. “Measuring Massive Multitask Language Understanding.” ICLR, 2021. arXiv:2009.03300. https://arxiv.org/abs/2009.03300
Security, abuse, and trustworthy deployment
OWASP Foundation. OWASP Top 10 for Large Language Model Applications. https://owasp.org/www-project-top-10-for-large-language-model-applications/
Greshake, K., Abdelnabi, S., Mishra, S., et al. “Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection.” arXiv preprint arXiv:2302.12173, 2023. https://arxiv.org/abs/2302.12173
Mitchell, M., et al. “Model Cards for Model Reporting.” FAT*, 2019. arXiv:1810.03993. https://arxiv.org/abs/1810.03993
Amodei, D., Olah, C., Steinhardt, J., et al. “Concrete Problems in AI Safety.” arXiv preprint arXiv:1606.06565, 2016. https://arxiv.org/abs/1606.06565
Regulation, risk management, and management-system standards
European Parliament and Council. Regulation (EU) 2024/1689 (Artificial Intelligence Act). EUR-Lex. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32024R1689
NIST. Artificial Intelligence Risk Management Framework (AI RMF 1.0), NIST AI 100-1, 2023. https://doi.org/10.6028/NIST.AI.100-1
NIST. Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1, 2024. NIST.AI.600-1 (PDF)
ISO/IEC JTC 1/SC 42. ISO/IEC 42001:2023, Artificial intelligence: Management system. Available from the ISO store.
Observability, platform economics, and supply chain
OpenTelemetry Project. OpenTelemetry Specification: vendor-neutral traces, metrics, logs (foundation for LLM/agent tracing). https://opentelemetry.io/docs/specs/otel/
FinOps Foundation. FinOps Framework: unit economics and cloud financial accountability (applies to inference and GPU spend). https://www.finops.org/framework/
OpenSSF. SLSA: Supply-chain Levels for Software Artifacts, integrity controls for build and release pipelines. https://slsa.dev/