Enterprise AI · workshops & implementation

The Enterprise AI Playbook

How I walk leadership teams from first pilots to a durable setup: shared platform, trustworthy data, governance people actually use, and metrics tied to the P&L.

Author · Linh Truong, MA (Harvard), MBA
Web · LinhTruong.com
Email · Linh@Alumni.Harvard.edu
Updated · May 2026

Contents

  1. Executive Summary
  2. The 2026 AI Landscape
  3. The AI Value Stack
  4. Strategic Framework (5 Horizons)
  5. Operating Model & Org Design
  6. Use-Case Portfolio & Prioritization
  7. Reference Architecture
  8. Data, RAG & Knowledge Foundations
  9. Agentic AI & Workflow Automation
  10. Build vs Buy vs Partner
  11. Governance, Risk & Compliance
  12. Change Management & Workforce
  13. Economics, ROI & FinOps
  14. 12-Month Implementation Roadmap
  15. KPIs & the AI Scorecard
  16. Failure Modes & Anti-Patterns
  17. 90-Day Quick-Start Plan
  18. Glossary & Further Reading
  19. Closing — The 7 Commitments
  20. References

1. Executive Summary

AI is on most boards’ agendas; the gap is execution. The teams pulling away are not “trying the most”—they are serious about operating model, data quality, governance, and proof of value. This is the outline I use to teach that stack in one pass.

  • 3–5× ROI gap between leaders and laggards.
  • 70% of value comes from process redesign, not the model.
  • <15% of pilots reach durable production without an operating model.
  • Faster time-to-value with a reusable AI platform.
Insight 01

Treat AI as a portfolio

Balance quick-win productivity gains with re-engineered workflows and a few transformational bets. Don't pick only one horizon.

Insight 02

Platform > projects

A shared AI platform (models, data, evals, guardrails, observability) compounds. Project-by-project builds do not.

Insight 03

Governance is a moat

Trust accelerates adoption. The companies pulling ahead are over-investing in evals, red-teaming, and auditability.

Insight 04

People > tools

Skills, incentives, and process redesign drive 60–70% of measured value. Tooling is necessary but not sufficient.

Insight 05

Agents are the inflection

Single-turn copilots are 2024 thinking. 2026 winners are deploying supervised agentic workflows on real business processes.

Insight 06

Measure or stall

Without a scorecard tied to P&L, AI programs lose executive sponsorship in 12–18 months. Instrument from day one.

2. The 2026 AI Landscape

Models keep improving, costs keep falling, and regulation is real. My shorthand for strategy: differentiation is workflow, data, and trust—not the raw foundation model.

Figure 2.1 — Forces shaping enterprise AI
Six forces act on enterprise AI strategy:
  • Capability generalization: reasoning, multimodal, long-context models commoditize task quality.
  • Cost collapse: inference cost falling ~10× every 18 months, enabling agentic, always-on workloads.
  • Regulation hardening: EU AI Act, sectoral rules, audit duties; governance becomes a moat.
  • Agentic workflows: multi-step tool use replaces single-shot prompts.
  • Data & IP leverage: proprietary data and workflows are the durable advantage.
  • Workforce shift: roles re-bundle around oversight, judgment, and orchestration.

What changed in the last 18 months

  • Frontier reasoning models close most "complex task" gaps with human experts on narrow domains.
  • Context windows long enough to ingest entire policy manuals or codebases in one call.
  • Tool use, code execution, and computer use enable real workflow automation.
  • Evaluation tooling matured — eval-driven development is now a real discipline.
  • EU AI Act enforcement live; sector regulators (finance, health, public sector) publish binding guidance.

What this means for the C-suite

  • Stop benchmarking models monthly — assume capability parity within 6 months across frontier vendors.
  • Invest where capability does not replicate: your data, your processes, your customer trust.
  • Move from "AI projects" to "AI products" with real owners, SLAs, and roadmaps.
  • Build an AI compliance posture before regulators force one onto you.
  • Set a workforce strategy now — re-skilling lead times are 12–24 months.

3. The AI Value Stack

Most enterprises confuse layers and end up over-investing in models and under-investing in the layers that actually produce sustained value. Use this stack to audit where you are spending — and where you should.

Figure 3.1 — The 7-layer enterprise AI value stack
  7 · Business outcomes: revenue lift · cost-to-serve · cycle time · NPS · risk reduction.
  6 · Reimagined workflows & products: end-to-end process redesign · agentic flows · new products.
  5 · Applications & copilots: internal copilots · customer-facing assistants · embedded UX.
  4 · Orchestration & agent runtime: planning · tool use · memory · routing · guardrails · evals.
  3 · Foundation & specialized models: frontier LLMs · fine-tuned · vertical · embeddings · vision.
  2 · Data & knowledge foundations: lakehouse · feature store · vector store · lineage · contracts.
  1 · Infrastructure, security & identity: compute · networking · IAM · secrets · audit · key management.
  (Axis: strategy at the top ← value capture → foundations at the bottom.)

Spend the right time at the right layer. A typical maturity-1 enterprise over-invests in layer 5 (apps) while neglecting layers 2 and 4 — which is why pilots stall.

4. Strategic Framework — The 5 Horizons of AI Value

Successful programs don't pick "productivity" or "transformation" — they sequence across five horizons. Each horizon has a different sponsor, a different metric, and a different risk profile.

Figure 4.1 — The 5 Horizons of Enterprise AI Value
  H1 · Personal productivity (months 0–6): knowledge-worker copilots, writing, research, code. Goal: adoption + literacy.
  H2 · Function workflows (months 3–12): embed AI in marketing, sales, support, finance. Goal: function P&L impact.
  H3 · Cross-functional (months 9–24): re-engineer order-to-cash, hire-to-retire, etc. Goal: end-to-end cycle-time win.
  H4 · Product & customer (months 12–36): AI-native features in product and customer journeys. Goal: revenue + retention.
  H5 · Business model (months 24+): new revenue lines, platform plays, data monetization. Goal: durable advantage.
  (Risk and ambition rise from H1 to H5, and so does strategic value.)
Horizon | Primary metric | Owner | Typical investment mix | Risk profile
H1 | Active weekly users, hours saved per user | CIO / Chief of Staff | Tools, licenses, training | Low — buy
H2 | Function P&L (cost, revenue, cycle time) | Function head + AI lead | Integrations, prompts, light fine-tuning | Moderate — configure
H3 | End-to-end cycle time, error rate | COO / Process owner | Workflow redesign, agents, data plumbing | High — re-engineer
H4 | Activation, retention, ARPU | CPO / GM | Custom models, UX, data products | High — build
H5 | New revenue lines, platform GMV | CEO / Board | M&A, ventures, R&D | Very high — bet
Allocation rule of thumb: aim for roughly 50 / 30 / 15 / 5 across H1–H2 / H3 / H4 / H5 in Year 1, shifting to 30 / 35 / 25 / 10 by Year 3 as the platform matures.

5. Operating Model & Org Design

The #1 reason corporate AI programs fail is not technology — it is the lack of a clear operating model. Pick one of four archetypes and align it to your strategy.

Figure 5.1 — Four operating model archetypes
  • Centralized ("center owns and delivers"): a single CoE. Best for: early stage, scarce talent.
  • Hub-and-spoke ("platform + embedded squads"): central hub plus squads in each BU. Best for: most large enterprises.
  • Decentralized ("BUs own everything"): each BU runs its own program. Best for: holding companies, conglomerates.
  • AI product org ("AI as a product line"): platform, apps, and research units under an AI product lead. Best for: AI-native business models.

Roles every AI program needs

Chief AI Officer / Head of AI

Owns strategy, portfolio, and accountability to the board. Reports to CEO or COO, not CTO.

AI Platform Lead

Runs the shared platform (models, evals, vector store, guardrails, observability) as a product.

AI Product Managers

One per major use case. Own outcomes, not features.

Applied ML / AI Engineers

Build prompts, agents, RAG pipelines, fine-tunes. Hybrid SWE + ML skill set.

Data & Knowledge Engineers

Own pipelines, contracts, embeddings, and lineage for AI workloads.

AI Risk & Responsible AI Lead

Owns policy, evals for safety/bias, audit, and regulator interface.

Change & Adoption Lead

Owns enablement, training, and behavior change inside the business.

Domain Champions ("AI Translators")

Embedded in each function — turn business problems into AI specifications.

AI Security Engineer

Threat models prompts, agents, data exfil, and supply chain risk.

6. Use-Case Portfolio & Prioritization

I rank initiatives like a portfolio: value, feasibility, strategic fit. The matrix below is the slide I reach for most often in those conversations.

Figure 6.1 — Value × Feasibility prioritization matrix
(Axes: business value × feasibility, i.e. data, tech, and change.)
  • Flagship wins (high value, doable: go now): customer service agent (tier-1 deflection) · sales rep copilot & meeting summarization · marketing content + asset generation.
  • Strategic bets (high value, hard: sponsor & sequence): underwriting / risk decisioning agents · drug discovery / R&D co-scientist · autonomous supply planning agent.
  • Quick wins (low value, easy: for adoption / learning): meeting notes & transcript summarization · internal HR & IT helpdesk Q&A bot · code & doc search.
  • Deprioritize: "AI for everything" novelty pilots.

Reference use cases by function

Function | High-ROI use cases | Typical value lever
Customer Service | Agent assist, tier-1 deflection, QA & coaching, voice agents | −25–45% cost-to-serve, +CSAT
Sales | Rep copilot, account research, meeting prep, RFP automation | +10–20% rep productivity, +win rate
Marketing | Content generation, segmentation, personalization, SEO/SEM | −40% content cost, +CTR/CVR
Product & Engineering | Coding agents, code review, test gen, doc gen, incident triage | +15–35% dev throughput
Finance | Invoice processing, FP&A copilot, controls testing, narrative gen | −30–50% manual effort
HR / People | Recruiting screen, onboarding agent, policy Q&A, performance prep | −20–40% admin load
Legal / Compliance | Contract review, clause extraction, regulatory mapping | −40–60% review time
Operations / SCM | Demand forecast, route optimization, planning agents | −5–15% working capital
Risk / Audit | AML/KYC triage, fraud signals, control narratives | +detection, −false positives
IT & Security | L1 helpdesk, log triage, SOC copilot, vulnerability summarization | −ticket volume, faster MTTR

Scoring template (use in every steering committee)

Dimension | Weight | Score 1–5 | Notes
Annualized value at stake ($) | 25% | | Net of run cost
Strategic fit (customer / moat) | 15% | | Tie to corporate strategy
Data readiness | 15% | | Available, clean, accessible
Technical feasibility | 15% | | Model / system maturity
Adoption / change difficulty | 15% | | Process & behavior change
Risk profile (reg / safety / brand) | 15% | | Inverse-scored
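The weighted score is mechanical, so it helps to agree on the arithmetic up front. A minimal sketch, assuming the weights from the template above; the dimension names and the sample candidate are illustrative:

```python
# Weighted 1-5 prioritization score, mirroring the steering-committee
# template above. Dimension keys are illustrative names, not a standard.
WEIGHTS = {
    "value_at_stake": 0.25,
    "strategic_fit": 0.15,
    "data_readiness": 0.15,
    "technical_feasibility": 0.15,
    "adoption_difficulty": 0.15,  # score 5 = easy to adopt
    "risk_profile": 0.15,         # inverse-scored: 5 = lowest risk
}

def priority_score(scores: dict) -> float:
    """Weighted 1-5 score; higher means fund sooner."""
    assert set(scores) == set(WEIGHTS), "score every dimension"
    assert all(1 <= s <= 5 for s in scores.values())
    return round(sum(WEIGHTS[k] * s for k, s in scores.items()), 2)

# Illustrative candidate: strong value and fit, middling readiness.
candidate = {
    "value_at_stake": 5, "strategic_fit": 4, "data_readiness": 3,
    "technical_feasibility": 3, "adoption_difficulty": 3, "risk_profile": 4,
}
score = priority_score(candidate)
```

Ranking candidates by this single number keeps steering-committee debates about weights, not about whose slide looks better.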

7. Reference Architecture

Every enterprise AI estate ultimately converges on the same six layers. Build them as a platform, not as scattered project assets.

Figure 7.1 — Reference enterprise AI architecture
  • Experience layer: web · mobile · IDE · CRM/ERP embed · Slack/Teams · voice · API · email.
  • Orchestration & agent runtime: planner/router (model + task routing) · tool use / MCP connectors to systems · memory (short / long / episodic) · guardrails (PII, policy, jailbreak) · evals (offline + online) · observability (traces, cost, latency).
  • Model layer: frontier LLM(s) · small / open · fine-tuned / domain · embeddings · vision / speech · predictive ML.
  • Knowledge & retrieval: vector index · keyword / BM25 · knowledge graph · re-rankers · caches · doc/chunk store.
  • Data & knowledge foundations: lakehouse · feature store · streaming / CDC · data contracts · lineage / catalog · master data.
  • Infrastructure, security & identity: compute (GPU/CPU) · networking · IAM & SSO · secrets · KMS · DLP · SIEM · audit log · region / data residency.

Architecture principles

1 · Model-agnostic by default

Abstract behind a gateway. Assume you will rotate frontier models every 6–12 months.

2 · Retrieval before fine-tuning

RAG solves 80% of knowledge needs. Fine-tune only when style, latency, or cost demands it.

3 · Evals are part of CI/CD

No prompt or agent ships without an offline eval suite and online quality monitors.

4 · Human-in-the-loop is a feature

Design review, approve, and override paths into every high-stakes flow.

5 · Treat the platform as a product

SLAs, roadmap, on-call, internal "customers." Otherwise it rots.

6 · Cost & latency are first-class

Token budgets, caching, and model routing baked in from day one.
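Principle 3 (evals in CI/CD) can be made concrete with a tiny deployment gate. A hedged sketch, not any specific tool's API: the substring grader and the 2% regression tolerance are assumptions, and real suites use rubrics or model-graded judges:

```python
# Illustrative eval gate for CI: block the deploy when the offline pass
# rate regresses against the recorded baseline. Grader and tolerance are
# placeholder assumptions, not a specific eval framework's behavior.
from dataclasses import dataclass

@dataclass
class EvalCase:
    query: str
    expected_substring: str  # crude grader; real suites use rubrics/judges

def run_eval(model_fn, cases: list) -> float:
    passed = sum(c.expected_substring in model_fn(c.query) for c in cases)
    return passed / len(cases)

def ci_gate(pass_rate: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Ship only if quality is within tolerance of the last release."""
    return pass_rate >= baseline - tolerance

# Illustrative suite with a stub "model".
cases = [EvalCase("refund policy?", "30 days"),
         EvalCase("support hours?", "24/7")]
stub_model = lambda q: "Refunds within 30 days." if "refund" in q else "Open 9-5."
rate = run_eval(stub_model, cases)
```

The point of the sketch is the wiring, not the grader: the gate runs on every prompt or model change, exactly like a unit-test suite.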

8. Data, RAG & Knowledge Foundations

Your AI is only as good as the data and context it can reach. The single biggest determinant of pilot success is whether the right knowledge is retrievable at the right moment.

Figure 8.1 — Modern RAG pipeline (production-grade)
  • Ingest: sources (SharePoint · CRM · wiki · tickets · PDFs · DBs); processing (parse · OCR · clean · chunk · enrich · ACL) → versioned blobs.
  • Index: dense embeddings · BM25 / keyword · knowledge graph · metadata + ACLs; multi-tenant, multi-language, versioned, replayable.
  • Retrieve: query rewrite · hybrid search · re-rank (cross-encoder) · diversify + dedupe; ACL-filtered to the caller.
  • Generate: system prompt + policy · cited context · tool use if needed · output schema; citations and confidence.
  • Evaluate & learn: faithfulness / groundedness · answer relevance · citation accuracy · user feedback signals → fed back to index and prompts.

Data-readiness checklist

Foundational

  • Authoritative source-of-truth per domain identified.
  • Data catalog with ownership, sensitivity, lineage.
  • PII / PHI classification and redaction policies.
  • Access control aligned to identity (no shared service accounts).
  • Document conversion (PDF / scan / image) pipeline.

AI-specific

  • Chunking strategy versioned per content type.
  • Embedding model and dimension governed centrally.
  • Re-ranker in place for high-stakes flows.
  • Evaluation set per use case (≥ 200 representative queries).
  • Feedback capture (thumbs / edits) wired to retraining loop.
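The ACL items above are easiest to see in code: every chunk carries its source document's access groups, and the filter runs before ranking, so out-of-scope content is never even a candidate. A toy sketch under stated assumptions: the corpus, group names, and keyword-overlap scoring are illustrative stand-ins for real hybrid dense + BM25 search:

```python
# Toy ACL-filtered retrieval: filter by the caller's groups first, then
# rank. Keyword overlap stands in for real hybrid search; all names are
# illustrative.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str
    acl: set  # groups allowed to see the source document

def retrieve(query: str, chunks: list, caller_groups: set, k: int = 3):
    visible = [c for c in chunks if c.acl & caller_groups]  # ACL filter first
    scored = sorted(
        visible,
        key=lambda c: len(set(query.lower().split()) & set(c.text.lower().split())),
        reverse=True,
    )
    return scored[:k]

corpus = [
    Chunk("Expense claims are reimbursed within 14 days.",
          "finance-wiki", {"finance", "all-staff"}),
    Chunk("M&A pipeline: Project Falcon, Project Heron.",
          "corp-dev", {"exec"}),
]
hits = retrieve("when are expense claims reimbursed", corpus, {"all-staff"})
```

For the all-staff caller, the exec-only chunk is excluded before scoring, which is the property auditors will ask you to demonstrate.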

9. Agentic AI & Workflow Automation

2026's most impactful AI deployments are not chatbots — they are supervised agents doing multi-step work inside real business processes. Treat them as junior employees: scope, train, supervise, audit.

Figure 9.1 — Agent autonomy ladder
  L1 · Assistive: suggests; human acts.
  L2 · Augmenting: drafts; human approves.
  L3 · Supervised agent: executes; human reviews.
  L4 · Conditional auto: autonomous within guardrails.
  L5 · Autonomous: self-directed (rare today).
  (L1 is lower-risk and faster to deploy; each step up demands stricter governance.)
Default to L2–L3 in 2026. The fastest, safest enterprise value is in supervised agents on bounded workflows. Reserve L4 for low-risk, reversible actions with strong evals. L5 is rarely appropriate for production today.

Anatomy of a production agent

Figure 9.2 — Anatomy of a supervised agent
  • Goal & policy: scope · do/don't · tone.
  • Planner: decompose · sequence.
  • Tools (MCP): CRM · ERP · search · code.
  • Memory: short · long · user.
  • Guardrails: PII · jailbreak · policy.
  • Critic / self-check: verify before acting.
  • Human checkpoint: approval before any write.
  • Audit log: prompt · tool · output.
  • Outcome metrics: cost · latency · success rate · escalation rate · user satisfaction.
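The human-checkpoint element of Figure 9.2 fits in a few lines. A minimal sketch, not a specific framework's API: the tool names, the approve() callback, and the audit-log shape are all assumptions:

```python
# Minimal human-checkpoint pattern: reads run freely, writes require
# approval, and every step is audit-logged. Names are illustrative.
READ_TOOLS = {"search_crm", "fetch_invoice"}
WRITE_TOOLS = {"send_email", "update_crm"}

def run_step(tool: str, args: dict, execute, approve, audit_log: list):
    if tool not in READ_TOOLS | WRITE_TOOLS:
        raise ValueError(f"tool {tool!r} is outside the agent's scope")
    if tool in WRITE_TOOLS and not approve(tool, args):  # human checkpoint
        audit_log.append({"tool": tool, "status": "rejected"})
        return "escalated"
    result = execute(tool, args)
    audit_log.append({"tool": tool, "status": "done"})
    return result

log = []
# A write without approval is escalated, never silently executed.
outcome = run_step("send_email", {"to": "cust@example.com"},
                   execute=lambda t, a: "sent",
                   approve=lambda t, a: False,
                   audit_log=log)
```

The design choice worth copying is that the checkpoint sits in the runtime, not in the prompt: the model cannot talk its way past it.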

Where to deploy agents first

10. Build vs Buy vs Partner

A simple rule: buy commodity, build differentiated, partner where the moat is someone else's. Re-evaluate every 12 months — the market moves fast.

Layer | Default | Build when… | Buy when…
Foundation models | Buy | You have a regulated, narrow, latency-sensitive use case | Default — frontier vendors will out-invest you
Inference / hosting | Buy | You have data residency / sovereignty needs | Default
Vector store / RAG infra | Buy → build adapters | You operate at >10⁸ vectors with strict SLAs | Default
Agent framework | Buy thin, build domain logic | Frameworks limit your control over evals / guardrails | Standard agent patterns
Evaluation & observability | Buy + customize | Use cases need bespoke metrics | Default
Domain copilots | Build | Workflow is your differentiator | For commodity functions (e.g., IT helpdesk)
Customer-facing AI features | Build | It touches your product moat | Never — outsourcing your moat is fatal
Governance / policy tooling | Buy + integrate | You operate in heavily regulated sectors | Default

11. Governance, Risk & Compliance

Governance is not a brake — it's an accelerator. Teams with clear policy and tooling ship faster because they know what's allowed. Without it, every project re-litigates risk.

Figure 11.1 — AI risk taxonomy & controls
  • Model risks: hallucination · bias / discrimination · drift · capability misuse · prompt injection · jailbreaks. Controls: evals · red-team · guardrails · grounding · escalation.
  • Data risks: sensitive data leakage · copyright / IP · consent / lawful basis · cross-border transfer · poisoning · retention violations. Controls: classification · DLP · ACLs · redaction · residency.
  • Operational risks: cost / token blowups · latency / availability · vendor lock-in · supply-chain compromise · tool misuse by agents · silent regression. Controls: budgets · SLOs · multi-vendor · canary · regression evals.
  • Compliance risks: EU AI Act class · sectoral rules (finance / health) · transparency duty · human oversight duty · recordkeeping · notification & appeal. Controls: risk class · model cards · audit log · DPIA.
  • Reputational / ethical risks: brand harm · customer harm · workforce trust · environmental cost. Controls: policy · ethics board · disclosures · footprint.

The Three Lines of Defense for AI

1st Line

Build teams

Product, engineering, and data teams own day-to-day risk: evals, guardrails, monitoring, and HITL design.

2nd Line

Risk & Responsible AI

Centralized policy, model risk management, red-teaming, classification, and approvals. Owns the AI register.

3rd Line

Internal Audit

Independent assurance. Tests that controls operate as designed. Reports to audit committee.

The minimum AI policy stack

Pattern that works: a single "AI Review Board" meeting weekly, with a 1-page risk tier intake. Low-risk auto-approved; medium reviewed asynchronously; high goes to the board live. This keeps velocity while protecting the brand.
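The 1-page intake can literally be a single function. A sketch with illustrative tiering questions; derive the real criteria from your own policy and regulatory class:

```python
# Sketch of a risk-tier intake: three yes/no questions map each proposal
# to a review path. The questions are illustrative assumptions, not a
# regulatory standard.
def risk_tier(customer_facing: bool, handles_sensitive_data: bool,
              autonomous_writes: bool) -> str:
    if autonomous_writes or (customer_facing and handles_sensitive_data):
        return "high"    # reviewed live at the AI Review Board
    if customer_facing or handles_sensitive_data:
        return "medium"  # reviewed asynchronously
    return "low"         # auto-approved and logged in the AI register
```

Encoding the tiers this way makes the policy executable: intake forms, CI checks, and the AI register can all call the same function.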

12. Change Management & Workforce

The single biggest predictor of AI ROI is whether employees actually change how they work. Software gets deployed in weeks; behavior change takes quarters. Invest accordingly.

Figure 12.1 — The adoption flywheel
The flywheel turns through six stages, each feeding the next, with value created at the hub:
  1. Awareness & trust: demos · stories · policy.
  2. Access & tooling: licenses · SSO · catalog.
  3. Skill building: labs · playbooks · prompts.
  4. Workflow redesign: SOPs · roles · handoffs.
  5. Incentives & OKRs: leaders model the change.
  6. Measurement: usage · quality · P&L, which restarts the cycle.

The new skill ladder

Tier | Audience | What they learn | Target
AI Literate | All employees | Safe use, prompting basics, what AI can and can't do | 100% of workforce
AI Fluent | Knowledge workers | Workflow integration, copilots, custom GPTs, RAG basics | 60–80%
AI Power User | Function champions | Agent design, evals, automation, function-specific playbooks | 5–10%
AI Builder | Engineers, data folks | Prompting, agents, RAG, fine-tuning, guardrails, evals at scale | 1–3%
AI Leader | Execs & managers | Strategy, governance, change leadership, ROI capture | All people-leaders
Workforce stance: communicate explicitly that AI is augmenting jobs, redesigning some, and creating new ones. Silence breeds fear and quiet sabotage. Most successful programs publish a clear "AI & You" charter to employees within the first 90 days.

13. Economics, ROI & FinOps

AI projects fail financially in three ways: (1) value never measured, (2) cost-to-serve underestimated, (3) hidden labor cost of oversight. A simple unit economic model up front prevents all three.

The AI unit economics formula

Net Value per Task = (Value created) − (Inference cost) − (Retrieval & data cost) − (Human-in-the-loop cost) − (Allocated platform & governance cost) − (Cost of errors × error rate)
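The formula translates directly into code, which makes it easy to sensitivity-test in a spreadsheet review. A sketch; the sample figures are purely illustrative:

```python
# The unit-economics formula above as a function. All arguments are
# per-task dollar amounts except error_rate (a 0-1 probability).
def net_value_per_task(value, inference, retrieval, hitl,
                       platform_alloc, cost_of_error, error_rate):
    return (value - inference - retrieval - hitl - platform_alloc
            - cost_of_error * error_rate)

# Illustrative numbers: a $4.00 task with 2% errors at $10.00 per error.
nv = net_value_per_task(value=4.00, inference=0.12, retrieval=0.05,
                        hitl=0.60, platform_alloc=0.25,
                        cost_of_error=10.00, error_rate=0.02)
```

Note how, in this illustration, the human-in-the-loop cost dwarfs inference, which is the usual surprise when teams first run the numbers.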
Figure 13.1 — Total cost of an AI feature
Where the money actually goes (typical first-year program): data & integration 24% · people & change 22% · inference 18% · platform 16% · governance 10% · reserve 10%. Lesson: inference alone is <20% of total cost; optimize the whole stack, not just tokens.
Where the value actually comes from: labor productivity / cost-to-serve 38% · revenue lift 22% · cycle time 18% · risk avoided 14% · other 8%. Lesson: most value is operational. "Revenue from AI features" matters, but is rarely #1 in year one.

FinOps for AI — the levers

Lever | Typical savings | How
Model routing (small vs. frontier) | 30–60% | Route simple tasks to small models; reserve frontier for hard cases.
Prompt & context caching | 30–90% | Cache system prompts and retrieved chunks across calls.
Compression & summarization | 20–40% | Pre-summarize long contexts; drop irrelevant chunks.
Batching & off-peak | 10–30% | Use batch APIs for non-real-time workloads.
Output schema & max tokens | 10–20% | Constrain output; stop generation early.
Eval-driven prompt slimming | 10–25% | Shorten prompts where evals show no quality loss.
Distillation / fine-tuning small models | 40–80% (on hot paths) | For high-volume, narrow tasks once specs stabilize.
Watch out: agentic workflows can use 10–50× the tokens of a single chat call. Budget at the workflow level, not the call level, or you will be surprised.
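Two of the levers above, model routing and workflow-level token budgets, are simple enough to sketch together. All names, prices, and the 4,000-token routing threshold are placeholder assumptions, not real vendor pricing:

```python
# Sketch of two FinOps levers: route cheap tasks to a small model, and
# enforce token budgets per workflow (not per call). Prices, names, and
# thresholds are illustrative placeholders.
PRICE_PER_1K_TOKENS = {"small": 0.0002, "frontier": 0.01}

def route(task_tokens: int, needs_reasoning: bool) -> str:
    """Send only hard or long tasks to the frontier model."""
    return "frontier" if needs_reasoning or task_tokens > 4000 else "small"

def call_cost(model: str, tokens: int) -> float:
    return PRICE_PER_1K_TOKENS[model] * tokens / 1000

class WorkflowBudget:
    """Agentic flows make many calls; cap spend where it accumulates."""
    def __init__(self, max_tokens: int):
        self.max_tokens, self.used = max_tokens, 0

    def charge(self, tokens: int) -> bool:
        if self.used + tokens > self.max_tokens:
            return False  # halt the workflow instead of overspending
        self.used += tokens
        return True
```

Putting the budget on the workflow object rather than on individual calls is what prevents the 10–50× agentic blow-up described above.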

14. 12-Month Implementation Roadmap

A pragmatic, quarter-by-quarter sequence that has survived contact with reality across multiple industries. Adjust pace, not order.

Figure 14.1 — The 12-month enterprise AI rollout
Track | Q1 | Q2 | Q3 | Q4
Strategy | Vision · ambition · portfolio | First ROI baseline | Portfolio refresh · next bets | Board-level scorecard
Platform | Pick stack · gateway · evals | RAG · observability · cost FinOps | Agent runtime · MCP · memory | Multi-region · DR · audit-ready
Use cases | 2–3 flagship pilots picked | Pilots to limited production | First agentic workflow live | Scale flagship to enterprise
Governance | AUP · risk tiers · register | AI Review Board cadence | Red-team program live | External assurance / cert
People | AI literacy rollout starts | Function champions trained | Role redesign in 2 functions | AI fluency >50% of workforce

Quarter-by-quarter exit criteria

Quarter | Exit criteria (must be true)
Q1 | Board-approved ambition; named CAIO; AI policy v1; platform v0; 2–3 flagship pilots scoped with owners and metrics.
Q2 | Pilots live with real users; eval suites; cost & usage dashboards; AI register populated; ≥30% of target users trained.
Q3 | At least one production agentic workflow; verified ROI on 2+ use cases; red-team report; first role redesign documented.
Q4 | Audit-ready posture; scorecard reported to board; ≥3 use cases compounding; talent funnel and skill metrics on track.

15. KPIs & the AI Scorecard

If you don't measure all four layers, executives will lose confidence and funding will quietly retreat. Use this scorecard from month one — even if numbers are rough.

Figure 15.1 — The 4-layer AI scorecard
  L4 · Business outcomes: revenue · margin · NPS · cycle time · risk-adjusted loss.
  L3 · Adoption & workflow: WAU/MAU · % of work touching AI · process cycle time · CSAT.
  L2 · Quality & safety: eval scores · groundedness · escalations · incidents · drift.
  L1 · Platform & cost: latency · uptime · cost per task · cache hit rate · model mix.

Recommended metrics by layer

Layer | Metric | Why it matters
L1 Platform | Cost per successful task | The only honest unit-economic number
L1 Platform | P95 latency, uptime | Adoption collapses below SLA
L1 Platform | Cache hit rate, % small-model routed | Direct FinOps levers
L2 Quality | Groundedness, citation accuracy | Predicts hallucination rate
L2 Quality | Eval pass rate vs. baseline | Detects silent regression on model swap
L2 Safety | Policy violations, jailbreak attempts blocked | Governance signal for the board
L3 Adoption | Weekly active users / target population | Most predictive leading indicator of ROI
L3 Workflow | Tasks completed end-to-end with AI | Real penetration into work
L4 Outcome | Function P&L delta attributable to AI | The number the CFO will ask for
L4 Outcome | Risk-adjusted incident rate | The number the CRO will ask for

16. Failure Modes & Anti-Patterns

The same failure patterns repeat across industries. Learning to spot them is more valuable than any single best practice.

Anti-pattern 1

"100 pilots, 0 production"

No prioritization, no platform, every team rebuilds plumbing. Symptom: each pilot is a snowflake.

Fix: stage-gate funding, mandatory reuse of platform components, 80/20 portfolio rule.

Anti-pattern 2

Tool-first thinking

Buying a platform before identifying value. Six months later, nothing has changed in any workflow.

Fix: tie every license to a named use case with a metric and an executive sponsor.

Anti-pattern 3

Demo-driven roadmap

Leadership chases the latest viral demo; teams whiplash between priorities.

Fix: portfolio governance with a quarterly horizon, not monthly chasing.

Anti-pattern 4

"Pilot → cliff"

Successful pilot has no path to enterprise scale because security, data, and ops were never in scope.

Fix: include security and ops on day one; define production exit criteria up front.

Anti-pattern 5

Shadow AI

Employees paste sensitive data into personal accounts because no sanctioned path exists.

Fix: sanction a fast, free, safe internal option before banning anything.

Anti-pattern 6

"Set and forget" evals

Eval suite written once at launch, never updated. Production quality silently drifts.

Fix: evals are a living asset; grow them from every incident and user feedback signal.

Anti-pattern 7

Hero-engineer dependency

One brilliant individual holds all knowledge; bus factor of 1. Common in early agent builds.

Fix: codify prompts, evals, and runbooks; pair, document, and rotate.

Anti-pattern 8

Compliance theater

Policy documents exist but nothing in the build pipeline enforces them.

Fix: make policy executable — automated checks in CI, blocking gates before deploy.

17. 90-Day Quick-Start Plan

If you read nothing else, execute this 90-day plan. It establishes irreversible momentum.

Figure 17.1 — 90-day quick-start (3 phases of 30 days)
Days 1–30 · Frame
  • Strategy: name an executive sponsor and AI lead; articulate a 3-year AI ambition (1 page); run a portfolio workshop → 10 candidate use cases.
  • Foundations: approve sanctioned AI tools for all staff; draft the acceptable-use policy and risk tiers.
  • People: launch the AI literacy program (top 3 functions); name function "AI champions".
Days 31–60 · Build
  • Platform: stand up gateway, logging, evals v0; first RAG corpus indexed and ACL-checked; cost and usage dashboards live.
  • Use cases: 2 flagship pilots in real users' hands; eval suite per pilot; baseline metrics set.
  • Governance: AI Review Board running weekly; AI register v1 populated.
Days 61–90 · Prove
  • Value: measured ROI on at least 1 pilot; roadmap v1 with named owners and metrics; board-level scorecard published.
  • Scale prep: security/IT sign-off for production exit; red-team and HITL plan for high-risk flows.
  • People: ≥30% of the target population trained; internal storytelling campaign live.

18. Glossary & Further Reading

Key terms

Term | Plain-English definition
Foundation model | A large, general-purpose AI model (text, image, multimodal) that other applications build on top of.
RAG | Retrieval-Augmented Generation — pulling relevant company data into the model's context at query time instead of retraining the model.
Agent | An AI system that can plan, choose tools, and take multi-step actions toward a goal — not just answer a single question.
MCP | Model Context Protocol — an emerging standard for connecting AI models to enterprise tools, data, and actions.
Eval | A repeatable, scored test of model quality on a specific task — the unit test of AI.
Guardrails | Programmatic checks that block, redact, or escalate unsafe inputs and outputs.
Groundedness | The degree to which a model's answer is supported by the source material it cited.
Prompt injection | An attack where hostile content in retrieved documents or tool output hijacks the model's instructions.
Fine-tuning | Continuing to train a foundation model on your data so it learns your domain, style, or task.
Distillation | Training a smaller, cheaper model to imitate a larger one on a specific task.
HITL | Human-in-the-loop — explicit review or approval step in an AI workflow.
Model card | A short standard document describing a model's intended use, limits, and evaluation results.
FinOps for AI | The discipline of measuring and optimizing cost per task across the AI stack.
Risk tier | A classification (e.g., low / medium / high / prohibited) that determines the controls an AI system must meet.

Recommended further reading

Short list only. For numbered citations and stable URLs, see Section 20 · References.

19. Closing — The 7 Commitments of an AI-Ready Company

1 · We have a portfolio, not pilots.

Every initiative ties to value, an owner, and a metric — and we kill what doesn't earn its keep.

2 · We treat the platform as a product.

One shared stack: models, data, evals, guardrails, observability — funded and on-call.

3 · We measure quality and cost like we measure money.

Evals in CI, FinOps for inference, scorecards to the board.

4 · We design for humans & oversight.

Every high-stakes flow has review, override, and audit. Trust is a moat.

5 · We invest more in change than in code.

Behavior change, skill ladders, role redesign — these capture the value.

6 · We are model-agnostic, data-loyal.

Models rotate every year; our data and workflows are the durable asset.

7 · We govern in the open, not in PDFs.

Policy is executable; risk tiers are enforced; the AI register is live.

The bottom line

Durable AI value usually goes to teams that are operationally serious—evals, data discipline, change management, and live governance—not to the group with the longest model list. This outline is meant to support that kind of work.

20. References

Frameworks, diagrams, and sequencing in this playbook are teaching artifacts. Where the document discusses law, standards, methods, or economics, the items below are authoritative or widely used primary sources. Illustrative percentages (KPI strip, TCO bars, savings bands) synthesize published industry research and operating experience—verify against your own data before external reporting.

Law, policy, and standards

  1. European Parliament and Council. Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). Consolidated text via EUR-Lex.
    https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32024R1689
  2. European Parliament and Council. Regulation (EU) 2016/679 on the protection of natural persons with regard to the processing of personal data (General Data Protection Regulation).
    https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679
  3. National Institute of Standards and Technology (NIST). Artificial Intelligence Risk Management Framework (AI RMF 1.0), NIST AI 100-1. Gaithersburg, MD, 2023.
    https://doi.org/10.6028/NIST.AI.100-1 · NIST AI RMF program page
  4. NIST. Artificial Intelligence Risk Management Framework: Generative Artificial Intelligence Profile, NIST AI 600-1. July 2024.
    NIST.AI.600-1 (PDF)
  5. ISO/IEC JTC 1/SC 42. ISO/IEC 42001:2023 — Information technology — Artificial intelligence — Management system.
  6. OECD. OECD AI Principles (OECD Legal Instrument 0449). Adopted 2019; recommendations on trustworthy AI.
    https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449
  7. G7. Hiroshima Process International Code of Conduct for Organizations Developing Advanced AI Systems (multilateral voluntary commitments on frontier AI governance). UK Government host copy (October 2023).
    https://www.gov.uk/government/publications/hiroshima-process-international-code-of-conduct-for-organisations-developing-advanced-ai-systems

Security, risk, and trustworthy AI practice

  1. OWASP Foundation. OWASP Top 10 for Large Language Model Applications (LLM risks including prompt injection and insecure output handling).
    https://owasp.org/www-project-top-10-for-large-language-model-applications/
  2. Mitchell, M., et al. “Model Cards for Model Reporting.” Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT*), 2019. arXiv:1810.03993.
    https://arxiv.org/abs/1810.03993
  3. The Institute of Internal Auditors. The IIA’s Three Lines Model — governance structure mapping to 1st, 2nd, and 3rd lines (referenced for AI risk committees and internal audit).
    https://www.theiia.org/en/topics/internal-audit/tri/

Architecture, methods, and technical foundations

  1. Lewis, P., et al. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” Advances in Neural Information Processing Systems (NeurIPS), 2020. arXiv:2005.11401.
    https://arxiv.org/abs/2005.11401
  2. Anthropic (contributors). Model Context Protocol (MCP) — open specification for connecting models to tools and data.
    https://modelcontextprotocol.io/ · Specification (GitHub)

Economics, operations, and sustainability

  1. McKinsey Global Institute. “The economic potential of generative AI: The next productivity frontier.” McKinsey & Company, June 2023 (industry estimates on automation, labor mix, and value at stake—compare to internal KPIs).
    McKinsey.com — economic potential of generative AI
  2. FinOps Foundation. FinOps Framework — cost accountability and unit economics in cloud (extended here to inference and AI workloads).
    https://www.finops.org/framework/
  3. Green Software Foundation. Software Carbon Intensity (SCI) Specification — methodology for carbon accounting at software level.
    https://sci.greensoftware.foundation/

Frontier model safety and provider research (for behavior, evals, and guardrails)

  1. Anthropic. Research publications and model/system cards (e.g., Claude family documentation and safety research).
    https://www.anthropic.com/research
  2. OpenAI. System cards, preparedness framework materials, and safety publications for GPT model families.
    https://openai.com/research/ · Safety overview
  3. Google DeepMind. Safety & alignment and model documentation hub.
    https://deepmind.google/research/