How I walk leadership teams from first pilots to a durable setup: shared platform, trustworthy data, governance people actually use, and metrics tied to the P&L.
AI is on most boards’ agendas; the gap is execution. The teams pulling away are not “trying the most”—they are serious about operating model, data quality, governance, and proof of value. This is the outline I use to teach that stack in one pass.
3–5× ROI gap between AI leaders and laggards
70% of value comes from process redesign, not the model
<15% of pilots reach durable production without an operating model
5× faster time-to-value with a reusable AI platform
Insight 01
Treat AI as a portfolio
Balance quick-win productivity gains with re-engineered workflows and a few transformational bets. Don't pick only one horizon.
Insight 02
Platform > projects
A shared AI platform (models, data, evals, guardrails, observability) compounds. Project-by-project builds do not.
Insight 03
Governance is a moat
Trust accelerates adoption. The companies pulling ahead are over-investing in evals, red-teaming, and auditability.
Insight 04
People > tools
Skills, incentives, and process redesign drive 60–70% of measured value. Tooling is necessary but not sufficient.
Insight 05
Agents are the inflection
Single-turn copilots are 2024 thinking. 2026 winners are deploying supervised agentic workflows on real business processes.
Insight 06
Measure or stall
Without a scorecard tied to P&L, AI programs lose executive sponsorship in 12–18 months. Instrument from day one.
2 · The 2026 AI Landscape
Models keep improving, costs keep falling, and regulation is real. My shorthand for strategy: differentiation is workflow, data, and trust—not the raw foundation model.
Figure 2.1 — Forces shaping enterprise AI
What changed in the last 18 months
Frontier reasoning models close most "complex task" gaps with human experts on narrow domains.
Context windows long enough to ingest entire policy manuals or codebases in one call.
Tool use, code execution, and computer use enable real workflow automation.
Evaluation tooling matured — eval-driven development is now a real discipline.
EU AI Act enforcement live; sector regulators (finance, health, public sector) publish binding guidance.
What this means for the C-suite
Stop benchmarking models monthly — assume capability parity within 6 months across frontier vendors.
Invest where capability does not replicate: your data, your processes, your customer trust.
Move from "AI projects" to "AI products" with real owners, SLAs, and roadmaps.
Build an AI compliance posture before regulators force one onto you.
Set a workforce strategy now — re-skilling lead times are 12–24 months.
3 · The AI Value Stack
Most enterprises confuse layers and end up over-investing in models and under-investing in the layers that actually produce sustained value. Use this stack to audit where you are spending — and where you should.
Figure 3.1 — The 7-layer enterprise AI value stack
Spend the right time at the right layer. A typical maturity-1 enterprise over-invests in layer 5 (apps) while neglecting layers 2 and 4 — which is why pilots stall.
4 · Strategic Framework — The 5 Horizons of AI Value
Successful programs don't pick "productivity" or "transformation" — they sequence across five horizons. Each horizon has a different sponsor, a different metric, and a different risk profile.
Figure 4.1 — The 5 Horizons of Enterprise AI Value
Horizon | Primary metric | Owner | Typical investment mix | Risk profile
H1 | Active weekly users, hours saved per user | CIO / Chief of Staff | Tools, licenses, training | Low — buy
H2 | Function P&L (cost, revenue, cycle time) | Function head + AI lead | Integrations, prompts, light fine-tuning | Moderate — configure
H3 | End-to-end cycle time, error rate | COO / Process owner | Workflow redesign, agents, data plumbing | High — re-engineer
H4 | Activation, retention, ARPU | CPO / GM | Custom models, UX, data products | High — build
H5 | New revenue lines, platform GMV | CEO / Board | M&A, ventures, R&D | Very high — bet
Allocation rule of thumb: aim for roughly 50 / 30 / 15 / 5 across H1–H2 / H3 / H4 / H5 in Year 1, shifting to 30 / 35 / 25 / 10 by Year 3 as the platform matures.
5 · Operating Model & Org Design
The #1 reason corporate AI programs fail is not technology — it is the lack of a clear operating model. Pick one of four archetypes and align it to your strategy.
Figure 5.1 — Four operating model archetypes
Roles every AI program needs
Chief AI Officer / Head of AI
Owns strategy, portfolio, and accountability to the board. Reports to CEO or COO, not CTO.
AI Platform Lead
Runs the shared platform (models, evals, vector store, guardrails, observability) as a product.
AI Product Managers
One per major use case. Own outcomes, not features.
Applied ML / AI Engineers
Build prompts, agents, RAG pipelines, fine-tunes. Hybrid SWE + ML skill set.
Data & Knowledge Engineers
Own pipelines, contracts, embeddings, and lineage for AI workloads.
AI Risk & Responsible AI Lead
Owns policy, evals for safety/bias, audit, and regulator interface.
Change & Adoption Lead
Owns enablement, training, and behavior change inside the business.
Domain Champions ("AI Translators")
Embedded in each function — turn business problems into AI specifications.
AI Security Engineer
Threat models prompts, agents, data exfil, and supply chain risk.
6 · Use-Case Portfolio & Prioritization
I rank initiatives like a portfolio: value, feasibility, strategic fit. The matrix below is the slide I reach for most often in those conversations.
Figure 6.1 — Value × Feasibility prioritization matrix
Scoring template (use in every steering committee)
Dimension | Weight | Score 1–5 | Notes
Annualized value at stake ($) | 25% | — | Net of run cost
Strategic fit (customer / moat) | 15% | — | Tie to corporate strategy
Data readiness | 15% | — | Available, clean, accessible
Technical feasibility | 15% | — | Model / system maturity
Adoption / change difficulty | 15% | — | Process & behavior change
Risk profile (reg / safety / brand) | 15% | — | Inverse-scored
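For teams that want the mechanics explicit, here is a minimal Python sketch of the weighted score the template implies. The dimension keys and example scores are illustrative; the weights and the inverse scoring of risk come straight from the table.

```python
# Minimal sketch of the steering-committee scoring template above.
# Dimension keys and the example scores are illustrative; weights and the
# inverse scoring of risk are taken from the table.

WEIGHTS = {
    "value_at_stake": 0.25,
    "strategic_fit": 0.15,
    "data_readiness": 0.15,
    "technical_feasibility": 0.15,
    "adoption_difficulty": 0.15,   # score as ease of adoption, or add to INVERSE
    "risk_profile": 0.15,          # inverse-scored: high raw risk lowers priority
}
INVERSE = {"risk_profile"}

def priority_score(scores: dict[str, int]) -> float:
    """Weighted 1-5 score; higher means fund sooner."""
    total = 0.0
    for dim, weight in WEIGHTS.items():
        raw = scores[dim]
        effective = 6 - raw if dim in INVERSE else raw   # flip inverse dimensions
        total += weight * effective
    return round(total, 2)

# Example: high value, decent feasibility, moderate risk
print(priority_score({
    "value_at_stake": 5, "strategic_fit": 4, "data_readiness": 3,
    "technical_feasibility": 4, "adoption_difficulty": 3, "risk_profile": 2,
}))  # -> 3.95
```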
7 · Reference Architecture
Every enterprise AI estate ultimately converges on the same six layers. Build them as a platform, not as scattered project assets.
Figure 7.1 — Reference enterprise AI architecture
Architecture principles
1 · Model-agnostic by default
Abstract behind a gateway. Assume you will rotate frontier models every 6–12 months; a minimal gateway sketch follows this list.
2 · Retrieval before fine-tuning
RAG solves 80% of knowledge needs. Fine-tune only when style, latency, or cost demands it.
3 · Evals are part of CI/CD
No prompt or agent ships without an offline eval suite and online quality monitors.
4 · Human-in-the-loop is a feature
Design review, approve, and override paths into every high-stakes flow.
5 · Treat the platform as a product
SLAs, roadmap, on-call, internal "customers." Otherwise it rots.
6 · Cost & latency are first-class
Token budgets, caching, and model routing baked in from day one.
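The sketch below is one way to keep application code vendor-neutral in line with principle 1 (and to give the routing lever from principle 6 a single home). The adapter classes and the complete() signature are assumptions for illustration, not any particular vendor's SDK.

```python
# Minimal, vendor-neutral gateway sketch (principle 1). Adapter names and the
# complete() signature are illustrative; each adapter would wrap one vendor SDK.
from abc import ABC, abstractmethod

class ModelAdapter(ABC):
    """The one interface application teams code against."""
    @abstractmethod
    def complete(self, prompt: str, *, max_tokens: int = 512) -> str: ...

class FrontierAdapter(ModelAdapter):
    def complete(self, prompt: str, *, max_tokens: int = 512) -> str:
        return "<frontier model response>"   # call the current frontier vendor here

class SmallModelAdapter(ModelAdapter):
    def complete(self, prompt: str, *, max_tokens: int = 512) -> str:
        return "<small model response>"      # call a cheaper small model here

class Gateway:
    """Routing lives in one place, so rotating vendors touches one file, not every app."""
    def __init__(self, adapters: dict[str, ModelAdapter]):
        self.adapters = adapters

    def complete(self, task: str, prompt: str, **kwargs) -> str:
        adapter = self.adapters.get(task, self.adapters["default"])
        return adapter.complete(prompt, **kwargs)

gateway = Gateway({"default": FrontierAdapter(), "classification": SmallModelAdapter()})
print(gateway.complete("classification", "Categorize this ticket: ..."))
```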
8 · Data, RAG & Knowledge Foundations
Your AI is only as good as the data and context it can reach. The single biggest determinant of pilot success is whether the right knowledge is retrievable at the right moment.
Figure 8.1 — Modern RAG pipeline (production-grade)
Data-readiness checklist
Foundational
Authoritative source-of-truth per domain identified.
Data catalog with ownership, sensitivity, lineage.
PII / PHI classification and redaction policies.
Access control aligned to identity (no shared service accounts).
Evaluation set per use case (≥ 200 representative queries).
Feedback capture (thumbs / edits) wired to retraining loop.
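One way to make the last two checklist items concrete is a retrieval hit-rate check over the evaluation set. The retrieve() stub and field names below are illustrative stand-ins for whatever vector store or hybrid search your Figure 8.1 pipeline uses.

```python
# Sketch: score an evaluation set by top-k retrieval hit rate. retrieve() is a
# placeholder for the real vector-store / hybrid search call; eval_set pairs a
# query with the document ID that should be retrieved.

def retrieve(query: str, k: int = 5) -> list[str]:
    """Placeholder: return document IDs ranked by relevance."""
    return []

def retrieval_hit_rate(eval_set: list[tuple[str, str]], k: int = 5) -> float:
    """Fraction of eval queries whose gold document appears in the top-k results."""
    hits = sum(1 for query, gold_doc_id in eval_set if gold_doc_id in retrieve(query, k))
    return hits / len(eval_set) if eval_set else 0.0

# Track this per use case (the checklist asks for >= 200 representative queries)
# and alert on regressions whenever chunking, embeddings, or the index change.
```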
9 · Agentic AI & Workflow Automation
2026's most impactful AI deployments are not chatbots — they are supervised agents doing multi-step work inside real business processes. Treat them as junior employees: scope, train, supervise, audit.
Figure 9.1 — Agent autonomy ladder
Default to L2–L3 in 2026. The fastest, safest enterprise value is in supervised agents on bounded workflows. Reserve L4 for low-risk, reversible actions with strong evals. L5 is rarely appropriate for production today.
Where supervised agents work best:
Workflows where a human reviewer already exists — slot the agent in front of that reviewer.
Processes with a "system of record" the agent can write to via stable APIs (CRM, ERP, ITSM).
High-volume, low-blast-radius actions where reversibility is easy.
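A minimal sketch of the L2/L3 pattern these criteria describe: the agent proposes, the existing human reviewer approves, and only approved, reversible actions reach the system of record. The function names here are illustrative, not a specific agent framework's API.

```python
# Sketch of one supervised (L2/L3) agent step: propose -> human review -> write.
# propose_action(), request_review(), and apply_to_system_of_record() are
# illustrative stand-ins, not a particular framework's API.
from dataclasses import dataclass

@dataclass
class ProposedAction:
    description: str
    payload: dict
    reversible: bool

def propose_action(case: dict) -> ProposedAction:
    """The agent's multi-step work (LLM calls, tools, retrieval) ends in a proposal."""
    return ProposedAction("Update CRM record with draft resolution", {"case": case}, True)

def request_review(action: ProposedAction) -> bool:
    """Slot the proposal in front of the reviewer who already owns this step."""
    print(f"Review requested: {action.description}")
    return False   # default deny until a human approves

def apply_to_system_of_record(action: ProposedAction) -> None:
    """Write through the stable API of the CRM / ERP / ITSM system."""
    print(f"Applied: {action.description}")

def run_supervised(case: dict) -> None:
    action = propose_action(case)
    if action.reversible and request_review(action):
        apply_to_system_of_record(action)
    # Rejected or irreversible proposals fall back to the human, with the draft attached.

run_supervised({"ticket_id": "T-1042"})
```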
10 · Build vs Buy vs Partner
A simple rule: buy commodity, build differentiated, partner where the moat is someone else's. Re-evaluate every 12 months — the market moves fast.
Layer | Default | Build when… | Buy when…
Foundation models | Buy | You have a regulated, narrow, latency-sensitive use case | Default — frontier vendors will out-invest you
Inference / hosting | Buy | You have data residency / sovereignty needs | Default
Vector store / RAG infra | Buy → Build adapters | You operate at >10⁸ vectors with strict SLAs | Default
Agent framework | Buy thin, build domain logic | Frameworks limit your control over evals/guardrails | Standard agent patterns
Evaluation & observability | Buy + customize | Use cases need bespoke metrics | Default
Domain copilots | Build | Workflow is your differentiator | For commodity functions (e.g., IT helpdesk)
Customer-facing AI features | Build | It touches your product moat | Never — outsourcing your moat is fatal
Governance / policy tooling | Buy + integrate | You operate in heavily regulated sectors | Default
11 · Governance, Risk & Compliance
Governance is not a brake — it's an accelerator. Teams with clear policy and tooling ship faster because they know what's allowed. Without it, every project re-litigates risk.
Figure 11.1 — AI risk taxonomy & controls
The Three Lines of Defense for AI
1st Line
Build teams
Product, engineering, and data teams own day-to-day risk: evals, guardrails, monitoring, and HITL design.
2nd Line
Risk & Responsible AI
Centralized policy, model risk management, red-teaming, classification, and approvals. Owns the AI register.
3rd Line
Internal Audit
Independent assurance. Tests that controls operate as designed. Reports to audit committee.
The minimum AI policy stack
Acceptable use policy for employees using AI tools.
AI system inventory / register — every system classified by risk tier.
Approval workflow tied to risk tier (lightweight for low, full review for high).
Model card / system card standard with intended use, evals, and limits.
Data handling standard covering training, inference, and logs.
Incident response runbook covering hallucinations, leaks, and tool misuse.
Third-party AI assessment for vendors and embedded AI features.
Pattern that works: a single "AI Review Board" meeting weekly, with a 1-page risk tier intake. Low-risk auto-approved; medium reviewed asynchronously; high goes to the board live. This keeps velocity while protecting the brand.
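To make the intake-and-route pattern concrete, here is a minimal sketch. The tier criteria and intake fields are placeholders; the real classification rules belong in your AI policy.

```python
# Sketch of the weekly AI Review Board intake described above. Tier routing
# mirrors the pattern in the text; the intake fields and criteria are illustrative.
from enum import Enum

class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

def classify(intake: dict) -> RiskTier:
    """1-page intake -> risk tier. Replace with your policy's actual criteria."""
    if intake.get("customer_facing") or intake.get("regulated_domain"):
        return RiskTier.HIGH
    if intake.get("writes_to_system_of_record"):
        return RiskTier.MEDIUM
    return RiskTier.LOW

def route(intake: dict) -> str:
    return {
        RiskTier.LOW: "auto-approved; logged in the AI register",
        RiskTier.MEDIUM: "asynchronous review by the 2nd line",
        RiskTier.HIGH: "live review at the weekly AI Review Board",
    }[classify(intake)]

print(route({"writes_to_system_of_record": True}))  # -> asynchronous review by the 2nd line
```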
12 · Change Management & Workforce
The single biggest predictor of AI ROI is whether employees actually change how they work. Software gets deployed in weeks; behavior change takes quarters. Invest accordingly.
Practitioner-level skills (prompting, agents, RAG, fine-tuning, guardrails, evals at scale) are needed by roughly 1–3% of the workforce.
The AI Leader track, for execs and managers, covers strategy, governance, change leadership, and ROI capture; it applies to all people-leaders.
Workforce stance: communicate explicitly that AI is augmenting jobs, redesigning some, and creating new ones. Silence breeds fear and quiet sabotage. Most successful programs publish a clear "AI & You" charter to employees within the first 90 days.
13 · Economics, ROI & FinOps
AI projects fail financially in three ways: (1) value never measured, (2) cost-to-serve underestimated, (3) hidden labor cost of oversight. A simple unit economic model up front prevents all three.
The AI unit economics formula
Net Value per Task = (Value created) − (Inference cost) − (Retrieval & data cost) − (Human-in-the-loop cost) − (Allocated platform & governance cost) − (Cost of errors × error rate)
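The same formula as a small calculator, with purely illustrative numbers; plug in your own per-task estimates.

```python
# Direct translation of the Net Value per Task formula above.
# All figures in the example call are placeholders to show the mechanics.

def net_value_per_task(
    value_created: float,
    inference_cost: float,
    retrieval_cost: float,
    hitl_cost: float,
    platform_cost: float,
    cost_of_error: float,
    error_rate: float,
) -> float:
    return (value_created - inference_cost - retrieval_cost - hitl_cost
            - platform_cost - cost_of_error * error_rate)

# Example: a task worth $4.00, with 5% residual errors costing $10 each
print(net_value_per_task(4.00, 0.08, 0.02, 0.50, 0.15, 10.00, 0.05))  # -> 2.75
```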
Figure 13.1 — Total cost of an AI feature
FinOps for AI — the levers
Lever | Typical savings | How
Model routing (small vs. frontier) | 30–60% | Route simple tasks to small models; reserve frontier for hard cases.
Prompt & context caching | 30–90% | Cache system prompts and retrieved chunks across calls.
Compression & summarization | 20–40% | Pre-summarize long contexts; drop irrelevant chunks.
Batching & off-peak | 10–30% | Use batch APIs for non-real-time workloads.
Output schema & max tokens | 10–20% | Constrain output; stop generation early.
Eval-driven prompt slimming | 10–25% | Shorten prompts where evals show no quality loss.
Distillation / fine-tuning small models | 40–80% (on hot paths) | For high-volume, narrow tasks once specs stabilize.
Watch out: agentic workflows can use 10–50× the tokens of a single chat call. Budget at the workflow level, not the call level, or you will be surprised.
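One lightweight way to enforce that: a shared token budget per workflow run rather than per call, so runaway agent loops get cut off. The budget size and the charge() interface below are assumptions for illustration.

```python
# Sketch of workflow-level token budgeting: every model call inside one agentic
# workflow draws from a shared budget. The budget size is illustrative.

class TokenBudgetExceeded(RuntimeError):
    pass

class WorkflowBudget:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Call after every model invocation in the workflow."""
        self.used += prompt_tokens + completion_tokens
        if self.used > self.max_tokens:
            raise TokenBudgetExceeded(
                f"workflow used {self.used} tokens, budget {self.max_tokens}")

budget = WorkflowBudget(max_tokens=200_000)   # one budget per workflow run, not per call
budget.charge(prompt_tokens=12_000, completion_tokens=1_500)
```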
14 · 12-Month Implementation Roadmap
A pragmatic, quarter-by-quarter sequence that has survived contact with reality across multiple industries. Adjust pace, not order.
Figure 14.1 — The 12-month enterprise AI rollout
Quarter-by-quarter exit criteria
Quarter | Exit criteria (must be true)
Q1 | Board-approved ambition; named CAIO; AI policy v1; platform v0; 2–3 flagship pilots scoped with owners and metrics.
Q2 | Pilots live with real users; eval suites; cost & usage dashboards; AI register populated; ≥30% of target users trained.
Q3 | At least one production agentic workflow; verified ROI on 2+ use cases; red-team report; first role redesign documented.
Q4 | Audit-ready posture; scorecard reported to board; ≥3 use cases compounding; talent funnel and skill metrics on track.
15 · KPIs & the AI Scorecard
If you don't measure all four layers, executives will lose confidence and funding will quietly retreat. Use this scorecard from month one — even if numbers are rough.
Figure 15.1 — The 4-layer AI scorecard
Recommended metrics by layer
Layer | Metric | Why it matters
L1 Platform | Cost per successful task | The only honest unit-economic number
L1 Platform | P95 latency, uptime | Adoption collapses below SLA
L1 Platform | Cache hit rate, % small-model routed | Direct FinOps levers
L2 Quality | Groundedness, citation accuracy | Predicts hallucination rate
L2 Quality | Eval pass rate vs. baseline | Detects silent regression on model swap
L2 Safety | Policy violations, jailbreak attempts blocked | Governance signal for the board
L3 Adoption | Weekly active users / target population | Most predictive leading indicator of ROI
L3 Workflow | Tasks completed end-to-end with AI | Real penetration into work
L4 Outcome | Function P&L delta attributable to AI | The number the CFO will ask for
L4 Outcome | Risk-adjusted incident rate | The number the CRO will ask for
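As one example of instrumenting the L1 row above, cost per successful task is all-in spend divided by tasks that passed their acceptance check. The field names and numbers below are illustrative.

```python
# Sketch of the L1 metric "cost per successful task". Spend should be all-in
# (inference, retrieval, allocated platform cost); the numbers are illustrative.

def cost_per_successful_task(total_spend: float, tasks_successful: int) -> float:
    if tasks_successful == 0:
        return float("inf")   # no successes: surface the problem, don't hide it
    return total_spend / tasks_successful

def success_rate(tasks_attempted: int, tasks_successful: int) -> float:
    return tasks_successful / tasks_attempted if tasks_attempted else 0.0

print(cost_per_successful_task(12_400.0, 78_200))   # ~0.16 per successful task
print(success_rate(85_000, 78_200))                 # ~0.92
```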
16 · Failure Modes & Anti-Patterns
The same failure patterns repeat across industries. Learning to spot them is more valuable than any single best practice.
Anti-pattern 1
"100 pilots, 0 production"
No prioritization, no platform, every team rebuilds plumbing. Symptom: each pilot is a snowflake.
EU AI Act — final text and obligations by risk class.
NIST AI Risk Management Framework (AI RMF 1.0) and Generative AI profile.
OECD AI Principles and Hiroshima Process documents on frontier AI.
ISO/IEC 42001 — AI management system standard.
Your sector regulator's published AI guidance (financial services, healthcare, public sector).
Anthropic, OpenAI, Google DeepMind safety publications for current frontier model behavior.
Closing — The 7 Commitments of an AI-Ready Company
1 · We have a portfolio, not pilots.
Every initiative ties to value, an owner, and a metric — and we kill what doesn't earn its keep.
2 · We treat the platform as a product.
One shared stack: models, data, evals, guardrails, observability — funded and on-call.
3 · We measure quality and cost like we measure money.
Evals in CI, FinOps for inference, scorecards to the board.
4 · We design for humans & oversight.
Every high-stakes flow has review, override, and audit. Trust is a moat.
5 · We invest more in change than in code.
Behavior change, skill ladders, role redesign — these capture the value.
6 · We are model-agnostic, data-loyal.
Models rotate every year; our data and workflows are the durable asset.
7 · We govern in the open, not in PDFs.
Policy is executable; risk tiers are enforced; the AI register is live.
The bottom line
Durable AI value usually goes to teams that are operationally serious—evals, data discipline, change management, and live governance—not to the group with the longest model list. This outline is meant to support that kind of work.
19 · References
Frameworks, diagrams, and sequencing in this playbook are teaching artifacts. Where the document discusses law, standards, methods, or economics, the items below are authoritative or widely used primary sources. Illustrative percentages (KPI strip, TCO bars, savings bands) synthesize published industry research and operating experience—verify against your own data before external reporting.
Mitchell, M., et al. “Model Cards for Model Reporting.” Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT*), 2019. arXiv:1810.03993. https://arxiv.org/abs/1810.03993
The Institute of Internal Auditors. The IIA’s Three Lines Model — governance structure mapping to 1st, 2nd, and 3rd lines (referenced for AI risk committees and internal audit). https://www.theiia.org/en/topics/internal-audit/tri/
Architecture, methods, and technical foundations
Lewis, P., et al. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” Advances in Neural Information Processing Systems (NeurIPS), 2020. arXiv:2005.11401. https://arxiv.org/abs/2005.11401
McKinsey Global Institute. “The economic potential of generative AI: The next productivity frontier.” McKinsey & Company, June 2023 (industry estimates on automation, labor mix, and value at stake; compare to internal KPIs). Available at McKinsey.com.
FinOps Foundation. FinOps Framework — cost accountability and unit economics in cloud (extended here to inference and AI workloads). https://www.finops.org/framework/
Green Software Foundation. Software Carbon Intensity (SCI) Specification — methodology for carbon accounting at software level. https://sci.greensoftware.foundation/
Frontier model safety and provider research (for behavior, evals, and guardrails)
Anthropic. Research publications and model/system cards (e.g., Claude family documentation and safety research). https://www.anthropic.com/research