Linh Truong · Product strategy · 2026

The AI Product Manager
Operating Manual

Canvases, diagrams, and decision checks I use with teams—from discovery through scale—for AI that actually ships and stays trustworthy.

Author: Linh Truong, MA (Harvard), MBA
Source: LinhTruong.com
Email: Linh@Alumni.Harvard.edu
Edition: v2026.1
Reading time: ~35–45 min
For: PMs, group PMs, heads of product
Covers: LLM, agentic, multimodal, embedded AI (B2B, consumer, internal)

Contents

  1. The New Mandate of the AI PM
  2. Competency Model & T-Shape
  3. The AI Product Lifecycle
  4. Strategy Canvas & North-Star
  5. Technical Foundations Every PM Needs
  6. Discovery, Problem Framing & Validation
  7. Data, Evals & Quality Loops
  8. Metrics: North-Star, HEART, Guardrails
  9. Risk, Safety, Ethics & Governance
  10. Build vs. Buy vs. Fine-tune vs. RAG
  11. Pricing, Unit Economics & Cost Control
  12. Team Topology & RACI
  13. Roadmapping & Prioritization
  14. Go-to-Market & Adoption
  15. Common Pitfalls & Anti-Patterns
  16. The 30 / 60 / 90 Day Plan
  17. Tooling Stack Reference
  18. Closing: The Durable AI PM
  19. References & sources
KPI strip:
  • 78% of enterprises have AI in production (McKinsey State of AI, 2025)
  • 3.7× ROI gap between leaders and laggards (BCG AI Maturity Index, 2025)
  • 42% of AI pilots reach scale (Gartner, 2025)
  • 11 months median from PoC to production (a16z enterprise survey, 2025)

01 The New Mandate of the AI Product Manager

The PM job did not change. The medium did. AI shifts the product surface from deterministic features to probabilistic capabilities — and that flips the operating model.

Traditional PMs ship features that behave the same way on Tuesday as they did on Monday. AI PMs ship capabilities whose quality is a distribution. Two consequences:

  • Evals replace spec sheets. A “done” feature is no longer a checked acceptance criterion — it is a measured pass-rate on a representative test set.
  • Data is the new roadmap. What you can build is bounded by the data you can collect, label, and govern. Roadmaps without a data plan are wishlists.
  • UX absorbs uncertainty. Confidence, fallbacks, citations, and undo become first-class UX primitives, not afterthoughts.
  • Cost is a feature. Latency and unit economics are now product decisions, not infra concerns.
Fig. 1 — The product surface shifts from a discrete behavior to a probability distribution.
“In AI products, the PM's job is not to specify the answer — it's to define the question, the bar, and the experience when the answer is wrong.” — Linh Truong

02 Competency Model: The T-Shaped AI PM

An AI PM is not a research scientist — but cannot be a passenger either. The bar is fluency, not authorship: enough depth to ask the right questions and make the right tradeoffs.

The six lobes:
  ① Product craft: discovery, JTBD, roadmapping; outcomes over outputs; storytelling and narrative writing; PRD & spec maturity.
  ② Technical fluency: LLM behavior, context, tokens; RAG, agents, fine-tuning; latency/cost tradeoffs; evals & failure modes.
  ③ Data literacy: schemas, lineage, quality; labeling & ground truth; sampling & bias; PII, consent, residency.
  ④ Risk & ethics: hallucination, harm, fairness; red teaming, abuse cases; regulation (EU AI Act, NIST); disclosure & consent.
  ⑤ Business acumen and ⑥ Leadership complete the shape.
Fig. 2 — Six competency lobes. Most AI PMs are strong in 2–3 and adequate in the rest. The differentiator is depth in ② and ③.
Competency | Novice | Practitioner | Expert
Technical fluency | Knows what a token is | Designs eval sets & prompts; chooses RAG vs. fine-tune | Co-designs system architecture with eng; reasons about cost/latency curves
Data literacy | Reads dashboards | Defines labeling guidelines & sampling strategy | Owns a data flywheel with feedback loops in production
Risk & ethics | Lists obvious harms | Runs structured red-team; owns disclosure UX | Builds a governance program mapped to EU AI Act / NIST AI RMF
Business acumen | Knows pricing | Models cost-per-query & gross margin | Drives pricing reinvention around value, not seats

03 The AI Product Lifecycle

A two-loop model: an outer product loop (months) and an inner quality loop (days–hours). Most failures come from teams running only one.

OUTER LOOP · Product discovery → delivery (weeks/months): Problem (JTBD & value hypothesis) → Feasibility (data & model spike) → Design (UX of uncertainty) → Build (pipeline + UI + evals) → Launch (gradual rollout) → Operate (monitor & drift) → Learn (re-frame problems).
INNER LOOP · Quality / eval (hours/days): Collect cases (logs · users · red team) → Curate evals (golden + adversarial sets) → Iterate prompt/model (compare variants) → Ship if Δ > 0 (behind a feature flag).
The outer loop chooses what to be good at. The inner loop makes it good. Decouple ownership; couple cadence.
Fig. 3 — The two-loop operating model. Outer loop is the PM's classical territory; the inner loop is where AI products are won or lost.

04 The AI Strategy Canvas

Before writing a PRD, fill the canvas. If any of the seven boxes is empty, you are not ready to commit a roadmap.

  ① Who & what: user segment & JTBD; trigger moment; today's alternative; pain & willingness-to-pay. Test: can you describe one user, by name, who currently pays (in time or money) for this?
  ② Capability wedge: what AI capability unlocks value; why now (model, data, cost); 10× over status quo on which axis; edge case = the demo. Anti-test: if GPT-N+1 launched tomorrow, would your wedge be commoditized or sharpened?
  ③ Data moat: proprietary data & access; feedback loop in product; labeling & lineage plan; privacy & residency posture. Test: after 90 days of use, does the product get measurably better for the user?
  ④ UX of uncertainty: confidence display; citations / sources; fallback & escape hatch; undo / human-in-loop. Test: when wrong, does the user lose trust in the product, or just in this answer?
  ⑤ Risk surface: worst-case output; bias & fairness; misuse / jailbreak; regulatory class (EU AI Act). Test: could a bad output appear on the front page of a newspaper?
  ⑥ Unit economics: cost per successful task; latency budget; pricing logic; gross margin target. Test: at 10× volume, do you still make money? If unit cost halves, do you cut price?
  ⑦ Distribution: where it shows up; workflow integration; activation moment; habit / re-engagement. Test: does the user have to remember to come back, or is it embedded where they work?
Fig. 4 — The 7-box AI Strategy Canvas. Each box has a stress-test question; commit only when all seven pass.

05 Technical Foundations Every AI PM Needs

You will not write the code. You will make tradeoffs about it weekly. Here is the minimum surface area.

5.1 The model landscape (2026)

Class | What it is | When the PM picks it | Risks
Frontier general LLM (Claude / GPT / Gemini) | Hosted, broad, instruction-following | Fast time-to-value, broad tasks, lower volume | Vendor lock, cost at scale, data egress
Open-weights LLM (Llama / Mistral / Qwen) | Self-hostable, customizable | Data sovereignty, fine-tuning, on-prem | Ops burden, slower iteration
Small / specialized (distilled · task-tuned) | Cheap, fast, narrow | High-volume narrow tasks (classify, extract) | Brittle to drift
Multimodal | Text + vision + audio + video | Doc understanding, accessibility, robotics | Eval is harder; PII in pixels
Agentic systems | Tool-using, multi-step planning | Workflow automation, "do" vs. "answer" | Long-horizon failures compound

5.2 The four common architectures — and when to choose each

  A · Pure prompt (prompt → LLM → output). When: generic reasoning, fast PoC. Limits: no fresh / private data. PM care: prompt versioning, evals. Cost = tokens × calls.
  B · RAG (retrieve → ground → generate: query → vector DB → LLM). When: knowledge changes; citations required. Limits: retrieval quality is the ceiling. PM care: chunking, freshness, citations; recall@k vs. precision.
  C · Fine-tune / adapt (train on labeled examples; adapter / LoRA). When: domain style, structured outputs. Limits: goes stale, costly to refresh. PM care: label quality over quantity; a cost-amortization plan.
  D · Agentic (plan · tool · observe · loop). When: "do" tasks across systems. Limits: compounding errors, cost spikes. PM care: step budgets, kill-switches, audits.
Fig. 5 — Four common architectures. The PM's job is to choose the simplest one that clears the quality bar at acceptable cost.
Rule of thumb: start with A, add B when knowledge matters, add C when style or structure matters, reach for D only when the task is multi-step and a human would otherwise click through 5+ screens.
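To make "add B when knowledge matters" concrete, here is a minimal sketch of architecture B as a thin composition layer. `retrieve` and `generate` are assumed interfaces you would back with your own vector DB and model provider; none of the names come from a specific library.

```python
# Sketch of architecture B (RAG) as a thin composition layer — illustrative only.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Chunk:
    doc_id: str
    text: str

def answer_with_citations(
    question: str,
    retrieve: Callable[[str, int], List[Chunk]],   # (query, k) -> top-k chunks
    generate: Callable[[str], str],                # grounded prompt -> model output
    k: int = 5,
) -> dict:
    chunks = retrieve(question, k)
    context = "\n\n".join(f"[{i+1}] ({c.doc_id}) {c.text}" for i, c in enumerate(chunks))
    prompt = (
        "Answer using ONLY the sources below. Cite sources as [n]. "
        "If the sources are insufficient, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    # Returning source ids alongside the answer is what makes the citation UX possible.
    return {"answer": generate(prompt), "sources": [c.doc_id for c in chunks]}
```

The PM-relevant decisions live in the parameters: k, the refusal instruction, and what gets logged for the eval loop.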

5.3 Tokens, context, latency, cost — the four levers

  • Tokens: roughly 0.75 English words per token. Plan budgets by tokens, not words.
  • Context window: 200k+ tokens is common today. Bigger ≠ better: retrieval quality decays past the middle of the window.
  • Latency: P95 is what users feel. Streaming can hide P50; it never hides P95.
  • Cost per 1M tokens: a 10–100× spread between the cheapest and frontier models. Route by task.
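The four levers converge on one number the PM should own: cost per successful task. A back-of-envelope sketch with placeholder prices, not any vendor's rate card:

```python
# Back-of-envelope unit economics for one AI request — assumed prices, for illustration.
def cost_per_request(input_tokens: int, output_tokens: int,
                     usd_per_m_input: float, usd_per_m_output: float) -> float:
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000

def cost_per_successful_task(request_cost: float, requests_per_task: float,
                             task_success_rate: float) -> float:
    # Retries and regenerations hurt twice: more requests AND a lower success rate.
    return (request_cost * requests_per_task) / task_success_rate

if __name__ == "__main__":
    c = cost_per_request(input_tokens=3_000, output_tokens=500,
                         usd_per_m_input=3.00, usd_per_m_output=15.00)  # assumed prices
    print(round(c, 4))                                                  # ≈ $0.0165 per request
    print(round(cost_per_successful_task(c, requests_per_task=1.4,
                                         task_success_rate=0.85), 4))   # ≈ $0.0272 per task
```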

06 Discovery, Problem Framing & Validation

AI does not change discovery; it raises the cost of building the wrong thing. Spend more time on the problem, not less.

6.1 The Three-Question Filter

  1. Is this a real problem? Measured in current time/money spent, not stated interest.
  2. Is AI the best lever? Or would a rule, a form, or a search box do it cheaper and safer?
  3. Can we afford the error mode? What happens at the 1% and 0.01% bad output — and who bears the cost?
Failure pattern: teams skip Q2. “AI” is added to a doc-search problem that needed indexing. Result: higher cost, lower trust, same outcome.

6.2 The Wizard-of-Oz prototype

Before any model integration, simulate the experience with a human in the loop. If users do not love the experience when answers are perfect, no model will save you. If they do, you now have a quality bar.

6.3 Risk-classed problem framing

Tier | Examples | Bar to ship
Low (drafting / brainstorming) | Email drafts, alt copy, summaries | >70% "useful" rating; user always edits
Mid (decision support) | Triage, prioritization, lead scoring | Cited evidence, override always present, eval on a representative set
High (autonomous action) | Sending email, executing trades, code commits | Multi-stage approval, audit log, kill-switch, scoped permissions
Regulated (health / legal / financial advice) | Diagnosis, contract terms, fiduciary advice | Domain-expert review of outputs; EU AI Act conformity; clear disclosure

07 Data, Evals & the Quality Loop

Evals are not QA. Evals are the product spec. The PM who owns the eval set owns the product.

  • Online (production): A/B tests · user feedback · task success. Slow, trusted, scarce. Run before launch and on incidents.
  • LLM-as-judge / model-graded: pairwise · rubric · calibration vs. humans. Scalable, noisy, cheap. Run on every model/prompt change.
  • Offline deterministic: unit · regression · adversarial · format checks. Fast, cheap, brittle. Run in CI on every commit.
Fig. 6 — The eval pyramid. Build bottom-up. Every team has CI tests; few have judge calibration; almost none have rigorous online evals — which is exactly where competitive advantage lives.
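As a concrete starting point for the bottom layer, here is a minimal golden-set runner. The JSONL schema (`input`, `must_contain`, `must_not_contain`) is illustrative, not a standard:

```python
# Minimal sketch of the pyramid's base: deterministic checks on a golden set,
# run in CI on every prompt/model change.
import json
from typing import Callable

def run_golden_set(path: str, system_under_test: Callable[[str], str],
                   min_pass_rate: float = 0.90) -> bool:
    with open(path, encoding="utf-8") as f:
        cases = [json.loads(line) for line in f if line.strip()]
    passed = 0
    for case in cases:
        output = system_under_test(case["input"])
        # This layer is deterministic on purpose: substring / format assertions only.
        ok = (all(s in output for s in case.get("must_contain", []))
              and not any(s in output for s in case.get("must_not_contain", [])))
        passed += ok
    pass_rate = passed / len(cases)
    print(f"golden set: {passed}/{len(cases)} passed ({pass_rate:.0%})")
    return pass_rate >= min_pass_rate  # CI gate: block the change if this regresses
```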

7.1 The PM's eval responsibilities

7.2 The data flywheel

User uses product (real workflows, real intent) → implicit signals (edit · keep · regenerate) → curated dataset (sampled · labeled · scrubbed) → better model/prompt (fine-tune · routing · rules) → eval gates (no-regression rule) → improved product (same UI, better outcomes) → back to the user.
Fig. 7 — The data flywheel. The PM's job is to ensure every loop closes — most break at “implicit signals” (not captured) or “eval gates” (not enforced).
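The loop most often breaks because implicit signals are never captured in a joinable form. A sketch of the minimum event to log, with hypothetical names and fields:

```python
# Sketch of capturing implicit signals so the flywheel's first link actually closes.
# Event names and fields are assumptions; the point is to log enough to reconstruct
# (input, output, user verdict) later, with PII scrubbing before curation.
import json
import time
import uuid
from dataclasses import dataclass, asdict
from typing import Literal, Optional

Signal = Literal["kept", "edited", "regenerated", "abandoned"]

@dataclass
class FeedbackEvent:
    event_id: str
    request_id: str               # joins back to the logged prompt/output pair
    signal: Signal
    edit_distance: Optional[int]  # how much the user changed a draft, if they edited it
    ts: float

def log_feedback(request_id: str, signal: Signal, edit_distance: Optional[int] = None) -> None:
    event = FeedbackEvent(str(uuid.uuid4()), request_id, signal, edit_distance, time.time())
    # In production this goes to your event pipeline; stdout keeps the sketch runnable.
    print(json.dumps(asdict(event)))
```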

08 Metrics: North-Star, HEART & AI Guardrails

Classical product metrics (engagement, retention) are necessary but insufficient. AI products need a layer for quality and a layer for safety.

Layer | What it measures | Sample metrics
North-Star | Customer value created | Successful tasks completed; revenue per active user
Engagement (HEART) | Happiness, Engagement, Adoption, Retention, Task success | CSAT, DAU/MAU, activation rate, D30 retention, task completion
AI Quality | Output goodness | Eval pass-rate, hallucination rate, citation accuracy, regeneration rate
AI Safety / Guardrails | What must not happen | Policy-violation rate, jailbreak success, PII leak rate
Unit Economics | Sustainability at scale | Cost per successful task, gross margin, P95 latency
The cardinal rule: never optimize a single metric. AI products can game any one of them — e.g., a model that refuses everything has zero hallucinations and zero usefulness. Always pair an “up” metric with a guardrail.

8.1 Worked example: an AI support assistant

Metric | Type | Target | Guardrail pair
Tickets self-served | North-Star | ↑ +30% YoY | CSAT ≥ baseline
First-response accuracy | Quality | ≥ 92% | Escalation false-negative rate < 1%
Hallucinated policy citations | Safety | = 0
Cost per resolved ticket | Econ | < $0.15 | P95 latency < 4 s
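The "pair every up-metric with a guardrail" rule can be enforced mechanically at rollout time. A sketch of a launch gate using the thresholds from the table above (metric names are illustrative):

```python
# Hedged sketch of the up-metric + guardrail rule as a rollout gate.
def launch_gate(metrics: dict) -> bool:
    checks = [
        metrics["first_response_accuracy"] >= 0.92,
        metrics["escalation_false_negative_rate"] < 0.01,
        metrics["hallucinated_policy_citations"] == 0,
        metrics["cost_per_resolved_ticket_usd"] < 0.15,
        metrics["p95_latency_s"] < 4.0,
        metrics["csat"] >= metrics["csat_baseline"],
    ]
    return all(checks)  # any single failing guardrail blocks the next rollout step
```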

09 Risk, Safety, Ethics & Governance

Governance is product strategy. Regulatory class shapes architecture choices that are expensive to reverse later.

9.1 Failure modes the PM must name

Hallucination

Plausible-sounding falsehoods. Mitigate via RAG, citations, constrained generation, retrieval grounding.

Bias & fairness

Disparate quality across user groups. Mitigate via stratified evals, dataset audits, counterfactual tests.

Prompt injection

Hostile inputs hijack instructions. Mitigate via input sanitization, separated trust zones, output validation.

Data leakage

Training/inference data appearing in outputs. Mitigate via tenant isolation, no-train guarantees, output filters.

Over-reliance

Users skip review on low-friction outputs. Mitigate via deliberate friction, confidence cues, and sampled forced review.

Drift

Quality decays as world or model changes. Mitigate via monitoring, scheduled re-evals, alerting.
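Several of these mitigations end up as code at the output boundary. A deliberately naive sketch of an output screen; the patterns and phrase list are placeholders, and production systems layer a trained safety classifier and tenant-specific policy on top:

```python
# Illustrative output guardrail: screen generated text for obvious PII and policy
# phrases before it reaches the user, and route failures to the fallback UX.
import re
from typing import List, Tuple

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]

def screen_output(text: str, blocked_phrases: List[str]) -> Tuple[bool, str]:
    if any(p.search(text) for p in PII_PATTERNS):
        return False, "pii_detected"
    if any(phrase.lower() in text.lower() for phrase in blocked_phrases):
        return False, "policy_phrase"
    return True, "ok"

# Usage: ok, reason = screen_output(model_output, blocked_phrases=["internal only"])
# If not ok: show the fallback, log the incident, and feed it to the red-team queue.
```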

9.2 Regulatory landscape (2026 snapshot)

Framework | Scope | What it asks of you
EU AI Act | Anyone serving EU users | Risk-class your system; conformity assessment for high-risk; transparency for limited-risk; GPAI documentation
NIST AI RMF | US, voluntary but referenced by gov contracts | Govern, Map, Measure, Manage — produce documentation across all four
ISO/IEC 42001 | Org-wide AI management | Auditable management system; common in enterprise procurement
Sector-specific | HIPAA, GLBA, FDA SaMD, etc. | Pre-existing rules apply, with AI-specific guidance overlays
PM action: on day one of any new AI initiative, write the system's EU AI Act risk classification on the first page of the PRD. If it is high-risk, route procurement, legal, and security before design — not after.

9.3 Red-team in product cadence

10 Build vs. Buy vs. Fine-tune vs. RAG

A decision tree, not a debate.

Start from one question: is the AI capability a moat (core to the thesis, a domain-unique workflow) or table-stakes ("chat with your app")?
  • Buy (embed a vendor / off-the-shelf): fastest time-to-value; vendor risk; little defensibility.
  • Frontier API + RAG (hosted LLM, your data): knowledge changes weekly; citations matter; data stays put.
  • Fine-tune / adapt (open weights + your labels): consistent style/format; high-volume narrow task; cost matters.
  • Train / pretrain: almost never; justify only with a capability not achievable otherwise, on data that is truly unique.
Cost order (typical): Buy < RAG < Fine-tune << Train. Speed order: Buy > RAG > Fine-tune >> Train. Defensibility: the opposite of the cost order. PM rule: choose the cheapest option that clears the bar today.
Fig. 8 — Build/buy decision tree. Most enterprise AI products live in “Frontier API + RAG” for years before earning the right to fine-tune.

11 Pricing, Unit Economics & Cost Control

AI products break the per-seat SaaS model. Consumption costs are real, variable, and unforgiving.

11.1 Pricing model menu

Model | Best for | Risk
Per-seat | Predictable usage; co-pilot products | Light users subsidize heavy; margin erosion
Per-task / outcome | Agentic products doing discrete jobs | Needs a clear unit of value; gaming risk
Usage / token | Developer / API products | Hard to forecast; bill shock
Tiered with credits | Mid-market with variability | Complexity; renegotiation cycles
Outcome-based | Replacement-of-labor positioning | Attribution; long sales cycles

11.2 The cost stack and where PMs control it

Where the dollars go in a typical AI feature (bar width ≈ share of cost; the order of magnitude shifts with architecture):
  • Inference. Levers: model routing · prompt compression · caching · streaming.
  • Retrieval / embeddings. Levers: chunking · embedding model · index TTL · hybrid search.
  • Storage / vector DB. Levers: tiering · dedup · namespacing.
  • Eval / monitoring. Levers: sample rate · judge model.
Fig. 9 — Inference dominates most AI products' cost. Routing (small model for easy tasks, frontier for hard) is the highest-leverage PM decision.
Cost control (how I run it): (1) instrument cost-per-successful-task as a first-class metric; (2) build a model router with quality-cost-latency curves per task; (3) cache aggressively at the semantic level; (4) prompt-compress and trim context; (5) re-run unit economics every major model release — frontier prices fall ~70%/year.
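A sketch of point (2), the model router, under the assumption that routing is keyed by task type and backed by the quality-cost-latency curves you measure in your own evals; the model names are placeholders:

```python
# Sketch of a task-level model router. The routing table would come from per-task
# eval results; "small-model" / "frontier-model" are placeholder names.
from typing import Callable

ROUTES = {
    "classify": "small-model",     # cheap and fast, good enough per evals
    "extract":  "small-model",
    "draft":    "mid-model",
    "reason":   "frontier-model",  # reserve the expensive model for hard tasks
}

def route(task_type: str, call: Callable[[str, str], str], prompt: str) -> str:
    model = ROUTES.get(task_type, "frontier-model")  # fail safe toward quality
    return call(model, prompt)
```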

12 Team Topology & RACI

AI teams have more disciplines than classical product teams — and more handoffs. The PM is the seam-stitcher.

The AI PM sits at the center as the seam-stitcher, connected to: Design (UX of uncertainty), ML/AI engineering (model · pipeline), data engineering (pipelines · quality), research (evals · new techniques), legal/risk (policy · compliance), GTM/CS (adoption · feedback), full-stack engineering, and QA/SRE (production observability).
Fig. 10 — A typical AI product pod. PM doesn't manage every line, but is the only role with line-of-sight to every dependency.

12.1 RACI for an AI feature launch

Activity | PM | ML Eng | Design | Data | Legal
Problem framing & success metric | A/R | C | C | C | I
Eval set curation | A | R | C | R | C
Architecture choice | A | R | I | C | C
UX of uncertainty | A | C | R | I | C
Risk class & disclosures | A | C | C | I | R
Launch decision (go/no-go) | A | R | R | R | R

13 Roadmapping & Prioritization

Roadmaps for AI products are theses, not Gantt charts. Bet on capabilities, not features.

13.1 The three-horizon AI roadmap

Horizon 1 · Now (0–3 mo): ship the known wedge with a frontier API; build evals & monitoring; capture feedback signals; measure cost & latency baselines; risk-class & disclose. Goal: prove value, instrument the loop.
Horizon 2 · Next (3–9 mo): tighten quality via RAG & routing; add workflow integrations; begin selective fine-tuning; move to outcome pricing where possible; expand to adjacent JTBD. Goal: a compounding flywheel.
Horizon 3 · Later (9–24 mo): agentic capabilities for "do" tasks; proprietary data & models as a moat; platform & ecosystem plays; new category positioning; re-architect on the next-gen model class. Goal: a durable category position.
Fig. 11 — Three horizons. Most teams over-invest in H1, under-invest in H2, and mistake H3 for vision-deck décor.

13.2 Prioritization: RICE-AI

Classical RICE (Reach × Impact × Confidence ÷ Effort) needs two adjustments for AI work: a Defensibility multiplier and a (1 − Risk) discount:

Score = (Reach × Impact × Confidence × Defensibility × (1 − Risk)) ÷ Effort
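The formula transcribes directly; the inputs are still the PM's own estimates, so treat the score as a sorting aid, not a verdict. A minimal sketch:

```python
# Direct transcription of the RICE-AI score above. Reach in users per quarter,
# Impact on a 0–3 scale, Confidence / Defensibility / Risk in 0–1, Effort in person-weeks.
def rice_ai(reach: float, impact: float, confidence: float,
            defensibility: float, risk: float, effort: float) -> float:
    return (reach * impact * confidence * defensibility * (1 - risk)) / effort

# Example: rice_ai(5_000, 2, 0.8, 0.6, 0.2, 8) ≈ 480
```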

14 Go-to-Market & Adoption

AI products fail in market for one reason more than any other: users don't know when to trust them. GTM solves trust before it sells capability.

14.1 Adoption curve archetypes

14.2 The trust ladder

Suggest (user decides everything) → Draft (user reviews & edits) → Execute, approved (user pre-approves classes of action) → Autonomous (system acts, audited after). Trust is earned over time, rung by rung.
Fig. 12 — Trust ladder. Most B2B AI products should ship at “Draft” and graduate users to “Execute” per category as quality is proven.
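One way to make the ladder operational is to treat autonomy as configuration per action class, promoted only when its eval pass-rate clears a bar. A sketch with illustrative names and thresholds:

```python
# Sketch of the trust ladder as per-action configuration. Action names, starting
# levels, and the 0.98 bar are assumptions for illustration.
LADDER = ["suggest", "draft", "execute_approved", "autonomous"]

AUTONOMY = {
    "summarize_ticket": "draft",
    "send_customer_email": "suggest",       # highest-risk action starts lowest
    "update_crm_field": "execute_approved",
}

def promote(action: str, eval_pass_rate: float, bar: float = 0.98) -> str:
    level = AUTONOMY[action]
    if eval_pass_rate >= bar and level != "autonomous":
        AUTONOMY[action] = LADDER[LADDER.index(level) + 1]  # move up exactly one rung
    return AUTONOMY[action]
```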

14.3 GTM motions that work

15 Common Pitfalls & Anti-Patterns

Demo-driven development. Shipping the demo as the product. The demo path is curated; production isn't. Build for the long tail from day one.
No eval, no progress. Without a golden set, every prompt change is a guess. Subjective “feels better” is not engineering.
Cost discovered at scale. Pricing locked before unit economics modeled. Then growth = losses.
UX without uncertainty. Confident-looking output with no source, no confidence, no undo. Each error erodes the whole product.
Capability fishing. “What can we do with AI?” is the wrong question. “What painful job is now possible?” is the right one.
One big launch. AI products improve by living in production. Gradual rollouts >> big bangs.
Outsourcing risk. “Legal will handle it.” They won't. Risk class belongs on page 1 of the PRD.
Frontier dependency without exit. No model abstraction, no routing, no second vendor. One vendor incident = product outage.

16 The 30 / 60 / 90-Day Plan for a New AI PM

Day 30 · Learn (listen & map): interview 10 users and 5 internal stakeholders; audit current AI features & cost; read all complaints & CSAT tags; run a model-fluency self-audit; inventory the eval state (probably: none); map risk class & legal status; identify where the data flywheel breaks. Output: a one-page diagnosis.
Day 60 · Frame (strategy & bets): fill the 7-box canvas; pick the "one job, done better"; build the golden eval set (50–200 cases); define the North-Star + guardrails; align on RACI & pod composition; lock the cost-per-task budget; draft the 3-horizon roadmap. Output: a signed-off strategy & PRD.
Day 90 · Ship (deliver & instrument): ship one improvement to production; wire feedback signals into the eval set; run the first red-team; establish a weekly quality review; publish the first internal trust report; set Q+1 OKRs grounded in data; codify the inner-loop cadence. Output: a live loop with a measurable Δ in quality.
Fig. 13 — A 90-day plan biased toward learning before bets, and bets before shipping.

17 Tooling Stack Reference (2026)

Categories worth knowing, with archetypes. Tool choices change yearly; categories don't.

Category | What it does | Archetypes
Frontier model APIs | Hosted LLMs / multimodal | Claude · GPT · Gemini
Open-weights serving | Self-host models | vLLM · TGI · Ollama
Orchestration | Chains, agents, tool use | LangChain · LlamaIndex · in-house
Vector DB | Embeddings & retrieval | pgvector · Pinecone · Weaviate · Qdrant
Evals & observability | Test, monitor, alert | Braintrust · LangSmith · Arize · Helicone
Prompt management | Version, A/B, deploy prompts | PromptLayer · in-house registry
Annotation / labeling | Ground-truth creation | Label Studio · Scale · Surge
Guardrails / safety | Filter, classify, redact | Llama Guard · NeMo Guardrails · in-house
Cost / routing | Smart model selection | LiteLLM · Martian · in-house router
Governance | Risk register, model cards | Internal wiki · Credo AI · ModelOp

18 Closing: The Durable AI Product Manager

Models and vendors churn fast. Judgment doesn’t: framing the problem, setting the bar, designing for failure, and running a compounding quality loop.

Be skeptical of the model. Be relentless about the user. Be precise about the bar. Be honest about the risk. Ship.

Nail that, and you beat most roadmaps full of clever features. The stack will move; the job of the PM won’t.

One-page summary · (1) Probabilistic product = distribution, not feature · (2) Evals are the spec · (3) UX absorbs uncertainty · (4) Data flywheel or no moat · (5) Risk class on page 1 · (6) Cost is a feature · (7) Trust earned in rungs, not leaps · (8) The PM owns the seams.

19 References & sources

Below is the reading list behind this note: KPI sources (§1), discovery and metrics (§6–8), evals and RAG (§7, §10), risk (§9), economics and delivery (§11–13). Survey numbers change every release—pull the publisher’s current PDF before you cite a statistic.

For formal citations, prefer stable URLs and publisher PDFs. arXiv preprints are fine where that’s the canonical version; vendor docs are for day-to-day work, not your bibliography.

Industry context & adoption (KPI strip, §1)

  1. McKinsey & Company, “The state of AI.” Annual survey series—enterprise adoption and economics. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai
  2. Boston Consulting Group, AI maturity / value surveys (e.g., AI maturity index materials). Use the year labeled on the report you download.
  3. Gartner research on AI deployment and pilot-to-scale rates—access via subscription or public summaries; verify headline stats against the underlying note.
  4. Andreessen Horowitz (a16z), enterprise AI surveys & data infrastructure essays. Useful directional context for B2B AI GTM (§14).

Product discovery, strategy & UX metrics (§2–8)

  1. Cagan, Inspired: How to Create Tech Products Customers Love. Wiley—product discovery and empowered teams; maps to lifecycle §3–6.
  2. Ries, The Lean Startup. Crown Business—validated learning and experiment design.
  3. Torres, Continuous Discovery Habits. Product Talk—interview cadence and opportunity mapping (§6).
  4. Olsen, The Lean Product Playbook. Wiley—hypothesis-driven PM process.
  5. Rodden et al., “Measuring the User Experience on a Large Scale” (HEART framework). Google / CHI lineage—maps to §8 HEART.
  6. Christensen et al., “Know Your Customers’ Jobs to Be Done.” Harvard Business Review, 2016—JTBD framing for “one job, done better.”

Software delivery, teams & operating models (§12–13)

  1. Forsgren, Humble & Kim, Accelerate—DORA metrics and delivery performance. IT Revolution.
  2. Skelton & Pais, Team Topologies—stream-aligned teams vs platform; RACI adjacency in §12.

Machine learning systems & AI product engineering (§5, §7, §10)

  1. Huyen, Designing Machine Learning Systems. O’Reilly—data, deployment, and monitoring vocabulary for PMs working with engineers.
  2. Lewis et al., “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” NeurIPS 2020—RAG baseline. https://arxiv.org/abs/2005.11401
  3. Es et al., “Ragas: Automated Evaluation of Retrieval Augmented Generation.” 2023—RAG eval metrics named in tooling discussions. https://arxiv.org/abs/2309.15217
  4. Zheng et al., “Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.” NeurIPS 2023—human-calibrated judging. https://arxiv.org/abs/2306.05685
  5. Hu et al., “LoRA: Low-Rank Adaptation of Large Language Models.” ICLR 2022—fine-tuning trade-space in §10. https://arxiv.org/abs/2106.09685

Safety, security & AI governance (§9)

  1. European Union, Artificial Intelligence Act (Regulation (EU) 2024/1689). Primary legal text, available via EUR-Lex.
  2. NIST, AI Risk Management Framework (AI RMF 1.0). https://www.nist.gov/itl/ai-risk-management-framework
  3. ISO/IEC 42001—AI management system standard (purchase from ISO or national body).
  4. OWASP Top 10 for Large Language Model Applications. OWASP project page
  5. OECD AI Principles—high-level policy framing. https://oecd.ai/en/ai-principles

Pricing, platforms & economics of information (§11, §14)

  1. Christensen, The Innovator’s Dilemma. Harvard Business Review Press—incumbent dynamics when new capabilities (e.g., AI) reshape value chains.
  2. Shapiro & Varian, Information Rules: A Strategic Guide to the Network Economy. Harvard Business Press—versioning, bundling, and metering strategies; useful mental model for token / usage pricing.

Reliability & cost discipline (§11)

  1. Google, Site Reliability Engineering (free online). O’Reilly / Google—SLOs and error budgets when AI features share production services.
  2. Dean & Barroso, “The Tail at Scale.” Communications of the ACM, 2013—why tail latency matters for AI UX.
KPI strip at the top: Re-check McKinsey, BCG, Gartner, and a16z figures against each publisher’s latest report before you drop them in a deck—headlines and years shift.