Single-page notes on the data lifecycle, familiar ethics pillars, EU AI Act / GDPR / NIST / ISO frames (as engineers actually use them), controls, eval habits, and checklists. I reach for this in design reviews—not as legal advice, but so “ethics” becomes tickets and tests instead of a vague slide.
Source: if you redistribute this file, link LinhTruong.com so the copy people get still shows authorship.
For me, ethics stopped being “someone else’s slide deck” once it collided with audits, DPIAs, and release gates. Regulators can pursue serious penalties under frameworks like the EU AI Act; GDPR still bites on sloppy data handling; buyers ask for model cards; a bad launch is a trust problem, not just a metric blip. What follows is the map I use: controls, lifecycle, architecture sketches, and checklists—so “responsible AI” shows up as code and reviews, not vibes.
Non-compliant systems face fines, takedowns, and reputational harm.
Documented provenance and evaluation are now a procurement requirement.
When fairness, safety, and drift get the same care as latency, I see fewer ugly surprises after launch.
Reusable guardrails accelerate, not slow, future releases.
Software engineers are the last mile of ethical AI. Policies become real only when expressed as code: data filters, eval gates, RBAC rules, audit logs, content-safety checks, retention jobs, and rollback triggers. Treat ethics as a non-functional requirement on equal footing with availability, latency, and security.
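As one concrete instance of an "eval gate," here is a minimal sketch a CI job could run before promoting a model. The metric names, floor values, and results file are assumptions for illustration, not a standard.

# Minimal CI eval gate (illustrative; metric names, floors, and file path are assumptions)
import json, sys

FLOORS = {"helpfulness": 0.80, "toxicity_block_rate": 0.98}   # minimum scores required to ship

def gate(path="eval_results.json"):                            # written by an upstream eval job
    results = json.load(open(path))
    failures = {k: results.get(k, 0.0) for k, floor in FLOORS.items()
                if results.get(k, 0.0) < floor}
    if failures:
        print(f"Eval gate failed: {failures}", file=sys.stderr)
        sys.exit(1)                                            # blocks the release pipeline

if __name__ == "__main__":
    gate()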
OECD, UNESCO, NIST, and EU High-Level Expert Group language mostly lines up on six themes. I use them as a rubric when I'm sanity-checking a system—not as a substitute for your counsel's risk tiering.
Ethical risk is introduced, and must be mitigated, at every stage of the pipeline. The table below maps the eight canonical stages to the controls a software engineer is expected to implement at each; a sketch of one such control follows the table.
| Stage | Risk | Engineering control |
|---|---|---|
| 1. Framing | Solving the wrong problem; harmful use case | Algorithmic Impact Assessment (AIA); proportionality review; “should we build it?” memo |
| 2. Collection | Unlawful, biased, or non-consensual sourcing | Source whitelist; consent capture; robots.txt & ToS check; provenance signatures (C2PA) |
| 3. Labeling | Annotator bias; harmful labor practices | Annotator guidelines; agreement metrics (Krippendorff α); fair-pay attestation |
| 4. Training | Memorization of PII; poisoning | De-duplication; PII scrubbing; differential privacy; checkpoint signing |
| 5. Evaluation | Hidden subgroup failures | Disaggregated metrics; red-teaming; jailbreak suites; capability evals |
| 6. Deployment | Misuse, hallucination, prompt injection | System-prompt hardening; output filters; rate limits; user-visible disclosure |
| 7. Monitoring | Drift, regression, emerging harms | Production fairness dashboards; user-feedback loop; incident hotline |
| 8. Decommission | Stale models causing harm; retention violations | Sunset policy; deletion / unlearning workflows; archive of lineage |
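To make one row concrete: the stage-4 de-duplication control can start as a simple exact-match pass. This is a toy sketch only; real pipelines typically layer near-duplicate detection (e.g. MinHash) on top.

# Toy exact-dedup pass over a text corpus (stage 4 above); illustrative only
import hashlib

def dedup(texts):
    seen, kept = set(), []
    for text in texts:
        h = hashlib.sha256(" ".join(text.lower().split()).encode()).hexdigest()  # normalize, then hash
        if h not in seen:
            seen.add(h)
            kept.append(text)
    return kept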
Most regulated AI products in 2026 share a similar stack, where each layer pairs a capability with the controls that make it ethical and auditable.
Multiple regimes now apply concurrently. Engineers do not need to be lawyers, but must know which controls satisfy which regime so technical decisions don't undo legal posture.
| Regulation | Scope | Status (May 2026) | Engineer-facing requirements |
|---|---|---|---|
| EU AI Act | Risk-tiered AI systems sold/used in EU | In force; high-risk obligations apply from Aug 2026; GPAI obligations active | Risk classification, technical docs (Annex IV), logging, human oversight, post-market monitoring, conformity assessment |
| GDPR | Personal data of EU residents | In force | Lawful basis, DPIA, data minimization, Art. 22 automated-decision rights, deletion/portability |
| NIST AI RMF 1.0 | US guidance, gov & contractors | Voluntary, widely adopted | Map · Measure · Manage · Govern functions |
| ISO/IEC 42001 | AI management system standard | Certification track | AIMS policies, controls, internal audit, continual improvement |
| ISO/IEC 23894 | AI risk management guidance | Reference | Risk identification, treatment, monitoring |
| Colorado AI Act / NYC LL144 | US state & municipal AI laws | In force | Bias audits, candidate notice, impact assessments |
| UK AI Bill (2026) | UK domestic regime | Sectoral guidance from regulators | Transparency, safety testing for frontier models |
| HIPAA / FDA SaMD | US health AI | In force | Validation, change-control, predetermined change-control plans (PCCP) |
Bias is rarely a single bug. It is the cumulative effect of which data was collected, who labeled it, what objective was optimized, and who used the output. Engineers must measure it explicitly — disaggregated metrics, not aggregate accuracy.
No single metric satisfies all fairness definitions simultaneously; choose deliberately and document the trade-off.
# Disaggregated evaluation skeleton (illustrative; assumes df has y_true, y_pred, y_prob columns)
from sklearn.metrics import accuracy_score, recall_score, confusion_matrix

metrics = {}
for group in df["protected_attr"].unique():
    sub = df[df["protected_attr"] == group]
    tn, fp, fn, tp = confusion_matrix(sub.y_true, sub.y_pred, labels=[0, 1]).ravel()
    metrics[group] = {
        "n": len(sub),
        "accuracy": accuracy_score(sub.y_true, sub.y_pred),
        "tpr": recall_score(sub.y_true, sub.y_pred),                      # true-positive rate
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,                      # false-positive rate
        "calibration_ece": expected_calibration_error(sub.y_true, sub.y_prob),  # assumed project helper
    }

def max_gap(metrics, key):
    vals = [m[key] for m in metrics.values()]
    return max(vals) - min(vals)                                          # largest gap across groups

assert max_gap(metrics, "tpr") < 0.05, "TPR disparity exceeds threshold"
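The 0.05 gap used here is a placeholder, not a standard; pick the threshold together with the fairness definition you chose, and record the trade-off alongside the model card.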
AI systems inherit classical software risk and add a new attack surface (model behavior). Map your threats with frameworks like MITRE ATLAS and the OWASP Top 10 for LLM Applications.
| Threat | Where it hits | Mitigation |
|---|---|---|
| Prompt injection (direct & indirect) | LLM input from users / retrieved docs | Content provenance tagging, instruction-isolation, allowlisted tools, output validation |
| Training-data poisoning | Crawled or third-party corpora | Source vetting, anomaly detection, signed datasets, dedup & canaries |
| Model extraction / theft | Public APIs | Rate limits, watermarking, query budgets, auth |
| Membership inference | Models trained on PII | DP-SGD, regularization, output perturbation |
| Jailbreaks | Aligned models | System-prompt hardening, classifier ensembles, refusal training, red-team CI |
| Tool/agent abuse | Autonomous agents | Capability tokens, human approval for high-impact actions, sandboxing |
| Supply-chain | Models/weights from hubs | SBOM, hash pinning, provenance attestations (SLSA, Sigstore) |
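A minimal sketch of the "output validation" and "allowlisted tools" mitigations above: a model-proposed action is accepted only if it parses and names a permitted tool. The schema and tool names are assumptions.

# Illustrative output-validation gate for model-proposed tool calls
import json

ALLOWED_TOOLS = {"search_docs", "summarize"}           # hypothetical allowlist

def validate_action(model_output: str) -> dict:
    try:
        action = json.loads(model_output)              # require structured output
    except json.JSONDecodeError:
        raise ValueError("model output is not valid JSON; refusing to act")
    if action.get("tool") not in ALLOWED_TOOLS:
        raise ValueError(f"tool {action.get('tool')!r} is not allowlisted")
    if not isinstance(action.get("args"), dict):
        raise ValueError("tool args must be an object")
    return action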
Model cards: short, structured documentation of intended use, limits, evaluation, and ethical considerations.
Datasheets for datasets: source, collection method, motivation, composition, preprocessing, distribution.
Content provenance (C2PA): cryptographic provenance for generated media; required for many EU "limited-risk" disclosure cases.
Explainability methods: feature attributions (SHAP, IG), example-based (influence functions), surrogate models, chain-of-thought traces (with care).
Article 22 GDPR & AI Act: meaningful info about logic, significance, consequences.
Append-only logs: model version, prompt, context, output, user, override, justification.
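A minimal append-only log record matching the fields listed above; the schema and storage are illustrative, and real deployments should enforce immutability in the storage layer rather than in application code.

# Illustrative append-only decision-log writer
import json, hashlib
from datetime import datetime, timezone

def log_decision(path, *, model_version, prompt, context_ids, output, user,
                 override=False, justification=None):
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "prompt": prompt,
        "context_ids": context_ids,        # reference retrieved docs by id to keep PII out of logs
        "output": output,
        "user": user,
        "override": override,
        "justification": justification,
    }
    record["hash"] = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    with open(path, "a") as f:             # append-only by convention; back with WORM storage
        f.write(json.dumps(record) + "\n")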
The EU AI Act mandates "effective human oversight" for high-risk systems. That is concrete: a person must be able to interpret the output, override it, and stop the system.
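A sketch of what that can look like in code, assuming a review queue and an operator-controlled stop; the confidence threshold and the notion of "high impact" are policy choices, not constants.

# Illustrative human-oversight gate: interpret, override, stop
import queue, threading

kill_switch = threading.Event()            # an operator can stop automated decisioning
review_queue = queue.Queue()               # items awaiting human interpretation / override

def apply_decision(pred, confidence, high_impact):
    if kill_switch.is_set():
        raise RuntimeError("AI decisioning halted by operator")
    if high_impact or confidence < 0.80:   # defer instead of auto-applying
        review_queue.put(pred)
        return None                        # a human decides; their call is logged as an override
    return pred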
| Artifact | Purpose | Owner | Trigger |
|---|---|---|---|
| AI Use-Case Intake Form | Capture purpose, data, risk class | Product + Engineering | New AI feature proposed |
| Algorithmic Impact Assessment | Identify affected populations & harms | Ethics/Policy + Eng | Before training |
| DPIA | Privacy risk analysis | DPO + Eng | Personal data involved |
| Datasheet | Document dataset | Data Eng | Per dataset version |
| Model Card | Document model | ML Eng | Per model release |
| Risk Register | Track top risks & status | AI Governance | Continuous |
| Annex IV Tech File | EU AI Act conformity dossier | Eng Lead + Legal | High-risk systems |
| Post-market Monitoring Plan | Detect & respond to harms | SRE + ML Ops | At launch |
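A minimal model-card skeleton covering the fields described earlier (intended use, limits, evaluation, ethical considerations); every value below is illustrative.

# Illustrative model-card skeleton; serialize to YAML/JSON and version it with the model
MODEL_CARD = {
    "model": {"name": "example-classifier", "version": "1.4.0", "owner": "ml-eng"},
    "intended_use": "support-ticket triage; not for employment, credit, or health decisions",
    "limitations": ["English only", "quality degrades on very short inputs"],
    "training_data": {"datasheet": "datasets/tickets-2025q4/DATASHEET.md"},
    "evaluation": {"disaggregated_by": ["language", "region"], "tpr_gap_max": 0.05},
    "ethical_considerations": ["no PII retained in logs", "human review required for account closure"],
    "contact": "ai-governance@example.com",
}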
| Activity | Engineer | ML Lead | Product | Legal/DPO | AI Governance |
|---|---|---|---|---|---|
| Risk classification | C | C | R | A | C |
| Dataset approval | R | A | C | C | I |
| Model card | C | R/A | C | I | I |
| Bias evaluation | R | A | C | I | C |
| Production guardrails | R/A | C | C | I | I |
| Incident response | R | C | C | C | A |
AI incidents differ from classic outages: nothing is broken, the system is just wrong, harmful, or unsafe. Build an AI-specific IR playbook.
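One building block of such a playbook is a trigger that opens an incident on harmful behavior rather than on downtime; a rolling-rate sketch, with the window, threshold, and breach hook as assumptions.

# Illustrative harm-rate trigger for post-launch monitoring
from collections import deque

class HarmRateMonitor:
    def __init__(self, window=1000, threshold=0.01, on_breach=lambda rate: None):
        self.events = deque(maxlen=window)
        self.threshold = threshold
        self.on_breach = on_breach         # e.g. page on-call, freeze the feature flag, start rollback

    def record(self, harmful: bool):
        self.events.append(bool(harmful))
        rate = sum(self.events) / len(self.events)
        if len(self.events) == self.events.maxlen and rate > self.threshold:
            self.on_breach(rate)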
| Need | Open-source / standards | Notes |
|---|---|---|
| Fairness metrics | Fairlearn, AIF360, What-If Tool | Use disaggregated metrics, not a single number |
| Explainability | SHAP, Captum, InterpretML | Match technique to model class & audience |
| Privacy | Opacus (DP-SGD), TensorFlow Privacy, Presidio (PII) | Validate ε budget & downstream utility |
| Eval frameworks | lm-eval-harness, HELM, BIG-bench, Inspect | Add internal red-team suite |
| Guardrails | NeMo Guardrails, Guardrails AI, LLM Guard | Defense-in-depth, not single-layer |
| Provenance | C2PA, Sigstore, SLSA | Sign artifacts and content |
| Lineage / catalog | OpenLineage, DataHub, Unity Catalog | Required for "explain why" requests |
| Risk & governance | NIST AI RMF Playbook, ISO/IEC 42001 controls | Map controls to existing SOC2/ISO 27001 |
Diagrams and narrative in §1–§17 are my synthesis for engineering readers; §18 lists primary standards and papers to verify details. Author: Linh Truong · LinhTruong.com.
Educational engineering reference only—not legal, compliance, or security advice for your jurisdiction. Check current legal texts and qualified counsel before you rely on penalty figures or risk labels in production decisions.
© 2026 Linh Truong · Single-file HTML · print-friendly