⭐ Air Quality Forecasting with AI/ML — Project Case Study
URL: https://TraceAQ.LinhTruong.com
🚀 Overview
TraceAQ is an AI‑powered air‑quality forecasting platform that uses NASA satellite data, historical pollution patterns, and machine learning models to predict PM2.5 and PM10 levels for the next 72 hours. The system provides real‑time monitoring, trend analysis, and natural‑language summaries to help users understand environmental risks.
Unlike static AQI dashboards, TraceAQ uses predictive modeling to forecast future air quality — enabling proactive decisions for health, outdoor activities, and environmental planning.
👨‍💻 My Role
- Architecture: Designed the data ingestion pipeline, ML forecasting engine, and visualization layer
- Data Engineering: Built ETL pipelines for NASA satellite data and EPA ground sensors
- AI/ML: Time‑series forecasting, feature engineering, model evaluation
- Frontend: Interactive charts, AQI visualizations, and natural‑language summaries
- DevOps: Scheduling, automation, deployment, monitoring
🧩 Problem & Constraints
Air quality is dynamic and influenced by:
- Weather patterns
- Wildfires
- Seasonal changes
- Industrial activity
- Geographic factors
Most AQI dashboards show only current conditions. Users need forecasts to plan ahead.
Constraints:
- Must ingest large satellite datasets (NASA MODIS/VIIRS)
- Must combine satellite + ground sensor data
- Must run forecasting models daily
- Must present results in a clean, understandable UI
- Must generate natural‑language summaries for non‑technical users
🏗️ 1. System Architecture Diagram
┌───────────────────────────────┬───────────────────────────────┬───────────────────────────────┐
│ Frontend                      │ API Gateway Layer             │ Visualization Layer           │
│ (AQI Charts + Forecast UI)    │ (Auth, Routing, Caching)      │ (Graphs, Maps, Summaries)     │
└───────────────┬───────────────┴───────────────┬───────────────┴───────────────┬───────────────┘
                │                               │                               │
                ▼                               ▼                               ▼
┌──────────────────────────┐    ┌──────────────────────────┐    ┌──────────────────────────┐
│ Data Ingestion Engine    │    │ Feature Engineering      │    │ Forecasting Engine       │
│ (NASA + EPA + Weather)   │    │ (Cleaning + Merging)     │    │ (ML Models: LSTM/ARIMA)  │
└───────────────┬──────────┘    └───────────────┬──────────┘    └───────────────┬──────────┘
                │                               │                               │
                └───────────────┬───────────────┴───────────────┬───────────────┘
                                ▼                               ▼
                ┌──────────────────────────┐    ┌──────────────────────────┐
                │ Data Warehouse           │    │ LLM Summary Layer        │
                │ (Historical AQI + Feats) │    │ (Natural-Language Output)│
                └──────────────────────────┘    └──────────────────────────┘
🔄 2. Data Flow Diagram
┌──────────────────────────┐
│ 1. Fetch Satellite Data  │
│    (NASA MODIS/VIIRS)    │
└───────────────┬──────────┘
                ▼
┌──────────────────────────┐
│ 2. Fetch Ground Sensors  │
│    (EPA AQI Stations)    │
└───────────────┬──────────┘
                ▼
┌──────────────────────────┐
│ 3. Merge + Clean Data    │
│ (Interpolation, Scaling) │
└───────────────┬──────────┘
                ▼
┌──────────────────────────┐
│ 4. Feature Engineering   │
│    (Weather, Seasonality)│
└───────────────┬──────────┘
                ▼
┌──────────────────────────┐
│ 5. Train/Run Forecasting │
│    Models (LSTM/ARIMA)   │
└───────────────┬──────────┘
                ▼
┌──────────────────────────┐
│ 6. Store Predictions     │
│    in Warehouse          │
└───────────────┬──────────┘
                ▼
┌──────────────────────────┐
│ 7. LLM Generates Summary │
│    (Readable Forecast)   │
└───────────────┬──────────┘
                ▼
┌──────────────────────────┐
│ 8. UI Displays Charts    │
│    + Narrative Insight   │
└──────────────────────────┘
🔁 3. Workflow Diagram (Daily Forecast Cycle)
┌──────────────────────────┐
│ 1. Daily Scheduler       │
│    Triggers Forecast     │
│    Pipeline              │
└───────────────┬──────────┘
                ▼
┌──────────────────────────┐
│ 2. Download Satellite    │
│    + Sensor Data         │
└───────────────┬──────────┘
                ▼
┌──────────────────────────┐
│ 3. Clean + Merge Data    │
│    (Fill Gaps, Normalize)│
└───────────────┬──────────┘
                ▼
┌──────────────────────────┐
│ 4. Run ML Forecasting    │
│    (Next 72 Hours)       │
└───────────────┬──────────┘
                ▼
┌──────────────────────────┐
│ 5. Store Predictions     │
│    in Warehouse          │
└───────────────┬──────────┘
                ▼
┌──────────────────────────┐
│ 6. LLM Generates         │
│    Narrative Summary     │
└───────────────┬──────────┘
                ▼
┌──────────────────────────┐
│ 7. UI Updates Charts     │
│    + Forecast Text       │
└──────────────────────────┘
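The "Fill Gaps, Normalize" step in the cycle above could be sketched roughly as follows. This is a minimal standard-library illustration, not TraceAQ's actual code: the function names `fill_gaps` and `min_max_scale`, the choice of linear interpolation with min-max scaling, and the sample readings are all assumptions made for the example.

```python
from typing import List, Optional


def fill_gaps(values: List[Optional[float]]) -> List[float]:
    """Linearly interpolate missing readings (None); edge gaps copy the nearest reading."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no valid readings")
    filled = list(values)
    # Leading and trailing gaps have only one neighbour, so reuse it.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps: straight line between the two surrounding readings.
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def min_max_scale(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up hourly PM2.5 readings with missing hours (illustrative only).
hourly_pm25 = [None, 10.0, None, None, 16.0, 20.0, None]
filled = fill_gaps(hourly_pm25)
scaled = min_max_scale(filled)
```

In a real pipeline the scaling bounds would normally be computed on training data and reused at forecast time, so that the model sees inputs on the same scale in both phases.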
🤖 4. “Forecasting Brain” Coordination Diagram
                ┌──────────────────────────┐
                │ Data Ingestion Engine    │
                │ (NASA + EPA + Weather)   │
                └───────────────┬──────────┘
                                │
                                ▼
                ┌──────────────────────────┐
                │ Feature Engineering      │
                │ (Lag Features, Trends)   │
                └───────────────┬──────────┘
                                │
                ┌───────────────┴───────────────┐
                ▼                               ▼
┌──────────────────────────┐    ┌──────────────────────────┐
│ Forecasting Models       │    │ Data Warehouse           │
│ (LSTM, ARIMA, Prophet)   │    │(Historical + Predictions)│
└───────────────┬──────────┘    └───────────────┬──────────┘
                │                               │
                ▼                               ▼
┌──────────────────────────┐    ┌──────────────────────────┐
│ LLM Summary Layer        │    │ Visualization Layer      │
│ (Readable Forecast Text) │    │ (Charts, AQI Colors)     │
└───────────────┬──────────┘    └───────────────┬──────────┘
                │                               │
                └───────────────┬───────────────┘
                                ▼
                ┌──────────────────────────┐
                │ TraceAQ Frontend         │
                │ (Forecast + Insights)    │
                └──────────────────────────┘
🎨 UX & Design Rationale
- Color‑coded AQI charts for instant readability
- Forecast timeline (next 72 hours) with trend arrows
- Natural‑language summaries for non‑technical users
- Mobile‑first layout for quick checks on the go
- Minimalist UI to keep focus on the forecast
🔬 Technical Deep Dive
Data Ingestion
- NASA MODIS/VIIRS aerosol optical depth (AOD)
- EPA AirNow ground sensor data
- Weather variables (humidity, wind, temperature)
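Satellite overpasses and ground-station readings rarely share timestamps, so combining the sources above needs some form of time alignment. The sketch below shows one possible approach, a nearest-timestamp join with a tolerance window. It is an illustration only: `nearest_value`, `merge_sources`, the 3-hour tolerance, and the sample values are assumptions, not details taken from the project.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

Reading = Tuple[datetime, float]


def nearest_value(sorted_readings: List[Reading], when: datetime,
                  tolerance: timedelta) -> Optional[float]:
    """Return the reading closest in time to `when`, or None if none is within tolerance."""
    times = [t for t, _ in sorted_readings]
    pos = bisect_left(times, when)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = sorted_readings[max(pos - 1, 0):pos + 1]
    if not candidates:
        return None
    t, value = min(candidates, key=lambda r: abs(r[0] - when))
    return value if abs(t - when) <= tolerance else None


def merge_sources(ground: List[Reading], satellite: List[Reading],
                  tolerance: timedelta = timedelta(hours=3)) -> List[Dict]:
    """Attach the nearest satellite AOD value to each ground PM2.5 reading."""
    satellite = sorted(satellite)
    return [
        {"time": t, "pm25": pm25, "aod": nearest_value(satellite, t, tolerance)}
        for t, pm25 in ground
    ]


# Made-up sample values: two ground readings and one satellite overpass.
ground = [(datetime(2024, 1, 1, 12), 14.2), (datetime(2024, 1, 1, 20), 18.5)]
satellite = [(datetime(2024, 1, 1, 13, 30), 0.21)]
rows = merge_sources(ground, satellite)
```

Ground readings with no overpass inside the window keep `aod` as `None`, which leaves the decision about imputing or dropping them to the cleaning step.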
Feature Engineering
- Lag features (1h, 3h, 6h, 24h)
- Rolling averages
- Seasonal patterns
- Weather‑pollution interactions
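As a rough illustration of the lag and rolling-average features listed above, the sketch below builds one feature row per hour from an hourly series. The lag set (1h, 3h, 6h, 24h) comes from the list; the function name `build_features`, the 3-hour rolling window, and the synthetic data are assumptions. A production pipeline would more likely use a dataframe library, but the logic is the same.

```python
from typing import Dict, List, Optional, Sequence


def build_features(series: Sequence[float], lags: Sequence[int] = (1, 3, 6, 24),
                   window: int = 3) -> List[Dict[str, Optional[float]]]:
    """One feature row per hour: lagged values plus a trailing rolling mean."""
    rows = []
    for i, value in enumerate(series):
        row: Dict[str, Optional[float]] = {"target": value}
        for lag in lags:
            # A lag reaching before the start of the series has no value yet.
            row[f"lag_{lag}h"] = series[i - lag] if i >= lag else None
        # The rolling mean uses only past hours, so the target never leaks in.
        past = series[max(i - window, 0):i]
        row[f"roll_mean_{window}h"] = (
            sum(past) / len(past) if len(past) == window else None
        )
        rows.append(row)
    return rows


# 30 synthetic hourly readings: 10.0, 11.0, ..., 39.0
hourly_pm25 = [float(v) for v in range(10, 40)]
features = build_features(hourly_pm25)
```

Rows near the start of the series contain `None` for lags that are not yet available; those rows are typically dropped before training.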
Forecasting Models
- LSTM for nonlinear temporal patterns
- ARIMA for baseline comparison
- Ensemble averaging for stability
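Ensemble averaging, as named in the list above, combines the per-hour outputs of several models into one forecast. The sketch below assumes each model has already produced a forecast over the same horizon; the function name `ensemble_average`, the optional weights, and the numbers are illustrative assumptions, and no claim is made about how TraceAQ weights its models.

```python
from typing import Dict, List, Optional


def ensemble_average(forecasts: Dict[str, List[float]],
                     weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Combine per-model forecasts hour by hour using a (weighted) mean."""
    if not forecasts:
        raise ValueError("need at least one model forecast")
    horizons = {len(f) for f in forecasts.values()}
    if len(horizons) != 1:
        raise ValueError("all forecasts must cover the same horizon")
    horizon = horizons.pop()
    # Equal weights unless the caller supplies some.
    weights = weights or {name: 1.0 for name in forecasts}
    total = sum(weights[name] for name in forecasts)
    return [
        sum(weights[name] * f[h] for name, f in forecasts.items()) / total
        for h in range(horizon)
    ]


# Made-up three-hour PM2.5 forecasts from two models.
forecasts = {"lstm": [12.0, 15.0, 20.0], "arima": [10.0, 13.0, 16.0]}
simple = ensemble_average(forecasts)
weighted = ensemble_average(forecasts, weights={"lstm": 3.0, "arima": 1.0})
```

Equal weighting is the simplest choice; weights derived from each model's validation error are a common refinement.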
LLM Summary Layer
- Converts raw predictions into:
  - Human‑readable summaries
  - Health recommendations
  - Trend explanations
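One way to feed a language model is to condense the numeric forecast into a few facts and ask for a short narrative. The sketch below only builds the prompt text and deliberately calls no LLM API, since the case study does not say which model or provider is used. The function name `build_summary_prompt`, the ±1 µg/m³ trend threshold, the prompt wording, and the location "Sample City" are all hypothetical.

```python
from statistics import mean
from typing import List


def build_summary_prompt(location: str, pm25_forecast: List[float]) -> str:
    """Condense an hourly PM2.5 forecast into a short prompt for a language model."""
    if not pm25_forecast:
        raise ValueError("forecast is empty")
    peak = max(pm25_forecast)
    peak_hour = pm25_forecast.index(peak) + 1  # hours from now, 1-based
    # Compare the mean of the second half with the first half to label the trend.
    half = len(pm25_forecast) // 2
    first = pm25_forecast[:half] or pm25_forecast
    second = pm25_forecast[half:]
    change = mean(second) - mean(first)
    trend = "rising" if change > 1 else "falling" if change < -1 else "steady"
    return (
        f"Location: {location}\n"
        f"Forecast horizon: {len(pm25_forecast)} hours\n"
        f"PM2.5 range: {min(pm25_forecast):.1f} to {peak:.1f} µg/m³\n"
        f"Peak expected in {peak_hour} h\n"
        f"Overall trend: {trend}\n"
        "Write a two-sentence plain-language summary with one health tip."
    )


# Made-up 72-hour forecast: clean for 36 hours, then a step up.
forecast = [10.0] * 36 + [30.0] * 36
prompt = build_summary_prompt("Sample City", forecast)
```

Passing pre-computed facts rather than the raw series keeps the prompt short and makes it easier to check that the generated text agrees with the numbers.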
Visualization Layer
- Interactive charts
- AQI color bands
- Forecast timeline
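The AQI color bands can be represented as a small lookup table. The sketch below uses the standard US AQI category ranges and color names; the names `AqiBand`, `AQI_BANDS`, and `band_for` are assumptions for illustration. It expects an AQI index value as input. Converting a PM2.5 concentration to an AQI value is a separate step that depends on EPA breakpoint tables and is not shown here.

```python
from typing import List, NamedTuple


class AqiBand(NamedTuple):
    upper: int      # highest AQI value that still falls in this band
    category: str
    color: str


# US AQI categories, ordered from cleanest to most polluted.
AQI_BANDS: List[AqiBand] = [
    AqiBand(50, "Good", "green"),
    AqiBand(100, "Moderate", "yellow"),
    AqiBand(150, "Unhealthy for Sensitive Groups", "orange"),
    AqiBand(200, "Unhealthy", "red"),
    AqiBand(300, "Very Unhealthy", "purple"),
    AqiBand(500, "Hazardous", "maroon"),
]


def band_for(aqi: int) -> AqiBand:
    """Return the band containing an AQI index value (not a raw concentration)."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for band in AQI_BANDS:
        if aqi <= band.upper:
            return band
    return AQI_BANDS[-1]  # anything above 500 is still displayed as Hazardous


# Color each point on a made-up forecast timeline.
timeline = [42, 88, 135, 180]
colors = [band_for(aqi).color for aqi in timeline]
```

Keeping the bands in one table means the chart legend, the timeline coloring, and the text summaries can all read from the same source.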
📈 Results & Impact
- Provides predictive air‑quality insights, not just real‑time readings
- Helps users plan outdoor activities and protect health
- Demonstrates end‑to‑end ML engineering (data → model → UI → narrative)
- Showcases the integration of satellite data, ML, and LLMs in a single system
- Serves as a portfolio piece for AI, data engineering, and environmental modeling