ML-PRO Cheatsheet — Production ML on Databricks (Features, Registry, Deployment, Monitoring)

Last-mile ML-PRO review: feature pipeline patterns, MLflow registry and promotion workflows, batch vs online deployment pickers, monitoring/drift decision rules, and governance essentials.

Use this for last-mile review. Pair it with the Syllabus for coverage and with Practice to validate production judgment.


1) The “production ML loop” (what the exam is testing)

    flowchart LR
      FE["Feature pipeline"] --> TR["Train + evaluate"]
      TR --> RUN["MLflow run (params/metrics/artifacts)"]
      RUN --> REG["Registry version"]
      REG --> DEP["Deploy (batch/online)"]
      DEP --> MON["Monitor + drift"]
      MON -->|retrain| FE

Exam rule: if a solution lacks versioning, lineage, or rollback, it’s rarely correct.
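
A minimal sketch of the run → registry leg of this loop, assuming scikit-learn and a reachable MLflow tracking server; the model name `churn_model` and the toy dataset are illustrative:

    # Minimal sketch of the run -> registry step; names are illustrative.
    import mlflow
    import mlflow.sklearn
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1_000, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    with mlflow.start_run() as run:
        model = RandomForestClassifier(n_estimators=200, random_state=42)
        model.fit(X_train, y_train)

        # Params, metrics, and the model artifact live on the run.
        mlflow.log_param("n_estimators", 200)
        mlflow.log_metric("test_auc", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
        mlflow.sklearn.log_model(model, artifact_path="model")

    # Only the registry version is a releasable, auditable artifact.
    version = mlflow.register_model(f"runs:/{run.info.run_id}/model", "churn_model")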


2) Feature pipelines: consistency beats cleverness

| Risk | Symptom | Mitigation |
| --- | --- | --- |
| Training/serving skew | production metrics collapse | shared transforms; enforce schema (sketch below) |
| Leakage | unrealistically good offline metrics | time-aware splits; careful feature design |
| Drift | model degrades over time | monitor distributions and outcomes |
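
One way to keep transforms shared between training and serving is to package preprocessing inside the logged model, so the exact same fitted transform runs at inference time. A minimal sketch, assuming scikit-learn's Pipeline and MLflow; feature names are hypothetical:

    # Sketch: bundle preprocessing with the model so training and serving
    # apply the identical fitted transform (avoids skew). Names are hypothetical.
    import mlflow
    import mlflow.sklearn
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    df = pd.DataFrame({"tenure": [1, 12, 40, 6], "spend": [9.5, 20.0, 3.2, 14.1], "label": [0, 1, 0, 1]})
    X, y = df[["tenure", "spend"]], df["label"]

    pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
    pipe.fit(X, y)

    with mlflow.start_run():
        # The fitted scaler travels inside the model artifact, so batch/online
        # serving reuses it instead of re-implementing the transform.
        # input_example lets MLflow capture the expected input schema.
        mlflow.sklearn.log_model(pipe, artifact_path="model", input_example=X.head(2))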

3) Registry and release workflows (high-yield)

| Concept | Why it matters |
| --- | --- |
| Registry versions | stable, auditable artifacts |
| Stage transitions | controlled promotion and rollback |
| Approval gates | reduce “accidental production” |

One-sentence heuristic: runs are for experiments, the registry is for releases.
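
A minimal promotion sketch using the MLflow client, assuming a registered model named `churn_model` with a version 3 already created. Stage transitions are the classic workflow; recent MLflow versions also offer aliases for the same purpose:

    # Sketch: promote registry version 3 of "churn_model" (names illustrative).
    from mlflow.tracking import MlflowClient

    client = MlflowClient()

    # Classic stage-based promotion; archiving the previous Production version
    # keeps rollback to a single transition back.
    client.transition_model_version_stage(
        name="churn_model",
        version="3",
        stage="Production",
        archive_existing_versions=True,
    )

    # Alias-based promotion (newer MLflow): point a named alias at a version.
    client.set_registered_model_alias("churn_model", alias="champion", version="3")

Serving code that loads `models:/churn_model/Production` (or `models:/churn_model@champion` with aliases) makes rollback a registry operation rather than a redeploy.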


4) Deployment pickers (batch vs online)

| Requirement | Prefer | Why |
| --- | --- | --- |
| Low latency per request | Online serving | request/response |
| High-throughput scoring | Batch inference | cost-efficient (sketch below) |
| Model updates frequently | Managed rollout + rollback | reduce risk |
| Strict governance | Versioned registry releases | auditability |
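
For the batch row above, a common Databricks pattern is wrapping the registered model as a Spark UDF and scoring a feature table in one pass. A minimal sketch; the table, key column, and model names are illustrative:

    # Sketch: batch scoring with a registered MLflow model as a Spark UDF.
    # The feature table, key column, and model name are illustrative.
    import mlflow.pyfunc
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    features = spark.table("ml.churn_features")

    model_udf = mlflow.pyfunc.spark_udf(
        spark,
        model_uri="models:/churn_model/Production",  # pin to a registry release
        result_type="double",
    )

    feature_cols = [c for c in features.columns if c != "customer_id"]
    scored = features.withColumn("prediction", model_udf(*[features[c] for c in feature_cols]))
    scored.write.mode("overwrite").saveAsTable("ml.churn_scores")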

5) Monitoring decision rules (what to do when metrics drop)

| Observation | First question | Likely action |
| --- | --- | --- |
| Gradual degradation | data drift? seasonality? | retrain / update features (drift sketch below) |
| Sudden drop | pipeline break? schema change? | rollback or fix upstream |
| Only one segment affected | sampling bias? | segment monitoring + targeted fix |
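
Distribution checks do not need to be elaborate to support these decisions. A minimal per-feature drift sketch using a two-sample Kolmogorov–Smirnov statistic; the threshold and synthetic data are illustrative, not prescriptive:

    # Sketch: per-feature drift check against a training baseline.
    # The 0.1 threshold and the synthetic data are illustrative only.
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(0)
    baseline = {"tenure": rng.normal(24, 8, 5_000), "spend": rng.normal(50, 15, 5_000)}
    current = {"tenure": rng.normal(24, 8, 5_000), "spend": rng.normal(65, 15, 5_000)}  # shifted

    for feature in baseline:
        stat, p_value = ks_2samp(baseline[feature], current[feature])
        flag = "DRIFT" if stat > 0.1 else "ok"
        print(f"{flag:5s} {feature}: KS={stat:.3f}, p={p_value:.3g}")

On Databricks, Lakehouse Monitoring can compute comparable drift metrics over Delta tables; the sketch above is only meant to show the decision rule, not the tooling.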

6) Fast troubleshooting pickers

  • Can’t reproduce model: missing data/code versioning, randomness, or preprocessing mismatch (see the sketch after this list).
  • Model works in staging, fails in prod: feature skew, missing preprocessing, wrong schema.
  • Drift alarms firing: confirm data pipeline changes and validate feature distributions.
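
Much of the “can’t reproduce” case comes down to recording enough context on the run. A minimal sketch, assuming a Delta-versioned feature table and a git commit hash exposed via an environment variable; tag names and values are illustrative:

    # Sketch: record the levers that make a run reproducible.
    # Tag names, the table path, and the env var are illustrative.
    import os
    import random

    import mlflow
    import numpy as np

    SEED = 42
    random.seed(SEED)
    np.random.seed(SEED)

    with mlflow.start_run():
        mlflow.log_param("seed", SEED)
        mlflow.set_tag("data.table", "ml.churn_features")
        mlflow.set_tag("data.delta_version", "57")  # the Delta snapshot read for training
        mlflow.set_tag("code.git_commit", os.environ.get("GIT_COMMIT", "unknown"))
        # ... train, evaluate, and log the model here ...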