MLA-C01 Syllabus — Objectives by Domain

Blueprint-aligned learning objectives for AWS Certified Machine Learning Engineer — Associate (MLA-C01), organized by domain with quick links to targeted practice.

Use this syllabus as your source of truth for MLA-C01. Work through each domain in order and drill targeted sets after every task.

What’s covered

Domain 1: Data Preparation for Machine Learning (ML) (28%)

Practice this topic →

Task 1.1 - Ingest and store data

  • Choose appropriate data formats for ML (for example, Apache Parquet, JSON, CSV, ORC, Avro, RecordIO) based on access patterns and downstream training requirements.
  • Differentiate validated and non-validated data ingestion formats and identify when schema enforcement is required.
  • Select core AWS data sources for ML workloads (for example, Amazon S3, Amazon EFS, Amazon FSx for NetApp ONTAP) and explain common trade-offs.
  • Select AWS streaming data sources to ingest data (for example, Amazon Kinesis, Amazon Managed Service for Apache Flink, Apache Kafka) based on throughput, ordering, and latency constraints.
  • Explain AWS storage options for ML datasets and artifacts and choose between them based on cost, performance, and data structure.
  • Extract data from storage systems (for example, Amazon S3, Amazon EBS, Amazon EFS, Amazon RDS, Amazon DynamoDB) and identify AWS options that improve transfer or I/O performance (for example, S3 Transfer Acceleration, EBS Provisioned IOPS).
  • Design a simple data landing strategy (raw → curated) and explain why consistent partitioning and file layout improve training performance.
  • Merge data from multiple sources using appropriate approaches (for example, programming techniques, AWS Glue, Apache Spark) and validate schema compatibility.
  • Ingest and explore data with Amazon SageMaker Data Wrangler and export prepared datasets for model training.
  • Ingest features into Amazon SageMaker Feature Store and explain how a feature store supports feature reuse and consistency.
  • Differentiate offline and online feature access patterns in SageMaker Feature Store and select the correct approach for training vs inference.
  • Troubleshoot data ingestion and storage issues related to capacity and scalability (for example, throughput bottlenecks, partition hot keys, I/O limits).
  • Implement dataset versioning and repeatable data extraction to support reproducibility across experiments and audits.
  • Choose compression, object sizing, and partitioning strategies to balance read efficiency and storage cost for large datasets.
  • Prepare datasets for the training environment by staging data to the appropriate storage resource and access method (for example, file-based access vs object-based access).
  • Validate ingestion completeness and correctness by using basic checks (record counts, schema checks, and integrity checks) before handing data to transformation or training steps.
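
For example, a minimal sketch of these ingestion checks, assuming pandas with pyarrow/s3fs installed and using placeholder bucket, prefix, and column names:

```python
# Minimal ingestion sanity checks on a Parquet dataset staged in S3.
# Bucket, prefix, column names, and the row-count floor are placeholders.
import pandas as pd

EXPECTED_COLUMNS = {"customer_id", "event_ts", "amount", "label"}
MIN_EXPECTED_ROWS = 100_000  # lower bound reported by the source system

df = pd.read_parquet("s3://example-ml-raw/transactions/dt=2024-06-01/")

# Record-count check against what the upstream system reported.
assert len(df) >= MIN_EXPECTED_ROWS, f"Only {len(df)} rows ingested"

# Schema check: all expected columns are present.
missing = EXPECTED_COLUMNS - set(df.columns)
assert not missing, f"Missing columns: {missing}"

# Basic integrity checks: no null keys or labels.
assert df["customer_id"].notna().all(), "Null customer_id values found"
assert df["label"].notna().all(), "Null labels found"
print("Ingestion checks passed:", len(df), "rows")
```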

Task 1.2 - Transform data and perform feature engineering

  • Apply common data cleaning techniques (outlier detection and treatment, missing value imputation, combining datasets, and deduplication) and explain how they affect model performance.
  • Select and apply feature scaling techniques (normalization, standardization) and identify when scaling is required by the chosen algorithm.
  • Perform feature transformations such as binning, log transforms, and feature splitting and recognize when they reduce skew or improve signal.
  • Choose categorical encoding techniques (one-hot, binary, label encoding) based on feature cardinality and model type.
  • Apply tokenization and basic text preprocessing as part of feature engineering for NLP workloads.
  • Use tools to explore, visualize, and transform data (for example, SageMaker Data Wrangler) and interpret profiling outputs to guide transformations.
  • Transform data at scale using AWS tools (for example, AWS Glue jobs, Spark on Amazon EMR) and choose an approach based on volume and operational constraints.
  • Use AWS Glue DataBrew for no-code profiling and transformations and identify when it is more efficient than custom ETL code.
  • Transform streaming data using appropriate services (for example, AWS Lambda or Spark-based streaming) and handle late/out-of-order events at a high level.
  • Create and manage features using SageMaker Feature Store to ensure consistency between training and inference.
  • Design feature definitions and transformations to avoid training/serving skew, including using the same transformation logic in training and inference paths (see the sketch after this list).
  • Validate transformation outputs by checking schema, value ranges, and null thresholds and documenting transformation assumptions.
  • Choose data annotation and labeling approaches to create high-quality labeled datasets and identify quality controls (audit tasks, consensus labeling) at a high level.
  • Label and validate data using AWS services (for example, SageMaker Ground Truth and Amazon Mechanical Turk) and select the best option based on scale and sensitivity.
  • Version features and transformation code to enable reproducible training and consistent reprocessing when data changes.
  • Explain how feature engineering choices affect interpretability, bias, and downstream monitoring (for example, how encoded features influence fairness analysis).
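
The sketch below shows one way to keep training and inference transformations identical by bundling them in a single scikit-learn pipeline; the column names and choice of estimator are illustrative assumptions, not part of the exam blueprint:

```python
# One preprocessing pipeline reused for training and inference, so the
# same scaling and encoding logic runs in both paths (avoids skew).
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ["amount", "tenure_days"]        # placeholders
categorical_features = ["channel", "region"]        # placeholders

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

model = Pipeline([
    ("preprocess", preprocess),           # fit on training data only
    ("clf", LogisticRegression(max_iter=1000)),
])

# model.fit(X_train, y_train); model.predict(X_new) then applies the
# identical transformations at inference time.
```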

Task 1.3 - Ensure data integrity and prepare data for modeling

  • Validate data quality using AWS tools (for example, AWS Glue DataBrew and AWS Glue Data Quality) and interpret quality checks to identify issues before training.
  • Define and interpret pre-training bias metrics for numeric, text, and image data (for example, class imbalance and difference in proportions of labels).
  • Identify sources of bias in data (for example, selection bias and measurement bias) and explain how these biases can propagate into models.
  • Select strategies to address class imbalance (for example, resampling and synthetic data generation) and evaluate potential trade-offs.
  • Prepare datasets to reduce prediction bias using dataset splitting and shuffling strategies, including stratification when appropriate (see the sketch after this list).
  • Apply data augmentation techniques for numeric, text, and image data and recognize when augmentation can introduce unintended bias.
  • Use Amazon SageMaker Clarify to identify and mitigate sources of bias and document findings for governance.
  • Select techniques to encrypt data at rest and in transit and explain why encryption and key management are required for sensitive ML datasets.
  • Classify data and apply anonymization or masking techniques to reduce exposure of personally identifiable information (PII) or protected health information (PHI).
  • Evaluate compliance implications such as PII/PHI handling and data residency requirements and apply them to dataset storage and processing design.
  • Prevent data leakage by designing correct train/validation/test split boundaries and ensuring that transformations do not use future information.
  • Prepare data in the expected input format and packaging for training jobs (for example, sharded inputs, channel layout, and consistent schema).
  • Configure data to load into model training resources (for example, Amazon EFS or Amazon FSx) and explain why file-based storage can be useful for certain training workloads.
  • Establish dataset lineage and provenance (dataset versioning, transformation history) to support reproducibility and audits.
  • Implement integrity checks between pipeline stages (for example, checksums, idempotent reprocessing, schema validation gates).
  • Define governance requirements for data readiness (approvals, documentation) before data is used for training or fine-tuning.
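
A minimal sketch of a stratified train/validation/test split with a per-split class-balance check, assuming `X` is a feature table and `y` is a pandas Series of labels (both placeholders); fitting transformations on the training split only keeps test information out of the features:

```python
# Stratified 70/15/15 split plus a simple class-imbalance check.
from sklearn.model_selection import train_test_split

X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.50, stratify=y_temp, random_state=42
)

# Pre-training bias signal: class proportions should look similar per split.
for name, labels in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(name, labels.value_counts(normalize=True).round(3).to_dict())
```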

Domain 2: ML Model Development (26%)

Practice this topic →

Task 2.1 - Choose a modeling approach

  • Assess available data, label quality, and problem complexity to determine whether an ML solution is feasible and how it should be scoped.
  • Translate business requirements into an ML problem type (classification, regression, clustering, NLP, or computer vision) and identify success metrics.
  • Compare common ML algorithm families and match them to use cases based on data type, interpretability, and operational constraints.
  • Consider interpretability and transparency requirements during model selection and choose approaches that meet stakeholder needs.
  • Choose between building a custom model and using AWS AI services based on customization needs, time-to-value, and operational overhead.
  • Map common business problems to AWS AI services (for example, Amazon Translate, Amazon Transcribe, Amazon Rekognition) when a managed service is the best fit.
  • Identify when a foundation model approach (for example, Amazon Bedrock) is appropriate for generative AI tasks and recognize common limitations.
  • Choose built-in algorithms, foundation models, and solution templates using options such as SageMaker JumpStart and Amazon Bedrock.
  • Identify SageMaker built-in algorithms and recognize scenarios where built-in algorithms reduce development time compared to custom training.
  • Evaluate trade-offs between model performance, training time, inference latency, and cost when selecting an approach.
  • Select a modeling approach that meets cost constraints by considering dataset size, training compute needs, and inference request volume.
  • Define a simple baseline solution and explain how baselines help determine whether a more complex model is justified (see the sketch after this list).
  • Evaluate privacy and compliance constraints that affect model or service selection, including data residency and sensitive data handling.
  • Choose feature representations (for example, embeddings vs engineered features) as part of the modeling approach based on task type and constraints.
  • Identify when human review or human-in-the-loop workflows are appropriate due to model risk or decision impact.
  • Incorporate deployment requirements (real-time vs batch, model size, edge constraints) into model selection and solution design decisions.
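
A minimal baseline sketch using scikit-learn's DummyClassifier, assuming the train/validation splits from the earlier sketch; the metric (F1) is only an example and should follow the business objective:

```python
# Majority-class baseline: if a candidate model cannot clearly beat this,
# the extra complexity is hard to justify.
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("baseline F1:", f1_score(y_val, baseline.predict(X_val)))

# candidate = <your chosen model>.fit(X_train, y_train)
# print("candidate F1:", f1_score(y_val, candidate.predict(X_val)))
```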

Task 2.2 - Train and refine models

  • Explain core training loop concepts (epochs, steps, batch size) and relate them to convergence, training time, and model quality.
  • Choose methods to reduce training time (for example, early stopping and distributed training) while maintaining model quality.
  • Identify factors that influence model size and understand how model size impacts inference latency, cost, and deployability.
  • Select methods to improve model performance, including feature selection, data improvements, and algorithm/hyperparameter changes.
  • Apply regularization techniques (for example, dropout, weight decay, L1 and L2) and explain how they reduce overfitting.
  • Compare hyperparameter tuning techniques (random search and Bayesian optimization) and select an approach appropriate to the search space and budget.
  • Identify hyperparameters and their effects on model performance (for example, number of trees and depth in tree-based models, number of layers in neural networks).
  • Use SageMaker built-in algorithms and common ML libraries to develop models efficiently.
  • Use SageMaker script mode with supported frameworks (for example, TensorFlow and PyTorch) to run custom training code.
  • Fine-tune pre-trained models using custom datasets (for example, with Amazon Bedrock or SageMaker JumpStart) when customization is required.
  • Perform hyperparameter tuning using SageMaker automatic model tuning (AMT) and interpret tuning results to select candidate models (see the sketch after this list).
  • Integrate models that were built outside SageMaker into SageMaker training and hosting workflows.
  • Prevent overfitting, underfitting, and catastrophic forgetting using appropriate techniques (regularization, feature selection, training strategies).
  • Combine multiple models to improve performance using ensemble methods such as stacking and boosting, and identify when ensembles increase operational complexity.
  • Reduce model size for deployment using techniques such as pruning, compression, feature selection changes, and data type changes, and evaluate accuracy trade-offs.
  • Manage model versions for repeatability and audits using tools such as the SageMaker Model Registry.
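
A hedged sketch of SageMaker automatic model tuning with the SageMaker Python SDK; the image URI, execution role, S3 paths, hyperparameter names, and metric regex are placeholders that depend on your container and training script:

```python
# Bayesian hyperparameter search around a generic SageMaker training job.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

session = sagemaker.Session()
estimator = Estimator(
    image_uri="<training-image-uri>",
    role="<execution-role-arn>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-ml-artifacts/models/",
    sagemaker_session=session,
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",
    objective_type="Maximize",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),      # example ranges
        "max_depth": IntegerParameter(3, 10),
    },
    metric_definitions=[{"Name": "validation:auc",
                         "Regex": "validation-auc:([0-9\\.]+)"}],
    max_jobs=20,
    max_parallel_jobs=2,
    strategy="Bayesian",
)

# tuner.fit({"train": "s3://example-ml-curated/train/",
#            "validation": "s3://example-ml-curated/validation/"})
```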

Task 2.3 - Analyze model performance

  • Select evaluation techniques and metrics appropriate to the task (for example, confusion matrix, F1, accuracy, precision, recall, RMSE, ROC, AUC) and explain why the metric choice matters.
  • Interpret confusion matrices, ROC curves, and related visualizations to assess classification performance and threshold trade-offs (see the sketch after this list).
  • Create performance baselines and compare candidate models against baselines to evaluate improvement and regressions.
  • Identify model overfitting and underfitting using training/validation results and apply corrective actions.
  • Diagnose convergence issues during training and recognize symptoms such as unstable loss, divergence, and slow convergence.
  • Use SageMaker Debugger to debug model convergence issues and analyze training telemetry.
  • Detect model bias and evaluate fairness concerns by selecting and interpreting relevant metrics and slice-based analysis.
  • Use SageMaker Clarify metrics to gain insights into ML training data and models, including bias detection and explainability outputs.
  • Assess trade-offs between model performance, training time, and cost when selecting a model for production use.
  • Design reproducible experiments by tracking datasets, code, parameters, and outputs and using AWS services to support repeatability.
  • Compare the performance of a shadow variant to a production variant and decide when to promote, roll back, or iterate.
  • Use SageMaker Clarify outputs to interpret model predictions and communicate findings to stakeholders.
  • Run A/B testing for model variants in production and interpret results with appropriate statistical caution at a high level.
  • Define acceptance criteria for model promotion, including accuracy thresholds, bias checks, and operational constraints.
  • Perform error analysis by inspecting mispredictions and identifying systematic failure patterns across data segments.
  • Document evaluation results, limitations, and known risks to support governance, audits, and future improvements.
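
A short evaluation sketch with scikit-learn metrics, assuming a fitted classifier named `model` that exposes `predict_proba` and a held-out test split:

```python
# Core classification diagnostics: confusion matrix, per-class
# precision/recall/F1, and ROC AUC computed from scores, not hard labels.
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]   # positive-class probability

print(confusion_matrix(y_test, y_pred))        # rows: true class, cols: predicted
print(classification_report(y_test, y_pred))   # precision, recall, F1 per class
print("ROC AUC:", roc_auc_score(y_test, y_score))
```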

Domain 3: Deployment and Orchestration of ML Workflows (22%)

Practice this topic →

Task 3.1 - Select deployment infrastructure based on existing architecture and requirements

  • Select a model serving strategy (real time, serverless, asynchronous, batch inference) based on latency, throughput, and user experience requirements.
  • Differentiate SageMaker inference options and their requirements (real-time endpoints, serverless endpoints, asynchronous inference, batch transform) and choose the best fit for a scenario (see the sketch after this list).
  • Apply deployment best practices such as versioning, staged rollouts, and rollback strategies to reduce risk when deploying models.
  • Choose AWS deployment services (for example, Amazon SageMaker) and identify when container platforms (Kubernetes, Amazon ECS, Amazon EKS) or AWS Lambda are more appropriate.
  • Provision compute resources for training and inference in production and test environments (CPU, GPU) and select environments to meet performance requirements.
  • Evaluate performance, cost, and latency trade-offs across model sizes, instance types, and endpoint configurations.
  • Select the appropriate compute environment for training and inference based on requirements (GPU vs CPU, processor family, networking bandwidth).
  • Choose appropriate containers for hosting (provided vs customized/BYOC) and identify when custom containers are required.
  • Select multi-model or multi-container deployments to optimize cost and manage multiple models or pre/post-processing components.
  • Choose a deployment target that fits existing architecture and requirements (for example, SageMaker endpoints, Kubernetes, Amazon ECS, Amazon EKS, AWS Lambda).
  • Select the correct deployment orchestrator (for example, Apache Airflow or SageMaker Pipelines) based on workflow complexity and team operational preferences.
  • Choose model deployment strategies (real-time vs batch) and understand implications for monitoring, scaling, and cost.
  • Design integration patterns for model inference APIs, including synchronous request/response vs asynchronous patterns and streaming responses where applicable.
  • Optimize models on edge devices using approaches such as SageMaker Neo at a high level and explain constraints that drive edge optimization.
  • Package and version model artifacts and dependencies for deployment and ensure repeatable builds across environments.
  • Plan network placement and data access for endpoints, including VPC integration requirements and secure access to downstream data sources.
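
A sketch of deploying one model artifact either to a real-time endpoint or to a serverless endpoint with the SageMaker Python SDK; the image URI, model data location, role, and endpoint names are placeholders:

```python
# Same artifact, two hosting choices: provisioned real-time for steady
# low-latency traffic, serverless for spiky or intermittent traffic.
from sagemaker.model import Model
from sagemaker.serverless import ServerlessInferenceConfig

model = Model(
    image_uri="<inference-image-uri>",
    model_data="s3://example-ml-artifacts/models/model.tar.gz",
    role="<execution-role-arn>",
)

# Option 1: real-time endpoint on provisioned instances.
# predictor = model.deploy(initial_instance_count=1,
#                          instance_type="ml.m5.large",
#                          endpoint_name="churn-realtime")

# Option 2: serverless endpoint that scales down between requests.
# predictor = model.deploy(
#     serverless_inference_config=ServerlessInferenceConfig(
#         memory_size_in_mb=2048, max_concurrency=10),
#     endpoint_name="churn-serverless")
```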

Task 3.2 - Create and script infrastructure based on existing architecture and requirements

  • Differentiate on-demand and provisioned resources and choose an approach based on workload predictability and performance requirements.
  • Compare scaling policies (for example, target tracking, step scaling, scheduled scaling) and determine which policy best meets expected traffic patterns.
  • Configure SageMaker endpoint auto scaling policies to meet scalability requirements based on demand or time-based schedules (see the sketch after this list).
  • Choose specific metrics for auto scaling (for example, model latency, CPU utilization, invocations per instance) and explain why metric selection affects stability and cost.
  • Explain containerization concepts and identify AWS container services used for ML deployments (Amazon ECR, Amazon ECS, Amazon EKS).
  • Build and maintain containers for ML workloads (including BYOC with SageMaker) and apply reproducible build and dependency management practices.
  • Select infrastructure as code (IaC) options (AWS CloudFormation vs AWS CDK) and explain trade-offs for maintainability and reuse.
  • Automate provisioning of compute resources and manage communication between stacks using CloudFormation or AWS CDK outputs and parameters.
  • Apply best practices to build scalable and cost-effective ML solutions (for example, endpoint auto scaling and cost-aware compute choices such as Spot where appropriate).
  • Deploy and host models using the SageMaker SDK and automate endpoint creation, update, and deletion workflows.
  • Configure SageMaker endpoints within a VPC network and explain how subnets, security groups, and routing affect connectivity.
  • Design for maintainability by separating environments (dev/test/prod) and parameterizing infrastructure for repeatable deployments.
  • Implement cost optimization patterns for inference infrastructure, including right-sizing, scaling to match demand, and choosing appropriate endpoint types.
  • Secure infrastructure provisioning with least-privilege roles, encryption defaults, and safe handling of secrets and artifacts.
  • Troubleshoot scaling and performance issues for deployed models, including misconfigured scaling policies, throttling, and service quota constraints.
  • Evaluate trade-offs between infrastructure simplicity and flexibility when choosing between managed endpoints and container orchestration platforms.
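
A sketch of target-tracking auto scaling for an endpoint variant using Application Auto Scaling via boto3; the endpoint name, variant name, capacity bounds, and target value are placeholders:

```python
# Register the endpoint variant as a scalable target, then attach a
# target-tracking policy on invocations per instance.
import boto3

aas = boto3.client("application-autoscaling")
resource_id = "endpoint/churn-realtime/variant/AllTraffic"

aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

aas.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # target invocations per instance
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300,
    },
)
```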

Task 3.3 - Use automated orchestration tools to set up continuous integration and continuous delivery (CI/CD) pipelines

  • Describe CI/CD principles and explain how ML workflows add additional versioned assets (data, features, models) compared to traditional application CI/CD.
  • Identify capabilities and quotas for AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy and design pipelines that operate within service limits.
  • Configure CodePipeline stages and artifacts for ML workflows (build, test, train, evaluate, register, deploy) and explain the purpose of each stage.
  • Configure and troubleshoot CodeBuild, CodeDeploy, and CodePipeline, including stage transitions and common failure modes.
  • Use version control systems (for example, Git) and select a branching strategy (Gitflow or GitHub Flow) aligned to deployment cadence and risk.
  • Explain how code repositories and pipelines work together to trigger builds, tests, and deployments.
  • Automate and integrate data ingestion with orchestration services as part of an end-to-end ML workflow.
  • Use deployment strategies and rollback actions (blue/green, canary, linear) to reduce risk when deploying new model versions.
  • Use AWS services to automate orchestration for model building and deployment, including coordinating processing, training, and evaluation steps.
  • Configure training and inference jobs using event-driven triggers (for example, Amazon EventBridge rules) and pipeline tools such as SageMaker Pipelines and CodePipeline (see the sketch after this list).
  • Create automated tests in CI/CD pipelines (unit, integration, end-to-end) and explain how tests improve reliability of ML deployments.
  • Implement validation gates in pipelines (metric thresholds, bias checks, explainability requirements) before promoting a model to production.
  • Build and integrate mechanisms to retrain models in response to schedules, new data, or detected drift.
  • Manage versioning for datasets, code, and models to support reproducibility and safe rollback in production.
  • Design approval workflows and separation of duties in ML delivery pipelines to satisfy governance and compliance expectations.
  • Troubleshoot pipeline issues related to IAM permissions, artifact locations, environment configuration, and dependency management.
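
A sketch of a scheduled retraining trigger: an EventBridge rule that starts a SageMaker pipeline execution every week. The ARNs, account ID, rule name, and pipeline parameter are placeholders, and the target role must be allowed to start the pipeline:

```python
# Weekly EventBridge rule targeting a SageMaker pipeline.
import boto3

events = boto3.client("events")

events.put_rule(
    Name="weekly-retrain",
    ScheduleExpression="rate(7 days)",
    State="ENABLED",
)

events.put_targets(
    Rule="weekly-retrain",
    Targets=[{
        "Id": "retrain-pipeline",
        "Arn": "arn:aws:sagemaker:us-east-1:111122223333:pipeline/training-pipeline",
        "RoleArn": "arn:aws:iam::111122223333:role/EventBridgeSageMakerRole",
        "SageMakerPipelineParameters": {
            "PipelineParameterList": [
                {"Name": "InputDataUri", "Value": "s3://example-ml-curated/train/"}
            ]
        },
    }],
)
```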

Domain 4: ML Solution Monitoring, Maintenance, and Security (24%)

Practice this topic →

Task 4.1 - Monitor model inference

  • Differentiate data drift and concept drift and explain why drift detection is required for production ML systems.
  • Select techniques to monitor data quality and model performance for inference workloads and identify which signals indicate degradation.
  • Set up model monitoring in production using SageMaker Model Monitor and establish baselines for comparison (see the sketch after this list).
  • Monitor workflows to detect anomalies or errors in data processing or model inference and design alerting strategies.
  • Detect changes in the distribution of data that can affect model performance and identify how tooling such as SageMaker Clarify can assist.
  • Monitor model performance in production using A/B testing and interpret results to decide on promotion or rollback.
  • Compare shadow deployments and A/B testing and choose the appropriate strategy for safe evaluation in production.
  • Define monitoring thresholds, SLIs/SLOs, and triggers for automated actions such as rollback, alerting, or retraining.
  • Design dashboards and reporting for ML inference quality, latency, and error rates and communicate monitoring results to stakeholders.
  • Incorporate feedback loops to capture ground truth when available and use feedback for evaluation and retraining planning.
  • Plan operational processes for monitoring incidents, including triage, root cause analysis, and post-incident remediation.
  • Apply design principles from the AWS Well-Architected Machine Learning Lens that relate to monitoring, including automation, observability, and controlled change.
  • Balance observability needs with privacy constraints by limiting exposure of raw sensitive inputs and controlling access to logs.
  • Design retraining triggers based on monitoring signals, including drift, performance regression, and changes in business conditions.
  • Detect and respond to anomalous or abusive inference traffic patterns that may indicate misuse or security issues.
  • Maintain documentation of monitoring configurations, baselines, and changes to support audits and continuous improvement.
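
A hedged sketch of data-quality monitoring with SageMaker Model Monitor: suggest a baseline from training data, then attach an hourly schedule to an endpoint that already has data capture enabled. The role, S3 URIs, endpoint name, and schedule name are placeholders:

```python
# Baseline + hourly data-quality monitoring schedule for an endpoint.
from sagemaker.model_monitor import CronExpressionGenerator, DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="<execution-role-arn>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=3600,
)

# Profile the training data to produce baseline statistics and constraints.
monitor.suggest_baseline(
    baseline_dataset="s3://example-ml-curated/train/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://example-ml-monitoring/baseline/",
)

# Compare captured inference data against the baseline every hour.
monitor.create_monitoring_schedule(
    monitor_schedule_name="churn-data-quality",
    endpoint_input="churn-realtime",
    output_s3_uri="s3://example-ml-monitoring/reports/",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```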

Task 4.2 - Monitor and optimize infrastructure and costs

  • Identify key performance metrics for ML infrastructure (utilization, throughput, availability, scalability, fault tolerance) and map them to operational goals.
  • Use Amazon CloudWatch metrics, logs, and alarms to troubleshoot latency and performance issues for ML endpoints and pipelines (see the sketch after this list).
  • Use CloudWatch Logs Insights to analyze logs and identify bottlenecks and error patterns in ML applications.
  • Use observability tools (for example, AWS X-Ray and CloudWatch Lambda Insights) to trace requests and diagnose end-to-end latency.
  • Create AWS CloudTrail trails to log, monitor, and audit activities that affect ML systems, including retraining and deployment actions.
  • Build dashboards to monitor performance and cost metrics using tools such as CloudWatch dashboards and Amazon QuickSight.
  • Monitor infrastructure events using Amazon EventBridge events and integrate event-driven notifications or remediation workflows.
  • Differentiate instance families (general purpose, compute optimized, memory optimized, inference optimized) and select the right family for training and inference workloads.
  • Right-size inference infrastructure using tools such as SageMaker Inference Recommender and AWS Compute Optimizer.
  • Troubleshoot and resolve latency and scaling issues, including misconfigured auto scaling, throttling, and insufficient capacity.
  • Manage service quotas and capacity constraints and identify when to request quota increases for production workloads.
  • Implement cost tracking and allocation techniques (for example, resource tagging) to enable chargeback and visibility.
  • Use cost analysis tools (AWS Cost Explorer and AWS Billing and Cost Management) to analyze spend and identify optimization opportunities.
  • Set budgets and cost quotas using tools such as AWS Budgets and use alerts to prevent unexpected spend.
  • Use AWS Trusted Advisor to identify cost and performance recommendations and prioritize fixes by impact.
  • Optimize infrastructure costs using purchasing options (Spot Instances, On-Demand Instances, Reserved Instances, SageMaker Savings Plans) based on workload characteristics.
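
A sketch of a CloudWatch alarm on the SageMaker `ModelLatency` metric (reported in microseconds); the endpoint name, variant, threshold, and SNS topic ARN are placeholders:

```python
# Alarm when average model latency stays above ~500 ms for 15 minutes.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="churn-realtime-high-latency",
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",
    Dimensions=[
        {"Name": "EndpointName", "Value": "churn-realtime"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=500_000,  # microseconds
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:ml-ops-alerts"],
    TreatMissingData="notBreaching",
)
```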

Task 4.3 - Secure AWS resources

  • Design IAM roles, policies, and groups to control access to AWS services used in ML systems and apply the principle of least privilege (see the sketch after this list).
  • Use resource policies (for example, Amazon S3 bucket policies) to restrict access to datasets, model artifacts, and other ML assets.
  • Configure IAM policies and roles for users and applications that interact with ML systems, including separate roles for training, processing, and inference.
  • Secure SageMaker workloads using appropriate SageMaker security and compliance features and document controls for audits.
  • Configure least-privilege access to ML artifacts (datasets, feature store, model registry, endpoints) and periodically review permissions.
  • Control network access to ML resources using VPCs, subnets, security groups, and routing, and explain how isolation reduces blast radius.
  • Build VPC network designs that securely isolate ML systems and support private connectivity to dependent AWS services.
  • Secure data at rest and in transit for ML systems using encryption and key management practices (for example, using AWS KMS).
  • Apply security best practices for CI/CD pipelines, including least-privilege build roles, secure artifact handling, and safe secret management.
  • Monitor, audit, and log ML systems to ensure continued security and compliance, including capturing access and change events.
  • Design logging retention and access controls so that audit logs support investigations without exposing sensitive data.
  • Troubleshoot and debug security issues in ML systems, including AccessDenied errors, KMS permission issues, and network connectivity problems.
  • Implement separation of duties and environment separation (dev/test/prod) to reduce risk and support compliance requirements.
  • Secure and rotate secrets used by ML applications (for example, credentials and API keys) using appropriate AWS secret management services.
  • Establish incident response practices for ML systems, including containment actions, credential rotation, and post-incident review.
  • Apply governance practices such as tagging, policy enforcement, and periodic access reviews to maintain secure and compliant ML environments.
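
A sketch of a least-privilege identity policy for an inference role, created with boto3; the bucket, prefix, KMS key ARN, account ID, and policy name are placeholders:

```python
# Read-only access to one model-artifact prefix plus decrypt on one KMS key.
import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadModelArtifacts",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-ml-artifacts/models/*",
        },
        {
            "Sid": "DecryptArtifacts",
            "Effect": "Allow",
            "Action": ["kms:Decrypt"],
            "Resource": "arn:aws:kms:us-east-1:111122223333:key/<key-id>",
        },
    ],
}

iam.create_policy(
    PolicyName="InferenceModelArtifactReadOnly",
    PolicyDocument=json.dumps(policy),
)
```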

Tip: MLA-C01 is heavy on “best-fit” trade-offs. After each task, write 5–10 one-liner rules from your misses.