Use this syllabus as your source of truth for MLA-C01. Work through each domain in order and drill targeted sets after every task.
What’s covered
Domain 1: Data Preparation for Machine Learning (ML) (28%)
Practice this topic →
Task 1.1 - Ingest and store data
- Choose appropriate data formats for ML (for example, Apache Parquet, JSON, CSV, ORC, Avro, RecordIO) based on access patterns and downstream training requirements.
- Differentiate validated and non-validated data ingestion formats and identify when schema enforcement is required.
- Select core AWS data sources for ML workloads (for example, Amazon S3, Amazon EFS, Amazon FSx for NetApp ONTAP) and explain common trade-offs.
- Select AWS streaming data sources to ingest data (for example, Amazon Kinesis, Amazon Managed Service for Apache Flink, Amazon MSK for Apache Kafka) based on throughput, ordering, and latency constraints.
- Explain AWS storage options for ML datasets and artifacts and choose between them based on cost, performance, and data structure.
- Extract data from storage systems (for example, Amazon S3, Amazon EBS, Amazon EFS, Amazon RDS, Amazon DynamoDB) and identify AWS options that improve transfer or I/O performance (for example, S3 Transfer Acceleration, EBS Provisioned IOPS).
- Design a simple data landing strategy (raw → curated) and explain why consistent partitioning and file layout improve training performance.
- Merge data from multiple sources using appropriate approaches (for example, programming techniques, AWS Glue, Apache Spark) and validate schema compatibility.
- Ingest and explore data with Amazon SageMaker Data Wrangler and export prepared datasets for model training.
- Ingest features into Amazon SageMaker Feature Store and explain how a feature store supports feature reuse and consistency.
- Differentiate offline and online feature access patterns in SageMaker Feature Store and select the correct approach for training vs inference.
- Troubleshoot data ingestion and storage issues related to capacity and scalability (for example, throughput bottlenecks, partition hot keys, I/O limits).
- Implement dataset versioning and repeatable data extraction to support reproducibility across experiments and audits.
- Choose compression, object sizing, and partitioning strategies to balance read efficiency and storage cost for large datasets.
- Prepare datasets for the training environment by staging data to the appropriate storage resource and access method (for example, file-based access vs object-based access).
- Validate ingestion completeness and correctness by using basic checks (record counts, schema checks, and integrity checks) before handing data to transformation or training steps (see the minimal sketch after this list).
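A minimal sketch of the completeness and correctness checks above, assuming a Parquet file that has already been staged locally; the path, column names, and row-count threshold are illustrative placeholders:

```python
# Basic post-ingestion checks: record count, schema presence, and simple integrity gates.
import pyarrow.parquet as pq

EXPECTED_COLUMNS = {"transaction_id", "event_ts", "amount"}   # illustrative schema
MIN_EXPECTED_ROWS = 100_000                                   # illustrative threshold

table = pq.read_table("curated/transactions/date=2024-01-01/part-0000.parquet")

# 1. Record count check
assert table.num_rows >= MIN_EXPECTED_ROWS, f"Too few rows: {table.num_rows}"

# 2. Schema check: every expected column is present
missing = EXPECTED_COLUMNS - set(table.schema.names)
assert not missing, f"Missing columns: {missing}"

# 3. Integrity checks: no null or duplicate primary keys
df = table.to_pandas()
assert df["transaction_id"].notna().all(), "Null transaction_id values found"
assert not df["transaction_id"].duplicated().any(), "Duplicate transaction_id values found"

print(f"Ingestion checks passed: {table.num_rows} rows")
```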
Task 1.2 - Transform data and perform feature engineering
- Apply common data cleaning techniques (outlier detection and treatment, missing value imputation, combining, deduplication) and explain how they affect model performance.
- Select and apply feature scaling techniques (normalization, standardization) and identify when scaling is required by the chosen algorithm.
- Perform feature transformations such as binning, log transforms, and feature splitting and recognize when they reduce skew or improve signal.
- Choose categorical encoding techniques (one-hot, binary, label encoding) based on feature cardinality and model type.
- Apply tokenization and basic text preprocessing as part of feature engineering for NLP workloads.
- Use tools to explore, visualize, and transform data (for example, SageMaker Data Wrangler) and interpret profiling outputs to guide transformations.
- Transform data at scale using AWS tools (for example, AWS Glue jobs, Spark on Amazon EMR) and choose an approach based on volume and operational constraints.
- Use AWS Glue DataBrew for no-code profiling and transformations and identify when it is more efficient than custom ETL code.
- Transform streaming data using appropriate services (for example, AWS Lambda or Spark-based streaming) and handle late/out-of-order events at a high level.
- Create and manage features using SageMaker Feature Store to ensure consistency between training and inference.
- Design feature definitions and transformations to avoid training/serving skew, including using the same transformation logic in training and inference paths (see the pipeline sketch after this list).
- Validate transformation outputs by checking schema, value ranges, and null thresholds and documenting transformation assumptions.
- Choose data annotation and labeling approaches to create high-quality labeled datasets and identify quality controls (audit tasks, consensus labeling) at a high level.
- Label and validate data using AWS services (for example, SageMaker Ground Truth and Amazon Mechanical Turk) and select the best option based on scale and sensitivity.
- Version features and transformation code to enable reproducible training and consistent reprocessing when data changes.
- Explain how feature engineering choices affect interpretability, bias, and downstream monitoring (for example, how encoded features influence fairness analysis).
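One way to keep transformation logic identical across training and inference (the skew point above) is to fit a single preprocessing-plus-model pipeline and reuse the fitted artifact at serving time. A minimal scikit-learn sketch; the column names, file path, and model choice are illustrative:

```python
import pandas as pd
import joblib
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ["amount", "tenure_days"]        # illustrative column names
categorical_features = ["channel", "device_type"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_features),                            # standardization
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),  # one-hot encoding
])

model = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression(max_iter=1000))])

# Fit once on training data ...
train_df = pd.read_parquet("train.parquet")          # illustrative path
model.fit(train_df[numeric_features + categorical_features], train_df["label"])

# ... then persist the whole pipeline so inference applies exactly the same transformations.
joblib.dump(model, "model.joblib")
```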
Task 1.3 - Ensure data integrity and prepare data for modeling
- Validate data quality using AWS tools (for example, AWS Glue DataBrew and AWS Glue Data Quality) and interpret quality checks to identify issues before training.
- Define and interpret pre-training bias metrics for numeric, text, and image data (for example, class imbalance and difference in proportions of labels).
- Identify sources of bias in data (for example, selection bias and measurement bias) and explain how these biases can propagate into models.
- Select strategies to address class imbalance (for example, resampling and synthetic data generation) and evaluate potential trade-offs.
- Prepare datasets to reduce prediction bias using dataset splitting and shuffling strategies, including stratification when appropriate (see the splitting sketch after this list).
- Apply data augmentation techniques for numeric, text, and image data and recognize when augmentation can introduce unintended bias.
- Use Amazon SageMaker Clarify to identify and mitigate sources of bias and document findings for governance.
- Select techniques to encrypt data at rest and in transit and explain why encryption and key management are required for sensitive ML datasets.
- Classify data and apply anonymization or masking techniques to reduce exposure of personally identifiable information (PII) or protected health information (PHI).
- Evaluate compliance implications such as PII/PHI handling and data residency requirements and apply them to dataset storage and processing design.
- Prevent data leakage by designing correct train/validation/test split boundaries and ensuring that transformations do not use future information.
- Prepare data in the expected input format and packaging for training jobs (for example, sharded inputs, channel layout, and consistent schema).
- Configure data to load into model training resources (for example, Amazon EFS or Amazon FSx) and explain why file-based storage can be useful for certain training workloads.
- Establish dataset lineage and provenance (dataset versioning, transformation history) to support reproducibility and audits.
- Implement integrity checks between pipeline stages (for example, checksums, idempotent reprocessing, schema validation gates).
- Define governance requirements for data readiness (approvals, documentation) before data is used for training or fine-tuning.
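A minimal sketch of stratified splitting with shuffling, done before any transformation is fit; the dataset path and label column are illustrative:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_parquet("curated/dataset.parquet")      # illustrative path
X, y = df.drop(columns=["label"]), df["label"]

# Hold out a stratified test set first so class proportions are preserved.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, shuffle=True, random_state=42
)
# Split the remainder into train and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, shuffle=True, random_state=42
)

# To avoid leakage, fit scalers, encoders, and imputers on X_train only,
# then apply the fitted transformers to X_val and X_test.
```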
Domain 2: ML Model Development (26%)
Practice this topic →
Task 2.1 - Choose a modeling approach
- Assess available data, label quality, and problem complexity to determine whether an ML solution is feasible and how it should be scoped.
- Translate business requirements into an ML problem type (classification, regression, clustering, NLP, or computer vision) and identify success metrics.
- Compare common ML algorithm families and match them to use cases based on data type, interpretability, and operational constraints.
- Consider interpretability and transparency requirements during model selection and choose approaches that meet stakeholder needs.
- Choose between building a custom model and using AWS AI services based on customization needs, time-to-value, and operational overhead.
- Map common business problems to AWS AI services (for example, Amazon Translate, Amazon Transcribe, Amazon Rekognition) when a managed service is the best fit.
- Identify when a foundation model approach (for example, Amazon Bedrock) is appropriate for generative AI tasks and recognize common limitations.
- Choose built-in algorithms, foundation models, and solution templates using options such as SageMaker JumpStart and Amazon Bedrock.
- Identify SageMaker built-in algorithms and recognize scenarios where built-in algorithms reduce development time compared to custom training.
- Evaluate trade-offs between model performance, training time, inference latency, and cost when selecting an approach.
- Select a modeling approach that meets cost constraints by considering dataset size, training compute needs, and inference request volume.
- Define a simple baseline solution and explain how baselines help determine whether a more complex model is justified (see the baseline sketch after this list).
- Evaluate privacy and compliance constraints that affect model or service selection, including data residency and sensitive data handling.
- Choose feature representations (for example, embeddings vs engineered features) as part of the modeling approach based on task type and constraints.
- Identify when human review or human-in-the-loop workflows are appropriate due to model risk or decision impact.
- Incorporate deployment requirements (real-time vs batch, model size, edge constraints) into model selection and solution design decisions.
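A quick way to establish the baseline mentioned above is a trivial model (majority class for classification, mean for regression); a candidate that cannot beat it does not justify extra complexity. A minimal scikit-learn sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced toy data purely for illustration.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, stratify=y, random_state=0)

# Majority-class baseline: the floor any candidate model must clear.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
candidate = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("baseline macro-F1 :", f1_score(y_val, baseline.predict(X_val), average="macro"))
print("candidate macro-F1:", f1_score(y_val, candidate.predict(X_val), average="macro"))
```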
Task 2.2 - Train and refine models
- Explain core training loop concepts (epochs, steps, batch size) and relate them to convergence, training time, and model quality.
- Choose methods to reduce training time (for example, early stopping and distributed training) while maintaining model quality.
- Identify factors that influence model size and understand how model size impacts inference latency, cost, and deployability.
- Select methods to improve model performance, including feature selection, data improvements, and algorithm/hyperparameter changes.
- Apply regularization techniques (for example, dropout, weight decay, L1 and L2) and explain how they reduce overfitting.
- Compare hyperparameter tuning techniques (random search and Bayesian optimization) and select an approach appropriate to the search space and budget.
- Identify hyperparameters and their effects on model performance (for example, number of trees and depth in tree-based models, number of layers in neural networks).
- Use SageMaker built-in algorithms and common ML libraries to develop models efficiently.
- Use SageMaker script mode with supported frameworks (for example, TensorFlow and PyTorch) to run custom training code.
- Fine-tune pre-trained models using custom datasets (for example, with Amazon Bedrock or SageMaker JumpStart) when customization is required.
- Perform hyperparameter tuning using SageMaker automatic model tuning (AMT) and interpret tuning results to select candidate models (see the tuning sketch after this list).
- Integrate models that were built outside SageMaker into SageMaker training and hosting workflows.
- Prevent overfitting, underfitting, and catastrophic forgetting using appropriate techniques (regularization, feature selection, training strategies).
- Combine multiple models to improve performance using ensemble methods such as stacking and boosting, and identify when ensembles increase operational complexity.
- Reduce model size for deployment using techniques such as pruning, compression, feature selection changes, and data type changes, and evaluate accuracy trade-offs.
- Manage model versions for repeatability and audits using tools such as the SageMaker Model Registry.
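A minimal sketch of SageMaker automatic model tuning wrapped around a script-mode PyTorch estimator; the IAM role, S3 URIs, training script, framework versions, and metric regex are placeholders to adapt:

```python
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

# Script-mode estimator: train.py holds the custom training loop.
estimator = PyTorch(
    entry_point="train.py",                                          # placeholder script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",    # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="2.1",                                         # use a supported version
    py_version="py310",
)

# Tune learning rate and batch size against a metric the script prints to its logs.
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="validation:f1",
    metric_definitions=[{"Name": "validation:f1", "Regex": "val_f1=([0-9\\.]+)"}],
    hyperparameter_ranges={
        "lr": ContinuousParameter(1e-5, 1e-2),
        "batch-size": IntegerParameter(16, 128),
    },
    max_jobs=10,
    max_parallel_jobs=2,
)

tuner.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/val/"})
print("Best training job:", tuner.best_training_job())
```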
Task 2.3 - Analyze model performance
- Select evaluation techniques and metrics appropriate to the task (for example, confusion matrix, F1, accuracy, precision, recall, RMSE, ROC, AUC) and explain why the metric choice matters (see the metrics sketch after this list).
- Interpret confusion matrices, ROC curves, and related visualizations to assess classification performance and threshold trade-offs.
- Create performance baselines and compare candidate models against baselines to evaluate improvement and regressions.
- Identify model overfitting and underfitting using training/validation results and apply corrective actions.
- Diagnose convergence issues during training and recognize symptoms such as unstable loss, divergence, and slow convergence.
- Use SageMaker Model Debugger to debug model convergence issues and analyze training telemetry.
- Detect model bias and evaluate fairness concerns by selecting and interpreting relevant metrics and slice-based analysis.
- Use SageMaker Clarify metrics to gain insights into ML training data and models, including bias detection and explainability outputs.
- Assess trade-offs between model performance, training time, and cost when selecting a model for production use.
- Design reproducible experiments by tracking datasets, code, parameters, and outputs and using AWS services to support repeatability.
- Compare the performance of a shadow variant to a production variant and decide when to promote, roll back, or iterate.
- Use SageMaker Clarify outputs to interpret model predictions and communicate findings to stakeholders.
- Run A/B testing for model variants in production and interpret results with appropriate statistical caution at a high level.
- Define acceptance criteria for model promotion, including accuracy thresholds, bias checks, and operational constraints.
- Perform error analysis by inspecting mispredictions and identifying systematic failure patterns across data segments.
- Document evaluation results, limitations, and known risks to support governance, audits, and future improvements.
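A minimal sketch of the core classification diagnostics above (confusion matrix, precision/recall/F1, ROC AUC) using scikit-learn on placeholder labels and scores:

```python
import numpy as np
from sklearn.metrics import (classification_report, confusion_matrix,
                             precision_recall_fscore_support, roc_auc_score)

# Placeholder labels and scores; in practice these come from a held-out evaluation set.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.20, 0.90, 0.55, 0.65, 0.05])
y_pred = (y_score >= 0.5).astype(int)

print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))   # rows = actual, cols = predicted
print(classification_report(y_true, y_pred, digits=3))           # per-class precision / recall / F1
print("ROC AUC:", roc_auc_score(y_true, y_score))                # threshold-independent ranking quality

# Raising the threshold trades recall for precision; recompute to see the effect.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, (y_score >= 0.7).astype(int), average="binary"
)
print(f"At threshold 0.7: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```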
Domain 3: Deployment and Orchestration of ML Workflows (22%)
Practice this topic →
Task 3.1 - Select deployment infrastructure based on existing architecture and requirements
- Select a model serving strategy (real time, serverless, asynchronous, batch inference) based on latency, throughput, and user experience requirements.
- Differentiate model and endpoint requirements across deployment options (serverless endpoints, real-time endpoints, asynchronous endpoints, batch inference) and choose the best fit for a scenario (see the deployment sketch after this list).
- Apply deployment best practices such as versioning, staged rollouts, and rollback strategies to reduce risk when deploying models.
- Choose AWS deployment services (for example, Amazon SageMaker) and identify when container platforms (Kubernetes, Amazon ECS, Amazon EKS) or AWS Lambda are more appropriate.
- Provision compute resources for training and inference in production and test environments (CPU, GPU) and select environments to meet performance requirements.
- Evaluate performance, cost, and latency trade-offs across model sizes, instance types, and endpoint configurations.
- Select the appropriate compute environment for training and inference based on requirements (GPU vs CPU, processor family, networking bandwidth).
- Choose appropriate containers for hosting (provided vs customized/BYOC) and identify when custom containers are required.
- Select multi-model or multi-container deployments to optimize cost and manage multiple models or pre/post-processing components.
- Choose a deployment target that fits existing architecture and requirements (for example, SageMaker endpoints, Kubernetes, Amazon ECS, Amazon EKS, AWS Lambda).
- Select the correct deployment orchestrator (for example, Apache Airflow or SageMaker Pipelines) based on workflow complexity and team operational preferences.
- Choose model deployment strategies (real-time vs batch) and understand implications for monitoring, scaling, and cost.
- Design integration patterns for model inference APIs, including synchronous request/response vs asynchronous patterns and streaming responses where applicable.
- Explain, at a high level, how tools such as SageMaker Neo optimize models for edge devices and identify the constraints that drive edge optimization.
- Package and version model artifacts and dependencies for deployment and ensure repeatable builds across environments.
- Plan network placement and data access for endpoints, including VPC integration requirements and secure access to downstream data sources.
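A minimal sketch contrasting a real-time endpoint with a serverless endpoint using the SageMaker Python SDK; the container image, model artifact, role, endpoint names, and sizing are placeholders, and in practice you would deploy one option or the other:

```python
from sagemaker.model import Model
from sagemaker.serverless import ServerlessInferenceConfig

model = Model(
    image_uri="<inference-container-image-uri>",                      # placeholder image
    model_data="s3://my-bucket/model/model.tar.gz",                   # placeholder artifact
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",     # placeholder role
)

# Option A: real-time endpoint on dedicated instances -- steady traffic, strict latency targets.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="demo-realtime-endpoint",
)

# Option B: serverless endpoint -- spiky or infrequent traffic, pay per use, cold starts possible.
# predictor = model.deploy(
#     serverless_inference_config=ServerlessInferenceConfig(memory_size_in_mb=2048, max_concurrency=5),
#     endpoint_name="demo-serverless-endpoint",
# )
```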
Task 3.2 - Create and script infrastructure based on existing architecture and requirements
- Differentiate on-demand and provisioned resources and choose an approach based on workload predictability and performance requirements.
- Compare scaling policies (for example, target tracking, step scaling, scheduled scaling) and determine which policy best meets expected traffic patterns.
- Configure SageMaker endpoint auto scaling policies to meet scalability requirements based on demand or time-based schedules (see the scaling sketch after this list).
- Choose specific metrics for auto scaling (for example, model latency, CPU utilization, invocations per instance) and explain why metric selection affects stability and cost.
- Explain containerization concepts and identify AWS container services used for ML deployments (Amazon ECR, Amazon ECS, Amazon EKS).
- Build and maintain containers for ML workloads (including BYOC with SageMaker) and apply reproducible build and dependency management practices.
- Select infrastructure as code (IaC) options (AWS CloudFormation vs AWS CDK) and explain trade-offs for maintainability and reuse.
- Automate provisioning of compute resources and manage communication between stacks using CloudFormation or AWS CDK outputs and parameters.
- Apply best practices to build scalable and cost-effective ML solutions (for example, endpoint auto scaling and cost-aware compute choices such as Spot where appropriate).
- Deploy and host models using the SageMaker SDK and automate endpoint creation, update, and deletion workflows.
- Configure SageMaker endpoints within a VPC network and explain how subnets, security groups, and routing affect connectivity.
- Design for maintainability by separating environments (dev/test/prod) and parameterizing infrastructure for repeatable deployments.
- Implement cost optimization patterns for inference infrastructure, including right-sizing, scaling to match demand, and choosing appropriate endpoint types.
- Secure infrastructure provisioning with least-privilege roles, encryption defaults, and safe handling of secrets and artifacts.
- Troubleshoot scaling and performance issues for deployed models, including misconfigured scaling policies, throttling, and service quota constraints.
- Evaluate trade-offs between infrastructure simplicity and flexibility when choosing between managed endpoints and container orchestration platforms.
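A minimal sketch of attaching a target tracking scaling policy to a SageMaker endpoint variant through Application Auto Scaling; the endpoint name, capacity bounds, target value, and cooldowns are placeholders:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/demo-endpoint/variant/AllTraffic"     # placeholder endpoint/variant

# Register the variant's instance count as a scalable target with min/max bounds.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

# Target tracking on invocations per instance: scale out quickly, scale in conservatively.
autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,    # target invocations per instance per minute (illustrative)
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300,
    },
)
```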
Task 3.3 - Use automated orchestration tools to set up continuous integration and continuous delivery (CI/CD) pipelines
- Describe CI/CD principles and explain how ML workflows add additional versioned assets (data, features, models) compared to traditional application CI/CD.
- Identify capabilities and quotas for AWS CodePipeline, AWS CodeBuild, and AWS CodeDeploy and design pipelines that operate within service limits.
- Configure CodePipeline stages and artifacts for ML workflows (build, test, train, evaluate, register, deploy) and explain the purpose of each stage.
- Configure and troubleshoot CodeBuild, CodeDeploy, and CodePipeline, including stage transitions and common failure modes.
- Use version control systems (for example, Git) and select a branching strategy (Gitflow or GitHub Flow) aligned to deployment cadence and risk.
- Explain how code repositories and pipelines work together to trigger builds, tests, and deployments.
- Automate and integrate data ingestion with orchestration services as part of an end-to-end ML workflow.
- Use deployment strategies and rollback actions (blue/green, canary, linear) to reduce risk when deploying new model versions.
- Use AWS services to automate orchestration for model building and deployment, including coordinating processing, training, and evaluation steps.
- Configure training and inference jobs using event-driven triggers (for example, Amazon EventBridge rules) and pipeline tools such as SageMaker Pipelines and CodePipeline.
- Create automated tests in CI/CD pipelines (unit, integration, end-to-end) and explain how tests improve reliability of ML deployments.
- Implement validation gates in pipelines (metric thresholds, bias checks, explainability requirements) before promoting a model to production (see the gate sketch after this list).
- Build and integrate mechanisms to retrain models in response to schedules, new data, or detected drift.
- Manage versioning for datasets, code, and models to support reproducibility and safe rollback in production.
- Design approval workflows and separation of duties in ML delivery pipelines to satisfy governance and compliance expectations.
- Troubleshoot pipeline issues related to IAM permissions, artifact locations, environment configuration, and dependency management.
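A minimal sketch of the kind of validation gate described above, written as a plain Python step that a pipeline stage (for example, CodeBuild or a SageMaker Pipelines processing step) could run before model registration; the metrics file layout and thresholds are illustrative:

```python
import json
import sys

# Promotion thresholds a candidate model must meet (illustrative values).
MIN_F1 = 0.85
MAX_BIAS_DPPL = 0.10   # e.g., allowed difference in positive prediction rates between groups

# evaluation.json is assumed to be produced by an earlier evaluation step.
with open("evaluation.json") as f:
    report = json.load(f)

f1 = report["metrics"]["f1"]
bias = report["bias"]["dppl"]

failures = []
if f1 < MIN_F1:
    failures.append(f"F1 {f1:.3f} below threshold {MIN_F1}")
if abs(bias) > MAX_BIAS_DPPL:
    failures.append(f"bias metric {bias:.3f} exceeds limit {MAX_BIAS_DPPL}")

if failures:
    print("Validation gate FAILED:", "; ".join(failures))
    sys.exit(1)   # non-zero exit fails the stage and blocks promotion

print("Validation gate passed; model can be registered.")
```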
Domain 4: ML Solution Monitoring, Maintenance, and Security (24%)
Practice this topic →
Task 4.1 - Monitor model inference
- Differentiate data drift and concept drift and explain why drift detection is required for production ML systems.
- Select techniques to monitor data quality and model performance for inference workloads and identify which signals indicate degradation.
- Set up model monitoring in production using SageMaker Model Monitor and establish baselines for comparison.
- Monitor workflows to detect anomalies or errors in data processing or model inference and design alerting strategies.
- Detect changes in the distribution of data that can affect model performance and identify how tooling such as SageMaker Clarify can assist (see the drift sketch after this list).
- Monitor model performance in production using A/B testing and interpret results to decide on promotion or rollback.
- Compare shadow deployments and A/B testing and choose the appropriate strategy for safe evaluation in production.
- Define monitoring thresholds, SLIs/SLOs, and triggers for automated actions such as rollback, alerting, or retraining.
- Design dashboards and reporting for ML inference quality, latency, and error rates and communicate monitoring results to stakeholders.
- Incorporate feedback loops to capture ground truth when available and use feedback for evaluation and retraining planning.
- Plan operational processes for monitoring incidents, including triage, root cause analysis, and post-incident remediation.
- Apply design principles from ML well-architected guidance that relate to monitoring, including automation, observability, and controlled change.
- Balance observability needs with privacy constraints by limiting exposure of raw sensitive inputs and controlling access to logs.
- Design retraining triggers based on monitoring signals, including drift, performance regression, and changes in business conditions.
- Detect and respond to anomalous or abusive inference traffic patterns that may indicate misuse or security issues.
- Maintain documentation of monitoring configurations, baselines, and changes to support audits and continuous improvement.
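A lightweight illustration of distribution-drift detection: compare a feature's training-time baseline against recent inference traffic with a two-sample KS test and publish the statistic as a CloudWatch metric that an alarm can watch. SageMaker Model Monitor provides a managed version of this pattern; the file names, feature, and namespace below are placeholders:

```python
import boto3
import numpy as np
from scipy.stats import ks_2samp

# Baseline captured at training time vs. values captured from recent inference traffic.
baseline = np.load("baseline_amount.npy")   # placeholder file
recent = np.load("recent_amount.npy")       # placeholder file

statistic, p_value = ks_2samp(baseline, recent)
print(f"KS statistic={statistic:.3f}  p-value={p_value:.4f}")

# Publish the drift statistic so a CloudWatch alarm can drive alerting or retraining triggers.
boto3.client("cloudwatch").put_metric_data(
    Namespace="MLMonitoring",               # placeholder namespace
    MetricData=[{
        "MetricName": "FeatureDriftKS",
        "Dimensions": [{"Name": "Feature", "Value": "amount"}],
        "Value": float(statistic),
    }],
)
```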
Task 4.2 - Monitor and optimize infrastructure and costs
- Identify key performance metrics for ML infrastructure (utilization, throughput, availability, scalability, fault tolerance) and map them to operational goals.
- Use Amazon CloudWatch metrics, logs, and alarms to troubleshoot latency and performance issues for ML endpoints and pipelines (see the alarm sketch after this list).
- Use CloudWatch Logs Insights to analyze logs and identify bottlenecks and error patterns in ML applications.
- Use observability tools (for example, AWS X-Ray and CloudWatch Lambda Insights) to trace requests and diagnose end-to-end latency.
- Create AWS CloudTrail trails to log, monitor, and audit activities that affect ML systems, including retraining and deployment actions.
- Build dashboards to monitor performance and cost metrics using tools such as CloudWatch dashboards and Amazon QuickSight.
- Monitor infrastructure events using Amazon EventBridge events and integrate event-driven notifications or remediation workflows.
- Differentiate instance families (general purpose, compute optimized, memory optimized, inference optimized) and select the right family for training and inference workloads.
- Right-size inference infrastructure using tools such as SageMaker Inference Recommender and AWS Compute Optimizer.
- Troubleshoot and resolve latency and scaling issues, including misconfigured auto scaling, throttling, and insufficient capacity.
- Manage service quotas and capacity constraints and identify when to request quota increases for production workloads.
- Implement cost tracking and allocation techniques (for example, resource tagging) to enable chargeback and visibility.
- Use cost analysis tools (AWS Cost Explorer and AWS Billing and Cost Management) to analyze spend and identify optimization opportunities.
- Set budgets and cost quotas using tools such as AWS Budgets and use alerts to prevent unexpected spend.
- Use AWS Trusted Advisor to identify cost and performance recommendations and prioritize fixes by impact.
- Optimize infrastructure costs using purchasing options (Spot Instances, On-Demand Instances, Reserved Instances, SageMaker Savings Plans) based on workload characteristics.
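A minimal sketch of a CloudWatch alarm on SageMaker endpoint model latency that notifies an SNS topic; the endpoint name, threshold, and topic ARN are placeholders:

```python
import boto3

boto3.client("cloudwatch").put_metric_alarm(
    AlarmName="demo-endpoint-high-latency",
    Namespace="AWS/SageMaker",
    MetricName="ModelLatency",                                   # reported in microseconds
    Dimensions=[
        {"Name": "EndpointName", "Value": "demo-endpoint"},      # placeholder endpoint
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=3,
    Threshold=500_000,                                           # 500 ms expressed in microseconds
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ml-alerts"],  # placeholder SNS topic
)
```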
Task 4.3 - Secure AWS resources
- Design IAM roles, policies, and groups to control access to AWS services used in ML systems and apply the principle of least privilege.
- Use resource policies (for example, Amazon S3 bucket policies) to restrict access to datasets, model artifacts, and other ML assets (see the policy sketch after this list).
- Configure IAM policies and roles for users and applications that interact with ML systems, including separate roles for training, processing, and inference.
- Secure SageMaker workloads using appropriate SageMaker security and compliance features and document controls for audits.
- Configure least-privilege access to ML artifacts (datasets, feature store, model registry, endpoints) and periodically review permissions.
- Control network access to ML resources using VPCs, subnets, security groups, and routing, and explain how isolation reduces blast radius.
- Build VPC network designs that securely isolate ML systems and support private connectivity to dependent AWS services.
- Secure data at rest and in transit for ML systems using encryption and key management practices (for example, using AWS KMS).
- Apply security best practices for CI/CD pipelines, including least-privilege build roles, secure artifact handling, and safe secret management.
- Monitor, audit, and log ML systems to ensure continued security and compliance, including capturing access and change events.
- Design logging retention and access controls so that audit logs support investigations without exposing sensitive data.
- Troubleshoot and debug security issues in ML systems, including AccessDenied errors, KMS permission issues, and network connectivity problems.
- Implement separation of duties and environment separation (dev/test/prod) to reduce risk and support compliance requirements.
- Secure and rotate secrets used by ML applications (for example, credentials and API keys) using appropriate AWS secret management services.
- Establish incident response practices for ML systems, including containment actions, credential rotation, and post-incident review.
- Apply governance practices such as tagging, policy enforcement, and periodic access reviews to maintain secure and compliant ML environments.
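A minimal sketch of a resource policy that limits a training role to read-only access on a curated dataset prefix and blocks non-TLS requests, applied with boto3; the bucket, prefix, and role ARN are placeholders:

```python
import json
import boto3

BUCKET = "ml-datasets-example"                                             # placeholder bucket
TRAINING_ROLE = "arn:aws:iam::123456789012:role/SageMakerTrainingRole"     # placeholder role

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Allow the training role to read objects only under the curated/ prefix.
            "Sid": "AllowTrainingRoleReadCurated",
            "Effect": "Allow",
            "Principal": {"AWS": TRAINING_ROLE},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/curated/*",
        },
        {   # Deny any request that does not use TLS.
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        },
    ],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```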
Tip: MLA-C01 is heavy on “best-fit” trade-offs. After each task, write 5–10 one-liner rules from your misses.