Use this syllabus as your source of truth for SAP-C02. Work through each domain in order and drill targeted sets after every task.
What’s covered
Domain 1: Design Solutions for Organizational Complexity (26%)
Task 1.1 - Architect network connectivity strategies
- Evaluate VPC design options (CIDR planning, subnetting, IPAM) to support multi-account and multi-Region connectivity with minimal future rework.
- Select the appropriate inter-VPC connectivity pattern (Transit Gateway, VPC peering, Cloud WAN, PrivateLink) based on scale, routing requirements, and isolation.
- Design hybrid connectivity using Direct Connect, Site-to-Site VPN, and Transit Gateway to meet resiliency, bandwidth, and latency requirements.
- Design DNS resolution across on-premises and AWS using Route 53 Resolver endpoints, private hosted zones, and conditional forwarding.
- Implement network segmentation and traffic control using separate VPCs/accounts, security groups, NACLs, and routing to enforce blast-radius boundaries.
- Design centralized ingress and egress architectures with traffic inspection using Network Firewall, Gateway Load Balancer, or third-party appliances.
- Use VPC endpoints (Gateway and Interface) and PrivateLink to keep service access private and reduce data exfiltration risk.
- Design multi-Region connectivity and routing (TGW inter-Region peering, Cloud WAN, Route 53 policies) for global applications and failover.
- Choose appropriate load balancing and edge routing (ALB/NLB/GWLB, CloudFront, Global Accelerator) for performance and availability.
- Define network observability and troubleshooting strategies using VPC Flow Logs, Traffic Mirroring, CloudWatch, and centralized log analytics.
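The CIDR-planning objective above can be made concrete with a small sketch. It assumes a hypothetical organization-wide supernet (`10.0.0.0/16`) and hypothetical account names, and uses only Python's standard `ipaddress` module to carve one non-overlapping block per account and to check a plan for overlaps. It is an illustration of the planning arithmetic, not a substitute for VPC IPAM.

```python
import ipaddress
from itertools import combinations

def allocate_vpc_cidrs(supernet: str, accounts: list[str], new_prefix: int) -> dict:
    """Give each account the next free /new_prefix block from the supernet."""
    blocks = ipaddress.ip_network(supernet).subnets(new_prefix=new_prefix)
    plan = {}
    for account in accounts:
        try:
            plan[account] = next(blocks)
        except StopIteration:
            raise ValueError(f"supernet {supernet} exhausted at {account!r}") from None
    return plan

def find_overlaps(plan: dict) -> list[tuple[str, str]]:
    """Return every pair of accounts whose CIDR ranges overlap."""
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(plan.items(), 2)
        if net_a.overlaps(net_b)
    ]

# Hypothetical supernet and account names, for illustration only.
plan = allocate_vpc_cidrs("10.0.0.0/16", ["network", "prod", "dev"], new_prefix=20)
for account, cidr in plan.items():
    print(account, cidr)
```

Running an overlap check like `find_overlaps` before attaching a new VPC is useful because VPC peering cannot be established between VPCs with overlapping CIDR ranges.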
Task 1.2 - Prescribe security controls
- Design identity and access management for multi-account environments using AWS Organizations, IAM Identity Center, and least-privilege role assumptions.
- Implement permission guardrails using SCPs, permission boundaries, session policies, and ABAC (tags) to prevent privilege escalation.
- Design data protection controls including encryption at rest and in transit, KMS key policies, key rotation, and secrets management.
- Select authentication and federation approaches (SAML/OIDC, cross-account roles, external IdP) for workforce and workload identities.
- Design network security controls including segmentation, security groups/NACLs, WAF, Shield, and threat protection for internet-facing workloads.
- Implement logging and audit controls using CloudTrail, CloudTrail Lake, AWS Config, and centralized log accounts to meet compliance requirements.
- Design detective controls using GuardDuty, Security Hub, Macie, and Inspector with automated triage and response.
- Define vulnerability and patch management processes using Systems Manager, ECR image scanning, and standardized AMI pipelines.
- Implement secure software supply chain controls (artifact signing, least-privilege pipelines, secret scanning) for CI/CD.
- Design incident response readiness including playbooks, isolation mechanisms, key compromise response, and forensics workflows.
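The guardrail objective above (SCPs, permission boundaries, identity policies) is easier to reason about with a toy model of how the layers combine. The sketch below is a deliberate simplification under stated assumptions: it only considers action names, and it ignores resources, conditions, session policies, and resource-based policies. The function names are hypothetical. It shows the core idea that an explicit deny always wins and that every applicable layer must allow the action.

```python
from fnmatch import fnmatchcase

def matches(action: str, patterns) -> bool:
    """Case-insensitive wildcard match of an action against a list of patterns."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)

def is_allowed(action, identity_allow, scp_allow, boundary_allow=None, explicit_deny=()):
    """Simplified evaluation: deny wins, then every layer must allow the action."""
    if matches(action, explicit_deny):
        return False                      # explicit deny in any policy wins
    if not matches(action, scp_allow):
        return False                      # SCP sets the account's maximum permissions
    if boundary_allow is not None and not matches(action, boundary_allow):
        return False                      # permission boundary caps the principal
    return matches(action, identity_allow)  # identity policy must grant the action

# Hypothetical policies for illustration.
print(is_allowed("s3:GetObject", identity_allow=["s3:*"], scp_allow=["*"]))
print(is_allowed("s3:PutObject", identity_allow=["s3:*"], scp_allow=["*"],
                 boundary_allow=["s3:Get*"]))
```

The second call is refused even though the identity policy grants `s3:*`, which is the behavior that makes boundaries useful for preventing privilege escalation.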
Task 1.3 - Design reliable and resilient architectures
- Design highly available multi-tier architectures across multiple AZs with stateless tiers, health checks, and automated scaling.
- Select data durability and replication options (S3, EBS snapshots, RDS/Aurora, DynamoDB) to meet RPO/RTO requirements.
- Design multi-Region architectures using active-passive or active-active patterns and appropriate routing (Route 53, Global Accelerator).
- Implement fault isolation and blast-radius reduction using multi-account segmentation, cell-based architectures, and shuffle sharding where applicable.
- Apply resiliency patterns such as retries with backoff, circuit breakers, idempotency, and dead-letter queues for distributed systems.
- Design caching strategies (CloudFront, ElastiCache, DAX) to improve resilience and reduce backend load during traffic spikes.
- Plan for service limits and zonal/regional failures using quotas management, fallback strategies, and dependency mapping.
- Define backup, restore, and disaster recovery testing procedures (game days) to validate recoverability.
- Design for graceful degradation with feature flags, read-only modes, and prioritized recovery of critical user paths.
- Evaluate and document reliability trade-offs using the Well-Architected Reliability pillar and resilience scoring.
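The "retries with backoff" pattern listed above can be sketched in a few lines. This is a minimal illustration, assuming a "full jitter" variant where each wait is a random fraction of an exponentially growing cap; the function name and default values are hypothetical, and the `sleep` and `rng` parameters exist only so the behavior can be tested without real delays.

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call operation(); on failure wait a jittered, exponentially growing delay."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise                                  # out of attempts: surface the error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)                         # full jitter: random wait in [0, cap)

# Usage with a hypothetical flaky call that fails twice before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(retry_with_backoff(flaky, sleep=lambda s: None))
```

Retries are only safe when the operation is idempotent, which is why the two patterns appear together in the objective.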
Task 1.4 - Design a multi-account AWS environment
- Design an AWS Organizations structure with OUs and accounts aligned to security boundaries, environments (prod/non-prod), and teams.
- Implement landing zone foundations using Control Tower or custom approaches, including account vending, baseline networking, and mandatory guardrails.
- Centralize security services (CloudTrail, Config, GuardDuty, Security Hub) in dedicated accounts with cross-account aggregation.
- Design shared services and connectivity using hub-and-spoke networking (Transit Gateway/Cloud WAN) and AWS RAM for resource sharing.
- Implement cross-account access patterns (IAM roles, Identity Center, delegated administrators) for operations and security teams.
- Define governance and policy enforcement using SCPs, tag policies, backup policies, and service quotas at the organization level.
- Design centralized identity, logging, and monitoring strategies that scale across Regions and large numbers of accounts.
- Plan account lifecycle processes (provisioning, drift detection, decommissioning) with IaC and automation.
- Establish data and key management boundaries across accounts (KMS key strategy, key sharing, multi-Region keys).
- Implement cost management in multi-account setups (consolidated billing, chargeback, budgets) tied to organizational structure.
Task 1.5 - Determine cost optimization and visibility strategies
- Build a cost visibility model using consolidated billing, cost allocation tags, and cost categories for showback/chargeback.
- Configure the Cost and Usage Report (CUR) and integrate it with Athena/QuickSight for granular analysis and reporting.
- Use AWS Budgets, Budget Actions, and anomaly detection to enforce spend controls and proactive alerts.
- Evaluate compute purchase options (Savings Plans, Reserved Instances, Spot) and define governance processes for commitments.
- Identify major AWS cost drivers (data transfer, NAT Gateway, storage requests, logging) and plan mitigations early in architecture.
- Design storage cost controls using lifecycle policies, tiering, archival (Glacier), and intelligent access patterns.
- Implement rightsizing and scaling strategies using Compute Optimizer, Auto Scaling, and serverless concurrency tuning.
- Define cost guardrails in CI/CD and IaC (policy-as-code, cost estimation, tagging enforcement).
- Optimize multi-Region and hybrid architectures for cost (traffic routing, replication frequency, Direct Connect vs VPN).
- Establish FinOps operating rhythms with dashboards, periodic reviews, and a continuous optimization backlog.
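Tagging enforcement, mentioned above as a cost guardrail, is essentially a set-difference check. The sketch below assumes a hypothetical required-tag set and a hypothetical resource inventory shaped as dictionaries; in practice the inventory might come from an IaC plan or a resource listing, which is outside the scope of this illustration.

```python
# Hypothetical required cost-allocation tags.
REQUIRED_TAGS = frozenset({"CostCenter", "Environment", "Owner"})

def missing_tags(resources, required=REQUIRED_TAGS):
    """Map resource id -> sorted list of required tags that are absent or empty."""
    report = {}
    for resource in resources:
        present = {key for key, value in resource.get("tags", {}).items() if value}
        gap = sorted(required - present)
        if gap:
            report[resource["id"]] = gap
    return report

# Hypothetical inventory for illustration.
inventory = [
    {"id": "vol-1", "tags": {"CostCenter": "42", "Environment": "prod", "Owner": "data"}},
    {"id": "vol-2", "tags": {"Environment": "dev", "Owner": ""}},
    {"id": "nat-1"},
]
print(missing_tags(inventory))
```

A check like this could run as a pipeline gate so that untagged resources are rejected before they start accruing unallocated cost.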
Domain 2: Design for New Solutions (29%)
Task 2.1 - Design a deployment strategy to meet business requirements
- Design CI/CD pipelines that support multi-account and multi-Region deployments with separation of duties and least privilege.
- Select deployment patterns (blue/green, canary, rolling, immutable) based on risk tolerance, rollback needs, and statefulness.
- Implement infrastructure as code strategies (CloudFormation/CDK/Terraform) with environment parity, drift detection, and automated promotion.
- Design artifact management and versioning (ECR, S3, CodeArtifact) with provenance, signing, and rollback support.
- Plan database and schema change strategies (expand/contract, online migrations) that minimize downtime and avoid data loss.
- Automate testing stages (unit, integration, load, security) and gate releases using metrics and automated approvals.
- Implement progressive delivery using feature flags, traffic shifting, and automated rollback on SLO breaches.
- Design configuration and secrets distribution (SSM Parameter Store, Secrets Manager) with rotation and environment isolation.
- Ensure deployment compliance and auditability using change management records, CloudTrail, and pipeline logs.
- Coordinate deployments for microservices and event-driven systems using contract testing, backward-compatible changes, and idempotent consumers.
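The progressive-delivery objective above (traffic shifting with automated rollback) follows a simple control loop. The sketch below simulates it under stated assumptions: the step percentages, the error-rate threshold, and the `observe_error_rate` callback are all hypothetical stand-ins for whatever a real pipeline would measure.

```python
def run_canary(steps, observe_error_rate, max_error_rate):
    """Shift traffic to the new version step by step; roll back on a threshold breach."""
    history = []
    for percent in steps:
        rate = observe_error_rate(percent)       # measure after shifting to this step
        history.append((percent, rate))
        if rate > max_error_rate:
            return {"status": "rolled_back", "traffic_to_new": 0, "history": history}
    return {"status": "promoted", "traffic_to_new": steps[-1], "history": history}

# Hypothetical release that degrades once it takes half the traffic.
def degrading(percent):
    return 0.001 if percent < 50 else 0.05

print(run_canary([10, 50, 100], degrading, max_error_rate=0.01))
```

The design choice worth noting is that rollback returns all traffic to the old version immediately rather than stepping down, which keeps the blast radius of a bad release bounded by the step that detected it.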
Task 2.2 - Design a solution to ensure business continuity
- Map business requirements to DR targets (RTO/RPO) and select the appropriate DR strategy (backup/restore, pilot light, warm standby, multi-site active-active).
- Design cross-Region data protection using S3 CRR, DynamoDB global tables, Aurora Global Database, and multi-Region KMS keys.
- Implement centralized backup policies using AWS Backup with cross-account and cross-Region vaults, lifecycle management, and Vault Lock immutability.
- Design DNS and traffic failover using Route 53 routing policies, health checks, and Global Accelerator for fast failover.
- Plan compute recovery strategies using golden AMIs, IaC re-provisioning, Auto Scaling, and AWS Elastic Disaster Recovery (DRS).
- Ensure continuity for identity and access (IdP/IAM Identity Center availability, break-glass accounts, emergency roles).
- Architect resilient messaging and integration with multi-AZ and multi-Region patterns and event replay mechanisms.
- Design observability for DR readiness (replication lag, backup success, health probes) with automated alerting.
- Perform DR testing and game days with documented runbooks, validation steps, and postmortem improvements.
- Account for regional service dependencies and design fallback approaches when a managed service is unavailable in a target Region.
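Mapping RTO/RPO targets to one of the four DR strategies named above amounts to picking the cheapest strategy that still meets both targets. The sketch below shows that selection logic only. The numeric RTO/RPO figures attached to each strategy are illustrative assumptions, not AWS-published values; real figures depend on the workload and must be validated through testing.

```python
# (strategy, assumed achievable RTO in minutes, assumed achievable RPO in minutes),
# ordered from lowest to highest relative cost. The numbers are illustrative assumptions.
DR_STRATEGIES = [
    ("backup and restore", 24 * 60, 24 * 60),
    ("pilot light", 60, 15),
    ("warm standby", 15, 5),
    ("multi-site active-active", 1, 1),
]

def select_dr_strategy(rto_minutes, rpo_minutes):
    """Return the cheapest strategy whose assumed RTO and RPO both meet the targets."""
    for name, rto, rpo in DR_STRATEGIES:
        if rto <= rto_minutes and rpo <= rpo_minutes:
            return name
    return None  # no listed strategy meets the targets under these assumptions

print(select_dr_strategy(rto_minutes=120, rpo_minutes=30))
```

Ordering the table by cost makes the trade-off explicit: a tighter target pushes the choice further down the list, and each step down generally means more infrastructure running in the recovery Region.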
Task 2.3 - Determine security controls based on requirements
- Translate compliance requirements (PCI, HIPAA, SOC, data residency) into AWS controls using Organizations guardrails, Config rules, and evidence collection.
- Design encryption and key management strategies that meet regulatory requirements (customer-managed keys, CloudHSM, BYOK, rotation).
- Implement least-privilege access for humans and workloads using roles, federation, MFA, and short-lived credentials.
- Design data access controls and sharing patterns using resource-based policies, Lake Formation, and cross-account KMS key sharing.
- Implement network protections for private-only access, egress control, and inspection using endpoints, firewalls, and segmentation.
- Define logging, retention, and monitoring requirements using CloudTrail Lake, centralized S3, and immutable storage.
- Design secure multi-tenant architectures with isolation strategies (separate accounts/VPCs, IAM conditions, encryption context).
- Select detection and response controls aligned to required MTTR and severity handling (GuardDuty, Detective, Security Hub).
- Implement secrets and certificate management that meets rotation and lifecycle requirements (Secrets Manager, ACM, private CA).
- Define incident response and forensics processes that meet legal/compliance requirements (log integrity, chain of custody, snapshotting).
Task 2.4 - Design a strategy to meet reliability requirements
- Design workloads that eliminate single points of failure using multi-AZ deployments, health checks, and self-healing mechanisms.
- Select AWS services for high availability based on failure modes and operational needs (Aurora Multi-AZ, DynamoDB, SQS).
- Design scaling strategies (horizontal vs vertical, predictive scaling) to handle demand spikes while maintaining SLOs.
- Implement resilient asynchronous patterns (queues, streams, retries, DLQs) to decouple components and tolerate downstream failures.
- Plan maintenance and release strategies that preserve availability during updates (rolling, blue/green, zero-downtime).
- Reduce dependency risk and blast radius with partitioning (cells), multi-account segmentation, and regional isolation.
- Manage quotas and capacity planning using Service Quotas, alarms, and pre-warming where needed.
- Implement multi-Region failover and failback procedures and define data consistency strategies.
- Design graceful degradation and prioritization (circuit breakers, fallback content, reduced features) under stress.
- Validate reliability with Well-Architected reviews, resilience testing, and continuous improvement loops.
Task 2.5 - Design a solution to meet performance objectives
- Identify and optimize performance bottlenecks across compute, network, storage, and database layers using metrics and tracing.
- Choose compute options (EC2 instance families, Graviton, ECS/EKS, Lambda) aligned to latency, throughput, and concurrency goals.
- Design low-latency networking using placement groups, enhanced networking, and appropriate load balancers (ALB vs NLB).
- Select storage solutions and configurations (EBS types, EFS throughput modes, FSx) to meet IOPS and throughput requirements.
- Design database performance strategies (indexes, partition keys, read replicas, caching) for RDS/Aurora/DynamoDB.
- Use caching and edge delivery (CloudFront, ElastiCache, DAX) to reduce origin load and improve global response times.
- Architect streaming designs (Kinesis, MSK) to handle high-volume events with predictable latency and backpressure control.
- Design performance testing and capacity validation using load tests, canaries, and synthetic monitoring.
- Optimize API performance and scalability with throttling, pagination, request shaping, and idempotency.
- Balance performance and cost by selecting appropriate scaling policies and managed services.
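API throttling, listed among the objectives above, is commonly explained with a token bucket: tokens refill at a steady rate up to a burst capacity, and each request spends one. The sketch below is a minimal single-threaded illustration with hypothetical parameter names; the caller passes the current time explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Allow bursts up to `capacity` and a sustained `rate` of requests per second."""

    def __init__(self, rate: float, capacity: float, start: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full so an initial burst is allowed
        self.last = start

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                # caller would typically respond with HTTP 429

bucket = TokenBucket(rate=1, capacity=2)
decisions = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0, 1.5)]
print(decisions)
```

Separating burst capacity from sustained rate is the key design choice: it lets short spikes through while still protecting the backend from prolonged overload.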
Task 2.6 - Determine a cost optimization strategy to meet solution goals and objectives
- Choose cost-effective architectures by comparing managed services vs self-managed deployments, factoring labor and operational overhead.
- Select compute purchasing and scaling strategies (Savings Plans, Reserved Instances, Spot, Auto Scaling) aligned to workload predictability.
- Optimize storage costs by selecting appropriate classes, lifecycle policies, compression, and data retention requirements.
- Minimize data transfer and network costs using regional placement, VPC endpoints, caching, and traffic engineering.
- Design cost-aware multi-Region strategies (active-passive vs active-active) with explicit trade-offs in latency and disaster recovery.
- Implement cost monitoring and guardrails (Budgets, CUR, anomaly detection) tied to the solution’s KPIs.
- Architect serverless cost controls (Lambda tuning, provisioned concurrency, throttling) to avoid unexpected spend.
- Plan licensing considerations and choose offerings that reduce TCO (for example, Aurora vs commercial database licensing).
- Apply tagging and cost allocation strategies in IaC to ensure accurate chargeback and reporting from day one.
- Run continuous cost optimization reviews using Trusted Advisor, Compute Optimizer, and Well-Architected Cost pillar findings.
Domain 3: Continuous Improvement for Existing Solutions (25%)
Task 3.1 - Determine a strategy to improve overall operational excellence
- Assess operational maturity using Well-Architected reviews and define an improvement backlog prioritized by risk and customer impact.
- Standardize infrastructure provisioning and change workflows with IaC, CI/CD, and automated approvals to reduce human error.
- Design centralized observability (metrics, logs, traces) across accounts using CloudWatch, OpenTelemetry, and log aggregation.
- Implement automated remediation and runbooks using Systems Manager Automation, EventBridge, and Lambda.
- Improve incident response with clear escalation paths, on-call processes, postmortems, and automation for common failure modes.
- Enhance configuration management and drift detection using AWS Config, CloudFormation drift detection, and policy-as-code.
- Build operational dashboards and SLOs/SLIs to measure system health and drive alerting and prioritization.
- Implement patching and fleet management at scale with Systems Manager, golden AMIs, and immutable deployments.
- Optimize release and change management with progressive delivery, environment parity, and automated rollbacks.
- Establish multi-account operational governance (delegated admin, shared tooling, standardized baselines) for consistent operations.
Task 3.2 - Determine a strategy to improve security
- Identify and remediate identity risks (over-privileged roles, unused access keys) using IAM Access Analyzer and least-privilege reviews.
- Strengthen organization-wide preventive controls using SCPs, permission boundaries, and Control Tower guardrails.
- Enhance detective controls with centralized CloudTrail/Config, GuardDuty, Security Hub, and automated workflows.
- Improve data protection by standardizing encryption, key rotation, secret rotation, and certificate lifecycle management.
- Reduce attack surface by implementing private access patterns, egress control, and centralized network inspection.
- Automate compliance using Config conformance packs, Security Hub standards, and continuous evidence collection.
- Improve vulnerability management (Inspector, ECR scanning, patch compliance) and define remediation SLAs.
- Enhance incident response readiness (forensics-ready logging, isolation runbooks, break-glass roles, cross-account containment).
- Integrate security into CI/CD with scanning, policy checks, and least-privilege pipelines (DevSecOps).
- Measure security posture improvements using KPIs (MTTD/MTTR, control coverage) and recurring audits.
Task 3.3 - Determine a strategy to improve performance
- Use metrics and distributed tracing (CloudWatch, X-Ray/OpenTelemetry) to pinpoint latency contributors and hot paths.
- Optimize compute performance with the right instance families, Auto Scaling tuning, and container/Lambda configuration.
- Improve database performance with indexing, caching, read scaling, and partition key redesign where necessary.
- Reduce network latency and improve throughput with edge services, route optimization, and load balancer selection.
- Apply caching strategies (CloudFront, ElastiCache, DAX) and cache invalidation policies to boost responsiveness.
- Improve storage performance by selecting correct EBS/EFS/FSx configurations and throughput/IOPS sizing.
- Implement performance regression testing in CI/CD and define performance budgets and alerts.
- Optimize asynchronous processing and backpressure to maintain stable latency under load.
- Evaluate managed service alternatives (Aurora Serverless, DynamoDB, SQS) when they improve performance and scalability.
- Validate improvements with controlled experiments, load tests, and measurable SLO impact.
Task 3.4 - Determine a strategy to improve reliability
- Perform failure mode analysis to identify single points of failure and prioritize fixes that reduce customer-facing downtime.
- Increase availability by adopting multi-AZ patterns, stateless services, and automated recovery mechanisms.
- Improve disaster recovery by tightening RTO/RPO, adding cross-Region replication, and automating failover/failback.
- Enhance resiliency of integrations with retries, DLQs, idempotency, and circuit breakers.
- Reduce blast radius with cell-based architectures, account/VPC isolation, throttling, and rate limiters.
- Improve observability for reliability (health indicators, saturation metrics) and tune alert thresholds to reduce noise.
- Implement chaos engineering or game days to validate assumptions and improve runbooks.
- Manage quotas and dependencies proactively with monitoring, pre-approvals, and fallback designs.
- Increase deployment reliability with progressive delivery and safer change windows.
- Track reliability KPIs (availability, error budgets) and run continuous improvement cycles.
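The error-budget KPI mentioned above is plain arithmetic: the budget is the fraction of the window that the availability SLO permits to be unavailable. The sketch below assumes a 30-day window by default and hypothetical function names.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over the window."""
    if not 0 < slo <= 1:
        raise ValueError("slo must be a fraction in (0, 1]")
    return (1 - slo) * window_days * 24 * 60

def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means the SLO is breached)."""
    budget = error_budget_minutes(slo, window_days)
    return 1 - downtime_minutes / budget

# A 99.9% SLO over 30 days permits roughly 43.2 minutes of downtime.
print(round(error_budget_minutes(0.999), 1))
print(round(budget_remaining(0.999, downtime_minutes=10.8), 2))
```

Tracking the remaining fraction rather than raw downtime gives a single number that can gate risky changes: when it approaches zero, releases slow down until the window rolls forward.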
Task 3.5 - Identify opportunities for cost optimizations
- Analyze Cost and Usage Report (CUR) data to identify top spend areas, trends, and anomalous spikes.
- Right-size compute and storage using Compute Optimizer recommendations and utilization metrics.
- Identify underutilized and idle resources (EBS volumes, snapshots, NAT gateways, load balancers) and create cleanup automation.
- Optimize storage and backup retention policies to reduce long-term costs while meeting compliance requirements.
- Evaluate commitment strategies (Savings Plans/RIs) and monitor coverage and utilization over time.
- Reduce data transfer costs by redesigning architectures (regional placement, caching, endpoints) and monitoring cross-AZ traffic.
- Implement cost allocation improvements (tagging, cost categories) and enforce tagging compliance.
- Optimize logging and observability costs by tuning retention, sampling, and log levels.
- Review licensing and managed service alternatives to reduce total cost of ownership.
- Establish a repeatable FinOps backlog and continuous tracking of realized savings.
Domain 4: Accelerate Workload Migration and Modernization (20%)
Task 4.1 - Select existing workloads and processes for potential migration
- Perform application portfolio discovery and dependency mapping to understand migration scope and sequencing.
- Classify workloads by business criticality, compliance requirements, and data sensitivity to prioritize migration candidates.
- Assess readiness using the 7 Rs (retire, retain, rehost, relocate, repurchase, replatform, refactor) and define target outcomes.
- Evaluate technical constraints (latency, licensing, OS/database constraints) that impact migration feasibility.
- Identify shared services and cross-cutting concerns (identity, DNS, logging, networking) required before production migrations.
- Build a migration business case that includes TCO, risk, timelines, and organizational change impacts.
- Determine data migration complexity (volume, change rate, downtime tolerance) and select appropriate transfer methods.
- Plan organizational processes for migration waves, change management, and stakeholder communication.
- Identify operational prerequisites (monitoring, incident response, backup) required before moving production workloads.
- Define success metrics and validation criteria for each migrated workload.
Task 4.2 - Determine the optimal migration approach for existing workloads
- Choose the appropriate migration strategy per workload (rehost, replatform, refactor, repurchase) based on objectives and constraints.
- Design wave-based migration plans using Migration Hub, including dependency-aware sequencing and rollback options.
- Select tooling for server migration and cutover (Application Migration Service, VM import/export) with minimal downtime.
- Design database migration approaches using DMS, native replication, and Schema Conversion Tool (SCT) as needed.
- Plan large-scale data transfer using Snowball/Snowmobile, Transfer Family, or Direct Connect based on volume and timelines.
- Implement landing zone and account/VPC patterns to support migrations securely (shared services, connectivity, guardrails).
- Design hybrid coexistence during migration, including identity, DNS, and network routing between on-premises and AWS.
- Define testing, validation, and cutover runbooks for each migration wave, including performance and security checks.
- Manage data consistency and cutover (dual writes, CDC, freeze windows) to meet downtime tolerance.
- Establish post-migration stabilization processes and handoff to operations, including monitoring and cost baselines.
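Choosing between network transfer and offline devices "based on volume and timelines", as described above, starts with a back-of-the-envelope transfer-time estimate. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a hypothetical sustained link utilization of 80%; both are assumptions to adjust for a real link.

```python
def transfer_days(volume_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimated days to move volume_tb over a link at the given sustained utilization."""
    bits = volume_tb * 1e12 * 8                       # decimal terabytes to bits
    effective_bps = link_mbps * 1e6 * utilization     # usable bits per second
    return bits / effective_bps / 86_400              # seconds to days

# Hypothetical example: 100 TB over a 1 Gbps link at 80% utilization.
print(round(transfer_days(100, 1000), 1))
```

If the estimate exceeds the migration window, that points toward an offline device or a higher-bandwidth connection; the data change rate during the transfer also matters and is not modeled here.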
Task 4.3 - Determine a new architecture for existing workloads
- Redesign monoliths into modular architectures with clear domain boundaries and APIs (microservices or modular monolith).
- Modernize compute by moving to containers (ECS/EKS) or serverless (Lambda) based on workload characteristics.
- Replace self-managed databases with managed services (Aurora, DynamoDB, ElastiCache) to improve scalability and reduce operations.
- Implement event-driven architectures using EventBridge, SNS/SQS, or Kinesis to decouple services and improve resilience.
- Design multi-account and network segmentation for the modernized architecture, including shared services and secure connectivity.
- Apply security-by-design in the target architecture: least privilege IAM, encryption, secrets, and centralized logging as defaults.
- Design observability for the new architecture with structured logging, distributed tracing, and operational dashboards.
- Implement CI/CD and IaC for the new architecture with safe deployment patterns and automated testing.
- Plan data modernization and analytics architectures (data lake, governance, integration) using appropriate AWS services.
- Balance modernization with risk by selecting incremental patterns (strangler fig, parallel run, feature flags).
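The strangler fig pattern named above works by routing a growing set of paths to the new implementation while everything else still reaches the legacy system. The sketch below shows only that routing decision, with hypothetical path prefixes and backend labels; in practice the decision would live in a load balancer or API gateway rule rather than application code.

```python
def route(path: str, migrated_prefixes) -> str:
    """Return 'modern' if the path falls under a migrated prefix, else 'legacy'."""
    for prefix in migrated_prefixes:
        base = prefix.rstrip("/")
        # Match the prefix itself or anything beneath it, but not look-alike paths.
        if path == base or path.startswith(base + "/"):
            return "modern"
    return "legacy"

# Hypothetical state partway through a migration: only /orders has moved.
migrated = ["/orders"]
for path in ("/orders/42", "/billing/7", "/ordersx"):
    print(path, "->", route(path, migrated))
```

Because migration progress is just the contents of the prefix list, rolling back a problematic slice means removing one entry, which keeps each increment low risk.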
Task 4.4 - Determine opportunities for modernization and enhancements
- Identify opportunities to adopt managed services and reduce undifferentiated heavy lifting (managed databases, serverless, managed messaging).
- Improve security posture during modernization by adding guardrails, posture management, and automated remediation.
- Enhance reliability and disaster recovery by adopting multi-AZ/multi-Region patterns and automating recovery procedures.
- Improve performance with caching, edge delivery, and right-sized compute as part of modernization initiatives.
- Optimize costs by adopting consumption-based services and eliminating idle capacity with Auto Scaling and serverless.
- Implement platform engineering foundations (golden paths, service catalog, reusable IaC modules) to accelerate delivery.
- Improve developer productivity with standardized CI/CD, ephemeral environments, and automated quality gates.
- Modernize data pipelines using managed analytics services and event streaming for near-real-time insights.
- Upgrade monitoring and incident management with unified observability and automated runbooks.
- Establish a continuous modernization roadmap aligned to business goals and Well-Architected pillars.
Tip: After each task, do a 15–25 question drill focused on that task’s themes (networking, org design, DR, security controls, migration), then revisit weak objectives before moving on.