Use this syllabus as your source of truth for CCDAK. Work topic by topic, and drill practice questions after each section.
What’s covered
Topic 1: Kafka Core Concepts & Architecture
Practice this topic →
1.1 Topics, partitions, offsets, and ordering
- Explain how a Kafka topic is split into partitions and why partitions are the unit of parallelism.
- Define an offset and describe how offsets relate to a consumer’s progress within a partition.
- State Kafka’s ordering guarantee and identify when ordering is and is not preserved (per partition vs across partitions).
- Explain how record keys influence partition selection in the default partitioning strategy (a simplified sketch follows this list).
- Distinguish between topic-level ordering requirements and entity-level ordering requirements (per key/entity).
- Describe the purpose of record headers and give examples of developer use cases (tracing, routing hints, versioning).
- Given a scenario, choose a partitioning approach that balances ordering needs with throughput and consumer scaling.
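The default partitioner in the Java client hashes the serialized key (murmur2) and maps the result onto the partition count, so equal keys always land on the same partition. A minimal illustration of that idea, using a simplified hash rather than the client's real one (the partition count is an assumption for the example):

```java
public class KeyToPartitionSketch {

    // Simplified stand-in for the client's murmur2-based mapping; only the idea
    // matters here: the same key always yields the same partition number.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions; // non-negative modulo
    }

    public static void main(String[] args) {
        int numPartitions = 6; // assumed partition count for the example topic
        System.out.println(partitionFor("customer-42", numPartitions)); // some partition in [0, 5]
        System.out.println(partitionFor("customer-42", numPartitions)); // same partition -> ordered
        System.out.println(partitionFor("customer-7", numPartitions));  // likely a different partition
    }
}
```

Because the mapping depends on the partition count, adding partitions later changes where existing keys land, which is one reason to size partitions with growth in mind.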
1.2 Brokers, leaders, replicas, and durability
- Describe the roles of brokers, partitions, and replicas in a Kafka cluster.
- Explain leader/follower replication at a high level and how leader failover affects producers and consumers.
- Define the in-sync replica set (ISR) conceptually and explain why it matters for durability and acknowledgements.
- Differentiate between availability and durability trade-offs in replication-related decisions.
- Explain what client bootstrap servers are used for and why clients should list multiple brokers (see the sketch after this list).
- Describe how consumer groups enable horizontal scaling for reads while preserving per-partition ordering.
- Identify common failure scenarios (broker failure, network partition) and the developer-visible symptoms.
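Bootstrap servers are only the entry point for fetching cluster metadata; after that the client talks to whichever brokers lead the partitions it needs. A minimal sketch (hostnames and group id are hypothetical) showing why more than one broker is listed:

```java
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.util.Properties;

public class BootstrapSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Several brokers are listed so the client can still bootstrap if one is down;
        // the full broker list and partition leaders come from cluster metadata afterwards.
        props.put("bootstrap.servers", "broker-1:9092,broker-2:9092,broker-3:9092");
        props.put("group.id", "orders-service");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            System.out.println("Topics visible via metadata: " + consumer.listTopics().keySet());
        }
    }
}
```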
1.3 Storage model: retention, compaction, and delivery assumptions
- Differentiate between time/size-based retention and log compaction at a conceptual level.
- Explain what log compaction is used for (latest value per key) and when it is appropriate for event design (a topic-configuration sketch follows this list).
- Recognize that Kafka is not a queue with destructive reads; records remain according to topic retention settings.
- Explain how consumer groups allow multiple independent applications to read the same topic with separate offsets.
- Describe the relationship between partitions, segment files, and sequential I/O as a performance intuition.
- Identify why increasing partitions can improve throughput but can complicate ordering and consumer scaling strategies.
- Given a scenario, choose retention/compaction intent for an event stream (audit log vs changelog).
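A sketch of the two retention intents expressed as topic configuration, created here with the Admin API; topic names, partition counts, and retention values are illustrative:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class RetentionIntentSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Audit-log intent: keep every record for 30 days, then delete by time.
            NewTopic auditLog = new NewTopic("payments.audit", 6, (short) 3)
                    .configs(Map.of("cleanup.policy", "delete",
                                    "retention.ms", String.valueOf(30L * 24 * 60 * 60 * 1000)));

            // Changelog intent: compact so only the latest value per key survives.
            NewTopic customerState = new NewTopic("customers.changelog", 6, (short) 3)
                    .configs(Map.of("cleanup.policy", "compact"));

            admin.createTopics(List.of(auditLog, customerState)).all().get();
        }
    }
}
```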
Topic 2: Producer API & Record Writing
Practice this topic →
2.1 ProducerRecord, keying, and partitioning behavior
- Construct a ProducerRecord with topic, key, value, headers, and (when needed) an explicit partition (see the sketch after this list).
- Explain how keys affect partition selection and why key stability matters for ordered processing.
- Differentiate between specifying a partition explicitly vs allowing the partitioner to choose automatically.
- Describe how changing the number of partitions affects key→partition mapping and ordering expectations.
- Explain how message timestamps are assigned (create time vs log append time) at a high level.
- Choose an appropriate key for common domain models (customer, order, device) to meet ordering requirements.
- Recognize situations where a custom partitioner may be justified (hot key mitigation, affinity routing).
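A minimal sketch of building records; the topic, key, header names, and the pinned partition are all illustrative:

```java
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.internals.RecordHeader;

import java.nio.charset.StandardCharsets;

public class ProducerRecordSketch {
    public static void main(String[] args) {
        // Keyed record: the partitioner hashes "order-1001", so every event for this
        // order lands on the same partition and stays in produce order.
        ProducerRecord<String, String> keyed =
                new ProducerRecord<>("orders", "order-1001", "{\"status\":\"CREATED\"}");

        // Headers carry metadata (tracing, content-type/version hints) without touching the payload.
        keyed.headers()
             .add(new RecordHeader("trace-id", "abc-123".getBytes(StandardCharsets.UTF_8)))
             .add(new RecordHeader("content-type", "application/json".getBytes(StandardCharsets.UTF_8)));

        // Explicit partition and timestamp: bypasses the partitioner entirely, which couples
        // the producer to the current partition count — use only when there is a clear reason.
        ProducerRecord<String, String> pinned =
                new ProducerRecord<>("orders", 3, System.currentTimeMillis(),
                                     "order-1001", "{\"status\":\"PAID\"}");

        System.out.println(keyed + " / " + pinned);
    }
}
```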
2.2 Reliability: acknowledgements, retries, timeouts, idempotence
- Compare acknowledgement modes at a conceptual level and their trade-offs for durability and latency.
- Explain why retries can produce duplicates without additional safeguards and how idempotence mitigates this risk.
- Differentiate between retriable and non-retriable send failures at a high level (timeout vs serialization error).
- Describe how producer timeouts bound send attempts (delivery timeout vs request timeout) conceptually.
- Explain how max in-flight requests relates to ordering and retry safety in producer behavior.
- Design an error-handling strategy using callbacks/futures to surface and act on send failures (a producer sketch follows this list).
- Given a scenario, choose a producer configuration that matches intent (throughput vs durability vs latency).
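A durability-leaning producer sketch with an asynchronous send and a callback; the values shown are illustrative, not recommendations for every workload:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ReliableProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9092,broker-2:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                   // leader waits for the in-sync replicas
        props.put("enable.idempotence", "true");    // internal retries cannot duplicate records in the log
        props.put("delivery.timeout.ms", "120000"); // bounds the whole send attempt, retries included
        props.put("request.timeout.ms", "30000");   // bounds each individual request

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-1001", "{\"status\":\"PAID\"}"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            // Fires after retries are exhausted or for non-retriable errors
                            // (e.g. record too large); don't ignore it.
                            System.err.println("Send failed: " + exception);
                        } else {
                            System.out.printf("Written to %s-%d@%d%n",
                                    metadata.topic(), metadata.partition(), metadata.offset());
                        }
                    });
        } // close() flushes outstanding sends
    }
}
```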
2.3 Throughput: batching, compression, and backpressure
- Explain how batching improves throughput and why it increases latency (linger and batch sizing intuition); a tuning sketch follows this list.
- Describe the role of compression and common trade-offs (CPU vs network/IO).
- Explain how buffer exhaustion/backpressure can appear in producers and how to respond (throttle, tune, retry).
- Recognize how record size impacts throughput and why large messages can create operational risk.
- Describe how asynchronous send and callbacks enable high-throughput producers.
- Identify anti-patterns that reduce throughput (flush on every record, tiny batches, synchronous blocking sends).
- Given a scenario, choose tuning levers to increase throughput safely without breaking durability assumptions.
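A throughput-leaning variant of the producer configuration above; the values are illustrative and should be tuned against real traffic, and the durability settings stay explicit so tuning does not silently weaken them:

```java
import org.apache.kafka.clients.producer.KafkaProducer;

import java.util.Properties;

public class ThroughputProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        props.put("linger.ms", "20");            // wait briefly so batches fill before sending
        props.put("batch.size", "65536");        // 64 KiB per-partition batch target
        props.put("compression.type", "lz4");    // trade CPU for less network and disk I/O
        props.put("buffer.memory", "67108864");  // 64 MiB send buffer; when exhausted, send() blocks up to max.block.ms

        props.put("acks", "all");                // keep durability intent explicit while tuning
        props.put("enable.idempotence", "true");

        // Anti-patterns to avoid: flush() after every record, synchronous send(...).get()
        // per record, or a batch.size so small that batching is effectively disabled.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // application send path would go here
        }
    }
}
```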
Topic 3: Consumer API & Consumer Groups
Practice this topic →
3.1 Poll loop fundamentals and record processing
- Describe the core consumer poll loop pattern and why polling must happen regularly.
- Implement basic consumption with subscribe, poll, process, and shutdown patterns (see the sketch after this list).
- Differentiate between subscribe (group management) and assign (manual partition assignment) use cases.
- Recognize common consumer exceptions (deserialization errors, authorization failures) and how they surface.
- Explain how max poll records influences processing batches and throughput.
- Describe safe shutdown behavior (wakeup/close) and why abrupt termination can increase duplicates or lag.
- Given a scenario, choose consumer processing patterns that balance throughput with correctness.
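A minimal poll-loop sketch with a clean shutdown path; the topic, group id, and the processing step are illustrative:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class PollLoopSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9092");
        props.put("group.id", "orders-service");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        // Clean shutdown: wakeup() makes the blocked poll() throw WakeupException,
        // then the hook waits for the main thread to finish closing the consumer.
        final Thread mainThread = Thread.currentThread();
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            consumer.wakeup();
            try { mainThread.join(); } catch (InterruptedException ignored) { }
        }));

        try {
            consumer.subscribe(List.of("orders"));
            while (true) {
                // Poll regularly: exceeding max.poll.interval.ms between polls makes the
                // group consider this consumer dead and triggers a rebalance.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s-%d@%d key=%s value=%s%n", record.topic(),
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        } catch (WakeupException expectedOnShutdown) {
            // fall through to close()
        } finally {
            consumer.close(); // commits (if auto-commit) and leaves the group promptly
        }
    }
}
```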
3.2 Offset management: commits, seeking, and reset behavior
- Explain what it means to commit an offset and why committed offsets represent consumer progress.
- Differentiate auto-commit from manual commit strategies and the trade-offs in control vs simplicity.
- Implement at-least-once consumption by committing offsets only after successful processing (a sketch follows this list).
- Identify how committing before processing results in at-most-once behavior and when that might be acceptable.
- Explain the purpose of auto offset reset policies and when earliest vs latest applies.
- Describe seeking behavior conceptually (resetting position) and common use cases (reprocessing, backfills).
- Given a scenario, choose an offset strategy that minimizes duplicates while preserving correctness.
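An at-least-once sketch: auto-commit is disabled and offsets are committed only after the batch has been processed. The handle step is hypothetical, and the consumer is assumed to be configured and subscribed as in the poll-loop sketch above:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;

public class AtLeastOnceSketch {
    // Assumes the consumer was created with enable.auto.commit=false and is already subscribed.
    static void consumeLoop(KafkaConsumer<String, String> consumer) {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                handle(record); // must succeed before the commit below
            }
            // Commit AFTER processing: a crash in between means redelivery (duplicates, not loss).
            // Committing BEFORE processing would flip this to at-most-once: a crash after the
            // commit but before processing silently drops those records.
            consumer.commitSync();
        }
    }

    static void handle(ConsumerRecord<String, String> record) {
        System.out.println("processed " + record.key()); // hypothetical business logic
    }
}
```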
3.3 Consumer group coordination and rebalancing
- Explain how consumer groups distribute partitions across consumers and why one partition maps to one consumer per group.
- Describe what triggers a rebalance and common developer-visible symptoms (revocation, duplicate processing).
- Implement safe rebalance handling using callbacks/listeners (commit on revoke, initialize on assign); see the sketch after this list.
- Explain the difference between session timeouts and max poll interval at a high level and how each affects liveness.
- Describe static membership conceptually and why it can reduce unnecessary rebalances.
- Explain why long processing without polling can cause group instability and how to mitigate it (smaller batches, async processing).
- Given a scenario, diagnose frequent rebalances and pick the most likely configuration or code fix.
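A sketch of safe rebalance handling with a listener: progress is committed when partitions are revoked, and per-partition state is (re)initialized when they are assigned. The offsetsToCommit map is assumed to be maintained by the surrounding poll loop, which is not shown:

```java
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RebalanceHandlingSketch {
    // Offsets of records that have actually been processed, updated by the poll loop (not shown).
    static final Map<TopicPartition, OffsetAndMetadata> offsetsToCommit = new HashMap<>();

    static void subscribeWithListener(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Commit progress before handing the partitions to another consumer,
                // so the new owner reprocesses as little as possible.
                consumer.commitSync(offsetsToCommit);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // (Re)initialize any per-partition state, caches, or seek positions here.
                partitions.forEach(tp -> System.out.println("assigned " + tp));
            }
        });
    }
}
```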
Topic 4: Delivery Semantics & Processing Patterns
Practice this topic →
4.1 At-most-once vs at-least-once vs effectively exactly-once
- Define at-most-once and at-least-once semantics in terms of commit timing and failure behavior.
- Explain why at-least-once can lead to duplicates and why idempotent processing is a common mitigation.
- Describe an idempotency key strategy for consumers (dedupe store, conditional upserts) at a high level; a sketch follows this list.
- Differentiate duplicates from reordering and identify which Kafka guarantees help and which do not.
- Explain the role of producer idempotence in reducing duplicates caused by retries.
- Given a scenario, choose the correct semantic label based on when offsets are committed and when side effects occur.
- Select a processing approach that matches business risk (loss vs duplicates vs latency).
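A sketch of idempotent processing on top of at-least-once delivery. The in-memory set stands in for a real dedupe store (a database keyed by a business identifier, a conditional upsert, or a compacted topic), and the event-id header is an assumed producer convention:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

public class IdempotentProcessingSketch {
    // Stand-in for a durable dedupe store; in-memory state is lost on restart.
    static final Set<String> processedEventIds = new HashSet<>();

    static void handle(ConsumerRecord<String, String> record) {
        Header idHeader = record.headers().lastHeader("event-id"); // assumed producer-set header
        if (idHeader == null) {
            return; // or route to an error path — an id is required for deduplication
        }
        String eventId = new String(idHeader.value(), StandardCharsets.UTF_8);

        // add() returns false if the id was already seen: a redelivered duplicate
        // (e.g. after a rebalance, or a crash between processing and commit).
        if (!processedEventIds.add(eventId)) {
            return; // side effect already applied — skip instead of applying it twice
        }
        applySideEffect(record);
    }

    static void applySideEffect(ConsumerRecord<String, String> record) {
        System.out.println("writing " + record.key() + " downstream"); // hypothetical
    }
}
```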
4.2 Transactions and read isolation (EOS building blocks)
- Explain the purpose of Kafka transactions at a high level (atomic multi-partition writes).
- Describe transactional IDs conceptually and why they are used to fence producers.
- Differentiate read committed vs read uncommitted consumer isolation at a high level.
- Describe how transactions can prevent downstream consumers from seeing aborted writes.
- Explain how exactly-once semantics typically combine idempotent producers, transactions, and careful offset handling (see the pipeline sketch after this list).
- Identify scenarios where transactions are warranted vs unnecessary complexity (simple event emission vs pipelines).
- Given a scenario, choose settings or patterns that support EOS-style guarantees conceptually.
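A consume-process-produce sketch using transactions. It assumes a producer configured with a stable transactional.id (which also fences zombie instances) and a consumer with isolation.level=read_committed and auto-commit disabled; the enrich step and topic names are illustrative:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

public class TransactionalPipelineSketch {
    static void run(KafkaConsumer<String, String> consumer, KafkaProducer<String, String> producer) {
        producer.initTransactions();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            if (records.isEmpty()) continue;

            producer.beginTransaction();
            try {
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> record : records) {
                    producer.send(new ProducerRecord<>("orders.enriched", record.key(), enrich(record.value())));
                    offsets.put(new TopicPartition(record.topic(), record.partition()),
                                new OffsetAndMetadata(record.offset() + 1));
                }
                // Consumed offsets are committed inside the same transaction as the produced
                // records, so output and progress become visible — or are aborted — together.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction(); // read_committed consumers never see the aborted writes
                // A real application would also rewind the consumer to the last committed
                // offsets before retrying the batch.
            }
        }
    }

    static String enrich(String value) {
        return value; // hypothetical transformation
    }
}
```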
4.3 Error handling patterns: retries, DLQs, and poison messages
- Describe common retry patterns: immediate retry, delayed retry topic, and exponential backoff strategies.
- Explain a dead-letter topic (DLT/DLQ) pattern and what metadata should be preserved for debugging (a sketch follows this list).
- Differentiate transient failures (timeouts) from permanent failures (bad schema) and route accordingly.
- Describe how to handle poison messages without stalling the entire consumer group (skip, quarantine, or isolate partition).
- Explain how partition-level ordering can be preserved while still supporting retries and DLQs.
- Identify how to avoid infinite retry loops and how to cap attempts with headers or metadata.
- Given a scenario, choose the safest failure-handling design that preserves correctness and operability.
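A dead-letter routing sketch: transient failures are left to the retry path, while permanent failures are quarantined with enough metadata to debug and replay later. The topic name, exception type, and process step are illustrative:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.nio.charset.StandardCharsets;

public class DeadLetterSketch {
    // Hypothetical marker for failures worth retrying (timeouts, temporary outages).
    static class TransientException extends RuntimeException { }

    static void handle(ConsumerRecord<String, String> record, KafkaProducer<String, String> producer) {
        try {
            process(record);
        } catch (TransientException e) {
            throw e; // let the retry path (immediate retry, delayed retry topic, backoff) handle it
        } catch (Exception e) {
            // Permanent failure (bad schema, impossible state): park the record on a
            // dead-letter topic instead of stalling the whole partition.
            ProducerRecord<String, String> dead =
                    new ProducerRecord<>("orders.DLT", record.key(), record.value());
            dead.headers()
                .add("dlt.original.topic", record.topic().getBytes(StandardCharsets.UTF_8))
                .add("dlt.original.partition", String.valueOf(record.partition()).getBytes(StandardCharsets.UTF_8))
                .add("dlt.original.offset", String.valueOf(record.offset()).getBytes(StandardCharsets.UTF_8))
                .add("dlt.exception", e.getClass().getName().getBytes(StandardCharsets.UTF_8));
            producer.send(dead);
            // The caller then commits past this offset so one poison message cannot
            // block everything behind it on the same partition.
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        // hypothetical business logic
    }
}
```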
Topic 5: Serialization, Schemas & Evolution (Schema Registry Awareness)
Practice this topic →
5.1 Serialization formats and trade-offs
- Describe the role of serializers and deserializers and why producer/consumer mismatches surface as runtime failures (a configuration sketch follows this list).
- Compare JSON, Avro, and Protobuf at a high level (schema, size, tooling, evolution).
- Explain how schema-based formats enable safer evolution than ad-hoc JSON without contracts.
- Identify common causes of serialization failures (missing schema, incompatible changes, invalid payload).
- Explain why message size and format choices affect throughput and latency.
- Given a scenario, choose a serialization format that matches requirements (cross-language, strict schema, human readability).
- Describe how headers can support versioning or content-type hints without breaking payload contracts.
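A configuration sketch contrasting a plain string setup with a schema-based one. The Avro serializer class and schema.registry.url setting come from Confluent's separate serializer library (io.confluent:kafka-avro-serializer), and the registry URL is hypothetical:

```java
import java.util.Properties;

public class SerializerChoiceSketch {
    public static void main(String[] args) {
        // Plain strings (often JSON-as-string): human readable, no enforced contract,
        // so incompatible payloads are only discovered when a consumer fails to parse them.
        Properties plain = new Properties();
        plain.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        plain.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Schema-based: the serializer registers/looks up the schema by ID in Schema Registry,
        // so incompatible records fail fast at produce time rather than deep inside a consumer.
        Properties avro = new Properties();
        avro.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        avro.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        avro.put("schema.registry.url", "https://schema-registry.example:8081");

        // The consumer must use the matching deserializer; a serializer/deserializer mismatch
        // shows up as a runtime SerializationException, not a compile-time error.
        System.out.println(plain + "\n" + avro);
    }
}
```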
5.2 Schema Registry concepts: subjects and compatibility
- Explain what a schema registry provides conceptually (central schema storage and lookup by ID).
- Describe the idea of schema subjects and how subjects relate to topics (high level).
- Define backward, forward, and full compatibility in terms of who can read what.
- Explain why defaults and optional fields matter for backward-compatible evolution (a schema sketch follows this list).
- Identify schema changes that are typically breaking (removing fields, changing types incompatibly).
- Given a scenario, select the compatibility approach that minimizes consumer breakage.
- Describe why versioning strategies should be designed before many producers/consumers exist.
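A small Avro illustration of why defaults matter; the schemas are illustrative and the parser comes from the org.apache.avro library. Version 2 adds a field with a default, so a reader on v2 can still read data written with v1, which is the usual meaning of a backward-compatible change:

```java
import org.apache.avro.Schema;

public class CompatibilitySketch {
    public static void main(String[] args) {
        Schema v1 = new Schema.Parser().parse("""
            {"type":"record","name":"Order","fields":[
              {"name":"orderId","type":"string"},
              {"name":"amount","type":"double"}
            ]}""");

        // The added field carries a default, so v2 readers can fill it in for old v1 data.
        Schema v2 = new Schema.Parser().parse("""
            {"type":"record","name":"Order","fields":[
              {"name":"orderId","type":"string"},
              {"name":"amount","type":"double"},
              {"name":"currency","type":"string","default":"EUR"}
            ]}""");

        // Removing "amount", or changing its type incompatibly, would typically be rejected
        // by a registry configured for backward compatibility on this subject.
        System.out.println(v1.getFields().size() + " -> " + v2.getFields().size());
    }
}
```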
5.3 Designing event contracts for real systems
- Design an event schema with stable identifiers, timestamps, and clear ownership fields.
- Explain why event schemas should be additive when possible (add fields rather than remove/rename).
- Describe a strategy for introducing new event versions without breaking existing consumers (dual publish, tolerant readers).
- Differentiate between event time and processing time and where timestamps should live in an event contract.
- Explain why consumers should be resilient to unknown fields and missing optional fields.
- Given a scenario, choose between schema evolution and topic versioning (new topic) approaches.
- Identify the risks of “schema drift” and how central schema governance reduces those risks.
Topic 6: Security & Governance (Developer Level)
Practice this topic →
6.1 TLS and secure connectivity (high level)
- Explain why TLS is used for encryption in transit between clients and brokers.
- Describe the difference between server authentication and mutual TLS (mTLS) conceptually.
- Identify common developer mistakes that cause TLS connection issues (wrong truststore, hostname mismatch); a client configuration sketch follows this list.
- Describe how secure endpoints differ from plaintext endpoints and why bootstrap server configuration matters.
- Explain the purpose of certificate rotation and why long-lived hard-coded certs are risky.
- Given a scenario, choose a secure connectivity approach that matches compliance requirements (TLS vs plaintext).
- Recognize that security configuration and authorization failures are visible to clients as runtime errors.
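A TLS client configuration sketch; the port, file paths, and environment variable names are illustrative, and secrets should come from the environment or a secret manager rather than source control:

```java
import java.util.Properties;

public class TlsClientSketch {
    public static void main(String[] args) {
        // Assumes TRUSTSTORE_PASSWORD is set in the runtime environment.
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9093");   // the TLS listener, not the plaintext one
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/etc/kafka/secrets/client.truststore.jks");
        props.put("ssl.truststore.password", System.getenv("TRUSTSTORE_PASSWORD"));

        // For mutual TLS (client authentication) the client also presents a keystore:
        // props.put("ssl.keystore.location", "/etc/kafka/secrets/client.keystore.jks");
        // props.put("ssl.keystore.password", System.getenv("KEYSTORE_PASSWORD"));

        // Typical failure modes: pointing at the wrong truststore, or a broker certificate
        // whose hostname does not match the advertised listener.
        System.out.println(props.stringPropertyNames());
    }
}
```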
6.2 Authentication vs authorization and ACL awareness
- Differentiate authentication (who you are) from authorization (what you can do) in Kafka access control.
- Describe SASL at a high level and recognize it as an authentication layer (mechanisms vary by environment); a configuration sketch follows this list.
- Explain what ACLs control conceptually (topic read/write, group access) and why least privilege matters.
- Identify common authorization failure symptoms for producers and consumers (write denied, group denied).
- Describe why consumers need both topic read permissions and group permissions to operate correctly.
- Given a scenario, choose the minimum required permissions for a producer-only or consumer-only application.
- Explain why embedding secrets in code is risky and identify safer patterns (env vars, secret managers).
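A SASL-over-TLS sketch; the mechanism shown is PLAIN, but SCRAM or OAUTHBEARER look the same from the client's point of view. Credentials are read from environment variables as a stand-in for a proper secret manager, and the principal still needs the right ACLs after authenticating:

```java
import java.util.Properties;

public class SaslClientSketch {
    public static void main(String[] args) {
        // Assumes KAFKA_USERNAME / KAFKA_PASSWORD are provided by the runtime environment.
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9093");
        props.put("security.protocol", "SASL_SSL"); // SASL for authentication, TLS for encryption
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"" + System.getenv("KAFKA_USERNAME") + "\" "
                + "password=\"" + System.getenv("KAFKA_PASSWORD") + "\";");

        // Authorization is a separate step: a consumer needs READ on the topic AND READ on its
        // group; a producer-only app needs WRITE on the topic and no group permission at all.
        System.out.println("SASL mechanism: " + props.getProperty("sasl.mechanism"));
    }
}
```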
6.3 Secure development practices for streaming apps
- Identify sensitive data risks in event payloads and select appropriate mitigation (tokenization, encryption, redaction).
- Describe the concept of PII classification and why it matters for event design and retention.
- Explain why auditability matters and how headers/metadata can help with traceability.
- Recognize that topic naming and partitioning choices can affect data isolation and access management.
- Explain how least privilege applies to client credentials and why shared credentials create blast radius.
- Given a scenario, choose where to place secrets (not in code, not in source control) and how to rotate them.
- Identify logging pitfalls (printing full payloads with PII) and safe logging patterns.
Topic 7: Ecosystem Awareness & Troubleshooting
Practice this topic →
7.1 Kafka Connect awareness (connectors, tasks, and converters)
- Describe Kafka Connect’s role in integrating external systems via source and sink connectors (conceptual).
- Differentiate a connector from a task and explain how tasks provide parallelism.
- Explain the role of converters and how they relate to serialization formats (JSON, Avro, Protobuf).
- Describe single message transforms (SMTs) at a high level and common uses (field rename, routing).
- Explain error handling concepts in Connect (tolerance, dead-letter topics) at a high level.
- Given a scenario, choose when Connect is a better fit than writing custom integration code.
- Recognize that Connect does not eliminate schema design; converters and connector configuration still determine how data is serialized.
7.2 Kafka Streams awareness (topologies and stateful processing)
- Describe the purpose of Kafka Streams at a high level (stream processing using Kafka topics).
- Differentiate KStream vs KTable conceptually and identify common use cases (events vs changelogs); see the topology sketch after this list.
- Explain windowing at a high level and why time semantics matter in stream processing.
- Describe state stores conceptually and why local state requires changelog topics for durability.
- Recognize how stream processing can leverage transactions/EOS for stronger guarantees (conceptual).
- Given a scenario, identify when a simple consumer/producer pipeline is sufficient vs when Streams fits better.
- Explain why repartitioning can occur in stream processing and how it relates to keys and parallelism.
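A minimal topology sketch: a KStream of page-view events rolled up into a KTable of counts per user. The application id and topic names are hypothetical, and the input is assumed to already be keyed by user id:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class StreamsTopologySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "page-view-counter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // KStream: an unbounded sequence of events, assumed here to be keyed by user id.
        KStream<String, String> views = builder.stream("page-views");

        // KTable: a changelog view — the latest count per key, backed by a local state store
        // plus a changelog topic for durability. Grouping by a *different* key
        // (selectKey/groupBy) would instead force a repartition topic first.
        KTable<String, Long> viewsPerUser = views.groupByKey().count();

        viewsPerUser.toStream().to("page-views-per-user",
                Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```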
7.3 Diagnosing lag, rebalances, and client-side bottlenecks
- Define consumer lag conceptually and list common causes (slow processing, insufficient partitions, downstream bottlenecks); a lag-check sketch follows this list.
- Diagnose frequent consumer group rebalances and select likely root causes (timeouts, long processing, unstable membership).
- Identify producer-side bottlenecks (buffer exhaustion, request timeouts, large records) and mitigation strategies.
- Explain the trade-off between throughput and latency when tuning batching settings.
- Recognize symptoms of serialization issues and how to isolate whether the problem is data vs schema vs code.
- Describe the role of metrics and logs in diagnosing client behavior (request latency, error rates, poll cadence).
- Given a scenario, choose the most probable fix among configuration changes, code changes, and scaling decisions.
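Lag is simply the gap between the latest offset in each partition and the group's committed offset. A sketch that computes it with the Admin API (the group id and bootstrap address are hypothetical); the same numbers are also available from the kafka-consumer-groups tool and from client metrics:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;

public class LagCheckSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // What the group has committed per partition.
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("orders-service")
                         .partitionsToOffsetAndMetadata().get();

            // The current end of each of those partitions.
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                    admin.listOffsets(committed.keySet().stream()
                            .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest())))
                         .all().get();

            committed.forEach((tp, meta) -> System.out.printf("%s lag=%d%n",
                    tp, latest.get(tp).offset() - meta.offset()));
        }
    }
}
```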
Tip: After finishing a topic, take a 15–25 question drill focused on that area, then revisit weak objectives before moving on.