How to become a Certified Kubernetes Application Developer


This guide is pretty much a summary of my study tips for passing the Certified Kubernetes Application Developer (CKAD) exam by the Linux Foundation.

However, I’m not typical student material, as I followed some methods universally considered unorthodox. Therefore, I’ll give you my study tips as well as tips for making your daily work with Kubernetes a bit more pleasant.

The exam

The CKAD exam focuses on the following areas:

(in detail here)

  • Application Design and Build 20%
  • Application Environment, Configuration and Security 25%
  • Application Deployment 20%
  • Services and Networking 20%
  • Application Observability and Maintenance 15%

The whole exam takes place in a proctored session by PSI (at the time of writing), using the PSI Bridge system. Essentially, once you launch the exam, you will be prompted to download a secure browser, which will connect you with a proctor, perform a system analysis and provide you with a virtual desktop to perform the necessary tasks. The number of tasks varies between 15 and 20 if I am not mistaken, and you’re given 120 minutes to complete them.

Moving on, I will go through the curriculum, just in a different order, and I’ll mention a couple of out-of-scope-for-CKAD resources which are definitely useful for your day-to-day work.

I will not go through the basics and will assume you know what Kubernetes is, as well as how important kubectl is for managing one or multiple Kubernetes clusters/workloads.

My study process

In all honesty, I had forgotten I had purchased the exam (you have one year from the time of purchase to schedule it, including your free retake), only to remember it last week through an email reminder from the Linux Foundation. I had occasionally spent time throughout the year collecting some information on the exam and studied by solving some practice tasks, but I wasn’t stressed about it anyway, since I kind of live and breathe Kubernetes these days.

Aaand a week before the exam I enrolled in ACloudGuru, which I’ve used in the past for other certifications (AWS Solutions Architect Associate, to be precise), and I was pleased with how they organize and present the content and their labs. However, the labs were “easy”, as I’d describe them: they focus on giving you some hands-on experience with Kubernetes but don’t scratch anything below the surface.

The most helpful preparatory step was nailing the killer.sh simulator exam. Its task scenarios are harder than the real exam’s, so if you go through the simulator at least once and are mindful of the time (remember the 120-minute exam duration), you won’t really have any issues during the real exam.

Unpopular opinion: many people complain that the 120 minutes is the hardest part of the exam. Well, I finished the real exam 15-20 minutes early with all tasks completed, so I’d say don’t focus on the timer; just solve as many scenarios as possible and remember to be thorough (e.g. validate your resources), as you should be at your job too.

How to manage Kubernetes clusters and workloads

Shortcuts and cheatsheets

A lot of people go overboard here. I strongly recommend just these three but feel free to expand this list depending on what makes you feel more comfortable:

alias k=kubectl
export now="--grace-period 0 --force"
export do="--dry-run=client --restart=Never -o yaml"
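
For instance, the do variable spares you the flags every time you need a manifest skeleton, and now speeds up deletions:

# Generate a Pod manifest without creating anything
k run nginx --image=nginx $do > pod.yaml

# Delete a pod without waiting for graceful termination
k delete pod nginx $now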

Also, the official Kubernetes docs are one of the allowed URLs during the exam, and they include a golden cheat sheet. Make sure to familiarize yourself with the docs: during the exam there is no time to learn new things, just enough time to copy resource definitions. So make sure you are well versed in finding the content you need as quickly as possible.

Debugging and inspecting

Thank God, Kubernetes makes it ridiculously easy to spin up a temporary pod in order to e.g. curl an IP/endpoint and validate your app running at service.ns:80, or ensure NetworkPolicies are working, and whatnot.

k run test --image=busybox --restart=Never --rm -it -- echo "hello!"
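
And to actually probe something from inside the cluster — busybox ships wget rather than curl, and the service address below is just a placeholder:

k run test --image=busybox --restart=Never --rm -it -- wget -qO- http://my-service.my-namespace:80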

Namespaces

Namespaces allow you to create logical cluster separations. Some people even use them as virtual clusters, e.g. for different environments (production, dev, staging, etc.).

Another use case is to break down system components, e.g. one namespace holds the frontend components, another the backend, and another one some database applications or shared infrastructure.

Finally, a larger org might opt to offer each domain delivery team a namespace to organize their systems and services, while a central DevOps team maintains shared infrastructure in a universally accessible namespace.

My tip is to always start your kubectl commands with the namespace, so that you never forget it and can easily navigate back to your last command and edit it without minding a namespace at the end. Example: k -n my-namespace get deployment.

# Create a namespace
k create ns my-namespace

# Get all namespaces
k get ns

# Delete a namespace
k delete ns my-namespace

# Force delete a namespace
k delete ns my-namespace --force

Labels

Labels are a core concept in Kubernetes. They are not only used for filtering (for example when listing pods with kubectl get pod -l foo=bar), but they also play an important role in selector-based use-cases. For example, this is how a Service picks the Pod objects it directs traffic to.

# Add a label to a pod
k label pod nginx app=nginx

# Remove a label from a pod
k label pod nginx app-

# Get pods including their `app` label
k get pod -L app

# See app logs from all pods with label app=nginx
k logs -f -l app=nginx

# Get nodes with their labels
k get nodes --show-labels

Similarly you can define labels in the metadata of a resource:

metadata:
  labels:
    foo: bar
    ...

Annotations

Annotations are very much like labels, but they serve a different purpose. For example, we don’t select pods by annotations. Instead, they attach extra information to pods, to be processed by external systems and tools.

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"

E.g. the above example lets Prometheus know that this pod can be scraped and defines the port for scraping.

Some quick examples when dealing with annotations:

# Add an annotation to a node
k annotate node node1 class=high

# Remove an annotation from a node
k annotate node node1 class-

Similarly you can define annotations in the metadata of a resource:

metadata:
  annotations:
    registry: https://my-registry.example.com
    ...

Pods

As you have understood by now, a Pod is the most essential unit in the Kubernetes world. It is a logical wrapping around containers. Most people think that 1 container equals 1 Pod; however, this is just the most commonly used pattern. Keep in mind that multi-container pods are a thing (and CKAD exam material).

# Get all pods in namespace
k get pods -n my-namespace
# or even if you need to save a couple of taps
k get po -n my-namespace 

# Quickly running a Pod resource
k -n my-namespace run nginx --image=nginx:stable --port=80 --restart=Never

# Get a Pod definition quickly so that you can edit it without minding the boring initial details
k -n my-namespace run nginx --image=nginx:stable --port=80 --restart=Never --dry-run=client -o yaml > pod.yaml
vim pod.yaml  # to edit the definition, e.g. add resources details
k apply -f pod.yaml # to create the pod using the definition created
...

# Force delete a Pod
k -n my-namespace delete pod nginx --force --grace-period 0
# or for those who prefer to follow my tips
k -n my-namespace delete pod nginx $now

Init containers

A special kind of container that runs to completion before the app containers start, so they are not long-lived within the pod’s lifecycle. Instead, they take care of setting up the pod, e.g. copying some files, applying database migrations, etc.

pod.yaml

....

initContainers:
  - name: init
    image: busybox
    command: ["cp", "/tmp/index.html", "/var/www/index.html"]

...

Multi-container pod patterns

A pod can consist of multiple containers. The pod is a logical wrapping around all of them. Containers within the same pod share:

  • resources
  • network
  • storage

There are 3 multi-container pod patterns you need to know about:

  1. Sidecars: an additional container runs alongside the main container and performs a task secondary to it, e.g. facilitating service discovery or shipping logs (see the sketch right after this list).
  2. Adapters: in this pattern the additional container(s) transform the output of the main container, e.g. add a missing timestamp to the main container’s logs.
  3. Ambassadors: the additional container acts as a proxy to the main container.
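
Here is a minimal sketch of the sidecar pattern; the names, images and the shared emptyDir volume are assumptions for illustration:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  volumes:
    - name: logs
      emptyDir: {}        # scratch space shared by both containers
  containers:
    - name: main
      image: busybox
      command: ["sh", "-c", "while true; do date >> /var/log/app.log; sleep 5; done"]
      volumeMounts:
        - name: logs
          mountPath: /var/log
    - name: log-shipper   # the sidecar: tails what the main container writes
      image: busybox
      command: ["sh", "-c", "tail -f /var/log/app.log"]
      volumeMounts:
        - name: logs
          mountPath: /var/log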

Probes

A magnificent and simple way to define rules that will determine the health of your pod.

Probes can be:

  • httpGet, which as the name suggests sends an HTTP GET request to a predefined path and port combo, expecting a response code in the 2xx-3xx range to consider it a success
  • exec, which executes a command inside the container and expects a zero exit code
  • tcpSocket, which tries to establish a TCP connection at the specified port

Readiness

A container might be running but still unable to receive and serve traffic. Imagine a scenario where database migrations, which might take a while, need to execute at service startup (although this is a questionable practice). This is where this probe kicks in: the pod will not receive traffic before the probe succeeds.

readinessProbe:
  httpGet:
    path: /healthz  # the path to send the HTTP GET request
    port: 80 # the port to be used in the HTTP GET request
  periodSeconds: 30 # how frequently the requests need to be sent for this readiness check
  initialDelaySeconds: 45 # start probing after 45 seconds at pod startup

Liveness

This probe is responsible for determining whether the pod is considered alive or not. E.g. if the pod hangs because of application-related reasons and is considered lost, the liveness probe will fail. This will result in Kubernetes killing the container and restarting it.

livenessProbe:
  exec:
    command:
      - cat
      - /my-path/app-healthy
  periodSeconds: 15 

Startup

This is a relatively new probe type that got introduced first as beta in v1.18. Sometimes, you have to deal with legacy applications that might require an additional startup time on their first initialization. In such cases, it can be tricky to set up liveness probe parameters without compromising the fast response to deadlocks that motivated such a probe. (ref1)

startupProbe:
  httpGet:
    path: /healthz
    port: liveness-port
  failureThreshold: 30
  periodSeconds: 10

ConfigMaps and Secrets

Splitting the application’s configuration from the codebase is a good practice and a very practical thing to do, too. This is why we have ConfigMaps! A key-value kind of configuration-storing object that can then be loaded into a pod’s environment. Secrets are very similar, just for sensitive values, e.g. database passwords.

Tip: Secrets’ values are base64-encoded, but not encrypted (at least not by default)!

ConfigMaps and Secrets can be created from literals or from files.

# Create a ConfigMap from a literal
k -n my-namespace create configmap my-cm --from-literal=db_username=my-db-username

# or passing the values from a file
k -n my-namespace create configmap another-cm --from-file=my-config-file


# Create a Secret from a literal (generic is the Secret type for arbitrary key-value data)
k -n my-namespace create secret generic my-secret --from-literal=db_pass=my-password

They can be loaded into a Pod’s environment like this:

...
env:
  - name: MY_USERNAME
    valueFrom:
      configMapKeyRef:
        name: my-cm
        key: db_username  # this will load the specific key's value from the ConfigMap
  - name: MY_PASS
    valueFrom:
      secretKeyRef:
        name: my-secret
        key: db_pass
...
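
If you’d rather load every key of a ConfigMap or Secret at once, envFrom saves the per-key boilerplate:

envFrom:
  - configMapRef:
      name: my-cm      # every key in the ConfigMap becomes an env var
  - secretRef:
      name: my-secret  # same for the Secret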

PersistentVolumes and PersistentVolumeClaims

Taking a step back and getting a bit out of CKAD territory: Kubernetes lets you define your own way to provision persistent storage. This might be a virtual hard drive or a network file system of some kind in a cloud provider. You can define multiple ways as well, and admins can separate the storage they offer into “classes”. This is what a StorageClass is.

PersistentVolumes (PVs) rely on StorageClasses, and they are essentially instances of “virtual hard drives” inside the cluster, based on a specific storage class. A PersistentVolumeClaim (PVC) is exactly that: a claim on such a volume. PVs have their own lifecycle, and a Pod can request to use a volume through a PVC.

Example:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: foo-pv
spec:
  storageClassName: "high-io"
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data  # e.g. a node-local path; real clusters would back this with a cloud volume
  claimRef:
    name: foo-pvc
    namespace: foo
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: foo-pvc
  namespace: foo
spec:
  storageClassName: "high-io"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  volumeName: foo-pv

In the Pod spec you can then define the volumes, which you will later mount inside the container(s):

volumes:
  - name: test-pv
    persistentVolumeClaim:
      claimName: foo-pvc
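
Then, inside the container spec, you reference the volume by name; the mount path here is just an assumption:

containers:
  - name: app
    image: nginx
    volumeMounts:
      - name: test-pv
        mountPath: /usr/share/nginx/html  # where the container will see the volume's contents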

Resources

Resources, as you might imagine, define how much CPU and memory a container gets. This is really useful, as the Kubernetes scheduler will take this info into consideration when scheduling the pod.

Resources are divided into two types, requests and limits. Requests define the estimated resources the container will need, and this is the amount guaranteed to exist on the node the Kubernetes scheduler places the pod on. Limits, on the other hand, define the maximum resources the application may consume, and they are not guaranteed to exist on the node. Exceeding the memory limit gets the container OOM-killed (and restarted), while CPU usage above the limit is throttled.

pod.yaml

spec:
  ...
  resources:
    requests:
      memory: 128Mi
      cpu: 50m
    limits:
      memory: 256Mi
      cpu: 200m

Limits need to be greater than or equal to requests; failing to adhere to this rule will trigger a validation error.

ResourceQuotas and LimitRanges

Developers go overboard with resources sometimes. Cluster admins might therefore need to limit e.g. the number of objects of a specific kind in a namespace, or cap the total resources consumed. That’s what a ResourceQuota does:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: test-quota
spec:
  hard:
    configmaps: "2"

Or quickly create it using the command line:

k create quota test-quota --hard=configmaps=2

ResourceQuota is not the only object kind you’ll find for resource constraints, however. Yet in most courses out there it’s the only one mentioned, with LimitRanges entirely neglected.

A LimitRange is a constraint that can enforce min/max resources usage on a per namespace or per pod basis.

apiVersion: v1
kind: LimitRange
metadata:
  name: resource-limits
spec:
  limits:
    - type: Container
      max:
        cpu: "4"
        memory: "2Gi"
      min:
        cpu: "80m"
        memory: "40Mi"
      default:          # default limits, applied when a container sets none; must fall within min/max
        cpu: "100m"
        memory: "128Mi"
      defaultRequest:   # default requests, applied when a container sets none
        cpu: "100m"
        memory: "128Mi"

Some useful commands:

# Get resource quotas
k get resourcequota -n my-namespace

# Get limit ranges
k get limitrange -n my-namespace

# Describe a limit range so that you can get useful insights in terms of how to size your pod resources
k -n my-namespace describe limitrange my-limitrange

Deployments

It’s not practical to roll out pods one by one, and we also need a reliable way to ensure new pods start up once our old pods die. Welcome, Deployment resources! A Deployment holds a template of how the pods managed by it should look and makes sure a ReplicaSet resource spins up the number of pods needed, while also observing their health. If a pod dies, a new one takes its place. Once a new rollout starts, the Deployment follows a specific rollout strategy as defined in its spec.

# Get all deployments in a namespace
k get deploy -n my-namespace

# Create an nginx deployment with 3 replicas
k -n my-namespace create deployment nginx --image=nginx --replicas 3 

# Describe a deployment
k -n my-namespace describe deploy nginx

# You can also edit a deployment manifest once the resource has been created
k -n my-namespace edit deploy nginx
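
Rollouts themselves have a dedicated kubectl subcommand worth drilling:

# Watch a rollout as it progresses
k -n my-namespace rollout status deploy nginx

# Show the rollout history
k -n my-namespace rollout history deploy nginx

# Roll back to the previous revision
k -n my-namespace rollout undo deploy nginx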

Deployment rollout options

  1. Blue/Green: in this case a separate new deployment (green), identical to the old one (blue), is deployed using the newest image tag. Once verified as stable and ready, the service traffic is redirected from blue to green.

  2. Canary: the new deployment is rolled out simultaneously with the old one. The Service resource directs traffic to both of them simultaneously, in different percentages.

An exam note on canary deployments: you can mimic the percentage split by the number of pods in a deployment. E.g. if there need to be 10 pods in total among the two deployments, and 30% is the task’s canary share, then rolling out a new deployment with 3 pods is considered a valid solution.

Services

One of the ways to expose your application to the world.

A Service resource selects (by matching labels) the pods to direct traffic to and provides additional config, e.g. for port redirection.

apiVersion: v1
kind: Service
metadata:
  name: test-service
spec:
  selector:
    id: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376

You can also easily expose a deployment or a pod like this:

# Expose a deployment - this will create a new service for this deployment
k expose deploy my-app --port=80

# Expose a deployment and give the service a specific name
k expose deploy my-app --port=80 --name my-svc-name

Types of Services

  • ClusterIP: this is the default service type. It exposes the service internally in the Kubernetes cluster.
  • NodePort: this service type opens the specified port on every cluster node, and the service receives and serves traffic through it.
  • LoadBalancer: the service is exposed externally by using a cloud provider’s load balancer service, e.g. AWS ALB.
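
kubectl expose also accepts a --type flag, so creating a non-default type is quick:

# Expose a deployment through a NodePort service
k expose deploy my-app --port=80 --type=NodePort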

Ingresses

Not part of the CKAD curriculum.

Services are not the only way to advertise your applications to the world. Ingresses provide you with a public endpoint, and they are tightly connected with services.

However, as an entry point to the cluster, they need to use Ingress Controllers, with the most popular ones gathered in this community-maintained spreadsheet.

NetworkPolicies

By default, all ingress (incoming) and egress (outgoing) traffic is allowed to and from all pods, unless there is a NetworkPolicy which selects the pod and applies a blocking rule to it.

Pods that are selected by a NetworkPolicy are called isolated, while every other pod is considered non-isolated.

With NetworkPolicies you can define various rules which can be as sophisticated as needed for your cluster, in order to achieve your pod connectivity and security requirements.

A very common use case is to allow an API to receive incoming traffic only from the frontend pods.

Example networkpolicy.yaml:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-api-policy
spec:
  podSelector:
    matchLabels:
      app: api # this NP selects `api` pods
  policyTypes:
    - Ingress
  ingress: 
  - from:
    - podSelector: 
        matchLabels: 
          app: frontend # only pods with the `app=frontend` label can send requests to the API pods labeled with `app=api`
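
A couple of commands that come in handy when verifying policies:

# List NetworkPolicies in a namespace
k -n my-namespace get networkpolicy

# Inspect the rules of a specific policy
k -n my-namespace describe networkpolicy allow-api-policy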

SecurityContext

By defining a SecurityContext you can enable security features on a pod or container level. Specifying a security context object on container level overrides the pod level security context.

spec:
  securityContext:
    runAsUser: 2000 # every container will run as this user
  containers:
    - name: my-container
      image: alpine
      securityContext:
        runAsUser: 1000 # overrides the pod-level SecurityContext
        allowPrivilegeEscalation: false
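
You can verify the effective user from inside the container (assuming the pod above is named my-pod):

k exec my-pod -c my-container -- id   # should report uid=1000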

ServiceAccounts

Each namespace has a default service account, although it’s not a good practice to let all pods use it. Service accounts make it possible for a pod to connect to the API server and interact with it, for example to list the pods in a given namespace for some metrics task.

k create serviceaccount my-sa -n my-ns

You can attach a service account to a pod:

spec:
  serviceAccountName: my-sa   # prefer this field; serviceAccount is a deprecated alias for it
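
To check what a service account is actually allowed to do, kubectl auth can-i can impersonate it:

k -n my-ns auth can-i list pods --as=system:serviceaccount:my-ns:my-sa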

Jobs and CronJobs

Jobs are processes or tasks that run to completion. Each execution results in a pod being spun up and running the commands defined. Pods coming from Jobs will not be restarted, and unless completions are specifically set, the pod will run only once.

# Create a job with a busybox image echoing "hello!"
k -n my-ns create job test-job --image=busybox -- echo "hello!"

# or export to a yaml file for further editing
k -n my-ns create job test-job --image=busybox --dry-run=client -o yaml > job.yaml
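
If a single run isn’t enough, completions and parallelism on the Job spec control how many pods run. A minimal sketch that runs 5 pods, 2 at a time:

apiVersion: batch/v1
kind: Job
metadata:
  name: test-job
spec:
  completions: 5    # run 5 pods to completion in total
  parallelism: 2    # at most 2 pods at a time
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: test
          image: busybox
          command: ["echo", "hello!"]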

CronJobs are tasks that run periodically on specific time intervals or at specific times. Kubernetes will try to do a good job and spin the pods up very close to the specified time, but it might miss the window; how much lateness is tolerated is controlled by startingDeadlineSeconds.

E.g. this cronjob would print "Hello world!" every minute:

k -n my-ns create cronjob test-cronjob --image=busybox --schedule="*/1 * * * *" -- echo "Hello world!"
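
The same cronjob as a manifest, including the startingDeadlineSeconds knob mentioned above (the 60-second value is an arbitrary pick):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: test-cronjob
spec:
  schedule: "*/1 * * * *"
  startingDeadlineSeconds: 60  # skip the run if it can't start within 60s of its scheduled time
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: test
              image: busybox
              command: ["echo", "Hello world!"]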

Concepts not covered

I’m not implying that this guide is fully complete, but it at least makes a brief comment on every concept included in CKAD’s curriculum.

However, some concepts were left totally out of this article - and they are out of the curriculum as well:

  1. Affinity
  2. DaemonSets
  3. StatefulSets
  4. Taints and Tolerations

Feel free to contact me and let me know you’d like me to update this article to include these!

References

  1. https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

See also