
Hey Folks!

Running Production Workloads on Google Kubernetes Engine (GKE)

Every day, hour, minute, and second, software engineers build and deploy all kinds of applications to Kubernetes. This article aims to help engineers follow best practices when running apps in a production environment on GKE.

Before we dive right in, what the heck is GKE?

GKE stands for Google Kubernetes Engine. It is a managed service from Google that lets you run containerized workloads at scale in the cloud. It can deploy, manage, and scale containerized applications, powered by Kubernetes. Now, we dive!

Scaling

This is an essential component of any highly available system. It entails adding and removing capacity based on demand. GKE ships with autoscaling features called:

  • Vertical Pod Autoscaling: This adds or removes compute resources like RAM and CPU for your containers.
  • Horizontal Pod Autoscaling: This increases or decreases the number of Pods running your application.
  • Cluster Autoscaling: This increases or decreases the number of nodes (virtual machines) based on demand.

Best Practices

  • Use Cluster Autoscaling where it makes sense. If you are going to deploy heavy workloads, it is recommended you configure this.
  • Use Horizontal Pod Autoscaling when deploying stateless applications, for example, a Node.js app serving high traffic; see the sketch after this list.
  • Use Vertical Pod Autoscaling when dealing with databases. If you’re running a DB like Redis or any stateful application, you are better off configuring this.
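
As a sketch of the stateless case, here is a minimal HorizontalPodAutoscaler for a hypothetical Deployment named web; the name and CPU threshold are my assumptions, not values from GKE itself (the autoscaling/v2beta2 API was current when this was written):

```yaml
# Minimal HPA sketch: scales a hypothetical Deployment named "web"
# between 2 and 10 replicas based on average CPU utilization.
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web          # hypothetical Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```

With this in place, GKE adds Pods when average CPU crosses 70% and removes them as traffic drops, never going below two replicas.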

Health

This is another essential component for running production workloads. Even as human beings, it’s vital we take care of our health, so why not our applications?! There are two health checks in Kubernetes, namely:

  • Readiness Probe: This checks to make sure your container is ready to receive traffic. Some apps take time to start, and as such, you need a way to know when a Pod is ready for use.
  • Liveness Probe: This check helps restart a Pod when it is stuck, for instance in a deadlock. Apps like Node.js are single-threaded, so unhandled exceptions can crash the entire app.

Best Practices

  • Ensure your Readiness and Liveness Probes are properly configured to avoid disruptions; a minimal sketch follows below.
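
Here is a minimal sketch of both probes on a hypothetical container; the /ready and /healthz endpoints, image name, and timings are assumptions you should adapt to your app:

```yaml
# Sketch of readiness and liveness probes on a hypothetical Pod.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: app
      image: gcr.io/my-project/web:1.0.0   # hypothetical image
      ports:
        - containerPort: 8080
      readinessProbe:                # gate traffic until the app is ready
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
      livenessProbe:                 # restart the container if it hangs
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 20
```

Note the liveness probe starts later and fires less often than the readiness probe, which reduces the risk of restart loops while the app is still warming up.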

Security

This happens to be one of the most underrated components of running a production workload. Kubernetes is not SECURE BY DEFAULT, but there are components built in for configuring how secure you want it to be.

Best Practices

  • Ensure you don’t store sensitive application secrets in plain Kubernetes Secrets. Kubernetes keeps them in a data store called etcd, and by default it does not encrypt secrets, it only base64-encodes them. Use tools like Google Secret Manager, Google KMS, and HashiCorp Vault when dealing with secrets.
  • Ensure you mount secrets as volumes and not environment variables; see the sketch after this list.
  • Enable Pod Security Policies.
  • Don’t allow containers to run as root.
  • Disable privileged containers.
  • Prevent privilege escalation.
  • Enable and configure Network Policies. This creates a level of network isolation for the applications running in your cluster.
  • Scan all images before deploying them to Kubernetes, using tools such as Clair, Trivy, Anchore, and many more.
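
To make the secrets and security-context advice concrete, here is a minimal sketch. All the names (the web Pod, the app-secrets Secret, the frontend label) are hypothetical, and the NetworkPolicy is just one example of isolating traffic:

```yaml
# Sketch of a Pod that mounts a Secret as a volume (not env vars)
# and applies a restrictive security context.
apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  securityContext:
    runAsNonRoot: true                     # refuse to run as root
  containers:
    - name: app
      image: gcr.io/my-project/web:1.0.0   # hypothetical image
      securityContext:
        privileged: false                  # no privileged mode
        allowPrivilegeEscalation: false    # block privilege escalation
      volumeMounts:
        - name: app-secrets
          mountPath: /etc/secrets
          readOnly: true
  volumes:
    - name: app-secrets
      secret:
        secretName: app-secrets            # hypothetical Secret
---
# Sketch of a NetworkPolicy that only lets Pods labeled app: frontend
# reach the web Pods, isolating them from all other traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: web
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
```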

Reliability

This has to do with keeping an app or service up and running without any form of interruption or downtime.

Best Practices

  • Run more than one replica of your app for high availability. One thing I love about GKE is how it spreads these replicas across nodes (virtual machines).
  • Configure Pod Disruption Budgets where it makes sense; a sketch follows below.
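
Here is a minimal PodDisruptionBudget sketch, assuming a hypothetical app labeled app: web running at least three replicas; it keeps two Pods available during voluntary disruptions such as node upgrades (the policy/v1beta1 API was current when this was written):

```yaml
# Sketch of a PDB that guarantees a floor of 2 running replicas
# while nodes are drained for upgrades or maintenance.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web      # hypothetical app label
```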

Chaos Engineering

Chaos engineering in cloud computing has to do with experimenting on a system to build the resilience to withstand unexpected conditions in production environments. Running production workloads is not easy: it is almost impossible to predict the traffic any system will receive, which makes outages hard to avoid. Some tools that can assist with this include:

  • Chaos Monkey: This is a tool, developed by Netflix, that randomly terminates instances inside a production environment. It gives engineers insights for building a more resilient system.
  • Litmus: This is another great tool that injects chaos into an environment, thereby helping SREs find weaknesses in their systems.
  • Kube-bench: This tool scans your cluster and checks that it is configured according to security best practices, such as the CIS Kubernetes Benchmark.

Labeling and Tagging

This is another crucial best practice. Labels give a better description of the Pods and Services deployed on the Kubernetes cluster, making them easier to identify and organize.

Best Practices

  • Label the resources in your deployments, like Pods, Services, Ingress, etc.; a sketch follows below.
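
As an illustration, here is a sketch of a hypothetical Deployment carrying Kubernetes’ recommended app.kubernetes.io labels plus a custom environment label; the same labels can be applied to the matching Service and Ingress:

```yaml
# Sketch of a labeled Deployment; all names and values are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app.kubernetes.io/name: web
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/component: frontend
    app.kubernetes.io/part-of: my-shop    # hypothetical application name
    environment: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: web
  template:
    metadata:
      labels:
        app.kubernetes.io/name: web       # must match the selector above
    spec:
      containers:
        - name: app
          image: gcr.io/my-project/web:1.0.0
```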

Conclusion

As Kubernetes grows, more issues will surface, along with ways to improve on them. I generally advise that if you set up a GKE cluster, you use a minimum of three (3) nodes and deploy it regionally for HA.

I hope this article gives you better insight into how to build confidence in your system. Kindly check out the CIS Kubernetes Benchmark to learn more about securing your Kubernetes cluster. Happy Building!

— Feb 12, 2021