Organizations across the globe are embracing Kubernetes and harnessing its power to deliver cloud-native applications on top cloud providers like Amazon Web Services (AWS), Azure, and Google Cloud Platform. However, Kubernetes' default security limitations mean safeguarding your cluster and data should be a top priority.
Securing Kubernetes can be a challenge, particularly since clusters are a common source of misconfigurations in the cloud, can be shared between multiple teams, and the learning curve for Kubernetes is steep. But there is help available. Last August, the National Security Agency (NSA) and the Cybersecurity and Infrastructure Security Agency (CISA) released an updated version of the Kubernetes Hardening Guide.
The guide helps organizations solve those key misconfigurations and provides guidance to enterprises on solving key misconfigurations such as:
- Access controls. The most secure access follows the zero trust model, i.e., only give what is absolutely necessary. However, in Kubernetes, many users can end up getting cluster role binding privileges, which gives them access to the entire cluster when they don’t necessarily need or shouldn’t have that - especially in shared multi-tenant cluster use cases.
- Network Security. Since all pods can talk to each other by default, this increases the lateral attack surface inside the cluster. The NSA guide gives great insight into how to solve this using Kubernetes network policies.
- Container runtime issues. Examples of this include overprivileged containers and containers with mutable file systems, which can lead to vulnerabilities and can lead to breakout.
While the insights the guide provides are a fantastic first step, there are some key challenges we saw in the market when it came to using the recommendations:
- Operationalizing the Insights and Misconfiguration Failures: While the guide was a great start, it was hard for customers to keep up with an entire single PDF guide and operationalize it as part of their day to day processes without codifying the rules. Non-compliant infrastructure should be visible in a single pane of glass, and any failures should be easily assignable to DevOps and platform teams responsible for the infrastructure.
- Enabling continuous posture checks on existing and new infrastructure: Every time a new cluster is deployed or a new namespace is provisioned, the same NSA checks should be done on new infrastructure without manual intervention. Moreover, without a system in place, it is hard for customers to do daily checks, leading them to do it only on a quarterly basis, which can lead to significant blind spots, especially since assets are ephemeral. For example, a container can be spun up and down after doing something non-compliant in minutes. Therefore, continuous checks and alerting are key.
- Remediation info and real-time evidence: Sometimes, while it can be great to have the information as a CISO, in order to make sure your dev and platform teams understand the risk of the issue, real-time evidence is needed. If remediation steps are provided, that leads to less friction in fixing the issue as the devs have a path forward on what to do.
While these three challenges can seem daunting, the Uptycs platform streamlines visibility into key compliance risks and provides real-time evidence and remediation steps that are continuous, actionable, and insightful. Let’s see an example of this in action!
Step 1: Assessing the State of the System
When you land on the compliance dashboard, you get a single pane of glass view into the different standards being run across your cluster fleet. Any cluster you enroll, whether EKS, GKE, AKS, or even ones you manage yourself, can undergo the same compliance scanning process. Compliance scanning is done every 6 hours and is configurable. This gives the user a good understanding that their assets are being continuously monitored for compliance issues and an overall sense of the state of their system at any present moment.
In the example below, we see that roughly 40% of our assets are compliant when it comes to NSA Hardening, and 60% are non-compliant. We can see that 63 rules passed while 101 failed, and below, we get a breakdown across the different NSA Hardening categories of our pass/fail rate.
Figure 1 - Compliance overview
Next, let’s look at what actual rules are failing so we can appropriately triage.
Step 2: Triaging Through the Rules
Next, we can click the list of rules and get an overview across different categories of what is passed, what is failing, and how many assets for the given rule are passing/failing. The good thing is that just like the NSA Hardening Guide, Uptycs also logically organizes it into the relevant sections so you can focus on a specific security task that is most relevant to you instead of traversing through tables of 100s of rules.
Figure 2 - Rules listing
Clicking a specific rule, for example, NSA-23 - Defining Resource Quota Policies, gives us the clusters that are passing and failing this rule. In this case, most of my clusters do not have resource quota policies defined for their namespaces. This can be problematic, especially in shared cluster environments where one user assigned to a namespace can over-inflate their namespace and limit other applications and pods from getting deployed. This can be especially problematic if you have important software add-ons for security, observability, cost metrics, and more that constantly need to be running on the cluster.
Figure 3 - Cluster listing
Suppose you assign this to a DevOps or platform engineer to take a closer look at. Let’s see how we can use the evidence-based view to get more details.
Step 3: Generating the Evidence and Remediation To Take Action
Many times, as a security ops engineer, you may get pushback asking why this check is so important. By clicking the purple magnifying glass, you get an evidence-based view that gives the real source of why this issue is so important, the evidence behind it, and the remediation steps. For example, for the no resource policies defined issue, you can see exact remediation steps of how one can define a ResourceQuota with CPU and memory limits for a given namespace.
Image 4 - Resource quota evidence
Another example below shows the importance of implementing a default network policy to prevent lateral movement inside the cluster. This is especially important because, by default, every pod can talk to one another inside a cluster, increasing your lateral surface attack in case one pod with vulnerabilities is compromised.
Figure 5 - Network policy evidence
Uptycs provides a 3 step process of visualizing, triaging, and generating evidence to remediate that allows CISOs and security admins to be continuously compliant. This process applies to NSA Hardening and any standard such as CIS Kubernetes Benchmarks out of the box.
If you’re looking to harden your infrastructure using the NSA Hardening rules, try the Uptycs platform and see how it enables you to take the great guidance the NSA has provided and make it operational and actionable for your organization.
More from this series: