Mastering Kubernetes Security #5: Incident Response with Detections

Blog Author
Abhinav Mishra

As 2023 comes to a close and we head into 2024, we are seeing an increase in Kubernetes attacks spanning many types of threats, including cryptomining, stolen or exposed credential files and secrets, data exfiltration, and more. In fact, according to Red Hat’s State of Kubernetes Security report, there was a 7% increase in attacks, and 37% of respondents experienced revenue or customer loss due to a Kubernetes security event or incident.


This is highly problematic. With a growing attack surface and a rising volume of alert noise, CISOs and SecOps teams are fatigued from having to separate false positives from actual threats that need attention.


In this blog, we’ll discuss a framework for catching different types of threats so you can secure and harden your Kubernetes clusters.


3 steps to uncovering and addressing Kubernetes threats


Step 1: Think like a threat actor and list the most common types of attacks


While Kubernetes can be misconfigured in many different ways, a handful of typical attack types cover a lot of ground. Many frameworks, such as the OWASP Top 10, describe the relevant security principles.


However, one resource we highly recommend, and that many of our own customers ask about, is Kubernetes GOAT. What’s exciting about Kubernetes GOAT is that it covers many end-to-end attack scenarios, for example:


  1. Privilege escalation 
  2. Container breakout into the host exposing cluster details 
  3. Port scanning 
  4. RBAC least-privilege misconfigurations (similar to those in our last blog post)
  5. SSRF attacks


Kubernetes GOAT also explains how to replicate each attack. These insights are key to getting into the mind of an attacker: seeing what commands they would run, where most key attacks start, and more.


Figure 1 - Kubernetes container escape example

The container escape example is well worth replicating to see how a malicious attacker can reach the host/VM from a pod or container and then use that foothold to access cluster-level information, other nodes, and even secrets. Other types of attacks include:


  • SSRF attacks 
  • Exposed services on node ports 
  • Reverse shell executions to exploit a system’s vulnerabilities and attack the host 
  • Denial of service based on exploiting services that don’t have memory or CPU limits (this can be extremely problematic in multi-tenant scenarios where stressing one namespace or pod can take down the entire cluster) 
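Several of these attacks start from pod settings that can be checked before anything runs. The sketch below is a minimal, hypothetical audit that flags host-namespace sharing, hostPath volumes, and privileged containers (common container-escape paths) along with missing CPU or memory limits (the denial-of-service risk above). It assumes pods are already parsed into plain dicts, for example the `items` from `kubectl get pods -A -o json`; the function name and finding strings are illustrative, not an Uptycs API.

```python
from typing import Any


def audit_pod_spec(pod: dict[str, Any]) -> list[str]:
    """Return human-readable findings for one Pod object (illustrative checks only)."""
    findings: list[str] = []
    name = pod.get("metadata", {}).get("name", "<unnamed>")
    spec = pod.get("spec", {})

    # Sharing host namespaces exposes host processes, IPC, or network interfaces.
    for field in ("hostPID", "hostIPC", "hostNetwork"):
        if spec.get(field):
            findings.append(f"{name}: {field} is enabled")

    # hostPath volumes can expose the node's filesystem inside the container.
    for vol in spec.get("volumes", []):
        if "hostPath" in vol:
            findings.append(f"{name}: hostPath volume mounts {vol['hostPath'].get('path', '?')}")

    for c in spec.get("containers", []):
        cname = c.get("name", "<unnamed>")
        if c.get("securityContext", {}).get("privileged"):
            findings.append(f"{name}/{cname}: privileged container")
        # Without limits, one workload can starve its neighbors (DoS risk).
        limits = c.get("resources", {}).get("limits", {})
        for res in ("cpu", "memory"):
            if res not in limits:
                findings.append(f"{name}/{cname}: no {res} limit")
    return findings


# Example: a pod resembling a container-escape setup.
pod = {
    "metadata": {"name": "debug"},
    "spec": {
        "hostPID": True,
        "volumes": [{"name": "root", "hostPath": {"path": "/"}}],
        "containers": [{
            "name": "shell",
            "securityContext": {"privileged": True},
            "resources": {"limits": {"memory": "128Mi"}},
        }],
    },
}
for finding in audit_pod_spec(pod):
    print(finding)
```

A real admission or posture check would also cover init containers, capabilities such as SYS_ADMIN, and namespace-level defaults like LimitRange; the sketch keeps to the settings named above.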


As Figure 2 below shows, Uptycs aggregates detections for these kinds of attacks based on the latest threat intel, including scenarios from frameworks such as Kubernetes GOAT, so you can stay up to date with the broadest possible coverage.


Figure 2 - Different types of Kubernetes detections

Step 2: Address all the blind spots through breadth of telemetry collection


To make your detections robust, you need to collect telemetry from a few key sources and eliminate blind spots. Let’s look at each source and briefly describe how Uptycs tackles it:


Audit Logs: Audit logs record requests made to the Kubernetes API, such as which resources were created and which actions were performed. Audit logging should always be turned on in your cluster, both for compliance and for triaging issues. Beyond that, audit logs show you which policies, roles, and infrastructure entities are being created or modified. For example, they can help with:


  • RBAC cluster role bindings: You typically want to limit the cluster roles and permissions in your cluster, so the creation of unnecessary cluster role bindings should be monitored. You also want to continuously audit these roles and limit their permissions to only the specific actions needed. 
  • Network policies: To enable a zero-trust model, you typically want to limit possible connections through network policies. If your network policies are being modified, especially cluster-level ones that enforce tenant isolation, it’s important to audit those actions before something unintended happens. 


Uptycs collects audit log data and maps it into key sets of security events and detections for you. In fact, our Kubernetes security events package includes over 60 different detections for audit logs alone!
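To make the audit log use cases above concrete, here is a minimal sketch that scans a Kubernetes audit log for cluster role binding changes and network policy modifications. It assumes the JSON-lines format the API server writes with its log backend (one `audit.k8s.io/v1` Event per line); the watched resource/verb pairs and the function name are illustrative assumptions, not Uptycs’ detection set.

```python
import json
from typing import Iterable, Iterator

# Illustrative (resource -> verbs) pairs worth alerting on; extend to fit your cluster.
WATCHED = {
    "clusterrolebindings": {"create", "update", "patch"},
    "networkpolicies": {"create", "update", "patch", "delete"},
}


def suspicious_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a compact summary for each watched change found in audit log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Only consider completed requests so each API call is reported once.
        if event.get("stage") != "ResponseComplete":
            continue
        ref = event.get("objectRef", {})
        verbs = WATCHED.get(ref.get("resource", ""))
        if verbs and event.get("verb") in verbs:
            yield {
                "user": event.get("user", {}).get("username"),
                "verb": event["verb"],
                "resource": ref["resource"],
                "namespace": ref.get("namespace"),  # None for cluster-scoped objects
                "name": ref.get("name"),
            }
```

You could point it at the file configured by kube-apiserver’s `--audit-log-path` flag, e.g. `with open(path) as fh: hits = list(suspicious_events(fh))`. Which fields are recorded depends on your audit policy, so check that the policy level you use captures the requests you care about.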


Runtime telemetry: In addition to control plane audit log data, you want to understand what is happening in your data plane in real time. This lets you connect live activity to key misconfigurations in your control plane. Types of information you may want to collect include:


  • Process events: If a malicious attacker gets into your container or pod, you want to identify the processes they run so you can stop or kill them. 
  • Files: These fall into two key buckets. 
    1. The first is insecure files, whether a misconfigured kubeconfig file with overprivileged access or cryptominer malware. Knowing which files are present on your system in real time is key to threat hunting faster.
    2. The second is files that are not insecure in themselves but are important to your cluster operations, such as secrets, sensitive data, or certificates. Make sure these files are, at a minimum, encrypted, and ideally not accessible from within the cluster or pod itself in case those are compromised. Secrets should typically be stored in a secrets manager such as HashiCorp Vault or AWS Secrets Manager.  
  • Socket events: If connections are made to a malicious IP, it is important to detect them and understand where they originated. Socket events help you identify both. 


Uptycs collects this data via our osquery-based runtime sensor, which captures these types of event data in depth.
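As an illustration of the socket-event check, the sketch below matches connection events against a watch list of CIDR ranges using Python’s standard `ipaddress` module. The event shape and the list contents are assumptions: `203.0.113.0/24` is a reserved documentation range standing in for a real threat-intel feed, and `169.254.169.254` is the link-local metadata endpoint used by major cloud providers, a frequent SSRF target.

```python
import ipaddress
from typing import Iterable

# Hypothetical watch list; in practice this would come from a threat-intel feed.
WATCHLIST = [
    ipaddress.ip_network("203.0.113.0/24"),      # documentation range as a stand-in
    ipaddress.ip_network("169.254.169.254/32"),  # cloud metadata endpoint (SSRF target)
]


def flag_connections(events: Iterable[dict]) -> list[dict]:
    """Return socket events whose remote address falls in a watched range."""
    hits = []
    for ev in events:
        try:
            addr = ipaddress.ip_address(ev["remote_address"])
        except (KeyError, ValueError):
            continue  # skip events without a parseable IP
        # Comparing an IPv6 address with an IPv4 network simply evaluates to False.
        if any(addr in net for net in WATCHLIST):
            hits.append(ev)
    return hits


events = [
    {"pid": 4211, "process": "curl", "remote_address": "169.254.169.254", "remote_port": 80},
    {"pid": 4302, "process": "app", "remote_address": "10.0.0.8", "remote_port": 5432},
]
for hit in flag_connections(events):
    print(hit["pid"], hit["process"], hit["remote_address"])
```

Keeping the process name and PID on each event is what lets you answer the second question, where the connection came from, once a match is found.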


Once you have identified your telemetry sources, correlate them into security events using SQL or a rule engine so that you can identify and act on findings in real time. With its real-time security data lake, Uptycs helps users by:


  • Automating checks for the most common detections: Using event rules that map back to the real-time security data lake and to data collected from a variety of sources, Uptycs detects and alerts on the most common threat types, easing the burden of building event rules yourself. 
  • Providing a scalable security detection engine: Uptycs scales to millions of events and prioritizes the most relevant and risky threats using ML and anomaly-based detections, while correlating telemetry at scale from the sources mentioned above. 
  • Enabling customization through detections as code: SecOps teams can build their own detections from Sigma rules or from simple SQL queries, using the single-console investigation engine and security data lake. 
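Correlation is where detections as code pays off. In this minimal sketch, a detection rule is just a SQL query that joins process telemetry with socket telemetry to surface a possible reverse shell: a shell process holding a network connection. The table names, columns, and sample rows are hypothetical, and `sqlite3` stands in for a security data lake only so the example runs anywhere; it is not Sigma or Uptycs’ query engine.

```python
import sqlite3

# Hypothetical schema; a real security data lake will differ.
SCHEMA = """
CREATE TABLE process_events (pid INTEGER, name TEXT, cmdline TEXT, container TEXT);
CREATE TABLE socket_events (pid INTEGER, remote_address TEXT, remote_port INTEGER);
"""

SHELLS = ("sh", "bash", "dash", "ash", "zsh")
PLACEHOLDERS = ",".join("?" * len(SHELLS))

# Detection as code: the rule is simply a query joining two telemetry sources.
REVERSE_SHELL_RULE = f"""
SELECT p.container, p.pid, p.cmdline, s.remote_address, s.remote_port
FROM process_events AS p
JOIN socket_events AS s ON s.pid = p.pid
WHERE p.name IN ({PLACEHOLDERS})
ORDER BY p.pid
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO process_events VALUES (?, ?, ?, ?)",
    [(101, "nginx", "nginx -g daemon off;", "web-1"), (202, "bash", "bash -i", "web-1")],
)
conn.executemany(
    "INSERT INTO socket_events VALUES (?, ?, ?)",
    [(101, "10.0.0.5", 5432), (202, "198.51.100.7", 4444)],
)

hits = conn.execute(REVERSE_SHELL_RULE, SHELLS).fetchall()
for hit in hits:
    print(hit)  # ('web-1', 202, 'bash -i', '198.51.100.7', 4444)
```

In practice you would also join on host or container and on time, since PIDs are reused, and matching on the process name alone is easy to evade by renaming the binary.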

Step 3: Think like a threat hunter 


Once you have a system for identifying threats, it’s important to understand where a Kubernetes threat came from and apply the right hunting techniques. Let’s look at an example:


Example: Identifying a malicious port scan process with a YARA rule signature


One common attack is a port scan using nmap. Nmap enumerates exposed ports and the vulnerabilities associated with them. To think like a threat hunter and stop this attack, we need to consider the following:


Where is the attack coming from? 


  • If it’s an internal user, we need to know the intent behind the execution. Is there a legitimate reason for it? Is it running from a namespace with access to sensitive information, or from the user’s own app namespace for debugging? 
  • If it’s an external actor, where is the malicious IP coming from, and how did it get in? We need to look for compromises in the system. Is there an exposed node port? Or a network policy or ingress controller that allows any type of traffic? 
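Two of the questions above, whether a node port is exposed and whether a network policy admits any traffic, can be answered from cluster manifests. This minimal sketch assumes Services and NetworkPolicies are already parsed into dicts (for example the `items` from `kubectl get services -A -o json` and `kubectl get networkpolicies -A -o json`); the function names are hypothetical.

```python
from typing import Any


def exposed_node_ports(services: list[dict[str, Any]]) -> list[tuple[str, str, int]]:
    """Return (namespace, name, nodePort) for every port exposed on the nodes."""
    out = []
    for svc in services:
        spec = svc.get("spec", {})
        # LoadBalancer Services usually allocate node ports as well.
        if spec.get("type") not in ("NodePort", "LoadBalancer"):
            continue
        meta = svc.get("metadata", {})
        for p in spec.get("ports", []):
            if "nodePort" in p:
                out.append((meta.get("namespace", "default"), meta.get("name", "?"), p["nodePort"]))
    return out


def allow_all_ingress(policies: list[dict[str, Any]]) -> list[str]:
    """Return namespace/name of NetworkPolicies containing an allow-all ingress rule."""
    flagged = []
    for pol in policies:
        # An empty ingress rule ({}) matches traffic from every source on every port.
        if {} in pol.get("spec", {}).get("ingress", []):
            meta = pol.get("metadata", {})
            flagged.append(f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}")
    return flagged
```

Neither finding is malicious on its own, but together with the scan’s source IP they help narrow down how an external actor reached the workload.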


How is the attack being masked?


In Figures 3 and 4 below, the attacker has renamed nmap to hide it from defense systems. Uptycs gives SecOps teams superpowers by letting them analyze the process with a YARA rule signature. In this case, Uptycs shows that the port scanner, nmap, is masquerading as /qwer.


Figure 3 - Original nmap hiding behind /qwer


Figure 4 - YARA rule scan being performed to discover masqueraded nmap execution
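The core idea behind the YARA check is to identify a binary by its contents rather than its file name. The sketch below illustrates that idea in plain Python; it is not YARA and not Uptycs’ rule. The signature strings are assumptions chosen for illustration (a production YARA rule would use a vetted set of strings and conditions), and the hypothetical `scan_pid` helper relies on Linux’s `/proc/<pid>/exe` link, which requires sufficient privileges to read.

```python
import os
from typing import Optional

# Illustrative content signatures: tool name -> byte strings that must all appear.
SIGNATURES = {
    "nmap": [b"nmap.org", b"Nmap scan report"],
}


def identify(data: bytes) -> Optional[str]:
    """Return the tool whose signature strings all occur in the binary, if any."""
    for tool, needles in SIGNATURES.items():
        if all(n in data for n in needles):
            return tool
    return None


def check_masquerade(exe_path: str, data: bytes) -> Optional[str]:
    """Return an alert if the binary's content disagrees with its file name."""
    tool = identify(data)
    name = os.path.basename(exe_path)
    if tool and name != tool:
        return f"{exe_path} looks like {tool} but is running as '{name}'"
    return None


def scan_pid(pid: int) -> Optional[str]:
    """Linux only: inspect the binary behind a running process via /proc."""
    exe = os.readlink(f"/proc/{pid}/exe")
    with open(f"/proc/{pid}/exe", "rb") as fh:
        return check_masquerade(exe, fh.read())
```

For the process in Figure 3, a call along the lines of `check_masquerade("/qwer", data)` would report that `/qwer` looks like nmap.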



From there, Uptycs offers container process remediation to kill malicious processes running in a container, such as the masqueraded nmap. Ultimately, though, it takes both these threat-hunting capabilities and a threat-hunting mindset to head off these kinds of attacks!
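Signature checks catch a renamed scanner; behavioral checks can catch the scan itself even when the binary is unknown. Below is a minimal, hypothetical port-scan heuristic that flags a source touching many distinct ports on one target within a short time window. The thresholds and the `(timestamp, src, dst, dport)` record format are assumptions to tune against your own baseline.

```python
from collections import defaultdict
from typing import Iterable

WINDOW_SECONDS = 10.0   # assumed window; tune to your environment
PORT_THRESHOLD = 20     # assumed distinct-port count that counts as a scan

Conn = tuple[float, str, str, int]  # (timestamp, src_ip, dst_ip, dst_port)


def detect_port_scans(conns: Iterable[Conn]) -> set[tuple[str, str]]:
    """Return (src, dst) pairs that hit >= PORT_THRESHOLD distinct ports in one window."""
    by_pair: dict[tuple[str, str], list[tuple[float, int]]] = defaultdict(list)
    for ts, src, dst, dport in conns:
        by_pair[(src, dst)].append((ts, dport))

    flagged: set[tuple[str, str]] = set()
    for pair, attempts in by_pair.items():
        attempts.sort()  # order by timestamp
        start = 0
        for end in range(len(attempts)):
            # Slide the window start forward until it spans <= WINDOW_SECONDS.
            while attempts[end][0] - attempts[start][0] > WINDOW_SECONDS:
                start += 1
            if len({port for _, port in attempts[start:end + 1]}) >= PORT_THRESHOLD:
                flagged.add(pair)
                break
    return flagged


burst = [(i * 0.1, "10.0.1.9", "10.0.2.4", 1000 + i) for i in range(25)]
print(detect_port_scans(burst))  # {('10.0.1.9', '10.0.2.4')}
```

When run with root privileges, nmap defaults to a SYN scan that never completes TCP handshakes, so the input should come from flow or packet-level telemetry rather than only completed connections. A slow scan spread over a longer period would also stay under these thresholds.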

We’re excited to demonstrate our detection capabilities at KubeCon North America 2023, based on Kubernetes GOAT and more. In our next blog, we’ll dive deep into examples from Kubernetes GOAT covering different types of attacks. 


More from this series

Mastering Kubernetes Security Part 1: NSA Hardening Guide

Mastering Kubernetes Security Part 2: Vulnerability Management

Mastering Kubernetes Security Part 3: Runtime Admission Controls

Mastering Kubernetes Security Part 4: Authorization, Access & Secrets