1. Container Escapes—Small Bug, Big Blast-Radius

Modern runtimes such as runc and containerd rely on namespace and cgroup isolation to keep a container’s processes away from the host. A single kernel or runtime mistake, however, can punch a hole through that boundary.

One of the January 2024 “Leaky Vessels” flaws, CVE-2024-21626 in runc, is a prime example: through a leaked file descriptor, a crafted image could set a process's working directory to a location on the host filesystem, giving it read/write access outside the container. Detecting that kind of abuse as it happens is especially valuable in ephemeral, auto-scaling clusters, where you may never get a second look at the node.

 

2. Why Osquery? Evented Tables & eBPF Magic

Unlike one-off “kubectl exec” forensics, osquery can subscribe to kernel events and buffer them in evented tables such as process_events, socket_events, and the newer bpf_process_events_v2 (container-aware, introduced in 5.8).

Because rows arrive when the kernel emits an event, you get near-real-time telemetry without the overhead of repeatedly polling snapshot tables.

 

3. Preparing the Ground

| Step | What to do | Notes |
| --- | --- | --- |
| Kernel | Ensure CONFIG_AUDIT and CONFIG_BPF are enabled (true for stock Ubuntu, Amazon Linux 2, COS, etc.). | Needed for audit- and eBPF-based tables. |
| Deploy | Run osquery as a DaemonSet on every Kubernetes node (or as a systemd service on plain VMs). | Use privileged: true and mount /proc, /sys read-only. |
| Flags | Minimal extra flags to turn on events: --enable_bpf_events=true, --audit_allow_process_events=false (let eBPF do the work), --audit_allow_sockets=false, --events_expiry=3600 | Turn off audit duplicates; expire rows after 1 h to keep RocksDB small. |
| Shipping | Point the TLS logger at Uptycs, Fleet, Splunk, Loki, or an OTLP collector. | Works the same as snapshot logs. |
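The flags from the table can live in an osquery flagfile, which takes one --flag=value entry per line. Here is a minimal Python sketch that renders one. The helper name render_flagfile is hypothetical, and the flag values are copied from the table as-is. Depending on your osquery version, events may also need to be switched on globally with --disable_events=false. That flag is not in the table, so treat it as something to verify.

```python
# Flags taken from the table above; values are passed through unchanged.
EVENT_FLAGS = {
    "enable_bpf_events": "true",
    "audit_allow_process_events": "false",  # let eBPF do the work
    "audit_allow_sockets": "false",
    "events_expiry": "3600",  # seconds, i.e. 1 h
}


def render_flagfile(flags: dict[str, str]) -> str:
    """Render a flag mapping as flagfile text, one --name=value per line."""
    return "".join(f"--{name}={value}\n" for name, value in flags.items())


if __name__ == "__main__":
    # Redirect the output to wherever your deployment keeps its flagfile.
    print(render_flagfile(EVENT_FLAGS), end="")
```

Keeping the flags in one mapping makes it easy to diff what staging and production nodes run with.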

 

4. Real-Time Detection Queries

Below are three practical queries you can schedule at 30-second (or shorter) intervals. All run on upstream osquery; no custom extensions are required.

4.1 Watching for suspicious namespace pivots

WITH dcp AS (
  SELECT * FROM docker_container_processes
  WHERE id IN (SELECT id FROM docker_containers)
)
SELECT
  datetime(pe.time, 'unixepoch', 'localtime') AS ts,
  dcp.id AS container_id, pe.pid, pe.syscall, pe.path
FROM bpf_process_events AS pe
LEFT JOIN dcp ON (pe.pid = dcp.pid OR pe.parent = dcp.pid)
WHERE dcp.id IN (SELECT id FROM docker_containers)
  AND pe.syscall IN ('setns', 'mount', 'pivot_root');

Catch-me logic: setns and pivot_root are rarely used by normal workloads once the container is running. Raise an alert if they appear.

4.2 Detecting CVE-2024-21626 style escapes

WITH dcp AS (
  SELECT * FROM docker_container_processes
  WHERE id IN (SELECT id FROM docker_containers)
)
SELECT datetime(pe.time, 'unixepoch', 'localtime') AS ts,
  dcp.id AS container_id, pe.pid, pe.path
FROM bpf_process_events AS pe
LEFT JOIN dcp ON (pe.pid = dcp.pid OR pe.parent = dcp.pid)
WHERE pe.path GLOB '/proc/[0-9]*/cwd/*';

The exploit relies on tricking a process into operating from /proc/<pid>/cwd. A quick GLOB match is enough to surface it.
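It helps to check exactly what that GLOB pattern accepts before relying on it. osquery's SQL dialect is SQLite's, so the operator can be tried with Python's built-in sqlite3 module. This is only a sketch: the sample paths are hypothetical, not captured from a real exploit, and glob_matches is an illustrative helper.

```python
import sqlite3

GLOB_PATTERN = "/proc/[0-9]*/cwd/*"

# Hypothetical sample paths for illustration only.
SAMPLE_PATHS = [
    "/proc/1234/cwd/etc/shadow",
    "/proc/self/cwd/etc/shadow",
    "/proc/1234/cwd",
    "/usr/bin/python3",
]


def glob_matches(pattern: str, paths: list[str]) -> list[str]:
    """Return the paths that SQLite's GLOB operator accepts for the pattern."""
    conn = sqlite3.connect(":memory:")
    try:
        return [
            path
            for path in paths
            if conn.execute("SELECT ? GLOB ?", (path, pattern)).fetchone()[0]
        ]
    finally:
        conn.close()


if __name__ == "__main__":
    for path in glob_matches(GLOB_PATTERN, SAMPLE_PATHS):
        print(path)
```

Only the first sample matches. The pattern needs a PID component that starts with a digit and at least a trailing slash after cwd, so paths written as /proc/self/cwd/... slip through unless you add a second GLOB for them.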

4.3 Spotting containers that talk to the host kernel or metadata service

WITH dcp AS (
  SELECT * FROM docker_container_processes
  WHERE id IN (SELECT id FROM docker_containers)
)
SELECT
  datetime(s.time,'unixepoch','localtime') AS ts,
  dcp.id AS container_id,
  s.pid, s.local_address, s.remote_address, s.remote_port
FROM bpf_socket_events AS s
JOIN dcp ON (s.pid = dcp.pid OR s.parent = dcp.pid)
WHERE s.remote_address IN
  ('127.0.0.1','169.254.169.254'); -- add API server IPs if needed

Why it matters: many real-world escapes immediately probe 169.254.169.254 (AWS IMDS) or the host’s Docker socket.
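To run the queries above every 30 seconds, each one needs an entry in the "schedule" section of the osquery configuration. The sketch below builds that structure in Python and prints it as JSON. The schedule name container_cwd_escape and the helper build_schedule are hypothetical, and only query 4.2 is shown. Add 4.1 and 4.3 the same way.

```python
import json

# Query 4.2 from above, copied verbatim.
CWD_ESCAPE_SQL = """\
WITH dcp AS (
  SELECT * FROM docker_container_processes
  WHERE id IN (SELECT id FROM docker_containers)
)
SELECT datetime(pe.time, 'unixepoch', 'localtime') AS ts,
  dcp.id AS container_id, pe.pid, pe.path
FROM bpf_process_events AS pe
LEFT JOIN dcp ON (pe.pid = dcp.pid OR pe.parent = dcp.pid)
WHERE pe.path GLOB '/proc/[0-9]*/cwd/*';"""


def build_schedule(queries: dict[str, str], interval: int = 30) -> dict:
    """Wrap named queries in osquery's "schedule" config structure."""
    return {
        "schedule": {
            name: {"query": sql.strip(), "interval": interval}
            for name, sql in queries.items()
        }
    }


if __name__ == "__main__":
    config = build_schedule({"container_cwd_escape": CWD_ESCAPE_SQL})
    print(json.dumps(config, indent=2))
```

The name you give each schedule entry is what shows up in the result logs, so pick names your alert rules can filter on.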

 

5. Turning Queries Into Live Alerts

  • Fleet → Policies: paste each query, set “fails if row count > 0”, choose Slack/email/webhook.
  • Grafana/Loki → Alert rules: filter for the query name and a non-zero result.
  • Splunk → Search: index=osquery event_result>0 | stats count by query_name, host
  • Uptycs → Apply the uptycs_edr_linux_mitre tag to your Linux endpoints to enable comprehensive MITRE ATT&CK®-based threat detection covering container escapes and more.
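Whichever backend you pick, the rule is the same: any new row from one of the detection queries is an alert. Below is a minimal Python sketch of that rule. It assumes results arrive as osquery's line-delimited JSON differential logs with name, hostIdentifier, action, and columns fields. The query names in WATCHED_QUERIES are hypothetical and must match whatever names your schedule uses.

```python
import json
from typing import Iterable, Iterator

# Hypothetical schedule names; replace with the names from your osquery config.
WATCHED_QUERIES = {
    "container_ns_pivot",
    "container_cwd_escape",
    "container_metadata_probe",
}


def alerts_from_results(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one alert per newly added row from a watched query."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if not isinstance(record, dict):
            continue
        if record.get("name") in WATCHED_QUERIES and record.get("action") == "added":
            yield {
                "query": record["name"],
                "host": record.get("hostIdentifier"),
                "row": record.get("columns", {}),
            }
```

Feeding it a file handle, for example alerts_from_results(open(path)), streams alerts as the log grows. Forwarding each alert to Slack or a webhook is left to the pipeline you already run.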

 

6. Performance Tuning Tips

| Symptom | Fix |
| --- | --- |
| RocksDB grows >2 GB | Lower events_expiry (for example, to an hour) and/or lower events_max so old rows are evicted sooner. |
| High CPU on busy nodes | Limit osquery to a single worker thread with --worker_threads=1 and disable unused event tables (audit_allow_*). |
| Missing rows under heavy load | Increase --audit_backlog_limit (if you keep Audit) or switch fully to eBPF. |

For performance-critical workloads, Uptycs provides in-kernel eBPF event filtering and deduplication.

7. Beyond SQL—Enrich & Correlate

  • Join with host context: combine pe.uid with users to map UID→username.
  • Add image metadata: join container ID to docker_containers.image to see which registry image spawned the process.
  • Persistent metadata: one challenge with the above detections is that metadata about exited processes and the containers they were in may have been lost by the time the query scans eBPF events. Uptycs offers osquery-based solutions that provide deep container telemetry without race conditions, connecting GitHub activity, container CI/CD pipelines, K8s admission control, and eBPF-based runtime detections.
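The first two enrichment joins can be prototyped outside osquery, since its SQL dialect is SQLite's. The sketch below uses Python's sqlite3 module with small in-memory stand-in tables. The events table is a hypothetical placeholder for the output of a detection query, and all rows are made-up sample data. In osquery, users and docker_containers are real tables, and you would add pe.uid to the detection query's SELECT list.

```python
import sqlite3

# Map UID -> username and container ID -> image for each detected event.
ENRICH_SQL = """
SELECT e.pid, u.username, c.image, e.path
FROM events AS e
LEFT JOIN users AS u ON u.uid = e.uid
LEFT JOIN docker_containers AS c ON c.id = e.container_id
"""


def enrich_demo() -> list[tuple]:
    """Run the enrichment join against in-memory stand-in tables."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(
            """
            CREATE TABLE events (pid INTEGER, uid INTEGER, path TEXT, container_id TEXT);
            CREATE TABLE users (uid INTEGER, username TEXT);
            CREATE TABLE docker_containers (id TEXT, image TEXT);
            INSERT INTO events VALUES (4242, 0, '/bin/sh', 'abc123');
            INSERT INTO users VALUES (0, 'root');
            INSERT INTO docker_containers VALUES ('abc123', 'registry.example/app:1.0');
            """
        )
        return conn.execute(ENRICH_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in enrich_demo():
        print(row)
```

LEFT JOIN keeps the event even when the user or container lookup finds nothing, which matters for the race described in the last bullet: a container that has already exited simply yields a NULL image rather than dropping the alert.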

 

8. Conclusion

Container escapes will keep showing up as long as sandboxes exist. By enabling evented osquery on every node you:

  • Hear the kernel the moment a process makes a suspicious syscall.
  • See exactly which container, image, and command line triggered it.
  • Act within seconds, not minutes, using the log pipeline you already have.

Give these queries a spin in staging, tune the flags for your load, and let osquery watch your walls while you sleep.

Happy hunting!
