Detecting Silent AI Model Theft via eBPF-Powered Kernel Observability in K8s

Introduction

As artificial intelligence moves from research labs into production environments, the focus of cybersecurity has shifted from protecting traditional databases to securing the crown jewels of the modern enterprise: AI model weights. A 70B-parameter model represents millions of dollars in R&D and compute investment. Unlike a database breach, which often triggers alerts due to mass query volume, model theft is frequently “silent.”

In a Kubernetes (K8s) environment, an attacker or a malicious insider who gains access to a pod can exfiltrate models by slowly reading .safetensors or .pth files and sending them to an external endpoint, or by scraping the model directly from the memory of an inference engine like vLLM or NVIDIA Triton. Traditional security layers—Kubernetes RBAC, network policies, and high-level application logs—are often blind to these low-level system interactions.

To combat this, security engineers are turning to eBPF (extended Berkeley Packet Filter). By operating at the kernel level, eBPF provides deep observability into system calls (syscalls), allowing for the detection of unauthorized file access and network exfiltration with minimal overhead. This post explores how to leverage eBPF to create a “Data Perimeter” around AI models in Kubernetes.


Technical Overview

The Visibility Gap

Standard observability tools operate in “user space.” They see what the application reports. However, if a container is compromised, the attacker can bypass application-level logging. eBPF operates in “kernel space,” meaning it observes the interface between the application and the hardware.

Key eBPF Hooks for Model Security

To detect model theft, we focus on specific syscalls and kernel functions:

  1. File I/O (openat, read, mmap): Detects when a process opens a sensitive model file. Since most inference engines use mmap to map weights into memory, monitoring this syscall is critical.
  2. Network Socket Activity (connect, sendto): Identifies where data is being sent. By correlating a read of a model file with a subsequent sendto on a socket, we can identify exfiltration.
  3. Memory Access (process_vm_readv): Detects “memory scraping,” where one process attempts to read the memory space of another (e.g., an attacker trying to dump weights from a running Triton process).
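The file-then-network correlation described in point 2 typically happens in user space, on the stream of events the eBPF agent emits. The following is a minimal Python sketch of that correlation logic; the event shape (`pid`, `syscall`, `path`, `dst`) is a hypothetical schema for illustration, not any specific agent's output format:

```python
MODEL_EXTS = (".safetensors", ".pth", ".onnx")

def flag_exfiltration(events):
    """Flag PIDs that read a model file and later open an outbound connection.

    Each event is a dict with hypothetical fields (not a real agent schema):
    {"pid": int, "syscall": "read" | "connect", "path": str, "dst": str}.
    """
    touched_model = set()  # PIDs that have read model weights
    alerts = []
    for ev in events:
        if ev["syscall"] == "read" and ev.get("path", "").endswith(MODEL_EXTS):
            touched_model.add(ev["pid"])
        elif ev["syscall"] == "connect" and ev["pid"] in touched_model:
            alerts.append((ev["pid"], ev["dst"]))
    return alerts
```

The ordering matters: a connection is only suspicious if the same PID previously touched model weights, which keeps false positives from ordinary egress traffic low.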

Architecture of an eBPF Security Stack

In a Kubernetes cluster, the most efficient architecture is a DaemonSet deployment. This ensures that every node in the cluster has an eBPF agent running, providing global visibility without the management overhead of sidecars.

  • eBPF Probes: Reside in the kernel, filtering events based on cgroup (which identifies the specific K8s Pod).
  • Agent (e.g., Tetragon or Falco): Collects raw events from the probes and enriches them with K8s metadata (Namespace, Pod Name, Labels).
  • Policy Engine: Evaluates events against a security manifest (e.g., “Only the inference-server process is allowed to read /models/*.bin”).

Implementation Details

The following implementation uses Cilium Tetragon, a powerful eBPF-based security tool, to enforce a security policy that detects and blocks unauthorized model access.

1. Defining the TracingPolicy

A TracingPolicy allows us to define exactly which syscalls to monitor. In this example, we want to monitor any access to files ending in .safetensors within the ml-workloads namespace.

apiVersion: cilium.io/v1alpha1
kind: TracingPolicyNamespaced
metadata:
  name: "detect-model-access"
  namespace: "ml-workloads" # scope the policy to the ML namespace
spec:
  kprobes:
    - call: "fd_install"
      syscall: false
      args:
        - index: 0
          type: "int"
        - index: 1
          type: "file"
      selectors:
        - matchArgs:
            - index: 1
              operator: "Postfix"
              values:
                - ".safetensors"
                - ".pth"
                - ".onnx"
          matchBinaries:
            - operator: "NotIn" # whitelist: the inference server itself is exempt
              values:
                - "/usr/local/bin/inference-server" # adjust to your engine's binary path
          matchActions:
            - action: Sigkill # immediately terminate any other process touching model files

2. Correlating File Access with Network Egress

The real power comes from detecting the sequence of events. If a process reads a model and then opens a network connection to an unknown IP, that is a high-fidelity signal of theft.

apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "detect-exfiltration"
spec:
  kprobes:
    - call: "tcp_connect"
      syscall: false
      args:
        - index: 0
          type: "sock"
      selectors:
        - matchPIDs:
            - operator: "In"
              followForks: true
              values: [1234] # placeholder PID; in practice injected by a controller that observed prior model-file access
          matchActions:
            - action: Post # emit the event to the security pipeline for correlation

3. Monitoring Memory Scraping

Attackers may use process_vm_readv to copy weights from a running inference engine’s RAM. We can use eBPF to detect this specific syscall between processes.

# Tetragon kprobe snippet for cross-process memory reads
- call: "sys_process_vm_readv"
  syscall: true
  args:
    - index: 0
      type: "int" # the PID whose memory is being read
  selectors:
    - matchActions:
        - action: Post # alert only; switch to Sigkill to block outright

Best Practices and Considerations

1. Baselining “Normal” Behavior

AI inference engines have predictable start-up patterns. They perform a massive read of model files into memory upon initialization.
* Action: Use eBPF during your CI/CD or staging phase to generate a baseline. Any file access after the initial load phase should be treated as suspicious.
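A minimal sketch of this baselining idea in Python, operating on file-access events from the agent. The warm-up window (`warmup_s`) and the event shape are illustrative assumptions, not part of any specific agent's API:

```python
class FileAccessBaseline:
    """Treat file access during the initial load window as normal; flag the rest.

    warmup_s is a hypothetical tuning knob: accesses within the first
    warmup_s seconds are recorded as the legitimate model-load phase.
    """
    def __init__(self, warmup_s, clock):
        self.start = clock()
        self.warmup_s = warmup_s
        self.clock = clock
        self.baseline = set()  # paths read during startup, kept for later triage

    def observe(self, path):
        """Return True if this access is suspicious (outside the load phase)."""
        if self.clock() - self.start <= self.warmup_s:
            self.baseline.add(path)  # learning: model load at startup is expected
            return False
        return True  # any post-load file access warrants an alert
```

In practice you would derive the warm-up window from staging runs, since different engines (vLLM, Triton) finish loading at different times.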

2. Filtering at the Source

eBPF can generate a massive amount of data.
* Action: Apply filters at the kernel level (within the eBPF program) to discard noisy events (like healthz checks) before they ever reach your logging stack. This reduces CPU overhead on the node.

3. Namespace Isolation

Use K8s namespaces and cgroups as primary filters.
* Action: Only enable deep syscall tracing on namespaces that actually host AI models. Running full I/O tracing on a web-frontend namespace is unnecessary and resource-intensive.

4. Security of the eBPF Tooling

Since eBPF programs run with CAP_SYS_ADMIN or CAP_BPF privileges, the eBPF agent itself is a target.
* Action: Ensure your eBPF agent (Tetragon/Falco) is running in a locked-down namespace, and use Kubernetes PodSecurityStandards to prevent unauthorized users from deploying their own eBPF programs.


Real-World Use Cases and Performance

Use Case: Detecting the “Slow Leak”

An attacker uses dd to copy small chunks of a model file over several hours to avoid bandwidth spikes.
* Detection: eBPF tracks total bytes read from specific file descriptors associated with the model files. By integrating this with a stream processor (like Flink or a simple Python script), you can detect cumulative I/O that exceeds the expected model size.
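The cumulative-I/O check mentioned above can be sketched as a small Python aggregator over per-read events; the `(pid, path, nbytes)` tuple shape is an assumption for illustration, not a real agent schema:

```python
from collections import defaultdict

def detect_slow_leak(read_events, expected_sizes):
    """Accumulate bytes read per (pid, path); alert once a reader exceeds the
    file's known size, a sign of repeated re-reads typical of chunked
    exfiltration. A legitimate load reads the file roughly once."""
    totals = defaultdict(int)
    alerts = set()
    for pid, path, nbytes in read_events:
        totals[(pid, path)] += nbytes
        if totals[(pid, path)] > expected_sizes.get(path, float("inf")):
            alerts.add((pid, path))
    return alerts
```

Because the threshold is the model's own size, this catches slow, low-bandwidth copying regardless of how long the attacker stretches it out.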

Use Case: Unauthorized kubectl cp

An engineer with legitimate pod access uses kubectl cp to copy the model to their local machine.
* Detection: eBPF captures the tar or cat process spawned inside the container and logs the destination of the data flow, even if the user has bypassed standard application audit logs.
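One hedged way to surface this with the same fd_install hook used earlier is to alert when archive or copy utilities open model files; the binary paths and the /models/ prefix below are assumptions to adapt to your container images:

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: "detect-copy-tools-on-models"
spec:
  kprobes:
    - call: "fd_install"
      syscall: false
      args:
        - index: 0
          type: "int"
        - index: 1
          type: "file"
      selectors:
        - matchArgs:
            - index: 1
              operator: "Prefix"
              values:
                - "/models/" # assumed mount path for model weights
          matchBinaries:
            - operator: "In" # tools kubectl cp spawns inside the container
              values:
                - "/bin/tar"
                - "/bin/cat"
          matchActions:
            - action: Post # alert; escalate to Sigkill once tuned
```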

Performance Metrics

In high-throughput inference environments (NVIDIA A100/H100 clusters), performance is non-negotiable.
* Overhead: Properly configured eBPF probes generally add < 2% CPU overhead and negligible latency to the application.
* Throughput: eBPF can process millions of events per second, though it is best practice to aggregate these into summaries before sending them to a SIEM like Splunk or Datadog.


Conclusion

The “Silent” theft of AI models is a unique threat that necessitates a shift from peripheral security to kernel-level observability. By leveraging eBPF within Kubernetes, security teams can move beyond simple RBAC and achieve granular visibility into how model weights are accessed, moved, and exfiltrated.

Key Takeaways:
* Weights are Vulnerable: Standard K8s logs are insufficient for protecting high-value model assets.
* eBPF is the Solution: It provides the necessary hooks (openat, mmap, tcp_connect) to correlate file access with network movement.
* Runtime Enforcement: Tools like Tetragon allow for immediate action, such as killing a process (Sigkill) the moment it violates a data access policy.
* Operational Efficiency: eBPF’s low overhead makes it suitable for performance-sensitive AI workloads.

Implementing eBPF-powered observability is no longer an “advanced” security measure; for organizations deploying proprietary LLMs and generative models, it is a foundational requirement for a modern MLSecOps strategy.

