
eBPF in Production: Writing Kernel-Level Observability Without Breaking Everything

eBPF lets you attach programmable hooks to any kernel function with near-zero overhead. But writing safe, performant eBPF programs for production observability requires understanding the verifier, map types, and BTF — not just the marketing pitch.

eBPF (extended Berkeley Packet Filter) is the most significant change to how Linux systems are observed and controlled in two decades. It allows user-defined programs to run inside the kernel, attached to any of thousands of probe points — network stack, scheduler, file I/O, system calls — with overhead measured in nanoseconds, not microseconds. Cilium, Pixie, Parca, Falco, Datadog's Agent, and dozens of other production tools run on eBPF.

The marketing pitch writes itself. The engineering reality is harder. eBPF programs run in a constrained execution environment with a strict verifier, limited stack space, no dynamic memory allocation, and a type system enforced at load time. Writing correct, safe, and performant eBPF for production observability requires understanding these constraints deeply — not just calling bpftrace one-liners.

The eBPF Execution Model

An eBPF program is a sequence of 64-bit instructions executed by an in-kernel virtual machine (and, on most production systems, JIT-compiled to native code). The ISA (instruction set architecture) is RISC-style, with 11 64-bit registers (r0-r10), a 512-byte stack, and a cap on the number of instructions the verifier will process: 1 million for privileged programs since kernel 5.2, while unprivileged programs remain limited to 4,096.

Before execution, every eBPF program is passed through the kernel verifier — a static analysis pass that rejects programs that could cause crashes, infinite loops, or invalid memory accesses. The verifier performs abstract interpretation: it simulates all possible execution paths of the program, tracking the type and value range of every register at every instruction.

What the Verifier Checks

The verifier checks that every memory access is provably in bounds, that pointers are null-checked before dereference, that loops have a provable upper bound, and that every helper call receives arguments of the expected type. Because it explores every execution path, the state space grows with each branch, and the verifier rejects programs whose verification complexity exceeds internal limits even when they are semantically correct. This is a real production concern: complex observability programs with many branches and map accesses can hit verifier limits despite being sound.

BPF Maps: State Between Kernel and Userspace

eBPF programs are stateless by nature — they execute in response to a probe firing and return. State is maintained through BPF maps: key-value data structures accessible from both the eBPF program (in kernel context) and userspace. Choosing the right map type for each use case is a critical performance decision.

Map Types and Their Performance Characteristics

A common performance mistake is using BPF_MAP_TYPE_PERF_EVENT_ARRAY for high-frequency event streaming. The perf buffer is per-CPU: every event is copied into a per-CPU buffer that userspace must drain separately, and the copy and wakeup overhead comes to dominate above roughly 100k events/second. On modern kernels (5.8+), BPF_MAP_TYPE_RINGBUF is the correct choice: a single shared buffer that programs write into in place and userspace drains with one poll() loop, making it 2-4x more CPU-efficient at high event rates.
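The producer side of a ring buffer pipeline is small. A minimal sketch, with an illustrative event layout, attachment point, and buffer size; it assumes libbpf's helper headers and compiles with clang -O2 -target bpf, to be loaded via libbpf rather than run directly:

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct event {
    __u32 pid;
    __u64 ts_ns;
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024); // must be a power-of-two multiple of page size
} events SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_execve")
int trace_execve(void *ctx)
{
    // Reserve space directly in the shared ring buffer: no per-CPU copy,
    // and userspace drains one buffer instead of one per CPU.
    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0; // buffer full: drop rather than block

    e->pid = bpf_get_current_pid_tgid() >> 32;
    e->ts_ns = bpf_ktime_get_ns();
    bpf_ringbuf_submit(e, 0);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

The reserve/submit split is the key design point: the program writes the event in place in the shared buffer, so a full buffer is detected before any work is done.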

BTF: BPF Type Format and CO-RE

One of the most significant engineering challenges in eBPF has been portability across kernel versions. Kernel data structures change between versions — struct offsets shift, fields are added or removed. An eBPF program that reads task_struct->mm->pgd at a fixed byte offset will produce garbage or crash on a kernel where that offset changed.

BTF (BPF Type Format) and CO-RE (Compile Once, Run Everywhere) solve this. BTF is a compact type information format embedded in the kernel. CO-RE is a technique in libbpf that uses BTF to relocate field accesses at load time, adjusting byte offsets based on the actual structure layout of the running kernel.

How CO-RE Works in Practice

When you write BPF_CORE_READ(task, mm, pgd) in your eBPF C program, the compiler emits relocation records instead of hard-coded byte offsets. At load time, libbpf consults the kernel's BTF to find the actual offset of mm within task_struct and pgd within mm_struct on this specific kernel version, and patches the compiled bytecode before loading it. The result is a single compiled eBPF object that runs correctly on kernels back to roughly 4.14 — natively where the kernel ships its own BTF (CONFIG_DEBUG_INFO_BTF, common from 5.2 onward), or with externally supplied BTF data for older kernels.
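In source form, a relocatable field access looks like the sketch below. It assumes a vmlinux.h generated with bpftool btf dump and libbpf's CO-RE headers; the attachment point is illustrative:

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

SEC("kprobe/finish_task_switch")
int on_switch(struct pt_regs *ctx)
{
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();

    // BPF_CORE_READ emits relocation records instead of fixed byte
    // offsets; libbpf patches the real offsets for the running kernel
    // at load time.
    pgd_t *pgd = BPF_CORE_READ(task, mm, pgd);

    // Kernel threads have no mm, so the chained read yields NULL here.
    if (!pgd)
        return 0;
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

Note the chained form: each comma-separated step is a separate, safely probed pointer dereference, which is why a NULL intermediate pointer fails cleanly instead of crashing.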

Probe Types: Choosing the Right Attachment Point

eBPF programs attach to probe points — specific locations in kernel or userspace code where the program fires. Choosing the right probe type for an observability goal is critical for correctness, performance, and stability.

Kernel Probe Types

XDP: The Networking Fast Path

XDP (eXpress Data Path) is an eBPF hook attached at the earliest point in the network receive path — before skb allocation, before the kernel's networking stack. XDP programs make a decision (pass, drop, redirect, or transmit) for every incoming packet with overhead of 50-200 nanoseconds per packet, comparable to raw DPDK performance but without leaving the kernel.
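A minimal XDP sketch makes the constraint visible: every pointer advance must be bounds-checked against data_end or the verifier rejects the program. The port number and drop policy here are illustrative; the program assumes libbpf headers, compiles with clang -O2 -target bpf, and is attached with ip link or libbpf rather than run directly:

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int drop_udp_1234(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // Each header access is preceded by an explicit bounds check that
    // the verifier can trace; without it the program fails to load.
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;
    if (ip->protocol != IPPROTO_UDP)
        return XDP_PASS;

    struct udphdr *udp = (void *)ip + ip->ihl * 4;
    if ((void *)(udp + 1) > data_end)
        return XDP_PASS;

    // Drop traffic to the (illustrative) target port before skb allocation.
    return udp->dest == bpf_htons(1234) ? XDP_DROP : XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```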

Cilium uses XDP for load balancing and DDoS mitigation — dropping attack traffic before it consumes any significant kernel resources. Cloudflare uses XDP to mitigate volumetric DDoS attacks at rates of tens of millions of packets per second on commodity hardware.

Writing a Production eBPF Observability Program

Let's walk through the engineering decisions for a realistic production use case: latency distribution tracking for all outbound TCP connections, broken down by destination IP, with p50/p95/p99 reporting from userspace.

Architecture Decisions

The verifier will reject naive implementations of this. A common pitfall: the socket pointer read from the probe arguments must be null-checked on a path the verifier can trace. Omit the explicit check before the map lookup, and the program is rejected at load time with an error along the lines of "R1 invalid mem access 'sock_or_null'" — the verifier's way of saying the pointer may be null at that instruction.
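The kernel side of that check looks like the sketch below: a kprobe on tcp_v4_connect that records a connect timestamp keyed by socket pointer. Map sizing and the key scheme are illustrative; it assumes a bpftool-generated vmlinux.h and libbpf's helper headers, and compiles with clang -O2 -g -target bpf:

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);
    __type(key, u64);   // socket pointer as connection identity
    __type(value, u64); // connect timestamp, ns
} start_ts SEC(".maps");

SEC("kprobe/tcp_v4_connect")
int BPF_KPROBE(on_connect, struct sock *sk)
{
    // Explicit null check on a path the verifier can trace; without it,
    // later use of the pointer can fail verification.
    if (!sk)
        return 0;

    u64 key = (u64)sk;
    u64 now = bpf_ktime_get_ns();
    bpf_map_update_elem(&start_ts, &key, &now, BPF_ANY);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

A matching probe on the close path would look up the timestamp by the same key, compute the delta, delete the entry, and increment the appropriate latency histogram bucket.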

The Overhead Reality

eBPF's overhead is genuinely low, but not zero. Understanding the cost profile is essential for production deployment.

For a system handling 100k TCP connections per second, the per-connection eBPF overhead is approximately 200 ns * 2 (connect + close) * 100k = 40 ms of CPU time per second — about 4% of a single CPU core. This is a small, predictable cost for almost any production workload, which is why eBPF's value proposition is so compelling.

eBPF has made the tradeoff between observability depth and production overhead essentially disappear. The question is no longer 'can we afford to observe this?' but 'do we have the engineering capacity to write the eBPF program correctly?' The bottleneck shifted from performance to correctness.


Build Zero-Overhead Infrastructure Observability with Accelar

Accelar builds production-grade eBPF observability systems — from custom kernel probes and network telemetry to continuous profiling pipelines and security detection engines. If you need deep visibility into your infrastructure without the overhead of traditional agents, we have the kernel engineering expertise to build it. Let's talk.