Operations | Monitoring | ITSM | DevOps | Cloud

Zero-config Go heap profiling

Coroot's node-agent already collects CPU profiles for any process on the node using eBPF, with zero integration from the application side. For Java, we dynamically inject async-profiler into the JVM to get memory and lock profiles. But Go processes were still a blind spot for non-CPU profiling unless the app exposed a pprof endpoint and the cluster-agent scraped it. We wanted the same zero-config experience for Go heap profiles. This post is about how we got there.

Profiling Java apps: breaking things to prove it works

Coroot already does eBPF-based CPU profiling for Java. It catches CPU hotspots well, but that's all it can do. Every time we looked at a GC pressure issue or a latency spike caused by lock contention, we could see something was wrong but not what. We wanted memory allocation and lock contention profiling. So we decided to add async-profiler support to coroot-node-agent. The goal: memory allocation and lock contention profiles for any HotSpot JVM, with zero code changes. Here's how we got there.

Making encrypted Java traffic observable with eBPF

Coroot's node agent uses eBPF to capture network traffic at the kernel level. It hooks into syscalls like read and write, reads the first bytes of each payload, and detects the protocol: HTTP, MySQL, PostgreSQL, Redis, Kafka, and others. This works for any language and any framework without touching application code. For encrypted traffic, we attach eBPF uprobes to TLS library functions like SSL_write and SSL_read in OpenSSL, crypto/tls in Go, and rustls in Rust.

Instrumenting Rust TLS with eBPF

Coroot is an open source observability tool that uses eBPF to collect telemetry directly from applications and infrastructure. One of the things it does is capture L7 traffic from TLS connections without any code changes, by hooking into TLS libraries and syscalls. Works great for OpenSSL. Works for Go. Then rustls enters the picture and everything stops being obvious. With OpenSSL, everything is nicely wrapped: From eBPF’s point of view this is perfect: Everything happens inside one call.

How to Reduce Your Cloud Costs with Coroot

Cloud costs often grow quietly until they suddenly command everyone’s attention. Gartner estimates that companies overspend on cloud services by up to 70 percent, mostly because they lack clear visibility into where the money is actually being spent. Cloud invoices speak the language of infrastructure: nodes, instance types, regions, volumes, and egress. Engineering teams speak the language of services, deployments, and code.

Memory stall: the agony before OOM

When we set a memory limit for a container, the expectation is simple: if the app leaks memory, the OOM killer steps in, the container dies, Kubernetes restarts it, done. But reality is messier. As a container gets close to its memory limit, allocations don’t just fail instantly. They get slower. The kernel tries to reclaim memory inside the cgroup, and that takes time. Instead of being killed right away, your app just crawls.

Instrumenting the Node.js event loop with eBPF

Recently, I was testing Coroot’s AI Root Cause Analysis on failure scenarios from the OpenTelemetry demo. One of them, loadgeneratorFloodHomepage, simulates a flood of excessive requests. As expected, it caused a latency degradation across the stack. Coroot’s RCA highlighted how the latency cascaded through all dependent services. At the same time, we noticed a moderate increase in CPU usage for the frontend service and the node itself.

Using GreptimeDB as Prometheus Data Lake in Coroot

Coroot is excited to feature an editorial from the open source observability database GreptimeDB as an Open Source Spotlight. We hope to improve the work of our global community of SREs and DevOps professionals by sharing exciting projects like GreptimeDB, which make innovation accessible for everyone through the freedom of open source.

Size-capped telemetry storage with ClickHouse and Coroot

Cloud platforms make it incredibly easy to store data. Object storage feels endless, and block volumes can be resized anytime. That’s great, until you check the cost. In some cases, like financial transactions, storage costs are tiny compared to the value of the data. But observability is a different story. Logs, traces, and profiles can be extremely detailed and often take up more space than the actual business data. Yes, there are situations where logs need to be kept for compliance reasons.