
Zero-config Go heap profiling

Coroot's node-agent already collects CPU profiles for any process on the node using eBPF, with zero integration from the application side. For Java, we dynamically inject async-profiler into the JVM to get memory and lock profiles. But Go processes were still a blind spot for non-CPU profiling unless the app exposed a pprof endpoint and the cluster-agent scraped it. We wanted the same zero-config experience for Go heap profiles. This post is about how we got there.
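For contrast, here is a minimal sketch of the manual setup the post calls a blind spot: a Go app exposing a pprof endpoint for an agent to scrape. The side-effect import of `net/http/pprof` registers the standard `/debug/pprof/*` handlers (heap, goroutine, allocs, and so on) on the default mux; the ephemeral port and the in-process GET are just for this demo, while a real service would typically listen on a fixed port such as `:6060`.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	_ "net/http/pprof" // side-effect import: registers /debug/pprof/* handlers
)

func main() {
	// Listen on an ephemeral port to avoid conflicts in this demo;
	// real services usually bind a fixed port like localhost:6060.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go http.Serve(ln, nil) // nil handler = DefaultServeMux, where pprof registered

	// This GET is the kind of request a scraping agent would issue.
	url := fmt.Sprintf("http://%s/debug/pprof/heap?debug=1", ln.Addr())
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("heap profile status:", resp.StatusCode)
}
```

The eBPF approach described in the post removes exactly this per-application boilerplate: no import, no listener, no scrape config.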

Debug Live Production Apps in Codex with Lightrun MCP

Lightrun’s Dan Putman demonstrates the power of the latest Lightrun MCP skill. Watch how your AI code agent can now debug live applications directly in production. By connecting OpenAI's Codex to real-time runtime data via the Lightrun MCP, engineers can now generate and validate hypotheses using live telemetry and snapshots, without breaking flow. Ready to bring runtime context to your AI agents?

Live Runtime Investigation in Claude Code with Lightrun MCP

In this video, Lightrun’s Dan Putman demonstrates what happens when Lightrun MCP is integrated with Claude Code. Once activated, Claude can ask which services it can see and instrument, then perform a deep investigation in production to reach a validated root cause analysis, without the friction of redeploying or switching contexts.

Jira GitHub Integration: The Complete Guide

Most teams use Jira to plan work and GitHub to build it. The problem is those two tools don’t talk to each other by default. Developers end up manually copying commit references into tickets, project managers hunt through GitHub to answer basic status questions, and sprint reviews become archaeology expeditions through two disconnected systems. Git Integration for Jira closes that gap.

SRE agent vs. traditional engineer: 7 key differences

The role of a Site Reliability Engineer (SRE) is evolving. The focus has shifted away from simply working harder during an outage: a new kind of teammate is here to help, the SRE Agent. But what are the key differences between an SRE agent and a traditional site reliability engineer? This isn’t just a superficial change. It signifies a fundamental shift in how teams build and sustain dependable services.

Understanding disaggregated GenAI model serving with llm-d

llm-d is an open source solution for managing high-scale, high-performance Large Language Model (LLM) deployments. LLMs are at the heart of generative AI – so when you chat with ChatGPT or Gemini, you’re talking to an LLM. Simple LLM deployments – where an LLM runs on a single server – can suffer from latency issues, even with just one user. This can be caused by a lack of memory bandwidth on the server, or by KV cache pressure on system memory.