
Certificate Audit logs are live

Certificate automation does a lot of work on your behalf: agents run on your servers, talk to certificate authorities, and deploy certs to your infrastructure. At some point someone (your CISO, your auditor, or your own brain at 3am) is going to ask: what exactly happened, and when? Today we’re shipping audit logs. Every action taken in CertKit is now recorded: logins, invitations, and certificates added, issued, renewed, revoked, and deployed, along with agent registrations, approvals, and config changes.

The Productivity Tax of Repeat IT Failures in Technology Companies

Technology companies are being pushed to deliver faster outcomes while justifying growing investment in AI, SaaS, and digital infrastructure. But productivity does not improve just because new tools are deployed. It improves when employees can use those tools without the constant drag of slow devices, unstable applications, and fixes that do not fully solve the problem. That is the productivity tax of digital friction.

How to Create Your Own Plugins and Check Commands in Icinga 2

If you’ve been using Icinga 2 for a while, you probably know the built-in checks cover a lot of ground: disk space, CPU, memory, ping. But sooner or later you’ll run into something specific to your setup that no existing plugin handles. That’s where writing your own plugin comes in. The good news? It’s simpler than it sounds. Icinga 2 doesn’t care what language your plugin is written in. It just runs the script, reads the exit code, and displays the output. That’s it.
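To make the "runs the script, reads the exit code, displays the output" contract concrete, here is a minimal sketch of a plugin in Python. The check itself (counting files in a spool directory), the script name `check_spool.py`, and the argument order are hypothetical and chosen only for illustration. The exit codes 0, 1, 2, and 3 are the standard plugin states OK, WARNING, CRITICAL, and UNKNOWN.

```python
"""Hypothetical Icinga 2 plugin: counts the files waiting in a spool directory."""
import os
import sys

# Standard plugin exit codes that Icinga 2 maps to check states.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(count, warn, crit):
    """Return (exit_code, output_line) for a file count and two thresholds."""
    if count >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif count >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after the "|" is optional performance data: label=value;warn;crit
    return code, f"{label} - {count} files queued | files={count};{warn};{crit}"


def main(argv):
    """Parse <directory> <warn> <crit>, print one status line, return the exit code."""
    try:
        path, warn, crit = argv[1], int(argv[2]), int(argv[3])
        count = len(os.listdir(path))
    except (IndexError, ValueError, OSError) as exc:
        # Bad arguments or an unreadable directory: report UNKNOWN, not a crash.
        print(f"UNKNOWN - {exc}")
        return UNKNOWN
    code, line = evaluate(count, warn, crit)
    print(line)
    return code
```

To run this as a real plugin, the script would end by calling `sys.exit(main(sys.argv))` so that the returned state becomes the process exit code, and it would then be referenced from a CheckCommand definition in the Icinga 2 configuration.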

NVIDIA Vera Rubin: What is it, what's new, and when you can get it

NVIDIA's infrastructure roadmap moves fast, and the next major milestone is already here. The NVIDIA Vera Rubin platform is the company's next-generation AI compute architecture, the successor to Blackwell, and it's shaping up to be one of the most significant leaps forward in AI infrastructure NVIDIA has ever shipped. Whether you're planning your next training cluster, scaling inference pipelines, or building the infrastructure to power autonomous agents, Vera Rubin is worth understanding now.

Honeycomb Canvas: The Multiplayer Workspace for the Agentic Era

Last week, we launched a major update to Canvas, our investigation workspace. The new Canvas has evolved from an AI co-pilot you chat with to a place where your whole team, human and agent, can work the same problem on the same surface. Auto-investigations begin the moment a trigger, SLO, or anomaly fires. Custom skills encode your team's runbooks so every agent investigates with your team's expertise built in.

Introducing Atatus Sensitive Data Classifier

Your logs know too much. Every debug statement, every traced request, every APM span risks capturing something it shouldn't. A customer email. A JWT. A credit card number. An API key that was never meant to leave your payment service. It doesn't look like a breach. There's no alert. Your observability platform just quietly accumulates sensitive data: indexed, replicated, and accessible to every engineer with log query access.

How we made a SQL query optimization agent 59% more accurate using autoresearch and LLM Observability

Without experiment infrastructure to help you test your LLM applications, every research session starts with the same questions: What have we tried previously? What were the numbers? Which prompt version produced that result? Why did we discard that approach? The answers live in scattered notes, terminal history, and half-remembered conversations. Each handoff between sessions loses context. In practice, iteration can slow down as teams get bogged down in testing and analysis.

How to audit and clean up monitors effectively

Alert fatigue and blind spots develop together. A monitoring stack that generates noise while missing critical issues likely has incomplete coverage, poorly configured alerts, or both. As the stack grows reactively, without a structured coverage assessment, both problems worsen. Teams often add monitors when something breaks and tune thresholds when alerts become unbearable, but rarely audit their overall setup to see whether it works.