Six AI agent SDKs for enterprise Kubernetes, compared

There’s a question we hear constantly from platform and engineering leaders right now: “Which agent SDK should we standardize on for our Kubernetes clusters?” The honest answer is that the question is slightly wrong, and the rest of this post explains why. But it’s a fair question, so let’s compare the contenders first.

DevOps with Kubernetes: How to Reduce Cluster Toil and Complexity

Has Kubernetes made your DevOps team faster, or just busier? Most teams adopt it for speed and portability, and they get both. What arrives with it is a quieter cost: the operational weight of running the cluster day to day. That weight shows up in the manual work the platform was supposed to eliminate. A resource limit set incorrectly can waste infrastructure for months.

Unified Observability: Moving IT Teams from Reactive to Predictive

What does it take to stop an outage before it starts? In many cases, the warning signs are already there, scattered across different monitoring tools, which makes it difficult to see the full picture before issues escalate. When an incident occurs, engineers often spend valuable time piecing together metrics, logs, traces, and alerts to determine the root cause. Every minute spent investigating extends the outage and increases its business impact.

Extending the Application Edge with F5 BIG-IP VE and Megaport Virtual Edge

Learn how F5 BIG-IP VE simplifies multicloud application delivery, security, and traffic management with Megaport Virtual Edge (MVE). As enterprise applications continue spreading across multiple clouds, the application edge is changing. A few years ago, application delivery was usually tied to a physical appliance sitting in a data center; today, applications are everywhere.

Observability for LLM Apps and Agents: OpenLIT SDK + VictoriaMetrics observability stack

Many “LLM observability with OpenTelemetry” tutorials stop at a single chat.completions span. That works for a demo, but it leaves gaps once an agent fans out into 30 tool calls, two vector-DB queries, three handoffs, and a 90-second tail latency you need to attribute. This post wires the OpenLIT SDK (50+ instrumentations, OTel GenAI semantic conventions, one line of code) into the full VictoriaMetrics observability stack and shows query examples that turn agent telemetry into decisions.

Introducing AI Analytics Reports in InvGate Service Management

Most teams can confirm their AI features are turned on. Measuring how often employees use them, which requests get resolved without agent intervention, and where AI is helping support teams work more efficiently is a different question. In InvGate Service Management, those capabilities live in AI Hub, a set of built-in AI features that includes the Virtual Service Agent, AI-assisted ticket resolution for agents, automated knowledge generation, and more.

How to Measure AI ROI in IT Service Management

A service desk manager launches a virtual agent in January. By March, chat conversations are climbing, ticket volume hasn't changed much, and the monthly report doesn't explain whether the investment is delivering value. AI rarely produces a single number that proves its return. The gains accumulate across thousands of support interactions, making measurement just as important as deployment.

Make the most of shift-based schedules

We recently updated our Schedules to better reflect how teams are currently managing their on-call responsibilities. Not everyone is working on weekly shifts or providing 24×7 coverage for all of their services, and that should be easy to schedule in our new tooling. To give you some examples, I’ve gone back through the questions we’ve received on the PagerDuty Commons over the past couple of years, looking for custom schedules that we weren’t really thinking about.

ServiceNow Runs Your IT. PagerDuty Makes Sure It Never Stops.

For most enterprises, ServiceNow has become the backbone of IT operations, the platform where workflows are governed, compliance is maintained, and every incident, change, and request is tracked from start to finish. If you’re running ServiceNow, you’ve made a serious investment in how your IT operates. PagerDuty is built to make that investment work even harder.

Icinga Web 2.14, Security Releases, and Module Updates

We are shipping a new batch of Icinga Web ecosystem releases today. Icinga Web 2.14 is the headline, bringing the baseline for two-factor authentication support, configurable password policies, a configurable Content Security Policy, and a round of developer tooling improvements that have been in the works for a while. Icinga Certificate Monitoring 1.4, Icinga Reporting 1.1, and Icinga PDF Export 0.13 join it with PHP 8.5 support across the board and a set of focused improvements for each module.