Operations | Monitoring | ITSM | DevOps | Cloud

Agentic AI in Enterprise Risk Management - Key Benefits And Features for Chief Risk Officers

Companies using AI are expected to grow by 5.6% in 2025, pushing AI-driven business value to $4.9 trillion, up from $4.7 trillion in 2024, predicts Forrester. If this momentum continues, we won’t just see growth in AI adoption; we’ll witness a transformation in how businesses operate, opening doors to even greater innovation and long-term success.

Securing Containers at Scale: Docker Hardened Images + Cloudsmith

Containers have been with us for a while and are ubiquitous in the Secure Software Development Life Cycle (SSDLC). According to some reports, nearly 60% of organizations use containers for most or all of their production applications. It’s no surprise really, as containers provide consistency and standardization across the lifecycle while speeding up delivery pipelines. They revolutionized how we develop and deploy apps in the cloud and there is no sign of this changing anytime soon.

Preparing for the Autonomous Future

Throughout this blog series, we’ve followed how AI reshapes network operations – from foundational data harmonization to real-time correlation, from contextual insights to agent-driven automation, and most recently, to conversational access through natural language interfaces. But we haven’t reached the final destination.

Harnessing Network Observability to Enhance Grid Resilience

Within the utility sector, a lot is changing. Utilities continue to pursue digital transformation, altering the way services are delivered and operations are managed. What hasn’t changed is the criticality of the services provided. These organizations deliver essential resources like natural gas, electricity, and water—services that we as consumers rely upon constantly for our comfort, sustenance, communications, and more.

SAML authentication in Grafana Cloud: a guide for easy configuration

In my role as Senior Observability Architect here at Grafana Labs, one of the things I focus on is making sure customers are getting the most out of our products. Recently, I noticed a trend where customers were struggling to get SAML authentication configured properly. They were getting stuck on some of the steps needed to configure the users key pair values, which allows users to log in with the correct roles assigned in Grafana.

PagerDuty + Microsoft Build 2025: Transforming critical work with AI and automation

At Microsoft Build 2025, PagerDuty was featured in key announcements showcasing how intelligent agents and real-time automation redefine digital operations. From Microsoft Copilot to the launch of a new Azure SRE Agent, PagerDuty was highlighted as a strategic partner in enabling intelligent, scalable incident response.

A Simple Guide to Monitoring and Optimizing Prometheus CPU Usage

Prometheus is supposed to help you monitor your stack, not become the thing you need to monitor. But if you’ve ever seen it spike in CPU and slow everything down, you know that’s not always the case. High Prometheus CPU usage usually shows up when you're scraping too many metrics, using expensive queries, or running with default configs that don’t fit your workload. This guide covers how to track Prometheus CPU usage, what typically causes it, and how to fix it.

VPC Log Format: Custom and Advanced Configurations

VPC Flow Logs come with a default format that gives you basic network traffic details. But you can tweak the format to capture exactly what you need. This can lower costs, speed up processing, and make your logs fit better with what you’re trying to monitor. If you want to improve security, keep an eye on performance, or save money, adjusting your VPC logs can make a big difference. Let’s take a look at some practical ways to customize your logs beyond the default settings.

Surprised By Your AWS ELB Bill? Here's What Happened

On May 1st, AWS corrected a long-standing billing bug tied to Elastic Load Balancer (ELB) data transfers between Availability Zones (AZs) and regions. That fix triggered a noticeable increase in charges for many users, especially for those with high traffic volumes or distributed architectures. The problem wasn’t new usage; it was a silent correction to an old error.

Under the hood: Request coverage feature

‍ The ilert mobile app is primarily used by responders to receive notifications about critical alerts, react to them on the go, and check their current on-call status. It has various capabilities, including critical notifications via push, quick actions for alerts, and critical alert settings. The app enables responders to view their current on-call shifts and escalation policies, take on-call shifts from somebody else, and create coverage requests to ask for on-call shift handover from a colleague.