
Securely quarantine suspect packages using Rego code with Cloudsmith's Enterprise Policy Management.

Software supply chain attacks are becoming more sophisticated, and Cloudsmith tackles this head-on with Enterprise Policy Management (EPM). Using a set of tools, including a policy-as-code approach, you can tailor security policies to be as simple or as advanced as you need. Define any policy using Rego code and Open Policy Agent (OPA) to be highly prescriptive and catch suspect or non-compliant software artifacts before the damage is done.

Securing Containers at Scale: Docker Hardened Images + Cloudsmith

Containers have been with us for a while and are ubiquitous in the Secure Software Development Life Cycle (SSDLC). According to some reports, nearly 60% of organizations use containers for most or all of their production applications. It’s no surprise really, as containers provide consistency and standardization across the lifecycle while speeding up delivery pipelines. They revolutionized how we develop and deploy apps in the cloud and there is no sign of this changing anytime soon.

Preparing for the Autonomous Future

Throughout this blog series, we’ve followed how AI reshapes network operations – from foundational data harmonization to real-time correlation, from contextual insights to agent-driven automation, and most recently, to conversational access through natural language interfaces. But we haven’t reached the final destination.

Harnessing Network Observability to Enhance Grid Resilience

Within the utility sector, a lot is changing. Utilities continue to pursue digital transformation, altering the way services are delivered and operations are managed. What hasn’t changed is the criticality of the services provided. These organizations deliver essential resources like natural gas, electricity, and water—services that we as consumers rely upon constantly for our comfort, sustenance, communications, and more.

SAML authentication in Grafana Cloud: a guide for easy configuration

In my role as Senior Observability Architect here at Grafana Labs, one of the things I focus on is making sure customers are getting the most out of our products. Recently, I noticed a trend where customers were struggling to get SAML authentication configured properly. They were getting stuck on some of the steps needed to configure the user key-value pairs, which allow users to log in with the correct roles assigned in Grafana.

PagerDuty + Microsoft Build 2025: Transforming critical work with AI and automation

At Microsoft Build 2025, PagerDuty was featured in key announcements showcasing how intelligent agents and real-time automation redefine digital operations. From Microsoft Copilot to the launch of a new Azure SRE Agent, PagerDuty was highlighted as a strategic partner in enabling intelligent, scalable incident response.

A Simple Guide to Monitoring and Optimizing Prometheus CPU Usage

Prometheus is supposed to help you monitor your stack, not become the thing you need to monitor. But if you’ve ever seen it spike in CPU and slow everything down, you know that’s not always the case. High Prometheus CPU usage usually shows up when you're scraping too many metrics, using expensive queries, or running with default configs that don’t fit your workload. This guide covers how to track Prometheus CPU usage, what typically causes it, and how to fix it.
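
As a rough illustration of the tracking step, Prometheus exposes its own CPU time through the standard `process_cpu_seconds_total` counter, and taking a `rate()` over it gives CPU cores used per second. The sketch below only builds the query and the HTTP API URL; the `job="prometheus"` label, the 5-minute window, and the `localhost:9090` address are assumptions that depend on your scrape configuration.

```python
from urllib.parse import urlencode


def cpu_usage_query(job="prometheus", window="5m"):
    """Build a PromQL expression for the CPU cores used by a scraped process.

    The job label and window are assumptions; adjust them to your setup.
    """
    return f'rate(process_cpu_seconds_total{{job="{job}"}}[{window}])'


def instant_query_url(base_url, promql):
    """Build the URL for Prometheus's instant-query HTTP endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    query = cpu_usage_query()
    print(query)
    # Assumed address of a locally running Prometheus server.
    print(instant_query_url("http://localhost:9090", query))
```

The same expression can be pasted into the Prometheus expression browser or a dashboard panel instead of being fetched over HTTP.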

VPC Log Format: Custom and Advanced Configurations

VPC Flow Logs come with a default format that gives you basic network traffic details. But you can tweak the format to capture exactly what you need. This can lower costs, speed up processing, and make your logs fit better with what you’re trying to monitor. If you want to improve security, keep an eye on performance, or save money, adjusting your VPC logs can make a big difference. Let’s take a look at some practical ways to customize your logs beyond the default settings.
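
To make the idea concrete: a custom format is a space-separated list of `${field}` placeholders, and each log record then carries the values in that same order. The sketch below parses records against such a format. The trimmed-down format string and the sample record are hypothetical, chosen only for illustration; the field names themselves (`srcaddr`, `dstaddr`, `dstport`, `protocol`, `bytes`, `action`) are standard flow log fields.

```python
import re

# Hypothetical trimmed-down format: only the fields this example cares about.
LOG_FORMAT = "${srcaddr} ${dstaddr} ${dstport} ${protocol} ${bytes} ${action}"


def field_names(log_format):
    """Extract field names from a '${field} ${field} ...' format string."""
    return re.findall(r"\$\{([a-z0-9-]+)\}", log_format)


def parse_record(line, log_format=LOG_FORMAT):
    """Map one space-separated flow log record onto the format's field names.

    Flow logs write '-' for a field with no value; it is returned as None.
    """
    names = field_names(log_format)
    values = line.split()
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(values)}")
    return {n: (None if v == "-" else v) for n, v in zip(names, values)}


if __name__ == "__main__":
    # Made-up sample record matching LOG_FORMAT.
    record = parse_record("10.0.1.5 10.0.2.9 443 6 5120 ACCEPT")
    print(record)
```

Keeping the format string next to the parser means that adding or dropping a field in the flow log configuration only requires updating one constant.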

Surprised By Your AWS ELB Bill? Here's What Happened

On May 1st, AWS corrected a long-standing billing bug tied to Elastic Load Balancer (ELB) data transfers between Availability Zones (AZs) and regions. That fix triggered a noticeable increase in charges for many users, especially for those with high traffic volumes or distributed architectures. The problem wasn’t new usage; it was a silent correction to an old error.