
Get started with Grafana Alerting: Route alerts using dynamic labels

In this tutorial, you will learn how to configure notification policies for dynamic routing based on query values. Don't miss the rest of the "Get started with Grafana Alerting" series! Each part dives into a different feature to help you get the most out of alerting in Grafana.
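To give a rough idea of what "dynamic routing" means, here is a minimal sketch in plain Python. It is a simplified model, not Grafana's API or configuration format. It assumes a hypothetical `severity` label whose value is derived from the query value at evaluation time, and a flat list of notification policies where the first policy whose label matchers all match wins, falling back to a default contact point. Real Grafana policies form a tree, support matcher operators beyond equality (`!=`, `=~`, `!~`), and can continue matching sibling policies; none of that is modeled here. The contact point names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Simplified notification policy: equality-only label matchers -> contact point."""
    contact_point: str
    matchers: dict[str, str] = field(default_factory=dict)


def severity_label(value: float, threshold: float = 90.0) -> str:
    # Hypothetical dynamic label: its value depends on the query result,
    # so the same alert rule can land on different policies.
    return "critical" if value > threshold else "warning"


def route(labels: dict[str, str], policies: list[Policy], default: str) -> str:
    # Return the contact point of the first policy whose matchers all equal
    # the alert's labels; otherwise fall back to the default policy.
    for policy in policies:
        if all(labels.get(k) == v for k, v in policy.matchers.items()):
            return policy.contact_point
    return default


# Hypothetical contact points, for illustration only.
policies = [
    Policy("pagerduty-oncall", {"severity": "critical"}),
    Policy("team-slack", {"severity": "warning"}),
]

if __name__ == "__main__":
    labels = {"alertname": "HighCPU", "severity": severity_label(97.3)}
    print(route(labels, policies, default="email-default"))  # pagerduty-oncall
```

Because the label is computed from the query value, a CPU reading of 97.3 routes to the on-call contact point while a reading of 50 would go to the team channel, without changing the alert rule itself.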

Demo of Raygun's remote MCP

This Raygun remote MCP demo highlights the new depth of context available. The agent isn't just fetching error lists; it's reasoning through stack traces to find the issues. Combined with the new ability to view associated deployment versions, browser information, breadcrumbs, customer data, and more, this makes the agent far more capable at solving errors. We've even heard of early testers going from errors in production to having them resolved within minutes.

AWS Outage: How do you prepare for the failure of your own safety net?

When AWS’s massive outage struck, it didn’t just take down cloud services, apps, and enterprise platforms. It also knocked out many of the monitoring systems organizations depend on for real-time answers. Observability companies, including Datadog, New Relic, Checkly, Dynatrace, SpeedCurve, and Splunk Observability, lost visibility or functionality precisely when organizations needed them most.

PagerDuty Joins AWS QuickSuite: Connect Your Incident Management with 1,000+ Applications

Today, we’re announcing that PagerDuty is now available in AWS QuickSuite through the Model Context Protocol (MCP). This means PagerDuty’s incident management capabilities can now connect with the 1,000+ applications and data sources that QuickSuite integrates with, from AWS services to enterprise SaaS platforms, all accessible through natural language.

Ari Stowe, Resolve COO, and Carla Ely of Grokstream speak at Innovate Americas Dallas

At Innovate Americas Dallas, Ari Stowe, Resolve COO, and Carla Ely, Grokstream Channel Development Manager, joined industry leaders to discuss how AIOps and agentic automation are reshaping IT operations for a Zero Ticket future. In their session, they explored how AI-driven event correlation, predictive remediation, and autonomous workflows are transforming the way enterprises detect, diagnose, and resolve issues before they ever become tickets.

How to test the reliability of a Point of Sale (POS) system

Point of Sale (POS) systems are the backbone of any retail store. A single outage can cost a retailer thousands of dollars per minute in lost sales, and more if it hits during peak hours. A prolonged outage does further damage as customers abandon carts and turn to competitors. In an industry where customer loyalty is worth its weight in gold, that brand damage can end up costing more than the initial lost sales.

Unreal Engine crash reporting now available on gaming consoles with trace-connected logs

With the first major release of the Sentry Unreal SDK (now at v1.2.0; you can also explore it in our interactive sandbox), we've made important improvements for cross-platform Unreal developers in three areas: platform coverage, debugging with user feedback, and performance monitoring. Here's what's new.