
Observability and Monitoring Governance (Part 1 of 4)

In contrast to the many flavors of governance used for IT, such as data governance, audit and compliance, and governance and security, IT monitoring governance lacks a definition in many organizations. This is true even though teams have decades of experience monitoring the health, performance, and availability of applications, infrastructures, networks, and user experience. A common assumption is that good monitoring governance “just sort of happens, naturally, organically.” Not exactly!

Custom OpenTelemetry Collectors: Build, Run, and Manage at Scale

I tried thinking back to the last time I read an actual tutorial that did not include a bunch of em dashes, semicolons, normal dashes, and an unnervingly large number of phrases like “XYZ-thing Alert” and “Exciting News!”. Well, hold on to your suspenders, folks, here we go again. Part 2 is up and it’s a controversial one.

Early Warning Signals now in Discord

We’re rolling out Early Warning Signals to yet another place your team works every day: Discord. With this release, nearly all of our chat integrations now deliver Early Warning Signals—bringing you proactive outage alerts no matter where your team collaborates. Already available by email, SMS, Slack, Microsoft Teams, Google Chat, and webhooks, Early Warning Signals are now live in Discord too—closing the gap and making sure your team is covered wherever you communicate.

Managing access in Grafana: a single stack journey with teams, roles, and real-world patterns

When multiple teams use Grafana, it can start to feel a bit messy. Dashboards pile up, permissions become unclear, and teams accidentally overwrite each other’s work. To help you and your organization stay clear, collaborative, and secure, we recommend putting all users in a single Grafana Cloud stack and managing access with teams, roles, and folders. To illustrate this, I’ll share a hypothetical example of how you can put this into practice across three teams. Let’s dive in!

A practical guide to error handling in Go

When you first start coding in Go, you quickly learn how error handling in the language differs from error handling in languages such as Java, Python, JavaScript, or Ruby. In those languages, throwing an exception automatically generates a stack trace. Go, by contrast, provides no built-in error tracing to reveal an error’s origin.

How to Use AI for Operational Excellence

Organizations are under immense pressure to do more with less: streamline operations and reduce costs, all while improving outcomes for both the business and its employees. For IT and end-user computing (EUC) professionals, this challenge is especially acute. Systems are becoming increasingly complex, the digital employee experience is now directly tied to customer satisfaction, and the role of technology teams extends much further than solely keeping the lights on.

Broadcom Recognized as a Leader: Engineering the Future of Service Orchestration

In our digitally transforming world, the pace of change is relentless. Businesses are tasked with managing increasingly complex hybrid environments, from core mainframes to dynamic cloud services. The pressure is on, not only to keep the lights on, but to innovate faster, deliver flawless services, and fuel business growth. In this high-stakes environment, service orchestration and automation platforms are no longer just tools; they are the central nervous system of the modern enterprise.

Crash reporting for gaming consoles is now Generally Available

TL;DR: Error monitoring and crash reporting for all major gaming consoles is now generally available (plus v1.1 of our Unreal Engine SDK). Already convinced? Jump to the ‘What’s In The Release?’ section. Over a decade ago, a customer hacked Sentry into their PlayStation 3 games. Fast forward to today, and Sentry supports thousands of game developers across web, mobile, and desktop. The missing piece? Consoles. Developers asked for it. We built it.

Serverless Monitoring Made Simple: Challenges and Solutions with Atatus

Serverless computing has revolutionized the way applications are built and deployed by eliminating infrastructure management and enabling automatic scaling. However, the dynamic and distributed nature of serverless architectures presents unique monitoring challenges that can impact application performance and user experience.