
Inside the Wins: Real Stories of Transforming Azure Observability into Business Value

Azure environments are growing fast, and so are the challenges of monitoring them at scale. In this blog, part of our Azure Monitoring series, we look at how real ITOps and CloudOps teams are moving beyond Azure Monitor to achieve hybrid visibility, faster troubleshooting, and better business outcomes. These real-life customer stories show what’s possible when observability becomes operational. Want the full picture? Explore the rest of the series.

SentinelOne Outage: Why Early Detection and Independent Monitoring Matter

When SentinelOne, a leader in cybersecurity and endpoint protection, experienced a major outage last week, thousands of organizations were suddenly left in the dark. With SentinelOne down for hours, IT and security teams scrambled for information and updates. But there was a critical missing piece: SentinelOne has no public status page. This gap left customers frustrated, searching for answers on social media, Reddit, and unofficial channels.

Why Does Your Network Get Blamed When Trouble Lies Beyond the Firewall?

The familiar scene unfolds: Critical applications are sluggish, user complaints are mounting, and the IT war room is buzzing. Eyes quickly dart towards the network team. It’s an almost instinctual reaction. But what happens when the problem isn’t within the corporate LAN or even the data center? What if the real culprit lurks somewhere in the vast, untamed wilderness of the internet, a cloud provider’s backbone, or a third-party SaaS application’s infrastructure?

Scaling Observability: How We Designed Bindplane to Manage 1,000,000 OpenTelemetry Collectors

Join the live stream at 11 am ET. Platform teams tend to start with just one or, in some cases, a handful of OpenTelemetry (OTel) Collectors, usually running in gateway mode. They then embrace the benefit of a vendor-neutral, standardized telemetry collector for unified logs, metrics, and traces.

IETF Decreased Mean Response Time by 90% with Scout APM!

The Internet Engineering Task Force (IETF) is the premier Internet standards body, developing open standards through open processes. The IETF is a large open international community of network designers, operators, vendors, and researchers concerned with the evolution of Internet architecture and the smooth operation of the Internet. The IETF standards-setting process is open to any individual interested in providing technical contributions.

How to Set Up Tracing for Elixir Apps Using AppSignal

Over time, web applications have evolved from simple request/response-based systems into complex, distributed ones with lots of moving parts. If something goes wrong (and you can be sure it will), finding the cause can be nearly impossible. But this need not be the case: enter tracing. Tracing refers to the process of collecting detailed information about the execution of requests within an application, including function calls, execution time, and other relevant data.
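To make that definition concrete, here is a minimal, library-free sketch of the idea in Python. It is not AppSignal's API or the Elixir setup the article covers; `span`, `SPANS`, and `handle_request` are hypothetical names chosen for illustration. Each unit of work is wrapped in a span that records its name, its execution time, and any extra attributes.

```python
import time
from contextlib import contextmanager

# Collected spans; a real tracer would export these to a backend instead.
SPANS = []


@contextmanager
def span(name, **attributes):
    """Record the name, duration, and attributes of one unit of work."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the wrapped code raises, so failed work is still timed.
        SPANS.append({
            "name": name,
            "duration_ms": (time.perf_counter() - start) * 1000,
            "attributes": attributes,
        })


def handle_request(user_id):
    """Hypothetical request handler with two traced steps."""
    with span("handle_request", user_id=user_id):
        with span("load_user"):
            time.sleep(0.01)   # stand-in for a database query
        with span("render"):
            time.sleep(0.005)  # stand-in for template rendering
        return "ok"


handle_request(42)
for s in SPANS:
    print(f"{s['name']}: {s['duration_ms']:.1f} ms {s['attributes']}")
```

Inner spans finish first, so they appear in the list before the request span that encloses them. Production tracers additionally link each span to its parent and to a trace ID, which this sketch leaves out.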

Your Collector, Your Rules: Introducing BYOC and the OpenTelemetry Distribution Builder

Join the live stream at 11 am ET. OpenTelemetry’s superpower has always been choice. Yet most observability vendors still insist you run their collector. Today we’re removing that last point of friction. With Bring Your Own Collector (BYOC), Bindplane now accepts any upstream-compatible build, recognizes exactly which receivers, processors, and exporters it contains, and adapts the UI and configuration workflow on the fly.

Edge Data Replication: Contributions and Status Updates for InfluxDB 3

If you’ve ever stood up multiple edge InfluxDB instances in remote locations and wished you could consolidate their data into a centralized instance for analysis, you’re not alone. That’s exactly why we designed Edge Data Replication (EDR) in InfluxDB v2. Now, with InfluxDB 3 Core and 3 Enterprise, we’re seeing new ways to handle replication using the brand-new Python Processing Engine.

Operational Resilience in 2025: Meeting New Standards, Mitigating New Risks

In a world of constant disruption, operational resilience is now mission-critical. From cyberattacks and misconfigurations to vendor outages and natural disasters, today’s enterprises are navigating risks that move faster and hit harder than ever before. As we enter 2025, operational resilience has evolved from a best practice to a board-level imperative.