Operations | Monitoring | ITSM | DevOps | Cloud

Getting Started with Home Assistant Webhooks & Writing to InfluxDB

If you’re already running or are familiar with Home Assistant, you’ve likely worked with integrations, maybe a few automations, and possibly MQTT as a way to wire devices together. But webhooks add another layer of flexibility that lets you level up your smart home into a fully-customized, intelligent network. Instead of relying on built-in integrations and being confined to the same local network, you can let external devices and services push events directly into Home Assistant.

Isolate a User Session in Datadog Synthetics with proxymock

A customer pings support: “I tried to check out twice this morning and got a 500 each time, but it works fine for everyone else.” The session ID is in the email. You have full request/response capture in your environment, you have Datadog Synthetics already running browser checks against the same flow, and you still spend the next two hours grepping logs because none of those tools let you say “show me just this user’s requests, in order, and re-run them.”

Reports just got smarter

We’ve upgraded the Reports page in StatusGator to give you more insight directly inside the StatusGator dashboard. Previously, reporting was limited to exports you could use to calculate your own uptime percentages and trends. Now, in addition to exported reports, you can view key reports and metrics without needing to download anything. We’ve also added a one-click download of the most commonly requested report: Uptime percentage by monitor.

Improved Microsoft 365 private status integration

Keeping track of your Microsoft 365 services just got easier. We’ve rolled out an update to the Microsoft 365 integration that removes manual setup and improves visibility. All services in your account can now automatically appear as components, so you can monitor them right away.

Four types of incident alerts every team should know

Not every incident alert needs the same kind of response. One incident may need to wake someone up right away. Another may simply need to be picked up when the team starts work in the morning. Without a clear way to tell them apart, every incident feels equally urgent. That usually adds noise and makes incident response decisions harder than they need to be. This is where two questions help: In this guide, we’ll discuss what those questions mean and the four combinations that follow.

How to Test SQS Workflows Locally with LocalStack and OpenTelemetry

LocalStack lets you run SQS, Lambda, and S3 locally in Docker — but there's a hidden trap: OpenTelemetry's default AWS propagator doesn't work with free LocalStack. Here's how to set up end-to-end local testing with working trace propagation. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

ActiveMQ MQTT Protocol Setup Guide: QoS, SSL, and IoT Scale

Modern enterprise architectures increasingly need to bridge the gap between resource-constrained IoT devices and heavyweight enterprise backend systems. ActiveMQ MQTT support makes this possible: devices running the MQTT protocol - sensors, actuators, edge nodes, publish telemetry on standard topics, while JMS-based backend services consume and process the data without any client-code changes.

7 best AI deployment platforms for production Kubernetes workloads in 2026

Training a model in a notebook is easy. What breaks teams is the step after, serving it reliably without haemorrhaging cloud budget or burying your SREs in YAML. The common trap: picking a platform that handles the model but not the surrounding stack. An AI deployment platform should orchestrate the full application graph (inference endpoints, vector databases, caching layers, and frontends) inside a single VPC, with GPU autoscaling that doesn't require a dedicated platform engineer to babysit.

How to use an SRE agent to reduce downtime

An alert in the middle of the night warns of a potential business failure. Manual incident response becomes more complex due to the overwhelming data from distributed and dynamic digital services. With an SRE agent, your engineering team can cut through alert clutter. They can sort through various signals quicker, decreasing burnout and achieving faster, more affordable resolutions. Operational resilience will see its next evolution with Agentic AI.

What's New in InfluxDB 3 Explorer 1.8: Streaming Subscriptions, Smarter Sample Data, Line Protocol Validation, and Retention Controls

InfluxDB 3 Explorer 1.8 is all about writing data and keeping it under control. You can now subscribe to MQTT, Kafka, and AMQP streams directly from Explorer, generate custom sample datasets, stream live sample data continuously into your database, and validate your line protocol and preview the resulting schema before you write it. You can now also view and edit retention periods on both databases and individual tables.