
Detect, Communicate, Resolve: Checkly's Agentic Workflow End-to-End

Coding agents are the fastest-growing audience for the Checkly CLI, and we're doubling down on them. In this session, Stefan hands Claude a real e-commerce app, lets it set up monitoring with `npx checkly init`, generate Playwright tests through MCP, and walk an actual alert end-to-end with Rocky AI in the loop.

#056 - Cloud Contradictions and Cautionary Tales with Corey Quinn (The Duckbill Group)

In this episode of the Kubernetes for Humans podcast, Itiel sits down with the internet's favorite cloud contrarian, Corey Quinn of the Duckbill Group. Corey shares his unconventional career path as a "cautionary tale," explaining why his knack for fixing horrifying AWS bills makes him a terrible employee, and why he absolutely refuses to touch Kubernetes in production.

How to use an SRE agent to reduce downtime

An alert in the middle of the night warns of a potential business failure. Manual incident response grows harder as distributed, dynamic digital services produce an overwhelming volume of data. An SRE agent helps your engineering team cut through alert clutter and sort through signals faster, reducing burnout and reaching faster, more affordable resolutions. Agentic AI marks the next evolution of operational resilience.

7 best AI deployment platforms for production Kubernetes workloads in 2026

Training a model in a notebook is easy. What breaks teams is the step after: serving it reliably without haemorrhaging cloud budget or burying your SREs in YAML. The common trap is picking a platform that handles the model but not the surrounding stack. An AI deployment platform should orchestrate the full application graph (inference endpoints, vector databases, caching layers, and frontends) inside a single VPC, with GPU autoscaling that doesn't require a dedicated platform engineer to babysit.

ActiveMQ MQTT Protocol Setup Guide: QoS, SSL, and IoT Scale

Modern enterprise architectures increasingly need to bridge the gap between resource-constrained IoT devices and heavyweight enterprise backend systems. ActiveMQ's MQTT support makes this possible: devices that speak MQTT (sensors, actuators, edge nodes) publish telemetry on standard topics, while JMS-based backend services consume and process the data without any client-code changes.
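To make the "no client-code changes" point concrete, here is a minimal Python sketch of how an MQTT topic name lines up with the JMS destination a backend consumer would subscribe to. It assumes ActiveMQ Classic's default conversion (`/` becomes `.`, `+` becomes `*`, `#` becomes `>`); the function name and topic names are hypothetical, and a broker with customized destination settings may behave differently.

```python
# Hypothetical helper: translate an MQTT topic (or topic filter) into the
# destination name a JMS consumer would use on ActiveMQ, ASSUMING the
# broker's default mapping:
#   '/' -> '.'  (level separator)
#   '+' -> '*'  (single-level wildcard)
#   '#' -> '>'  (multi-level wildcard)
_MQTT_TO_JMS = str.maketrans({"/": ".", "+": "*", "#": ">"})


def mqtt_to_jms_topic(mqtt_topic: str) -> str:
    """Return the JMS topic name corresponding to an MQTT topic or filter."""
    return mqtt_topic.translate(_MQTT_TO_JMS)


if __name__ == "__main__":
    # A device publishes telemetry here over MQTT...
    print(mqtt_to_jms_topic("factory/line1/temperature"))  # factory.line1.temperature
    # ...and a JMS service can subscribe to every line with a wildcard.
    print(mqtt_to_jms_topic("factory/+/temperature"))      # factory.*.temperature
    print(mqtt_to_jms_topic("factory/#"))                  # factory.>
```

Because the translation happens inside the broker, neither the devices nor the JMS services need to know which protocol the other side speaks.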

VictoriaMetrics Virtual Meetup Q1 2026 - VictoriaMetrics Cloud Updates

VictoriaMetrics Cloud continues to mature as a secure, reliable, and cost-efficient observability platform. With PrivateLink now available across all regions, including Frankfurt, users can operate entirely without exposure to the public internet. Blue-green cluster deployments enable seamless, zero-downtime updates, while incremental backups ensure storage efficiency by capturing only what has changed. Operational visibility is improved with clearer alert states, showing Firing and Resolved conditions upfront. Security enhancements include stronger password policies and expanded authentication safeguards.
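The incremental-backup idea ("capturing only what has changed") can be illustrated with a small, generic Python sketch: keep a manifest of content hashes from the previous backup and copy only files whose hash is new or different. This is not VictoriaMetrics Cloud's implementation; the function names and the in-memory `files` mapping (file name to bytes) are hypothetical stand-ins for real data files.

```python
import hashlib


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file name to the SHA-256 digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def plan_incremental_backup(
    files: dict[str, bytes], previous_manifest: dict[str, str]
) -> tuple[dict[str, str], list[str], list[str]]:
    """Hypothetical planner: return (new manifest, files to copy, files to drop).

    Only files that are new or whose digest changed since the previous
    manifest are copied; files that disappeared are dropped from the backup.
    """
    current = build_manifest(files)
    changed = sorted(n for n, d in current.items() if previous_manifest.get(n) != d)
    removed = sorted(n for n in previous_manifest if n not in current)
    return current, changed, removed


if __name__ == "__main__":
    day1 = {"part-001": b"aaa", "part-002": b"bbb"}
    manifest, changed, removed = plan_incremental_backup(day1, {})
    print(changed, removed)  # ['part-001', 'part-002'] []  (first backup is full)

    day2 = {"part-002": b"bbb", "part-003": b"ccc"}
    _, changed, removed = plan_incremental_backup(day2, manifest)
    print(changed, removed)  # ['part-003'] ['part-001']  (only the delta moves)
```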

How to Test SQS Workflows Locally with LocalStack and OpenTelemetry

LocalStack lets you run SQS, Lambda, and S3 locally in Docker, but there's a hidden trap: OpenTelemetry's default AWS propagator doesn't work with the free edition of LocalStack. Here's how to set up end-to-end local testing with working trace propagation.

Prathamesh works as an evangelist at Last9, runs SRE stories (where SRE and DevOps folks share their stories), and maintains o11y.wiki, a glossary of terms related to observability.
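One way around the propagator problem, assumed here rather than taken from the article, is to carry a W3C Trace Context `traceparent` header in an SQS message attribute instead of relying on the AWS-specific trace header. The sketch below is stdlib-only Python: it builds the header by hand (in a real service, OpenTelemetry's propagation API would produce it), packs it into the `MessageAttributes` shape that SQS `SendMessage` expects, and parses it back on the consumer side. All function names are hypothetical.

```python
import re
import secrets
from typing import Optional, Tuple

# W3C Trace Context: version(00)-trace_id(32 hex)-parent_id(16 hex)-flags(2 hex)
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a version-00 traceparent header value."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def inject_into_attributes(attributes: dict, traceparent: str) -> dict:
    """Return a copy of SQS MessageAttributes with a 'traceparent' entry added."""
    out = dict(attributes)
    out["traceparent"] = {"DataType": "String", "StringValue": traceparent}
    return out


def extract_from_message(message: dict) -> Optional[Tuple[str, str, str]]:
    """Parse (trace_id, parent_span_id, flags) from a received SQS message."""
    attr = message.get("MessageAttributes", {}).get("traceparent")
    if not attr:
        return None
    match = _TRACEPARENT_RE.match(attr.get("StringValue", ""))
    if not match:
        return None
    trace_id, parent_span_id, flags = match.groups()
    # The spec treats all-zero IDs as invalid.
    if trace_id == "0" * 32 or parent_span_id == "0" * 16:
        return None
    return trace_id, parent_span_id, flags


if __name__ == "__main__":
    tp = make_traceparent(secrets.token_hex(16), secrets.token_hex(8))
    attrs = inject_into_attributes({}, tp)  # would be passed as MessageAttributes=
    received = {"Body": "{}", "MessageAttributes": attrs}  # simulated receive
    print(extract_from_message(received))
```

When receiving with boto3, the consumer has to request the attribute explicitly (for example `MessageAttributeNames=["All"]`); otherwise SQS omits message attributes from the response.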

Four types of incident alerts every team should know

Not every incident alert needs the same kind of response. One incident may need to wake someone up right away; another can simply be picked up when the team starts work in the morning. Without a clear way to tell them apart, every incident feels equally urgent, which adds noise and makes incident response decisions harder than they need to be. This is where two questions help. In this guide, we’ll discuss what those questions mean and the four combinations that follow.

From Context to Commitment

If service-centric observability provides the control layer, the next question becomes more urgent: what happens when organizations pair context with automation that operates inside clearly defined boundaries? During conversations at Nexus Live 2025, leaders did not describe automation as a futuristic aspiration. They described it as a necessary progression. However, the distinction they drew was important. Automation without context accelerates activity.