Operations | Monitoring | ITSM | DevOps | Cloud

Introducing: Checkly Agent Skills

AI coding agents are excellent at writing code. Ask Claude Code, Codex, or Cursor to add a feature, and it just works. At Checkly, we were ready for the new agentic world from the start! Monitoring as Code means your entire monitoring setup lives in your repository. API Checks, Browser Checks, alert channels, status pages; everything is defined in code, managed with the Checkly CLI, and version-controlled like any other part of your stack.

Improve performance and reliability with APM Recommendations

SREs and application developers rely on telemetry data to understand and improve their systems. As organizations scale and evolve, those systems generate an ever-growing volume of metrics, logs, and traces. But more data alone does not make it easier to improve performance or reliability: Identifying meaningful optimizations still requires careful investigation and analysis.

Designing Alerts for Action

In the first two posts of this series, we explored how alert noise emerges from design decisions, and why notification lists fail to create accountability when responsibility is unclear. There’s a deeper issue underneath both of those problems. Many alerting systems are designed without being clear about the outcome they’re meant to produce. When teams don’t explicitly decide what they want to happen as a result of a signal, they default to the loudest option available.

Unlimited Team Sizes for All

Starting from today, Healthchecks.io users on all plans (Hobbyist, Supporter, Business, Business Plus) can invite an unlimited number of users into their projects. Previously, the limits were: 3 team members for Hobbyist and Supporter, 10 team members for Business, and unlimited team members for Business Plus. From now on, it is unlimited for all.

Top 6 Cloud Monitoring Challenges in Hybrid & Multi-Cloud Environments

Hybrid and multi-cloud monitoring breaks down when teams can’t connect signals to customer impact fast enough to act. Hybrid and multi-cloud sound simple: run some workloads in public cloud, keep some on-premises, and connect it all. But in practice, you’re managing dependencies across teams and systems, tools that don’t share context, and incidents that refuse to stay in one place.

Turn Raw Data into Reliability by Changing Performance Perspectives

In a global microservices architecture, technical performance initially presents as a chaotic stream of disconnected telemetry. For a Technical Program Manager (TPM), success depends on the ability to move past these disconnected individual data points to identify stable patterns. If they have services entering critical states, looking at individual logs or traces is inefficient. Protecting system reliability requires an engine that can automate pattern recognition at scale.

Unlocking business resilience with full-stack observability in hybrid IT environments

For CIOs and technology leaders across the Gulf Cooperation Council (GCC), full-stack observability is a strategic lever for achieving faster ROI, operational resilience, and digital maturity. By integrating AI-powered insights and automation, IT leaders can streamline operations and align technology outcomes with business goals. Demonstrating ROI within tight timelines is critical, as is leveraging observability to maintain competitive advantage in a rapidly evolving market.

OpenTelemetry support for .NET 10: A behind-the-scenes look

At Grafana Labs, we are fully committed to the open source OpenTelemetry project and are actively engaged with the OTel community. Many Grafanistas spend a large proportion of their time contributing directly to OpenTelemetry upstream projects, helping make observability more powerful, reliable, and accessible for everyone as part of our big tent philosophy.