
The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

What is Cybersecurity?

Cybersecurity refers to the processes and technology used to protect information technology networks, data, people, servers, endpoint devices and other IT-related systems from cyberattacks. The need for this protection has never been greater. All organizations, in both the private and public sectors, now operate in a threat landscape in which their IT infrastructure is exposed to attack.

Sync Your Users Into Icinga Notifications: Introducing the Contacts/Groups API

If you’ve ever onboarded a teammate at 4:57 PM on a Friday (or offboarded one at 4:58 PM…), you know the pain: keeping notification contacts and groups up to date is work. With the Icinga Notifications REST API, you can automate that and avoid drift.
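
To make "drift" concrete, here is a minimal sketch of the comparison step that any such sync would need: given the users in a source-of-truth directory and the contacts currently configured, work out what to create, update, and delete. It deliberately does not call the Icinga Notifications API, since the teaser above does not show its endpoints or payloads; the `Contact` type and the `plan_sync` function are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    # Hypothetical shape; a real contact record may carry more fields.
    username: str
    full_name: str


def plan_sync(directory, current):
    """Return the usernames to create, update, and delete so that
    `current` (what is configured now) matches `directory` (source of truth)."""
    wanted = {c.username: c for c in directory}
    existing = {c.username: c for c in current}

    # In the directory but not configured yet.
    to_create = sorted(wanted.keys() - existing.keys())
    # Configured but no longer in the directory (offboarded).
    to_delete = sorted(existing.keys() - wanted.keys())
    # Present on both sides but with differing details.
    to_update = sorted(
        name for name in wanted.keys() & existing.keys()
        if wanted[name] != existing[name]
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}


if __name__ == "__main__":
    directory = [Contact("alice", "Alice A."), Contact("bob", "Bob B.")]
    current = [Contact("bob", "Robert B."), Contact("carol", "Carol C.")]
    print(plan_sync(directory, current))
```

Each entry in the resulting plan would then be turned into the matching API request; running the comparison on a schedule is one way to keep the two sides from drifting apart.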

Top 9 Observability Tools for AI-Assisted Development & Deployment

AI-assisted development is rapidly becoming the default way software is built. Code generation, AI copilots, agentic pull requests, and automated refactoring are now embedded directly into engineering workflows. While this shift dramatically increases delivery speed, it also introduces a new operational reality: production systems are changing faster than humans can fully reason about them. This is where observability becomes mission-critical.
Sponsored Post

Why Every MSP Needs Centralized SaaS Monitoring

Your monitoring stack catches server failures, network issues, and application crashes. But what happens when Microsoft Teams goes down across half your client base at 3 AM? Your on-call tech gets bombarded with alerts that all trace back to one root cause they can't see. This is the MSP blind spot: third-party SaaS dependencies that sit outside your monitoring perimeter but directly impact your SLAs.

Observing agentic AI workflows with Grafana Cloud, OpenTelemetry, and the OpenAI Agents SDK

As agentic AI applications are used more broadly in production, they introduce new operational models, combining multi-step reasoning, tool execution, and autonomous decision-making into a single workflow. SRE teams need visibility into how these agents behave, where they fail, and how they perform over time.

Monitoring Sprawl: Why IT Teams Still Can't Get Actionable Insight Fast

IT teams collect extensive monitoring data but struggle to turn it into fast, confident decisions during incidents. Most IT leaders aren’t worried about whether their environments are monitored; they’re worried about whether their teams can make sense of what they’re seeing quickly enough to actually resolve issues. When something breaks, the problem usually isn’t finding data. Dashboards show activity, alerts indicate changes, and logs capture events across the entire stack.

AI Agent Governance: How to Keep Agentic ITOps Workflows Safe

The future of ITOps automation is better control over what AI agents can see, share, and do. AI automation in ITOps is expected to resolve incidents, reduce operational load, and operate with limited human involvement. Those outcomes depend on systems that can take action, not just surface insight. Agentic AI enables that shift. AI agents can correlate signals across tools, update tickets, trigger remediation, and coordinate workflows without waiting for instruction.

Make faster, better product decisions with Datadog Product Analytics

Product managers (PMs) need to make fast, confident decisions about what to build, fix, and improve based on user behavior within their application. But in practice, collecting the user insights they require is rarely straightforward. Recent updates to Datadog Product Analytics address this challenge. Product Analytics adds structure to autocaptured data and makes analysis easier to interpret, reuse, and share, helping PMs move from questions to answers without relying on SQL or engineering.