Getting Started Guide with Netdata

New to Netdata? Start here. In this quick and practical guide, we’ll help you get set up and confident with Netdata in just a few minutes. You’ll learn how to access your Netdata Space; connect your nodes (servers, VMs, containers, network devices, and more); organize your infrastructure with Spaces and Rooms; collaborate with your team in real time; explore alerting and integrations; and customize notifications so you’re only alerted when it truly matters.

Flyway Autopilot: Ingeniously Simple Database DevOps

Can Flyway Autopilot really help you get up and running with database migrations, CI/CD, and deployment best practices in under 20 minutes? In this special episode, Tonie and Tony sit down with Huxley Kendell, the co-creator of Flyway Autopilot, to find out what it does, how it works, and what inspired him to build it. Whether you're new to Flyway or looking to improve how your team handles database changes, this is your shortcut to getting started.

Playwright fixtures: A deep dive

Fixtures may be one of Playwright’s most powerful yet underused features. They can simplify repetitive setup and teardown in your tests and help you manage test data and test state. Fixtures are key if your goal is to write cleaner, more maintainable Playwright tests. This tutorial aims to help you master Playwright fixtures, understand their purpose, and use them effectively in your tests.
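To make the setup-and-teardown idea concrete, here is a minimal sketch in plain TypeScript. It is not Playwright's API: the names `Fixture`, `runWithFixture`, and `tempUserFixture` are hypothetical and chosen only for illustration. Playwright's own fixtures, defined through `test.extend`, follow the same shape (setup, then `await use(value)`, then teardown) but are asynchronous; this version is synchronous for brevity.

```typescript
// Plain-TypeScript sketch of the fixture idea; this is NOT Playwright's API.
// All names here (Fixture, runWithFixture, tempUserFixture) are hypothetical.

// A fixture receives a `use` callback and decides what happens around it.
type Fixture<T> = (use: (value: T) => void) => void;

// Records the order of events so the lifecycle is visible.
const log: string[] = [];

// The fixture owns both setup and teardown around the single `use` call.
const tempUserFixture: Fixture<{ name: string }> = (use) => {
  log.push("setup: create user");
  const user = { name: "test-user" };
  try {
    use(user); // the test body runs here
  } finally {
    log.push("teardown: delete user"); // runs even if the test body throws
  }
};

// Runs a test body with the value the fixture provides.
function runWithFixture<T>(fixture: Fixture<T>, testBody: (value: T) => void): void {
  fixture(testBody);
}

runWithFixture(tempUserFixture, (user) => {
  log.push(`test: saw ${user.name}`);
});
```

The test body never mentions setup or cleanup, so the same fixture can back many tests without repeating that code.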

Introducing Coralogix's MCP Server: Helping customers build smarter AI agents

Now available: Secure, real-time access to your observability data via Coralogix’s Model Context Protocol (MCP) Server. AI agents are only as powerful as the context they’re given. Today, we’re excited to announce the launch of the Coralogix MCP Server, which enables third-party AI agents to connect directly to your observability data across production, staging, and other environments.

IT Process Improvement Is Great... If You Can Find Someone to Build It

IT leaders know the value of process improvement: smoother onboarding, faster incident resolution, streamlined change management, and more. IT teams rarely fall short for lack of ideas; it’s almost always a lack of bandwidth. Because of that, most process improvement efforts stall before they scale. Great ideas get captured in diagrams, Confluence pages, and strategy decks, but they rarely make it into production. Why?

Quantifying the True Cost of Healthcare IT Downtime

In today’s hospitals, technology is woven into every touchpoint of patient care. Nurses check vitals through digital monitors. Physicians review test results in the EHR. Medications get ordered, verified, and delivered through a network of connected systems. But when even one link in that chain fails, the impact isn’t just inconvenient—it’s dangerous. Downtime doesn’t just slow operations.

Splunk Named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms

We are proud to announce that Splunk has been named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms for the third year in a row. In our opinion, our recognition in the Observability category comes on the heels of Splunk being recognized for a tenth consecutive time as a Leader in the 2024 Gartner Magic Quadrant for Security Information and Event Management (SIEM). Splunk was the only vendor named a Leader in both SIEM and Observability for the Gartner Magic Quadrant three times.

Jaeger Metrics: Internal Operations and Service Performance Monitoring

You're monitoring a microservices-based system. Alerts trigger when response times exceed 2 seconds. But when you open Jaeger, you're faced with thousands of traces. Identifying which service or operation is responsible becomes time-consuming. Jaeger metrics help reduce this friction by exposing aggregated telemetry. Instead of scanning individual traces, you get service-level and operation-level performance metrics (latency, throughput, and error rates) that highlight where the issue lies.
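As a rough illustration of what "aggregated telemetry" means here, the sketch below rolls a list of spans up into per-operation call counts, error rates, throughput, and a nearest-rank p95 latency. It is an assumption-laden toy, not how Jaeger computes its metrics: the `Span` shape, the `aggregate` function, and the sample data are all hypothetical.

```typescript
// Hypothetical span shape; real Jaeger spans carry far more fields.
interface Span {
  service: string;
  operation: string;
  durationMs: number;
  error: boolean;
}

interface OperationMetrics {
  calls: number;
  errorRate: number;        // fraction of calls that errored
  throughputPerSec: number; // calls divided by the window length
  p95Ms: number;            // nearest-rank 95th percentile latency
}

// Groups spans by "service/operation" and summarizes each group.
function aggregate(spans: Span[], windowSeconds: number): Map<string, OperationMetrics> {
  const groups = new Map<string, Span[]>();
  for (const span of spans) {
    const key = `${span.service}/${span.operation}`;
    const group = groups.get(key) ?? [];
    group.push(span);
    groups.set(key, group);
  }

  const result = new Map<string, OperationMetrics>();
  for (const [key, group] of groups) {
    const durations = group.map((s) => s.durationMs).sort((a, b) => a - b);
    const rank = Math.ceil(0.95 * durations.length); // nearest-rank method
    result.set(key, {
      calls: group.length,
      errorRate: group.filter((s) => s.error).length / group.length,
      throughputPerSec: group.length / windowSeconds,
      p95Ms: durations[rank - 1],
    });
  }
  return result;
}

// Made-up sample data covering a 10-second window.
const spans: Span[] = [
  { service: "checkout", operation: "pay", durationMs: 100, error: false },
  { service: "checkout", operation: "pay", durationMs: 200, error: false },
  { service: "checkout", operation: "pay", durationMs: 2500, error: true },
  { service: "cart", operation: "add", durationMs: 50, error: false },
  { service: "cart", operation: "add", durationMs: 60, error: false },
];

const metrics = aggregate(spans, 10);
```

With this sample, `checkout/pay` stands out immediately: its p95 is above the 2-second alert threshold and a third of its calls failed, while `cart/add` looks healthy.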

Automating High CPU Utilization Remediation with Resolve

High CPU utilization alerts can overwhelm IT teams and disrupt user productivity—especially in virtualized environments. In this video, see how Resolve automates the end-to-end remediation process for sustained CPU spikes. From detecting alerts and creating incidents to gathering host data, verifying VM configurations, and dynamically adding vCPUs—watch how Resolve eliminates manual effort and speeds up incident resolution.