
Data Observability: Build confidence in the data life cycle

Datadog Data Observability provides a complete solution with quality checks (e.g., volume, row changes, freshness), custom SQL-based monitors, anomaly detection, column-level lineage across systems like Snowflake and Tableau, full pipeline visibility, and targeted alerts when data issues arise.
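As a rough illustration of what freshness and volume checks amount to, here is a minimal Python sketch. It is not Datadog's API: the function names, the 50% tolerance, and the six-hour age limit are all hypothetical choices made for the example.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_loaded_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """Freshness check: the table was loaded no longer than max_age ago."""
    return now - last_loaded_at <= max_age


def volume_within_range(row_count: int, baseline: int, tolerance: float = 0.5) -> bool:
    """Volume check: row_count deviates from baseline by at most `tolerance` (a fraction)."""
    if baseline <= 0:
        # With no meaningful baseline, only an empty load counts as expected.
        return row_count == 0
    return abs(row_count - baseline) / baseline <= tolerance


# Hypothetical values for illustration only.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
last_load = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)

fresh = is_fresh(last_load, now, max_age=timedelta(hours=6))
volume_ok = volume_within_range(row_count=400, baseline=1000)
```

In this example the table passes the freshness check (loaded three hours ago against a six-hour limit) but fails the volume check (400 rows is a 60% drop from the baseline of 1000), which is the kind of condition a targeted alert would fire on.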

Stop guessing! Speedscale's Notebook finds anything in your traffic.

Debugging complex microservices just got an upgrade. This video demonstrates Speedscale's innovative Notebook capability, allowing you to perform advanced substring searches and filter production traffic based on deeply nested JSON fields within request and response bodies. Unlike traditional observability tools that only record telemetry, Speedscale's always-on recorder captures full traffic payloads, empowering you to precisely pinpoint issues, identify specific user calls, or validate API versions. Streamline your troubleshooting, enhance your testing, and gain unprecedented visibility into your production environment.
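To make the idea of filtering traffic on deeply nested JSON fields concrete, here is a minimal Python sketch. It does not use Speedscale's Notebook or any Speedscale API: the record layout (a dict with a `response_body` string), the dotted-path syntax, and the function names are assumptions invented for the example.

```python
import json
from typing import Any, Iterable, Optional


def get_nested(doc: Any, path: str) -> Any:
    """Walk a dotted path such as 'user.plan.tier'; return None if any step is missing."""
    current = doc
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        else:
            return None
    return current


def filter_traffic(
    records: Iterable[dict],
    path: str,
    expected: Optional[Any] = None,
    substring: Optional[str] = None,
) -> list:
    """Keep records whose JSON response body has a value at `path` matching the filters."""
    matches = []
    for record in records:
        try:
            body = json.loads(record["response_body"])
        except (KeyError, TypeError, json.JSONDecodeError):
            continue  # skip records without a parseable JSON body
        value = get_nested(body, path)
        if value is None:
            continue
        if expected is not None and value != expected:
            continue
        if substring is not None and substring not in str(value):
            continue
        matches.append(record)
    return matches


# Hypothetical captured traffic for illustration only.
records = [
    {"id": 1, "response_body": '{"user": {"id": "u-42", "plan": {"tier": "pro"}}, "api_version": "v2"}'},
    {"id": 2, "response_body": '{"user": {"id": "u-7", "plan": {"tier": "free"}}, "api_version": "v1"}'},
    {"id": 3, "response_body": "not json"},
]

pro_calls = filter_traffic(records, "user.plan.tier", expected="pro")
versioned_calls = filter_traffic(records, "api_version", substring="v")
```

The same pattern covers both use cases mentioned above: an exact match on a nested field identifies a specific user's calls, and a substring match on a version field helps validate which API versions are in use.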

Disposable Code Is Here to Stay, but Durable Code Is What Runs the World

Every day I seem to run into yet another post with someone solemnly opining that “writing code has never been the hardest part of software engineering.” And hey, that’s smashing. As an engineer from the ops/infra/SRE side of the house, I feel like I’ve been saying this my whole career. (Is there anything more satisfying than being proven right in public? Not in my book.) So, which is it?

Why Your Loki Metrics Are Disappearing (And How to Fix It)

Grafana Loki is up and running, log ingestion looks healthy, and dashboards are rendering without issues. But when you query logs from a few weeks ago, the data's missing. This is a recurring problem for many teams using Loki in production: while the system handles short-term log visibility well, it often lacks the retention guarantees developers expect for historical analysis and incident review.

New in OTel: Auto-Instrument Your Apps with the OTel Injector

As distributed systems scale, maintaining manual instrumentation across services quickly becomes unsustainable. The OTel Injector addresses this by automatically attaching OpenTelemetry instrumentation to applications, with no code changes needed. This blog covers how the OTel Injector works, how it integrates with Linux environments, and how to set it up for consistent telemetry across your stack.

Introducing the Cortex MCP Server

Cortex gives engineering teams full visibility and control over their services, from ownership and standards to service history and production readiness. Our goal is to help teams stay aligned and move faster so they are ready for whatever is ahead. The reality for any engineering team is that developers spend most of their time in their IDE, not their IDP. And while developers love the context Cortex provides, they don’t love context switching.

Essential Monthly Developer Updates: Security and Software Improvements

Monthly developer updates focus on critical areas like Azure vulnerabilities and Visual Studio updates. Attention is needed for the Azure Monitor Agent and Service Fabric, especially regarding auto-update settings. With Windows 10 nearing its end of life, organizations should consider ESU support. Adobe updates address vulnerabilities in After Effects, InCopy, and Illustrator. Windows updates for versions 10 and 11 tackle various vulnerabilities, while SQL Server and SharePoint Server updates are essential for security.

What is Grafana Cloud? Fully Managed Observability Built on Open Standards | Grafana Labs

Grafana Cloud helps teams detect, investigate, and resolve incidents faster, thanks to AI, open standards, and seamless integrations with OpenTelemetry, Prometheus, Salesforce, and more. See how it all works in this live demo of a simulated e-commerce outage.