What Is LLM Observability? For CFOs And Engineers, The Missing Layer Is Cost

You probably have Datadog. Maybe New Relic, maybe Dynatrace. Your observability stack has been solid for years — and you're still flying blind on AI cost. Here's why LLM observability needs a fourth pillar most tools skip, and how to build one that actually tells you what your models are costing you per request, per feature, per customer.

New: SSL Certificate Monitoring, Security Center, Domain & SSL Expiration Tracking - Plus Our Affiliate Program

DNS Spy now goes well beyond DNS record monitoring. We've shipped SSL certificate discovery and security auditing, expanded the Security Center to 40+ automated checks across six categories, and built expiration tracking for both domains and SSL certificates — with tiered alerts so nothing expires without warning.

Under the Hood: Engineering JFrog Premium Availability

In the modern software factory, 99.9% uptime is no longer the gold standard. A standard 99.9% SLA translates to approximately 43 minutes of unexpected downtime per month. While industry data shows that a single minute of downtime costs an average of $9,000, for large global enterprises, that figure can easily be 5x higher. At tens of thousands of dollars per minute, those 43 minutes quickly compound into a catastrophic financial and operational risk.

Nagios Plugins Collector: Run Your Existing Checks and Custom Scripts Inside Netdata

A lot of teams have a collection of Nagios plugins and custom monitoring scripts that have been running reliably for years. Some are standard community plugins for checking disk health or SSL certificate expiry. Others are homegrown Bash or Python scripts that check something very specific to the business: whether an API endpoint returns the right payload, whether a batch job completed on time, whether a queue depth is within bounds.

Modernizing a legacy CMake build-system

CMake tends to have a bad reputation for being too complex and convoluted, but that notion often stems from very old versions of CMake. Sure, CMake is a Turing-complete scripting language, but that is really needed for an ecosystem as complex as that of C and C++. And as Greenspun’s tenth rule of programming goes: There are countless build-systems and build-system generators for the C/C++ ecosystem. Some of them tried to use a simple, declarative approach.

Resolve's Agents of IT podcast - Ep. 17 - Agentic Workflows to Performance Intelligence

In this episode of Agents of IT, Ari Stowe sits down with Geoff McQueen, four-time founder and CEO of Ascendius, to unpack what it takes to navigate AI-driven disruption. Geoff shares a clear framework for where automation is headed, from individual AI use to agent-driven workflows to AI embedded across the business. Most organizations are still early. The real opportunity is in making AI work at the business level.

From Complex to Simple: How Integrated GRC Transforms Compliance, Risk and ITSM Operations

Your teams face a complex regulatory landscape, limited visibility across departments and the need to demonstrate audit readiness and risk accountability. But you’re not alone; over half of global risk leaders say regulatory complexity is their biggest headache, while many struggle with siloed vendor data, fragmented controls and manual GRC processes.

Pyroscope Community Call LIVE from GrafanaCON 2026

Join us live from GrafanaCON 2026 for the Pyroscope Community Call! We’re kicking things off with a look at everything happening in the Pyroscope ecosystem, alongside special guest Alberto Soto. In this session, we look back over the last year in Pyroscope, cover what’s new in continuous profiling, and preview what’s coming next. From multi-language source code integration and symbolization improvements to OpenTelemetry profiles and performance gains, Pyroscope has evolved rapidly over the past year.

Loki Community Call LIVE from GrafanaCON 2026

Join us live from GrafanaCON 2026 for the Loki Community Call! We’re kicking things off with a look at everything happening in the Loki ecosystem, alongside special guests Poyzan Taneli, Ben Clive, and Trevor Whitney. In this session, we look back over the last year in Loki, explore the brand new “Thor” architecture, and dive into what’s coming next for logging at scale. From a completely new columnar storage format and Kafka-based ingestion to a redesigned query engine and improved support for high-cardinality data, Loki is evolving to meet the demands of modern logging.