Monthly Archive

Beyond polling: Why enterprises are exploring network telemetry

Jun 30, 2026 By Monicaa M In ManageEngine

Polling has been the go-to approach for network monitoring for years, and it still plays an important role in keeping networks healthy. But as networks become more distributed, application-driven, and data-intensive, simply polling devices more often isn't always the most efficient way to gain deeper operational insights. That's where network telemetry comes in.

Read Post

ManageEngine

Read more about Beyond polling: Why enterprises are exploring network telemetry

Sponsored Post

CloudWatch Logs to S3: The Easy Way

Jun 30, 2026 By David Bunting In ChaosSearch

Many organizations use Amazon CloudWatch to analyze log data, but find that restrictive CloudWatch log retention issues hold them back from effective troubleshooting and root-cause analysis. As a result, many companies may be looking for effective ways to export CloudWatch logs to S3 automatically. Let's look at some of the reasons why you might want to export CloudWatch logs to S3 in the first place, along with some Amazon-native and open-source tools to help you with the process.

Read Post

ChaosSearch

Read more about CloudWatch Logs to S3: The Easy Way

Notes from the Field: Understanding "Lost connection" LAS activations in Citrix Virtual Apps and Desktops

Jun 30, 2026 By GripMatix In GripMatix

With the transition from file-based licensing to the License Activation Service now complete, many Citrix administrators are spending more time in the Citrix Cloud licensing portal. As organizations continue to operate and troubleshoot LAS-based Citrix Virtual Apps and Desktops environments, it becomes increasingly important to understand what the licensing dashboard is actually showing.

Read Post

GripMatix

Read more about Notes from the Field: Understanding "Lost connection" LAS activations in Citrix Virtual Apps and Desktops

NiCE VMware vSphere Management Pack 6.2

Jun 30, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

Deeper monitoring. Better reporting. Fewer alerts. Less coffee at 2 a.m. Get the new version.

Read Post

NiCE IT Mgmt

Read more about NiCE VMware vSphere Management Pack 6.2

Keep Calm and AI On: The UK leads on AI adoption but trails on confidence

Jun 30, 2026 By SolarWinds In SolarWinds

New data from SolarWinds reveals a growing gap between AI adoption and confidence in the UK, when compared with the US and India.

Read Post

SolarWinds

Read more about Keep Calm and AI On: The UK leads on AI adoption but trails on confidence

What Customers Are Doing With AI and Honeycomb

Jun 30, 2026 By Rox Williams In Honeycomb

At O11yCon, we talked to engineering teams across the industry, and the numbers are starting to get genuinely wild: Mixpanel DevOps Engineer Eddie Bracho told us their engineering team is generating 50% more PRs than before AI came into the mix (sorry). That kind of velocity is exciting, but it's also a pressure test for every part of your stack that isn't writing code, including your observability practice. Here's what we're hearing from customers about how that's playing out.

Read Post

Honeycomb

Read more about What Customers Are Doing With AI and Honeycomb

New Feature: Automatic Snapshots When Latency Spikes

Jun 30, 2026 By Roi Bar In Lightrun

We’ve released an exciting new Lightrun capability: set a duration threshold on your Tic & Toc or Method Duration metrics, and Lightrun will automatically capture a snapshot whenever execution exceeds it. It takes moments to configure, and gives engineers the runtime context they need to understand why unexpected slow executions are occurring.

Read Post

Lightrun

Read more about New Feature: Automatic Snapshots When Latency Spikes

The hard part of AI root cause analysis is no longer the model

Jun 30, 2026 By Nikolay Sivko In Coroot

Every few weeks someone tells me root cause analysis is a solved problem now: pipe your telemetry into an LLM, let it tell you what broke. I wish it were that easy. After years on this, I think "can AI do RCA?" is the wrong question, because doing RCA with an LLM is really two separate jobs, and the answer is different for each. They break in completely different ways, so it's worth pulling them apart.

Read Post

Coroot

Read more about The hard part of AI root cause analysis is no longer the model

Difference Between Elasticity and Scalability in Cloud Computing

Jun 30, 2026 By Ramya Shah In Motadata

In cloud computing, teams use elasticity and scalability as if they mean the same thing. In reality, the two describe different ways a system handles load, and they solve different problems. Mixing them up can be very expensive. You either pay for capacity that sits idle, or your app buckles the moment traffic spikes, and the bill and the incident report both feel it.

Read Post

Motadata

Read more about Difference Between Elasticity and Scalability in Cloud Computing

Sentry 201: How to debug with Traces, Logs & Metrics

Jun 30, 2026 By Sentry In Sentry

This workshop goes beyond the basics to show you how to monitor critical experiences in your application with workflows using dashboards, tracing, logs, and metrics.

View Video

Sentry

Read more about Sentry 201: How to debug with Traces, Logs & Metrics

Debug and evaluate your AI app from your coding agent with Datadog Agent Observability

Jun 30, 2026 By Michael Bevilacqua-Linn In Datadog

Coding agents like Claude Code, Cursor, and Codex CLI handle the coding parts of building an AI application well. The harder work comes after: understanding why a response went wrong, building eval sets that reflect real production behavior, and keeping up with an application that changes faster than any one-off script can. Teams spend 60–80% of their time on evaluation and error analysis, and much of that work needs to be redone every time the stack shifts.

Read Post

Datadog

Read more about Debug and evaluate your AI app from your coding agent with Datadog Agent Observability

5 pitfalls to avoid when measuring DevEx in the AI era

Jun 30, 2026 By Datadog In Datadog

Developer experience, commonly known as DevEx, describes how an organization’s systems, workflows, tools, and culture affect developer productivity. A positive DevEx leads to tangible organizational benefits, including faster releases, increased innovation, and reduced technical debt. Measuring DevEx enables engineering management to quantify their team’s impact and understand where to direct improvement efforts.

Read Post

Datadog

Read more about 5 pitfalls to avoid when measuring DevEx in the AI era

Datadog acquires Adaptive ML

Jun 30, 2026 By Alexis Lê-Quôc In Datadog

Off-the-shelf models are easy to deploy, but they are rarely enough to solve complex, domain-specific challenges in production. The key to sustained AI value is not in the models themselves but in the ability to tune, evaluate, and refine those models against your organization’s real-time signals. We are excited to announce that Adaptive ML is joining Datadog to accelerate this vision by combining our deep observability data with their expertise in building specialized, high-performance AI agents.

Read Post

Datadog

Read more about Datadog acquires Adaptive ML

What's New in InfluxDB 3 Explorer 1.9: Flux-to-SQL Conversion, InfluxQL Support, and More

Jun 30, 2026 By Daniel Campbell In InfluxData

InfluxDB 3 Explorer 1.9 makes it easier to work with your existing queries. Whether you’re migrating Flux queries to SQL or you’ve been writing in InfluxQL for years, this release helps bring your existing queries forward instead of starting from scratch. For teams moving to v3 from earlier versions of InfluxDB, query migration is often one of the last major hurdles.

Read Post

InfluxData

Read more about What's New in InfluxDB 3 Explorer 1.9: Flux-to-SQL Conversion, InfluxQL Support, and More

Signal: The key to Self-Healing Software

Jun 30, 2026 By Milin Desai In Sentry

More code is being written right now than at any point in our industry’s history. A year’s worth of software is now created every month, and most of it is no longer written by people. GitHub COO Kyle Daigle recently said, “There were 1 billion commits in 2025.

Read Post

Sentry

Read more about Signal: The key to Self-Healing Software

k8s-monitoring-helm Chart Office Hours (June 2026)

Jun 30, 2026 By Grafana In Grafana

In the June edition of the Kubernetes Monitoring Helm chart office hours, we discuss the version 4.1 release, the upcoming 4.2 feature release, and the deprecation of the 1.x and 2.0 versions.

View Video

Grafana

Read more about k8s-monitoring-helm Chart Office Hours (June 2026)

The Journey to Achieving Hyperscale Availability with AI-Driven Prediction

Jun 30, 2026 By Datadog In Datadog

At hyperscale, a regional cloud outage is not merely a technical disruption—for Samsung Account, which serves 2.1 billion users across three global regions, it is an immediate global service crisis. Fragmented, region-siloed monitoring creates blind spots that make early detection nearly impossible, leaving SRE teams perpetually reactive rather than predictive. The path to proactive reliability requires both a philosophical shift and a foundational change in how observability data is collected, unified, and reasoned over.

View Video

Datadog

Read more about The Journey to Achieving Hyperscale Availability with AI-Driven Prediction

Coralogix vs Sumo Logic: Support, Pricing, Features & More

Jun 30, 2026 By Chris Cooney In Coralogix

Coralogix and Sumo Logic are two different answers to the same observability platform decision. Where Coralogix processes telemetry in flight, stores it in your own Amazon Simple Storage Service (S3) bucket, and prices on data ingested, Sumo Logic keeps data in vendor-managed storage and, under its Flex model, bills for data scanned at query time. Both platforms have introduced pricing and artificial intelligence (AI) changes in the past year, and those changes have widened the difference between them.

Read Post

Coralogix

Read more about Coralogix vs Sumo Logic: Support, Pricing, Features & More

Coralogix vs New Relic: Comparison Guide (2026)

Jun 30, 2026 By Chris Cooney In Coralogix

Coralogix and New Relic both cover the full observability surface, but they charge for it and store it in different ways. One prices purely on data ingested and writes telemetry to a bucket you own, while the other combines ingest pricing with per-user licensing and retains data in its own backend. This guide covers how the two platforms compare on core features, pricing structure, AI observability, archiving and retention, security coverage, and support, then shows when each one is the stronger choice.

Read Post

Coralogix

Read more about Coralogix vs New Relic: Comparison Guide (2026)

6 Ways to Use the Hyperping MCP Server

Jun 30, 2026 By Leo Baecker In Hyperping

When something goes down, the last thing you want is to alt-tab between a monitoring dashboard, your on-call tool, and three Slack threads to figure out what is happening and who owns it. That context is usually all there. It is just scattered. The Hyperping MCP server fixes that by putting your monitoring data inside the AI tools you already work in. Your agent can read monitor state, outage timelines, SLAs, and on-call schedules, and answer the questions you would normally chase across tabs.

Read Post

Hyperping

Read more about 6 Ways to Use the Hyperping MCP Server

Introducing Atatus MCP Server: Connect AI Agents to Your Observability Data

Jun 30, 2026 By Mohana Ayeswariya J In Atatus

AI coding assistants like Claude, Cursor, Codex, and GitHub Copilot have become standard tools in the modern engineering workflow. Developers use them to write code, generate tests, and review pull requests. But when something breaks in production, these assistants hit a wall: they have no access to your actual system state. They can reason about logs, traces, and metrics. They just can't see yours.

Read Post

Atatus

Read more about Introducing Atatus MCP Server: Connect AI Agents to Your Observability Data

Is Sonnet 5 actually more cost effective in practice?

Jun 30, 2026 By Coralogix In Coralogix

Anthropic have just released Sonnet 5, their cost-effective alternative to Opus 4.8. The token cost is much lower, but when we analyse the telemetry, we find something surprising. It turns out it's not all about token cost!

View Video

Coralogix

Read more about Is Sonnet 5 actually more cost effective in practice?

Full-stack observability in Grafana Cloud: How to investigate issues across services and infrastructure

Jun 30, 2026 By Victor Padilla In Grafana

Many times, the hardest part of troubleshooting isn’t fixing the actual problem. It’s figuring out where to start. As engineers, it’s easy to lose count of how many times we’ve opened logs, then 10 metrics tabs, and another 10 tabs with trace queries, only to end up back in the logs trying to find a root cause.

Read Post

Grafana

Read more about Full-stack observability in Grafana Cloud: How to investigate issues across services and infrastructure

New in Skylar One - Kyoto: Helping IT and Business Teams Focus on What Matters Most

Jun 30, 2026 By ScienceLogic In ScienceLogic

When technology works, businesses thrive. Employees stay productive, customers stay connected, and critical services keep running. But when something goes wrong, the real challenge is not only detecting the issue. It is understanding what it affects, who may feel the impact, and how urgently the business needs to respond. That is the value behind the Kyoto release. The latest Skylar One update helps teams better connect IT health to business impact.

Read Post

ScienceLogic

Read more about New in Skylar One - Kyoto: Helping IT and Business Teams Focus on What Matters Most

Configuration drift in enterprise networks: Causes, impact, and management

Jun 29, 2026 By akash.mj In ManageEngine

Network admins want all devices with the same role to behave the same way. But in real environments, that consistency rarely lasts. Imagine two core switches in the same data center. They serve the same function and run the same OS version. One handles traffic without issue, while the other drops packets during peak hours. Logs show nothing obvious. Routing looks correct. The team spends hours checking links, hardware, and traffic paths.

Read Post

ManageEngine

Read more about Configuration drift in enterprise networks: Causes, impact, and management

Connecting Ticketing Systems to Microsoft SCOM

Jun 29, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

Microsoft SCOM (System Center Operations Manager) remains a widely used enterprise monitoring platform due to its deep integration with Windows, hybrid-cloud support, and extensible management packs. However, the value of SCOM is fully realized only when its alerts seamlessly flow into ITSM or ticketing systems. This ensures incidents are created, routed, and resolved efficiently.

Read Post

NiCE IT Mgmt

Read more about Connecting Ticketing Systems to Microsoft SCOM

Sponsored Post

Avantra 26: A Breath of Fresh Multi-Tenant AIR

Jun 29, 2026 By Brenton O'Callaghan In Avantra

There's a crackle and spark in the air at Avantra lately, and I'm so pleased to be writing this bit on what we've accomplished with the Avantra 26 release. Automated root cause analysis, multi-tenant management support for Cloud ALM, enhanced security operations and financial operations monitoring BTP - it's all there, and more. It's an exciting and innovative release for Avantra!

Read Post

Avantra

Read more about Avantra 26: A Breath of Fresh Multi-Tenant AIR

What Is Agentic Observability? The Complete Guide for Enterprise Engineering Teams

Jun 29, 2026 By Libi Michelson In logz.io

TL;DR Agentic observability uses AI agents to autonomously investigate incidents, identify root causes, and take action in production environments. Unlike traditional monitoring (which alerts and waits) or AIOps (which assists human analysis), agentic platforms conduct the investigation themselves. Key capabilities include autonomous incident triage, evidence-backed root cause analysis, alert noise reduction, and governed remediation.

Read Post

logz.io

Read more about What Is Agentic Observability? The Complete Guide for Enterprise Engineering Teams

Unleashing Enterprise Agility: The Power of Portfolio Kanban Flow States

Jun 29, 2026 By Eric Nash In Broadcom

In the world of enterprise Agile, we face a persistent paradox: How do we empower individual teams to establish their own unique processes, while ensuring leadership maintains a clear, consistent view of the entire organization’s progress? For a long time, the answer was a compromise.

Read Post

Broadcom

Read more about Unleashing Enterprise Agility: The Power of Portfolio Kanban Flow States

Your AI isn't underperforming. Your data foundation is.

Jun 29, 2026 By Jeremy Pell In Elastic

New research reveals why Australian businesses are entering the new financial year with bigger AI budgets and the same unsolved problem. One in three Australian businesses exceeded their AI budget last year. Even so, half of them plan to increase AI spending again this year. Yet the behaviour that caused those budget overruns remains largely unaddressed.

Read Post

Elastic

Read more about Your AI isn't underperforming. Your data foundation is.

Instrumenting AI Agents for the Agent Timeline: A Practical OpenTelemetry Guide

Jun 29, 2026 By Dan Juengst In Honeycomb

AI agents are nondeterministic, multi-step, and opaque. When one fails in production, "the model said something weird" is the cheapest, most useless line in your incident postmortem. To debug agents the way they actually run, you need telemetry that captures all of it, in order, with enough context to reconstruct what happened. The OpenTelemetry GenAI Semantic Conventions give you a vendor-neutral way to do exactly that.

Read Post

Honeycomb

Read more about Instrumenting AI Agents for the Agent Timeline: A Practical OpenTelemetry Guide

What is Network Monitoring? A Guide for IT Teams

Jun 29, 2026 By Jagdish Sajnani In Motadata

Over 90% of mid-sized and large companies estimate that a single hour of downtime now costs more than $300,000. The clock starts the moment something breaks, whether anyone has noticed it or not. And most outages don't start with alarms. They begin with a small issue inside the network: an overloaded switch, a saturated link, or an unstable interface. Left unnoticed, those small issues grow into user complaints, stalled work, lost revenue, and damaged customer trust.

Read Post

Motadata

Read more about What is Network Monitoring? A Guide for IT Teams

The Frictionless Workplace Isn't What You Think It Is: Beyond the Ticket

Jun 29, 2026 By Ella Drimer In Nexthink

For many EUC and digital workplace leaders, the challenge isn't a lack of technology. It's understanding why workplace issues continue to surface despite years of investment in automation, AI, and digital transformation. Support teams are still dealing with high ticket volumes. Rollouts intended to improve employee experience can create new sources of disruption, and IT often struggles to understand what employees are experiencing until problems escalate into complaints, incidents, or support requests.

Read Post

Nexthink

Read more about The Frictionless Workplace Isn't What You Think It Is: Beyond the Ticket

Bug Us: LIVE AMA

Jun 29, 2026 By Sentry In Sentry

Try Sentry for free: https://sentry.io/welcome
Docs: https://docs.sentry.io

View Video

Sentry

Read more about Bug Us: LIVE AMA

Logz.io Webinar Recap: A Four-Step Blueprint for Faster Root Cause Analysis

Jun 29, 2026 By Libi Michelson In logz.io

Incident investigations take so long not because the fix is hard, but because finding the right fix is. Most engineers spend 20 to 60 minutes just understanding what’s wrong before they can act, not fixing anything, just trying to see the full picture. The framework that changes this has four steps: Orient, Isolate, Hypothesize, and Verify, and the order matters more than the tools.

Read Post

logz.io

Read more about Logz.io Webinar Recap: A Four-Step Blueprint for Faster Root Cause Analysis

When World Cup Traffic Spikes in Mexico, Can You See Where the Internet Breaks?

Jun 29, 2026 By LogicMonitor In LogicMonitor

The World Cup is already proving how quickly digital demand can concentrate across Mexico’s networks, making internet path visibility critical for teams responsible for reliable user experiences. The 2026 FIFA World Cup is already testing Mexico’s networks. Mexico’s June 11 opening match against South Africa drew 7.1 million viewers for an English-language U.S. broadcast and peaked at 9.1 million viewers. That kind of demand puts real pressure on the systems behind digital experiences.

Read Post

LogicMonitor

Read more about When World Cup Traffic Spikes in Mexico, Can You See Where the Internet Breaks?

Sentry + GitHub Copilot Agents

Jun 29, 2026 By Sentry In Sentry

Seer, Sentry's AI debugger, analyzes your issues and finds the root cause. Now you can pass that analysis directly to a GitHub Copilot agent, which picks up the context, generates a fix, and opens a pull request. The agent session and PR both live on GitHub, with a link back in Sentry for easy access. This video walks through how the integration works and how to set it up in just a couple of steps.

View Video

Sentry

Read more about Sentry + GitHub Copilot Agents

Monitoring your Shopify Store

Jun 29, 2026 By Uptime Website Monitoring In uptime

Learn how to monitor your Shopify store using the HTTPS and Transaction checks.

View Video

uptime

Read more about Monitoring your Shopify Store

What's New in Scout Monitoring: June 2026

Jun 29, 2026 By Aspen Clevenger In Scout

June was about finishing touches. The fun part. Node.js support, which we previewed in May, is live. Anomaly detection graduated with a rebuilt algorithm, per-monitor controls, and access from the API, CLI, and MCP server. We also kept pulling on the same thread from recent months: Scout data should be reachable from wherever you actually work. The MCP server now covers historical insights, anomaly events, and 30-day metrics. Discord is a notification channel. The CLI has scout anomalies.

Read Post

Scout

Read more about What's New in Scout Monitoring: June 2026

Next.js already traces your requests. Here's how to export them with OpenTelemetry.

Jun 29, 2026 By Kyle Tryon In Sentry

Traces are a goldmine of information that can help you, or your AI, find slow pages and fix them. Next.js comes out of the box with support for tracing. Incoming requests, fetch() calls, middleware, and server-side rendering are all wired up and ready to send traces to any OpenTelemetry-compatible backend. The catch is, unless you configure an exporter, you’ll never see those traces.

Read Post

Sentry

Read more about Next.js already traces your requests. Here's how to export them with OpenTelemetry.

Why Observability Isn't Enough for AI Coding Agents

Jun 29, 2026 By Lightrun Team In Lightrun

Observability platforms collect pre-instrumented logs, metrics, and distributed traces to monitor production systems and surface failures to human engineers. The adoption of AI into engineering has led observability providers to offer those same signals to agents. This is often packaged as AI observability, but the signals themselves were designed around a human investigation loop. AI coding agents work faster, consume data differently, and need feedback as they work rather than after deployment.

Read Post

Lightrun

Read more about Why Observability Isn't Enough for AI Coding Agents

Teach Your AI Coding Agent to Answer Production Questions | Lightrun Ask Prod AI Skill

Jun 28, 2026 By Lightrun In Lightrun

Lightrun's Gidi Freud demonstrates Ask Prod, the latest Lightrun AI Skill that teaches AI coding agents how to use Lightrun to answer production questions with live runtime evidence. Watch Codex use the skill to discover runtime sources, collect focused runtime data, adapt its investigation, and return an evidence-backed answer. Compatible with Claude Code, Cursor, GitHub Copilot, and other AI coding agents through the Lightrun MCP.

View Video

Lightrun

Read more about Teach Your AI Coding Agent to Answer Production Questions | Lightrun Ask Prod AI Skill

Fleet Observability: Linux Edge Device Monitoring

Jun 28, 2026 By Netdata Team In netdata

It feels less like managing devices and more like remote babysitting. You check the dashboard, everything is green, and then a customer in the field tells you a device has been down for two days. At a handful of servers, the rare failure is an event.

Read Post

netdata

Read more about Fleet Observability: Linux Edge Device Monitoring

How to Prevent SEO Issues During Website Migrations

Jun 28, 2026 By OpsMatters In OpsMatters

Website migrations are often necessary as businesses grow, modernize their platforms, or rebrand. Whether you're changing domains, redesigning your website, switching content management systems, or moving to a new hosting environment, a migration can improve performance and user experience. However, without proper planning, it can also lead to a significant loss in search engine visibility, organic traffic, and revenue.

Read Post

OpsMatters

Read more about How to Prevent SEO Issues During Website Migrations

From query to action: Introducing SQL alerting in Cloud Monitoring Observability Analytics

Jun 27, 2026 By Joy Wang In Google Operations

Cloud Monitoring Observability Analytics lets you create alerts from (and get alerted about) analytical SQL queries of logs and traces.

Read Post

Google Operations

Read more about From query to action: Introducing SQL alerting in Cloud Monitoring Observability Analytics

Rethinking Public Sector Observability: From Infrastructure Health to Mission Continuity

Jun 26, 2026 By Teia Jensen In LogicMonitor

Public sector reliability is not a green dashboard. It’s whether people can complete the service when it matters.

Read Post

LogicMonitor

Read more about Rethinking Public Sector Observability: From Infrastructure Health to Mission Continuity

Reduce CDN log costs with searchable archives

Jun 26, 2026 By Rufina Mariam In Datadog

Engineering teams that manage high-volume log sources, such as content delivery network (CDN) edges, streaming platforms, and authentication systems, often have to make a difficult retention tradeoff. Indexing every event keeps logs searchable during investigations, audits, and postmortems, but it can make long-term retention expensive.

Read Post

Datadog

Read more about Reduce CDN log costs with searchable archives

Why Is Root Cause Analysis So Hard for IT Teams to Get Right?

Jun 26, 2026 By Motadata In Motadata

In this video, learn what Root Cause Analysis (RCA) is and why it's essential for preventing recurring IT incidents instead of repeatedly fixing the same symptoms. Discover how effective RCA helps IT teams identify the real source of problems, reduce downtime, and improve operational resilience.

View Video

Motadata

Read more about Why Is Root Cause Analysis So Hard for IT Teams to Get Right?

Runtime Aware PR Review: Validate Changes in Live Production

Jun 26, 2026 By Lightrun Team In Lightrun

Runtime PR review means validating a code change against live variable state, real execution paths, and downstream service behavior before the merge decision. Not after a checkout regression exposes what the diff missed. As AI coding agents ship PRs faster than any reviewer can mentally simulate execution, static analysis and CI leave a structural gap that only runtime evidence can close. This article explains what that gap looks like, why it recurs, and how to close it with runtime context code review.

Read Post

Lightrun

Read more about Runtime Aware PR Review: Validate Changes in Live Production

From Legacy to AI-Ops: Securing and Scaling Systems for 20M Device Requests with Datadog

Jun 26, 2026 By Datadog In Datadog

Modernizing a legacy system serving 20 million devices without users noticing is like replacing a jet engine mid-flight. In this session, YoungJin Jung and Donggen Hong from LG U+ share their 18-month journey transforming a Telco-scale API Gateway from a rigid, proprietary solution into a high-performance, open-source architecture on AWS, and the operational challenges they solved along the way.

View Video

Datadog

Read more about From Legacy to AI-Ops: Securing and Scaling Systems for 20M Device Requests with Datadog

Cloud Cost Optimization: 20 Strategies for Enterprises

Jun 26, 2026 By Ramya Shah In Motadata

Cloud cost optimization has become a critical priority in 2026. What starts as a manageable $5,000 monthly cloud bill can quickly grow to $50,000 within a few quarters, often without any major change in workload. If you lead an engineering or infrastructure team, this probably sounds familiar. You may have already seen costs rise faster than expected or struggled to explain sudden spikes in cloud spend. The challenge today goes beyond just rising numbers.

Read Post

Motadata

Read more about Cloud Cost Optimization: 20 Strategies for Enterprises

Fixing React Native Apps: Don't Let Bugs Crop Up

Jun 26, 2026 By Sentry In Sentry

Join us for a conversational workshop with Simon Grimm, creator of Galaxies.dev and solo developer behind Tiny Harvest, as he shares how he monitors and debugs a real, live React Native app in production.

View Video

Sentry

Read more about Fixing React Native Apps: Don't Let Bugs Crop Up

Ship Reliable AI Faster: How to Operate AI Agents with Control and Confidence

Jun 26, 2026 By Datadog In Datadog

Replace "AI shipped on hope" with an operating model that holds up once real users depend on it. AI quality is multi-dimensional, covering accuracy, tone, safety, and faithfulness to user data, and can't be debugged from outputs alone. Without visibility into what their AI actually did in production, teams miss regressions, reverse-engineer chains by hand, and watch a single bad answer erode trust built over hundreds of right ones.

View Video

Datadog

Read more about Ship Reliable AI Faster: How to Operate AI Agents with Control and Confidence

How VictoriaLogs Stores Your Logs in a Columnar Layout

Jun 26, 2026 By Phuong Le In VictoriaMetrics

If you run VictoriaLogs, your day-to-day comes down to three things: sending logs, querying them, and setting retention so the disk does not fill up. Everything else happens quietly on disk.

Read Post

VictoriaMetrics

Read more about How VictoriaLogs Stores Your Logs in a Columnar Layout

Icinga Director, vSphere Integration, and SSO: New Releases Available

Jun 26, 2026 By Ravi Srinivasa In Icinga

This release roundup brings together Icinga Web SSO v1.0.0, Icinga Director v1.11.9, and Icinga vSphere Integration v1.8.4. It introduces OpenID Connect single sign-on for Icinga Web and includes compatibility updates for the upcoming Icinga PHP Library 1.0.0 release.

Read Post

Icinga

Read more about Icinga Director, vSphere Integration, and SSO: New Releases Available

Sanctioned Isn't Secured: The AI Audit Logs Your SIEM Never Sees

Jun 26, 2026 By VirtualMetric In VirtualMetric

Your organization has approved AI platforms for development, data science, and productivity. Procurement signed off. Legal reviewed the terms. Employees are using them. The tools are sanctioned. What isn’t sanctioned is invisibility. The administrative layer of every AI platform in your environment — OpenAI, Amazon Bedrock, Google Gemini, Cursor, Databricks, Glean and others — generates security-relevant events that your SIEM has never seen.

Read Post

VirtualMetric

Read more about Sanctioned Isn't Secured: The AI Audit Logs Your SIEM Never Sees

How to collect telemetry from Claude Code and Codex

Jun 26, 2026 By Coralogix In Coralogix

#aicoding #softwareengineering #claudecode

View Video

Coralogix

Read more about How to collect telemetry from Claude Code and Codex

The AI Engineering Playbook: How to Evaluate & Iterate at Every Phase of Development

Jun 26, 2026 By Datadog In Datadog

AI coding tools are accelerating development velocity, creating a release challenge most teams aren’t equipped for. Without controlled rollout, higher change velocity makes it harder to know which specific release drove the results you’re seeing in production. And when teams use AI to build AI (LLM apps and AI agents), complexity multiplies. Traditional observability can’t ensure AI agent quality, performance, and cost-efficiency at production scale.

View Video

Datadog

Read more about The AI Engineering Playbook: How to Evaluate & Iterate at Every Phase of Development

Women In Tech Panel - Engineering with AI in the Stack

Jun 26, 2026 By Datadog In Datadog

Every team is doing something with AI right now. What that something is, is an entirely different question. And whether that something is successful? Most teams are still figuring it out as they go.

View Video

Datadog

Read more about Women In Tech Panel - Engineering with AI in the Stack

Grafana + Uptrace: Reuse Your Dashboards in Seconds

Jun 26, 2026 By Uptrace In Uptrace

In this tutorial you'll learn how to use Uptrace and Grafana together. Uptrace exposes a Prometheus-compatible HTTP endpoint, so you can add it as a data source in Grafana and reuse your existing dashboards without changing metric names or rewriting queries.

View Video

Uptrace

Read more about Grafana + Uptrace: Reuse Your Dashboards in Seconds

Introducing the StatusGator Confluence integration

Jun 25, 2026 By Valeria Kurolapova In StatusGator

We’re excited to announce the new StatusGator Confluence integration. When issues happen, teams need information fast. With the StatusGator Confluence integration, you can embed real-time service status directly into Confluence, making operational updates accessible alongside your team’s documentation and knowledge base.

Read Post

StatusGator

Read more about Introducing the StatusGator Confluence integration

Where did all my Claude Code tokens go?

Jun 25, 2026 By Annie Freeman In Coralogix

Most teams judge their AI coding agent on two things: the monthly bill and a feeling. The bill tells you what you spent and the feeling tells you whether it seems to be helping, but neither one tells you what the agent actually did. As these tools move into the critical path of how software ships, that gap is starting to matter. I wanted to replace the feeling with something I could measure and understand what shapes of work affect this bill, so I decided to run an experiment on myself.

Read Post

Coralogix

Read more about Where did all my Claude Code tokens go?

What's New in Network Observability for Summer 2026

Jun 25, 2026 By Sean Armstrong In Broadcom

As a network engineer, you likely face two persistent operational challenges every day: When you have to manually track device lifecycles on spreadsheets or spend your scheduled maintenance periods troubleshooting software upgrades, you lose the time you need to proactively ensure network performance. Over the past six months, we have continued to enhance Network Observability by Broadcom. These latest enhancements directly address the operational challenges outlined above.

Read Post

Broadcom

Read more about What's New in Network Observability for Summer 2026

Chart Your Team's Analytics Journey with Customizable Dashboards in DX NetOps

Jun 25, 2026 By Helen Burke In Broadcom

DX NetOps now features customizable dashboards that give all users some important new features and capabilities. In addition, with the solution’s new integration capabilities, DX NetOps enables users of current analytics and reporting tools to add standardized dashboards over time.

Read Post

Broadcom

Read more about Chart Your Team's Analytics Journey with Customizable Dashboards in DX NetOps

High Cardinality in ClickHouse at Scale: What Actually Breaks

Jun 25, 2026 By Prathamesh Sonpatki In Last9

ClickHouse swallows high-cardinality telemetry at ingest, then breaks at query time weeks later. Here is what fails, and how we keep it fast in production. Prathamesh works as an evangelist at Last9, runs SRE stories - where SRE and DevOps folks share their stories, and maintains o11y.wiki - a glossary of all terms related to observability.

Read Post

Last9

Read more about High Cardinality in ClickHouse at Scale: What Actually Breaks

Designing the Operational Architecture for Continuous SLA Exposure Governance

Jun 25, 2026 By ScienceLogic In ScienceLogic

Organizations seeking to reduce SLA volatility often attempt incremental enhancements to existing monitoring stacks. While additional analytics layers may improve telemetry visibility, exposure governance cannot function effectively when data, service context, and execution capabilities remain fragmented. Treating exposure management as an add-on capability limits its ability to protect across interdependent systems in real time.

Read Post

ScienceLogic

Read more about Designing the Operational Architecture for Continuous SLA Exposure Governance

How we saved over $3 million in idle compute costs with Datadog Kubernetes Autoscaling

Jun 25, 2026 By Jacob Simonov In Datadog

At Datadog, our broad Kubernetes footprint amplifies the significance of a familiar autoscaling tradeoff: Overprovisioning wastes cloud spend, while underprovisioning threatens reliability. We built Datadog Kubernetes Autoscaling (DKA) to help teams rightsize their workloads by generating intelligent resource recommendations and automating multidimensional workload scaling. Across Datadog, adopting DKA has eliminated more than $3 million in annualized idle compute costs while reducing reliability risks.

Read Post

Datadog

Read more about How we saved over $3 million in idle compute costs with Datadog Kubernetes Autoscaling

What is AIOps? Benefits, Use Cases, and How It Transforms IT Operations

Jun 25, 2026 By Venkat Narayanan In eG Innovations

Decades ago, IT operations was relatively simple, involving a few components such as clients, servers, and networks in static environments. IT teams relied on manual analysis to manage these systems. Over time, however, IT operations has evolved significantly, driving the adoption of AIOps technologies.

Read Post

eG Innovations

Read more about What is AIOps? Benefits, Use Cases, and How It Transforms IT Operations

The End of Self-Service IT as We Know It

Jun 25, 2026 By Chanté Frazer In Nexthink

The modern service desk is not short on entry points. In fact, employees can open a portal, search a knowledge base, start a chatbot conversation, or submit a ticket from almost anywhere. In theory, that should mean fewer queues and faster resolution. But if access to IT has improved so dramatically, why has the operational burden behind each interaction barely moved?

Read Post

Nexthink

Read more about The End of Self-Service IT as We Know It

Getting started with Microsoft Defender dashboards

Jun 25, 2026 By Blog In Squared Up

Microsoft Defender does a great job protecting you and your organization from online threats. It is constantly working to detect and collect security data so you don’t have to worry about falling behind on incidents and vulnerabilities. The Defender portal can also provide great insights into that data, but connecting it to the rest of your stack is difficult.

Read Post

Squared Up

Read more about Getting started with Microsoft Defender dashboards

Full Stack Observability vs Monitoring: Key Differences

Jun 25, 2026 By Chandni Verma In eG Innovations

Traditional monitoring tracks system health by collecting data such as metrics and logs. This data is checked to see whether a system is behaving as expected, and alerts are raised if errors or anomalous values are found. This works well in stable, predictable environments, but modern IT systems are far more complex and dynamic. In distributed architectures like microservices and cloud-native platforms, predefined alerts usually aren’t enough to explain why a failure is happening.

Read Post

eG Innovations

Read more about Full Stack Observability vs Monitoring: Key Differences

Help Desk or Service Desk: Which Does Your Business Need?

Jun 25, 2026 By Motadata In Motadata

In this video, learn the key differences between a Help Desk and a Service Desk and why choosing the right approach can significantly impact the growth and efficiency of your IT support operations. Discover when a help desk is enough, when a service desk becomes essential, and how modern IT teams can scale support effectively.

View Video

Motadata

Read more about Help Desk or Service Desk: Which Does Your Business Need?

Why correlation is no longer sufficient in an AI-driven operating model?

Jun 25, 2026 By Virtana In Virtana

If your incident response still depends on engineers stitching alerts together… You don’t have observability. You just have dashboards. Correlation has to be automatic, contextual, and system-aware, or it becomes expensive noise.

View Video

Virtana

Read more about Why correlation is no longer sufficient in an AI-driven operating model?

Sentry 201: Integrations & Sampling That Won't Surprise You

Jun 25, 2026 By Sentry In Sentry

You set up Sentry. Errors are flowing in. You'd call it instrumented. We'd call it a start. When something breaks, you should know what failed, which commit caused it, who owns it, and what the user saw — without jumping between tools or getting surprised by your bill. Most teams are closer to that than they think. This is the 201 session for teams who got Sentry firing events and stopped there.

View Video

Sentry

Monitoring

Read more about Sentry 201: Integrations & Sampling That Won't Surprise You

No Baaaa-d Data: A Hoppy Hour of Discovery with Cribl

Jun 25, 2026 By Cribl In Cribl

This is the "Cribl: The Good Bits" version of a webinar we gave recently, which combined the fermented educational joy of a beer tasting led by Gabe Callahan with the only-slightly-less-intoxicating demos of the Cribl platform, led by Principal Technical Marketing Engineer Leon Adato.

View Video

Cribl

Read more about No Baaaa-d Data: A Hoppy Hour of Discovery with Cribl

Overview of AI Evaluation (The Context Window #05)

Jun 25, 2026 By Grafana In Grafana

Can you actually trust an AI agent? In this pre-recorded episode of The Context Window, Nicole van der Hoeven sits down with Yas Ekinci, an engineer on the Grafana AI team, to talk about evals: how Grafana measures the quality and reliability of the AI it ships. They get into the difference between online and offline evals, why reviewing AI-generated code has become the real bottleneck, the "final answer problem" of plausible-but-wrong outputs, and o11y-bench, Grafana's open benchmark for observability agents.

View Video

Grafana

Read more about Overview of AI Evaluation (The Context Window #05)

Monitor metrics now available in the v3 API

Jun 24, 2026 By Colin Bartlett In StatusGator

Monitor metrics are now available through the StatusGator v3 API for both Website Monitors and Ping Monitors. These endpoints provide the same latency and performance data available in the Monitor Metrics tab, making it accessible through the API and MCP server. You can find the endpoints in the API documentation.

Read Post

StatusGator

Read more about Monitor metrics now available in the v3 API

Achieving sovereign and secure AIOps with Ollama and OpManager

Jun 24, 2026 By Visakh P S In ManageEngine

Enterprise IT networks power business operations across the world. As businesses scale to catch up with an increasingly demanding user base, networks also grow more complex. IT teams managing these networks have to monitor more data than before, under more stringent SLA terms, with little room for failure. Trying to do this manually across thousands of devices can take a lot of time and effort, and is prone to errors.

Read Post

ManageEngine

Read more about Achieving sovereign and secure AIOps with Ollama and OpManager

Microsoft SCOM Reporting Deep-Dive

Jun 24, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

Join us for a practical session on SCOM Reporting and reporting alternatives.

Read Post

NiCE IT Mgmt

Read more about Microsoft SCOM Reporting Deep-Dive

June 24 Global Shopify outage: Timeline and impact

Jun 24, 2026 By Andy Libby In StatusGator

On June 24, 2026, Shopify experienced a widespread service disruption that affected storefronts, admin dashboards, and merchant access across multiple regions. While the outage did not impact every user, reports quickly surfaced from merchants around the world who were unable to access stores, log in to administrative tools, or complete routine operations.

Read Post

StatusGator

Read more about June 24 Global Shopify outage: Timeline and impact

Replacing Your Legacy Monitoring Platform? Start with a Plan.

Jun 24, 2026 By Marie Ashway In Galileo

Whether you're using SolarWinds, PRTG, Datadog, or another long-standing monitoring solution, chances are your environment has evolved significantly since the platform was first deployed. New applications have been added. Infrastructure has expanded into cloud environments. Teams have developed custom dashboards, reports, alerts, and workflows. Over time, monitoring becomes deeply woven into daily operations. That's why many organizations continue using tools that no longer meet their needs.

Read Post

Galileo

Read more about Replacing Your Legacy Monitoring Platform? Start with a Plan.

The Four Pillars of AI Observability in 90 Seconds

Jun 24, 2026 By Splunk In Splunk

AI applications can behave unpredictably, potentially leading to errors such as hallucinations or data leaks, even when classic monitoring indicates a successful response. To monitor AI systems effectively, teams should focus on four key areas. Implementing these pillars can enhance trust in AI deployments, help manage costs, and identify safety issues before they impact users.

View Video

Splunk

Read more about The Four Pillars of AI Observability in 90 Seconds

What's New in Grafana 13.1: Git Sync - Easier Imports, Easier READMEs & Signing

Jun 24, 2026 By Grafana In Grafana

Grafana 13.1 cuts down on GitOps pain by making Git Sync stronger — import dashboards to Git with a click, see your repo READMEs inside Grafana, and sign every commit for security-strict teams.

View Video

Grafana

Read more about What's New in Grafana 13.1: Git Sync - Easier Imports, Easier READMEs & Signing

How Git Worktrees Changed My Development Workflow

Jun 24, 2026 By Johannes Rauh In Icinga

Since I started using Claude Code more frequently, I kept noticing a “worktree” checkbox popping up whenever I started a session in a Git repository. I had no idea what it meant, so I did what any curious developer would do and started digging. What I found was a Git feature I somehow never came across before: git worktrees.

Read Post

Icinga

Read more about How Git Worktrees Changed My Development Workflow

Telegraf Enterprise Now Generally Available: Manage Telegraf Fleets at Scale

Jun 24, 2026 By Scott Anderson In InfluxData

Telegraf Enterprise is now generally available. It combines Telegraf Controller, a centralized management console for Telegraf, with official support from InfluxData. Open source Telegraf remains unchanged. Telegraf Controller is free to start with built-in limits, while a Telegraf Enterprise license unlocks higher-scale limits, audit logging, LDAP/OIDC integration, and commercial support. Telegraf has become the standard for collecting telemetry across cloud, edge, and physical infrastructure.

Read Post

InfluxData

Read more about Telegraf Enterprise Now Generally Available: Manage Telegraf Fleets at Scale

How Grafana Cloud Ingests Your Data | Data Sources, Alloy & OTel Explained

Jun 24, 2026 By Grafana In Grafana

Learn the two main ways to get data into Grafana Cloud. In this video, we break down how Grafana Cloud connects to over 150 external data sources (like Salesforce, Postgres, and CloudWatch) where your data stays in place, and how you can send raw telemetry into Grafana’s fully managed databases for logs, metrics, traces, and profiles.

View Video

Grafana

Read more about How Grafana Cloud Ingests Your Data | Data Sources, Alloy & OTel Explained

On Release Days We Wear Teal ep04

Jun 24, 2026 By Cribl In Cribl

In this episode, Leon explores some of the new features, functions, updates, and improvements in release 4.18 (from last month) and 4.18.2.

View Video

Cribl

Read more about On Release Days We Wear Teal ep04

Real Time Network Monitoring: Topology, NetFlow, SNMP

Jun 24, 2026 By Shyam Sreevalsan In netdata

Interface counters tell you a port is busy. Bytes in, bytes out, errors, drops. That’s enough to know a link is saturated, but not enough to know which conversations are saturating it, which devices are involved, or how a problem propagates across your network. For that you’ve traditionally needed dedicated network performance monitoring tools, usually expensive, usually a separate console from the rest of your monitoring.

Read Post

netdata

Read more about Real Time Network Monitoring: Topology, NetFlow, SNMP

How Does SNMP Keep an Eye on Every Device on Your Network?

Jun 24, 2026 By Motadata In Motadata

In this video, learn what SNMP (Simple Network Management Protocol) is and why it remains one of the most important technologies for network monitoring. Discover how SNMP helps IT teams collect device health metrics, receive real-time alerts, and monitor thousands of network devices from a single platform.

View Video

Motadata

Read more about How Does SNMP Keep an Eye on Every Device on Your Network?

AI coding tools ship surprisingly rich telemetry

Jun 24, 2026 By VictoriaMetrics In VictoriaMetrics

AI coding assistants are already emitting rich OpenTelemetry data, revealing prompts, tool usage, workflows, and developer behavior in real time.

View Video

VictoriaMetrics

Read more about AI coding tools ship surprisingly rich telemetry

Grafana 13.1 release: observability as code updates, extending Grafana Assistant across more data sources, and more

Jun 24, 2026 By Grafana Labs Team In Grafana

Earlier this year, Grafana 13 laid the groundwork for making it easier and faster than ever to turn your data into actionable insights. With our latest minor release, Grafana 13.1, we're building on that foundation, expanding observability as code, bringing Grafana Assistant to more data sources, and streamlining the everyday workflows teams rely on to visualize, analyze, and act on their data. Below are just some of the highlights from Grafana 13.1.

Read Post

Grafana

Read more about Grafana 13.1 release: observability as code updates, extending Grafana Assistant across more data sources, and more

10 Best ITSM Tools in 2026 [Reviewed and Compared]

Jun 24, 2026 By Jagdish Sajnani In Motadata

How do you choose the best ITSM tool for your team when 20 vendors all promise the same three things: native AI, ITIL alignment, and a single system to run your whole IT operation? It is the question we hear most from IT managers and service desk leads, and the cost of getting it wrong is high. An ITSM platform is a multi-year commitment that your team works inside every day, so a poor fit shows fast as slow tickets, manual workarounds, and a migration nobody wants to repeat.

Read Post

Motadata

Read more about 10 Best ITSM Tools in 2026 [Reviewed and Compared]

Uptime report endpoint added to v3 API

Jun 23, 2026 By Colin Bartlett In StatusGator

Uptime reports are now available through the StatusGator v3 API. This new endpoint provides the same uptime report data available in the StatusGator UI, making it easy to access uptime statistics programmatically through the API and MCP server. You can find the endpoint in the API documentation.

Read Post

StatusGator

Read more about Uptime report endpoint added to v3 API

POPIA Compliance: What It Requires and How Motadata Supports It

Jun 23, 2026 By Jagdish Sajnani In Motadata

If your organization handles the personal information of people in South Africa, POPIA compliance is not optional. The Protection of Personal Information Act has been fully enforceable since 1 July 2021, and the Information Regulator now backs it with administrative fines of up to ZAR 10 million. The requirement your IT and security teams own most directly is security safeguards under Section 19, and it is the first place a regulator looks after a breach.

Read Post

Motadata

Read more about POPIA Compliance: What It Requires and How Motadata Supports It

How High-Performance IT Organizations Prevent SLA Exposure Before It Becomes a Customer Disruption

Jun 23, 2026 By ScienceLogic In ScienceLogic

Over the past decade, significant progress has been made in incident detection and response across enterprise IT environments. Observability platforms, event correlation engines, and AIOps capabilities have measurably reduced mean time to detection and mean time to resolution. Operational teams are better equipped to identify anomalies, triage alerts, and coordinate remediation across increasingly complex architectures.

Read Post

ScienceLogic

Read more about How High-Performance IT Organizations Prevent SLA Exposure Before It Becomes a Customer Disruption

Observability on Windows, before eBPF is production-ready

Jun 23, 2026 By Nikolay Sivko In Coroot

No large enterprise runs a single stack. A shiny new Kubernetes cluster sits right next to a Windows Server box that has quietly run the billing system for a decade without missing a beat. Both keep the business running. Both deserve the same visibility. Linux runs most server workloads, and Coroot grew up there. Our open-source node-agent uses eBPF to collect metrics, logs, traces, and profiles, with no code changes. But "most" is not "all".

Read Post

Coroot

Read more about Observability on Windows, before eBPF is production-ready

How to migrate feature flags without breaking production

Jun 23, 2026 By Anthony Rindone In Datadog

Feature flag migrations have a reputation problem. Ask anybody who’s been through one before and you’ll hear the stories, usually from someone still a little frustrated about a bad cutover, with a postmortem or two to show for it. The reputation is mostly undeserved. While the risks are real, they’re well understood and easily controlled. Getting a migration right doesn’t require a big coordinated effort.

Read Post

Datadog

Read more about How to migrate feature flags without breaking production

Who's in Charge? The 4 Key Pillars of AI Governance in 2026

Jun 22, 2026 By Omar Hafiz In ManageEngine

You hire an astute, hard-working, fresh graduate to run things for you. You hand them the keys to everything in your company; that includes every system, every endpoint, every file, and every password, all of it. Your only instruction to them? "Go ahead and improve things!" Then, trusting in their competence, you leave them to it. Doesn't that sound like a recipe for disaster? Yet that's precisely what's happening in IT departments across the world.

Read Post

ManageEngine

Read more about Who's in Charge? The 4 Key Pillars of AI Governance in 2026

Sponsored Post

Why Custom Management Packs Really Matter in Microsoft SCOM

Jun 22, 2026 By NiCE IT Mgmt In NiCE IT Mgmt

Monitor precisely. Report confidently.

Read Post

NiCE IT Mgmt

Read more about Why Custom Management Packs Really Matter in Microsoft SCOM

How network change management could've prevented a costly switch misconfiguration

Jun 22, 2026 By akash.mj In ManageEngine

Unplanned outages often trace back to a simple but overlooked cause: an untracked configuration change. In many organizations, network device configurations are updated manually without approvals, documentation, or rollback plans. This lack of structure can lead to performance issues, downtime, and compliance risks. In this blog, we'll see how a core switch misconfiguration exposed the risks of unmanaged changes.

Read Post

ManageEngine

Read more about How network change management could've prevented a costly switch misconfiguration

The New Shape of Engineering

Jun 22, 2026 By Datadog In Datadog

AI’s ability to write code made huge strides over the past year. Today, coding agents aren’t just assisting developers; they are winning the "coding race" by orders of magnitude and fundamentally changing the way engineers work.

View Video

Datadog

Read more about The New Shape of Engineering

Fireside Chat with Datadog CPO Yanbing Li and Vercel CPO Tom Occhino

Jun 22, 2026 By Datadog In Datadog

The way we build, ship, and run software is being reshaped by AI. In this fireside chat, Yanbing Li (CPO, Datadog) and Tom Occhino (CPO, Vercel) will discuss their perspectives on the impact AI is having across the industry and what it means for teams navigating this shift today.

View Video

Datadog

Read more about Fireside Chat with Datadog CPO Yanbing Li and Vercel CPO Tom Occhino

How to observe AI coding agents like prod

Jun 22, 2026 By Coralogix In Coralogix

#observability #claudecode #podcastclips

View Video

Coralogix

Read more about How to observe AI coding agents like prod

How AI Is Transforming Production Issue Investigation for Modern DevOps Teams?

Jun 22, 2026 By Mohana Ayeswariya J In Atatus

Production failures don't announce themselves cleanly. They arrive at 2 AM, buried inside 40 million log lines, spread across a dozen microservices, and disguised as something that looks entirely unrelated to the actual root cause. For years, engineering teams absorbed this pain through process: runbooks, on-call rotations, dashboards, and a deep institutional knowledge that lived in the heads of their most senior engineers.

Read Post

Atatus

Read more about How AI Is Transforming Production Issue Investigation for Modern DevOps Teams?

How Coding Agents are Changing the Traditional Software Development Lifecycle

Jun 22, 2026 By Datadog In Datadog

AI coding assistants are rapidly evolving from passive copilots into active, agentic collaborators capable of planning, executing, and iterating on complex software tasks. This shift has huge ramifications on the software development lifecycle (SDLC), developer productivity, and even the structure of engineering teams.

View Video

Datadog

Read more about How Coding Agents are Changing the Traditional Software Development Lifecycle

What is Compliance in ITAM? Regulations, Penalties & Best Practices

Jun 22, 2026 By Ramya Shah In Motadata

Managing IT assets smoothly is not an easy task. Organizations depend more on technology to execute their operations these days. Hence, the requirement for effective IT Asset Management (ITAM) has grown considerably. However, beyond merely managing these assets, ensuring compliance with relevant ITAM regulations and standards matters just as much. And, in this race to keep up with changing regulations, you are not alone. Many organizations face the same challenge.

Read Post

Motadata

Read more about What is Compliance in ITAM? Regulations, Penalties & Best Practices

Progressing AI Beyond Scaling and Into Deep Reasoning

Jun 22, 2026 By Datadog In Datadog

The breakthroughs in AI today aren’t just coming from bigger datasets and more compute; Reinforcement Learning (RL) has quietly become one of the most powerful forces in modern AI development. RL is teaching models to reason and self-correct, enabling capabilities that make AGI feel less like science fiction and more like an inevitable future.

View Video

Datadog

Read more about Progressing AI Beyond Scaling and Into Deep Reasoning

Who's Driving Your Data? How to Regain Control of Your Apache Kafka Infrastructure

Jun 22, 2026 By Harshita Kulshrestha In meshIQ

Apache Kafka often succeeds faster than operational maturity can keep pace. Consumer lag, partition drift, and configuration sprawl create dangerous blind spots. Learn how unified visibility, governance, and automation transform reactive Kafka operations into predictive control.

Read Post

meshIQ

Read more about Who's Driving Your Data? How to Regain Control of Your Apache Kafka Infrastructure

Platform Confidence Is the Prerequisite for Modernization Speed

Jun 22, 2026 By ScienceLogic In ScienceLogic

Over the last year, one theme has consistently emerged in conversations with customers: organizations want to move faster, but not at the cost of the operational stability their business depends on. Whether the discussion is about modernization initiatives, automation programs, AI adoption, or platform upgrades, the underlying challenge is often the same. IT leaders are under pressure to deliver innovation while maintaining stability.

Read Post

ScienceLogic

Read more about Platform Confidence Is the Prerequisite for Modernization Speed

Observability Self Hosted 2026.2 | Routing Summary Dashboard

Jun 22, 2026 By solarwindsinc In SolarWinds

Connect with SolarWinds.

View Video

SolarWinds

Read more about Observability Self Hosted 2026.2 | Routing Summary Dashboard

The AI bill arrived. Now what?

Jun 22, 2026 By Lily Waldorf In Coralogix

There was a time when “Opus” meant a classical composition and “Sonnet” was fourteen lines of Shakespeare you definitely did not read before the test. Now they’re model tiers, and every new release rewrites the economics of your engineering org whether you’re ready or not. Currently, your monthly total hides the crucial information you need to control and justify AI spend.

Read Post

Coralogix

Read more about The AI bill arrived. Now what?

Builder in the loop: Tony Rogers on stress-testing AURA before production

Jun 22, 2026 By Mezmo In Mezmo

Builder in the loop is a Mezmo interview series focused on the engineers, product leaders, and operators shaping AURA, an open-source, MCP-native agent harness for production operations. This installment features Tony Rogers, whose work on AURA is less about building new features and more about trying to break them before users can.

Read Post

Mezmo

Read more about Builder in the loop: Tony Rogers on stress-testing AURA before production

Using Evaluation Frameworks with Agent Observability

Jun 22, 2026 By Jennifer Mickel In Datadog

AI teams have invested heavily in evaluation frameworks, yet getting those frameworks beyond local experimentation remains challenging. Teams using open source libraries like DeepEval and Pydantic Evals gain flexibility and research-grounded metrics, but operationalizing those evaluations still requires brittle custom integration code that doesn’t scale.

Read Post

Datadog

Read more about Using Evaluation Frameworks with Agent Observability

What Is a CMDB, and Why Is It Called the Heart of ITSM?

Jun 22, 2026 By Motadata In Motadata

In this video, discover why a Configuration Management Database (CMDB) is considered the heart of IT Service Management (ITSM). Learn how a CMDB helps IT teams understand dependencies, assess change impact, accelerate incident resolution, and build a reliable foundation for service management processes.

View Video

Motadata

Read more about What Is a CMDB, and Why Is It Called the Heart of ITSM?

Sponsored Post

Reducing support ticket burden with better outage visibility

Jun 21, 2026 By StatusGator In StatusGator

Reducing support ticket burden with better outage visibility requires a shift from reactive support to proactive communication and centralized monitoring. Here's how StatusGator helps.

Read Post

StatusGator

Read more about Reducing support ticket burden with better outage visibility

Which AI-Powered Observability Tools Accelerate Root Cause Analysis (RCA)?

Jun 21, 2026 By Libi Michelson In logz.io

TL;DR Choosing the right AI-powered observability platform isn’t about who has the most AI features. It’s about which platform helps your team identify root causes faster and spend less time investigating incidents. Here’s the short version: Logz.io + OrionIQ: Autonomous AI agents investigate incidents, perform root cause analysis, and surface next steps. Open standards, Kubernetes-ready, and deploys in as little as a week.

Read Post

logz.io

Read more about Which AI-Powered Observability Tools Accelerate Root Cause Analysis (RCA)?

Vendor Outage Monitoring for MSPs: Per-Client Status Pages and Custom Dashboards

Jun 21, 2026 By Hrishikesh Barua In IncidentHub

Handling client calls when a third-party vendor has an outage - this will sound familiar if you are a managed service provider (MSP). Your first instinct would be to check if the vendor's status page or social media handle shows anything, or check crowdsourced websites like Downdetector. Or even ask your client to check themselves. These approaches do not scale when you have more than a few clients, many vendor status pages to check, and clients with different stacks.

Read Post

IncidentHub

Read more about Vendor Outage Monitoring for MSPs: Per-Client Status Pages and Custom Dashboards

Creating & Scheduling SLA Reports

Jun 20, 2026 By Uptime Website Monitoring In uptime

Learn how to create and manage SLA Reports on Uptime.com, including configuring the Global and Reporting Groups tabs, setting up Uptime and Response Time sections, customizing report logos, and scheduling automated delivery to internal users and external stakeholders.

View Video

uptime

Monitoring

Read more about Creating & Scheduling SLA Reports

Automate website monitoring using Terraform

Jun 19, 2026 By Bela Susan Thomas In Site24x7

If you've ever joined an incident call only to discover the monitor for the affected service was either misconfigured, deleted, or never set up in the first place—you already understand why monitoring as code (MaC) exists.

Read Post

Site24x7

Read more about Automate website monitoring using Terraform

Monitoring vs. observability: The future of IT operations in 2026

Jun 19, 2026 By Kaviya Radhakrishnan In ManageEngine

For years, monitoring was the gold standard of infrastructure management. Dashboards. Thresholds. Alerts. If everything on the dashboard was green, you didn't need to worry. If something turned red, you responded. It was a model built on predictability, and for a long time, it worked. But modern infrastructure is no longer predictable.

Read Post

ManageEngine

Read more about Monitoring vs. observability: The future of IT operations in 2026

StatusGator is now available in SharePoint

Jun 19, 2026 By Valeria Kurolapova In StatusGator

We’re excited to announce the new StatusGator SharePoint integration. Many organizations use SharePoint as the central hub for company resources, communications, and internal tools. Now, you can add real-time service status directly to your SharePoint pages, helping employees stay informed about outages, maintenance, and service disruptions without leaving the platforms they already use every day.

Read Post

StatusGator

Read more about StatusGator is now available in SharePoint

Why Relational Databases Fail Satellite Telemetry

Jun 19, 2026 By Allyson Boate In InfluxData

Satellite operations depend on telemetry as the primary interface to systems that teams cannot directly inspect. Once a spacecraft reaches orbit, signals such as battery levels, temperature, signal strength, and fault codes become the foundation for understanding system health and maintaining control. Telemetry streams continuously, so the underlying data system becomes a critical control point that needs to handle a constant, heavy flow of data.

Read Post

InfluxData

Read more about Why Relational Databases Fail Satellite Telemetry

Is Your AI Coding Agent Actually Boosting Productivity

Jun 19, 2026 By Coralogix In Coralogix

#aiagents #claudecode #podcastclips

View Video

Coralogix

Read more about Is Your AI Coding Agent Actually Boosting Productivity

DataStream 2.0: Faster, Smarter, Built for Scale

Jun 19, 2026 By VirtualMetric In VirtualMetric

This is not a regular monthly update. DataStream Version 2.0 is a milestone: the result of relentless building, learning from customers, and pushing the platform toward what enterprise-scale security operations actually demand. The core has been rebuilt, new capabilities have been added across the board, and the platform is now faster, more resilient, and more extensible than ever. Here’s what’s new.

Read Post

VirtualMetric

Read more about DataStream 2.0: Faster, Smarter, Built for Scale

Digital Employee Experience Monitoring: Why It Matters for Hybrid Workforces

Jun 19, 2026 By Rachel Berry In eG Innovations

As enterprises embrace hybrid work models, SaaS-driven technology stacks, and highly distributed digital workplaces, employee experience has become inseparable from business performance. For years, IT investments focused on customer-facing digital journeys, and internal systems were not a priority. However, the scenario has changed. Today, every employee relies on a complex and interdependent chain of endpoints, networks, cloud services, identity platforms, and business applications.

Read Post

eG Innovations

Read more about Digital Employee Experience Monitoring: Why It Matters for Hybrid Workforces

George Luong Shares How 200 Slack Engineers Use Honeycomb

Jun 19, 2026 By Honeycomb In Honeycomb

At Slack, between 100 and 200 users per day use Honeycomb for client observability, tracing, instrumentation, analysis of performance, frontend issues, investigating incidents, or just looking into production issues.

View Video

Honeycomb

Read more about George Luong Shares How 200 Slack Engineers Use Honeycomb

How AI-Powered Monitoring is Transforming IT Operations

Jun 19, 2026 By Venkat Narayanan In eG Innovations

Every monitoring vendor on the market now has an AI story. AIOps has moved from category buzzword to standard line item in IT operations strategy, and the reasoning is sound: as infrastructure spreads across cloud, hybrid, microservices, and virtualized platforms, the volume and velocity of operational data have outrun what human teams can process. AI-powered monitoring is the obvious answer.

Read Post

eG Innovations

Read more about How AI-Powered Monitoring is Transforming IT Operations

Integrating Digital Employee Experience (DEX) with ServiceNow: What IT Teams Need to Know

Jun 19, 2026 By Teneo In Teneo

As CTO for Teneo, I get the opportunity to meet with many of our customers to talk about plans for the next few years. I often find we spend a lot of time talking about Digital Employee Experience, but far less time is spent fixing the operational friction that quietly erodes it. Slow devices, degraded application performance, and recurring service desk tickets are common themes in many organizations.

Read Post

Teneo

Read more about Integrating Digital Employee Experience (DEX) with ServiceNow: What IT Teams Need to Know

Introducing the New Galileo Website: A Better Resource for IT Visibility, Optimization, and Planning

Jun 18, 2026 By Marie Ashway In Galileo

That's why we've launched a completely redesigned Galileo website. The new site isn't just a fresh look but rather a reflection of our commitment to helping IT teams gain the visibility, insight, and guidance they need to manage modern infrastructure more effectively.

Read Post

Galileo

Read more about Introducing the New Galileo Website: A Better Resource for IT Visibility, Optimization, and Planning

Working as a remote engineer at Cribl | Building the AI Platform for Telemetry

Jun 18, 2026 By Cribl In Cribl

Learn what it’s like to work as an engineer at Cribl, a remote-first company building the AI platform for IT and security data. In this recruiting video, Cribl’s engineering and support leaders share how fully distributed teams collaborate, solve hard data problems, and grow their careers while working from around the world. You’ll hear from managers and leaders in site reliability engineering, security incubation, and technical support about.

View Video

Cribl

Read more about Working as a remote engineer at Cribl | Building the AI Platform for Telemetry

Multi Cloud Observability - Selector

Jun 18, 2026 By Selector In Selector

Unify cloud, network, and infrastructure telemetry into a single shared intelligence layer to get to root cause faster.

View Video

Selector

Read more about Multi Cloud Observability - Selector

KWhy? MSP Webinar

Jun 18, 2026 By Auvik In Auvik

Most MSPs are sitting on a goldmine of data across their tools. The problem isn’t access, it’s knowing what *actually* matters… and how to use it to drive better outcomes. Join Amanda Doucette-Lachapelle and Kyle Christensen (Empath) as they walk through how to use KPIs to make smarter, more confident decisions, with real examples you can apply right away.

View Video

Auvik

Read more about KWhy? MSP Webinar

The Second Edition of Observability Engineering Is Here

Jun 18, 2026 By Charity Majors In Honeycomb

IT’S HERE it’s here it’s here it’s here!!!! The second edition of Observability Engineering is available for download, and since Honeycomb is the sponsor, you can now download it from our website (the dead tree version will take another month). This is a strange time to be writing a book.

Read Post

Honeycomb

Read more about The Second Edition of Observability Engineering Is Here

Troubleshooting ActiveMQ Producer Flow Control Blocks

Jun 18, 2026 By meshIQ In meshIQ

The alert comes in at 2 AM: your order processing service is unresponsive. The application has not crashed, threads are running, and the JVM is healthy, but no messages are being sent. Your operations team traces it to a blocked send() call on an ActiveMQ connection. Hours later, after restarting the application, someone finds this line in the broker log from 11 PM the previous day.

Read Post

meshIQ

Read more about Troubleshooting ActiveMQ Producer Flow Control Blocks

Service Level Agreement (SLA) Templates: Examples, Metrics, and Best Practices

Jun 18, 2026 By Ramya Shah In Motadata

How quickly should your team resolve a critical ticket, and what are the consequences when it misses the target? That is exactly where Service Level Agreements (SLAs) come into play. An SLA turns service expectations into measurable commitments by defining clear response and resolution targets. Rather than starting from scratch, an SLA template provides a structured foundation for establishing those commitments and tracking performance against agreed standards. Why does that matter?

Read Post

Motadata

Read more about Service Level Agreement (SLA) Templates: Examples, Metrics, and Best Practices

5 Alternatives to Prometheus in 2026

Jun 18, 2026 By Dejan Lukić In AppSignal

Prometheus is a battle-tested, flexible and, most importantly, free tool that has long been the go-to open-source monitoring solution. Much of its popularity came down to its simplicity. A few years have gone by, though, and the APM space has gotten pretty crowded. Developers are now starting to move away from the complexity of self-hosting, and OpenTelemetry stands out as one of the CNCF’s fastest-expanding projects. In fact, it’s now among the most adopted telemetry frameworks out there.

Read Post

AppSignal

Read more about 5 Alternatives to Prometheus in 2026

The Data Plane Reality: OTel Scales, While Topology UX Lags

Jun 18, 2026 By Jonny Steiner In Coralogix

OpenTelemetry won the architectural standards battle. At scale, though, telemetry breaks more like plumbing than code. It breaks quietly, across a graph, with a blast radius you don’t understand until it’s expensive. With over 65% of organizations now running more than 10 collectors in production, hybrid deployments across Kubernetes and VMs are accelerating fast. Telemetry standardization is no longer a project milestone. It is a baseline expectation.

Read Post

Coralogix

Read more about The Data Plane Reality: OTel Scales, While Topology UX Lags

Elastic's no-code and full-code approaches to custom integrations

Jun 18, 2026 By Charles Davison In Elastic

Making custom Elastic integrations better with both Automatic Import improvements and Integration Skills. Elastic 9.4 shipped two tools for building custom integrations, and both are available today.

Read Post

Elastic

Read more about Elastic's no-code and full-code approaches to custom integrations

Agent Timeline Is Now Generally Available

Jun 18, 2026 By Dan Juengst In Honeycomb

A few weeks ago I wrote about a customer’s refund request that stopped halfway through at 11:47 p.m. on a Tuesday night. That post walked through the 40 minutes it took to work out what happened when an agentic application had a problem: a tool retried against a rate-limited payments API, the error responses filled up the context window, and the agent gave up. The whole reason we built Agent Timeline was to turn that 40 minutes into five. To reduce MTTR. To solve the problem and get back to sleep.

Read Post

Honeycomb

Read more about Agent Timeline Is Now Generally Available

3 Signs Your Network Monitoring Is Failing You

Jun 18, 2026 By Motadata In Motadata

Are users reporting issues before your monitoring tools do? Are critical alerts getting lost in the noise? Does root cause analysis take hours instead of minutes? These are 3 signs your network monitoring is failing. Discover how modern observability helps teams detect issues faster and resolve them with confidence.

View Video

Motadata

Read more about 3 Signs Your Network Monitoring Is Failing You

Everything New in Checkly CLI v8 (8 Features in 4 Minutes)

Jun 18, 2026 By Checkly In Checkly

We at Checkly shipped a major CLI v8 release. Here's everything new, and why it's built for your agent. The CLI v8 expands Monitoring as Code and streamlines your everyday workflows. Upgrade and try it: npm install checkly@latest

View Video

Checkly

Read more about Everything New in Checkly CLI v8 (8 Features in 4 Minutes)

Observability for a Privacy-first AI Wearable | Grafana Everywhere

Jun 18, 2026 By Grafana In Grafana

Trust is everything when AI gets personal. Golden Grot Award winner and NeoSapien co-founder and CEO Dhananjay Yadav shares how his team uses Grafana Assistant to ensure the privacy-first AI wearable delivers a seamless, reliable experience without compromising its mission. Because when AI moves closer to our everyday lives, teams need to know what’s happening — and users need to trust that it’s working as intended.

View Video

Grafana

Read more about Observability for a Privacy-first AI Wearable | Grafana Everywhere

Network Operations Stand Up

Jun 18, 2026 By Kentik In Kentik

Network problems are NO laughing matter.

View Video

Kentik

Read more about Network Operations Stand Up

Tapirs, Trainings, and Team Dinners: My First Kentik Meetup

Jun 18, 2026 By Gavin Hower In Kentik

Gavin joined Kentik’s People Ops team less than a year ago, so when April brought his first team offsite and his first HR conference in San Diego, it was a lot of firsts at once. He writes about meeting his colleagues face to face for the first time, what he took away from HRA 26, and his new appreciation for tapirs.

Read Post

Kentik

Read more about Tapirs, Trainings, and Team Dinners: My First Kentik Meetup

The Illusion of Control: Why Dashboards Do Not Equal SLA Protection

Jun 18, 2026 By ScienceLogic In ScienceLogic

Modern operations teams work within a constant stream of dashboards, status summaries, and health indicators that turn complex environments into organized visual displays. Large screens show color-coded service conditions. Executive reports quantify uptime. Observability platforms map system dependencies across cloud, hybrid, and distributed architectures. This visual structure creates a sense of order. In environments defined by constant change, that sense of order can feel like control.

Read Post

ScienceLogic

Read more about The Illusion of Control: Why Dashboards Do Not Equal SLA Protection

From Alerts to Action: How Agentic AI Will Transform ITOps

Jun 18, 2026 By ManageEngine

What if your IT systems could go beyond detecting issues to resolving them autonomously? This white paper explains how Agentic AI enables IT operations to shift from reactive monitoring to intelligent, self-driven execution. Explore use cases, challenges, and how observability data powers AI-driven actions.

Get White Paper

ManageEngine

Read more about From Alerts to Action: How Agentic AI Will Transform ITOps

From event correlation to autonomous IT: Why observability isn't enough anymore

Jun 17, 2026 By Sangavi D In ManageEngine

Most IT war rooms have plenty of data, but not enough time or clarity to find the real answer. Dashboards are crowded, alerts keep piling up, and the real issue gets lost in all the noise. Ever dealt with this situation? You’re not alone, and there’s a simpler way to deal with it. OpManager Nexus closes this gap by moving beyond visibility to help teams actually diagnose and fix problems faster.

Read Post

ManageEngine

Read more about From event correlation to autonomous IT: Why observability isn't enough anymore

Monitoring website that redirects to a different URL

Jun 17, 2026 By Bela Susan Thomas In Site24x7

Is it necessary to monitor a website that redirects to a different URL? Imagine a user visits a URL and is automatically redirected to a new main URL without taking any action. This process is called URL redirection. It typically occurs when a web server sends a 3xx HTTP status code and a Location header with the new URL. Sometimes there is only one redirect, but in other cases, the request passes through several URLs before reaching the final page.
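As a rough illustration of the redirect chain described above, here is a minimal Python sketch that walks a chain of 3xx responses until it reaches a final page. It is not taken from the post or from any Site24x7 tooling: the `RESPONSES` table, the `follow_redirects` function, and the example URLs are all hypothetical, and canned responses stand in for real HTTP requests.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical canned responses: URL -> (status code, Location header or None).
# A real monitor would issue HTTP requests instead of reading from a dict.
RESPONSES: Dict[str, Tuple[int, Optional[str]]] = {
    "http://example.com/old": (301, "https://example.com/old"),
    "https://example.com/old": (302, "https://example.com/new"),
    "https://example.com/new": (200, None),
}


def follow_redirects(
    url: str,
    responses: Dict[str, Tuple[int, Optional[str]]],
    max_hops: int = 10,
) -> List[Tuple[str, int]]:
    """Return every (url, status) pair visited, ending at the final page."""
    chain: List[Tuple[str, int]] = []
    seen = set()
    while True:
        if url in seen:
            raise RuntimeError(f"redirect loop at {url}")
        seen.add(url)
        status, location = responses[url]
        chain.append((url, status))
        # Anything other than a 3xx with a Location header ends the chain.
        if not (300 <= status < 400 and location):
            return chain
        # Refuse to follow more than max_hops redirects.
        if len(chain) > max_hops:
            raise RuntimeError("too many redirects")
        url = location


if __name__ == "__main__":
    for hop_url, hop_status in follow_redirects("http://example.com/old", RESPONSES):
        print(hop_status, hop_url)
```

Recording every hop rather than only the final URL is a deliberate choice in this sketch: a monitor that sees the whole chain can flag an unexpected extra redirect or a loop, not just a failing destination.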

Read Post

Site24x7

Read more about Monitoring website that redirects to a different URL

SolarWinds appoints Justin Henkel as Chief Information Security Officer

Jun 17, 2026 By SolarWinds In SolarWinds

London, UK - 17th June, 2026 - SolarWinds, a leading provider of simple, powerful, secure observability and IT management software, today announced the appointment of Justin Henkel as its Chief Information Security Officer.

Read Post

SolarWinds

Read more about SolarWinds appoints Justin Henkel as Chief Information Security Officer

9 Powerful Log Monitoring Best Practices to Follow in 2026

Jun 17, 2026 By Jagdish Sajnani In Motadata

How many of your last five incidents were already sitting in the logs before anyone noticed? Most teams already collect more than enough log data. The problem starts with what happens next, and the same four gaps show up almost everywhere. This guide covers the log monitoring best practices that close those gaps. It walks through how to collect, structure, correlate, retain, and secure logs, so monitoring becomes a steady process and not a scramble during the next incident.

Read Post

Motadata

Read more about 9 Powerful Log Monitoring Best Practices to Follow in 2026

Why Does Network Topology Decide How Fast Your Network Recovers?

Jun 17, 2026 By Motadata In Motadata

In this video, learn why network topology plays a critical role in network resilience, troubleshooting, and recovery. Discover how understanding network dependencies, eliminating single points of failure, and maintaining clear visibility can help IT teams reduce downtime and accelerate incident response. In this video, you'll learn.

View Video

Motadata

Read more about Why Does Network Topology Decide How Fast Your Network Recovers?

Features in Icinga Web 2 Worth Knowing About

Jun 17, 2026 By Jan Schuppik In Icinga

When you work closely with Icinga Web 2, developing modules, building dashboards, poking around the internals, you naturally pick up on features that most users never think about. Some are usability improvements that deserve more attention than they get. Others are developer conveniences that turn out to be genuinely useful in the right user situation too. They’re just the kind of thing that rarely makes it into the getting-started guide. Not all of these will apply to your daily workflow.

Read Post

Icinga

Read more about Features in Icinga Web 2 Worth Knowing About

What Is Your Operating Model Costing Your Business?

Jun 17, 2026 By Sean Malvey In Nexthink

The biggest cost in your business may not appear anywhere on your balance sheet because some of the most expensive problems are rarely measured directly. Lost productivity, recurring technology issues, underused applications, and the effort required to manage them all accumulate over time without ever appearing as a line item in a financial report.

Read Post

Nexthink

Read more about What Is Your Operating Model Costing Your Business?

Telemetry Talks ep. 5 - OpenTelemetry in the AI agents era

Jun 17, 2026 By VictoriaMetrics In VictoriaMetrics

Telemetry Talks explores how OpenTelemetry’s CNCF graduation arrives at a pivotal moment for AI-powered development. Together with Alex Marshalov, we dive into vibe coding, AI agents, and the growing need for observability in GenAI systems — from prompts and token usage to reasoning chains and distributed traces — using the VictoriaMetrics stack and OpenTelemetry as the foundation for understanding the next generation of autonomous software.

View Video

VictoriaMetrics

Read more about Telemetry Talks ep. 5 - OpenTelemetry in the AI agents era

The Observability Dataset: Architecture That Takes Agents From Junior to Senior

Jun 17, 2026 By Micha Duman In Coralogix

The race to better AI-assisted observability has been a race for bigger and better models. But intelligence was never the real bottleneck. Structure and context were.

Read Post

Coralogix

Read more about The Observability Dataset: Architecture That Takes Agents From Junior to Senior

ActiveMQ Protocol Comparison: AMQP vs MQTT vs OpenWire vs STOMP

Jun 17, 2026 By meshIQ In meshIQ

One of ActiveMQ's most powerful and underappreciated capabilities is its protocol polyglotism: a single broker can simultaneously accept Java JMS clients over OpenWire, Python services over AMQP, IoT sensors over MQTT, and Ruby scripts over STOMP, all routing messages between each other without protocol bridges or translation middleware.

Read Post

meshIQ

Read more about ActiveMQ Protocol Comparison: AMQP vs MQTT vs OpenWire vs STOMP

Reduce your token usage by 50%!?

Jun 17, 2026 By Coralogix In Coralogix

Agent mode in the Coralogix CLI cuts token consumption by nearly 50%, without sacrificing the context your agents need to actually do their job.

View Video

Coralogix

Read more about Reduce your token usage by 50%!?

Reduce Alert Fatigue with Composite Alerting in Hosted Graphite | Tutorial

Jun 17, 2026 By MetricFire In MetricFire

Tired of noisy alerts waking you up for issues that are not actually impacting your services? In this tutorial, we walk through MetricFire's Composite Alerting capabilities and show how to combine multiple metric conditions into a single high-confidence alert using AND / OR logic. Learn how to: reduce alert fatigue and false positives; create service level alerts in Graphite; combine CPU, latency, and database metrics into meaningful alerts; use conditional logic to improve signal quality; and build smarter observability workflows with Hosted Graphite.
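The idea of combining metric conditions with AND / OR logic can be sketched in a few lines of Python. This is only an illustration of the general technique, not MetricFire's or Hosted Graphite's actual configuration or API: the helper names (`above`, `all_of`, `any_of`), the metric names, and the thresholds are all assumptions made for the example.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]
Condition = Callable[[Metrics], bool]


def above(name: str, threshold: float) -> Condition:
    """Condition that is true when the named metric exceeds the threshold."""
    return lambda m: m[name] > threshold


def all_of(*conditions: Condition) -> Condition:
    """AND: true only when every condition is true."""
    return lambda m: all(c(m) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    """OR: true when at least one condition is true."""
    return lambda m: any(c(m) for c in conditions)


# Hypothetical composite alert: high CPU AND (slow responses OR a busy database).
composite_alert = all_of(
    above("cpu_percent", 90),
    any_of(above("p95_latency_ms", 500), above("db_connections", 180)),
)

if __name__ == "__main__":
    sample = {"cpu_percent": 95.0, "p95_latency_ms": 650.0, "db_connections": 40.0}
    print("fire alert:", composite_alert(sample))
```

With this structure, a CPU spike on its own does not page anyone; the alert fires only when the spike coincides with a symptom that users or the database would actually feel.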

View Video

MetricFire

Read more about Reduce Alert Fatigue with Composite Alerting in Hosted Graphite | Tutorial

Why Data Normalization Is the Foundation of AI Security

Jun 17, 2026 By VirtualMetric In VirtualMetric

The shift to autonomous and AI-assisted security operations is well underway. But there is a quiet infrastructure decision that will determine whether your AI agents are sharp or sluggish: how your security data is structured before it ever reaches them.

Read Post

VirtualMetric

Read more about Why Data Normalization Is the Foundation of AI Security

Driving Global Application Success at Scale: UPS' Journey with Nexthink Adopt

Jun 17, 2026 By Nexthink In Nexthink

Large-scale enterprises like UPS face the challenge of ensuring global employees adopt business-critical HR applications effectively without overburdening IT. In this session, John Hampton, Senior Leader at UPS, will share how his team used Nexthink Adopt to embed in-app guidance, contextual self-help, and proactive reminders, helping employees complete tasks accurately the first time while reducing support dependency in Workday and Cornerstone on Demand. With analytics providing visibility into adoption gaps and user behavior, UPS achieved measurable gains in productivity, accuracy, and employee experience.

View Video

Nexthink

Read more about Driving Global Application Success at Scale: UPS' Journey with Nexthink Adopt

Experience-Driven Transformation: How GSK Is Building a World-Class Workplace

Jun 17, 2026 By Nexthink In Nexthink

Guided by five strategic ambitions—sustainability, experience, innovation, focus, and operational excellence—GSK’s journey with Nexthink has grown from a standard deployment into a dynamic, insight-led philosophy shaping every facet of its workplace experience.

View Video

Nexthink

Read more about Experience-Driven Transformation: How GSK Is Building a World-Class Workplace

Datadog Data Observability: Be the first to know when data fails

Jun 17, 2026 By Datadog In Datadog

Bad data doesn't announce itself. Datadog Data Observability gives you unified visibility across your entire data stack—from source systems and pipelines to dashboards and AI applications—so you catch silent failures before they cascade. Detect data quality and pipeline issues before stakeholders do, pinpoint root causes with end-to-end lineage, and reduce pipeline costs with job, cluster, and query recommendations.

View Video

Datadog

Read more about Datadog Data Observability: Be the first to know when data fails

What's New in InfluxDB 3.10: Performance Beta Expanded with New Enterprise Features

Jun 17, 2026 By Peter Barnett In InfluxData

In our last release, we introduced a beta of performance updates designed for heavier, more complex time series workloads. InfluxDB 3.10 expands that beta to include enterprise features that give teams more control as they scale and manage larger workloads in InfluxDB 3. This release adds end-to-end backup and restore, row-level deletes, bulk import from Parquet, user management, and an RBAC preview to the previous performance beta.

Read Post

InfluxData

Read more about What's New in InfluxDB 3.10: Performance Beta Expanded with New Enterprise Features

Why AI observability is a critical ITOps priority

Jun 17, 2026 By Ismath Mohideen In LogicMonitor

AI has shifted from experimental pilots to everyday business operations. Customers are interacting with AI-powered applications. Engineering teams are building with LLMs, GPUs, APIs, and automation at a much faster pace. That adds to the visibility strain on already overburdened ITOps teams.

Read Post

LogicMonitor

Read more about Why AI observability is a critical ITOps priority

Scout MCP Server: Example Prompts, Use Cases, and What's New

Jun 17, 2026 By Aspen Clevenger In Scout

The Scout MCP server connects your AI assistant directly to your Scout Monitoring data. Instead of switching between your editor, Scout, and a chat window, your assistant can pull traces, errors, N+1 insights, and endpoint metrics on its own and use that context to suggest or make fixes right in your codebase. This covers how to connect it, what to ask it, how other teams are using it, and what we shipped recently.

Read Post

Scout

Read more about Scout MCP Server: Example Prompts, Use Cases, and What's New

When Local Blocks Go Global: The India-Telegram BGP Incident

Jun 17, 2026 By Doug Madory In Kentik

Yesterday’s leak of a BGP hijack intended to block Telegram in India is the latest routing mishap best described as intentional, but also accidental — a pattern dating back to Pakistan Telecom’s infamous hijack of YouTube in 2008, in which a domestic block escaped containment and disrupted the service worldwide.

Read Post

Kentik

Read more about When Local Blocks Go Global: The India-Telegram BGP Incident

From Reactive to Proactive: Eversource's Journey with Nexthink

Jun 17, 2026 By Nexthink In Nexthink

Every DEX journey starts with challenges—but the real story is in how you overcome them and build lasting impact. In this session, Rebecca Hall and Mark Milano of Eversource will share how they tackled the pain points that initially drove their adoption of Nexthink, from visibility gaps to IT service inefficiencies. They’ll highlight the early wins that built momentum, including key integrations with their ITSM tool and ROI realized through targeted use cases.

View Video

Nexthink

Read more about From Reactive to Proactive: Eversource's Journey with Nexthink

New: Save time during incidents with incident templates

Jun 16, 2026 By Valeria Kurolapova In StatusGator

Creating incidents often means filling out the same information over and over again. That’s why we’ve added Incident Templates – a faster way to create incidents using pre-configured settings. With templates, you can save commonly used incident details and apply them with a single click whenever you need them.

Read Post

StatusGator

Read more about New: Save time during incidents with incident templates

Why CI/CD Pipelines Miss Runtime Failures

Jun 16, 2026 By Lightrun Team In Lightrun

A CI/CD pipeline does four things: it builds code, runs tests against mocked dependencies, lints for style violations, and scans for known vulnerability patterns. What it cannot do is validate how that code behaves under real users, real service responses, and real runtime constraints that staging was never configured to reproduce. That entire class of failure clears every gate cleanly and surfaces only in production.

Read Post

Lightrun

Read more about Why CI/CD Pipelines Miss Runtime Failures

MSP Summit: Why You Need Effective Documentation & How to Achieve It

Jun 16, 2026 By Amanda Doucette-Lachapelle In Auvik

Every year, MSP Summit unites some of the brightest minds in managed services. From tackling complex migrations that should have been straightforward to managing thousands of unique client environments, MSPs excel at adapting and rising to challenges, even as industry trends evolve. Through all that change, though, one theme consistently comes up year after year: documentation.

Read Post

Auvik

Read more about MSP Summit: Why You Need Effective Documentation & How to Achieve It

Robots: Form vs Function | SolarWinds TechPod

Jun 16, 2026 By solarwindsinc In SolarWinds

Ever wonder why robots look like us but can’t do basic chores? Discover how ignoring function over form could be holding back true innovation in robotics!

View Video

SolarWinds

Read more about Robots: Form vs Function | SolarWinds TechPod

Analysing Claude Code telemetry with SquaredUp - diving deeper

Jun 16, 2026 By Blog In Squared Up

In our previous article we looked at the basics. In this article, we are going to take a deeper dive into some of the complexities of configuration as well as some of the nuances of analysing Claude telemetry. Before we dive into the code, let us just remind ourselves of our telemetry pipeline: we are emitting Claude Code telemetry to an OpenTelemetry Collector. The telemetry is then exported to an Application Insights endpoint and stored in Log Analytics tables.

Read Post

Squared Up

Read more about Analysing Claude Code telemetry with SquaredUp - diving deeper

Introducing Dataspaces and Datasets

Jun 16, 2026 By Coralogix In Coralogix

Dataspaces and Datasets from Coralogix: the structured data layer teams and AI were waiting for. Turn a single query into a reusable dataset, share it across teams, and keep dashboards fast as your data scales. Dataspaces and Datasets are available now in Coralogix. Whether you're building dashboards, running background queries, or powering AI agents with telemetry data, Dataspaces give your organization a governed, high-performance data architecture that scales with your teams.

View Video

Coralogix

Read more about Introducing Dataspaces and Datasets

What is ServiceOps?

Jun 16, 2026 By Motadata In Motadata

Discover how Motadata ServiceOps unifies IT service management, IT asset management, and patch management in a single ITIL-aligned platform. Powered by AI, it helps automate workflows, streamline ticket resolution, and improve service delivery. One platform for service, assets, and everything in between.

View Video

Motadata

Read more about What is ServiceOps?

Inside the AI Team Weekly: AI Observability workflows and Prometheus exemplars (May 19th, 2026)

Jun 16, 2026 By Grafana In Grafana

The Grafana AI team (Engineers Ivana Huckova and Sonia Aguilar) share what's new in AI Observability this week: a new way to instrument and visualize agent workflows, plus a neat trick for jumping straight from a metric spike to the exact conversation that caused it using Prometheus exemplars. We're showing parts of our team meetings to build in public in some small way and give you a sneak preview of what's to come. But not all features we show may make it to production! You've been warned. :)

View Video

Grafana

Read more about Inside the AI Team Weekly: AI Observability workflows and Prometheus exemplars (May 19th, 2026)

Eight best practices for a successful cloud migration strategy

Jun 16, 2026 By Colin Fernandes In Sumo Logic

Moving to the cloud is one of the most consequential decisions an IT organization makes. A successful cloud migration strategy sets the foundation for how your business scales, innovates, and competes. But too often, cloud migration initiatives stall, underperform, or force organizations to repatriate applications back on-premises because the groundwork wasn’t laid correctly.

Read Post

Sumo Logic

Read more about Eight best practices for a successful cloud migration strategy

Use This OTel Processor to Prevent Your Dashboards From Breaking

Jun 16, 2026 By Splunk In Splunk

A semantic-convention rename (http.method → http.request.method) can silently break your RED metrics — no errors, just gaps in dashboards and alerts. The OpenTelemetry Collector's schema processor fixes it: put it first in your pipeline and it normalizes attribute names no matter what each service emits. Migration mode writes BOTH the old and new names, so you get zero-downtime upgrades while queries keep working.
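The dual-write idea described above can be sketched in a few lines of Python. This is only an illustration of the concept, not the schema processor's implementation or its configuration syntax: `RENAMES`, `migrate_attributes`, and the `keep_old` flag are hypothetical names introduced here, and the only rename taken from the text is http.method → http.request.method.

```python
# Hypothetical mapping of old attribute names to their renamed equivalents.
RENAMES = {"http.method": "http.request.method"}


def migrate_attributes(attrs, renames=RENAMES, keep_old=True):
    """Return a copy of attrs normalized to the new attribute names.

    With keep_old=True both the old and the new name are present afterwards,
    so queries written against either name keep matching during a migration.
    """
    out = dict(attrs)
    for old, new in renames.items():
        if old in out:
            # Never overwrite a value already emitted under the new name.
            out.setdefault(new, out[old])
        if keep_old and new in out:
            # Services that already emit the new name still get the old one.
            out.setdefault(old, out[new])
        if not keep_old:
            out.pop(old, None)
    return out


legacy_span = {"http.method": "GET", "http.route": "/cart"}
modern_span = {"http.request.method": "POST", "http.route": "/cart"}

print(migrate_attributes(legacy_span))
print(migrate_attributes(modern_span))
print(migrate_attributes(legacy_span, keep_old=False))
```

Whichever name a service emits, the output carries both names while `keep_old` is on; switching it off afterwards leaves only the new name.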

View Video

Splunk

Read more about Use This OTel Processor to Prevent Your Dashboards From Breaking

Deep AI Investigation for ITOps: What It Is and Why It Matters

Jun 16, 2026 By Margo Poda In LogicMonitor

Investigation is the most time-consuming and cognitively demanding phase of incident response, and it’s the phase least served by existing tooling. Modern ITOps teams have spent years investing in better detection and alerting. The tools are faster, the dashboards are richer, and anomaly detection keeps improving.

Read Post

LogicMonitor

Read more about Deep AI Investigation for ITOps: What It Is and Why It Matters

Visibility Isn't Reliability: Why Observability Alone Cannot Protect SLAs

Jun 16, 2026 By ScienceLogic In ScienceLogic

Over the past decade, enterprises have invested heavily in observability platforms designed to deliver comprehensive insight into increasingly complex environments. Modern systems generate continuous telemetry across infrastructure, applications, networks, cloud services, and third-party dependencies. Metrics, logs, traces, and topology maps now provide a level of technical transparency that would have been difficult to imagine only a few years ago.

Read Post

ScienceLogic

Read more about Visibility Isn't Reliability: Why Observability Alone Cannot Protect SLAs

IsDown is joining UptimeRobot

Jun 16, 2026 By Nuno Tomas In isDown

Today I'm sharing some big news. IsDown is joining UptimeRobot. When I started IsDown, the idea was simple. Keeping track of outages across dozens of vendor status pages was painful, and I wanted to make it easy to see, in one place, when the services you depend on go down. Thousands of teams now rely on IsDown to do exactly that. Joining UptimeRobot is the natural next step.

Read Post

isDown

Read more about IsDown is joining UptimeRobot

Un-observable AI is Un-trustworthy AI

Jun 16, 2026 By Annie Freeman In Coralogix

Recently, someone talked Chipotle’s customer support agent into reversing a linked list – a task completely unrelated to burritos in any way. Screenshots circulated, people laughed, but underneath the joke sat a sharper question. If a production support agent will do that on a public channel, what else will it do that nobody is screenshotting? The bug is funny. The trust gap behind it is not.

Read Post

Coralogix

Read more about Un-observable AI is Un-trustworthy AI

Troubleshooting website connection failures with website monitoring RCA

Jun 15, 2026 By Bela Susan Thomas In Site24x7

Every engineer has a story about the outage that came out of nowhere. One moment everything is green. The next, your monitoring dashboard lights up red, your inbox fills faster than you can read it, and somewhere a customer is staring at a blank screen wondering if your business still exists.

Read Post

Site24x7

Read more about Troubleshooting website connection failures with website monitoring RCA

Alibaba Cloud monitoring: What changes when scale, speed, and cost collide

Jun 15, 2026 By Grace Nalini In ManageEngine

Alibaba Cloud monitoring isn't AWS or Azure monitoring with a different logo. The way its services scale, absorb load, and send early warning signals follows its own logic, and if you're watching the wrong things, you'll find out too late. Cloud monitoring conversations often follow patterns set by AWS and Azure. The metrics are familiar, dashboards look the same, and operational playbooks are built around expected infrastructure behavior.

Read Post

ManageEngine

Read more about Alibaba Cloud monitoring: What changes when scale, speed, and cost collide

Troubleshooting website response time latency

Jun 15, 2026 By Bela Susan Thomas In Site24x7

Your dashboards may be telling a different story than what the customers are experiencing. There's a version of a website problem that nobody talks about enough: the one where everything is technically fine. The site is up. The server is responding. No alerts have fired. And yet, somewhere out there, a user is watching a spinner rotate for the fifth second in a row, quietly losing faith in your product. This is what makes response time latency the most deceptive problem in web operations.

Read Post

Site24x7

Read more about Troubleshooting website response time latency

How Zero Trust is Reshaping Federal IT Strategy

Jun 15, 2026 By solarwindsinc In SolarWinds

Zero trust sparked a paradigm shift for federal agencies, changing the way they approach IT and data management as they "assume breach" from threat actors. Brian Chamberlain, Public Sector Business Development Lead at SolarWinds, explains how starting with observability helps federal agencies lay critical groundwork for meeting zero trust directives.

View Video

SolarWinds

Read more about How Zero Trust is Reshaping Federal IT Strategy

Observability: Are You Measuring What Actually Matters?

Jun 15, 2026 By Colin Burke In Honeycomb

Observability has always been important, and much like any core capability in your business, the value needs to be understood. For years, the value of observability was predictable. It was uptime, error rates, MTTR, and likely tool consolidation. That was enough to be able to show progress. These are foundational, table-stakes metrics, and they still matter, but they aren’t enough.

Read Post

Honeycomb

Read more about Observability: Are You Measuring What Actually Matters?

Overview of Custom Checks

Jun 15, 2026 By Uptime Website Monitoring In uptime

In this video, we’ll walk you through how to set up and configure your Custom Checks in Uptime.com. Learn how to effectively monitor your automations and processes using Uptime.com’s Custom Checks. This tutorial covers Heartbeat and Incoming Webhook checks, ensuring your tasks run smoothly and delivering instant alerts when issues arise. Discover how to set up and configure these checks to maintain optimal performance.

View Video

uptime

Monitoring

Read more about Overview of Custom Checks

Kubernetes Monitoring: Datadog Alert to Lightrun Root Cause

Jun 15, 2026 By Lightrun Team In Lightrun

Datadog Kubernetes monitoring tells an SRE team what failed, which pod failed, and when. It does so within seconds of the alert firing. The investigation then stalls at the same point every time: nothing in the dashboard layer can prove why a specific request behaved the way it did inside a running JVM at the moment of failure. Variable values, feature flag evaluations, and code branches are never captured.

Read Post

Lightrun

Read more about Kubernetes Monitoring: Datadog Alert to Lightrun Root Cause

How to create User-Defined Datasets in Coralogix

Jun 15, 2026 By Coralogix In Coralogix

Learn how to create a user-defined dataset in Coralogix and route telemetry data into it using TCO policies with granular DataPrime expressions. In this walkthrough, you'll learn how to:
• Create a new dataset with its own schema, permissions, retention, and cost visibility
• Configure PBAC settings for governed access control
• Route data using DataPrime expressions in TCO policies
• Fan out events to multiple datasets from a single source

View Video

Coralogix

Read more about How to create User-Defined Datasets in Coralogix

How to Reduce MTTR: 5 Proven Strategies for Enterprise IT Teams

Jun 15, 2026 By Motadata In Motadata

Every minute of downtime impacts your business. Mean Time to Resolution (MTTR) measures how quickly your team can resolve incidents and restore services. In this video, learn 5 proven ways to reduce MTTR using unified observability, AI-powered alert correlation, automated runbooks, and ITSM integration to resolve incidents faster and minimize downtime.

View Video

Motadata

Read more about How to Reduce MTTR: 5 Proven Strategies for Enterprise IT Teams

Your agent can now hit every Checkly API endpoint with one command!

Jun 15, 2026 By Checkly In Checkly

The new checkly api command is an authenticated pass-through to all 100+ Checkly API endpoints. Give it a path, and it handles your token and base URL. Full API coverage for your agents, no wrappers required.

View Video

Checkly

Read more about Your agent can now hit every Checkly API endpoint with one command!

Product Update - June 2026

Jun 15, 2026 By Hrishikesh Barua In IncidentHub

IncidentHub's latest product update includes private status ingestion for Microsoft Azure and Microsoft 365, a simpler UI for alerts configuration, an option to disable the public status page, and a better-looking status page layout. Plus, support for more vendors (1070+ and counting). As always, I am grateful to all our customers and beta testers who have shared their feedback, which has made IncidentHub better.

Read Post

IncidentHub

Read more about Product Update - June 2026

Find the Lookalike Domains Impersonating Your Brand: A Free Phishing & Typosquatting Scanner

Jun 15, 2026 By DNS Spy In DNS Spy

Somewhere out there, a domain that looks almost exactly like yours may already be registered. Maybe it swaps one letter. Maybe it uses a Cyrillic character that is visually identical to a Latin one. Maybe it just adds the word "login" or "secure" to your brand. These lookalike domains are the raw material of phishing, and most companies have no idea how many exist for their brand until something goes wrong.

Read Post

DNS Spy

Read more about Find the Lookalike Domains Impersonating Your Brand: A Free Phishing & Typosquatting Scanner

ClickHouse LowCardinality: When It Helps and When It Hurts

Jun 15, 2026 By Prathamesh Sonpatki In Last9

ClickHouse LowCardinality cuts storage and speeds up queries on low-cardinality columns, but backfires on trace IDs. Here is how to tell the difference. Prathamesh works as an evangelist at Last9, runs SRE stories, where SRE and DevOps folks share their stories, and maintains o11y.wiki, a glossary of all terms related to observability.

Read Post

Last9

Read more about ClickHouse LowCardinality: When It Helps and When It Hurts

Python Error Tracking for Django, Flask, and FastAPI: A Practical Setup Guide

Jun 15, 2026 By Rollbar In Rollbar

Your Python app is throwing errors in production right now. Some of them are obvious: a 500 response, an angry Slack message from support. But most are quiet. A background task swallows an exception. A race condition surfaces only under load. A third-party API returns unexpected data and your code handles it by not handling it. If you’re relying on log files and user reports to find these, you’re debugging after the damage is done.

Read Post

Rollbar

Read more about Python Error Tracking for Django, Flask, and FastAPI: A Practical Setup Guide

PHP-FPM Performance Optimization: The Complete Tuning + Monitoring Guide

Jun 14, 2026 By Pavithra Parthiban In Atatus

When your PHP-based application starts attracting thousands of visitors, the way you run PHP becomes critical. A slow-loading page or a server crash during peak hours can cost you revenue, users, and reputation. PHP-FPM (PHP FastCGI Process Manager) is the default way most high-performance websites run PHP. While its default configuration works fine for small to medium workloads, high-traffic applications need custom tuning to handle large volumes of requests efficiently.

Read Post

Atatus

Read more about PHP-FPM Performance Optimization: The Complete Tuning + Monitoring Guide

Data quality issues show up faster when...

Jun 12, 2026 By Virtana In Virtana

AI can't reason about what it can't see, because AI requires evidence: the observed signals. Correlation requires co-occurrence. If it is not in the data set, it doesn't exist to the machine. So the key insight is that if there are omitted variables, there are going to be blind spots forever.

View Video

Virtana

Read more about Data quality issues show up faster when...

Why Your Agentic Workflow Succeeds and Still Gets It Wrong

Jun 12, 2026 By Lightrun Team In Lightrun

Agentic workflows are reshaping how engineering teams operate, fetching context, synthesizing decisions, and shipping results across systems without human intervention. But the same design that makes them powerful adds risk in production. Agents do not crash when they hit bad data; they synthesize around it, substituting a stale value, an empty page, or a missing field for the result they were supposed to capture.

Read Post

Lightrun

Read more about Why Your Agentic Workflow Succeeds and Still Gets It Wrong

How to Troubleshoot High CPU Usage on Network Devices

Jun 12, 2026 By Andrii Kernitskyi In Obkio

Most network teams only find out their firewall is overloaded after users start complaining. A slow VPN, dropped calls, and random packet loss at 2 pm every day. The usual suspects get blamed first: the ISP, the switch, the application server. The firewall gets a pass because the dashboard says 40% CPU and everything looks fine. Here is the problem with that picture. Standard SNMP monitoring polls every 5 minutes. A CPU spike that peaks at 95% and recovers within 90 seconds never shows up.
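The sampling gap described above can be reproduced with a small simulation. This is a minimal sketch under stated assumptions: the load curve is synthetic, the spike's start time (second 400) is chosen arbitrarily so that it falls between two polls, and `cpu_at` and `peak_seen` are hypothetical helper names. It models polling as reading an instantaneous value, which may differ from how a given device reports CPU over SNMP.

```python
def cpu_at(t):
    """Synthetic CPU load (%) at second t: 40% baseline, 95% spike for 90 s."""
    return 95 if 400 <= t < 490 else 40


def peak_seen(poll_interval_s, duration_s=3600):
    """Highest CPU value observed when sampling once every poll_interval_s."""
    return max(cpu_at(t) for t in range(0, duration_s, poll_interval_s))


print(peak_seen(300))  # 5-minute polling -> 40, the spike falls between polls
print(peak_seen(30))   # 30-second polling -> 95, the spike is sampled
```

A spike that happens to straddle a poll instant would still be caught at the 5-minute interval, so the point is not that long intervals always miss short spikes, only that they can.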

Read Post

Obkio

Read more about How to Troubleshoot High CPU Usage on Network Devices

Generate Synthetic Time Series Data in InfluxDB 3

Jun 12, 2026 By Cole Bowden In InfluxData

Getting InfluxDB 3 up and running is a pretty lightweight process with the installation script. Getting time series data into it is the next step, and for exploration, basic testing, or scenarios where you don’t have a stream of time series data ready to write, that can be a point of friction. That hurdle is particularly high when you want to test the rest of the system around the data you’d be writing.

Read Post

InfluxData

Read more about Generate Synthetic Time Series Data in InfluxDB 3

Your Monitoring Stack Wasn't Designed. It Was Procured.

Jun 12, 2026 By John Williams In eG Innovations

The 2am war room hasn’t gone anywhere. Ten years after Gartner coined the term AIOps, the platforms are bought, the licenses are renewed, the dashboards are live — and serious incidents still get resolved by engineers paging across multiple consoles, trying to work out where the fire actually is. MTTR has barely moved. Alert fatigue hasn’t eased. The outcomes the category promised, in most enterprises, have not arrived. Matt Lowe’s recent article on AIOps names the shortfall well.

Read Post

eG Innovations

Read more about Your Monitoring Stack Wasn't Designed. It Was Procured.

Better, faster, less wrong: Enhancing issue grouping

Jun 12, 2026 By Kush Dubey In Sentry

Sentry’s job is to tell you when your app breaks. To do that, we group individual errors into issues. First by fingerprinting, which lexically matches errors based on their structure, then by an AI fallback: when fingerprinting can’t find a match, an ML model compares the new error’s stacktrace against existing issues and merges it if they’re semantically similar.

Read Post

Sentry

Read more about Better, faster, less wrong: Enhancing issue grouping

How AI is Reshaping IT Operations Management

Jun 12, 2026 By Motadata In Motadata

AI is transforming IT operations through automated incident response, intelligent event correlation, predictive analytics, and agentic AI. But while technology is evolving rapidly, human judgment and strategic decision-making remain essential. In this video, explore what's changing in IT operations, what isn't, and how IT leaders can prepare for an AI-driven future with AIOps, observability, and automation. Learn how Motadata helps organizations build smarter, more proactive IT operations.

View Video

Motadata

Read more about How AI is Reshaping IT Operations Management

Avantra + SAP Cloud ALM Demo: Two-way Cloud ALM sync in action across your entire hybrid estate.

Jun 12, 2026 By Avantra In Avantra

An SAP Cloud ALM Silver Partner, Avantra 26 delivers a production-ready SAP Cloud ALM integration — two-way sync of system data and alerts, multi-tenant Cloud ALM visibility, and the ability to act on Cloud ALM systems directly within Avantra. One platform for RISE, hybrid, and everything beyond.

View Video

Avantra

Read more about Avantra + SAP Cloud ALM Demo: Two-way Cloud ALM sync in action across your entire hybrid estate.

Avantra 26 next-gen automation: self-service SAP workflows with full guardrails

Jun 12, 2026 By Avantra In Avantra

Avantra 26's next-gen automation experience puts SAP automations in the hands of your users through guided wizards with scoped permissions, lifecycle notifications, and a full audit trail. Watch this demo of an SAP client settings (SCC4) change on a RISE with SAP S/4HANA system: configured in five steps, executed automatically, documented end to end. Avantra customers reduce manual operational effort by up to 70%. Now you're really running.

View Video

Avantra

Read more about Avantra 26 next-gen automation: self-service SAP workflows with full guardrails

Avantra 26 Overview: AI-powered SAP operations across your entire hybrid estate.

Jun 12, 2026 By Avantra In Avantra

Avantra 26 brings AI root cause analysis, SAP Cloud ALM integration, expanded BTP visibility, and next-gen automation together in one platform. Avantra AIR investigates incidents the moment they're detected and surfaces a structured diagnosis with next steps, cutting resolution times by 60% and turning hours of expert triage into seconds. As an SAP Cloud ALM Silver Partner, Avantra delivers production-ready, two-way synchronisation of systems and alerts across multiple Cloud ALM tenants.

View Video

Avantra

Read more about Avantra 26 Overview: AI-powered SAP operations across your entire hybrid estate.

13 Best Observability Tools in 2026 [Top-Picked]

Jun 12, 2026 By Jagdish Sajnani In Motadata

How many tools does your team open before anyone can say why production is slow? If the answer is more than two, you are paying for that gap in engineering hours every week. We understand the frustration. So we did the research work for you to help you pick the best observability tools.

Read Post

Motadata

Read more about 13 Best Observability Tools in 2026 [Top-Picked]

What is Cloud Security - Explained in 5 minutes

Jun 12, 2026 By Sysdig In Sysdig

Cloud security isn't just about locking things down; it's about staying ahead of threats in fast-moving, dynamic environments. In this video, Kat breaks down what cloud security actually means in 2024 and why traditional approaches don't cut it anymore. Whether you're securing containers, Kubernetes workloads, or multi-cloud infrastructure, this is your foundation. Subscribe for more cloud security explainers, tutorials, and best practices from Sysdig.

View Video

Sysdig

Read more about What is Cloud Security - Explained in 5 minutes

Trying Claude Code hacks to see what actually makes me more productive

Jun 12, 2026 By Coralogix In Coralogix

#claudecode #anthropic #observability

View Video

Coralogix

Read more about Trying Claude Code hacks to see what actually makes me more productive

Building More Resilient Multi-Cloud Operations

Jun 12, 2026 By Dallon Robinette In Selector

The last post in this series looked at how disconnected alerts can slow incident response and how stronger correlation helps teams investigate issues with more clarity. That same operational context has value beyond triage. It also plays an important role in resilience, service assurance, and the ability to maintain confidence across increasingly complex multi-cloud environments. Resilience depends on more than reacting well during an outage.

Read Post

Selector

Read more about Building More Resilient Multi-Cloud Operations

Home Sweet Hybrid - There's more to your transition than core ERP

Jun 11, 2026 By Avantra Team In Avantra

Most large enterprises elect a hybrid approach to SAP operations, but this is a more recent trend. And, as SAP operations professionals, we are still learning about the impacts of this choice and approach, even though it’s often the most sensible and pragmatic. Years ago, everything ran on-premises.

Read Post

Avantra

Read more about Home Sweet Hybrid - There's more to your transition than core ERP

UK Public Sector AI ambitions hindered by fragmented IT environments

Jun 11, 2026 By SolarWinds In SolarWinds

New SolarWinds data highlights widespread fragmentation and infrastructure challenges, limiting AI's impact and scalability across public sector services.

Read Post

SolarWinds

Read more about UK Public Sector AI ambitions hindered by fragmented IT environments

Tencent Cloud: When systems start reacting to themselves

Jun 11, 2026 By Grace Nalini In ManageEngine

Distributed systems don't just fail. They adapt. Services in Tencent Cloud environments are tightly interconnected. Compute, load balancing, databases, and networking layers continuously respond to each other based on changing conditions. Under normal load, this coordination stays in the background. As pressure builds, the behavior shifts. The system does not degrade in a straight line. Instead, it starts adjusting itself.

Read Post

ManageEngine

Read more about Tencent Cloud: When systems start reacting to themselves

Introducing the StatusGator Notion Integration

Jun 11, 2026 By Valeria Kurolapova In StatusGator

Many teams use Notion as the central hub for documentation, runbooks, incident response, and operational planning. When an outage occurs, the last thing you want is for responders to jump between multiple tools searching for information about the health of critical vendors and dependencies. That’s why we’re excited to introduce the StatusGator Notion integration.

Read Post

StatusGator

Read more about Introducing the StatusGator Notion Integration

No SAP Expertise? No Problem. Automation Just Got Easier

Jun 11, 2026 By Avantra Team In Avantra

Avantra 21 introduced the concept of Automation workflows. By Avantra 23, compatibility with Ansible was added. Along the way, Avantra became the tool SAP teams reached for when they wanted system copies, refreshes, and backup orchestration to just run — predictably, on a schedule, without a senior engineer babysitting them. So, automation isn’t new to Avantra users. Avantra made automation for SAP practical to deploy, predictable in operation, and easier to maintain.

Read Post

Avantra

Read more about No SAP Expertise? No Problem. Automation Just Got Easier

The Next Evolution of Infrastructure Observability

Jun 11, 2026 By Kristy Slimmer In Galileo

Operational visibility is becoming increasingly important as infrastructure teams are asked to support AI initiatives, automation goals, cost accountability, modernization efforts, and growing operational complexity at the same time. Most are expected to do it without expanding headcount, introducing additional risk, or rebuilding the environment from scratch. Those expectations are changing the role of infrastructure operations.

Read Post

Galileo

Read more about The Next Evolution of Infrastructure Observability

How to Choose the Right Server Monitoring Tool: A Step By Step Guide for 2026

Jun 11, 2026 By Jagdish Sajnani In Motadata

How do you pick one server monitoring tool when every vendor page promises the same thing? A few years ago, two monitoring vendor websites showed you two different products. Today you can open five and read nearly the same feature list on each one. Real-time dashboards, instant alerts, AI everywhere. That sameness has made evaluation harder than ever. The marketing tells you nothing, and the wrong choice follows your team for years, either as features nobody opens or as the one missed alert at 2 a.m.

Read Post

Motadata

Read more about How to Choose the Right Server Monitoring Tool: A Step By Step Guide for 2026

How Skylar MCP Gives Agentic Workflows the Operational Context to Act With Confidence

Jun 11, 2026 By ScienceLogic In ScienceLogic

AI models can reason over language, summarize findings, and explain patterns. What they cannot do on their own is see the real-time operational state of your environment. Ask a model about a critical incident and it will answer from whatever context it is given, which means the answer is only as trustworthy as the input. In operations and compliance workflows, an answer is only useful if it is grounded in current service context and governed access to the systems that define reality.

Read Post

ScienceLogic

Read more about How Skylar MCP Gives Agentic Workflows the Operational Context to Act With Confidence

ChangeTower User Stories - Turning Public Web Changes into Recruitment Pipeline

Jun 11, 2026 By ChangeTower In ChangeTower

For modern business teams, the public web is the single largest source of competitive and market intelligence — and one of the hardest to keep up with. Compliance teams track changes to regulations, policies, and terms. Competitive intelligence teams watch rivals’ pricing, positioning, and personnel. Recruiters and business developers monitor hiring activity that signals new opportunities. In every case, the value lies in noticing a change before anyone else does.

Read Post

ChangeTower

Read more about ChangeTower User Stories - Turning Public Web Changes into Recruitment Pipeline

Catch visual regressions with Snapshots, now in beta

Jun 11, 2026 By Max Topolsky In Sentry

Sentry Snapshots diffs screenshots on every commit and blocks the PR if there are any visual changes so you can confirm they’re intentional. Users don’t interact with code, they interact with something they can see and touch. Snapshots gives you a lightweight way to test it. It’s easier than ever to change code. It’s also easier than ever to trade quality for speed. Modern codebases need guardrails to ensure correctness.

Read Post

Sentry

Read more about Catch visual regressions with Snapshots, now in beta

How to use Postman Visualizer: a step-by-step guide

Jun 11, 2026 By Blog In Squared Up

API responses are often easier to understand when they are displayed visually instead of as raw JSON. While Postman is widely used for testing APIs, many developers overlook one of its most useful features: the Postman Visualizer. It is not as fully featured as a dedicated dashboarding platform like SquaredUp, but it is a great way to quickly visualize API responses during development and debugging.

Read Post

Squared Up

Read more about How to use Postman Visualizer: a step-by-step guide

Federated Search | From Silos to Insight | Azure Blob Schema Discovery with Splunk's Crawler

Jun 11, 2026 By Splunk In Splunk

This walk-through shows how Splunk's Cloud can discover schema and partition keys for Microsoft Azure Blob Storage datasets and create searchable Splunk-managed tables. Once the data is mapped, analysts can use Splunk Federated Search to query Azure Blob data where it lives, bringing cloud-resident logs into security, observability, and operational workflows without re-ingesting the data.

View Video

Splunk

Read more about Federated Search | From Silos to Insight | Azure Blob Schema Discovery with Splunk's Crawler

Monitoring Protocols Compared - Which Standard for What

Jun 11, 2026 By Lionel Porcheron In Bleemeo

Modern applications are distributed, ephemeral and built from a dozen moving parts. To keep them reliable, you need real visibility: not just “is the server up?”, but “how is this request behaving, right now, across every component it touches?”. The good news is that the observability world has converged on a handful of open standards.

Read Post

Bleemeo

Read more about Monitoring Protocols Compared - Which Standard for What

Finding the Slow Query Killing Your Rails App

Jun 11, 2026 By Tarun Singh In AppSignal

Performance problems in Rails applications are sneaky. Generally speaking, nobody opens a ticket that says “my application is about 20% slower than it was last month”. What you get instead are vague complaints from team members about a p95 latency that is climbing every week or a background job that used to take 2 seconds and now takes 40.

Read Post

AppSignal

Read more about Finding the Slow Query Killing Your Rails App

Satellite Telemetry, ITAR, and Data Residency: Building Architecture for Speed and Control

Jun 11, 2026 By Allyson Boate In InfluxData

Satellite mission operators depend on telemetry to understand spacecraft health, ground system performance, and mission status in real time. Operational signals help teams identify risks, investigate anomalies, and keep operations moving. When a spacecraft enters safe mode or signal strength drops during a contact window, teams need trusted telemetry immediately. But mission data moves quickly across operational systems, and every handoff makes it harder to control.

Read Post

InfluxData

Read more about Satellite Telemetry, ITAR, and Data Residency: Building Architecture for Speed and Control

Visualising Claude Code telemetry in SquaredUp

Jun 11, 2026 By Blog In Squared Up

Engineering teams are shipping more AI-generated code than ever, but at what cost? Learn how to build a telemetry pipeline to monitor Claude Code usage and costs directly in SquaredUp. It is estimated that 85-90% of engineering teams are now using AI coding assistants such as Claude, Codex and Cursor. This is not just for small-scale pilot projects: around 40% of all code now being shipped is AI-generated, and in start-ups the figure is around 95%. This can result in incredible productivity gains.

Read Post

Squared Up

Read more about Visualising Claude Code telemetry in SquaredUp

Proactive Alerting with AIOps

Jun 11, 2026 By Jirka Knapek In WhatsUp Gold

Modern IT environments generate huge volumes of telemetry across infrastructure, applications, cloud services, and networks. Teams now have more data than ever, but that does not automatically lead to better decisions. In many organizations, the real problem is no longer visibility alone. It is the ability to identify which signals matter, understand what they mean, and respond before users or business services are affected.

Read Post

WhatsUp Gold

Read more about Proactive Alerting with AIOps

Safeguard Revenue and Brand Trust with Full-Stack Visibility

Jun 11, 2026 By LogicMonitor In LogicMonitor

The quick download: Most observability strategies overlook the internet layer that underpins every user’s digital experience, leaving it almost entirely unmonitored. Most IT teams monitor servers, networks, and applications, yet the infrastructure layer that carries traffic to users remains largely unmonitored.

Read Post

LogicMonitor

Read more about Safeguard Revenue and Brand Trust with Full-Stack Visibility

Seven Straight Years of Verified Customer Trust

Jun 10, 2026 By ScienceLogic In ScienceLogic

Seven years ago, our customers started telling the world what the ScienceLogic AI Platform does for their operations. They haven’t stopped. For the seventh consecutive year, that steady stream of verified customer reviews has earned the ScienceLogic AI Platform a TrustRadius Top Rated award, again. Seven years in a row shows that customers keep choosing to share their experience because the platform keeps delivering value. This recognition doesn’t come from us.

Read Post

ScienceLogic

Read more about Seven Straight Years of Verified Customer Trust

How Managed Digital Employee Experience (DEX) Supports Smarter Device Refresh Decisions

Jun 10, 2026 By Teneo In Teneo

Let’s face it, refreshing devices used to be a guessing game. IT teams would swap out laptops and desktops on a fixed schedule, hoping to keep everyone happy and productive. But in today’s hybrid, cloud-first world, that old approach just doesn’t work. Employees expect a seamless experience, and businesses can’t afford to waste money on unnecessary upgrades or risk productivity dips from outdated tech. That’s where Digital Employee Experience (DEX) comes in.

Read Post

Teneo

Read more about How Managed Digital Employee Experience (DEX) Supports Smarter Device Refresh Decisions

Graviton5 in Production at Honeycomb: Per-service Results From the m8g to m9g Migration

Jun 10, 2026 By Liz Fong-Jones In Honeycomb

This is the fourth installment in the Graviton retrospective series we've been writing since 2021. The methodology is the same one I always reach for: hold the workload constant, run both generations on the same Kubernetes namespace concurrently, and let the per-pod numbers speak.

Read Post

Honeycomb

Read more about Graviton5 in Production at Honeycomb: Per-service Results From the m8g to m9g Migration

Balance AI innovation and governance with Sumo Logic AI and ML apps

Jun 10, 2026 By Margaret Selid In Sumo Logic

AI is changing how teams work. Developers are generating code faster, security teams are automating investigations, and employees across the business are using AI tools to accelerate research, content creation, and decision-making. But this adoption comes with a catch. As usage explodes, it introduces a new set of security risks: a rapidly expanding attack surface, faster attack timelines, potential data exposure, and an alarming lack of visibility into how these tools are being used.

Read Post

Sumo Logic

Read more about Balance AI innovation and governance with Sumo Logic AI and ML apps

G2 Names Auvik Network Management & Monitoring Leader Across Summer 2026 Reports

Jun 10, 2026 By Momoko Ishida In Auvik

G2 reports are built around what customers say about the products they use. In the Summer 2026 Reports, that feedback helped Auvik earn top recognition across Grid Reports and Index Reports for Network Management Tools and Network Monitoring. Try Auvik Network Management for free! Setup takes less than 15 minutes and you will see results in an hour. Learn more now.

Read Post

Auvik

Read more about G2 Names Auvik Network Management & Monitoring Leader Across Summer 2026 Reports

The Real Cost of Custom Code: Why Buying a Unified Middleware Management Platform Protects Enterprise IT Budgets

Jun 10, 2026 By Jennifer Knutel In meshIQ

Building custom middleware monitoring appears cost-effective but creates expensive maintenance debt, fragmented visibility, and operational risk. Enterprise teams spend 60-80% of IT budgets on software maintenance while unified platforms deliver immediate, production-ready capabilities.

Read Post

meshIQ

Read more about The Real Cost of Custom Code: Why Buying a Unified Middleware Management Platform Protects Enterprise IT Budgets

DASH 2026 Keynote

Jun 10, 2026 By Datadog In Datadog

At DASH 2026, Datadog launched 100+ capabilities to help customers drive autonomy and manage growing AI and security complexity. With new Bits AI, log management, and security capabilities, customers have the visibility and autonomous operations they need to detect, investigate and resolve issues across the development loop and data lifecycle. Tune in to the full keynote to catch the highlights.

View Video

Datadog

Read more about DASH 2026 Keynote

Monitoring Docker Containers with Icinga

Jun 10, 2026 By Blerim Sheqa In Icinga

A container reporting “up” tells you the process is running, not that the workload is healthy – but that caveat is true of any service, on a container or a bare server.

Read Post

Icinga

Read more about Monitoring Docker Containers with Icinga

Unified visibility for MSPs

Jun 10, 2026 By Blog In Squared Up

MSPs operate in high-pressure, customer-first environments where understanding exactly what is happening across client environments is the difference between proactive service and constant firefighting. Yet, many still struggle to get a unified view across their customers, tools, and teams.

Read Post

Squared Up

Read more about Unified visibility for MSPs

Getting started with Prometheus dashboards

Jun 10, 2026 By Blog In Squared Up

Prometheus is a wildly popular open source monitoring tool typically used for monitoring Kubernetes environments and containerized workloads. But how do you turn the mountains of metrics into a clear picture of health and performance? SquaredUp plugs directly into your Prometheus database to visualize and monitor your data. What sets SquaredUp apart from other Prometheus visualization options like Grafana and Perses is just how easy it is to visualize, monitor and share Prometheus dashboards.

Read Post

Squared Up

Read more about Getting started with Prometheus dashboards

Episode 12 - Human Choices in an AI Future (Part 2)

Jun 10, 2026 By Digitate In Digitate

What does it actually take to thrive in an AI-driven world, not just survive it? In part two of his conversation with Karthik Ravindran, General Manager of Enterprise Data and AI at Microsoft, host Tom Stoneman digs into the human qualities that no model can replicate.

View Video

Digitate

Read more about Episode 12 - Human Choices in an AI Future (Part 2)

Store and search high-volume logs with ClickHouse and Datadog

Jun 10, 2026 By Andy Lihani In Datadog

As teams scale AI and agentic workloads, log volumes can grow fast. That growth can force teams into a difficult trade-off: Keep logs searchable in their existing workflows, or store them cost-effectively for longer periods. For teams that rely on logs during incident response, compliance reviews, and long-running investigations, losing either affordability or searchability can slow down troubleshooting. Datadog and ClickHouse are partnering to help remove that trade-off.

Read Post

Datadog

Read more about Store and search high-volume logs with ClickHouse and Datadog

Automated Network Documentation 101: What You Need to Know to Get Started

Jun 10, 2026 By Mike Grodzki In Auvik

Network documentation has a way of becoming everyone’s problem and nobody’s responsibility. Over time, diagrams become outdated, configuration changes go undocumented, and critical knowledge ends up living in the heads of a few senior technicians instead of somewhere the entire team can access it. That’s why organizations are turning to automated network documentation.

Read Post

Auvik

Read more about Automated Network Documentation 101: What You Need to Know to Get Started

Explore Grafana for Free with Grafana Play

Jun 10, 2026 By Grafana In Grafana

Want to try Grafana without installing anything? Jump into Grafana Play, our free sandbox environment where you can explore dashboards, experiment with features, and see Grafana in action, no login or setup required.

View Video

Grafana

Read more about Explore Grafana for Free with Grafana Play

Grafana Tempo: The distributed tracing journey to 3.0 (June 2026 Community Call)

Jun 10, 2026 By Grafana In Grafana

Our distributed tracing journey from the inception of Tempo to 3.0. Grafana Cloud is the easiest way to get started with Grafana dashboards, metrics, logs, traces, and profiles.

View Video

Grafana

Read more about Grafana Tempo: The distributed tracing journey to 3.0 (June 2026 Community Call)

Alerts That Tell You Why, Not Just What

Jun 10, 2026 By Checkly In Checkly

Alerts tell you something broke, but not why. So you're stuck digging through logs and trace IDs. Checkly monitors your app from the outside, like a real user. And Rocky, the Checkly agent, automatically pulls the right context to provide a root cause analysis for any failed check.

View Video

Checkly

Read more about Alerts That Tell You Why, Not Just What

Why Fable 5 is faster on Claude Code

Jun 10, 2026 By Coralogix In Coralogix

I tested how fast the new Anthropic model, Fable 5, truly is, using data, not vibes. Why does it feel faster? Does it actually cost double? Is it better at coding?

View Video

Coralogix

Read more about Why Fable 5 is faster on Claude Code

What is Automated Patch Management?

Jun 10, 2026 By Motadata In Motadata

Learn why manual patch management creates unnecessary risk for IT teams and how automated patch management helps organizations improve security, compliance, and operational efficiency. Discover how automation eliminates repetitive tasks, reduces human error, prioritizes critical vulnerabilities, and accelerates patch deployment across the entire IT environment.

View Video

Motadata

Read more about What is Automated Patch Management?

Nathen Harvey: Scale Brilliance, Not Bottlenecks: Building Platforms for the AI-First World

Jun 10, 2026 By Honeycomb In Honeycomb

Watch Nathen Harvey's full talk at O11yCon 2026, Honeycomb's observability conference, and enjoy Christine Yen's intro as well.

View Video

Honeycomb

Read more about Nathen Harvey: Scale Brilliance, Not Bottlenecks: Building Platforms for the AI-First World

Canvas, MCP, and Claude: How Liz Fixed Three Bugs During a Conference

Jun 10, 2026 By Honeycomb In Honeycomb

In this demo, Liz and Kale talk through a slow query that Liz couldn't get out of her head. During a conference, she set out to solve it... and ended up finding two more bugs to fix with Claude, Honeycomb MCP, and Honeycomb Canvas.

View Video

Honeycomb

Read more about Canvas, MCP, and Claude: How Liz Fixed Three Bugs During a Conference

June 2026 at Bindplane: Monitor our own AI with Bindplane, Sentinel goes native, and configs get rollbacks

Jun 10, 2026 By Adnan Rahic In ObservIQ

If there was a theme this month, it was making the hard parts of a telemetry pipeline less risky. For SIEM customers, we shipped an ASIM-native Microsoft Sentinel destination and automatic OCSF mapping in Pipeline Intelligence, two of the most-requested pieces for teams routing security data through Bindplane. On the platform side, we added config rollback, which turns "I changed something and now it is behaving differently" into a one-click trip back to a known-good version.

Read Post

ObservIQ

Read more about June 2026 at Bindplane: Monitor our own AI with Bindplane, Sentinel goes native, and configs get rollbacks

Why Your Vendor Monitoring Strategy Has a Blind Spot: The Case for Continuous TPRM

Jun 10, 2026 By OpsMatters In OpsMatters

You monitor everything. Network traffic, application performance, authentication events, infrastructure health. If something meaningful changes in your environment, you have a signal for it. That discipline is foundational to how modern IT and security operations work. But there is one part of your stack you almost certainly cannot see in real time: your vendors.

Read Post

OpsMatters

Read more about Why Your Vendor Monitoring Strategy Has a Blind Spot: The Case for Continuous TPRM

How to Size Infrastructure When Hardware Delays and Cost Pressure Change the Equation

Jun 9, 2026 By Kristy Slimmer In Galileo

Sizing infrastructure has always required a balance between performance, capacity, and risk. What has changed is the level of precision required to make those decisions. Hardware timelines are less predictable. Costs are under closer review. Decisions that were once routine now require clear justification. In many cases, the question is no longer just how much capacity is needed, but whether that capacity can be delivered when it is needed and whether the investment will hold up under scrutiny.

Read Post

Galileo

Read more about How to Size Infrastructure When Hardware Delays and Cost Pressure Change the Equation

Time to move to the StatusGator v3 API: What v2 users need to know

Jun 9, 2026 By Colin Bartlett In StatusGator

We launched the StatusGator v3 REST API back in October, and it has only gotten better since. v3 is a ground-up redesign built around organization-level API tokens, a consistent response format, opaque string IDs, pagination, and a large set of write endpoints for managing monitors, incidents, and subscribers. We have kept shipping new capabilities for it, and we will keep doing so. v2, on the other hand, is done.

Read Post

StatusGator

Read more about Time to move to the StatusGator v3 API: What v2 users need to know

Discovering Entities in SolarWinds Observability Self-Hosted

Jun 9, 2026 By solarwindsinc In SolarWinds

Resource links: SolarWinds Observability Self-Hosted version 2026.2 blog post and SolarWinds Observability Self-Hosted version 2026.2 release notes.

View Video

SolarWinds

Read more about Discovering Entities in SolarWinds Observability Self-Hosted

Infinite Cardinality Metrics: Custom metrics built for modern systems

Jun 9, 2026 By Josh Mirchin In Datadog

Every technology shift adds new context you need to measure. Cloud computing added regions and services. Kubernetes added containers and pods. Multi-tenant applications added users and tenants. AI systems add models, prompts, agents, and execution paths. The result is that metrics are becoming dramatically more dimensional, faster than ever before. Over time, engineers are forced to make tradeoffs.

Read Post

Datadog

Read more about Infinite Cardinality Metrics: Custom metrics built for modern systems

Devolutions Makes an Important Investment in Obkio

Jun 9, 2026 By Andrii Kernitskyi In Obkio

Obkio is proud to announce an important investment from Devolutions, one of the most respected names in IT security and remote access management. This investment marks a new chapter for Obkio as we accelerate our next phase of growth. This isn't a financial transaction between strangers. It's a partnership between two companies that have spent years building tools for the same people: IT professionals, sysadmins, MSPs, and the network engineers who keep critical network infrastructure running.

Read Post

Obkio

Read more about Devolutions Makes an Important Investment in Obkio

Building Enterprise Momentum Across APAC: A Conversation with Dave Patnaik

Jun 9, 2026 By John Grosshans In LogicMonitor

There’s a lot happening across Asia Pacific right now. Enterprises are moving quickly to modernize operations, adopt AI, and manage growing complexity across increasingly distributed environments, and the opportunity ahead for LogicMonitor in the region continues to grow alongside it. That’s why I’m especially excited to welcome Dave Patnaik to LogicMonitor as our new Vice President of APAC.

Read Post

LogicMonitor

Read more about Building Enterprise Momentum Across APAC: A Conversation with Dave Patnaik

Oops! All Robots | SolarWinds TechPod

Jun 9, 2026 By solarwindsinc In SolarWinds

Chrystal Taylor and Sean Sebring explore the fascinating world of robotics, from consumer devices to advanced medical robots, and discuss the future of humanoid and non-humanoid robots in our lives. Chrystal and Sean, with guest Andy Garibay, explore the future of robotics, focusing on form, function, and societal impact. They debate why certain designs, such as dogs, dominate, the potential of automation in construction and security, and the ethical considerations of humanoid robots.

View Video

SolarWinds

Read more about Oops! All Robots | SolarWinds TechPod

Nishi Bhonsle of Salesforce at O11yCon: Speaker Highlight Reel

Jun 9, 2026 By Honeycomb In Honeycomb

In her talk at O11yCon 2026, Nishi Bhonsle of Salesforce provided some great examples of how Honeycomb has helped Salesforce resolve issues in seconds. Here's a 4-minute highlight reel.

View Video

Honeycomb

Read more about Nishi Bhonsle of Salesforce at O11yCon: Speaker Highlight Reel

From API to live dashboard - building a SquaredUp plugin with AI

Jun 9, 2026 By Blog In Squared Up

No matter how fast we build, we'll never integrate with every tool. There are too many, new ones appear constantly, and some are too niche to ever reach the top of our roadmap. So if the tool you care about isn't supported yet, your options have been to wait for us to get to it, or build it yourself with our Web API plugin — a powerful, flexible option, though one that asks you to map out the endpoints, authentication and paging yourself.

Read Post

Squared Up

Read more about From API to live dashboard - building a SquaredUp plugin with AI

Building a Predictive Maintenance Plugin with the InfluxDB 3 Processing Engine

Jun 9, 2026 By Charles Mahler In InfluxData

Predictive maintenance is one of the most compelling use cases for time series data. Instead of waiting for equipment to fail or servicing it on a fixed calendar regardless of condition, you watch the live sensor data and act when it indicates that a failure is coming. That “watch the data and act” loop is exactly what the InfluxDB 3 Processing Engine was built for.

Read Post

InfluxData

Read more about Building a Predictive Maintenance Plugin with the InfluxDB 3 Processing Engine

Why Security Teams Spend So Much Time Reconciling Data

Jun 9, 2026 By Teneo In Teneo

Security teams today are managing growing volumes of cybersecurity data across increasingly complex environments. This blog explores the hidden operational cost of disconnected tools, manual data reconciliation, and fragmented reporting, and how Teneo’s Cyber Asset Attack Surface Management (CAASM), powered by ThreatAware, helps organizations create a more unified and trusted view across their security estate. Most organizations are not short of security tools.

Read Post

Teneo

Read more about Why Security Teams Spend So Much Time Reconciling Data

Why Most Organizations Still Don't Know What's Protected

Jun 9, 2026 By Teneo In Teneo

Organizations invest heavily in cybersecurity tools, yet many still struggle to confidently understand what is actually protected across their environment. This blog explores how disconnected systems, unknown assets, and inconsistent data create blind spots, and how Teneo’s Cyber Asset Attack Surface Management (CAASM), powered by ThreatAware, helps organizations gain a trusted view of security coverage.

Read Post

Teneo

Read more about Why Most Organizations Still Don't Know What's Protected

Native ASIM Ingestion for Microsoft Sentinel, Now in Bindplane

Jun 9, 2026 By Ekansh Gupta In ObservIQ

If you're sending security data to Microsoft Sentinel, you now have a faster path. A new ASIM mode lands your logs directly in Sentinel's native ASIM tables: no custom tables to predefine, no schema to design before data flows. We added ASIM mode to the Microsoft Sentinel destination, backed by a new ASIM standardization processor that converts raw logs to ASIM in the pipeline and routes each record to the table it belongs in. Here's how it works, and why we built it this way.

Read Post

ObservIQ

Read more about Native ASIM Ingestion for Microsoft Sentinel, Now in Bindplane

If You Are Building a Startup from a Vibe-Coded App, Don't Skip This #devops #programming #ai

Jun 9, 2026 By SigNoz - Open Source Observability Platform In SigNoz

Everyone is vibe coding products right now. But most applications are missing one crucial thing: observability. If you are turning your vibe-coded app into a real startup, observability should not be an afterthought.

View Video

SigNoz

Read more about If You Are Building a Startup from a Vibe-Coded App, Don't Skip This #devops #programming #ai

Turn Datadog findings into automated code fixes with Bits Code

Jun 9, 2026 By Datadog In Datadog

Engineering teams lose hours in the gap between detecting a problem and getting a fix into review. An on-call engineer sees an error spike in Datadog, pivots to traces and logs to isolate the failure, opens the relevant repository, reproduces the issue, writes a fix, adds tests, waits on CI, and finally opens a pull request. Even when the problem is familiar, the workflow pulls engineers across several tools and stretches remediation from minutes into hours or days.

Read Post

Datadog

Read more about Turn Datadog findings into automated code fixes with Bits Code

Monitor Memory Where Allocations Occur

Jun 9, 2026 By Coralogix In Coralogix

Kubernetes dashboards often mask a systemic infrastructure failure. When a critical application crashes, it often points to an Out-of-Memory event, even while standard CPU metrics appear completely healthy. This quick walkthrough shows you how Coralogix integrates continuous memory profiling directly into your production environment. We pair OpenTelemetry trace data with continuous background sampling via the Async Profiler. It helps teams isolate resource-heavy code paths before they trigger system degradation.

View Video

Coralogix

Read more about Monitor Memory Where Allocations Occur

Otel Visual Builder

Jun 9, 2026 By Coralogix In Coralogix

Stop editing raw YAML by hand. Discover how to build, validate, and scale complex OTel pipelines visually with the Visual Builder for Coralogix Fleet Management.

View Video

Coralogix

Read more about Otel Visual Builder

DASH 2026 Operating at Scale: Guide to Datadog's newest announcements

Jun 9, 2026 By Datadog In Datadog

A challenge for many teams continues to be managing cost, governance, and reliability across an ever-larger footprint. This year’s DASH announcements help teams operate efficiently at scale, with new tools to cut cloud and AI spend, eliminate waste automatically, maintain observability during outages, and manage many organizations and agents as a single unit.

Read Post

Datadog

Read more about DASH 2026 Operating at Scale: Guide to Datadog's newest announcements

Get reliable answers to business questions with Bits Data Analysis

Jun 9, 2026 By Jonathan Morin In Datadog

Teams are wiring AI coding agents straight to their warehouse over MCP and asking things like “What was our revenue by channel in Q2?” The agent finds a revenue table, runs a query, and returns a number in seconds, with no waiting on the data team. While the answer initially looks right, the problem is that the number is often wrong.

Read Post

Datadog

Read more about Get reliable answers to business questions with Bits Data Analysis

Autonomously monitor for impactful degradations with Bits Detection

Jun 9, 2026 By Samantha Scaglione In Datadog

Monitoring is built around the system a team understands at a point in time. Engineers add endpoints, move dependencies, and change user flows every day. Over time, that creates coverage drift as monitors keep reflecting the system as it used to behave, while changing paths introduce failure modes that teams didn’t yet know to watch for. Bits Detection automatically creates, tunes, and maintains monitors for your services.

Read Post

Datadog

Read more about Autonomously monitor for impactful degradations with Bits Detection

Stop Guessing Why Your Pods Are Crashing

Jun 9, 2026 By Jonny Steiner In Coralogix

Kubernetes dashboards often mask a systemic infrastructure failure. When a critical Java service fluctuates and restarts, the post-mortem often confirms an Out-of-Memory (OOM) event. While CPU metrics appear healthy, memory has silently hit a ceiling, forcing the kernel to terminate the process.

Read Post

Coralogix

Read more about Stop Guessing Why Your Pods Are Crashing

Scout Monitoring Now Supports Node.js: Express, NestJS, Prisma, and More

Jun 9, 2026 By Aspen Clevenger In Scout

We have been getting the same request from teams for a while now: “We use Scout for our Rails app. Can we get the same thing for our Node services?” Today the answer is yes. Scout Monitoring now supports Node.js. If your team runs Express or NestJS in production, you get the same errors-and-traces experience that Ruby, Python, PHP, and Elixir teams have had. Let’s walk through what that means in practice.

Read Post

Scout

Read more about Scout Monitoring Now Supports Node.js: Express, NestJS, Prisma, and More

Deleting Em-Dashes : REALITY BYTES AI WORKPLACE SPECIAL ft. HR Leader Gabi Tofani

Jun 9, 2026 By Nexthink In Nexthink

In this Reality Bytes special, Tom and Oriana welcome Nexthink’s Head of Global Talent Success, Gabi Tofani, to explore how AI is reshaping workplace culture, learning, leadership, and employee experience. From measuring AI adoption and building curiosity-driven cultures to the risks of “AI slop,” homogenized thinking, and performance reviews written by bots (for bots), the conversation examines what organizations might gain (and lose) as AI becomes embedded in daily work.

View Video

Nexthink

Read more about Deleting Em-Dashes : REALITY BYTES AI WORKPLACE SPECIAL ft. HR Leader Gabi Tofani

Best MSP Software in 2026: How to Choose the Right Platform

Jun 9, 2026 By LogicMonitor In LogicMonitor

MSPs already have plenty of tools. The harder problem is getting a clear read on what’s happening across each customer environment, which alerts point to the same issue, and where engineers should start. Choosing the right MSP software is really about choosing the right operating layer for service delivery. MSPs are supporting more customers, more environments, and more alerts, but adding another tool doesn’t always make the work easier.

Read Post

LogicMonitor

Read more about Best MSP Software in 2026: How to Choose the Right Platform

Automatically discover and remediate root causes with Grafana Assistant Investigations

Jun 9, 2026 By Maurice Rochau In Grafana

You can use Grafana Assistant Investigations to automatically discover incidents and help find root causes—and this AI-powered Grafana Cloud feature recently got a major upgrade to give you even more confidence in its findings. You can read more about the behind-the-scenes effort in our new engineering blog Unprompted, where we get into harness engineering, context compaction, benchmarking, and keeping agents alive and working well in long-running sessions.

Read Post

Grafana

Read more about Automatically discover and remediate root causes with Grafana Assistant Investigations

What is DNS TTL and How to Choose the Right Value

Jun 9, 2026 By DNS Spy In DNS Spy

DNS TTL is one of those settings nobody thinks about until it bites them. Then they think about it a lot. This guide explains what DNS TTL is, how it works in plain language, and how to pick the right value for your records. By the end you will know what to set, when to change it, and why it matters when you migrate to a new server.

Read Post

DNS Spy

Read more about What is DNS TTL and How to Choose the Right Value

The AI Bottleneck: Why Your Modern Models Are Choking on Legacy and Streaming Data Architecture

Jun 9, 2026 By Jennifer Knutel In meshIQ

Enterprise AI struggles not from inadequate models, but from fragmented data architecture. Critical business data remains trapped in legacy systems or lost in streaming complexity. Success requires bridging the gap between modern intelligence layers and underlying systems of record.

Read Post

meshIQ

Read more about The AI Bottleneck: Why Your Modern Models Are Choking on Legacy and Streaming Data Architecture

Color-coded log monitoring for simplified log analysis

Jun 8, 2026 By Kaviya Radhakrishnan In ManageEngine

Modern production environments generate massive volumes of logs every day. As systems become more distributed and cloud-native, that volume only increases. The real challenge isn’t collecting logs—it’s identifying what matters fast enough to act using effective log visualization. Most log views fail at this point. Every entry looks the same, forcing engineers to scan them manually and interpret lines under pressure.

Read Post

ManageEngine

Read more about Color-coded log monitoring for simplified log analysis

What is SRE Observability and Key Pillars You Should Know?

Jun 8, 2026 By Arpit Sharma In Motadata

What happens when a critical service slows down, but nothing is technically “broken”? Most teams have monitoring in place. They know when something goes down. But when performance drops or issues spread across services, finding the real cause becomes slow and unclear. Engineering teams end up switching between dashboards, logs, and alerts just to understand what changed. This delays response and increases pressure on on-call teams. This is where SRE observability becomes essential.

Read Post

Motadata

Read more about What is SRE Observability and Key Pillars You Should Know?

It Can Only Goodhart Happen

Jun 8, 2026 By Austin Parker In Honeycomb

“When a measure becomes a target, it ceases to be a good measure.” (Charles Goodhart, 1975) You’ve probably read this quote in relation to any number of things over the years. People complain about arbitrary metrics like PRs merged, lines of code produced, and now, token usage. But is the era of tokenmaxxing over before it even began? The rise and death of token leaderboards at companies like Amazon seem to have taken place in less than three months!

Read Post

Honeycomb

Read more about It Can Only Goodhart Happen

Search and act across Datadog to resolve issues faster with Bits Chat

Jun 8, 2026 By Nicole Parisi In Datadog

Finding the right information across dashboards, monitors, and telemetry sources takes time, even for experienced engineers. When something breaks, it often means figuring out where to start, rebuilding queries, and jumping between metrics, logs, and traces before you can take action. The challenge isn’t a lack of data but the effort required to surface the right information at the right moment.

Read Post

Datadog

Read more about Search and act across Datadog to resolve issues faster with Bits Chat

Top 10 Prompts for Your Monitoring Tool

Jun 8, 2026 By Dejan Lukić In AppSignal

You open a monitoring tool, and the data is all there: errors, traces, anomalies, incidents, and countless intricacies. If you want to get the right slice of that data, you need to know exactly which dashboard to open and what filters to apply. But when a poor UI gets in the way, this can take longer than it should. Luckily, this is not the case with AppSignal. MCP (Model Context Protocol) changes the interface entirely.

Read Post

AppSignal

Read more about Top 10 Prompts for Your Monitoring Tool

Three Years a Leader. Thank You.

Jun 8, 2026 By Pedro Bados In Nexthink

Dear Nexthink community, We are excited to be named a Leader in the 2026 Gartner Magic Quadrant for Digital Employee Experience Tools for the third year in a row. I want to share this recognition with our customers, our partners and ecosystem, and every Nexthinker across the world. As a founder, it’s a true honor to work alongside so many talented people. To us, this recognition is also yours.

Read Post

Nexthink

Read more about Three Years a Leader. Thank You.

Works on my machine: how we use AI to reproduce reported bugs

Jun 8, 2026 By Neel Shah In Sentry

Sentry’s SDK teams maintain and support SDKs for a vast ecosystem of languages and frameworks. See our release registry for a source of truth. We’re currently at 159 published packages across the entire ecosystem. If you use it, we probably support it. All of these SDKs are open source and have their own GitHub repositories that we maintain on a daily basis. And like any other open source project, we get tons of bug reports and issues on these.

Read Post

Sentry

Read more about Works on my machine: how we use AI to reproduce reported bugs

Running the OpenTelemetry Collector as a Lambda

Jun 8, 2026 By Jessica Kerr (Jessitron) In Honeycomb

The OpenTelemetry Collector is usually deployed as a long-running process: a sidecar, a DaemonSet, an EC2 instance, a Docker container on my computer. It sits there listening for telemetry. That's fine when I want to send telemetry all day, but not when telemetry is rare. Like right now, when I have an agent defined on AgentCore, and it runs a few times a week maybe. Or my website that hardly sees any traffic. Can I run the OpenTelemetry Collector as a Lambda function?

Read Post

Honeycomb

Read more about Running the OpenTelemetry Collector as a Lambda

New: Introducing the StatusGator Chrome extension

Jun 7, 2026 By Valeria Kurolapova In StatusGator

We’re excited to announce the launch of the StatusGator Chrome extension, a new way to check the status of websites and online services directly from your browser. Whether you’re troubleshooting an issue, wondering if a website is down, or looking for more information about an ongoing incident, the extension gives you instant access to service status information with a single click. Simply install the extension and start checking the status of websites and services as you browse.

Read Post

StatusGator

Read more about New: Introducing the StatusGator Chrome extension

Introducing the StatusGator browser extension for Chrome and Firefox

Jun 7, 2026 By Valeria Kurolapova In StatusGator

We’re excited to announce the launch of the StatusGator browser extension, now available for both Chrome and Firefox. Whether you’re troubleshooting an issue, wondering if a website is down, or looking for more information about an ongoing incident, the extension gives you instant access to service status information with a single click. Simply install the extension and start checking the status of websites and services as you browse.

Read Post

StatusGator

Read more about Introducing the StatusGator browser extension for Chrome and Firefox

API update: Full board management now available

Jun 5, 2026 By Andy Libby In StatusGator

We’re excited to announce expanded functionality for the StatusGator Boards API. You can now create new boards, update existing boards, and delete boards directly through the API. Previously, the Boards API only supported listing boards and retrieving board details. With these new capabilities, you can automate the complete board lifecycle – from provisioning new boards to managing ownership and cleaning up boards that are no longer needed.

Read Post

StatusGator

Read more about API update: Full board management now available

Add custom metadata to your monitors

Jun 5, 2026 By Valeria Kurolapova In StatusGator

We’re excited to introduce monitor metadata, a new feature available in the General tab of monitor settings. You can now add custom key/value metadata to monitors, making it easier to organize resources and add operational context to alerts and integrations.

Read Post

StatusGator

Read more about Add custom metadata to your monitors

Why Engineers Don't Trust Autonomous AI - 4th Annual Observability Survey | Grafana Labs

Jun 5, 2026 By Grafana In Grafana

The 2026 Observability Survey from Grafana Labs heard from over 1,300 engineers and leaders across 76 countries on the real-world role of AI in observability. The data reveals a sharp distinction between intelligence and autonomy — and a critical blind spot most teams have.

View Video

Grafana

Read more about Why Engineers Don't Trust Autonomous AI - 4th Annual Observability Survey | Grafana Labs

Asimov's Zeroth Law of Robotics: testing and observing AI (ExpoQA 2026)

Jun 5, 2026 By Grafana In Grafana

Asimov's Three Laws of Robotics are missing one — and when it comes to testing and observing AI, Nicole van der Hoeven argues that missing rule changes everything: before a robot can avoid harm, obey orders, or protect itself, there has to be a Zeroth Law: a robot must be observable. Because if you can't see what a system is doing, you have no way of knowing whether it's following any rule at all.

View Video

Grafana

Read more about Asimov's Zeroth Law of Robotics: testing and observing AI (ExpoQA 2026)

Give your AI agents live Datadog access from the command line

Jun 5, 2026 By Cody Lee In Datadog

AI agents are becoming a standard part of how engineers write, deploy, and troubleshoot software. Getting observability data into those workflows, securely and without manual intervention, remains the harder problem.

Read Post

Datadog

Read more about Give your AI agents live Datadog access from the command line

Network Device Monitoring: Topology Maps and NetFlow

Jun 5, 2026 By Netdata In netdata

Most teams run one tool for SNMP polling, another for topology, and a third for flow analysis, then spend their time stitching the views together. This webinar shows how Netdata brings all three into a single dashboard, with 100+ vendor profiles out of the box, automatic Layer 2 topology mapping, and a flow collector that auto-detects NetFlow, IPFIX, and sFlow on a single port.

View Video

netdata

Read more about Network Device Monitoring: Topology Maps and NetFlow

Turning Disconnected Alerts into Actionable Insights

Jun 5, 2026 By Dallon Robinette In Selector

The previous post in this series focused on shared context and why hybrid operations depend on a connected view across cloud, network, and infrastructure. Once that context is in place, the operational benefits become easier to see—especially during incident response, where signal volume and fragmented tooling can slow teams down. Alert noise remains one of the most persistent challenges in hybrid environments. Every layer of the stack can generate its own warnings, anomalies, and service events.

Read Post

Selector

Read more about Turning Disconnected Alerts into Actionable Insights

Errors, traces, logs, metrics: when to reach for what

Jun 5, 2026 By Sergiy Dybskiy In Sentry

When should I reach for a log, a trace, or a metric? I hit that question constantly when I instrument code, and I watch coding agents hit it too. It sounds like it should be obvious. Errors, traces, logs, and metrics are the four kinds of telemetry most apps run on, four tools in one box, and they overlap enough that the honest answer is every developer’s favourite: it depends. You can stuff context into span attributes instead of logging it. You can count log events instead of emitting a metric.

Read Post

Sentry

Read more about Errors, traces, logs, metrics: when to reach for what

Progress Wins at the Network Computing Awards

Jun 5, 2026 By Libby Bagley In WhatsUp Gold

Progress has been named a winner at this year's Network Computing Awards, earning industry recognition for its ongoing commitment to innovation and delivering real-world value to customers. A standout event in the UK technology calendar, the Network Computing Awards celebrate organizations and solutions that are driving measurable impact across the industry.

Read Post

WhatsUp Gold

Read more about Progress Wins at the Network Computing Awards

11 Incident Management Best Practices Every IT Team Should Follow

Jun 5, 2026 By Jagdish Sajnani In Motadata

A well-defined incident management process can mean the difference between a minor disruption and a major business outage. When critical services fail, every minute of downtime matters. Yet many IT teams still face challenges such as unclear ownership, poor prioritization, communication gaps, alert fatigue, and manual processes that delay resolution. The result is longer outages, missed SLAs, and frustrated users.

Read Post

Motadata

Read more about 11 Incident Management Best Practices Every IT Team Should Follow

Zero Friction, Zero Tickets, Zero Disruption: The New Operational Mandate for IT

Jun 5, 2026 By Shawn Lazarus In Nexthink

For decades, IT operations have followed a familiar model. Specialized teams manage different parts of the environment, from infrastructure and networks to security and endpoint management. When employees encounter issues, they submit tickets to the service desk, which are then triaged, escalated, and resolved. This structure has endured because it provided a reliable way to maintain system health and respond to problems as they arise.

Read Post

Nexthink

Read more about Zero Friction, Zero Tickets, Zero Disruption: The New Operational Mandate for IT

How Digital Experience Monitoring Protects Your Paid Social ROI

Jun 5, 2026 By OpsMatters In OpsMatters

The Australian digital advertising market is experiencing an unprecedented era of growth. Recent industry data shows that internet advertising investments have reached a staggering record of $18.4 billion. Furthermore, over 77 percent of Australians are now regular social media users, spending nearly two hours every single day on various digital platforms. In response to this captive audience, marketing teams spend immense amounts of time and budget crafting the perfect creative, targeting precise demographics, and optimising their ad bids across platforms like Meta, LinkedIn, and TikTok.

Read Post

OpsMatters

Read more about How Digital Experience Monitoring Protects Your Paid Social ROI

Keeping Critical Systems Online Across Dynamic Operational Locations

Jun 5, 2026 By OpsMatters In OpsMatters

Keeping critical systems online has always been a technical challenge, but the scale of that challenge shifts considerably when operations span multiple physical locations, none of which are fixed. Field sites, temporary installations, marine vessels, mobile command units, and dispersed industrial assets all place unique demands on the infrastructure designed to keep them running. In these environments, avoiding downtime and maintaining business continuity is not simply a matter of patching software or monitoring a server room.

Read Post

OpsMatters

Read more about Keeping Critical Systems Online Across Dynamic Operational Locations

Shopify outage affects stores, admin panels, and APIs on June 3, 2026

Jun 4, 2026 By Colin Bartlett In StatusGator

On June 3, 2026, Shopify experienced a widespread service disruption that affected merchants and customers across multiple regions. Users reported storefront failures, admin dashboard issues, API connectivity problems, and authentication errors that disrupted ecommerce operations for several hours. While the outage did not affect every Shopify customer, reports quickly began arriving from around the world, indicating a significant platform issue.

Read Post

StatusGator

Read more about Shopify outage affects stores, admin panels, and APIs on June 3, 2026

Sponsored Post

How APM fits into the modern observability stack

Jun 4, 2026 By Kirubanandan Rammohan In ManageEngine

Most engineering teams don't have a data problem. They have an interpretation problem. Prometheus is running, logs are shipping to the aggregator, dashboards are green, and then a latency spike hits and the root cause takes 45 minutes to isolate. The data was there but the answer wasn't. That gap is where application performance monitoring (APM) operates. This article explores what APM adds to a modern observability stack, why relying on standalone tools leaves critical blind spots, and how teams can unify infrastructure data with application context for a complete operational picture.

Read Post

ManageEngine

Read more about How APM fits into the modern observability stack

Supercharging the channel: SolarWinds announces new updates to Partner Programme

Jun 4, 2026 By SolarWinds In SolarWinds

Programme upgrades, scheduled for Summer 2026, will focus on more benefits, increased enablement, and an improved partner experience.

Read Post

SolarWinds

Read more about Supercharging the channel: SolarWinds announces new updates to Partner Programme

Escaping the Diderot effect: How to avoid tech-driven spending

Jun 4, 2026 By Priyanka Gs In ManageEngine

Top Tips is a weekly column where we explore emerging trends in technology and share practical ways to stay ahead. This week, we're looking at how technology can nudge us into unnecessary spending—and how to avoid it. Have you ever bought one thing and then felt the need to buy several more to match it? If so, you've experienced what is known as the Diderot Effect. The term comes from the life of Denis Diderot, a famous French philosopher who spent much of his wealth in a matter of months.

Read Post

ManageEngine

Read more about Escaping the Diderot effect: How to avoid tech-driven spending

Why Observability Is Essential for Platform Engineers?

Jun 4, 2026 By Mohana Ayeswariya J In Atatus

Observability is how platform teams stop being the answer to every question and start building platforms that answer those questions themselves. This article explains specifically how observability enables platform engineers to support development teams better by reducing ticket volume, cutting MTTR, enabling SLO ownership, and making microservice debugging something devs can do without escalating to you.

Read Post

Atatus

Read more about Why Observability Is Essential for Platform Engineers?

Anomaly Detection and Forecasting That Learns From Every Write in InfluxDB

Jun 4, 2026 By Cole Bowden In InfluxData

For many operational time series workloads, machine learning can’t operate in the historical way, where data is compiled once and models are trained offline. Sensor readings, infrastructure metrics, application telemetry, energy data, industrial measurements, and financial ticks all share a basic property: the next datapoint is more useful when the system can respond to it immediately (or at least close to immediately).

Read Post

InfluxData

Read more about Anomaly Detection and Forecasting That Learns From Every Write in InfluxDB

Autonomous IT Is Here. Are You Prepared?

Jun 4, 2026 By Sean Malvey In Nexthink

Enterprise IT was built for a more predictable workplace, where support began when an employee reported a problem and IT worked backward from the details they could provide. That model made sense when devices, applications, and ways of working were easier to control. Today, the digital workplace moves too quickly for IT to rely on reported issues alone. By the time a ticket appears, employees may have already lost time, worked around the problem, abandoned the tool, or turned to an unmanaged alternative.

Read Post

Nexthink

Read more about Autonomous IT Is Here. Are You Prepared?

Rocky reads your stack, not just your check

Jun 4, 2026 By Stefan Judis In Checkly

You get an infrastructure alert. Something's off, so you task your agent to investigate, and it ends up grepping through the logs. You both wonder if there's customer impact. There’s a simple rule: AI agents in the incident loop are only as good as the signal you feed them. And logs alone aren't a great signal.

Read Post

Checkly

Read more about Rocky reads your stack, not just your check

Overview Of Dashboard

Jun 4, 2026 By Uptime Website Monitoring In uptime

Learn how to create and customize dashboards on Uptime.com, including setting up the General and Display Settings tabs, configuring check cards, filtering by state, setting sort orders, displaying metrics, managing alerts, and creating multiple dashboards.

View Video

uptime

Read more about Overview Of Dashboard

What is Cloud Infrastructure? Everything You Need to Know

Jun 4, 2026 By Motadata Team In Motadata

Modern businesses need infrastructure that can scale as quickly as their demands change. Yet many organizations still struggle with infrastructure that is costly to maintain, difficult to expand, and slow to adapt to new requirements. As applications, users, and data continue to grow, managing resources efficiently becomes increasingly challenging. Cloud infrastructure provides a more flexible approach.

Read Post

Motadata

Read more about What is Cloud Infrastructure? Everything You Need to Know

Claude Code Observability at Scale: How We Did It With Bindplane

Jun 4, 2026 By Chelsea Wright & Adnan Rahic In ObservIQ

At Bindplane, we iterate fast. One of the most important tools we've adopted across our organization is Claude Code. It helps every team here build solutions to complex problems with both speed and precision. But speed without visibility is a liability. We needed a reliable way to monitor and audit how Claude Code was being used across our team. Luckily, we build the best platform on the market for data in motion.

Read Post

ObservIQ

Read more about Claude Code Observability at Scale: How We Did It With Bindplane

Automating Device and OS Compliance in Air-Gapped Networks with Agentic AI

Jun 4, 2026 By Mehul Patel In Broadcom

For network operations and security teams, maintaining compliance across device hardware and operating systems is a complex and time-consuming task. At any given moment, your network contains thousands of devices from dozens of different vendors. To keep this infrastructure secure, you must constantly know which devices are approaching end-of-life (EOL) milestones, and which platforms are vulnerable to active common vulnerabilities and exposures (CVEs).

Read Post

Broadcom

Read more about Automating Device and OS Compliance in Air-Gapped Networks with Agentic AI

Internet Performance Monitoring: Understand Digital Experience from the User's Perspective

Jun 4, 2026 By LogicMonitor In LogicMonitor

Internet Performance Monitoring (IPM) provides end-to-end visibility into what happens between your infrastructure and your users, across networks and services you don’t own or control. The internet is your network now. Your apps live in the cloud, your users are everywhere, and the systems that deliver your applications and services to them use hundreds of providers, ISPs, and networks beyond your control. In practice, that means infrastructure monitoring is the foundation.

Read Post

LogicMonitor

Read more about Internet Performance Monitoring: Understand Digital Experience from the User's Perspective

Introducing Bits Agent Builder: Build agentic workflows for alert response and remediation

Jun 4, 2026 By Amber Tunnell In Datadog

Building automated workflows that adapt to real-world complexity can be a challenge. As systems scale and scenarios multiply, teams often end up hardcoding endless logic branches just to handle every potential outcome. That’s why we’re introducing Bits Agent Builder, a powerful new tool that lets you create custom AI agents that are fully hosted by Datadog.

Read Post

Datadog

Read more about Introducing Bits Agent Builder: Build agentic workflows for alert response and remediation

How to Build a Cost-Effective Log Retention Strategy

Jun 4, 2026 By Jeff Darrington In Graylog

Nearly every home has that drawer or doom corner where you store all those items that you don’t need every day but that you still want to keep for those “just in case” moments. If you’re a document connoisseur, you may have financial documents that go back years because an accountant once warned you that an IRS audit would require seven years of back documentation. In short, you have a lot of documents that you may or may not need taking up a lot of room in your home.

Read Post

Graylog

Read more about How to Build a Cost-Effective Log Retention Strategy

How to debug REST Collector APIs with Cribl REST Collector Diagnostics

Jun 4, 2026 By Cribl In Cribl

This video introduces the new REST Collector Diagnostics feature in Cribl, which helps you troubleshoot API collection issues faster. It’s designed for observability and data engineers who use REST Collector to pull data from external APIs and need deeper visibility into HTTP requests, responses, and errors.

View Video

Cribl

Read more about How to debug REST Collector APIs with Cribl REST Collector Diagnostics

Office Hours with David Girvin

Jun 4, 2026 By Sumo Logic, Inc. In Sumo Logic

Weekly office hours with David Girvin. Check out recent feature releases and updates, watch a quick live demo, and ask any questions with live Q&A.

View Video

Sumo Logic

Read more about Office Hours with David Girvin

Speed with Confidence: Managing Delivery Risk in an AI-driven Development World

Jun 4, 2026 By Eric Nash In Broadcom

In the modern development landscape, we are seeing a shift in how work is managed. The rise of AI-assisted development and highly distributed teams means that work is moving faster than ever before. However, this increased velocity often comes with a hidden tax: complexity. We are seeing more parallel work streams, more intricate dependencies, and a constant stream of shifting priorities. In this environment, simply moving fast is not enough to guarantee success.

Read Post

Broadcom

Read more about Speed with Confidence: Managing Delivery Risk in an AI-driven Development World

How Kentik Catches Bad AI Answers

Jun 4, 2026 By Kentik In Kentik

Hear how Kentik uses AI to validate AI and catch 90%+ of bad networking answers before users ever see them.

View Video

Kentik

Read more about How Kentik Catches Bad AI Answers

AI Observability Deep Dive Demo | Grafana Cloud

Jun 4, 2026 By Grafana In Grafana

Grafana AI Observability is our new database and platform for observing AI Agents. Over the past year at Grafana Labs, we built Agents and needed a way to understand how they are performing, what they cost, what their error rate and time to first token are, and how they are behaving. Grafana Staff Engineer Ivana Hučková provides a deep dive demo on how Grafana AI Observability connects our experience building Agents with our experience building observability systems.

View Video

Grafana

Read more about AI Observability Deep Dive Demo | Grafana Cloud

Grafana Assistant Context Offloading

Jun 4, 2026 By Grafana In Grafana

Context Offloading is a pipeline solution for managing Observability with AI Agents. If you are building AI Agents that work with real data, the context window can very easily get filled with bloated context that the Agent does not really need. Sven demonstrates "Context Offloading", a solution that stores the JSON result and sends only the summary of the JSON blob, making the LLM loop much quicker and keeping your context window small.

View Video

Grafana

Read more about Grafana Assistant Context Offloading

CLI and MCP Basics for AI Agents (The Context Window #4)

Jun 4, 2026 By Grafana In Grafana

MCP vs CLI: which one should your AI agent actually use? We get into it with Grafana's cloud and OSS MCP servers and gcx.

View Video

Grafana

Read more about CLI and MCP Basics for AI Agents (The Context Window #4)

Observability for Healthcare Systems | Grafana Everywhere

Jun 4, 2026 By Grafana In Grafana

Grafana Assistant is going places you might not expect — including healthcare. Golden Grot winner Oren Lion from TeleTracking reveals how Grafana Cloud supports their systems that help keep patient care moving — and how Assistant enables teams to get from “what happened?” to “here’s why” faster. From moon landings to patient care, Grafana is everywhere. Congratulations to Oren, Chris Johnson, Mark Munson, and the entire TeleTracking team on winning this year's Golden Grot Award for Pioneering AI in Observability!

View Video

Grafana

Read more about Observability for Healthcare Systems | Grafana Everywhere

Upgrading to ActiveMQ 5.19.7 or 6.2.6

Jun 4, 2026 By meshIQ In meshIQ

The latest Apache ActiveMQ releases – 5.19.7 and 6.2.6, both from May 27 – are good releases to apply. They close known dependency CVEs and tighten the broker’s default posture. (We covered the full list of changes in our release overview.) But here’s the catch with any “secure-by-default” update: hardening defaults means turning things off.

Read Post

meshIQ

Read more about Upgrading to ActiveMQ 5.19.7 or 6.2.6

Getting Started with NinjaOne dashboards

Jun 4, 2026 By Blog In Squared Up

If you manage endpoints for a living, you'll know the problem isn't a lack of data. It's that there's too much of it, scattered across too many places. A modern IT team or MSP might be looking after thousands of devices spread across dozens of customer organizations, each generating a constant stream of alerts, patch results, antivirus events and disk warnings. NinjaOne does a great job of collecting all of that.

Read Post

Squared Up

Read more about Getting Started with NinjaOne dashboards

The Silent Killer of IBM MQ: How One Leaky App Can Crash Your Entire Estate

Jun 4, 2026 By meshIQ In meshIQ

A single leaky application can crash your entire IBM MQ estate by consuming OS resources through unclosed connections. Traditional monitoring misses these silent killers. Learn how proactive observability detects OPPROCS anomalies before they trigger infrastructure failures.

Read Post

meshIQ

Read more about The Silent Killer of IBM MQ: How One Leaky App Can Crash Your Entire Estate

Apache ActiveMQ 5.19.7 and 6.2.6

Jun 4, 2026 By meshIQ In meshIQ

On May 27, the Apache ActiveMQ project shipped two releases on the same day: 5.19.7 and 6.2.6. Look at the changelogs side by side and the story is clear — this isn’t a feature drop. It’s a coordinated security-hardening pass applied to both maintained branches of ActiveMQ Classic at once, with the same fixes deliberately backported so that no supported line is left behind.

Read Post

meshIQ

Read more about Apache ActiveMQ 5.19.7 and 6.2.6

Sponsored Post

Increase customer retention & stop leaving money in the shopping cart

Jun 3, 2026 By Sumitra Manga In Raygun

We all know the pain and frustration associated with broken software. It's no secret that the internet is rife with broken links, slow pages, and broken shopping carts, often feeling like it's being held together with glue and duct tape. These issues aren't just causing frustration for customers; they cost businesses millions. According to the Consortium for Information and Software Quality, poor software quality cost US companies $2.08 trillion in 2020. Every interaction between a customer and your technology is an opportunity to build or destroy trust.

Read Post

Raygun

Read more about Increase customer retention & stop leaving money in the shopping cart

How to generate real-world load tests using Grafana Cloud k6 and production telemetry

Jun 3, 2026 By Matt Wimpelberg In Grafana

For many development teams, a load test starts with a set of assumptions. You pick 100 virtual users because it sounds reasonable. You ramp for 30 seconds because that's what the tutorial showed. You set a 500ms threshold because it feels like a good target. The test passes, you ship the release, and production falls over at 6 p.m. on a Tuesday because your synthetic load never resembled how real users interact with your application.

Read Post

Grafana

Read more about How to generate real-world load tests using Grafana Cloud k6 and production telemetry

Autonomous Error Remediation in Cursor with Lightrun MCP

Jun 3, 2026 By Lightrun In Lightrun

Lightrun's Gidi Freud demonstrates how your AI coding agent can now investigate and fix production errors, autonomously. Watch how Cursor, guided by Lightrun's Error Remediation skill, picks up a Sentry error, instruments the live service with a runtime snapshot, captures real evidence, and opens a validated PR for approval.

View Video

Lightrun

Read more about Autonomous Error Remediation in Cursor with Lightrun MCP

Best Log Management Software for DevOps and SRE Teams in 2026: Feature and Cost Breakdown

Jun 3, 2026 By Libi Michelson In logz.io

TL;DR: Picking the right log management platform in 2026 comes down to three things: how much operational overhead you can absorb, how much AI automation you need, and what you’re willing to spend.

Read Post

logz.io

Read more about Best Log Management Software for DevOps and SRE Teams in 2026: Feature and Cost Breakdown

OnlineOrNot updates from May 2026

Jun 3, 2026 By Max Rozen In OnlineOrNot

Last month I focused on adding Telegram alerts to OnlineOrNot, making scripted browser checks generally available, and making it easier to understand why checks fail.

Read Post

OnlineOrNot

Read more about OnlineOrNot updates from May 2026

Cribl Search Pack for Zscaler: Setup & security dashboard walkthrough

Jun 3, 2026 By Cribl In Cribl

Learn how to install and configure the Cribl Search Pack for Zscaler, then walk through prebuilt dashboards for your Zscaler security logs. This video is for security engineers, Zscaler administrators, and SOC/observability teams using Cribl Search to monitor and investigate Zscaler activity. If you need a reminder or want to share feedback on the pack, you can always refer to the README bundled with the pack or reach out to the Cribl team.

View Video

Cribl

Read more about Cribl Search Pack for Zscaler: Setup & security dashboard walkthrough

Your AI App Is Lying to You - Here's How to Fix That #devops #observability #programming

Jun 3, 2026 By SigNoz - Open Source Observability Platform In SigNoz

You shipped your AI app. But do you have all the answers? Do you actually know which model ran, how many tokens it consumed, or why it stopped? This is what LLM observability gives you, and most AI engineers are skipping it entirely. I built an SOS detection app and used OpenTelemetry to get full visibility into every single call. Token usage, model version, finish reason, and cost per call all in one place, standardised across any provider. Check out the OpenTelemetry GenAI docs in the link below; there is a lot more you can track than you think.

View Video

SigNoz

Read more about Your AI App Is Lying to You - Here's How to Fix That #devops #observability #programming

The Hidden Cost of Network Blind Spots (and How to Fix It)

Jun 3, 2026 By Libby Bagley In WhatsUp Gold

Even the smallest gaps in infrastructure visibility can have a major impact on an enterprise. And as modern IT environments become more complex, expectations for uptime keep rising. Our recent webinar, The Hidden Cost of Network Blind Spots and Alert Noise, covered this exact topic. The Progress WhatsUp Gold product experts explored why traditional monitoring falls short and shared best practices for moving toward smarter, more proactive network management.

Read Post

WhatsUp Gold

Read more about The Hidden Cost of Network Blind Spots (and How to Fix It)

Best APM for Small Teams Without Dedicated DevOps in 2026

Jun 3, 2026 By Sarah Morgan In Scout

You don’t have an SRE. There’s no platform team. Your “monitoring strategy” is someone checking Slack for error alerts. When production breaks, the same two or three senior devs drop everything to debug. Sound familiar? Most APM tools are built for organizations with dedicated operations staff. They assume someone has time to configure dashboards, tune alert thresholds, and learn a complex query language. That person does not exist on your team.

Read Post

Scout

Read more about Best APM for Small Teams Without Dedicated DevOps in 2026

What Enterprise AI Gets Wrong About Usage

Jun 3, 2026 By Dallon Robinette In Selector

AI is moving out of the experimental phase and into the everyday rhythm of work. Teams are no longer using it occasionally for novelty or quick wins, but instead are exploring more robust use cases to investigate issues, answer questions faster, surface context, and help them move through complex workflows with more confidence. That’s the shift that most organizations’ leadership teams have been asking for.

Read Post

Selector

Read more about What Enterprise AI Gets Wrong About Usage

DevEx Talks ep 5 - Accessibility in open source and beyond

Jun 3, 2026 By VictoriaMetrics In VictoriaMetrics

In this episode of DevEx Talks, Mike Gifford shares his perspective on accessibility in today’s tech landscape, drawing from his extensive experience in the field. Together, we explore how well accessibility is currently defined, whether the industry is truly meeting the needs of professionals who rely on it, and what gaps still exist. We also discuss the growing importance of accessibility within Developer Relations and where the biggest opportunities lie to create more inclusive tools, communities, and workflows.

View Video

VictoriaMetrics

Read more about DevEx Talks ep 5 - Accessibility in open source and beyond

Best Error Monitoring for Rails in 2026

Jun 3, 2026 By Sarah Morgan In Scout

You deploy on Friday. Sidekiq starts failing on a job that worked fine in staging. Your error tool shows you a NoMethodError on line 47. But it doesn’t tell you that the job only fails when processing records created after the migration you ran on Thursday. The stack trace is correct and completely useless at the same time. This is the core problem with general-purpose error monitoring on Rails apps. Rails teams deal with N+1 queries that cascade into timeout errors.

Read Post

Scout

Read more about Best Error Monitoring for Rails in 2026

DNS Spy Now Has an MCP Server. Ask Your AI About Any Domain.

Jun 3, 2026 By DNS Spy In DNS Spy

DNS monitoring should be simple. You want to know if something changed. You want to know if a record propagated. You want to know if a phishing site just went live with your brand name in the domain. But in practice it takes work. You log in to a dashboard. You click through menus. You run a check, copy the output, paste it somewhere else. You repeat that process every time someone on the team asks a question. AI assistants like Claude and ChatGPT could help.

Read Post

DNS Spy

Read more about DNS Spy Now Has an MCP Server. Ask Your AI About Any Domain.

IBM Think 2026 Infrastructure Insights for IT Leaders

Jun 2, 2026 By Kristy Slimmer In Galileo

IBM Think 2026 made one thing clear: infrastructure leaders are being asked to support more AI, more automation, and faster decision-making without adding unnecessary complexity or risk. Held earlier this month in Boston, IBM Think 2026 focused heavily on enterprise AI, hybrid cloud, automation, governance, and operational transformation.

Read Post

Galileo

Read more about IBM Think 2026 Infrastructure Insights for IT Leaders

May 2026 product updates

Jun 2, 2026 By Valeria Kurolapova In StatusGator

We’ve been busy shipping new features and enhancements to help you monitor critical services more effectively, investigate incidents faster, and customize your StatusGator experience. This month’s updates include historical outage reports, our new Datadog integration, expanded monitoring coverage in Asia Pacific, improved email branding options, and performance upgrades for monitor metrics. We also crossed a major milestone with more than 8,000 services now monitored by StatusGator.

Read Post

StatusGator

Read more about May 2026 product updates

Service Desk Automation: What It Is and How to Get Started

Jun 2, 2026 By Jagdish Sajnani In Motadata

How much of service desk work is problem solving and how much is repeat work that continues every day? Most service desks follow the same pattern daily. Password resets, access requests, software installs, approvals, and routine fixes keep coming in. These tasks are simple on their own, yet together they take most of the team’s time and push important incidents further down the queue. The main challenge is the constant flow of repeat work that reduces time for focused tasks.

Read Post

Motadata

Read more about Service Desk Automation: What It Is and How to Get Started

How Support Uses Honeycomb to Debug Honeycomb

Jun 2, 2026 By Sara Cave In Honeycomb

You'd think that working at an observability company means everyone knows exactly where to find everything in the data. It doesn't. Especially not on the support team. We're the ones who get the tickets. We're in the telemetry every day trying to figure out what went wrong for a customer, and we do that by pointing Honeycomb at itself. Here's how that actually works, and how it's changed.

Read Post

Honeycomb

Read more about How Support Uses Honeycomb to Debug Honeycomb

How LivePerson optimized Logstash and Kafka performance on GCP through benchmarking

Jun 2, 2026 By Emily Chioconi In Elastic

By benchmarking five GCP machine types across both Logstash and Kafka, LivePerson's observability team found that infrastructure selection (not just pipeline configuration) is one of the highest-leverage cost optimization decisions at scale.

Read Post

Elastic

Read more about How LivePerson optimized Logstash and Kafka performance on GCP through benchmarking

Auvik Brings Multi-Vendor Network Intelligence to AI Agents in Cisco Cloud Control

Jun 2, 2026 By Momoko Ishida In Auvik

Modern IT infrastructure is messy, spanning multiple vendors, cloud platforms, and on-premises systems, with critical data spread across separate tools. This patchwork makes troubleshooting harder for IT teams and AI agents alike, forcing them to piece together operational context from different domains and interfaces before they can act with a complete understanding of the environment. But what if AI agents could pull operational data from across your diverse IT environment and correlate it for you?

Read Post

Auvik

Read more about Auvik Brings Multi-Vendor Network Intelligence to AI Agents in Cisco Cloud Control

Splunk Observability at Cisco Live: Agentic Observability for the AI Era

Jun 2, 2026 By Cale Hilts In Splunk

Observability has always been about seeing clearly under pressure. But the pressure has changed. Applications are more distributed. Kubernetes environments keep expanding. Digital experiences depend on services, APIs, networks, third-party providers, and now AI models and agents that can make decisions faster than a human team can review every signal.

Read Post

Splunk

Read more about Splunk Observability at Cisco Live: Agentic Observability for the AI Era

You don't need a paid plan to use AI Root Cause Analysis

Jun 2, 2026 By Rollbar In Rollbar

When an error appears in production, the hardest part often isn’t seeing what broke. It’s understanding why. That’s why we built Root Cause Analysis (RCA). It helps connect the dots between an error and its likely cause, so you can spend less time investigating and more time moving forward. Until now, RCA was only available through plans that included AI credits. Starting today, free plan users can purchase an AI credit subscription and use RCA without changing plans.

Read Post

Rollbar

Read more about You don't need a paid plan to use AI Root Cause Analysis

DataPrime at ingest (DPXL): See the impact of any routing decision

Jun 2, 2026 By Micha Duman In Coralogix

TCO policies have always been one of the most impactful cost levers in Coralogix. Route business-critical data to High, push monitoring data to Medium, archive compliance logs to Low. With the addition of DataPrime expressions (DPXL) – a subset of the DataPrime query language designed for inline filtering at ingest – that routing became even more precise, matching on any field in the event payload, not just application, subsystem, and severity.

Read Post

Coralogix

Read more about DataPrime at ingest (DPXL): See the impact of any routing decision

Lightweight Server Monitoring - One Binary, No Stack

Jun 2, 2026 By Lionel Porcheron In Bleemeo

Monitoring a single server should not require running four daemons. Yet the default open-source recipe for “I just want to watch this one box” still looks like this: install node_exporter, stand up a Prometheus server to scrape it, add Grafana to draw the graphs, and bolt on Alertmanager so you actually hear about a full disk. That is a lot of moving parts — and a lot of YAML — for one machine. This post shows a lighter path.

Read Post

Bleemeo

Read more about Lightweight Server Monitoring - One Binary, No Stack

The Observability Journey: Getty Images and Cribl

Jun 2, 2026 By Cribl In Cribl

I recently sat down with Simon Overbey and Lovepreet Singh, the engineering manager and systems engineer (respectively) at Getty Images, to talk about their experiences implementing Cribl. After getting a rundown of the pre-Cribl environment (described above), I asked to jump straight to the end: the net benefits. If the "before" was a terrifying tidal wave of cost and complexity, what did the "after" look like?

View Video

Cribl

Read more about The Observability Journey: Getty Images and Cribl

Federated Search | From Silos to Insight | Azure Blob Schema Discovery with Splunk's Crawler

Jun 2, 2026 By Splunk In Splunk

View Video

Splunk

Read more about Federated Search | From Silos to Insight | Azure Blob Schema Discovery with Splunk's Crawler

Observability Summit NA 2026: What the Community Is Thinking About

Jun 2, 2026 By Laura Luttmer In ObservIQ

Two days in Minneapolis with the OpenTelemetry community, talking about where telemetry pipelines are headed and what the AI wave is doing to them. Two topics dominated everything: AI and cost reduction. Not as separate conversations, either. The more the community talked about AI telemetry, the more the cost question followed right behind it. I joined Diana Todea from VictoriaMetrics and Antonio Jimenez Martinez from Cisco ThousandEyes on the Telemetry That Matters panel.

Read Post

ObservIQ

Read more about Observability Summit NA 2026: What the Community Is Thinking About

Microsoft DNS management in OpUtils: One console for complete control

Jun 1, 2026 By Aiswarya Giridharan In ManageEngine

For network administrators, managing DNS has traditionally meant juggling zones and records across separate server interfaces, manually tracking changes, and responding to resolution failures after they’ve already caused disruption. We’re excited to introduce Microsoft DNS management in ManageEngine OpUtils, bringing DNS zone and record administration directly into the same console you already use for IP address management (IPAM).

Read Post

ManageEngine

Read more about Microsoft DNS management in OpUtils: One console for complete control

May 2026 Early Warning Signals

Jun 1, 2026 By Colin Bartlett In StatusGator

In May 2026, StatusGator detected 854 Early Warning Signals across SaaS, cloud, developer, and infrastructure services. Of those incidents, 695 were never acknowledged by providers, while 159 were eventually confirmed on official status pages. Throughout the month, StatusGator’s Early Warning Signals continued to surface emerging outages before many providers published updates, giving teams valuable time to investigate and respond.

Read Post

StatusGator

Read more about May 2026 Early Warning Signals

15 DevOps Metrics Every Engineering Team Should Track in 2026

Jun 1, 2026 By Jagdish Sajnani In Motadata

Software moves from code to production more quickly today, but it is still difficult to tell whether delivery is actually improving or just becoming more active. Most teams rely on dashboards filled with metrics like deployments, uptime, failures, and tickets. The numbers are available, but the meaning behind them is often unclear. DevOps metrics become useful only when grouped into clear categories. DORA metrics cover only delivery speed and stability, which is just part of the picture.

Read Post

Motadata

Read more about 15 DevOps Metrics Every Engineering Team Should Track in 2026

Shifting Streams and AI Surges: What Our Data Reveals About the OTT Landscape

Jun 1, 2026 By Doug Madory In Kentik

OTT data from early 2026 shows streaming hierarchies holding steady while AI platforms reshuffled rapidly. Claude has substantially increased traffic since January, overtaking Gemini, and is on pace to challenge ChatGPT by fall. Doug Madory digs into the data in this new analysis.

Read Post

Kentik

Read more about Shifting Streams and AI Surges: What Our Data Reveals About the OTT Landscape

Inside the Grafana AI Team Weekly: AI Observability for the OTel demo and LLMSpec (May 12, 2026)

Jun 1, 2026 By Grafana In Grafana

This is an excerpt from a real AI team weekly meeting where we talk about the stuff we build and occasionally also demo it! In this one, Principal Software Engineer Sven Großmann demos how he integrated AI Observability into the OTel demo, complete with the guards feature he introduced last week, and Principal Software Engineer Yas Ekinci gives a rare glimpse of LLMSpec, the internal counterpart of the o11ybench benchmark that we use to evaluate Assistant.

View Video

Grafana

Read more about Inside the Grafana AI Team Weekly: AI Observability for the OTel demo and LLMSpec (May 12, 2026)

What's New in Tempo 3.0

Jun 1, 2026 By Grafana In Grafana

Tempo 3.0 introduces a major architectural shift that decouples the read and write paths, with Kafka handling durability on the write side and a new live store serving recent traces on the read side. Blocks are now written at a replication factor of one instead of three, significantly reducing storage overhead. This release also brings TraceQL metrics to general availability, adds comparison operators for filtering metric results at query time, and introduces a new Tempo CLI redact command for removing sensitive trace data on demand without waiting for retention to expire.

View Video

Grafana

Read more about What's New in Tempo 3.0

VictoriaMetrics May 2026 Ecosystem Updates

Jun 1, 2026 By Pablo Fernandez In VictoriaMetrics

The VictoriaMetrics Observability Stack included two releases for VictoriaMetrics and a new LTS release for the VictoriaMetrics Operator in May. We’ve also published two detailed articles in the last few weeks: This release roundup covers updates for.

Read Post

VictoriaMetrics

Read more about VictoriaMetrics May 2026 Ecosystem Updates

A deep dive into AWS data perimeter misconfigurations

Jun 1, 2026 By Mallory Mooney In Datadog

In AWS environments, a data perimeter is a set of preventative controls that help ensure that your trusted cloud identities (principals or AWS services acting on your behalf) are accessing trusted resources from authorized networks. You can apply these controls at various levels of your infrastructure, such as per resource or across all resources in your AWS account.

Read Post

Datadog

Read more about A deep dive into AWS data perimeter misconfigurations

How we cut Spark compute costs by 44% with agentic AI and Datadog Jobs Monitoring

Jun 1, 2026 By Charles Yu In Datadog

Spark jobs only get more expensive and harder to debug as they scale. It’s a problem we’ve run into ourselves. Our Referential Data Platform team builds and maintains the knowledge graph that maps relationships between customers’ observability entities. ServiceQueryEdge is at the center of that graph, mapping service entities to their associated metric and log queries.

Read Post

Datadog

Read more about How we cut Spark compute costs by 44% with agentic AI and Datadog Jobs Monitoring

Migrate to Azure Managed Redis with Datadog and Eden

Jun 1, 2026 By Michael Cronk In Datadog

Azure Managed Redis is a Microsoft first-party, fully managed in-memory data store, replacing Azure Cache for Redis tiers. It includes Redis Enterprise features such as RediSearch for vector search and full-text search, in addition to RedisJSON, RedisTimeSeries, and Active Geo-Replication. As Azure Cache for Redis reaches end of life, more teams are planning migrations to Azure Managed Redis in search of better performance, lower cost, and modern capabilities for AI and real-time workloads.

Read Post

Datadog

Read more about Migrate to Azure Managed Redis with Datadog and Eden

Tempo 3.0 release: a new architecture for scale and lower TCO, TraceQL metrics GA, and more

Jun 1, 2026 By Tiffany Jernigan In Grafana

Tempo started with a simple goal: make distributed tracing easier to run at scale. As tracing adoption has grown, however, so have the challenges, including higher data volumes, more complex architectures, and increasing demand for real-time insights directly from traces. Over the last year, we’ve been evolving Tempo’s architecture to meet that moment. And today, we’re sharing the results of those efforts with the release of Tempo 3.0.

Read Post

Grafana

Read more about Tempo 3.0 release: a new architecture for scale and lower TCO, TraceQL metrics GA, and more

What a Forrester TEI study on Edwin AI actually tells IT leaders-and how to use it

Jun 1, 2026 By Margo Poda In LogicMonitor

This blog helps IT leaders use the Forrester Consulting TEI study as a practical framework for evaluating Edwin AI in their own environments. A Total Economic Impact study is useful for one critical reason: it takes a broad technology claim and turns it into a financial and operational framework. That matters in AI for IT operations because the market is crowded with claims. Every platform says it reduces noise. Every platform says it improves efficiency. Every platform says it helps teams move faster.

Read Post

LogicMonitor

Read more about What a Forrester TEI study on Edwin AI actually tells IT leaders-and how to use it

Why Spark Felt Different From Every Other AI Support Tool

Jun 1, 2026 By Nexthink In Nexthink

Brandon Woods, Keysight Technologies, on the Difference Between a DEX-Powered IT Agent and a Generic AI Assistant: “Spark’s real power is the Nexthink data behind it.”

Read Post

Nexthink

Read more about Why Spark Felt Different From Every Other AI Support Tool
