Monthly Archive

Complete Guide to Redis Monitoring: Essential Metrics, Tools & Best Practices 2025

Jul 31, 2025 By Ankit Anand In SigNoz

Redis is a powerful tool, but its position in the critical path of applications means that performance issues can have a widespread impact. Whether you use Redis as a cache, session store, or primary database, effective monitoring is essential to prevent slowdowns and ensure a responsive user experience. This guide provides a comprehensive walkthrough of Redis monitoring, covering the essential metrics you need to track, the tools available to you, and the best practices to adopt in 2025.

Read Post

Cloudflare's DNS Downtime: Why BGP Hijacks Were Never to Blame

Jul 31, 2025 By Doug Madory In Kentik

On July 14, Cloudflare’s popular public DNS service (known as 1.1.1.1) suffered an outage lasting over two hours. As rumors swirled about the cause, we were the first to push back on the theory that a BGP hijack had caused the outage. In fact, the hijack was actually a consequence. How did we know this so early when other internet watchers did not? We’ll discuss in this post.

Read Post

This Month in Datadog - July 2025

Jul 31, 2025 By Datadog In Datadog

In July’s episode of This Month in Datadog, we’re doing things differently by spotlighting the people behind the products you rely on. Jeremy is joined by Tristan Ratchford to discuss saving time and effort when you’re on call with Bits AI SRE, and by Kevin Hu to explore gaining visibility into datasets across the entire data lifecycle with Data Observability.

Read Post

Streamlining the Complexity of SD-WAN Deployments With DX NetOps Topology

Jul 31, 2025 By Sandeep Tiwary In Broadcom

If you're feeling like your network operations just keep getting more complicated, you're not wrong. One of the core promises of cloud models was improved simplicity. However, the ensuing reality for your network operations teams has been anything but simple. Suddenly, users and applications are everywhere. Traditional, on-premises equipment now coexists with software-defined wide area networks (SD-WANs), cloud-hosted resources, and hybrid connections that hop across public and private networks.

Read Post

With AI, You're Gonna Have to Manage Your (Massive) Energy Use in SPM

Jul 31, 2025 By Jason Kotlinski In Broadcom

Forget boring spreadsheets. Strategic portfolio management (SPM) isn't just about ticking boxes. It’s the big boss plan that makes sure every penny spent and every project your company starts points towards the main goal. It's your company's smart GPS, guiding you through the AI energy maze. When it comes to AI's power hunger, SPM is a knight in shining armor. It helps leaders get smart, making sure they grab all the fancy tech without trashing the world.

Read Post

Smarter debugging with Sentry MCP and Cursor

Jul 31, 2025 By Cody De Arkland In Sentry

Debugging a production issue with Cursor? Your workflow probably looks like this: Alt-Tab to Sentry, copy error details, switch back to your IDE, paste into Cursor. By the time you’ve context-switched three times, you’ve lost your flow and you’re looking at generic suggestions that don’t show any understanding of your actual production environment or codebase.

Read Post

Preparing for Infoblox NetMRI End-of-Life: Why Restorepoint is the Ideal Replacement

Jul 31, 2025 By ScienceLogic In ScienceLogic

When a trusted tool like NetMRI reaches its sunset date, it opens the door to modern alternatives that offer more automation, broader integration, and a lower total cost of ownership. You’ve invested time, training, and trust into this solution, and while it may feel like the rug is being pulled out, this is an opportunity to improve how your organization handles network configuration and change management.

Read Post

OpenTelemetry Distributed Tracing Implementation Guide

Jul 31, 2025 By Logit.io Team In Logit.io

Distributed tracing has become essential for understanding the performance and behavior of modern microservices architectures. As applications become more complex with multiple services communicating across different environments, traditional logging and metrics alone are insufficient for debugging performance issues and understanding request flows.

Read Post

Site24x7 partners with BigPanda agentic IT operations platform to further streamline IT operations

Jul 31, 2025 By Ramkumar Ramaswamy In Site24x7

In modern IT management, downtime, performance issues, and alert overload cripple teams, delay resolutions, and frustrate users—a problem solvable with automation and deep integrations that create smoother flow across systems.

Read Post

Semantic Caching: What We Measured, Why It Matters

Jul 31, 2025 By Rahul Raj In Catchpoint

Semantic caching promises to make AI systems faster and cheaper by reducing duplicate calls to large language models (LLMs). But what happens when it doesn’t work as expected? We built a test environment to find out. Through a caching system, we evaluated how semantically similar queries would behave. When the cache worked, response times were fast. When it didn’t, things got expensive. In fact, a single semantic cache miss increased latency by more than 2.5x.

Read Post

Out-of-the-box Alerting for Frontend Observability in Grafana Cloud

Jul 31, 2025 By Grafana In Grafana

Get alerted on frontend issues the moment they happen, with no setup headaches required. In this short demo, Elliot Kirk from Grafana Labs introduces out-of-the-box alerting for frontend observability. Whether you're tracking error counts or web vitals, this new feature makes it easy to stay ahead of performance issues. With just a few clicks, you can enable prebuilt alerts for your apps, visualize and edit alerts directly in the UI, customize thresholds and durations, set up notifications and stay in the loop, and launch alerting with every new app setup.

View Video

k8s-monitoring-helm Chart Office Hours (July 2025)

Jul 31, 2025 By Grafana In Grafana

In the July edition of the Kubernetes Monitoring Helm chart office hours, we discuss the version 3.2 release as well as the plan for upcoming features. Finally, we end with a packed Q&A full of great questions.

View Video

Kentik Cause Analysis in 60 Seconds

Jul 31, 2025 By Kentik In Kentik

In a world where network traffic can suddenly spike, manually sifting through flow data is often a daunting task. Kentik AI's new Cause Analysis simplifies troubleshooting by quickly identifying changes in traffic by application, IP, ASN, or service. With just a few clicks, Cause Analysis helps you compare time periods, understand traffic shifts, and detect changes in your network. Kentik: Take the hard work out of running your network.

View Video

SentinelOne outage: July 10 incident went unacknowledged

Jul 31, 2025 By Colin Bartlett In StatusGator

On July 10, 2025, SentinelOne, a leading cybersecurity platform, experienced a widespread outage that disrupted access to its admin consoles across multiple regions. The incident impacted users in Europe, North America, and beyond, preventing security teams from accessing critical management features. Despite the scale of the disruption, no official public acknowledgment or status update was issued by SentinelOne.

Read Post

Google Workspace outage: July 18, 2025

Jul 31, 2025 By Colin Bartlett In StatusGator

Google Workspace went down again in July 2025—but if you had asked AI tools like Google’s own AI Overviews, ChatGPT, or Claude, you would have been told everything was fine. Every one of these tools incorrectly claimed that services were up and running while users across the globe were unable to connect, send messages, or even log in.

Read Post

Catchpoint News Catchup episode 5

Jul 31, 2025 By Catchpoint In Catchpoint

Join Ankit, Payal, and Leon as they explore recent articles about the impact of unoptimized images on both your customers and your carbon footprint; how generative AI can be confident but not correct; and the pros, cons, challenges, and benefits of disconnecting from our technology once in a while.

View Video

Vector Databases Explained: What they are & Why they Matter [Quick Question Ep. 2]

Jul 31, 2025 By Elastic In Elastic

Ever wondered what a vector database is and why it’s becoming so important in AI search? In this quick video, I’ll break down what a vector database is, how it works, and what you should consider when choosing one.

View Video

Confessions of a CTO: How we Tamed our Cloud Costs

Jul 31, 2025 By Ledion Bitincka In Cribl

If you’ve ever found yourself staring at a cloud bill that could buy a small island or at least a very nice car, you're not alone. Believe me, at Cribl, we've had our share of those "molotov cocktail" bills that make our CFO, Zach, look like he's about to spontaneously combust. And yeah, a few F-bombs might have dropped from various senior leaders (myself included, I won't lie).

Read Post

Building a bulletproof network disaster recovery plan

Jul 30, 2025 By akash.mj@zohocorp.com In ManageEngine

Imagine it’s 2am. A core switch fries because of a sudden power surge. Most of your users wake up to a blank screen. Your team scrambles: Where’s the backup configuration? Who knows the last working state? Hours pass, productivity tanks, support calls flood in, and costs stack up by the minute. This isn’t a theoretical horror story. According to Gartner, the average cost of network downtime still hovers around $5,600 per minute, or over $300,000 per hour.

Read Post

How to use predictive AI for enterprise ITOps

Jul 30, 2025 By Sharon Abraham Ratna In ManageEngine

Learn how predictive AI is transforming enterprise ITOps from reactive to proactive by using machine learning to detect issues, forecast incidents, and automate responses.

Read Post

From Anomaly to Action: ScienceLogic's Role in Accelerating Zero Trust Response

Jul 30, 2025 By ScienceLogic In ScienceLogic

In today’s threat landscape, cyber incidents unfold in seconds, not days. Federal agencies and critical infrastructure operators no longer have the luxury of slow detection or manual triage. As Zero Trust Architecture (ZTA) becomes the new security standard, one principle stands above all: time is risk. The faster an organization can detect, diagnose, and respond to anomalous activity, the greater its resilience. ScienceLogic plays a critical role in making that speed possible.

Read Post

The Network Impact on Job Completion Time in AI Model Training

Jul 30, 2025 By Phil Gervasi In Kentik

In large-scale AI model training, network performance is no longer a supporting actor — it’s center stage. Job Completion Time (JCT), the key metric for measuring training efficiency, is heavily influenced by the network interconnecting thousands of GPUs. In this post, learn why JCT matters, how microbursts and GPU synchronization delays inflate it, and how platforms like Kentik give network engineers the visibility and intelligence they need to keep training jobs on schedule.

Read Post

Datadog Disaster Recovery mitigates cloud provider outages

Jul 30, 2025 By Michael Richey In Datadog

A loss in infrastructure and applications observability can leave SRE and DevOps teams without insight into the real-time state of their production systems, causing them to temporarily pause code deployments and limit their ability to troubleshoot issues or respond to critical alerts. In modern cloud environments, where services are distributed and deeply interconnected, this lack of visibility can escalate quickly.

Read Post

Bring high-performance observability to secure Kubernetes environments with Datadog's new CSI driver

Jul 30, 2025 By Adel Haj Hassan In Datadog

In Kubernetes environments, applications often communicate with the Datadog Agent to send telemetry data such as custom metrics via DogStatsD or traces through Datadog APM. How this communication takes place depends on the communication mode set on the Datadog Cluster Agent's Admission Controller. With the sockets option, communication takes place through local inter-process communication via Unix domain sockets (UDS), whereas the service and default hostip options rely on network communication.

Read Post

Azure native integration elevates Elastic Cloud Serverless experience

Jul 30, 2025 By Piyush Dash In Elastic

We're thrilled to announce a significant leap forward in making Elastic Cloud Serverless even more accessible and powerful for Azure users. With the general availability (GA) of Elastic Cloud Serverless on Azure, we've just released the Azure native integration for Elastic Cloud Serverless. This builds upon our existing Azure native integration for Elastic Cloud Hosted, allowing users to seamlessly discover and manage Elastic Cloud in a way that feels inherently part of the Azure ecosystem.

Read Post

Building an Incident Response Playbook: Templates and Examples

Jul 30, 2025 By Nuno Tomas In isDown

An incident response playbook is your team's emergency manual when things go wrong. It's a documented set of procedures that guides your team through detecting, responding to, and resolving incidents efficiently. Without one, teams often scramble during outages, make inconsistent decisions, and take longer to restore service.

Read Post

What's New in InfluxDB 3.3: Managed Plugins, Explorer Updates, and More

Jul 30, 2025 By Peter Barnett In InfluxData

InfluxDB 3.3 is now available for both Core and Enterprise. It introduces new managed plugins for the Processing Engine, making it easier to address common time series tasks with just a plugin. On top of that, 3.3 includes a wide range of performance improvements, feature updates, and bug fixes. InfluxDB 3 Core is free and open source, optimized for recent data, and licensed under MIT and Apache 2.

Read Post

Is Your "Single Pane of Glass" Leaving You Blind to the Real Problem?

Jul 30, 2025 By Yann Guernion In Broadcom

In the push to simplify IT management, the idea of a single, all-encompassing AIOps platform is certainly appealing. The promise of one dashboard to monitor the entire IT stack—from applications and infrastructure to the network—suggests a world of streamlined operations. This generalist approach aims to provide a broad overview, correlating data from across the business to spot trends and potential issues.

Read Post

Endpoint Monitoring with Icinga

Jul 30, 2025 By Alvar Penning In Icinga

Monitoring with Icinga primarily focuses on servers and infrastructure. But there are also the people operating these systems from their workstations and laptops. If a server can be accessed from a machine with an outdated operating system, the patch level of the server becomes irrelevant.

Read Post

Introducing new issue detectors: Spot latency, overfetching, and unsafe queries early

Jul 30, 2025 By Sasha Blumenfeld In Sentry

Not everything in production is on fire. Sometimes it’s just... a little warm. A page that loads a second too slow. An API that returns way more than anyone asked for. A query that feels totally fine until someone sends something unexpected and suddenly you’ve got an incident.

Read Post

5 Notable Examples of Network Maps and Diagrams

Jul 30, 2025 By Greg Collins In WhatsUp Gold

A network map is a visual representation of the devices and connections that make up an IT network. For IT professionals, network maps are essential tools for monitoring performance, troubleshooting issues, enhancing security and planning infrastructure upgrades. There are multiple types of network maps, each serving a specific purpose, ranging from physical layout diagrams to cloud-based and security-oriented architectures.

Read Post

RUM Versions: one click deployment tracking

Jul 30, 2025 By Ofri Grushka In Coralogix

Deployments should drive your product forward, not slow you down. Yet too often, teams spend hours digging through logs, dashboards, and error reports just to answer a simple question: did the release go smoothly? Coralogix’s new Versions feature answers this in a single click, letting teams spend more time building and less time investigating.

Read Post

Grafana Learning Journeys: Transform data in a Grafana Cloud dashboard

Jul 30, 2025 By Grafana In Grafana

In this Grafana Learning Journey supplementary video, Developer Advocate Marie Cruz shows how to use common data transformations for your Grafana Cloud dashboard.

View Video

Integrating CI/CD Pipelines with Observability Tools

Jul 30, 2025 By Alexandr Bandurchin In Uptrace

CI/CD pipelines are automated workflows that take code from development to production. The term covers two key practices: continuous integration (CI) and continuous delivery or deployment (CD). A typical CI/CD pipeline includes stages like code compilation, testing, security scanning, artifact creation, and deployment across multiple environments.

Read Post

Why Observability Isn't Just for SREs (and How Devs Can Get Started)

Jul 30, 2025 By Elizabeth Mathew In SigNoz

Almost every other day, when I scroll past r/devops or r/sre, I see a post asking how a dev can get started with devops, observability, etc. This blog is an attempt to help anyone who is lost find their way into observability, and a wake-up call for devs that they should think about observability more actively today than ever before.

Read Post

AIOps Tools: Key Features and Top 8 Solutions in 2025

Jul 30, 2025 By Dallon Robinette In Selector

AIOps tools use machine learning, big data, and automation to enhance IT operations. These tools analyze IT data, detect anomalies, and automate tasks, improving efficiency and reducing manual effort. Popular AIOps tools include Selector, Splunk, Dynatrace, Datadog, BigPanda, Dell AIOps, IBM Cloud Pak for AIOps, and LogicMonitor.

Read Post

This Month in Datadog: Bits AI SRE, Datadog Data Observability, and more

Jul 30, 2025 By Datadog In Datadog

Datadog is constantly elevating the approach to cloud monitoring and security. This Month in Datadog updates you on our newest product features, announcements, resources, and events. To learn more about Datadog and start a free 14-day trial, visit Cloud Monitoring as a Service | Datadog. This month, we chat with two guests about Bits AI SRE and Datadog Data Observability.

View Video

Netdata Overview: All You Need to Know in Under 3 Minutes

Jul 30, 2025 By Netdata In netdata

In just a few minutes, this walkthrough will show you how to unlock the full power of Netdata during your trial period. From real-time metrics to AI-powered insights, learn how to get immediate value without any guesswork. Whether you're running a homelab or managing production systems at scale, this video will help you hit the ground running and make every minute of your trial count. Let’s turn your trial into insight, clarity, and control.

View Video

Sponsored Post

AIOps for SAP: From Ground to Cloud

Jul 29, 2025 By Avantra Team In Avantra

Anyone working in the SAP market in 2025 is aware of two big topics: migration to cloud-based ERP and the end of many long-used tools for managing SAP operations including Focused Run, Landscape Manager and Solution Manager. Both are impossible to ignore. Cloud-based ERP presents a new era of business software possibilities, and with it the opportunities and complexities of migration, transformation, and leveraging the elastic capacity and scalability of cloud-based designs. But right behind it, the question becomes "how are we going to run and manage this?"

Read Post

IT teams overconfident in resilience as outages still consume a quarter of their time

Jul 29, 2025 By SolarWinds In SolarWinds

New SolarWinds report suggests IT leaders underestimate the impact of broken processes and limited staff.

Read Post

AI gone too far: Is it time to hit the brakes?

Jul 29, 2025 By Alsherin In ManageEngine

Without our even realizing it, AI has become a part of our lives, more than we could ever imagine. Things have gotten a lot easier lately. A LOT. Which sounds like a good thing, right? Well… maybe not.

Read Post

Smarter Insights and Pipeline Control - New in DataStream

Jul 29, 2025 By VirtualMetric In VirtualMetric

We’re constantly improving DataStream to make security data management simpler, smarter, and more efficient for modern SOCs. This latest update introduces new capabilities that bring even more visibility and flexibility to your telemetry pipelines. Let’s take a closer look at what’s new.

Read Post

Real-Time Flight Telemetry Monitoring with InfluxDB 3 Enterprise

Jul 29, 2025 By Heather Downing In InfluxData

When Microsoft Flight Simulator 2024 generates telemetry data at 30-60 FPS, capturing and processing that stream in real-time becomes a fascinating engineering challenge. We built a complete telemetry pipeline that reads over 90 flight parameters through FSUIPC, streams them to InfluxDB 3 Enterprise, and displays them in real-time dashboards that respond in under 5 milliseconds.

Read Post

Diagnosing Wi-Fi failures that traditional tools miss: a case study

Jul 29, 2025 By Payal Chakraborty In Catchpoint

A global airline experienced persistent Google Meet connectivity issues with no apparent network infrastructure faults. While their APM tool offered visibility into network paths, it didn’t surface any local anomalies. Catchpoint’s endpoint monitoring, however, revealed performance degradation specifically on Wi-Fi Channel 44 (5GHz band), where signal strength dropped to -80 dBm compared to optimal ranges of -30 to -50 dBm.

Read Post

How we're killing YAML fatigue with our new K8s integration process

Jul 29, 2025 By Chris Cooney In Coralogix

Kubernetes has rapidly grown in adoption, with more than 84% of surveyed users evaluating or actively using Kubernetes in some way. It has become the go-to container orchestration platform. As we grow the Coralogix platform, we continuously go back and improve flows that we believe will have a high impact on our user base.

Read Post

Splunk Expands Data Management Capabilities To Include Ingest Monitoring

Jul 29, 2025 By Varun Gupta In Splunk

Managing data ingestion at scale is no easy task. As organizations onboard hundreds or even thousands of data sources into the Splunk platform for security, observability, and other business-critical use cases, it becomes increasingly complex to ensure data is consistently available and onboarded efficiently.

Read Post

Multi Factor Authentication for Synthetic Monitoring for AVD

Jul 29, 2025 By SatheeshKumar S In eG Innovations

Today, I’ll cover some of the basics of monitoring Multi-Factor Authentication and why ensuring MFA is implemented is essential, particularly in environments where remote access is possible. I’ll cover some recent, specific case studies where a lack of MFA has led to security breaches and the mechanisms the bad actors used.

Read Post

AI Agents Console: Monitor the behavior and interactions of any AI agent in your stack

Jul 29, 2025 By Datadog In Datadog

With Datadog's AI Agents Console, you can monitor the behavior and interactions of any AI agent that’s a part of your enterprise stack, whether that’s a computer use agent like OpenAI’s Operator, an IDE agent like Cursor, a DevOps agent like GitHub Copilot, an enterprise business agent like Agentforce, or your internally built agents. You'll have full visibility into every agent's actions, insights into the security and performance of your agents, analytics on user engagement, and measurable business value from every agent, all in a centralized location.

View Video

New in APM

Jul 29, 2025 By Datadog In Datadog

Datadog’s Latency Investigator for APM, now in Preview, automatically investigates hypotheses in the background, comparing historical traces and correlating change tracking, DBM, and profiling signals. This helps teams quickly isolate root causes and understand impact without combing through raw telemetry data. You can go from detection to resolution in a single workflow, and generate a pull request to apply a recommended fix, all without leaving Datadog.

View Video

Evals are just tests, so why aren't engineers writing them?

Jul 29, 2025 By Eli Hooten In Sentry

You’ve shipped an AI feature. Prompts are tuned, models wired up, everything looks solid in local testing. But in production, things fall apart—responses are inconsistent, quality drops, weird edge cases appear out of nowhere. You set up evals to improve quality and consistency. You use Langfuse, Braintrust, Promptfoo—whatever fits. You start running your evals, tracking regressions, fixing issues, and confidence goes up as a result. Things improve.

Read Post

Micro Lesson: Using Search Template Parameters

Jul 29, 2025 By Sumo Logic, Inc. In Sumo Logic

In this video we'll show you how to use search template parameters to add flexibility and convenience to your Sumo Logic log searches and dashboards.

View Video

Don't fly blind... monitor from your users' perspective.

Jul 29, 2025 By Catchpoint In Catchpoint

Most monitoring strategies focus only on what happens inside your applications... but that’s not what your users experience. From your backend to the cloud, through third-party APIs, DNS, CDNs, ISPs, and finally to the user’s device, every link in the chain matters. Without that visibility, you're flying blind when something breaks in your Internet Stack. Catchpoint’s 3,000+ intelligent agents across 100+ countries deliver true end-to-end visibility, capturing every hop, every variable, and every moment of user impact.

View Video

Incident IQ integration is here!

Jul 29, 2025 By Colin Bartlett In StatusGator

We’re excited to launch one of our most highly requested integrations: StatusGator now connects directly with Incident IQ. This powerful new integration bridges the gap between real-time service monitoring and your internal support workflow. Now, whenever someone reports an outage on your public StatusGator page, a ticket is automatically created in Incident IQ—ensuring your IT team can respond quickly and efficiently.

Read Post

From Alert to Answer in Seconds: Accelerating Incident Response in Dynatrace

Jul 29, 2025 By Mezmo In Mezmo

It is 12 PM and you have just started eating lunch when your phone starts buzzing. A storm of monitoring and system-level alerts starts stacking up on your phone and Slack. The incident response "war room" opens and downtime communications are being drafted to customers. Your team is under pressure to find the root cause, but you are immediately hit with roadblocks.

Read Post

Building an Effective Post-Mortem Culture: A Step-by-Step Guide

Jul 29, 2025 By Nuno Tomas In isDown

Post-mortems are the cornerstone of continuous improvement in incident management. When done right, they transform failures into learning opportunities and prevent future outages. Yet many teams struggle to build a culture where post-mortems are valued rather than feared.

Read Post

isDown

Read more about Building an Effective Post-Mortem Culture: A Step-by-Step Guide

What is Grafana Cloud? Fully Managed Observability Built on Open Standards | Grafana Labs

Jul 29, 2025 By Grafana In Grafana

Grafana Cloud helps teams detect, investigate, and resolve incidents faster—thanks to AI, open standards, and seamless integrations with OpenTelemetry, Prometheus, Salesforce, and more. See how it all works in this live demo of a simulated e-commerce outage.

View Video

Grafana

Read more about What is Grafana Cloud? Fully Managed Observability Built on Open Standards | Grafana Labs

Hands-On with Continuous Observability

Jul 29, 2025 By Johan Kraft (PhD) In Percepio

Ask any embedded developer about their worst debugging experience, and chances are you’ll hear stories of unreproducible bugs, late-night watchdog resets, or CI test failures with no trace. Traditional tools often leave us blind at the exact moment we need insight.

Read Post

Percepio

Read more about Hands-On with Continuous Observability

New in OTel: Auto-Instrument Your Apps with the OTel Injector

Jul 29, 2025 By Anjali Udasi In Last9

As distributed systems scale, maintaining manual instrumentation across services quickly becomes unsustainable. The OTel Injector addresses this by automatically attaching OpenTelemetry instrumentation to applications, no code changes needed. This blog covers how the OTel Injector works, how it integrates with Linux environments, and how to set it up for consistent telemetry across your stack.

Read Post

Last9

Read more about New in OTel: Auto-Instrument Your Apps with the OTel Injector

Why Your Loki Metrics Are Disappearing (And How to Fix It)

Jul 29, 2025 By Faiz Shaikh In Last9

Grafana Loki is up and running, log ingestion looks healthy, and dashboards are rendering without issues. But when you query logs from a few weeks ago, the data's missing. This is a recurring problem for many teams using Loki in production: while the system handles short-term log visibility well, it often lacks the retention guarantees developers expect for historical analysis and incident review.

Read Post

Last9

Read more about Why Your Loki Metrics Are Disappearing (And How to Fix It)

Disposable Code Is Here to Stay, but Durable Code Is What Runs the World

Jul 29, 2025 By Charity Majors In Honeycomb

Every day I seem to run into yet another post with someone solemnly opining that “writing code has never been the hardest part of software engineering.” And hey, that’s smashing. As an engineer from the ops/infra/SRE side of the house, I feel like I’ve been saying this my whole career. (Is there anything more satisfying than being proven right in public? Not in my book.) So, which is it?

Read Post

Honeycomb

Read more about Disposable Code Is Here to Stay, but Durable Code Is What Runs the World

Data Observability: Build confidence in the data life cycle

Jul 29, 2025 By Datadog In Datadog

Datadog Data Observability provides a complete solution with quality checks (e.g., volume, row changes, freshness), custom SQL-based monitors, anomaly detection, column-level lineage across systems like Snowflake and Tableau, full pipeline visibility, and targeted alerts when data issues arise.

View Video

Datadog

Read more about Data Observability: Build confidence in the data life cycle

Automatically Catch Failed HTTP Requests in your Playwright Tests!

Jul 29, 2025 By Checkly In Checkly

In this video, Stefan (Playwright Ambassador) dives into his favorite topic, Playwright fixtures, and shows how to set up automatic network monitoring in your Playwright end-to-end tests to catch failed HTTP requests (404s, 500s) to maintain high-quality standards.

View Video

Checkly

Read more about Automatically Catch Failed HTTP Requests in your Playwright Tests!

Scaling Online Game Infrastructure for High-Engagement PvM Content

Jul 29, 2025 By OpsMatters In OpsMatters

The explosive popularity of player-versus-monster (PvM) content in online games brings significant backend challenges, particularly as titles scale globally. Instanced boss fights, real-time combat logic, and mass player concurrency demand robust, responsive server infrastructure that can scale both horizontally and vertically - without degrading the player experience.

Read Post

OpsMatters

Read more about Scaling Online Game Infrastructure for High-Engagement PvM Content

How to Measure VoIP Quality & MOS Score (Mean Opinion Score)

Jul 28, 2025 By Alyssa Lamberti In Obkio

Are you tired of constantly dropping calls or struggling to hear your loved ones on the other end of the line? Fear not, because we're here to talk about the one thing that can make or break your VoIP experience: MOS score. No, we're not talking about the fuzzy creature from Star Wars - we're talking about the Mean Opinion Score, the nifty little metric that can help you measure and improve the quality of your VoIP calls.

Read Post

Obkio

Read more about How to Measure VoIP Quality & MOS Score (Mean Opinion Score)

Proven escalation policy framework (w/ templates & checklists)

Jul 28, 2025 By Leo Baecker In Hyperping

I bet every support team lead has had that moment — a critical incident spiraling out of control because nobody knew exactly when or how to escalate it. Been there, done that. But here's the thing — most organizations treat escalation policies as an afterthought, usually cobbling together makeshift procedures only after a major incident has already caused havoc. There's nothing wrong with learning from experience, of course. It's just not the best approach. So what's better?

Read Post

Hyperping

Read more about Proven escalation policy framework (w/ templates & checklists)

Coralogix secures 188 badges in G2 Summer 2025 Reports

Jul 28, 2025 By Coralogix Team In Coralogix

As we cruise through 2025 with momentum from our recent $115M Series E raise, the launch of Olly (our AI agent for observability), and our recognition as a Visionary in Gartner’s Magic Quadrant for Observability Platforms, we’re excited to celebrate another major milestone – earning 188 badges in the G2 Summer 2025 reports! At the heart of every G2 badge we earn is the voice of our customers, and their continued trust is what drives us forward.

Read Post

Coralogix

Read more about Coralogix secures 188 badges in G2 Summer 2025 Reports

Here's how you can monitor your site's SEO performance

Jul 28, 2025 By Laurens Goethals In Oh Dear

SEO is in a weird place right now. About one in five LinkedIn posts in my feed currently claims that SEO is dead, or has been assimilated by LLMs. Do not be remiss, dearest reader, because even an LLM still uses search engines like Google and Bing for web crawling. In other words, SEO still matters, a lot. Additionally, it's never a bad idea to keep tabs on your website's SEO performance.

Read Post

Oh Dear

Read more about Here's how you can monitor your site's SEO performance

How MSPs Can Offer DNS Monitoring as an Add-On Service

Jul 28, 2025 By DNS Spy In DNS Spy

Most MSPs don’t advertise DNS monitoring as a service—but they should. Why? Because when DNS goes wrong, your client won’t blame their registrar or email provider. They’ll blame you. And the worst part? You probably didn’t know anything had changed until the problem reached your inbox.

Read Post

DNS Spy

Read more about How MSPs Can Offer DNS Monitoring as an Add-On Service

How to Create a Runbook Template That Actually Gets Used

Jul 28, 2025 By Nuno Tomas In isDown

A runbook template is only valuable if your team actually uses it during incidents. Yet many organizations create elaborate documentation that sits untouched in wikis, gathering digital dust while engineers scramble through incidents without guidance. The difference between a runbook that gets used and one that doesn't comes down to practicality, accessibility, and continuous improvement. Let's explore how to create runbook templates that become essential tools rather than checkbox exercises.

Read Post

isDown

Read more about How to Create a Runbook Template That Actually Gets Used

How Prometheus 3.0 Fixes Resource Attributes for OTel Metrics

Jul 28, 2025 By Anjali Udasi In Last9

When you export OpenTelemetry metrics to Prometheus, resource fields like service.name or deployment.environment don’t show up as metric labels. Prometheus drops them. To use them in queries, you’d have to join with target_info. This makes filtering and grouping more difficult than necessary. Prometheus 3.0 changes that. It supports resource attribute promotion—automatically converting OpenTelemetry resource fields into Prometheus labels.

Read Post

Last9

Read more about How Prometheus 3.0 Fixes Resource Attributes for OTel Metrics

OTel Weaver: Consistent Observability with Semantic Conventions

Jul 28, 2025 By Anjali Udasi In Last9

Deploying a new service shouldn’t break dashboards. But it happens, usually because metric names or labels aren’t consistent across teams. You end up with traces that don’t link, metrics that don’t align, and queries that take hours to debug, not because the system is complex, but because the telemetry is fragmented. OTel Weaver addresses this by enforcing OpenTelemetry semantic conventions at the source.

Read Post

Last9

Read more about OTel Weaver: Consistent Observability with Semantic Conventions

Grafana 12.1 release: automated health checks for your Grafana instance, streamlined views in Grafana Alerting, visualization updates, and more

Jul 28, 2025 By Grafana Labs Team In Grafana

It’s official: Grafana 12.1 is here! The latest release delivers new features that simplify the management of Grafana instances, streamline how you manage alert rules (so you can find the alerts you need, when you need them), and more. Below are just some of the highlights from the latest Grafana release. If you are looking for more details about all the changes in this release, refer to the changelog or the What’s New documentation.

Read Post

Grafana

Read more about Grafana 12.1 release: automated health checks for your Grafana instance, streamlined views in Grafana Alerting, visualization updates, and more

Top 10 Status Page Examples: What We Like and What's Missing

Jul 28, 2025 By Sara Miteva In Checkly

A great status page does more than show uptime—it builds trust, communicates clearly during incidents, and empowers users to stay informed. Here are 10 standout examples of public status pages, with a quick breakdown of what they do well—and where there’s room for improvement.

Read Post

Checkly

Read more about Top 10 Status Page Examples: What We Like and What's Missing

Unifying Observability: Intelligence, Automation, and Insights in Action

Jul 28, 2025 By ScienceLogic In ScienceLogic

As enterprise IT environments evolve into ever-greater complexity and scale, demands on operations teams are accelerating. In the traditional model, observability tools collect data, engineers manually correlate events, and remediation follows a ticketing trail. However, that approach no longer matches the speed and scale of today’s digital businesses. Even the most storied dashboards can’t address today’s operational needs.

Read Post

ScienceLogic

Read more about Unifying Observability: Intelligence, Automation, and Insights in Action

5 Assumptions CIOs Need to Rethink: Monitoring in the Age of Complexity

Jul 28, 2025 By Catchpoint In Catchpoint

Today’s digital delivery models have fundamentally changed, yet many CIOs are still using monitoring strategies built for a world that no longer exists. With Internet dependencies, external APIs, SaaS platforms, CI/CD pipelines, and microservices dominating modern architectures, performance and reliability now hinge on systems IT teams don’t fully control. Traditional, reactive monitoring tools fail to provide visibility into the end-to-end experience. They alert you after the customer has already felt the pain.

View Video

Catchpoint

Monitoring

Read more about 5 Assumptions CIOs Need to Rethink: Monitoring in the Age of Complexity

Observing Vercel AI SDK with OpenTelemetry + SigNoz

Jul 28, 2025 By Goutham Karthi In SigNoz

LLM-powered apps are growing fast, and frameworks like the Vercel AI SDK make it easy to build them. But with AI comes complexity. Latency issues, unpredictable outputs, and opaque failures can impact user experience. That’s why monitoring is essential. By using OpenTelemetry for standard instrumentation and SigNoz for observability, you can track performance, detect errors, and gain insights into your AI app’s behavior with minimal setup.

Read Post

SigNoz

Read more about Observing Vercel AI SDK with OpenTelemetry + SigNoz

Updated MPLAB X IDE Plugin

Jul 28, 2025 By Percepio In Percepio

We’re happy to announce that our Trace Export Plugin for MPLAB X IDE has been updated to version 2.3.1 and now supports the latest versions of Microchip’s IDE, including MPLAB X v6.20 and v6.25. This plugin enables saving trace files from Percepio’s TraceRecorder library via the MPLAB X IDE debugger, making it easy to open the trace in Percepio Tracealyzer and related tools.

Read Post

Percepio

Read more about Updated MPLAB X IDE Plugin

How I Use GenAI as a Thought Partner, Not a Shortcut

Jul 28, 2025 By Katie Leonard In Honeycomb

You don’t need to be a power user to get powerful results. I’m not training models or prompting GPTs into poetry—I’m just using them to do what great managers already try to do: communicate clearly, prioritize outcomes, and lead with intention. Over the last few quarters, I’ve built a handful of custom GPTs to support my weekly, monthly, and quarterly workflows.

Read Post

Honeycomb

Read more about How I Use GenAI as a Thought Partner, Not a Shortcut

Deploying a WhatsUp Gold 360 Connector for VMware

Jul 28, 2025 By Progress WhatsUp Gold In WhatsUp Gold

WhatsUp Gold 360 provides real-time insights into your internet connectivity to your remote sites through the use of connectors. Watch this video to learn how to create and deploy a WhatsUp Gold 360 connector to a VMware environment.

View Video

WhatsUp Gold

Read more about Deploying a WhatsUp Gold 360 Connector for VMware

How Synthetic Monitoring Can Warm Up Your CDN (and Why It Matters)

Jul 26, 2025 By Dotcom-Monitor In Dotcom-Monitor

In the high-stakes world of web performance, every millisecond counts. A single second of delay can result in a 7% reduction in conversions, while 10% of users will abandon a site for every additional second it takes to load. For organizations operating at global scale, Content Delivery Networks (CDNs) have become indispensable infrastructure for delivering fast, reliable user experiences.

Read Post

Dotcom-Monitor

Read more about How Synthetic Monitoring Can Warm Up Your CDN (and Why It Matters)

13 Best Log Analysis Tools of 2025. Top Paid, Free & Open-Source Log Analyzers Reviewed

Jul 25, 2025 By Rafal Kuć In Sematext

Log analysis and management tools have become essential in troubleshooting. With log analyzers you can extract meaningful data from logs to pinpoint the root cause of any app or system error, and find trends and patterns to help guide your business decisions, investigations, and security. If you’re not already using such a tool, now is the time to start looking for one.

Read Post

Sematext

Read more about 13 Best Log Analysis Tools of 2025. Top Paid, Free & Open-Source Log Analyzers Reviewed

Grafana Campfire - Using the Grafana MCP Server (Grafana Community Call - July 2025)

Jul 25, 2025 By Grafana In Grafana

In this month's Campfire Community Call, we will be exploring the Grafana MCP (Model Context Protocol) server - an open-source tool that enables AI assistants to directly interact with your Grafana instance. We will learn some of the basics. Join me (Usman), Matt Ryer, and David Kaltschmidt for this exciting session. Expert guests: Ioanna Armouti and Luccas Quadros. Feel free to use the YouTube live chat feature to start submitting questions, and we will add them to the agenda.

View Video

Grafana

Read more about Grafana Campfire - Using the Grafana MCP Server (Grafana Community Call - July 2025)

SD-WAN, SASE, SSE, and the Coffee Shop Network: From Distraction to AI Superpower

Jul 25, 2025 By Teneo In Teneo

Back in 2018, I wondered (perhaps loudly) if SD-WAN was just IT’s hype-of-the-year, destined for the same eye-rolls as signature-based antivirus and GDPR compliance drives. Even then, I knew we couldn’t let messaging fatigue blind us to real technology shifts. Fast-forward to 2025: SD-WAN (Software-Defined Wide Area Network) not only stuck around, but became the springboard to something far bigger – SASE (Secure Access Service Edge).

Read Post

Teneo

Read more about SD-WAN, SASE, SSE, and the Coffee Shop Network: From Distraction to AI Superpower

How AI Agents Reason, Act, and Automate at Scale

Jul 25, 2025 By John Capobianco In Selector

In our previous post, we explored the urgent need for intelligent automation in network automation, specifically how the Model Context Protocol (MCP) enables AI agents to dynamically discover and interact with the necessary tools. But access to tools is only part of the equation. To truly operate autonomously in complex environments, agents need not only connectivity but also intelligence.

Read Post

Selector

Read more about How AI Agents Reason, Act, and Automate at Scale

JSON Flattening: Fix Query Failures and Cut Storage Costs in SigNoz

Jul 25, 2025 By Anushka Karmakar In SigNoz

You have nested JSON in your logs. When you query body.event.type = "user_action", it fails. Here's why and how to fix it.

Read Post

SigNoz

Read more about JSON Flattening: Fix Query Failures and Cut Storage Costs in SigNoz
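
As a rough illustration of the flattening idea in the post above, here is a minimal Python sketch. It is not SigNoz's implementation: the `flatten` helper, the dotted-key separator, and the sample log record are all assumptions made for this example. It collapses nested objects into a single level so that a path such as body.event.type becomes one plain key that can be matched directly.

```python
def flatten(obj, parent_key="", sep="."):
    """Collapse nested dicts into one level, joining keys with `sep`.

    Hypothetical helper for illustration only; lists and other
    non-dict values are kept as-is.
    """
    flat = {}
    for key, value in obj.items():
        path = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the dotted path along.
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


# Made-up log record with nesting under "body".
log = {"body": {"event": {"type": "user_action", "id": 7}}, "level": "info"}
flat_log = flatten(log)

# The nested field is now addressable as a single top-level key.
matches = flat_log.get("body.event.type") == "user_action"
```

After flattening, a filter on the dotted key is a plain dictionary lookup rather than a walk through nested objects, which is the property the post's query example relies on.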

7 Clear Signs Your Team Needs Centralized Monitoring

Jul 25, 2025 By Nuno Tomas In isDown

Managing multiple systems without centralized monitoring is like trying to watch security footage from 20 different screens simultaneously. You might catch some issues, but you'll inevitably miss critical problems until they explode into major incidents. If your team is struggling with scattered monitoring tools, delayed incident responses, or constant firefighting mode, it's time to evaluate whether you need a centralized monitoring solution. Here are the key warning signs to watch for.

Read Post

isDown

Read more about 7 Clear Signs Your Team Needs Centralized Monitoring

How to Build Resilient Telemetry Pipelines with the OpenTelemetry Collector: High Availability and Gateway Architecture

Jul 25, 2025 By Adnan Rahic In ObservIQ

Let’s bring that back. Today you’ll learn how to configure high availability for the OpenTelemetry Collector so you don’t lose telemetry during node failures, rolling upgrades, or traffic spikes. The guide covers both Docker and Kubernetes samples with hands-on demos of configs. But first, let’s lay some groundwork.

Read Post

ObservIQ

Read more about How to Build Resilient Telemetry Pipelines with the OpenTelemetry Collector: High Availability and Gateway Architecture

Why continuous profiling is the fourth pillar of observability

Jul 25, 2025 By Marcus Hirt In Datadog

Developers have long used profilers to diagnose performance bottlenecks and improve the efficiency of their code. But a modern version of profiling, continuous profiling, is quietly redefining what profiling is and what it can do. By running nonstop in production with very low overhead, continuous profilers give teams always-on visibility into how their code behaves in the real world.

Read Post

Datadog

Read more about Why continuous profiling is the fourth pillar of observability

How Sentry could stop npm from breaking the Internet

Jul 25, 2025 By Lazar Nikolov In Sentry

Caching is great! When it works… When it fails, it puts a big load on your backend, resulting in either a self-inflicted DoS, increased server bills, or both. This article is inspired by a real-world incident that happened to npm back in 2016. In the next part, Ben recounts his personal experience responding to the incident while working at npm.

Read Post

Sentry

Read more about How Sentry could stop npm from breaking the Internet

StatusGator now supports Microsoft Teams Workflows

Jul 25, 2025 By Colin Bartlett In StatusGator

We’ve updated our Microsoft Teams integration to support workflows — Microsoft’s new and recommended approach to incoming webhooks. As Microsoft evolves its platform, it is phasing out the legacy Connectors feature in favor of Workflows. At StatusGator, we’re committed to keeping up with these changes so your integrations remain reliable and future-proof.

Read Post

StatusGator

Read more about StatusGator now supports Microsoft Teams Workflows

Observability Data: Ingestion Pipeline Best Practices

Jul 25, 2025 By Robert Gauthier In Broadcom

Great data is a prerequisite to all things AIOps and observability. Great observability data results in fewer observability gaps, better analysis and insights, and more confidence within teams that rely on the power of modern AIOps and observability technologies. Goals for improved automation, IT efficiencies, intelligent triage and remediation all become more achievable with better data.

Read Post

Broadcom

Read more about Observability Data: Ingestion Pipeline Best Practices

Happy SysAdmin Day from the Auvik crew!

Jul 25, 2025 By Auvik In Auvik

From all of us on the SS Auvik crew, Happy SysAdmin Day! We tip our tricorne hats to ye who fend off the IT beasts each and every day!

View Video

Auvik

Read more about Happy SysAdmin Day from the Auvik crew!

Nothing beats a Happy SysAdmin Day!

Jul 25, 2025 By Auvik In Auvik

Featuring real tales from Auvik customers!

View Video

Auvik

Read more about Nothing beats a Happy SysAdmin Day!

Tracking planes with Grafana in real time: How to visualize the aircraft overhead with your own dashboard

Jul 25, 2025 By Alex Burnett In Grafana

Ever since I was little, I’ve been fascinated by airplanes. Whether it was the excitement of boarding a flight for a holiday or the wonder of admiring them from the ground, there’s always been something magical about these incredible machines. Fast forward a few years, and now we have the ability to track aircraft in real-time from the palm of our hands using a variety of apps.

Read Post

Grafana

Read more about Tracking planes with Grafana in real time: How to visualize the aircraft overhead with your own dashboard

How sum_over_time Works in Prometheus

Jul 25, 2025 By Faiz Shaikh In Last9

The sum_over_time() function in Prometheus gives you a way to aggregate counter resets, gauge fluctuations, and histogram samples across specific time windows. Instead of seeing point-in-time values, you get the cumulative total of all data points within your chosen range—useful for calculating totals from rate data, tracking accumulated errors, or understanding resource consumption patterns over custom intervals.

Read Post

Last9

Read more about How sum_over_time Works in Prometheus
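
To make the sum_over_time() description above concrete, here is a minimal Python sketch of the idea. It is not Prometheus code: the function name mirrors the PromQL function, but the (timestamp, value) tuple layout, the sample data, and the left-open window boundary are assumptions made for illustration.

```python
def sum_over_time(samples, now, window):
    """Sum the values of all samples whose timestamp lies in (now - window, now].

    `samples` is a list of (timestamp_seconds, value) pairs. The left-open,
    right-closed boundary is an assumption of this sketch.
    """
    return sum(value for ts, value in samples if now - window < ts <= now)


# Made-up series scraped every 15 seconds.
samples = [(0, 1.0), (15, 2.0), (30, 3.0), (45, 4.0), (60, 5.0)]

# Only the samples at t=45 and t=60 fall inside the 30-second window ending at t=60.
total = sum_over_time(samples, now=60, window=30)
```

The result is a single number per evaluation time, the total of every raw sample in the window, rather than the latest point-in-time value.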

Getting started with the Grafana plugin

Jul 25, 2025 By Blog In Squared Up

The idea of having a SquaredUp plugin for Grafana might seem a little bit unnecessary at first. They are both dashboarding products, so why would you want to create a dashboard about dashboards? The answer to this conundrum is that the SquaredUp Grafana plugin isn’t quite a matter of taking Grafana dashboards and recreating them on the SquaredUp canvas.

Read Post

Squared Up

Read more about Getting started with the Grafana plugin

How Secure and Healthy Are Your Custom SCOM Management Packs?

Jul 24, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

Thanks for using the NiCE Log File Management Pack. We know it’s a favorite among experts building custom SCOM Management Packs. But here’s a quick question: When was the last time someone checked your custom Management Packs for security vulnerabilities, performance bottlenecks, or health risks?

Read Post

NiCE IT Mgmt

Read more about How Secure and Healthy Are Your Custom SCOM Management Packs?

OpenTelemetry NestJS Implementation Guide: Complete Setup for Production [2025]

Jul 24, 2025 By Ankit Anand In SigNoz

NestJS applications require comprehensive monitoring to ensure optimal performance and rapid issue resolution. As your application grows—spanning multiple services, databases, and external APIs—understanding what's happening under the hood becomes critical. That's where OpenTelemetry comes in. OpenTelemetry provides vendor-agnostic observability for your NestJS applications through distributed tracing, metrics, and logs.

Read Post

SigNoz

Read more about OpenTelemetry NestJS Implementation Guide: Complete Setup for Production [2025]

Use Telegraf Without the Prometheus Complexity

Jul 24, 2025 By Anjali Udasi In Last9

Every system needs observability. You need to know what your CPU, memory, disk, and network are doing, and maybe keep an eye on database query latency or Redis connection counts. But setting that up isn’t always simple. You start with a couple of shell scripts. Then come exporters. Then Prometheus. Before long, you’re managing scrape configs, tuning retention, and watching dashboards fail under load after two days of data.

Read Post

Last9

Read more about Use Telegraf Without the Prometheus Complexity

SMS alerts enabled for Early Warning Signals

Jul 24, 2025 By Colin Bartlett In StatusGator

When service disruptions happen, every second counts. That’s why we’re excited to announce a major update to StatusGator: Early Warning Signals are now available via SMS. Early Warning Signals have already been helping teams stay ahead of outages via email and Slack alerts — and now, with SMS support, you can get real-time notifications directly on your phone, even before incidents are publicly acknowledged.

Read Post

StatusGator

Read more about SMS alerts enabled for Early Warning Signals

Securely query data sources on your Tailscale network using Private Data Source Connect in Grafana Cloud

Jul 24, 2025 By Fabrizia Rossano In Grafana

Balancing security with your observability needs can be a difficult task. We know our users want to leverage platforms like Grafana Cloud to visualize and gain valuable insights into their data, while also keeping their data sources private and secure.

Read Post

Grafana

Read more about Securely query data sources on your Tailscale network using Private Data Source Connect in Grafana Cloud

Advanced Proactive SSL Certificate Monitoring

Jul 24, 2025 By Ramesh Subramaniam In eG Innovations

eG Enterprise version 7.5 introduces advanced capabilities for detailed SSL Certificate Monitoring including monitoring for web servers and apps using SSL. Monitoring SSL certificates is essential to ensure secure connections, prevent service outages, and maintain user trust. Here are a few things you need to monitor and questions you should ask to keep your services and apps running reliably and securely.

Read Post

eG Innovations

Read more about Advanced Proactive SSL Certificate Monitoring

What is Java Performance Monitoring? [A Guide to DevOps Engineers]

Jul 24, 2025 By Mohana Ayeswariya J In Atatus

You rolled out a Java application that worked fine in development. Fast, clean, no errors. However, once it went into production, things began to change. Suddenly, the app feels slow. CPU usage climbs without warning. Some users start getting timeouts. You check the dashboards, but nothing jumps out. You look through the logs, but it's mostly noise. And then the questions start coming in - "Is the JVM the problem?" If you've been in that situation, you're not alone.

Read Post

Atatus

Read more about What is Java Performance Monitoring? [A Guide to DevOps Engineers]

The Benefits of Visibility in Higher Education Networks

Jul 24, 2025 By Filip Cerny In Flowmon

Higher education institutions face unique cybersecurity challenges due to their complex networks, diverse user base and open academic environments. With thousands of students, staff and faculty members accessing resources from various locations and devices, universities must have visibility of what’s happening on their networks and robust and responsive cybersecurity protection to help safeguard them.

Read Post

Flowmon

Read more about The Benefits of Visibility in Higher Education Networks

Throughput Upgrade (With Train Illustrations!)

Jul 24, 2025 By Pēteris Caune In Healthchecks

Ping URLs receive spiky traffic. The Healthchecks open-source project includes a fully functional, tested and type-annotated ping handler written in Python. On self-hosted Healthchecks instances, when you send an HTTP request to a ping URL, a Django view collects and validates information from the request, then uses Django ORM to update a Check object in the database and insert a Ping object in the database. This approach is good for tens to low hundreds of requests per second, depending on hardware.

Read Post

Healthchecks

Read more about Throughput Upgrade (With Train Illustrations!)

Zero Trust Starts with Zero Blind Spots

Jul 24, 2025 By ScienceLogic In ScienceLogic

Zero Trust is more than a buzzword in today’s cybersecurity playbook; it’s a strategic imperative. Federal agencies, defense operations, and civilian infrastructure providers are all under mounting pressure to deploy Zero Trust Architecture (ZTA) frameworks that are not only compliant but truly effective. But there’s a problem: Zero Trust can only succeed if it’s built on real-time, actionable insight. That means eliminating blind spots.

Read Post

ScienceLogic

Read more about Zero Trust Starts with Zero Blind Spots

Debugging with Sentry AI using Seer, MCP, and Agent Monitoring

Jul 24, 2025 By Sentry In Sentry

View Video

Sentry

Read more about Debugging with Sentry AI using Seer, MCP, and Agent Monitoring

Debugging Laravel with logs

Jul 24, 2025 By Sentry In Sentry

Sentry now has logs. Collect and aggregate logs in your Laravel apps, in both the backend and the frontend. Errors won't always capture the whole story, so let's see how logs can help us identify a real customer issue.

View Video

Sentry

Read more about Debugging Laravel with logs

Taming Your Dynatrace Bill: How to Cut Observability Costs, Not Visibility

Jul 24, 2025 By Mezmo In Mezmo

Dynatrace is a powerhouse for application performance monitoring and business analytics. But for many organizations, its power comes with a significant challenge: as applications scale across complex hybrid environments and diverse tech stacks, the sheer volume and variety of logs, metrics, and traces sent to the platform can explode, leading to staggering and unpredictable costs.

Read Post

Mezmo

Read more about Taming Your Dynatrace Bill: How to Cut Observability Costs, Not Visibility

Monitor apps using Vercel AI SDK with SigNoz and OpenTelemetry

Jul 24, 2025 By SigNoz - Open Source Observability Platform In SigNoz

Monitor apps using Vercel AI SDK with SigNoz and OpenTelemetry. This video talks about how to configure your AI apps to send data to SigNoz using OpenTelemetry.

View Video

SigNoz

Read more about Monitor apps using Vercel AI SDK with SigNoz and OpenTelemetry

Datadog Log Management: Analyze complex data sets

Jul 24, 2025 By Datadog In Datadog

Datadog Sheets provides a spreadsheet-style interface for analyzing your telemetry data — you can perform lookups, build pivot tables, and create calculated columns using familiar spreadsheet functionality. This enables teams to join datasets, aggregate results, and explore trends without writing code.

View Video

Datadog

Read more about Datadog Log Management: Analyze complex data sets

Debug live production issues with the Datadog Cursor extension

Jul 24, 2025 By Datadog In Datadog

The Datadog Cursor Extension uses the Datadog remote MCP Server to give developers access to Datadog tools and observability data directly from within the Cursor IDE. The Cursor Extension enables you to view live variable values that your logpoints capture during execution, and you can use the Cursor Agent to identify the lines of code responsible for the issue at hand. The Datadog Cursor Extension is now available in Preview.

View Video

Datadog

Read more about Debug live production issues with the Datadog Cursor extension

Datadog IDP: Ship software quickly and confidently

Jul 24, 2025 By Datadog In Datadog

Datadog Internal Developer Portal (IDP) helps developers quickly track down shared engineering knowledge, execute common production tasks in a self-service manner, and evaluate the production-readiness of new service code.

View Video

Datadog

Read more about Datadog IDP: Ship software quickly and confidently

Introducing Icinga Dependency Views Webinar

Jul 24, 2025 By Icinga In Icinga

This is the recording from our webinar held on 23 July 2025. We have Blerim Sheqa (COO) as your host, and Johannes Meyer (Lead Developer) as the project lead of Icinga Dependency Views and presenter for the webinar.

View Video

Icinga

Monitoring

Read more about Introducing Icinga Dependency Views Webinar

Sneak Peek: Private Data Source Connect + Tailscale | Private Preview | Grafana Labs

Jul 24, 2025 By Grafana In Grafana

With a new integration between Private Data Source Connect in Grafana Cloud and Tailscale, you can now securely query private data sources without opening your network or running extra software.

View Video

Grafana

Read more about Sneak Peek: Private Data Source Connect + Tailscale | Private Preview | Grafana Labs

AI-Driven Alert Correlation with EventiQ in Splunk ITSI

Jul 24, 2025 By Splunk In Splunk

In this video, we introduce EventiQ in Splunk ITSI, a powerful AI-driven solution designed to cut through the noise and help you find the root cause of issues faster. We’ll show you how EventiQ automatically analyzes and groups related alerts into actionable episodes, significantly reducing alert volume. We’ll cover how to enable EventiQ for a Notable Event Aggregation Policy and review the resulting episodes that it creates.

View Video

Splunk

Read more about AI-Driven Alert Correlation with EventiQ in Splunk ITSI

Seeing the Bigger Picture: Why Security Needs Depth, Not Just Products

Jul 24, 2025 By Teneo In Teneo

A recent BBC article, “Weak password allowed hackers to sink a 158-year-old company,” outlined a serious security lapse. This case reinforces the message that we, at Teneo, advocate every day: true resilience comes from defense in depth, i.e. policy, product and process, not just tools at the edge. In a recent customer engagement, we discussed a transition from VPN to ZTNA. While ZTNA offers enhanced security including continual checking, improved segmentation and a minimized attack surface.

Read Post

Teneo

Read more about Seeing the Bigger Picture: Why Security Needs Depth, Not Just Products

AWS Summit NYC 2025: Laser-Focused on AI

Jul 24, 2025 By Ken Rimple In Honeycomb

If you’re unfamiliar with AWS Summits, these are conferences that occur on a yearly basis in different cities. The events are mostly used to announce new products and technologies. This year, the theme was AI, as evidenced by the keynote, a large majority of the talks, and a walk around the vendor floor. The keynote talk was hosted by Swami Sivasubramanian, VP of Agentic AI at AWS.

Read Post

Honeycomb

Read more about AWS Summit NYC 2025: Laser-Focused on AI

Monitoring Ruby on Rails applications with Applications Manager

Jul 23, 2025 By Sujitha Paduchuri In ManageEngine

Ruby on Rails is the go-to framework for organizations to build flexible, database-driven web applications with high speed and efficiency. Enterprises of all sizes rely on it to build user-friendly applications. But like any other modern web stack, optimizing the performance, availability, and reliability of Rails applications, especially in production environments, requires more than just reactive bug fixes.

Read Post

ManageEngine

Read more about Monitoring Ruby on Rails applications with Applications Manager

Why Your Business Needs APM: 10 Key Benefits You Shouldn't Ignore

Jul 23, 2025 By Pavithra Parthiban In Atatus

In today’s digital world, how well your applications perform has a big impact on how people see your business, and how well it runs. Whether you are in finance, e-commerce, SaaS, healthcare, or media, your users expect everything to work smoothly, all the time. Even a few seconds of slow performance can lead to lost sales, lower productivity, and unhappy customers. That’s why Application Performance Monitoring (APM) is so important.

Read Post

Atatus

Read more about Why Your Business Needs APM: 10 Key Benefits You Shouldn't Ignore

How Datadog Cloud Network Monitoring helps you move to a deny-by-default network egress policy at scale

Jul 23, 2025 By Lee Avital In Datadog

When organizations first begin deploying workloads on Kubernetes, it's common for them to start with a permissive egress traffic policy that allows any workload to reach the internet. This approach can make it easier for teams to stay agile and to get services up and running in fast-moving environments. But as your Kubernetes footprint grows, it's important to minimize public internet access on a per-workload basis to improve your organization's security posture.

Read Post

Datadog

Read more about How Datadog Cloud Network Monitoring helps you move to a deny-by-default network egress policy at scale

How SAP achieved world-class uptime through modern observability

Jul 23, 2025 By Gerardo Dada In Catchpoint

SAP Customer Experience (CX) has undergone a remarkable transformation over recent years, evolving from fragmented monitoring to a scalable, automated observability powerhouse. In a recent fireside chat, Martin Norato Auer, SAP CX’s VP of Observability, shed light on the strategies, practices, and measurable impacts behind SAP’s SLA, uptime, and responsiveness achievements.

Read Post

Catchpoint

Read more about How SAP achieved world-class uptime through modern observability

Taking AI Apps From Prototype to Production

Jul 23, 2025 By Phil Gervasi In Kentik

At this year’s AWS Summit in New York, agentic AI took center stage with Amazon’s launch of Bedrock AgentCore — a powerful step toward turning AI prototypes into scalable, production-ready applications. From low-code workflows to turnkey infrastructure, a new generation of tools is enabling teams of all skill levels to build, deploy, and monitor AI agents faster than ever.

Read Post

Kentik

Read more about Taking AI Apps From Prototype to Production

Zero instrumentation distributed tracing is here: Meet OBI on Open Telemetry

Jul 23, 2025 By Ittai Corem In Coralogix

Modern systems generate enormous amounts of telemetry. The hurdle is collecting clean, connected traces without rewriting code or babysitting a fleet of language agents. That’s why Coralogix backed eBPF from the start. eBPF (extended Berkeley Packet Filter) executes sandboxed programs inside the Linux kernel, without modifying kernel source code. This method allows probes to see every request at runtime, with no instrumentation and near-zero per-request overhead.

Read Post

Coralogix

Read more about Zero instrumentation distributed tracing is here: Meet OBI on Open Telemetry

Introducing Sentry's Godot SDK 1.0 Alpha, with support for Godot 4.5 Beta

Jul 23, 2025 By Steve Zegalia In Sentry

Debugging during development is easy. You've got a debugger, stack traces, and logs right in front of you. But once your Godot game is in the hands of players, things get trickier. Most won’t report bugs, and if they do, you’re lucky if they include anything more than “it crashed”.

Read Post

Sentry

Read more about Introducing Sentry's Godot SDK 1.0 Alpha, with support for Godot 4.5 Beta

How to Create Playwright Scripts for Website Monitoring with Chrome, ChatGPT & Sematext

Jul 23, 2025 By Sematext In Sematext

Let’s say you want to make sure your website works as expected. You do not want to check if it just loads. You also want to check if important buttons or features are there and working. Oh, and you don’t want to just do it once. You want to keep an eye on this pretty much all the time. And, of course, you don’t want to keep checking manually if anything broke – you want to be notified, alerted when (not if) things break. You can do this by creating a Browser Monitor.

Read Post

Sematext

Read more about How to Create Playwright Scripts for Website Monitoring with Chrome, ChatGPT & Sematext

Introducing the new search box on StatusGator

Jul 23, 2025 By Colin Bartlett In StatusGator

Recently we hit an exciting milestone at StatusGator: 6,000+ services now tracked! To mark the occasion, we’ve made it even easier to find the apps you care about and report outages: A brand-new, lightning-fast search box is now live on the StatusGator website. It’s built right into the top navigation, accessible from any page — and works beautifully on mobile, too.

Read Post

StatusGator

Read more about Introducing the new search box on StatusGator

How the StatusGator name was born

Jul 23, 2025 By Colin Bartlett In StatusGator

More than 10 years ago, StatusGator pioneered the concept of a status page aggregator. How was the StatusGator name created? In a group chat, of course! The incubator of crazy ideas and nerdy discussions, our friendly group chat was a place where I originally discussed the product idea, validated its use cases, and solicited feedback on name concepts. I recently unearthed screenshots from the original group chat among my friends.

Read Post

StatusGator

Read more about How the StatusGator name was born

Grafana Cloud updates: deeper insights in Kubernetes Monitoring, Adaptive Metrics updates, and more

Jul 23, 2025 By Kristin Knapp In Grafana

We consistently roll out helpful updates and fun features in Grafana Cloud, our fully managed observability platform powered by the open source Grafana LGTM Stack: Loki for logs, Grafana for visualization, Tempo for traces, and Mimir for metrics. In case you missed them, here’s our monthly round-up of the latest and greatest updates in Grafana Cloud. You can also check out our What’s new in Grafana Cloud documentation to explore all the latest features. Not a Grafana Cloud user yet?

Read Post

Grafana

Read more about Grafana Cloud updates: deeper insights in Kubernetes Monitoring, Adaptive Metrics updates, and more

Is Your Network Ready for the Perfect Storm?

Jul 23, 2025 By Yann Guernion In Broadcom

For decades, the corporate network has been the central nervous system of the enterprise. It’s the invisible, indispensable fabric that connects everything. And for just as long, the conversation has been about its growing complexity. But today, something feels different. You are no longer dealing with a predictable, manageable evolution. Instead, three immense, converging forces are creating a perfect storm, pushing traditional network management approaches to their breaking point.

Read Post

Broadcom

Read more about Is Your Network Ready for the Perfect Storm?

Anatomy of AI-powered Root Cause Analysis

Jul 23, 2025 By Nikolay Sivko In Coroot

AI is being used to automate just about everything these days, from writing code to making coffee. Observability is no exception. But before we dive into how AI can actually help, it is worth stepping back to look at what already works, what does not, and where the real gaps are.

Read Post

Coroot

Read more about Anatomy of AI-powered Root Cause Analysis

Bringing GitLab Logs into Focus with Graylog

Jul 23, 2025 By Jeff Darrington In Graylog

GitLab’s audit logs offer a goldmine of insights into user activity, project changes, and security events. Getting that data into Graylog for centralized analysis is easier than you might think—especially with the flexibility of our Raw HTTP input and Illuminate’s GitLab Spotlight Pack. In this two-part guide, we’ll walk you through how to get it done, from wiring up GitLab’s Audit Event Streaming to visualizing enriched events in a purpose-built dashboard.

Read Post

Graylog

Read more about Bringing GitLab Logs into Focus with Graylog

Common Network Switch Issues & How to Fix Them

Jul 23, 2025 By Andrii Kernitskyi In Obkio

As a network admin, you're probably all too familiar with the importance of your network switches. These devices keep the heart of your network beating by connecting various devices, from computers to printers, and ensuring data flows smoothly. However, switches, like any hardware, come with their own set of issues that can disrupt productivity and cause headaches if not addressed promptly.

Read Post

Obkio

Read more about Common Network Switch Issues & How to Fix Them

Bits AI Dev Agent: Automatically identify issues and generate code fixes

Jul 23, 2025 By Datadog In Datadog

The Bits Dev Agent is an AI-powered coding assistant in Datadog designed to reclaim developer productivity by autonomously monitoring telemetry data, identifying key issues, and generating production-ready pull requests. Developers receive asynchronous, context-rich PRs with clear explanations, allowing them to shift their focus from troubleshooting to reviewing solutions and building better code.

View Video

Datadog

Read more about Bits AI Dev Agent: Automatically identify issues and generate code fixes

Introducing Bits AI SRE, your AI on-call teammate

Jul 23, 2025 By Datadog In Datadog

Bits AI SRE is your AI on-call teammate, built to autonomously investigate alerts and coordinate incident response. Integrated with Datadog, Slack, GitHub, Confluence, and more, Bits analyzes telemetry, reads documentation, and reviews recent deployments to determine the root cause of alerts—often before you’ve even opened your laptop. In fact, if you're using Datadog On-Call, you can view Bits’s findings right from your phone—so you’re always one step ahead, no matter where you are.

View Video

Datadog

Read more about Introducing Bits AI SRE, your AI on-call teammate

Datadog Incident Response: Unify remediation and communication

Jul 23, 2025 By Datadog In Datadog

With Datadog's new AI voice agent in Incident Response, you can quickly get up to speed on the issue and start taking action directly from your phone. Handoff notifications make it easy to jump straight to the relevant context and quickly communicate with other responders. Finally, our status pages enable you to automatically update users on your remediation progress.

View Video

Datadog

Read more about Datadog Incident Response: Unify remediation and communication

Performance Attribute widgets | Site24x7 Custom Dashboards

Jul 23, 2025 By ManageEngine Site24x7 In Site24x7

Learn how to visualize, analyze, and optimize real-time performance data across your infrastructure using flexible widgets—time series, text, numerical, and more. This video walks you through creating dashboards to track key metrics, compare attributes, and gain instant insights for faster troubleshooting. Perfect for network admins, IT teams, and anyone looking to boost monitoring efficiency.

View Video

Site24x7

Read more about Performance Attribute widgets | Site24x7 Custom Dashboards

You have 200 milliseconds. That's all the time you get to prove your app or website is alive.

Jul 23, 2025 By Catchpoint In Catchpoint

200ms is about the speed of a blink of the eye, but it’s the difference between “this site works” and “this site’s broken.” Today’s users expect instant feedback, and that’s why it’s critical to measure from their perspective.

View Video

Catchpoint

Monitoring

Read more about You have 200 milliseconds. That's all the time you get to prove your app or website is alive.

Mistakes To Avoid With Your Public Status Page

Jul 23, 2025 By Hrishikesh Barua In IncidentHub

A public status page forms the public face of your organization's service availability. It is the first point of contact for your customers to check the status of your services during times of crisis. Hence, ensuring the credibility and uptime of your public status page is crucial to your organization's reputation. In this article we will look at the key mistakes to avoid while hosting and managing a public status page.

Read Post

IncidentHub

Read more about Mistakes To Avoid With Your Public Status Page

Scout Gives Cookpad Actionable, Rails-Specific Performance Insights

Jul 23, 2025 By Aspen Clevenger In Scout

For more than a decade, Cookpad, a global platform for recipe sharing and search, has relied on APM tools to monitor critical application performance metrics, like server response times and resource usage. When their previous APM tool became too expensive after price increases, they needed to find a new solution that could check all of their boxes.

Read Post

Scout

Read more about Scout Gives Cookpad Actionable, Rails-Specific Performance Insights

Architecting for Value: A Playbook for Sustainable Observability

Jul 23, 2025 By Mezmo In Mezmo

You’ve built something amazing. Your services are scaling, your users are happy, and your team is shipping code like never before. Then the cloud bill arrives, and one line item makes your eyes water: observability. That Datadog invoice feels less like a utility bill and more like a ransom note. It’s a modern engineering paradox. The tools that give you sight into your complex systems are the same ones that can blind you with runaway costs.

Read Post

Mezmo

Read more about Architecting for Value: A Playbook for Sustainable Observability

AIOps in 2025: 4 Components and 4 Key Capabilities

Jul 23, 2025 By Dallon Robinette In Selector

AIOps, or Artificial Intelligence for IT Operations, is the application of artificial intelligence and machine learning to automate and improve IT operations. It combines big data analytics, AI, and machine learning to monitor, manage, and optimize IT environments, enabling organizations to proactively detect, diagnose, and resolve issues more efficiently than traditional methods.

Read Post

Selector

Read more about AIOps in 2025: 4 Components and 4 Key Capabilities

Payment Orchestration: Leveraging AI for Smarter Payment Routing and Fraud Prevention

Jul 23, 2025 By OpsMatters In OpsMatters

The digital payment landscape has undergone a remarkable transformation with the integration of artificial intelligence technologies. Modern businesses face the challenge of managing complex payment ecosystems while maintaining security and customer satisfaction. Payment orchestration emerges as the solution that bridges this gap, creating unified systems from fragmented payment infrastructures.

Read Post

OpsMatters

Read more about Payment Orchestration: Leveraging AI for Smarter Payment Routing and Fraud Prevention

PDF Redaction for Compliance: Best Practices in IT Monitoring

Jul 23, 2025 By OpsMatters In OpsMatters

If you work in IT, especially in a security, audit, or monitoring role, you're familiar with the term 'compliance' and understand its importance. Compliance is not just a legal term; it's a set of rules that helps protect sensitive data, avoid penalties, and save a company's reputation from potential PR problems.

Read Post

OpsMatters

Read more about PDF Redaction for Compliance: Best Practices in IT Monitoring

66% of us use AI every day, but do we actually know how it works?

Jul 22, 2025 By Harsitha P In ManageEngine

There’s a new kind of thinking happening in the world. It doesn’t come with memories, emotions, or doubt. It doesn’t hesitate. It doesn’t wonder. However, it appears to be thinking. Ask it a question, and it responds in perfect grammar. Ask for help, and it gives you options. You could almost believe it understands. Almost. This is what happens when machines are trained to speak like us, but without ever needing to understand us.

Read Post

ManageEngine

Read more about 66% of us use AI every day, but do we actually know how it works?

LLM and AI Application Observability vs. Traditional and Cloud Native Observability

Jul 22, 2025 By Shailesh Manjrekar In Fabrix

OpenLLMetry extends the industry-standard OpenTelemetry framework with features specifically for monitoring large language model (LLM) and GenAI application behavior.

Read Post

Fabrix

Read more about LLM and AI Application Observability vs. Traditional and Cloud Native Observability

An Introduction to Oban for Elixir Monitoring Using AppSignal

Jul 22, 2025 By Aestimo Kirina In AppSignal

Background task processing is something that many developers may encounter when building Elixir applications. This might include sending emails asynchronously, posting and fetching data from an API, and more. Oban, a powerful and persistent job processing library, offers a reliable way to handle background tasks, scheduled operations, and more. However, like any complex system, Oban requires careful monitoring to ensure its smooth operation, identify bottlenecks, and prevent unexpected failures.

Read Post

AppSignal

Read more about An Introduction to Oban for Elixir Monitoring Using AppSignal

Taming Complexity: Addressing Infrastructure Monitoring Challenges in Banking and Finance

Jul 22, 2025 By david.arrowsmith In Interlink

Banks and financial institutions operate in one of the most complex, highly regulated and risk-averse industries.

Read Post

Interlink

Read more about Taming Complexity: Addressing Infrastructure Monitoring Challenges in Banking and Finance

The one where we talk about what's next for Cribl U!

Jul 22, 2025 By Cribl In Cribl

What's next for Cribl University? Tune in to find out.

View Video

Cribl

Read more about The one where we talk about what's next for Cribl U!

400 Million Reasons Hackers Will Target Microsoft Again...

Jul 22, 2025 By Teneo In Teneo

Yesterday, like many others in the tech community, I found myself pausing to fully grasp the implications of the Microsoft SharePoint hack. As one of the most widely adopted document management and collaboration platforms globally, SharePoint’s compromise inevitably sends ripples of concern through businesses everywhere. This news reminded me of a conversation I had just last week with an enterprise customer. We were discussing how one might approach cybersecurity from a hacker’s perspective.

Read Post

Teneo

Read more about 400 Million Reasons Hackers Will Target Microsoft Again...

What is Python Application Performance Monitoring? - [A Complete Guide]

Jul 22, 2025 By Mohana Ayeswariya J In Atatus

A recent study looked at real-world Python programs and found something important: Python isn’t the main reason apps slow down. The real problems come from inside the code, such as poor logic, memory issues, and slow database queries. The problem is, these issues often go unnoticed. Your app may seem fine until users start complaining about slowness or things start breaking under pressure.

Read Post

Atatus

Read more about What is Python Application Performance Monitoring? - [A Complete Guide]

Ingest, Explore, Validate: A Quickstart with InfluxDB 3 Enterprise and Explorer UI

Jul 22, 2025 By Jameelah Mercer In InfluxData

Great observability doesn’t just collect metrics—it tells you exactly what’s broken, why it’s broken, and what to do about it. InfluxDB 3 Enterprise delivers this through real-time ingestion, fast queries, and scalable storage. InfluxDB 3 Explorer provides the intuitive interface your team needs for database management, data ingestion, querying, and visualization without the usual complexity.

Read Post

InfluxData

Read more about Ingest, Explore, Validate: A Quickstart with InfluxDB 3 Enterprise and Explorer UI

From Reactive to Resilient: Why CIOs Must Lead the Automation Shift to Achieve True Business Agility

Jul 22, 2025 By ScienceLogic In ScienceLogic

For decades, CIOs have fought to keep pace with rising digital complexity. As IT environments have grown more fragmented and dynamic, operational stability has often come at the cost of strategic agility. But the game is changing. What once required heroic effort to maintain is now table stakes—and the new expectation is that IT won’t just support the business, it will help steer it.

Read Post

ScienceLogic

Read more about From Reactive to Resilient: Why CIOs Must Lead the Automation Shift to Achieve True Business Agility

How to Monitor JavaScript Memory Leaks in Production

Jul 22, 2025 By Todd H. Gardner In TrackJS

Remember when JavaScript was just for making snowflakes fall on your GeoCities page? Those were simpler times. Now we’re building entire applications in the browser, and surprise! JavaScript wasn’t exactly designed with memory management in mind. While other languages have garbage collectors that actually, you know, collect garbage, JavaScript’s garbage collector is more like that roommate who promises to clean but just shoves everything under the bed. The real kicker?

Read Post

TrackJS

Read more about How to Monitor JavaScript Memory Leaks in Production

How to improve observability with fast log analysis (using FOSS!)

Jul 22, 2025 By Coroot In Coroot

Log analysis can take only seconds (not hours) with time-mapped heat graphs, pattern clustering and analysis, and errors sorted by severity.

View Video

Coroot

Read more about How to improve observability with fast log analysis (using FOSS!)

Ship Confluent Cloud Observability in Minutes

Jul 22, 2025 By Anjali Udasi In Last9

You're running Kafka on Confluent Cloud. You care about lag, throughput, retries, and replication. But where do you see those metrics? Confluent gives you metrics, sure, but not all in one place. Some live behind a metrics API, others behind Connect clusters or Schema Registries. You either wire them manually or give up. What if you could stream those metrics to a platform built for high-frequency, high-cardinality time series, and do it in minutes?

Read Post

Last9

Read more about Ship Confluent Cloud Observability in Minutes

How to Cut Observability Costs with Synthetic Monitoring and Responsive Pipelines

Jul 22, 2025 By Mezmo In Mezmo

Platform teams are struggling with observability noise, bloated storage costs, and lack of clarity during incidents. Most teams capture everything all the time, leading to expensive, overwhelming, and often unnecessary data volumes. In Telemetry for Modern Apps, Mezmo teamed up with Checkly to demonstrate how synthetic monitoring triggers and responsive telemetry pipelines can help reduce costs while maintaining the context needed during incidents.

Read Post

Mezmo

Read more about How to Cut Observability Costs with Synthetic Monitoring and Responsive Pipelines

How To Perform A TCP Check | Grafana Synthetic Monitoring

Jul 22, 2025 By Grafana In Grafana

Learn how to set up TCP checks using Grafana Cloud Synthetic Monitoring. In this video, we walk through how to create a TCP check and analyze test results.

View Video

Grafana

Read more about How To Perform A TCP Check | Grafana Synthetic Monitoring

Six platform updates giving you time back in your day

Jul 22, 2025 By Margaret Selid In Sumo Logic

Ever look at your to-do list at the end of the day and realize it’s grown longer, not shorter? We get it—there’s always more to do and never enough time. But if you’re a Sumo Logic user, reading this blog will be a win for your day because we’re giving you six ways to slash the time you spend on tasks in your platform.

Read Post

Sumo Logic

Read more about Six platform updates giving you time back in your day

Silent Support Systems and The Infrastructure That Keeps Factories Running

Jul 22, 2025 By OpsMatters In OpsMatters

What keeps a factory running when no one's watching? Behind every smooth production line is a network of support systems (compressed air, steam, HVAC, fuel delivery, hydraulic circuits) that operate quietly but are vital to performance. These systems don't grab attention like robotics or automation, yet they prevent downtime, protect equipment, and ensure safety. Ignoring them can lead to unexpected failures and costly interruptions. Understanding how they function and why they matter is essential for maintaining efficient, reliable operations in any industrial setting.

Read Post

OpsMatters

Read more about Silent Support Systems and The Infrastructure That Keeps Factories Running

NiCE Active 365 Management Pack 4.4 for Microsoft SCOM

Jul 21, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

We’re thrilled to release NiCE Active 365 Management Pack 4.4 for Microsoft SCOM. The new 4.4 release is packed with powerful new enhancements driven by customer input and evolving needs. It especially focuses on improving monitoring capabilities for Azure-based services and ensuring compatibility with Microsoft’s evolving ecosystem.

Read Post

NiCE IT Mgmt

Read more about NiCE Active 365 Management Pack 4.4 for Microsoft SCOM

Sponsored Post

Atlassian Jira Monitoring on Microsoft SCOM

Jul 21, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

As part of a customer project, we developed a custom Jira Management Pack for Microsoft System Center Operations Manager (SCOM). This tailored solution enables IT operations teams to monitor key performance and health metrics of Jira environments, ensuring planning and bug-tracking platforms remain available and performant. With this Use Case paper, we want to share our knowledge with the SCOM Community to highlight the possibilities of advanced monitoring on Microsoft SCOM, helping teams get better at their day-to-day tasks.

Read Post

NiCE IT Mgmt

Read more about Atlassian Jira Monitoring on Microsoft SCOM

Sponsored Post

Streamlining multi-cloud complexity with unified observability

Jul 21, 2025 By ManageEngine In ManageEngine

A wave of businesses are embracing multi-cloud strategies to gain flexibility and scalability. By combining on-premises infrastructure, private clouds, and public platforms like AWS, Azure, and Google Cloud Platform (GCP), IT teams can experiment, deploy, transform, and improve their IT applications significantly. On the downside, this modern IT approach of employing multiple clouds (in both public and private forms) also brings significant complexity, making it challenging to monitor systems, control costs, and secure environments. There are just too many threads to track and tie together to ensure a taut IT fabric.

Read Post

ManageEngine

Read more about Streamlining multi-cloud complexity with unified observability

The Ultimate Network Assessment Template for Your Business

Jul 21, 2025 By Andrii Kernitskyi In Obkio

In the fast-paced realm of IT businesses, it's easy to overlook the intricate web that powers your operations – your network infrastructure. Let's face it, most enterprises only give it the attention it deserves when something goes wrong. And by then, the issue has often snowballed into a full-blown crisis.

Read Post

Obkio

Read more about The Ultimate Network Assessment Template for Your Business

MTTR, MTBF, MTTA & MTTF - Metrics, examples, challenges, and tips

Jul 21, 2025 By Leo Baecker In Hyperping

When your system crashes at 3 AM and customers start flooding your support channels, every minute feels like an eternity. Mean Time to Repair (MTTR) measures exactly how long these painful moments last and, more importantly, shows whether you are making them shorter. MTTR tracks the average time between when a failure occurs and when your system is fully operational again. This metric directly impacts customer satisfaction, revenue, and your team's sanity during incident response.
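The averaging described above can be sketched in a few lines of Python. This is only an illustration, not the Hyperping article's own method: the function name `mean_time_to_repair` and the incident timestamps are hypothetical, and each incident is assumed to be a (failure time, restoration time) pair.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Average the downtime (restored - failed) across all incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the per-incident downtimes, starting from a zero timedelta.
    total = sum((restored - failed for failed, restored in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incident log: (failure detected, service fully restored).
incidents = [
    (datetime(2025, 7, 1, 3, 0), datetime(2025, 7, 1, 3, 45)),     # 45 minutes
    (datetime(2025, 7, 9, 14, 10), datetime(2025, 7, 9, 14, 25)),  # 15 minutes
    (datetime(2025, 7, 18, 22, 0), datetime(2025, 7, 18, 23, 0)),  # 60 minutes
]

mttr = mean_time_to_repair(incidents)
print(mttr)  # 0:40:00
```

With these made-up figures, total downtime is 120 minutes over three incidents, so the MTTR is 40 minutes.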

Read Post

Hyperping

Read more about MTTR, MTBF, MTTA & MTTF - Metrics, examples, challenges, and tips

OpenTelemetry at Grafana Labs: the latest on how we're investing in the emerging industry standard

Jul 21, 2025 By Grafana Labs Team In Grafana

Here at Grafana Labs, open source has always been core to what we do. So it should come as no surprise that we’re going all in on OpenTelemetry—an open source project that’s quickly becoming an industry standard for vendor-neutral telemetry.

Read Post

Grafana

Read more about OpenTelemetry at Grafana Labs: the latest on how we're investing in the emerging industry standard

Monitor Nginx with OpenTelemetry Tracing

Jul 21, 2025 By Prathamesh Sonpatki In Last9

At 3:47 AM, your NGINX logs show a 500 error. Around the same time, your APM flags a spike in API latency. But what's the root cause, and why is it so hard to correlate logs, traces, and metrics? When API response times cross 3 seconds, identifying whether the slowdown is at the NGINX layer, the application, or the database shouldn't require guesswork. That's where OpenTelemetry instrumentation for NGINX becomes essential.

Read Post

Last9

Read more about Monitor Nginx with OpenTelemetry Tracing

How to Set Up Real User Monitoring

Jul 21, 2025 By Anjali Udasi In Last9

Synthetic monitoring provides consistent, repeatable results: 2.1s load times, passing Lighthouse scores, and minimal variability. But those numbers reflect lab conditions. On slower networks, like 3G in Southeast Asia, real users may see much higher load times of 5.8s or more. This isn’t a fault of the tools. It’s a difference in testing context. Synthetic tests run on fast machines, stable connections, and clean environments.

Read Post

Last9

Read more about How to Set Up Real User Monitoring

VirtualMetric Achieves SOC 2 Certification: A Milestone in Trust and Security

Jul 21, 2025 By VirtualMetric In VirtualMetric

We’re excited to announce that VirtualMetric has achieved SOC 2 Type 2 certification. This is a key step in our mission to deliver secure, resilient, and efficient telemetry solutions. This certification confirms that our controls for security, availability, confidentiality, and data integrity don’t just look good on paper — they work in practice, over time.

Read Post

VirtualMetric

Read more about VirtualMetric Achieves SOC 2 Certification: A Milestone in Trust and Security

VirtualMetric in the 2025 Comprehensive Market Guide: Rising Data Pipeline Security

Jul 21, 2025 By VirtualMetric In VirtualMetric

Over the past year, much of cybersecurity’s attention has centered on the promise of AI-powered SOCs. But as the Market Guide 2025 by Francis Odum reveals, the true foundation of modern security success lies in the data layer. “Without clean, well-routed telemetry, even the smartest AI is starved of context,” points out the researcher. And that’s where Security Data Pipeline Platforms (SDPPs) have become essential.

Read Post

VirtualMetric

Read more about VirtualMetric in the 2025 Comprehensive Market Guide: Rising Data Pipeline Security

VirtualMetric Earns ISO 27001:2022 Certification: Security at Every Level

Jul 21, 2025 By VirtualMetric In VirtualMetric

We’re excited to share that VirtualMetric has officially achieved ISO 27001:2022 certification, a globally recognized standard for building and managing an effective Information Security Management System (ISMS). This confirms that we’ve implemented robust controls to protect data, manage risks, and ensure the resilience of our infrastructure in today’s security landscape.

Read Post

VirtualMetric

Read more about VirtualMetric Earns ISO 27001:2022 Certification: Security at Every Level

From Sequential Bottlenecks to Concurrent Performance: Optimizing Log Processing at Scale

Jul 21, 2025 By Anushka Karmakar In SigNoz

We optimized our log processing pipeline by moving from sequential to concurrent processing at the entry level, achieving 30% higher throughput and better resource utilization without increasing infrastructure costs. When customers start sending millions of logs per minute, you quickly discover whether your processing pipeline can actually scale with vertical scaling.

Read Post

SigNoz

Read more about From Sequential Bottlenecks to Concurrent Performance: Optimizing Log Processing at Scale

Will AI Speed Development in Your Legacy App?

Jul 21, 2025 By Jessica Kerr In Honeycomb

Some people can get an AI assistant to write a day’s worth of useful code in ten minutes. Others among us can only watch it crank out hundreds of lines of crap that never works. What’s the difference? There are some skills specific to AI development. There are also properties of the codebase we’re working in that make it amenable to AI assistance. Most AI demos use projects created from scratch with AI in mind—cute.

Read Post

Honeycomb

Read more about Will AI Speed Development in Your Legacy App?

The Hidden Cost of Not Using APM in Production

Jul 21, 2025 By Pavithra Parthiban In Atatus

Many organizations don’t realize how important it is to monitor how their applications run in production. Without Application Performance Monitoring (APM), it becomes difficult to detect and resolve issues quickly, leading to increased downtime, wasted developer effort, and poor user experience. These hidden costs, though not always visible at first, can impact customer satisfaction, reduce team efficiency, and result in lost revenue.

Read Post

Atatus

Read more about The Hidden Cost of Not Using APM in Production

IT Service Performance Monitoring: Key Metrics, Best Practices, and Future Trends

Jul 21, 2025 By Muhammad Raza In Splunk

As organizations rely more on complex IT systems and cloud-based services, keeping everything running smoothly — and reliably — has become a top priority. That’s where IT service performance monitoring comes in, giving teams the visibility they need to make sure systems stay healthy and responsive. By tracking a range of technical and user-focused metrics, businesses can quickly identify and address issues before they impact operations or end users.

Read Post

Splunk

Read more about IT Service Performance Monitoring: Key Metrics, Best Practices, and Future Trends

Autonomous Operations Are Here

Jul 21, 2025 By ScienceLogic In ScienceLogic

ScienceLogic’s vision for IT operations isn’t just about improving tools—it’s about changing the entire paradigm, flipping your day-to-day upside down. We’re moving beyond dashboards and alerts, beyond human-only workflows and rules-based systems. The future is autonomous. It’s intelligent. It’s agentic. And it’s already being realized through the power of Skylar AI.

Read Post

ScienceLogic

Read more about Autonomous Operations Are Here

The AI Monitoring crisis that no one's talking about

Jul 21, 2025 By Coralogix Team In Coralogix

When I spoke at AWS London earlier this year, I had the chance to discuss something that more and more teams are starting to feel: traditional observability doesn’t cut it for AI systems. In AI, “Is it running?” is no longer enough. We have to ask, “Is it right?” When I delivered that line, I saw the heads nodding. Everyone’s excited to build with LLMs, but when it comes to actually monitoring them in production? That’s where things fall apart.

Read Post

Coralogix

Read more about The AI Monitoring crisis that no one's talking about

The original "business card demo" of TrackJS

Jul 21, 2025 By TrackJS In TrackJS

So you're at the bar and someone asks, "What does your software do anyway?". How do you respond? How do you condense your technical feats into a response concise enough for the alcohol-fueled mind and amazing enough to remember tomorrow? Todd Gardner shares the demo that launched TrackJS around the world.

View Video

TrackJS

Read more about The original "business card demo" of TrackJS

Get started with Grafana Alerting: Multi-dimensional alerts and how to route them

Jul 21, 2025 By Grafana In Grafana

In this tutorial, we dig into more complex yet equally fundamental elements of Grafana Alerting: alert instances and notification policies. Don't miss the rest of the "Get started with Grafana Alerting" series! Each part dives into a different feature to help you get the most out of alerting in Grafana.

View Video

Grafana

Read more about Get started with Grafana Alerting: Multi-dimensional alerts and how to route them

Getting Started With Catchpoint Traceroute

Jul 21, 2025 By Catchpoint In Catchpoint

In this episode of our Getting Started With Catchpoint series, Leon walks you through setting up a traceroute test. He covers the various options (the type of traceroute, thresholds, etc.) and even touches on what you can do with the test data once it's being collected, from smartboards to alerts and beyond.

View Video

Catchpoint

Read more about Getting Started With Catchpoint Traceroute

10 Essential Tips for Setting Up Monitoring for Your SaaS

Jul 21, 2025 By Nuno Tomas In isDown

Setting up monitoring for your SaaS application is crucial for maintaining reliability and keeping customers happy. Without proper monitoring, you're essentially flying blind – unable to detect issues before they impact users or understand how your system performs under different conditions. Here are 10 essential tips to help you build a comprehensive monitoring strategy for your SaaS application.

Read Post

isDown

Read more about 10 Essential Tips for Setting Up Monitoring for Your SaaS

Top 3 Intune reporting tools: SquaredUp, Microsoft admin center, and Power BI

Jul 21, 2025 By Blog In Squared Up

As the unsung hero of modern endpoint management, Microsoft Intune quietly ensures security, compliance, and seamless user experiences across a range of devices and platforms. Where many organisations go wrong, however, is not having the right tool to monitor and leverage Intune’s full potential. But for an organization relying on Intune, what tool should you use?

Read Post

Squared Up

Read more about Top 3 Intune reporting tools: SquaredUp, Microsoft admin center, and Power BI

Why Use a Status Page Aggregator?

Jul 20, 2025 By Nuno Tomas In isDown

Managing multiple vendor dependencies has become a critical challenge for modern businesses. When your operations rely on dozens of third-party services, tracking their status individually becomes inefficient and risky. A status page aggregator solves this problem by consolidating all vendor status information into a single dashboard.

Read Post

isDown

Read more about Why Use a Status Page Aggregator?

How to Choose the Best Vendor Monitoring Platform for Your Team

Jul 19, 2025 By Nuno Tomas In isDown

Modern businesses rely on dozens of third-party services to operate effectively. When AWS goes down, your application might crash. When Stripe has issues, payments fail. When Slack experiences an outage, team communication grinds to a halt. Vendor monitoring platforms help you track the health of these critical dependencies before they impact your operations. But with numerous options available, selecting the right platform requires careful evaluation of your team's specific needs and workflows.

Read Post

isDown

Read more about How to Choose the Best Vendor Monitoring Platform for Your Team

Application monitoring in Google Cloud: Bridging manual and AI-assisted troubleshooting

Jul 19, 2025 By Dave Raffensperger In Google Operations

Cloud Observability’s curated Application Monitoring dashboards improve troubleshooting with best practices from Google SREs.

Read Post

Google Operations

Read more about Application monitoring in Google Cloud: Bridging manual and AI-assisted troubleshooting

Golang Application Performance Monitoring: A Comprehensive Guide

Jul 18, 2025 By Pavithra Parthiban In Atatus

Application Performance Monitoring (APM) refers to the practice of tracking, analyzing, and optimizing the performance and availability of software applications. When it comes to Go (Golang), a language known for its concurrency, speed, and efficiency, APM becomes crucial to ensure that your applications stay fast, reliable, and scalable under real-world loads. APM in Go involves monitoring the runtime behavior, request response times, system resource usage, and error patterns across your application.

Read Post

Atatus

Read more about Golang Application Performance Monitoring: A Comprehensive Guide

Risk Register for SREs: A Practical Guide to Proactive Incident Prevention

Jul 18, 2025 By Nuno Tomas In isDown

A risk register is one of the most powerful tools in an SRE's arsenal for maintaining system reliability. By systematically documenting potential threats to your infrastructure and services, you can shift from reactive firefighting to proactive risk management.

Read Post

isDown

Read more about Risk Register for SREs: A Practical Guide to Proactive Incident Prevention

Set Up ClickHouse with Docker Compose

Jul 18, 2025 By Preeti Dewani In Last9

ClickHouse is built for high-performance OLAP workloads, capable of scanning billions of rows in seconds. If your analytical queries are bottlenecked on PostgreSQL or MySQL, or you're burning too much on Elasticsearch infrastructure, ClickHouse offers a faster and more cost-efficient alternative. This blog walks through setting up ClickHouse locally with Docker Compose and scaling toward a production-grade cluster with monitoring in place.

Read Post

Last9

Read more about Set Up ClickHouse with Docker Compose

Stream AWS Metrics to Grafana with Last9 in 10 minutes

Jul 18, 2025 By Faiz Shaikh In Last9

It’s 2:47 AM and your Lambda functions are timing out. API response times are spiking. You’re flipping between the CloudWatch console, your APM tool, and your logs, trying to figure out what’s going wrong. CloudWatch has the metrics you need (CPU usage, memory pressure, and request rates), but connecting that data to what your app is doing takes time. The delay in stitching it all together slows down your incident response.

Read Post

Last9

Read more about Stream AWS Metrics to Grafana with Last9 in 10 minutes

I built an MCP Server for Observability. This is my Unhyped Take

Jul 18, 2025 By Elizabeth Mathew In SigNoz

Recently, I read a blog titled “It’s The End Of Observability As We Know It (And I Feel Fine)”, which discussed MCP servers in observability and how these systems would potentially be the “end of observability”. As someone who has spun up an MCP server for an observability backend and as someone who has been in the space for a while, I certainly do not think so.

Read Post

SigNoz

Read more about I built an MCP Server for Observability. This is my Unhyped Take

Cloud or Self-Hosted - Which Deployment Model is Right For You?

Jul 18, 2025 By Anushka Karmakar In SigNoz

Choosing the right observability platform is a critical decision. But how you deploy it is just as important. The right deployment strategy can accelerate your team, simplify operations, and ensure you meet compliance and security requirements. The wrong one can lead to operational headaches and slow you down. At SigNoz, we believe in flexibility. There is no single "best" way to deploy an observability platform; there's only the way that's best for you.

Read Post

SigNoz

Read more about Cloud or Self-Hosted - Which Deployment Model is Right For You?

Log Management Custom Ingestion Filters | WhatsUp Gold 2025.0

Jul 18, 2025 By Progress WhatsUp Gold In WhatsUp Gold

This video explains how to use filters in WhatsUp Gold Log Management to gather and view only the log data that you’re interested in. You’ll learn about report filters as well as the new ingestion filters introduced in WhatsUp Gold version 25.0.

View Video

WhatsUp Gold

Read more about Log Management Custom Ingestion Filters | WhatsUp Gold 2025.0

What's New with Progress WhatsUp Gold 2025.0

Jul 18, 2025 By Progress WhatsUp Gold In WhatsUp Gold

Efficient network monitoring starts with visibility and control. The latest release of Progress WhatsUp Gold 2025.0 will help you stay ahead of issues and maintain a healthy, secure network. Join our upcoming session to explore how the newest enhancements simplify monitoring, improve workflows, and provide deeper insights into your infrastructure.

View Video

WhatsUp Gold

Read more about What's New with Progress WhatsUp Gold 2025.0

The Dashboard That Lets You Track the ISS in Real Time | Golden Grot Awards | Grafana Everywhere

Jul 18, 2025 By Grafana In Grafana

Ruben Fernandez turned his love for space into a stunning ISS dashboard that won the Golden Grot—twice. Watch how he brings data and dreams together. Congratulations to Ruben Fernandez, our 2025 Golden Grot Award winner, recognized for this unique use case and dashboard! Grafana Cloud is the easiest way to get started with Grafana dashboards, metrics, logs, and traces. Our forever-free tier includes access to 10k metrics, 50GB logs, 50GB traces and more.

View Video

Grafana

Read more about The Dashboard That Lets You Track the ISS in Real Time | Golden Grot Awards | Grafana Everywhere

How to Know if Emails Are Actually Reaching Customers in Time

Jul 18, 2025 By Richa Gupta In WebSitePulse

TL;DR: Timely email delivery is critical. Delays in transactional or marketing emails can result in missed opportunities and reduced customer trust. Utilize monitoring methods such as timestamp analysis and round-trip tracking to ensure emails reach inboxes within seconds.

Read Post

WebSitePulse

Read more about How to Know if Emails Are Actually Reaching Customers in Time

How to monitor your Laravel app for critical vulnerabilities using Oh Dear

Jul 18, 2025 By Freek Van der Herten In Oh Dear

A critical security vulnerability was recently discovered in Livewire v3 that allows remote code execution, as Stephen Rees-Carter reported on Securing Laravel. While patches are released quickly, many applications remain vulnerable because developers simply don't know about the issue yet. Oh Dear's Application Health monitoring solves this by continuously checking your production environment for security vulnerabilities and immediately notifying you when issues are detected.

Read Post

Oh Dear

Read more about How to monitor your Laravel app for critical vulnerabilities using Oh Dear

Release v2.6: MCP Server, AI Insights Enhancement, Okta SCIM Integration, SNMP Monitoring and more.

Jul 18, 2025 By Netdata In netdata

Netdata 2.6.0 is here and it’s our most intelligent release yet! This version brings AI-powered monitoring, easier network visibility, and smoother enterprise integrations, all designed to help you troubleshoot faster and scale smarter. What's New: Netdata Referral Program. Every referred user will get a 10% discount when they subscribe to Netdata Business or Homelab, and you will receive 10% of their subscription value (up to a maximum of $1,000 per space). You can refer an unlimited number of users, so there's no real limit to how much you can earn with the referral program.

View Video

netdata

Read more about Release v2.6: MCP Server, AI Insights Enhancement, Okta SCIM Integration, SNMP Monitoring and more.

Overview of Alerts, Real-Time Analysis, & Traceroute

Jul 18, 2025 By Uptime Website Monitoring In uptime

Learn how Uptime.com alerts you the moment a check goes Up or Down, complete with technical details and root cause analysis for API and Transaction checks. Dive into Real-Time Analysis to track outage timelines and get detailed insight into every alert. Plus, see how Traceroute from global or private probe servers helps identify connection issues quickly and accurately. Stay informed. Respond faster. Resolve smarter.

View Video

uptime

Read more about Overview of Alerts, Real-Time Analysis, & Traceroute

The Case for Intelligent Automation in Network Operations

Jul 18, 2025 By John Capobianco In Selector

In the last decade or so, network infrastructure has undergone a massive transformation. With the rise of hybrid cloud, distributed applications, and software-defined everything, managing networks has become exponentially more complex. What used to be a stable, predictable environment is now a constantly evolving system of interconnected services, protocols, and devices, each with its own telemetry, APIs, and failure models.

Read Post

Selector

Read more about The Case for Intelligent Automation in Network Operations

Top tips: How to be a beginner again

Jul 17, 2025 By Nandana Ann Mathew In ManageEngine

Top tips is a weekly column where we highlight what’s trending in the tech world and list ways to explore these trends. This week, we're talking about what it really means to start fresh, stay curious, and make space to be a beginner again—even when your calendar’s packed. If your calendar is crammed with back-to-back meetings, messages that never stop, and deadlines breathing down your neck, you're not alone.

Read Post

ManageEngine

Read more about Top tips: How to be a beginner again

Catchpoint News Catchup Episode 4

Jul 17, 2025 By Catchpoint In Catchpoint

Join Kelly, Sergey, and Leon as they discuss encryption as a public good, better ways to ship your code, and the impact of not being able to “click to cancel”.

View Video

Catchpoint

Read more about Catchpoint News Catchup Episode 4

Automate the removal of agents with Opslogix Lifecycle Management Pack

Jul 17, 2025 By Jonas Lenntun In OpsLogix

Lifecycle Management is something that falls behind in many SCOM environments. It is common for organizations to reach out to us for help with manually removing agents when servers are no longer in use. To decrease the manual tasks and automate the removal of agents, we created the Opslogix Lifecycle Management Pack.

Read Post

OpsLogix

Read more about Automate the removal of agents with Opslogix Lifecycle Management Pack

How APM Can Improve Your Digital Customer Experience?

Jul 17, 2025 By Mohana Ayeswariya J In Atatus

When a customer taps a button, submits a form or waits for a page to load, they’re not thinking about your backend architecture, microservices, or CDN; they want it to work instantly. But when it doesn’t, the frustration is immediate. Maybe the app freezes. Maybe a checkout fails. Maybe the entire experience just feels laggy. And the worst part? They don't complain, they just leave the application.

Read Post

Atatus

Read more about How APM Can Improve Your Digital Customer Experience?

Query and Analyze Logs Visually, Without Writing LogQL

Jul 17, 2025 By Anjali Udasi In Last9

It’s 2 AM. An incident’s in progress. Error rates are climbing. You jump into the logs, filter by service, adjust the time window… and now you need a LogQL query. You write one. It errors out. You fix the syntax, try again, only to realize you need a different filter or a new aggregation. Back to rewriting. By the time you’ve got the query right, you’ve already lost 10–15 minutes. The system is still broken, and you still don’t know why.

Read Post

Last9

Read more about Query and Analyze Logs Visually, Without Writing LogQL

Trace Go Apps Using Runtime Tracing and OpenTelemetry

Jul 17, 2025 By Preeti Dewani In Last9

When your Go service hits 500ms latencies but CPU usage is flat, tracing gives you visibility into what the profiler misses. With 1–2% runtime overhead, Go’s built-in tracing tools make it easier to debug performance regressions that don’t leave a clear footprint.

Read Post

Last9

Read more about Trace Go Apps Using Runtime Tracing and OpenTelemetry

Honeycomb Named a Visionary in the 2025 Gartner Magic Quadrant for Observability Platforms

Jul 17, 2025 By Julie Neumann In Honeycomb

In the era of AI, software development is at an inflection point, and observability has never been more critical. Teams are dealing with more code, more data, and more pressure than ever before. To navigate these new challenges, you need a partner with a strong vision for the future and a knack for looking around corners. Honeycomb is proud to be named a Visionary in the 2025 Gartner Magic Quadrant for Observability Platforms.

Read Post

Honeycomb

Read more about Honeycomb Named a Visionary in the 2025 Gartner Magic Quadrant for Observability Platforms

Elasticsearch is a recommended vector database in the NVIDIA Enterprise AI Factory validated design

Jul 17, 2025 By Aditya Tripathi In Elastic

Elastic now integrates with the NVIDIA Enterprise AI Factory validated design to provide users with a recommended vector database for their on-premises AI Factories. The validated design provides enterprises with a framework for building and deploying AI Factories on-premises.

Read Post

Elastic

Read more about Elasticsearch is a recommended vector database in the NVIDIA Enterprise AI Factory validated design

Getting started with Dynatrace dashboards

Jul 17, 2025 By Sameer Mhaisekar In Squared Up

Dynatrace gives you incredibly deep observability data. But all that depth can bury the insights needed. In this blog, we show how to turn Dynatrace's complex telemetry into visual dashboards that actually make sense. Dynatrace is a leading observability and application performance monitoring (APM) platform, known for its deep insight into complex, modern cloud environments. With capabilities spanning infrastructure monitoring, real user monitoring, and security, Dynatrace offers powerful telemetry.

Read Post

Squared Up

Read more about Getting started with Dynatrace dashboards

ScienceLogic Wins AI Breakthrough Award for Predictive Analytics Platform of the Year

Jul 17, 2025 By ScienceLogic In ScienceLogic

We’re excited to announce that ScienceLogic has been recognized in the 2025 AI Breakthrough Awards as the winner of “Predictive Analytics Platform of the Year.” This marks our second consecutive win in the program—and highlights our leadership in shaping the future of intelligent automation and observability. As organizations move from traditional monitoring toward autonomous operations, the need for real-time insight, automation, and predictive intelligence has never been greater.

Read Post

ScienceLogic

Read more about ScienceLogic Wins AI Breakthrough Award for Predictive Analytics Platform of the Year

MCP Server on Splunk Cloud Platform Demo

Jul 17, 2025 By Splunk In Splunk

Discover the future of data interaction! This video introduces the Model Context Protocol (MCP) server on Splunk Cloud Platform, a groundbreaking capability that seamlessly connects your Splunk data with advanced AI models (LLMs). Learn how to leverage natural language to query, analyze, and manage your Splunk environment without complex SPL. In this comprehensive setup and configuration guide, we'll walk you through.

View Video

Splunk

Read more about MCP Server on Splunk Cloud Platform Demo

Router Monitoring for Network Admins: A How-To Guide

Jul 17, 2025 By Andrii Kernitskyi In Obkio

As network admins, we know that routers are the lifeblood of any network. They’re the unsung heroes, routing data from your internal systems to external destinations like the Internet. When routers are performing at their best, everything flows smoothly. But when they’re overloaded, misconfigured, or simply not up to snuff, your network’s performance and security are at risk.

Read Post

Obkio

Read more about Router Monitoring for Network Admins: A How-To Guide

Part One 'The 5 Essential Capabilities of Event Intelligence Platforms'

Jul 17, 2025 By david.arrowsmith In Interlink

With a touch of hype, the term Event Intelligence has gained traction in recent months as large enterprises seek smarter ways to manage events and reduce noise, driven by that never-ending quest to improve uptime.

Read Post

Interlink

Read more about Part One 'The 5 Essential Capabilities of Event Intelligence Platforms'

Checkly Is Now Available in the AWS Marketplace

Jul 17, 2025 By Sara Miteva In Checkly

If your team runs on AWS, getting new tools into your workflow isn’t just about functionality. It’s about how quickly you can procure, integrate, and see value. With Checkly now available on AWS Marketplace, monitoring doesn’t have to be an exception. This launch means Checkly fits into your procurement flow the same way it fits into your CI/CD: seamlessly. No vendor approval bottlenecks, no procurement delays, just faster access to the tools your developers already want to use.

Read Post

Checkly

Read more about Checkly Is Now Available in the AWS Marketplace

DNS Misconfigurations MSPs Can't Ignore

Jul 17, 2025 By DNS Spy In DNS Spy

When something goes wrong in a client’s infrastructure, MSPs are expected to fix it—fast. But there’s one area most teams still overlook, and it’s often the first point of failure: DNS. Misconfigured DNS doesn’t always break things immediately. It’s subtle. It lingers. And when it finally causes an outage, broken email, or a security issue, it’s often too late. Here are the DNS misconfigurations MSPs can’t afford to ignore—and what to do about them.

Read Post

DNS Spy

Read more about DNS Misconfigurations MSPs Can't Ignore

Monitor Lambda-hosted web apps with the Lambda Web Adapter integration

Jul 17, 2025 By Jordan Obey In Datadog

As organizations migrate their legacy web applications from containerized or server-based deployments to serverless environments, they often run into a critical compatibility challenge. Traditional web frameworks like Flask, Express, or SpringBoot are designed to run on persistent HTTP servers, not event-driven, stateless environments like AWS Lambda. The AWS Lambda Web Adapter bridges this gap by allowing teams to run web server-based applications inside Lambda with minimal changes.

Read Post

Datadog

Read more about Monitor Lambda-hosted web apps with the Lambda Web Adapter integration

How Payconiq Centralized Monitoring and Enabled Real-Time Insights with Elastic

Jul 17, 2025 By Elastic In Elastic

Yannick Boulleys, Head of Platform at Payconiq, shares how Elastic helped the company consolidate fragmented monitoring tools into a single platform. With real-time user monitoring, built-in anomaly detection, and GenAI-powered root cause analysis, Elastic has transformed how Payconiq manages system visibility, consumer behavior, and cost efficiency, without requiring deep technical expertise.

View Video

Elastic

Read more about How Payconiq Centralized Monitoring and Enabled Real-Time Insights with Elastic

Stuck in the Past - Network Edition

Jul 17, 2025 By Kentik In Kentik

Ever feel like your network is punishing you for sticking to old ways? Get a look at what happens when network teams rely on outdated tools. Watch as our "experts" face the consequences of ignoring real-time insights, manual correlation, and constant guessing games.

View Video

Kentik

Read more about Stuck in the Past - Network Edition

Network Intelligence: The Future Is Now

Jul 17, 2025 By Kentik In Kentik

The future of your network isn't a distant dream—it's here.

View Video

Kentik

Read more about Network Intelligence: The Future Is Now

Kubernetes Observability with OpenTelemetry | A Complete Setup Guide

Jul 17, 2025 By Elizabeth Mathew In SigNoz

Kubernetes provides a wealth of telemetry data from container metrics and application traces to cluster events and logs. OpenTelemetry offers a vendor-neutral, end-to-end solution for collecting and exporting this telemetry in a standardised format.

Read Post

SigNoz

Read more about Kubernetes Observability with OpenTelemetry | A Complete Setup Guide

Unlock Deeper Insights: Introducing GitLab Event Integration with Mezmo

Jul 17, 2025 By Mezmo In Mezmo

Following the popularity of our existing GitHub integration, we’ve extended similar capabilities to GitLab users. You can now ingest GitLab events directly into Mezmo Telemetry Pipelines and route them to any destination. This provides a powerful new way to monitor, alert, and react to activity within your GitLab repositories.

Read Post

Mezmo

Read more about Unlock Deeper Insights: Introducing GitLab Event Integration with Mezmo

Modern Redux Debugging: Common Bugs and Solutions in 2024-2025

Jul 17, 2025 By Todd H. Gardner In TrackJS

Redux remains a cornerstone of React state management, but developers continue to encounter persistent bugs and new challenges. State mutation errors remain the most common Redux bug, affecting over 70% of Redux applications, while new issues emerge with Redux Toolkit 2.0, TypeScript integration, and React 18/19 compatibility. This comprehensive guide explores the most prevalent Redux debugging challenges and provides practical solutions for modern development.

Read Post

TrackJS

Read more about Modern Redux Debugging: Common Bugs and Solutions in 2024-2025

SLA vs SLO vs SLI - Examples, tips, challenges, and key differences

Jul 16, 2025 By Leo Baecker In Hyperping

Service Level Agreements (SLAs), Service Level Objectives (SLOs), and Service Level Indicators (SLIs) form the backbone of reliable service delivery. Understanding how these three elements work together helps you build trust with users, maintain service quality, and create accountability across your organization.

Read Post

Hyperping

Read more about SLA vs SLO vs SLI - Examples, tips, challenges, and key differences

Here's how you can build site templates for Oh Dear

Jul 16, 2025 By Sean White In Oh Dear

When you're managing a handful of client sites, setting things up manually is fine. Though if you're managing dozens of them, you're going to think twice about your approach. For agencies, development teams and platforms who are responsible for loads of websites, having to repeat the same configuration over and over is not only inefficient but also more prone to errors. That’s where this blog post comes in handy.

Read Post

Oh Dear

Read more about Here's how you can build site templates for Oh Dear

Kibana Logs: Advanced Query Patterns and Visualization Techniques

Jul 16, 2025 By Anjali Udasi In Last9

Kibana gives you a structured way to explore log data indexed in Elasticsearch. With the right queries and visualizations, you can identify anomalies, debug issues more quickly, and track trends across services. This blog covers practical ways to query logs using Kibana’s Lucene and KQL syntax, build visualizations that surface meaningful signals, and set up dashboards for ongoing log-based monitoring.

Read Post

Last9

Read more about Kibana Logs: Advanced Query Patterns and Visualization Techniques

Enable Kong Gateway Tracing in 5 Minutes

Jul 16, 2025 By Anjali Udasi In Last9

Kong Gateway is a popular API gateway that sits at the edge of your infrastructure, routing and shaping traffic across microservices. It’s fast, pluggable, and battle-tested, but for many teams, it remains a black box. You might have OpenTelemetry set up across your application stack. Traces flow from your app servers, databases, and third-party APIs. But the moment a request enters through Kong, observability drops off.

Read Post

Last9

Read more about Enable Kong Gateway Tracing in 5 Minutes

Build Log Automation with Last9's Query API

Jul 16, 2025 By Prathamesh Sonpatki In Last9

Manual log investigation is one of those engineering tasks that quietly drains hours without offering much real value. You're debugging an incident. Monitoring shows elevated error rates. Now begins the familiar drill. It’s a tedious cycle, and it doesn’t scale. The whole process breaks down when you’re trying to automate incident response, run continuous security monitoring, or generate compliance reports.

Read Post

Last9

Read more about Build Log Automation with Last9's Query API

Choosing the right OpenTelemetry Collector distribution

Jul 16, 2025 By Juliano Costa In Datadog

The OpenTelemetry (OTel) Collector plays a central role in collecting, processing, and exporting telemetry data. If you’re deploying the Collector in production, chances are you’ve reached for the otelcol-contrib distribution. It’s the easiest, most flexible, and most documented distribution, used in nearly every demo and getting-started guide. But here’s the catch: It’s not actually recommended for production use.

Read Post

Datadog

Read more about Choosing the right OpenTelemetry Collector distribution

Challenges in AIOps and how to sail through them

Jul 16, 2025 By Swaminathan J In eG Innovations

AIOps (Artificial Intelligence for IT Operations) is not only a game changer, but the need of the hour as modern IT grows and becomes increasingly complex. The promises of AIOps are both overwhelming and tantalizing. AI-powered monitoring and observability can help predict issues, automatically resolve incidents, and optimize performance across the IT infrastructure. However, onboarding an AIOps monitoring tool can be more complicated than it sounds on paper.

Read Post

eG Innovations

Read more about Challenges in AIOps and how to sail through them

Honeycomb In Your IDE? Yes, With Hosted MCP Now Available in AWS Marketplace AI Agents and Tools Category

Jul 16, 2025 By Austin Parker In Honeycomb

I’m pleased to announce the public beta of Honeycomb Hosted MCP, along with our first wave of one-click integrations for Cursor, Visual Studio Code, and Claude Desktop. We’re also very excited to announce that Hosted MCP is available on AWS AI Agents marketplace and for all Honeycomb plans (including our free plan!) at no charge. Honeycomb was built with a singular focus: how do we help teams become better at the art and craft of software development, delivery, and operations?

Read Post

Honeycomb

Read more about Honeycomb In Your IDE? Yes, With Hosted MCP Now Available in AWS Marketplace AI Agents and Tools Category

Key Early Considerations Before Big Architecture or Technology Decisions

Jul 16, 2025 By Sarah Morgan In Scout

In this final part, the Scout team continues our talk with Freedom Dumalo, former CTO at Flexcar and current CTO at Vestmark. We discuss some essential questions about architecture, touch on Rails, Turbo, and Stimulus, and cover the key considerations for those starting off before they lock in an architecture or tech decision. By the way, before we jump in, Scout Error Monitoring is coming!

Read Post

Scout

Read more about Key Early Considerations Before Big Architecture or Technology Decisions

Snowflake data visualization: all the latest features to monitor metrics, enhance security, and more

Jul 16, 2025 By Kristin Knapp In Grafana

In 2020, we introduced the Snowflake Enterprise data source for Grafana, allowing users to seamlessly pull data from the Snowflake cloud-based data storage and analytics service into Grafana dashboards. Available for Grafana Enterprise and Grafana Cloud users, it’s a powerful way to not only query and visualize Snowflake data, but to do so alongside other data sources, so you can discover correlations and other meaningful insights within minutes.

Read Post

Grafana

Read more about Snowflake data visualization: all the latest features to monitor metrics, enhance security, and more

Your AI Strategy Is Failing in the Seams

Jul 16, 2025 By Yann Guernion In Broadcom

There’s a certain comfort in the glow of your network operations center (NOC) dashboards. For some time, the sign of a well-run NOC was that sprawling bank of screens, each dedicated to a different domain. One for the WAN, showing link status. Another for the data center, tracking backbone health. A third for cloud consumption, pulling metrics from your provider. Each screen is a neatly bordered kingdom, diligently monitored by its own set of tools. As long as the lights are green, all is well.

Read Post

Broadcom

Read more about Your AI Strategy Is Failing in the Seams

What is RemoteJS

Jul 16, 2025 By TrackJS In TrackJS

Discover how to debug JavaScript applications remotely without complex configurations or cables! This comprehensive explainer shows you how RemoteJS revolutionizes web development debugging by connecting you directly to remote browser sessions.

View Video

TrackJS

Read more about What is RemoteJS

Announcing SystemEDGE 6.5

Jul 16, 2025 By Abhinav Shroff In Broadcom

We are pleased to announce the general availability of SystemEDGE 6.5. For customers using DX NetOps, SystemEDGE is a key component for gaining a comprehensive view of server infrastructure health. It functions as an agent that resides on systems like physical servers or virtual machines. SystemEDGE collects fundamental performance and status information and delivers reports via SNMP.

Read Post

Broadcom

Read more about Announcing SystemEDGE 6.5

LogicMonitor in Hybrid Environments: Observability with Edwin AI powered by AWS

Jul 16, 2025 By LogicMonitor In LogicMonitor

As enterprises scale in complexity, the infrastructure landscape is no longer just cloud or on-premises; it’s both. Hybrid is the new normal and it’s here to stay. And with that shift comes a new demand: a unified, scalable observability solution that works across the entire tech stack, from legacy hardware to cloud-native workloads. That’s where LogicMonitor comes in.

Read Post

LogicMonitor

Read more about LogicMonitor in Hybrid Environments: Observability with Edwin AI powered by AWS

Missing container-layer metadata: Why it happens and what you can do

Jul 16, 2025 By Stephanie Wei In Datadog

Container image layers provide valuable insight into what goes into a container, including which packages were installed, what commands were run, and where vulnerabilities might live. The metadata associated with these image layers is essential for debugging, optimizing image size, and managing security risks. However, key container-layer metadata fields such as digest, size, and created_by are sometimes missing, which can disrupt important tasks.

Read Post

Datadog

Read more about Missing container-layer metadata: Why it happens and what you can do

How Tech Careers Happen by Accident | TechPod Insights

Jul 16, 2025 By solarwindsinc In SolarWinds

Ever stumble into something amazing? That’s how many people end up in tech — by following curiosity. In this TechPod snippet, we talk about the unexpected paths into technology, security, and education.

View Video

SolarWinds

Read more about How Tech Careers Happen by Accident | TechPod Insights

ITRS named in Gartner Magic Quadrant for Observability Platforms

Jul 16, 2025 By Uptrends In Uptrends

When Uptrends became part of ITRS, we knew we were joining a team deeply committed to innovation, precision, and people — whether those people were troubleshooting transaction journeys from their laptops at 8am or keeping enterprise-scale operations online 24x7. We’ve come far since then.

Read Post

Uptrends

Read more about ITRS named in Gartner Magic Quadrant for Observability Platforms

Digitate and Tata Chemicals collaborate on spend intelligence solutions to power enterprise-grade procurement

Jul 15, 2025 By Digitate In Digitate

AI-powered spend intelligence solution advances intelligent procurement with near real-time spend classification, optimization, and monitoring.

Read Post

Digitate

Read more about Digitate and Tata Chemicals collaborate on spend intelligence solutions to power enterprise-grade procurement

ScienceLogic Named a Visionary in the 2025 Gartner Magic Quadrant for Observability Platforms

Jul 15, 2025 By ScienceLogic In ScienceLogic

It’s official: ScienceLogic has entered the observability arena. Named a Visionary in the 2025 Gartner Magic Quadrant for Observability Platforms, we believe we’re helping define where observability is heading, not just where it’s been. This marks our first inclusion in this Magic Quadrant and, in our opinion, validates our mission to redefine intelligent, actionable observability in the era of AI and automation.

Read Post

ScienceLogic

Read more about ScienceLogic Named a Visionary in the 2025 Gartner Magic Quadrant for Observability Platforms

IT Task Automation: Best Practices and Use Cases for IT Management with Pandora FMS

Jul 15, 2025 By Ahinóam Rodríguez In Pandora FMS

IT teams must handle a large number of tasks on a daily basis. Many of these tasks, while essential, are repetitive: resetting passwords, rebooting servers, monitoring logs for errors, applying patches… When performed manually, they can overwhelm technical staff and compromise operational efficiency. IT automation has emerged as the answer to this challenge. It involves using scripts and specialized tools to automatically execute these and other tasks that previously required human intervention.

Read Post

Pandora FMS

Read more about IT Task Automation: Best Practices and Use Cases for IT Management with Pandora FMS

Kubernetes Monitoring backend 2.2: better cluster observability through new alert and recording rules

Jul 15, 2025 By Serena Kei In Grafana

We’re excited to announce that version 2.2.0 of the backend for our Kubernetes Monitoring solution in Grafana Cloud is now available. The app’s backend is supported by kubernetes-mixin, an open source Prometheus Monitoring Mixin, and this latest version features significant improvements to alert rules and recording rules that will enhance your cluster observability and monitoring experience. There’s a lot to tell you about, so let’s dive in.

Read Post

Grafana

Read more about Kubernetes Monitoring backend 2.2: better cluster observability through new alert and recording rules

A look back at DASH 2025

Jul 15, 2025 By Claire Laurence In Datadog

DASH 2025 brought the Datadog community together like never before. During our biggest event yet, thousands of attendees gathered at the North Javits Center in New York City for two and a half days of content, learning, and community, where they deepened their knowledge and connected with peers. Here's a quick look back at some of the highlights from this year's DASH.

Read Post

Datadog

Read more about A look back at DASH 2025

Proactively troubleshoot with synthetic testing and distributed tracing

Jul 15, 2025 By Addie Beach In Datadog

As your application grows in complexity, identifying the root cause of issues becomes increasingly difficult. Many monitoring strategies make this even harder by siloing frontend and backend data. To effectively troubleshoot problems that spread across your app, you need visibility not just into each part of your stack, but also into how these parts interact.

Read Post

Datadog

Read more about Proactively troubleshoot with synthetic testing and distributed tracing

Monitor agents built on Amazon Bedrock with Datadog LLM Observability

Jul 15, 2025 By Barry Eom In Datadog

As large language models (LLMs) grow more powerful, organizations are deploying agentic AI applications to tackle complex, multi-step tasks. With Amazon Bedrock Agents, developers can orchestrate these agents to manage tasks such as triggering serverless functions, calling APIs, accessing knowledge bases, and maintaining contextual conversations—all while breaking down complex user requests or tasks into manageable steps.

Read Post

Datadog

Read more about Monitor agents built on Amazon Bedrock with Datadog LLM Observability

Smarter Workflows, Faster Insights: How InfluxDB 3 Unlocks the Power of Python at the Source

Jul 15, 2025 By Allyson Boate In InfluxData

Businesses across industries rely on time-stamped data to track system health, monitor performance, and improve operations. Whether it’s sensors on a factory floor or usage logs from a SaaS platform, time series data reveals how things change. As businesses digitize operations and add connected devices, sensors produce growing streams of time-based data. This opens the door to faster analytics and smarter automation. But legacy approaches can’t keep up.

Read Post

InfluxData

Read more about Smarter Workflows, Faster Insights: How InfluxDB 3 Unlocks the Power of Python at the Source

If your site is slow, it might as well be down.

Jul 15, 2025 By Catchpoint In Catchpoint

It’s no longer enough for a site to just be available; it has to be fast. If the experience lags, your customers will bounce within seconds. The consequences scale fast: business stops and revenue disappears. You need to monitor performance across the full delivery chain because speed is what keeps users engaged.

View Video

Catchpoint

Read more about If your site is slow, it might as well be down.

From Reactive to Proactive: A User-Centric Digital Strategy for Banks

Jul 15, 2025 By Catchpoint In Catchpoint

In today's digital-centric banking environment, financial institutions must be able to provide seamless and reliable application performance across all digital channels - from a branch to a mobile device. Failure to do so results in real impact to customer satisfaction, trust, and loyalty. Modern banking applications are increasingly complex, running off of internet-centric distributed architectures involving many different parties and services. For these modern tech frameworks, traditional APM tools are no longer sufficient to ensure service reliability and optimal customer experience.

View Video

Catchpoint

Read more about From Reactive to Proactive: A User-Centric Digital Strategy for Banks

Cloudflare's Resolver Outage: More Than Just DNS

Jul 15, 2025 By Catchpoint Team In Catchpoint

“It’s always DNS.” That’s the running joke in IT. When websites won’t load and apps grind to a halt, DNS—the internet’s address book—is often the first to get blamed. That’s because DNS translates human-friendly names like google.com into IP addresses that computers use to route traffic.

Read Post

Catchpoint

Read more about Cloudflare's Resolver Outage: More Than Just DNS

How to Troubleshoot Outages Faster Using Elastic Observability [2 Min Live Demo]

Jul 15, 2025 By Elastic In Elastic

In this video, I’ll show you how Elastic Observability helps you reduce downtime, accelerate root cause analysis, and unify logs, metrics, and traces in one powerful dashboard. With native OpenTelemetry support, AI-powered troubleshooting, and built-in anomaly detection, you can streamline your workflows and boost service reliability.

View Video

Elastic

Read more about How to Troubleshoot Outages Faster Using Elastic Observability [2 Min Live Demo]

Atatus APM: Full-Stack Visibility for Modern Engineering Teams 2025

Jul 15, 2025 By Pavithra Parthiban In Atatus

APM stands for Application Performance Monitoring or Application Performance Management. It helps engineering teams track key metrics, detect slowdowns, and improve the overall performance of their applications. With Atatus APM, you get complete visibility into your application, from backend code and databases to external services and frontend performance.

Read Post

Atatus

Read more about Atatus APM: Full-Stack Visibility for Modern Engineering Teams 2025

Real-Time Alerting for AI-Optimized Data Centers

Jul 15, 2025 By Phil Gervasi In Kentik

Kentik transforms real-time network telemetry into actionable alerts for AI-optimized data centers. By converting database queries into custom alerts, engineers can detect issues like elephant flows, idle links, and packet loss before performance suffers, and trigger alerts in systems like ServiceNow or PagerDuty.

Read Post

Kentik

Read more about Real-Time Alerting for AI-Optimized Data Centers

Identifying Idle Paths in a Data Center Leaf-Spine Fabric

Jul 15, 2025 By Kentik In Kentik

In a perfect leaf-spine network, traffic evenly spreads across all links. But reality is often different, leaving costly, idle paths hidden in your data center fabric. Kentik's Phil Gervasi demonstrates how Kentik's network intelligence platform helps engineers quickly identify and address these underutilized paths. With powerful visualizations, detailed telemetry analysis, and customizable alerts integrated into your ticketing systems, Kentik makes it easy to spot persistent traffic imbalances, troubleshoot ECMP issues, and optimize your infrastructure.

View Video

Kentik

Read more about Identifying Idle Paths in a Data Center Leaf-Spine Fabric

Arie's Adventures with Coroot

Jul 15, 2025 By Arie Van Den Heuvel In Coroot

Arie van den Heuvel is an engineer, a System and Application Management Specialist, and a valued member of our community. Below he has shared his journey using Coroot, and how it has helped improve observability for his team. You can read more of Arie’s writing and support the resource articles he has created for open source on his blog.

Read Post

Coroot

Read more about Arie's Adventures with Coroot

Jaeger Metrics: Internal Operations and Service Performance Monitoring

Jul 15, 2025 By Faiz Shaikh In Last9

You're monitoring a microservices-based system. Alerts trigger when response times exceed 2 seconds. But when you open Jaeger, you're faced with thousands of traces. Identifying which service or operation is responsible becomes time-consuming. Jaeger metrics help reduce this friction by exposing aggregated telemetry. Instead of scanning individual traces, you get service-level and operation-level performance metrics, latency, throughput, and error rates that highlight where the issue lies.

Read Post

Last9

Read more about Jaeger Metrics: Internal Operations and Service Performance Monitoring

Splunk Named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms

Jul 15, 2025 By Dayna Lord In Splunk

We are proud to announce that Splunk has been named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms for the third year in a row. In our opinion, our recognition in the Observability category comes on the heels of Splunk being recognized for a tenth consecutive time as a Leader in the 2024 Gartner Magic Quadrant for Security Information and Event Management (SIEM). Splunk was the only vendor named a Leader in both SIEM and Observability for the Gartner Magic Quadrant three times.

Read Post

Splunk

Read more about Splunk Named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms

Quantifying the True Cost of Healthcare IT Downtime

Jul 15, 2025 By LogicMonitor In LogicMonitor

In today’s hospitals, technology is woven into every touchpoint of patient care. Nurses check vitals through digital monitors. Physicians review test results in the EHR. Medications get ordered, verified, and delivered through a network of connected systems. But when even one link in that chain fails, the impact isn’t just inconvenient—it’s dangerous. Downtime doesn’t just slow operations.

Read Post

LogicMonitor

Read more about Quantifying the True Cost of Healthcare IT Downtime

Introducing Coralogix's MCP Server: Helping customers build smarter AI agents

Jul 15, 2025 By Liran Hason In Coralogix

Now available: Secure, real-time access to your observability data via Coralogix’s Model Context Protocol (MCP) Server. AI agents are only as powerful as the context they’re given. Today, we’re excited to announce the launch of the Coralogix MCP Server, which enables third-party AI agents to connect directly to your observability data across production, staging, and other environments.

Read Post

Coralogix

Read more about Introducing Coralogix's MCP Server: Helping customers build smarter AI agents

Getting Started Guide with Netdata

Jul 15, 2025 By Netdata In netdata

New to Netdata? Start here. In this quick and practical guide, we’ll help you get set up and confident with Netdata in just a few minutes. You’ll learn how to: access your Netdata Space; connect your nodes (servers, VMs, containers, network devices, and more); organize your infrastructure with Spaces and Rooms; collaborate with your team in real time; explore alerting and integrations; and customize notifications so you’re only alerted when it truly matters.

View Video

netdata

Read more about Getting Started Guide with Netdata

Is Your OpenSearch Cluster Getting Expensive or Hard to Manage? Try Logz.io Optimization Tool

Jul 15, 2025 By Logz.io In logz.io

Logz.io’s OpenSearch Optimization Tool is a free, open-source CLI utility that gives you fast, actionable insights into your cluster’s performance, cost, and configuration.

View Video

logz.io

Read more about Is Your OpenSearch Cluster Getting Expensive or Hard to Manage? Try Logz.io Optimization Tool

Keep SQL Servers healthy with this handy maintenance checklist

Jul 15, 2025 By ManageEngine

A quick-read e-book with practical SQL Server checklists, tips, and fixes. Perfect for busy DBAs.

Get EBook

ManageEngine

Read more about Keep SQL Servers healthy with this handy maintenance checklist

Grok 4 Sets Records - But I'm Focused on Microsoft's 9% Sales Growth

Jul 14, 2025 By Teneo In Teneo

The recent launch of Grok 4 has set the AI community buzzing. With an impressive score of 73 on TLDR’s AI benchmark, Grok 4 edges ahead of OpenAI’s O3 and Google’s Gemini 2.5 Pro, both scoring 70. Elon and the X AI team deserve praise for this breakthrough, reinforcing Grok 4’s status as potentially the most powerful LLM yet.

Read Post

Teneo

Read more about Grok 4 Sets Records - But I'm Focused on Microsoft's 9% Sales Growth

Why MSPs Can't Afford to Ignore DNS Monitoring

Jul 14, 2025 By DNS Spy In DNS Spy

Most MSPs don’t think much about DNS—until something breaks. A record is deleted, an MX entry is misconfigured, or a zone is out of sync. Suddenly, your client’s email is bouncing, their site is down, and your phone is ringing. The problem? DNS issues are easy to miss. They don’t always trigger alerts, logs, or tickets. But when they surface, you’re the one your client calls first.

Read Post

DNS Spy

Read more about Why MSPs Can't Afford to Ignore DNS Monitoring

How to analyze Core Web Vitals in Grafana Cloud Frontend Observability

Jul 14, 2025 By Bukola Ayodele In Grafana

One of the biggest challenges in frontend development is understanding how users actually experience your application. Slow load times, layout shifts, and a slow response to user interactions can quietly degrade the user experience if they go unnoticed. This is where Grafana Cloud Frontend Observability comes in. Frontend Observability is a hosted service for real user monitoring (RUM) that provides immediate, clear, and actionable insights into the end user experience of web applications.

Read Post

Grafana

Read more about How to analyze Core Web Vitals in Grafana Cloud Frontend Observability

LDI Connect Steps Up Microsoft 365 and Teams Managed Services with Martello's Vantage DX

Jul 14, 2025 By Sara Purdon In Martello Technologies

LDI Connect is taking its managed IT services to the next level by adding Martello’s Vantage DX platform to its toolkit. This move is all about giving clients a smoother, more reliable Microsoft 365 and Teams experience … and it couldn’t come at a better time. By integrating Vantage DX, LDI Connect can now proactively monitor clients’ Microsoft environments.

Read Post

Martello Technologies

Read more about LDI Connect Steps Up Microsoft 365 and Teams Managed Services with Martello's Vantage DX

How to Get Grafana Iframe Embedding Right

Jul 14, 2025 By Anjali Udasi In Last9

Adding Grafana dashboards directly into your app lets users see monitoring data without switching tabs or tools. Using an iframe to embed Grafana does work, but it brings along some tricky authentication and security issues that aren’t always obvious at first. In this blog, we’ll go over the practical ways to embed Grafana dashboards, from easy public snapshots to secure, private dashboards that need authentication.

Read Post

Last9

Read more about How to Get Grafana Iframe Embedding Right

Optimize LangChain Performance with Trace Analytics

Jul 14, 2025 By Anjali Udasi In Last9

You’ve instrumented your LangChain app, and traces are now flowing into Last9. Now the issues are visible: API costs are crossing $200/day, average response times exceed 3 seconds, and performance degrades under 100 concurrent users. A single tool call adds over 2 seconds. Bloated context windows are pushing up token usage, wasting $50/day. Here’s how to use trace data to identify and fix these inefficiencies, systematically and at scale.

Read Post

Last9

Read more about Optimize LangChain Performance with Trace Analytics

Generating end-to-end tests with AI and Playwright MCP

Jul 14, 2025 By Stefan Judis In Checkly

When I started using Playwright, there was a single command that blew me away. I immediately became (and still am) a huge Playwright Codegen fanboy. Playwright's codegen command opens up a browser window, and whatever you do in this window will be recorded. Navigating URLs, clicking links, and filling out form elements—the Playwright inspector records all your actions and generates a Playwright test for you. Magic!

Read Post

Checkly

Read more about Generating end-to-end tests with AI and Playwright MCP

The Fast Path to More Useful Telemetry

Jul 14, 2025 By Bernardo Guerreiro In Honeycomb

Over and over, we’ve seen that teams who invest in adding rich, relevant context to their telemetry end up debugging faster and collaborating more effectively during incidents. Getting meaningful context added can feel like a big cross-team project, but some of the highest-leverage improvements don’t require app code changes or coordination across services.

Read Post

Honeycomb

Read more about The Fast Path to More Useful Telemetry

Observability as Code: Why You Should Use OaC

Jul 14, 2025 By Caitlin Halla In Splunk

In the fast-moving world of CI/CD pipelines, microservice architectures, and container orchestration, software changes rapidly. What exists in a codebase today might be gone next week. At this scale and speed, it’s impossible for development teams to manually track every line of code and every new piece of functionality.

Read Post

Splunk

Read more about Observability as Code: Why You Should Use OaC

Uptrace v2.0: The Future of Observability is Here

Jul 14, 2025 By Vladimir Mihailenco In Uptrace

The Uptrace team is thrilled to announce the release of v2.0—our biggest update yet! This release represents a complete reimagining of how observability data should be stored, queried, and managed. With multi-project support, revolutionary JSON-based storage, powerful data transformations, and a host of developer-friendly features, Uptrace v2.0 is designed to scale with your growing infrastructure needs.

Read Post

Uptrace

Read more about Uptrace v2.0: The Future of Observability is Here

Microsoft SCOM Management Pack Housekeeping in Secured, Offline, or Air-Gapped Environments

Jul 14, 2025 By NiCE IT Management Solutions In NiCE IT Mgmt

MP Catalog Offline Toolkit by NiCE | 20min Walkthrough. Struggling with Management Pack updates in restricted environments? Discover how the MP Catalog Offline Toolkit by NiCE simplifies SCOM MP management without the need for an internet connection. Watch the 20-minute walkthrough now and see how this free tool helps your SCOM team stay compliant, efficient, and secure. Download it now on GitHub – absolutely free, from the experts at NiCE.

View Video

NiCE IT Mgmt

Read more about Microsoft SCOM Management Pack Housekeeping in Secured, Offline, or Air-Gapped Environments

Best on-call scheduling tools in 2025 [10 reviewed]

Jul 13, 2025 By Leo Baecker In Hyperping

Managing developer on-call rotations and escalations isn't just about who gets woken up at 2 a.m. — it's about ensuring reliability, minimizing downtime, and scaling operational excellence. With so many tools out there, choosing the right on-call solution can be tough. We've analyzed 10 of the most trusted on-call scheduling platforms in 2025 — comparing usability, pricing, integrations, automation, and support — to help you choose the best tool for your engineering or DevOps team.

Read Post

Hyperping

Read more about Best on-call scheduling tools in 2025 [10 reviewed]

ManageEngine is recognized as a Strong Performer in 2025 Gartner Peer Insights Voice of the Customer for Digital Experience Monitoring

Jul 11, 2025 By Shree Harish S B In ManageEngine

We're thrilled to announce that we have been recognized as a Strong Performer in the 2025 Gartner Peer Insights Voice of the Customer for Digital Experience Monitoring (DEM). We think that this recognition is a result of direct customer feedback on their experience with our solutions, underscoring the trust and value users associate with our solutions.

Read Post

ManageEngine

Read more about ManageEngine is recognized as a Strong Performer in 2025 Gartner Peer Insights Voice of the Customer for Digital Experience Monitoring

Enhancing authentication security: Inside Microsoft's open source contribution to Grafana

Jul 11, 2025 By Trevor Jones In Grafana

When Microsoft engineers went looking for a modern visualization platform to help track critical signals and make quicker decisions, Grafana emerged as the clear favorite. But there was just one hitch: the available authentication methods didn’t quite meet their needs.

Read Post

Grafana

Read more about Enhancing authentication security: Inside Microsoft's open source contribution to Grafana

Is Your Network Automation Strategy Already Obsolete?

Jul 11, 2025 By Yann Guernion In Broadcom

You know the feeling. It’s that familiar rhythm of playing defense, racing from one network fire to the next. The alerts pile up, users report slowdowns, and your team of brilliant engineers spends its days tracing packets instead of focusing on the future. For years, automation has been the answer. You’ve built scripts and workflows to handle repetitive tasks, which has certainly helped.

Read Post

Broadcom

Read more about Is Your Network Automation Strategy Already Obsolete?

Introducing DX NetOps Topology: What It Provides, How It Works

Jul 11, 2025 By Sandeep Tiwary In Broadcom

Networks aren’t what they used to be. While your network operations teams still have legacy equipment to manage, they’re also contending with the expanded reliance on software-defined networking (SDN), hybrid and multi-cloud architectures, private clouds, and more. These environments are anything but static. They’re sprawling, dynamic, and evolving faster than ever—which means that establishing and retaining visibility and control is more challenging than ever.

Read Post

Broadcom

Read more about Introducing DX NetOps Topology: What It Provides, How It Works

Here's the proof: What the fastest sites on the web have in common

Jul 11, 2025 By Piril Kavlak In Catchpoint

60% of Gen Z won’t engage with a slow-loading website. In today’s digital economy, that’s a deal-breaker. Whether it’s a banking portal, a travel app, or an AI-powered SaaS platform, users expect performance. Instant loading, global reliability, and smooth interactivity aren’t just nice to have—they define the winners.

Read Post

Catchpoint

Read more about Here's the proof: What the fastest sites on the web have in common

Elasticsearch with Python: A Detailed Guide to Search and Analytics

Jul 11, 2025 By Anjali Udasi In Last9

If you’re using Python for search, log aggregation, or analytics, you’ve probably worked with Elasticsearch. It’s fast, scalable, and fairly complex once you go beyond the basics. The official Python client gives you raw access to Elasticsearch’s REST API. But getting it to work the way you want, especially under load, can be tricky. This blog walks through practical ways to index, query, and monitor Elasticsearch from Python code, without getting lost in the docs.

Read Post

Last9

Read more about Elasticsearch with Python: A Detailed Guide to Search and Analytics

Get started with Grafana Alerting: Create and receive your first alert

Jul 11, 2025 By Grafana In Grafana

In this tutorial, we walk you through the process of setting up your first alert in just a few minutes. Don't miss the rest of the "Get started with Grafana Alerting" series! Each part dives into a different feature to help you get the most out of alerting in Grafana.

View Video

Grafana

Read more about Get started with Grafana Alerting: Create and receive your first alert

Datadog vs Jaeger - Features, Pricing & Use Cases [Updated for 2025]

Jul 10, 2025 By Ankit Anand In SigNoz

Datadog and Jaeger are both leading tools in the observability space, but they represent two fundamentally different philosophies. Datadog is a commercial, all-in-one SaaS platform that unifies metrics, traces, and logs. Jaeger is a popular, open-source project focused specifically on distributed tracing. Choosing between them isn't just a technical decision; it's about balancing the convenience of a fully managed, integrated platform against the power and control of a self-hosted, specialized tool.

Read Post

SigNoz

Read more about Datadog vs Jaeger - Features, Pricing & Use Cases [Updated for 2025]

Why APM Is Essential for Microservices Architecture?

Jul 10, 2025 By Mohana Ayeswariya J In Atatus

According to Statista, over 85% of large enterprises and nearly 50% of small to midsize businesses will have adopted microservices as part of their software architecture. The shift is clear: organizations of all sizes are moving away from monolithic applications toward microservices to accelerate development cycles, improve scalability, and support continuous delivery. But this architectural freedom comes with a hidden cost: increased operational complexity.

Read Post

Atatus

Read more about Why APM Is Essential for Microservices Architecture?

Show your work: Prove your MSP value

Jul 10, 2025 By Sara Purdon In Martello Technologies

It’s one of those unspoken facts of business that the better something is managed, the less people notice. Everything just glides along. That’s a double-edged sword for managed service providers. On the one hand, you want to give clients a frictionless experience. On the other, you want them to know that’s what they’re getting — and how much you put in to deliver it.

Read Post

Martello Technologies

Read more about Show your work: Prove your MSP value

When Do You Really Need SNMP Device Monitoring?

Jul 10, 2025 By Andrii Kernitskyi In Obkio

In the world of network monitoring, SNMP is the tried-and-true protocol that’s been helping IT teams monitor device health for decades. Whether you're managing switches, routers, firewalls, or access points, SNMP Device Monitoring remains one of the most widely used methods for tracking device performance and status. At Obkio, one of the most common questions we hear from prospects is: “Do you offer SNMP device monitoring?” The short answer: Yes, absolutely.

Read Post

Obkio

Read more about When Do You Really Need SNMP Device Monitoring?

Want to hear your users' complaints? There's a widget for that (now available on mobile)

Jul 10, 2025 By Steve Zegalia In Sentry

A disappearing “Submit” button. A modal stuck half-offscreen. It's not a crash or a performance regression. Just broken UX. Frustrating enough to make users rage-quit or leave a 1-star review. Error and performance monitoring catch the technical stuff: crashes, bottlenecks, slow APIs. But they won’t tell you when a layout breaks, or a UI flow subtly unravels after a redesign.

Read Post

Sentry

Read more about Want to hear your users' complaints? There's a widget for that (now available on mobile)

What Is Hybrid Observability? A Healthcare IT Explainer

Jul 10, 2025 By LogicMonitor In LogicMonitor

Healthcare IT environments have become incredibly complex. Think about everything running simultaneously in your organization: physical medical devices, cloud platforms, clinical applications like Epic, and patient-facing applications. Each component needs to work together seamlessly, much like how ICU monitors track multiple vital signs at once. Many healthcare organizations still use monitoring solutions designed for simpler times, when systems were more isolated.

Read Post

LogicMonitor

Read more about What Is Hybrid Observability? A Healthcare IT Explainer

Grafana Labs named a Leader again in the 2025 Gartner Magic Quadrant for Observability Platforms

Jul 10, 2025 By Jen Villa In Grafana

We’re thrilled to share that Grafana Labs has been recognized as a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms—for the second year in a row. This year’s report placed Grafana Labs furthest in “Completeness of Vision,” which we believe reflects our deep commitment to building a truly open, composable observability stack that gives users flexibility, control, and the tools to own their observability strategy.

Read Post

Grafana

Read more about Grafana Labs named a Leader again in the 2025 Gartner Magic Quadrant for Observability Platforms

Elastic named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms

Jul 10, 2025 By Natalie Blake In Elastic

Observability has an investigation problem, and dashboards and alerts aren’t enough for solving problems in today’s complex systems. AI-driven capabilities, powerful analytics, and the ability to scale are essential to drive real-time investigations while keeping costs low. We think this is why Elastic has been named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms for the second time.

Read Post

Elastic

Read more about Elastic named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms

Beyond Metrics: How We Reimagined Incident Response with RUM

Jul 10, 2025 By Datadog In Datadog

When your monitoring tools and logs tell you everything's fine, but users can't access critical healthcare services, where do you look? Our team discovered that Real User Monitoring (RUM) isn't just for tracking page load times and user journeys – it's a powerful incident response tool that can uncover issues traditional monitoring misses entirely.

View Video

Datadog

Read more about Beyond Metrics: How We Reimagined Incident Response with RUM

Account Details and User Tutorial

Jul 10, 2025 By Uptime Website Monitoring In uptime

In this video, we will discuss the Account Details and User configuration.

View Video

uptime

Monitoring

Read more about Account Details and User Tutorial

Introducing the Hyperping Intercom Integration: Reduce Support Tickets with Proactive Status Communication

Jul 10, 2025 By Leo Baecker In Hyperping

"Is our API down?" "Why can't I access the dashboard?" "Are you having server problems?" When incidents happen, support teams face a familiar nightmare: tickets flood in faster than you can respond. Your team scrambles to check system status and respond to dozens of identical questions while engineering focuses on fixing the actual problem.

Read Post

Hyperping

Read more about Introducing the Hyperping Intercom Integration: Reduce Support Tickets with Proactive Status Communication

Top 3 reporting tools for Microsoft Teams: SquaredUp, Power BI & M365 Admin Center

Jul 10, 2025 By Louise Berry In Squared Up

Microsoft Teams is a ubiquitous presence in workplaces all over the world. Prior to 2020, its usage was relatively moderate, with around 20 million users. However, global restrictions during the pandemic led to a 3,500% growth. Teams is now so central to business operations that Microsoft retired Skype in its favor. But this massive scale created a new problem – businesses needed better ways to monitor and report on their Teams usage.

Read Post

Squared Up

Read more about Top 3 reporting tools for Microsoft Teams: SquaredUp, Power BI & M365 Admin Center

Getting started with ElasticSearch dashboards

Jul 10, 2025 By John Hayes In Squared Up

ElasticSearch is one of the IT and software industry’s most established platforms for storing and analyzing log data. As its name suggests, it also has a powerful search and analytics engine based on the ElasticSearch Query Language. ElasticSearch itself is essentially a backend store, so if you want to explore and analyze your data, you will need a visualization layer such as SquaredUp and our ElasticSearch PlugIn.

Read Post

Squared Up

Read more about Getting started with ElasticSearch dashboards

Visibility Is the First Line of Defense: Operational Readiness in a Zero Trust World

Jul 10, 2025 By ScienceLogic In ScienceLogic

As global cyber threats continue to evolve at unprecedented speed, the United States public sector faces growing pressure to enhance operational readiness. Agencies must now contend with adversaries who are not only well-funded but also increasingly sophisticated in their ability to exploit visibility gaps. In the face of this dynamic threat landscape, the Zero Trust Architecture (ZTA) model has become an essential security framework.

Read Post

ScienceLogic

Read more about Visibility Is the First Line of Defense: Operational Readiness in a Zero Trust World

How We Made Our Queries 99.5% Faster

Jul 10, 2025 By Anushka Karmakar In SigNoz

We cut log-query scanning from ~100% of data blocks to < 1% by reorganizing how logs are stored in ClickHouse. Instead of relying on bloom-filter skip indexes, we generate a deterministic “resource fingerprint” (hash of cluster + namespace + pod, etc.) for every log source and sort the table by this fingerprint in the primary-key ORDER BY clause. This packs logs from the same pod/service contiguously, letting ClickHouse’s sparse primary-key index skip irrelevant blocks.

Read Post

SigNoz

Read more about How We Made Our Queries 99.5% Faster
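
The resource-fingerprint idea in the SigNoz teaser above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SigNoz's or ClickHouse's implementation: the attribute names, the use of SHA-256, the block size, and the helper names `resource_fingerprint` and `blocks_to_scan` are all hypothetical. A list of fingerprints stands in for a table, and a per-block min/max range stands in for a sparse index.

```python
import hashlib


def resource_fingerprint(attrs):
    """Return a deterministic hash of a log source's resource attributes.

    Keys are sorted first so the same attributes always hash identically,
    whatever order they arrive in. The attribute names are illustrative.
    """
    canonical = ";".join(f"{key}={attrs[key]}" for key in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def blocks_to_scan(fingerprints, target, block_size):
    """Count blocks whose [min, max] fingerprint range could contain target.

    This loosely imitates an index that keeps only a range per block.
    """
    count = 0
    for start in range(0, len(fingerprints), block_size):
        block = fingerprints[start:start + block_size]
        if min(block) <= target <= max(block):
            count += 1
    return count


# Three hypothetical pods, four log lines each, interleaved by arrival time.
pods = [
    {"cluster": "prod", "namespace": "api", "pod": "web-1"},
    {"cluster": "prod", "namespace": "db", "pod": "pg-0"},
    {"cluster": "prod", "namespace": "jobs", "pod": "cron-7"},
]
arrival_order = [resource_fingerprint(pods[i % 3]) for i in range(12)]
sorted_order = sorted(arrival_order)

target = resource_fingerprint(pods[0])
unsorted_scans = blocks_to_scan(arrival_order, target, block_size=4)
sorted_scans = blocks_to_scan(sorted_order, target, block_size=4)
print(unsorted_scans, sorted_scans)
```

In this toy data, the interleaved layout forces a look at all three blocks for one pod, while the fingerprint-sorted layout narrows it to a single block, which is the effect the teaser attributes to sorting by fingerprint.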

How to improve your observability

Jul 10, 2025 By Coroot In Coroot

Coroot was designed to solve the problem of time-consuming root cause analysis. It handles the full observability journey - from collecting telemetry automatically with zero code setup (thanks, eBPF!) to simplifying the role of SREs and DevOps everywhere with instant root cause analysis powered by AI. We also strongly believe that simple observability should be an innovation everyone can afford to benefit from, which is why our software is open source!

View Video

Coroot

Read more about How to improve your observability

What is a Data Lake, Data Warehouse, and a Data Lakehouse? (Learn the difference)

Jul 10, 2025 By Coroot In Coroot

Altinity, Inc. Developer Advocate Josh Lee walks us from an '80s IBM to a present day where columnar formats like and querying tools like Iceberg are often used to manage data.

View Video

Coroot

Read more about What is a Data Lake, Data Warehouse, and a Data Lakehouse? (Learn the difference)

Cloud Log Management: A Developer's Guide to Scalable Observability

Jul 10, 2025 By Anjali Udasi In Last9

As systems move to microservices, serverless, and multi-cloud setups, debugging gets harder. You’re no longer dealing with a single log file; you’re looking at logs from dozens of services, running across different environments. Traditional debugging methods like SSH-ing into servers or adding print statements don’t scale in these environments. Cloud log management tools help by collecting logs from all your services into one place.

Read Post

Last9

Read more about Cloud Log Management: A Developer's Guide to Scalable Observability

What is Log Loss and Cross-Entropy

Jul 10, 2025 By Faiz Shaikh In Last9

You're building a classification model, and your framework throws around terms like "log loss" and "cross-entropy loss." Are they the same thing? When should you use binary cross-entropy versus categorical cross-entropy? What about focal loss? This blog breaks down these loss functions with practical examples and real-world implementations.

Read Post

Last9

Read more about What is Log Loss and Cross-Entropy

Datadog named Leader in 2025 Gartner Magic Quadrant for Observability Platforms

Jul 10, 2025 By Yanbing Li In Datadog

We are thrilled to announce that, for the fifth consecutive year, Datadog has been named a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms. We believe that this recognition reflects our continued focus on helping customers observe, secure, and act on everything that matters across their technology stack.

Read Post

Datadog

Read more about Datadog named Leader in 2025 Gartner Magic Quadrant for Observability Platforms

What Are Traces? A Developer's Guide to Distributed Tracing

Jul 10, 2025 By Rox Williams In Honeycomb

One of the most common challenges in modern software engineering today is understanding how requests flow through applications. As system architectures shift to favor widely distributed, cloud-native designs, keeping track of how an application processes user actions is more difficult than ever. A single user action may trigger events processed in dozens of backend services. Traces are helping software developers today with this challenge.

Read Post

Honeycomb

Read more about What Are Traces? A Developer's Guide to Distributed Tracing

The Inconvenient Truth About AI Ethics in Observability

Jul 10, 2025 By Mezmo In Mezmo

Let's be honest: most conversations about AI ethics sound like they're happening in a boardroom, not an ops room. But here's the thing: when you're using AI to make sense of your telemetry data, ethics isn't some abstract concept. It's the difference between insights you can trust and algorithmic noise that leads you down the wrong path. The uncomfortable reality? Your AI is only as ethical as the messiest, most biased piece of telemetry data you feed it. And if you think your data is clean, well...

Read Post

Mezmo

Read more about The Inconvenient Truth About AI Ethics in Observability

Grafana Labs is a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms

Jul 10, 2025 By Grafana In Grafana

For the second year in a row, Grafana Labs has been named a Leader in the Gartner Magic Quadrant for Observability Platforms — and this year, we’re proud to be recognized as the furthest in Completeness of Vision. In this video, Grafana Labs CTO Tom Wilkie shares what this recognition means, why our scores for execution and vision both improved, and how it reflects years of building a truly open, composable observability stack.

View Video

Grafana

Read more about Grafana Labs is a Leader in the 2025 Gartner Magic Quadrant for Observability Platforms

Coralogix | Magic Quadrant 2025

Jul 10, 2025 By Ariel Assaraf In Coralogix

Today marks an exciting moment for all of us at Coralogix. We’re proud to share that Gartner has named us a Visionary in the 2025 Magic Quadrant for Observability Platforms. This recognition, we believe, reflects what we’ve been building toward for years: an observability platform that delivers scale, cost-efficiency, AI-powered insights, and tangible customer success.

Read Post

Coralogix

Read more about Coralogix | Magic Quadrant 2025

Here's how to add business data to logs from retail endpoints | Datadog Tips & Tricks

Jul 10, 2025 By Datadog In Datadog

Some sources simply do not generate data-rich logs. Retail endpoints that are older or run on proprietary services, for example, very often produce logs without the kinds of data that are needed to perform useful business analytics. So, what can you do?

View Video

Datadog

Read more about Here's how to add business data to logs from retail endpoints | Datadog Tips & Tricks

Check in with Checkly: H1 2025 Features, Fixes & What's Next

Jul 10, 2025 By Checkly In Checkly

Dan Giordano and Alberto Gomez walk through the new features released in the first half of 2025.

View Video

Checkly

Read more about Check in with Checkly: H1 2025 Features, Fixes & What's Next

Practical guide to implement and succeed in configuration and change management

Jul 9, 2025 By akash.mj@zohocorp.com In ManageEngine

In an era where networks are the arteries of every enterprise, ensuring they run smoothly is non-negotiable. From small branch offices to sprawling data centers, a single misstep in configuration can trigger costly downtime, security breaches, and compliance headaches.

Read Post

ManageEngine

Read more about Practical guide to implement and succeed in configuration and change management

OpenTelemetry Collector: A Complete Guide [2025]

Jul 9, 2025 By Ankit Anand In SigNoz

The OpenTelemetry Collector is a stand-alone service that acts as a powerful, vendor-neutral pipeline for your telemetry data. It can receive, process, and export logs, metrics, and traces, giving you full control over your observability data before it reaches a backend. This guide will provide a comprehensive overview of the OpenTelemetry Collector, its architecture, deployment patterns, and how to configure it for production use.

Read Post

SigNoz

Read more about OpenTelemetry Collector: A Complete Guide [2025]

Kubernetes Monitoring 101: 25 Tools And Must-Know Tips

Jul 9, 2025 By CloudZero In CloudZero

The Kubernetes platform is the standard for orchestrating containerized applications. It’s ideal for large applications running on distributed instances. However, monitoring Kubernetes infrastructure can be notoriously challenging. This guide will cover Kubernetes monitoring in more detail, including what metrics to track to improve visibility and control over your K8s containers, apps, microservices, etc.

Read Post

CloudZero

Read more about Kubernetes Monitoring 101: 25 Tools And Must-Know Tips

Not all monitoring sees what your users are seeing.

Jul 9, 2025 By Catchpoint In Catchpoint

APM tools are great, but they have blind spots; they do not monitor from where your users actually are. There’s a gap between lab-perfect APM tests and real-world experience. There’s a lot that can degrade performance between your cloud environment and your users. If you’re not monitoring that path, you’re missing critical context.

View Video

Catchpoint

Monitoring

Read more about Not all monitoring sees what your users are seeing.

Observability's Moneyball Moment: How AI Is Changing the Game (Not Ending It)

Jul 9, 2025 By Mezmo In Mezmo

We're not witnessing the end of observability; we're witnessing its evolution into something far more powerful. The observability industry is having its Moneyball moment. Just like Billy Beane revolutionized baseball by using data analytics to compete with teams that had vastly larger budgets, observability is undergoing a fundamental transformation.

Read Post

Mezmo

Read more about Observability's Moneyball Moment: How AI Is Changing the Game (Not Ending It)

Monitoring AWS billing costs with AWS tags

Jul 9, 2025 By Babu Sundaram In eG Innovations

Today, I’ll be covering how AWS tags can help you keep track of and monitor your AWS billing costs with the granularity and depth needed to reduce and optimize your AWS costs.

Read Post

eG Innovations

Read more about Monitoring AWS billing costs with AWS tags

Honeycomb Users Are Living in the Future, Part 1: Sampling

Jul 9, 2025 By Irving Popovetsky In Honeycomb

When we talk to new Honeycomb users, a few things stand out as sounding downright magical. Sometimes we’ll hear, “Wow, is that a new feature?” and we’ll say that no, it’s been like that for years. Clearly we need to get the word out! This is the first installment of a blog series I’ll be writing, covering areas of Honeycomb that elicit reactions of awe and disbelief from new users.

Read Post

Honeycomb

Read more about Honeycomb Users Are Living in the Future, Part 1: Sampling

Lumigo Launches AI Agent Observability

Jul 9, 2025 By Orr Weinstein In Lumigo

LLM-powered agents are reshaping software, but when they fail, troubleshooting is guesswork. Lumigo’s new AI Agent Observability, now in beta, gives you visibility into the entire lifecycle of your agents, from prompt to response to internal decision logic. Built for modern AI workloads, this feature is designed to help engineers monitor, debug, and optimize agents running on platforms like OpenAI, Anthropic, and open-source models.

Read Post

Lumigo

Read more about Lumigo Launches AI Agent Observability

The one where Ed and Sydnee talk all about AI

Jul 9, 2025 By Cribl In Cribl

Join Ed Bailey, Principal Technical Evangelist, and Sydnee Mayers, Sr. Staff Product Manager, as they chat all about AI!

View Video

Cribl

Read more about The one where Ed and Sydnee talk all about AI

Top 5 MSP takeaways from the 2025 IT Trends Report

Jul 9, 2025 By Rebecca Grassing In Auvik

Earlier this year, Auvik released our annual IT Trends Report, spotlighting some of the key changes for network management, MSP, and IT practitioners. We know the market and its ups and downs can have a huge impact on the success of MSPs, so we’re bringing you a roll-up of key statistics and findings related to MSPs specifically. Read on to see what we found.

Read Post

Auvik

Read more about Top 5 MSP takeaways from the 2025 IT Trends Report

How to Get Logs from Docker Containers

Jul 9, 2025 By Preeti Dewani In Last9

When a container misbehaves, logs are the first place to look. Whether you're debugging a crash, tracking API errors, or verifying app behavior—docker logs gives you direct access to what's happening inside. This blog covers the full workflow: how to retrieve logs, filter them by time or service, and set up logging for production environments.

Read Post

Last9

Read more about How to Get Logs from Docker Containers

Troubleshooting LangChain/LangGraph Traces: Common Issues and Fixes

Jul 9, 2025 By Anjali Udasi In Last9

We’ve covered how to get LangChain traces up and running. But even when everything’s instrumented, traces can still go missing, show up half-broken, or look nothing like what you expected. This guide is about what happens after setup, when traces exist, but something’s off.

Read Post

Last9

Read more about Troubleshooting LangChain/LangGraph Traces: Common Issues and Fixes

Troubleshoot root causes with GitHub commit and ownership data in Error Tracking

Jul 9, 2025 By Tarun Kothandaraman In Datadog

When an error occurs, developers need to act quickly. But too often, they’re left searching through stack traces without enough context to understand what happened, who owns the code, or what change may have introduced the issue. This slows down triage, creates inefficient handoffs, and takes time away from building new features.

Read Post

Datadog

Read more about Troubleshoot root causes with GitHub commit and ownership data in Error Tracking

Monitor your LiteLLM AI proxy with Datadog

Jul 9, 2025 By Barry Eom In Datadog

As organizations rapidly scale their use of large language models (LLMs), many teams are adopting LiteLLM to simplify access to a diverse set of LLM providers and models. LiteLLM provides a unified interface through both an SDK and proxy to speed up development, centralize control, and optimize LLM-powered workflows. But introducing a proxy layer adds abstraction, making it harder to understand how requests are processed.

Read Post

Datadog

Read more about Monitor your LiteLLM AI proxy with Datadog

Understanding data lineage

Jul 9, 2025 By Aaron Kaplan In Datadog

Data lineage is the evolutionary history of datasets. More concretely, lineage is metadata that captures the flow and transformation of data in data pipelines, also called the data lifecycle.

Read Post

Datadog

Read more about Understanding data lineage

Reduce your mean time to repair with the Datadog mobile app

Jul 9, 2025 By Nancy Zhu In Datadog

For on-call engineers responding to alerts, every minute counts. Faster incident response means faster mitigation, reduced downtime, and better customer experience. But even the most finely tuned, meticulously detailed alerts can leave responders scrambling for more information. In order to effectively triage and investigate incidents and set remediation in motion, responders need data to help them contextualize alerts.

Read Post

Datadog

Read more about Reduce your mean time to repair with the Datadog mobile app

How to turn logs into metrics with Grafana Loki (Loki Community Call July 2025)

Jul 9, 2025 By Grafana In Grafana

Cyril Tovena shows us how to turn logs into metrics with Grafana Loki using metric queries in LogQL. What do you do when all you have are logs, but you want to count them, aggregate them, or parse them for numbers you want to graph? Well, there's a query for that! Cyril is joined by Jay Clifford and Nicole van der Hoeven to discuss everything you need to know about metric queries and how to use them to get numbers out of Loki.

View Video

Grafana

Read more about How to turn logs into metrics with Grafana Loki (Loki Community Call July 2025)

Grafana Learning Journeys: Send metrics to Grafana Cloud using Prometheus remote write

Jul 9, 2025 By Grafana In Grafana

This video is part of the Grafana Learning Journeys, and it walks you through how to send metrics to Grafana Cloud using Prometheus remote write.

View Video

Grafana

Read more about Grafana Learning Journeys: Send metrics to Grafana Cloud using Prometheus remote write

Dashboard Sharing - The Hard Way

Jul 9, 2025 By Eric Lippmann In Icinga

Unlike menu items, dashboards in Icinga Web 2 currently can’t be shared across users. This is something we will implement in future versions, but for now users can only create dashboards for themselves. We don’t have an exact timeline for the dashboard sharing feature yet and our roadmap is already pretty packed for this year, so we won’t be tackling this until later next year.

Read Post

Icinga

Read more about Dashboard Sharing - The Hard Way

Introducing the InfluxDB 3 MCP Server: Natural Language for Time Series

Jul 9, 2025 By Gary Fowler In InfluxData

Time series data underpins all real-time systems. From high-resolution telemetry to long-range trends, it’s essential for monitoring, automation, predictive maintenance, and operational insight. But it’s also hard to work with: high cardinality, shifting schemas, and time-based queries make even basic tasks feel heavy.

Read Post

InfluxData

Read more about Introducing the InfluxDB 3 MCP Server: Natural Language for Time Series

What You Actually Need to Monitor AI Systems in Production

Jul 9, 2025 By Rahul Chhabria In Sentry

You did it. You added the latest AI agent into your product. Shipped it. Went to sleep. Woke up to find it returning a blank string, taking five seconds longer than yesterday, or confidently outputting lies in perfect JSON. Naturally, you check your logs. You see a prompt. You see a response. And you see nothing helpful. Surprise. Prompt in and response out is not observability. It is vibes.

Read Post

Sentry

Read more about What You Actually Need to Monitor AI Systems in Production

Observability for containerized workloads: How to run Grafana Beyla as a sidecar in Amazon ECS

Jul 9, 2025 By Matt Wimpelberg In Grafana

Note: Grafana Beyla has been donated to OpenTelemetry under the new project name OpenTelemetry eBPF Instrumentation. Beyla will continue to exist as Grafana Labs’ distribution of the upstream project. Grafana Beyla is an open source eBPF-based auto-instrumentation tool that helps you easily get started with application observability, allowing you to monitor and visualize traces without modifying the application code.

Read Post

Grafana

Read more about Observability for containerized workloads: How to run Grafana Beyla as a sidecar in Amazon ECS

How OutboundSync Improved Transparency with StatusGator

Jul 9, 2025 By Colin Bartlett In StatusGator

OutboundSync, a powerful platform that helps marketers sync outbound sales data to CRMs like HubSpot and Salesforce, knows that transparency is a key component of delivering a service that hundreds of teams rely on. And as an integration platform, OutboundSync is deeply reliant on other providers, making vendor reliability a key part of their own transparency.

Read Post

StatusGator

Read more about How OutboundSync Improved Transparency with StatusGator

Getting started with VMware dashboards

Jul 9, 2025 By Sameer Mhaisekar In Squared Up

VMware is a leading platform for virtualization and cloud infrastructure, widely used to manage compute, storage, and networking resources across on-premises and hybrid environments. While it offers powerful capabilities and extensive telemetry through tools like vCenter, navigating this data can be overwhelming – especially when trying to spot performance issues, capacity trends, or VM sprawl in real time. That’s where a solution like SquaredUp can make a significant difference.

Read Post

Squared Up

Read more about Getting started with VMware dashboards

Investigating High Partition Load in Honeycomb

Jul 9, 2025 By Honeycomb In Honeycomb

Here, Pierre Tessier shows how he looks into partition load in Honeycomb's distributed datastore in production.

View Video

Honeycomb

Read more about Investigating High Partition Load in Honeycomb

Customizing your Azure DevOps DORA metrics dashboard

Jul 9, 2025 By SquaredUp In Squared Up

Looking to configure and customize a DORA metrics dashboard? Our Director of Engineering Services, Tim Wheeler, demonstrates how to customize the DORA Metrics dashboard in Azure DevOps for SquaredUp. He shows how to populate key metrics like deployment frequency and change failure rate by selecting a pipeline, specifically the Squared Up multi-stage pipeline.

View Video

Squared Up

Read more about Customizing your Azure DevOps DORA metrics dashboard

Notes from the Field: Seamless SSO 404s Impacting Citrix on Windows Server 2025

Jul 8, 2025 By Marc van der Veer In GripMatix

As a Citrix consultant, not every issue I troubleshoot is directly tied to Citrix, but many of them dramatically impact the end-user experience. This is one of those cases. A customer had begun testing Windows Server 2025 as Multi-Session hosts in their environment. The new servers were domain-joined and fully patched, and they expected a smooth experience with Office 365, Entra ID–backed apps, and cloud-based authentication. Everything had worked flawlessly on Server 2022.

Read Post

GripMatix

Read more about Notes from the Field: Seamless SSO 404s Impacting Citrix on Windows Server 2025

How to Block an External Attack with FortiGate and Progress Flowmon ADS

Jul 8, 2025 By Jiri Knapek In Flowmon

It’s a question we hear often - how do we use the Progress Flowmon solution to block an attack? Flowmon is not an inline appliance that stands in the path of inbound traffic, so we partner with third-party vendors who supply equipment such as firewalls or unified security gateways. In this post, we’re going to show you how to instruct Fortinet’s firewall FortiGate via Flowmon ADS to block traffic in response to a detected anomaly or attack.

Read Post

Flowmon

Read more about How to Block an External Attack with FortiGate and Progress Flowmon ADS

The Real Business Value of Time Series Database

Jul 8, 2025 By Allyson Boate In InfluxData

Time series data powers nearly every modern system, from industrial equipment and energy grids to financial platforms and digital services. Devices and software continuously generate streams of time-stamped metrics that reflect how systems perform moment to moment. Most businesses collect this data, but far fewer utilize its full potential. Storing information and reviewing dashboards offers limited value.

Read Post

InfluxData

Read more about The Real Business Value of Time Series Database

Ensure the availability of critical services with the Opslogix Core Windows Service Management Pack

Jul 8, 2025 By Jonas Lenntun In OpsLogix

In a typical SCOM environment, a lot of the Management Packs are designed to monitor services tied to a specific technology, such as SQL Server, IIS, or the Windows operating system itself. But what about services that don’t belong to any particular application but are essential across all servers?

Read Post

OpsLogix

Read more about Ensure the availability of critical services with the Opslogix Core Windows Service Management Pack

Enforce configuration standards with the Opslogix Compliance Management Pack

Jul 8, 2025 By Jonas Lenntun In OpsLogix

Maintaining compliance is not just a matter of policy, it is a matter of operational stability and security. But with so many moving parts, configuration drift is almost inevitable. The Opslogix Compliance Management Pack helps identify these deviations by continuously verifying key system configurations and alerting when they fall out of alignment.

Read Post

OpsLogix

Read more about Enforce configuration standards with the Opslogix Compliance Management Pack

From Weeks to Hours: How Technical Teams Are Driving Fast ROI

Jul 8, 2025 By ScienceLogic In ScienceLogic

Speed is no longer a luxury in IT operations—it’s a requirement. When systems falter, alerts spike, or new services go live, time becomes the most valuable resource. And yet, many IT teams are still shackled to tools and processes that take weeks—or months—to show measurable value. The question technical leaders increasingly ask is: How fast can we get value? Not just dashboards. Not just data.

Read Post

ScienceLogic

Read more about From Weeks to Hours: How Technical Teams Are Driving Fast ROI

Improve Consistency Across Signals with OTel Semantic Conventions

Jul 8, 2025 By Anjali Udasi In Last9

It’s 2 AM. Your API is timing out. Logs show a slow query. Metrics flag a spike in DB connections. Traces reveal a 5-second delay on a database call. But then the questions start: Which database? Does the query match the delay? Why doesn’t this align with the connection pool metrics? Each tool uses different labels: db.name, database, sometimes nothing at all. Without a shared schema, connecting the dots is slow and frustrating.

Read Post

Last9

Read more about Improve Consistency Across Signals with OTel Semantic Conventions

How Replicas Work in Kubernetes

Jul 8, 2025 By Faiz Shaikh In Last9

Replicas in Kubernetes control how many copies of your pods run simultaneously. They're the foundation of scaling, availability, and recovery in your cluster. When you're running a stateless API or a background worker, understanding how replicas work directly impacts your application's reliability and performance. This blog walks through replica management, from basic concepts to production monitoring patterns that help you maintain healthy, scalable applications.

Read Post

Last9

Read more about How Replicas Work in Kubernetes

Introducing Grafana Learning Journeys | Grafana Labs

Jul 8, 2025 By Grafana In Grafana

Grafana Learning Journeys are a guided, step-by-step way to learn and grow with Grafana! In this quick video, learn why Grafana Labs created Grafana Learning Journeys, how it's different from a tutorial or a Grot Guide, and how it can help you become a Grafana expert.

View Video

Grafana

Read more about Introducing Grafana Learning Journeys | Grafana Labs

See System Logs Alongside your Metrics Using Loki, Grafana, and Graphite

Jul 8, 2025 By MetricFire In MetricFire

In this quick demo, we show how you can transform logs collected by Grafana Loki into actionable Graphite metrics using MetricFire. Watch as we convert structured logs into performance insights. Perfect for teams looking to bridge the gap between logging and monitoring. This workflow helps you move beyond basic log storage and turn raw logs into meaningful metrics for alerts, dashboards, and capacity planning.

View Video

MetricFire

Read more about See System Logs Alongside your Metrics Using Loki, Grafana, and Graphite

An open-source SDK for finding dead code

Jul 8, 2025 By Noah Martin In Sentry

Writing code is easier than ever. We want to make deleting code just as easy – introducing Reaper for iOS and Android. Reaper was an Emerge Tools product that helped companies like Duolingo delete 1% of their iOS codebase. And just like with Emerge Tools’ Launch Booster, we’re making Reaper open-source for anyone to use. In this post, we’ll explain what Reaper is, why you should care about dead code, and how Reaper works on both platforms.

Read Post

Sentry

Read more about An open-source SDK for finding dead code

How a Fortune 500 Company Eliminated 93% of IT Incidents in 72 Hours

Jul 8, 2025 By LogicMonitor In LogicMonitor

Sometimes the biggest transformations begin with what sounds like the worst possible news. One day, this Fortune 500 technology company’s observability platform was running smoothly. The next, they learned their critical monitoring solution would be discontinued as part of a corporate buyout. For a leading global IT vendor in data infrastructure serving customers across storage, cloud, and managed services, this was a potential catastrophe.

Read Post

LogicMonitor

Read more about How a Fortune 500 Company Eliminated 93% of IT Incidents in 72 Hours

Observability in under 5 seconds: Reflecting on a year of grafana/otel-lgtm

Jul 8, 2025 By Gregor Zeitlinger In Grafana

With grafana/otel-lgtm, observability is just one Docker command away. Over the past year, grafana/otel-lgtm has simplified observability setups, helping developers get a complete OpenTelemetry stack running in under five seconds. With integrations for metrics, logs, traces, and now profiles via Grafana Pyroscope, it has become a go-to solution for demos, development, and testing, as evidenced by its growing community (1k stars on GitHub and growing!) and notable adopters.

Read Post

Grafana

Read more about Observability in under 5 seconds: Reflecting on a year of grafana/otel-lgtm

Introducing Selector AI

Jul 8, 2025 By Selector In Selector

View Video

Selector

Read more about Introducing Selector AI

How They Handle 44 Million Searches a Day...Without Breaking! | Rightmove and Elastic

Jul 8, 2025 By Elastic In Elastic

Rightmove, the UK's number one platform for property search, buying, and selling, has trusted Elastic for more than 11 years. Hear Andrei Nicusan, Principal Engineer at Rightmove, on why Elastic has been Rightmove's number one Search and Observability solution for more than a decade. And now with the move to Elastic Cloud and Google Cloud Platform, you can find out how Rightmove are taking advantage of reductions in their infrastructure overheads too!

View Video

Elastic

Read more about How They Handle 44 Million Searches a Day...Without Breaking! | Rightmove and Elastic

Introduction to Kafka Scaling Challenges

Jul 8, 2025 By Adam Podracky In meshIQ

Apache Kafka has become the go-to platform for organizations handling high-throughput, real-time data streaming. Its ability to manage massive data volumes while ensuring reliability is second to none. However, as businesses grow and demand for data increases, scaling Kafka isn’t always a walk in the park. It often comes with its own set of challenges that can throw even the most seasoned teams for a loop.

Read Post

meshIQ

Read more about Introduction to Kafka Scaling Challenges

We now support Google Chat

Jul 8, 2025 By Freek Van der Herten In Oh Dear

I'm pleased to share that we can now notify you via Google Chat. You can read more on how to set up Google Chat notifications in our docs. Of course, we also offer numerous other channels to notify you when something is wrong with your site.

Read Post

Oh Dear

Read more about We now support Google Chat

Introducing MetricFire Logging: Visualize Logs Alongside Metrics

Jul 8, 2025 By Benjamin Pitts In MetricFire

As modern infrastructure grows more dynamic and distributed, collecting logs alongside metrics becomes a critical part of any observability strategy. To make this easy and powerful, MetricFire now supports a direct logging pipeline using Grafana Loki. This allows you to forward system logs from your servers to Hosted Graphite's Loki backend and visualize them in your Hosted Grafana dashboards with full control over queries, filtering, and alerting.

Read Post

MetricFire

Read more about Introducing MetricFire Logging: Visualize Logs Alongside Metrics

The Defense-in-Depth Approach To Application Monitoring

Jul 8, 2025 By Dan Giordano In Checkly

In cybersecurity, defense-in-depth is a fundamental principle – you never rely on a single security measure to protect your systems. The same philosophy applies to application monitoring. No single monitoring approach, no matter how sophisticated, can capture every possible failure mode of your application. This is why layered monitoring isn't just a best practice – it's essential risk mitigation.

Read Post

Checkly

Read more about The Defense-in-Depth Approach To Application Monitoring

Announcing Checkly Uptime Monitors: Simple, Scalable, and Built for Developers

Jul 8, 2025 By Checkly In Checkly

When Checkly launched, it was the first of its kind, enabling developers to monitor complex workflows more easily than ever using the automation tooling (Playwright, Terraform, etc.) they already knew and loved. We’ve helped detect and resolve issues for thousands of companies, ranging from monitoring crucial log-ins, to purchasing products, to setting up client instances for millions of monthly users. But what about the simpler stuff?

Read Post

Checkly

Read more about Announcing Checkly Uptime Monitors: Simple, Scalable, and Built for Developers

Global API downtime increases by 60% in 2025, new data shows

Jul 8, 2025 By Uptrends In Uptrends

London, 8 July 2025: Global API downtime increased by 60% in Q1 2025 compared to Q1 2024, shows new data from web service monitoring provider Uptrends, part of ITRS’ comprehensive observability platform. The State of API Reliability 2025 report, based on over 2 billion API monitoring checks across 20 industries in Q1 2024 and Q1 2025, reveals a year-on-year drop in average API uptime from 99.66% to 99.46%, a decline of 0.2 percentage points.

Read Post

Uptrends

Read more about Global API downtime increases by 60% in 2025, new data shows

Coralogix Expands AWS Partnership to Deliver AI-Driven Observability and Edge Threat Detection

Jul 8, 2025 By Coralogix Team In Coralogix

Coralogix is proud to announce a new phase in its partnership with AWS through a Strategic Collaboration Agreement (SCA) focused on bringing AI-powered observability and security to the enterprise. At the heart of this collaboration is Amazon Bedrock, AWS’s managed service for foundation models.

Read Post

Coralogix

Read more about Coralogix Expands AWS Partnership to Deliver AI-Driven Observability and Edge Threat Detection

Introduction to Apache Kafka Scaling Challenges

Jul 8, 2025 By Adam Podracky In meshIQ

Read Post

meshIQ

Read more about Introduction to Apache Kafka Scaling Challenges

Bringing Intelligence and Automation Together to Change the Shape of Work

Jul 8, 2025 By Dr. Maitreya Natu In Digitate

The aspirational target state for a cognitive system is to “take responsibility” for a domain (e.g., an autonomous car). To reach that level of sophistication, the system must achieve high levels of maturity simultaneously along two dimensions: Reasoning ability and Automation ability.

Read Post

Digitate

Read more about Bringing Intelligence and Automation Together to Change the Shape of Work

Comparing The Top 9 Datadog Alternatives and Competitors in 2025

Jul 7, 2025 By Ankit Anand In SigNoz

The rising costs and complexities of monitoring cloud infrastructure are pushing many organizations to explore alternatives to Datadog. With monthly bills sometimes reaching thousands of dollars and feature sets that can be overwhelming, teams are looking for practical, cost-effective solutions that better fit their needs.

Read Post

SigNoz

Read more about Comparing The Top 9 Datadog Alternatives and Competitors in 2025

The design process of the TrackJS redesign #coding #programming #webdesign #debugging #css

Jul 7, 2025 By TrackJS In TrackJS

Eric from @Trackjs talks through the design process of the TrackJS 2025 reskin project.

View Video

TrackJS

Read more about The design process of the TrackJS redesign #coding #programming #webdesign #debugging #css

From chaos to clarity with Grafana dashboards: How video game company EA monitors 200+ metrics

Jul 7, 2025 By Joey Bartolomeo In Grafana

To be a successful gamer, you have to think strategically and creatively. Working as a software engineer at Electronic Arts (EA), a top video game company, requires the same skills. That’s especially true when it comes to monitoring the EA app, which is the launcher for EA games and used by hundreds of millions of people around the world.

Read Post

Grafana

Read more about From chaos to clarity with Grafana dashboards: How video game company EA monitors 200+ metrics

Recipes for automating Oh Dear

Jul 7, 2025 By Sean White In Oh Dear

Today we are releasing oh-dear-api-examples, a brand-new open-source repository on GitHub. It curates bite-size scripts, helpers, and ideas to help you bulk-manage your Oh Dear account via the API.

Read Post

Oh Dear

Read more about Recipes for automating Oh Dear

Instrument LangChain and LangGraph Apps with OpenTelemetry

Jul 7, 2025 By Anjali Udasi In Last9

In our previous blog, we talked about how LangChain and LangGraph help structure your agent’s behavior. But structure isn’t the same as visibility. This one’s about fixing that. Not with more logs. Not with generic dashboards. You need to see what your agent did, step by step, tool by tool, so you can understand how a simple query turned into a long, expensive run.

Read Post

Last9

Read more about Instrument LangChain and LangGraph Apps with OpenTelemetry

Prometheus Group By Label: Advanced Aggregation Techniques for Monitoring

Jul 7, 2025 By Faiz Shaikh In Last9

Your Prometheus dashboard shows 847 CPU metrics. The alert fired—but is the problem in us-east or us-west? You're trying to rule out whether that new feature caused a latency spike, but the sheer number of time series isn’t helping. Grouping can make this manageable. By organizing metrics by shared label values, you can quickly spot which service or region is behaving differently, without digging through every metric.

Read Post

Last9

Read more about Prometheus Group By Label: Advanced Aggregation Techniques for Monitoring

Best Network Monitoring Tools of 2025

Jul 7, 2025 By Zoe Collins In OnPage

Keeping tabs on your network has never been more important. Whether you’re running a small business or managing infrastructure across cloud environments, visibility into what’s happening behind the scenes is essential. But visibility alone isn’t enough…when something breaks, the IT engineer needs to know immediately, so they can take action and resolve critical issues.

Read Post

OnPage

Read more about Best Network Monitoring Tools of 2025

Getting Started with AI Agent Monitoring From Sentry

Jul 7, 2025 By Sentry In Sentry

Sentry has released AI Agent Monitoring, and in this video you can see the fast path to getting started with it using the Vercel AI SDK and Anthropic Claude. AI Agent Monitoring uses tracing to let you see details around how AI interactions are happening inside your application. You can see the back-and-forth conversation flow, token usage, model usage, durations, and much more. Agent Monitoring is out now. Take it for a spin and let us know what you think in Discord!

View Video

Sentry

Read more about Getting Started with AI Agent Monitoring From Sentry

How to Simplify AI Observability Across Hybrid and Cloud Environments

Jul 7, 2025 By LogicMonitor In LogicMonitor

As companies adopt more artificial intelligence (AI) to stay competitive and simplify operations, they’re hitting a snag they’ve seen plenty of times before: complexity. Those user-friendly chatbots and impressive predictive models aren’t magic—they run on powerful GPUs like NVIDIA’s and rely on cloud services such as Azure OpenAI or Amazon SageMaker.

Read Post

LogicMonitor

Read more about How to Simplify AI Observability Across Hybrid and Cloud Environments

What's New At Catchpoint? Episode 1

Jul 7, 2025 By Catchpoint In Catchpoint

Find out what’s new at Catchpoint this month, including improvements to Stack Map and Mobile RUM. For even more information, check out the following links.

View Video

Catchpoint

Monitoring

Read more about What's New At Catchpoint? Episode 1

WhatsUp Gold 2025.0 Release Overview

Jul 7, 2025 By Progress WhatsUp Gold In WhatsUp Gold

Watch this video to learn about the features included in version 2025.0 of WhatsUp Gold.

View Video

WhatsUp Gold

Read more about WhatsUp Gold 2025.0 Release Overview

Automated Certificate Discovery and Monitoring | WhatsUp Gold 2025.0

Jul 7, 2025 By Progress WhatsUp Gold In WhatsUp Gold

Watch this video to learn how you can discover and monitor your SSL certificates using the new features available in WhatsUp Gold version 2025.0.

View Video

WhatsUp Gold

Read more about Automated Certificate Discovery and Monitoring | WhatsUp Gold 2025.0

Get structured visibility across network devices with device templates

Jul 7, 2025 By ManageEngine Site24x7 In Site24x7

Manually mapping object identifiers (OIDs) for every network device? Struggling to make sense of hundreds of SNMP metrics? Site24x7’s device templates give you a smarter, more scalable way to monitor routers, switches, firewalls, and more—without manual guesswork. In this video, we’ll walk you through how to use device templates in Site24x7 to get actionable insights into your network performance.

View Video

Site24x7

Monitoring

Read more about Get structured visibility across network devices with device templates

How to troubleshoot Kubernetes issues using Events | Site24x7 Kubernetes Monitoring

Jul 7, 2025 By ManageEngine Site24x7 In Site24x7

Troubleshooting Kubernetes just got easier. In this video, we walk you through how to use Kubernetes Events in Site24x7 to quickly detect, analyze, and resolve issues like CrashLoopBackOff, ImagePullBackOff, Evicted pods, and more without the guesswork. With Site24x7 Kubernetes Monitoring, you get full observability, right down to every critical event in your cluster.

View Video

Site24x7

Read more about How to troubleshoot Kubernetes issues using Events | Site24x7 Kubernetes Monitoring

How to troubleshoot Windows server monitoring challenges

Jul 7, 2025 By ManageEngine Site24x7 In Site24x7

In this detailed walkthrough, we solve 5 common issues faced right after installing the Site24x7 server monitoring agent.

View Video

Site24x7

Read more about How to troubleshoot Windows server monitoring challenges

Running #playwright Tests in Multiple Environments with Checkly. #sdet #devops

Jul 6, 2025 By Checkly In Checkly

Learn how to efficiently run Playwright tests across different environments without rewriting them. This tutorial covers managing environment variables in Checkly for API and browser checks, handling global and group-specific settings, and integrating with CI/CD processes. Discover the best practices for setting up environment variables, duplicating test groups, and customizing alerts to ensure your checks are environment-specific.

View Video

Checkly

Read more about Running #playwright Tests in Multiple Environments with Checkly. #sdet #devops

Sponsored Post

The Agentic Network: How AI Agents Are Transforming Infrastructure from Liability to Living Intelligence

Jul 5, 2025 By Andrew Mallaband In Fabrix

Modern enterprises depend on networks that are increasingly complex, dynamic, and opaque. Yet, instead of confronting this complexity head-on, most organizations fall into the trap of superficial control, layering more monitoring tools atop their stack in hopes of achieving resilience. In reality, this only fragments visibility, deepens operational silos, and leaves a crucial layer of the digital enterprise, the network, under-managed and misunderstood.

Read Post

Fabrix

Read more about The Agentic Network: How AI Agents Are Transforming Infrastructure from Liability to Living Intelligence

What is UDP Packet Loss & How to Monitor It

Jul 5, 2025 By Alyssa Lamberti In Obkio

Have you ever found yourself scratching your head over UDP packet loss? You're not alone. UDP (User Datagram Protocol) is a go-to for streaming, gaming, and VoIP, but when packets start going AWOL, it can spell trouble for your network's performance. Imagine you're in the middle of an important VoIP call, a critical online gaming session, or a live video stream, and suddenly things get choppy or laggy—that's UDP packet loss rearing its ugly head.

Read Post

Obkio

Read more about What is UDP Packet Loss & How to Monitor It

Best Practices for Planning for Upcoming Cloud Maintenance

Jul 5, 2025 By Hrishikesh Barua In IncidentHub

Cloud maintenance is a common practice in the tech industry. Whether you manage your own infrastructure or use a cloud provider, you will need to plan for maintenance and include it as part of your operational readiness. This ensures that your team is prepared for potential downtime and can deal with any incidents in a timely manner. This article will cover some best practices for planning for upcoming cloud maintenance.

Read Post

IncidentHub

Read more about Best Practices for Planning for Upcoming Cloud Maintenance

Silent Downtime: The Hidden Cost of Delayed Awareness in Banking

Jul 4, 2025 By Raja Shekar Mulpuri In HEAL Software

Ask banking leaders if their systems are healthy, and most respond confidently: “Yes, everything’s up.” But track a transaction closely, and reality shifts. A high-value payment retries repeatedly before settling. A KYC process silently times out, losing a verified customer. Compliance checks complete using stale data. No visible outages. Yet silent failures accumulate, becoming costly and increasingly damaging. This is downtime that dashboards never flag.

Read Post

HEAL Software

Read more about Silent Downtime: The Hidden Cost of Delayed Awareness in Banking

Docker Status Unhealthy: What It Means and How to Fix It

Jul 4, 2025 By Faiz Shaikh In Last9

If your container shows Status: unhealthy, Docker's health check is failing. The container is still running, but something inside, usually your app, isn’t responding as expected. This doesn’t always mean a crash. It just means Docker can’t verify the app is working. Here’s how to debug the issue and restore the container to a healthy state.

Read Post

Last9

Read more about Docker Status Unhealthy: What It Means and How to Fix It

Introducing UptimeRobot's official Terraform provider

Jul 4, 2025 By Tomas Koprusak In Uptime Robot

We’re excited to announce the official release of the UptimeRobot Terraform provider, a feature that many of you have been requesting. Starting today, you can manage your UptimeRobot resources, including monitors, alerting integrations, maintenance windows, and public status pages, directly in your Terraform configuration. Let’s take a closer look.

Read Post

Uptime Robot

Read more about Introducing UptimeRobot's official Terraform provider

How AI-driven Anomaly Detection Fortifies Compliance in Multi-Cloud Infrastructures

Jul 4, 2025 By Arpit Sharma In Motadata

In a multi-cloud environment, each cloud platform brings its unique tech stack to record events, manage services, set up configurations, manage user access and permissions, etc. While this allows you to leverage the best-of-breed services from different cloud vendors, the complexity of this setup makes it challenging to detect and respond to anomalies across clouds in real time.

Read Post

Motadata

Read more about How AI-driven Anomaly Detection Fortifies Compliance in Multi-Cloud Infrastructures

Watch AI Set up and Test Checkly Application Monitoring in Minutes!

Jul 4, 2025 By Checkly In Checkly

Join Stefan Judis, Playwright ambassador, as he answers the question of whether Checkly is AI-ready.

View Video

Checkly

Read more about Watch AI Set up and Test Checkly Application Monitoring in Minutes!

Want recurring revenue? Deliver value - and prove it.

Jul 4, 2025 By Sara Purdon In Martello Technologies

Recurring revenue streams bring stability to any business. For MSPs, that stability can be an essential foundation for service innovation and pursuing growth. Offering subscription-based managed services is an obvious way to get recurring revenue in place. Collaboration platforms like Microsoft Teams and Zoom are prime candidates for subscriptions since businesses depend on them every day, at all levels of the organization, to be productive and profitable.

Read Post

Martello Technologies

Read more about Want recurring revenue? Deliver value - and prove it.

Why is Open Source Important?

Jul 4, 2025 By Coroot In Coroot

🐧🐝 Try Coroot fully #FOSS and check out the latest open source observability tips on our blog: https://t.ly/qBH9f

#opensource #linux #eBPF #observability #DevOps #Coroot #SREs #kubernetes #softwarelibre #freesoftware

View Video

Coroot

Read more about Why is Open Source Important?

Application Performance Monitoring (APM) Use Cases Every DevOps Team Should Know

Jul 4, 2025 By Pavithra Parthiban In Atatus

Modern applications are built using distributed architectures, microservices, and cloud-native technologies. As these systems grow in complexity, it becomes harder for DevOps teams to maintain performance, track issues, and ensure a consistent user experience across all environments. Application Performance Monitoring (APM) helps solve these challenges by providing real-time visibility into how applications behave, from user interactions to backend services and infrastructure.

Read Post

Atatus

Read more about Application Performance Monitoring (APM) Use Cases Every DevOps Team Should Know

Grafana Pyroscope: MCP Server (Community Call July 2025)

Jul 4, 2025 By Grafana In Grafana

You may have been hearing folks talking lately about MCP servers. Let's look into what that is and what Pyroscope tools now exist in the Grafana MCP server.

View Video

Grafana

Read more about Grafana Pyroscope: MCP Server (Community Call July 2025)

Why SaaS Infrastructure Monitoring Is Critical for Modern IT Operations

Jul 4, 2025 By Gayathri R L In Infraon

In today’s cloud-driven world, keeping track of your software services is no longer optional—it’s essential. SaaS infrastructure monitoring helps IT teams keep an eye on the performance, uptime, and health of all cloud-based applications and systems in real time. With businesses relying heavily on remote tools and digital platforms, monitoring your SaaS stack ensures smooth operations and quick issue resolution.

Read Post

Infraon

Read more about Why SaaS Infrastructure Monitoring Is Critical for Modern IT Operations

How to Monitor MPLS Networks

Jul 3, 2025 By Alyssa Lamberti In Obkio

If you manage an enterprise network, then you’ve definitely come across MPLS. Although many businesses rely on MPLS technology for large, high-performing networks, these networks can still suffer from problems, like network congestion, that impact user experience. Monitoring MPLS using a Network Monitoring tool is key to identifying and solving network issues that impact MPLS performance.

Read Post

Obkio

Read more about How to Monitor MPLS Networks

Observability isn't about the tool. It's about the truth

Jul 3, 2025 By Wasil Banday In Catchpoint

An enterprise client reports latency. Your dashboards say everything is fine. They blame you. You blame them. Nobody can prove it either way. This is where most monitoring efforts hit a wall. Too often, the conversation gets stuck on dashboards and tools instead of the one thing that really matters: truth. Observability isn’t about collecting metrics or building pretty dashboards.

Read Post

Catchpoint

Read more about Observability isn't about the tool. It's about the truth

LangChain Observability: From Zero to Production in 10 Minutes

Jul 3, 2025 By Anjali Udasi In Last9

LangChain apps are powerful, but they’re not easy to monitor. A single request might pass through an LLM, a vector store, external APIs, and a custom chain of tools. And when something slows down or silently fails, debugging is often guesswork. In one instance, a developer ended up with an unexpected $30,000 OpenAI bill, with no visibility into what triggered it. This blog shows how to avoid that using OpenTelemetry and LangSmith. With this setup, you’ll be able to.

Read Post

Last9

Read more about LangChain Observability: From Zero to Production in 10 Minutes

Introducing DNS Monitoring - Stay Ahead of DNS Issues Before They Impact You

Jul 3, 2025 By Tomas Koprusak In Uptime Robot

We’re excited to announce a powerful new addition to your monitoring toolkit: DNS Monitoring is now available on UptimeRobot! DNS (Domain Name System) is a core component of internet functionality. When DNS records are misconfigured, hijacked, or simply expire, they can lead to serious outages, broken email services, or even security risks. That’s why we’ve introduced DNS Monitoring – to help you stay in control of your domain’s health at all times.

Read Post

Uptime Robot

Read more about Introducing DNS Monitoring - Stay Ahead of DNS Issues Before They Impact You

IT Event Console: Centralize Logs, Correlate Alerts, and Detect Incidents

Jul 3, 2025 By Isaac García In Pandora FMS

When you’re just starting out, you might picture yourself managing your IT infrastructure like Tom Cruise in Minority Report—key information projected in front of you, predicting events before they happen, controlling everything at the speed of thought with cinematic gestures on some kind of holographic computer.

Read Post

Pandora FMS

Read more about IT Event Console: Centralize Logs, Correlate Alerts, and Detect Incidents

Top Automation Use Cases for IT (in End User Computing)

Jul 3, 2025 By Nexthink In Nexthink

As digital transformation continues to reshape the business landscape, IT teams are under more pressure than ever. Organizations demand faster service, always-on support, and seamless user experiences – all while IT budgets remain stagnant or even shrink. Organizations urgently need solutions that help them keep up without burning out their teams or inflating costs. This is where IT automation becomes essential.

Read Post

Nexthink

Read more about Top Automation Use Cases for IT (in End User Computing)

From Zero to Dashboard in 10 Minutes with Telegraf, InfluxDB 3, and Grafana

Jul 3, 2025 By Suyash Joshi In InfluxData

In this tutorial, let’s walk through setting up a modern TIG stack in 10 minutes. TIG stands for three popular open source tools that complement each other: Telegraf, InfluxDB 3, and Grafana. They are often used to collect, store, and visualize time series data from servers, containers, APIs, or even IoT devices. We will be using a ready-to-use GitHub repository that includes.

Read Post

InfluxData

Read more about From Zero to Dashboard in 10 Minutes with Telegraf, InfluxDB 3, and Grafana

The Business Case for Network Automation: Cost Savings and Efficiency

Jul 3, 2025 By ScienceLogic In ScienceLogic

Let’s get real: the cost of not automating your network operations is probably already showing up on your P&L, and not in the column you like. Manual configuration changes, ad hoc backups, and frantic compliance prep aren’t just operational headaches, they’re quiet killers of budget flexibility and scale readiness. Network automation is no longer a “nice to have” for companies with massive IT budgets or unicorn-level engineering teams.

Read Post

ScienceLogic

Read more about The Business Case for Network Automation: Cost Savings and Efficiency

APM best practices: Dos and don'ts guide for practitioners

Jul 3, 2025 By Elastic Observability Team In Elastic

Application performance management (APM) is the practice of regularly tracking, measuring, and analyzing the performance and availability of software applications. APM helps you get visibility into complex microservices environments, which can overwhelm site reliability engineering (SRE) teams. The generated insights create an optimal user experience and achieve desired business outcomes.

Read Post

Elastic

Read more about APM best practices: Dos and don'ts guide for practitioners

How we created a single app to automate repetitive tasks with Datadog Workflow Automation, Datastore, and App Builder

Jul 3, 2025 By Barak Shoushan In Datadog

For many organizations, scaling up their systems means incorporating new tools to build out infrastructure, optimize code performance and security, improve communication, and track cost changes. While these changes are necessary to support an increased workload, they often result in a situation where even the most basic tasks involve switching between multiple platforms.

Read Post

Datadog

Read more about How we created a single app to automate repetitive tasks with Datadog Workflow Automation, Datastore, and App Builder

Maximizing Uptime: How to Monitor Network Ports

Jul 3, 2025 By ScienceLogic In ScienceLogic

Keeping critical services running smoothly starts with visibility, and that begins at the port level. Whether you're managing a lean environment or a complex network infrastructure, knowing which ports are active, listening, or down can make or break your response time. In this video, we walk through how to fully configure port discovery and monitoring in SL1. You'll learn how to track availability, respond to port failures with automated alerts, and ensure your systems are always one step ahead of potential issues.

View Video

ScienceLogic

Read more about Maximizing Uptime: How to Monitor Network Ports

Opsgenie is shutting down: Complete guide to alternatives in 2025

Jul 3, 2025 By Leo Baecker In Hyperping

Atlassian just pulled the plug on Opsgenie. On December 3, 2024, they announced that Opsgenie will reach end-of-life by April 2027. New sales stopped on June 4, 2025, and if you're using the JSM-bundled version, you'll lose access even sooner—October 2025. Here's the kicker: Atlassian wants you to migrate to their fragmented JSM + Compass combo, which splits your incident management across multiple tools. The reality? Teams are frustrated.

Read Post

Hyperping

Read more about Opsgenie is shutting down: Complete guide to alternatives in 2025

InfluxDB 3 Core: a complete rewrite designed for speed and simplicity

Jul 3, 2025 By Trevor Jones In Grafana

InfluxDB has been a popular time series database for the better part of a decade, and the latest release represents years of work behind the scenes to address several major feature requests users have been asking for since the earliest days of the time series database.

Read Post

Grafana

Read more about InfluxDB 3 Core: a complete rewrite designed for speed and simplicity

AI-Powered Monitoring with Checkly

Jul 3, 2025 By Stefan Judis In Checkly

Most monitoring tools weren't built for the AI-first world. By nature, traditional monitoring platforms force you out of your natural coding environment and trap you in clunky web interfaces, brittle configuration panels, and rigid APIs. And sadly, when monitoring providers do offer "AI features," it's usually a chatbot bolted onto their existing UI, being nothing more than a pale imitation of the AI tools you’re reading about every day on Hacker News. All this creates friction.

Read Post

Checkly

Read more about AI-Powered Monitoring with Checkly

Overview of Subaccounts

Jul 3, 2025 By Uptime Website Monitoring In uptime

Learn how to manage multiple client environments efficiently using Uptime.com Subaccounts. Subaccounts allow you to separate checks, alerts, reports, and user permissions—perfect for agencies or teams handling multiple clients.

View Video

uptime

Monitoring

Read more about Overview of Subaccounts

Why Healthcare IT Can't Keep Relying on Legacy Monitoring

Jul 3, 2025 By LogicMonitor In LogicMonitor

Supporting every hospital chart, scan, and bedside alert is a web of digital systems—EHRs, lab interfaces, clinical apps, networks, and connected devices—all working in sync or struggling to. When something slips, say, an Epic interface queue backs up and lab results don’t reach the attending physician on time, the consequences aren’t theoretical. That delay might mean a sepsis alert gets missed. A treatment window closes. A patient’s outcome changes.

Read Post

LogicMonitor

Read more about Why Healthcare IT Can't Keep Relying on Legacy Monitoring

Catchpoint News Catchup Episode 3

Jul 3, 2025 By Catchpoint In Catchpoint

Join Kelly, Brandon, Payal, and Leon to talk about recent news about your workday, the latest password breach, Cloudflare’s DDoS defense, and (sigh) AI.

View Video

Catchpoint

Monitoring

Read more about Catchpoint News Catchup Episode 3

Honeycomb Telemetry Pipeline Demo

Jul 3, 2025 By Honeycomb In Honeycomb

Jessitron takes you through a 3-minute demo of the Honeycomb Telemetry Pipeline. The pipeline makes it easier to manage telemetry so you always have the data you need, when you need it, without trading off cost, control, or visibility.

View Video

Honeycomb

Read more about Honeycomb Telemetry Pipeline Demo

Sponsored Post

Introducing Raygun CLI: Level-up your error tracking workflow

Jul 2, 2025 By Kai Koenig In Raygun

Raygun CLI is a powerful command-line interface tool designed to enhance the developer experience when working with Raygun's error tracking and performance monitoring platform. With this tool, we bring Raygun's features directly to your terminal, making it easier to integrate some important elements of Raygun Crash Reporting and error tracking into your development and CI/CD workflow. We are excited to announce the release of version 1.0.0 of Raygun CLI.

Read Post

Raygun

Read more about Introducing Raygun CLI: Level-up your error tracking workflow

LangChain & LangGraph: The Frameworks Powering Production AI Agents

Jul 2, 2025 By Anjali Udasi In Last9

Your AI agent worked flawlessly in development, with fast responses, clean tool use, and nothing out of place. Then it hit production. A simple "What's our pricing?" query triggered six API calls, took 8 seconds, and returned the wrong answer. No errors. No stack traces. Unlike traditional systems, AI agents don't crash, they drift. They make poor decisions quietly, and your monitoring says everything's fine.

Read Post

Last9

Read more about LangChain & LangGraph: The Frameworks Powering Production AI Agents

How to Run Elasticsearch on Kubernetes

Jul 2, 2025 By Anjali Udasi In Last9

Elasticsearch stands as one of the most robust open-source search engines available today. Built on Apache Lucene, it handles complex search operations, real-time analytics, and large-scale data processing with impressive speed and accuracy. Kubernetes has transformed how we deploy and manage containerized applications. This orchestration platform automates deployment, scaling, and operations of application containers across clusters of hosts.

Read Post

Last9

Read more about How to Run Elasticsearch on Kubernetes

Close the gaps in your SCOM monitoring with the Opslogix Autonomous Windows Service Management Pack

Jul 2, 2025 By Jonas Lenntun In OpsLogix

SCOM offers strong monitoring capabilities, which are extended through its various Management Packs. However, a common challenge is that some Windows services go unmonitored, simply because they don’t belong to a specific Microsoft technology like SQL Server or IIS.

Read Post

OpsLogix

Read more about Close the gaps in your SCOM monitoring with the Opslogix Autonomous Windows Service Management Pack

A little love for two old fellas - Icinga Business Process Modeling and Icinga Web Graphite Integration

Jul 2, 2025 By Johannes Meyer In Icinga

Today is the day we grant two products their long-overdue maintenance. Maintenance always sounds boring, I hear you. But let me remind you that this also means we do care and take care! Now let’s see what each release offers!

Read Post

Icinga

Read more about A little love for two old fellas - Icinga Business Process Modeling and Icinga Web Graphite Integration

The Complete Guide to APM Best Practices for Developers, DevOps & SREs

Jul 2, 2025 By Pavithra Parthiban In Atatus

Application Performance Monitoring (APM) is no longer optional; it is essential for delivering fast, reliable, and seamless digital experiences. But simply installing an APM tool isn’t enough. To truly realize its potential, IT teams need to follow APM best practices. Best practices for APM refer to the most effective ways to monitor, analyze, and optimize your application’s performance using APM tools.

Read Post

Atatus

Read more about The Complete Guide to APM Best Practices for Developers, DevOps & SREs

Detecting & Diagnosing Problems, Across Logs, Metrics, & Traces

Jul 2, 2025 By Honeycomb In Honeycomb

What does it look like to notice & debug a problem in Honeycomb? Start with a Service Level Objective (SLO), and Honeycomb can tell you what's unusual about the events that are failing. Continue to dig into all your telemetry.

View Video

Honeycomb

Read more about Detecting & Diagnosing Problems, Across Logs, Metrics, & Traces

Introducing Netdata Insights

Jul 2, 2025 By Netdata In netdata

Now in research preview: Netdata Insights. The problem: Incident? You're jumping between dashboards, piecing together timelines. Reporting? You're copy-pasting charts and correlating trends by hand. The data’s there, but turning it into a narrative doesn’t scale. The solution: Netdata Insights. It synthesizes high-fidelity telemetry using the latest LLMs into AI-powered reports with natural-language explanations, visuals, and clear recommendations.

View Video

netdata

Read more about Introducing Netdata Insights

Netdata: The Fastest Path to Full Stack Observability. AI Powered.

Jul 2, 2025 By Netdata In netdata

Netdata is a real-time, high-performance, on-premises observability platform designed to monitor metrics and logs with unparalleled efficiency. Netdata requires zero configuration to get started, and provides alerts, anomaly detection, and AI-assisted troubleshooting out of the box, for a powerful and comprehensive infrastructure monitoring experience. Netdata is known for its distributed design. Instead of funneling all data into a few central databases like most traditional monitoring solutions, Netdata processes data at the edge, keeping it close to the source.

View Video

netdata

Read more about Netdata: The Fastest Path to Full Stack Observability. AI Powered.

Debug smarter with Session Replay in Site24x7 real user monitoring (RUM)

Jul 2, 2025 By ManageEngine Site24x7 In Site24x7

Frontend errors can be tricky to trace without context. Site24x7's Session Replay gives developers, SREs, and DevOps teams complete visibility into the user journey by capturing every click, scroll, and interaction as it happened. With visual replays and correlated performance data, you can quickly identify what went wrong, why it happened, and how to fix it—without relying on user screenshots or log reports.

View Video

Site24x7

Read more about Debug smarter with Session Replay in Site24x7 real user monitoring (RUM)

Effortless customer monitoring with Site24x7's MSP Customer Health View

Jul 2, 2025 By ManageEngine Site24x7 In Site24x7

As a Managed Service Provider, staying on top of your customers’ monitor statuses shouldn't be a hassle. With Site24x7's Customer Health View, you get a centralized, real-time summary of every customer account you manage. Access monitor statuses, alarm counts, and overall account health—all in one place. Switch between List View and Grid View, apply filters to prioritize issues, and let auto-refresh keep you up to date every five minutes.

View Video

Site24x7

Read more about Effortless customer monitoring with Site24x7's MSP Customer Health View

Dynamic Status Pages on Demand

Jul 2, 2025 By Sean White In Oh Dear

Clients expect transparency - especially when things go wrong. But manually updating a status page during an incident or maintenance window slows you down when speed matters most. Oh Dear’s status pages are more than just a pretty uptime dashboard. They’re fully API-driven and designed to scale with your workflow. Whether you manage five client sites or five hundred, you can create, update and sync status pages as needed. Here’s how to do it.

Read Post

Oh Dear

Read more about Dynamic Status Pages on Demand

Robust Time Series Monitoring: Anomaly Detection Using Matrix Profile and Prophet

Jul 2, 2025 By Ram Senthamarai In Sentry

Monitoring production systems often feels like searching for a moving needle in a constantly shifting haystack. At Sentry, our goal was to empower customers to move beyond traditional threshold and percentage-based alerting. We aimed to help them detect subtle and complex anomalies in their systems in near real-time. This post will detail how our AI/ML team developed a time series anomaly detection system using Matrix Profile and Meta’s Prophet.

Read Post

Sentry

Read more about Robust Time Series Monitoring: Anomaly Detection Using Matrix Profile and Prophet

What is a Jitter Buffer and How It Works

Jul 2, 2025 By Alyssa Lamberti In Obkio

If you've ever been on a choppy VoIP call or sat through a video meeting where people sounded like robots from the ‘90s, you’ve likely run into a little thing called jitter. It’s one of those sneaky network issues that doesn’t always get the attention it deserves, until it ruins your real-time traffic. As IT pros and network admins, you're probably used to dealing with packet loss and latency. But jitter? That one's a bit trickier.

Read Post

Obkio

Read more about What is a Jitter Buffer and How It Works

Top Kubernetes Monitoring Tools in 2025, And Why Alerting Is Critical for DevOps and SRE Teams

Jul 2, 2025 By Ritika Bramhe In OnPage

What are the best Kubernetes monitoring tools in 2025? And how can you ensure alerts actually drive action when something goes wrong? Kubernetes monitoring is critical for keeping your containerized applications healthy, but alerting is often overlooked. This blog compares popular tools like Prometheus and Datadog and explains why intelligent alerting solutions like OnPage are essential for effective incident response.

Read Post

OnPage

Read more about Top Kubernetes Monitoring Tools in 2025, And Why Alerting Is Critical for DevOps and SRE Teams

What is eBPF and how can it improve observability? (in 45 seconds)

Jul 2, 2025 By Coroot In Coroot

🐧🐝 Use open source, automatic eBPF observability to gain instant system insights: https://t.ly/qBH9f

#eBPF #Linux #Kubernetes #OTEL #observability #DevOps #SRE

View Video

Coroot

Read more about What is eBPF and how can it improve observability? (in 45 seconds)

You can't fix what you can't see, especially when the problem isn't in your infrastructure. #ipm

Jul 2, 2025 By Catchpoint In Catchpoint

Most teams monitor from the inside, tracking internal metrics, logs, and uptime. But internal health doesn’t always reflect what your users experience. The internet is made up of many parts you don’t own (ISPs, CDNs, DNS, cloud providers), and any one of them can introduce friction. That’s why monitoring from the outside in matters. By testing from real user vantage points, you get a clearer picture of network reachability and performance as it’s actually experienced.

View Video

Catchpoint

Monitoring

Read more about You can't fix what you can't see, especially when the problem isn't in your infrastructure. #ipm

MCP Server Integration & Much More: What's New in VictoriaMetrics Cloud Q2 2025

Jul 2, 2025 By Jose Gomez-Selles In VictoriaMetrics

Q2 2025 has brought another wave of improvements to VictoriaMetrics Cloud! If you tuned in to our latest Quarterly Virtual Meetup, you saw firsthand how we’re making observability even more accessible, powerful, and interactive.

Read Post

VictoriaMetrics

Read more about MCP Server Integration & Much More: What's New in VictoriaMetrics Cloud Q2 2025

Is Your Observability Strategy Boardroom-Ready?

Jul 2, 2025 By Colin Burke In Honeycomb

At LDX3 in London last week, two roundtables I hosted with engineering leaders confirmed what many of us are starting to feel: observability isn’t just important—it’s becoming essential to how modern teams navigate the pressure to move fast and stay resilient.

Read Post

Honeycomb

Read more about Is Your Observability Strategy Boardroom-Ready?

MCP Observability with OpenTelemetry

Jul 2, 2025 By Elizabeth Mathew In SigNoz

2025 has truly been the year of Agentic AI, with MCP (Model Context Protocol) emerging as one of its flashiest and most talked-about innovations. While many products have seamlessly integrated MCP servers into their systems, these servers are increasingly being labelled as black boxes: opaque components that handle critical tasks but offer little visibility into what's happening under the hood. We prompt an agent, a tool gets invoked, and a response is generated. But what really happens in between?

Read Post

SigNoz

Read more about MCP Observability with OpenTelemetry

Introducing the Coralogix Operator for Kubernetes

Jul 2, 2025 By Coralogix Team In Coralogix

As organizations begin to scale their observability strategy, point and click methods of management become increasingly unworkable. This is why Coralogix has now fully released the Coralogix Operator for Kubernetes. Kubernetes operators are control loops that allow users to declare their desired state in their Kubernetes clusters, and the operator is responsible for resolving this state.

Read Post

Coralogix

Read more about Introducing the Coralogix Operator for Kubernetes

Coralogix launches OpenAPI endpoints

Jul 2, 2025 By Coralogix Team In Coralogix

Observability is about much more than dashboards and alerts. Extensible platforms that integrate into the user’s tech stack are fundamental parts of a great developer experience. This is why Coralogix has supported gRPC APIs for account management, data ingress & query, alert definition, dashboard creation, permissions management and more. Today, Coralogix adds a new integration, with the launch of OpenAPI endpoints for all existing functionality.

Read Post

Coralogix

Read more about Coralogix launches OpenAPI endpoints

IT Monitoring News | July '25 Edition

Jul 1, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

Welcome to the July edition of the NiCE bi-monthly IT monitoring news! As we reach the height of summer, we’re thrilled to share the latest updates, insights, and resources to help you stay ahead in IT monitoring. With new developments and recent releases, there’s plenty to discover, enhance, and get excited about. Let’s jump in!

Read Post

NiCE IT Mgmt

Read more about IT Monitoring News | July '25 Edition

Logging in Docker Swarm: Visibility Across Distributed Services

Jul 1, 2025 By Faiz Shaikh In Last9

Docker Swarm's logging model shifts from individual container logs to service-level aggregation. The docker service logs command batch-retrieves logs present at the time of execution, pulling data from all containers that belong to a service across your cluster. This approach gives you a unified view of distributed applications, but it comes with its own patterns and considerations for effective observability.

Read Post

Last9

Read more about Logging in Docker Swarm: Visibility Across Distributed Services

How to Write Logs to a File in Go

Jul 1, 2025 By Anjali Udasi In Last9

When your Go application moves beyond development, you need structured logging that persists. Writing logs to files gives you the control and reliability that stdout can't match, especially when you're debugging production issues or need to meet compliance requirements. This blog walks through the practical approaches, from Go's standard library to structured logging with popular packages.

Read Post

Last9

Read more about How to Write Logs to a File in Go

The Hidden Cost of Downtime: Why IT Leaders Are Prioritizing Resilient Operations

Jul 1, 2025 By ScienceLogic In ScienceLogic

No business sets out to tolerate downtime. And yet, across industries, unexpected service disruptions continue to drain revenue, erode customer trust, and expose operational fragility. For CIOs and IT leaders, the real concern isn’t if systems will break; it’s whether your team can outpace the fallout. Because in a crisis, speed isn’t just an advantage; it’s survival.

Read Post

ScienceLogic

Read more about The Hidden Cost of Downtime: Why IT Leaders Are Prioritizing Resilient Operations

Elephant Flows: The Hidden Heavyweights of AI Data Center Networks

Jul 1, 2025 By Phil Gervasi In Kentik

Elephant flows are no longer rare. They’re foundational to AI workloads. In today’s GPU-heavy data centers, long-lived, high-volume flows can distort ECMP, overflow buffers, and rack up unexpected cloud bills. Kentik helps you see and tame these elephants with real-time flow analytics, automated alerting, and predictive capacity planning.

Read Post

Kentik

Read more about Elephant Flows: The Hidden Heavyweights of AI Data Center Networks

The Dos and Don'ts of Successful Software Rollouts

Jul 1, 2025 By Shawn Lazarus In Nexthink

Launching new enterprise software is one of the most strategic—but risk-laden—internal initiatives any organization can undertake. Done right, it accelerates transformation, streamlines operations, and boosts employee productivity. Done wrong, it can paralyze teams, spike IT tickets, and erode employee trust in the tools they’re given and the teams that support them.

Read Post

Nexthink

Read more about The Dos and Don'ts of Successful Software Rollouts

Can Claude Code Observe Its Own Code?

Jul 1, 2025 By Austin Parker In Honeycomb

One of the great things about OpenTelemetry is that it’s a standard, and standards tend to proliferate. I was excited to see Claude Code add OpenTelemetry metric and log support in a recent release. What was really interesting—beyond the ability to capture usage data from Claude Code—is that you can also get pretty detailed logs about what you’re doing with Claude Code.

Read Post

Honeycomb

Read more about Can Claude Code Observe Its Own Code?

When Will We See the First $1 Billion Company Run by a Single Individual?

Jul 1, 2025 By Teneo In Teneo

It’s only a matter of time. OpenAI CEO Sam Altman said in 2024 that he thought this could be achieved by the end of 2026. Personally, I feel this is a little optimistic; however, based on the evidence I’ve seen, it won’t be long after that. Consider Telegram: a global messaging giant with just 30 employees, already achieving a remarkable $1 billion in revenue. Or Midjourney, revolutionizing creative industries with only 40 employees and generating an impressive $500 million.

Read Post

Teneo

Read more about When Will We See the First $1 Billion Company Run by a Single Individual?

Status Page Aggregator: Best Practices and Use Cases

Jul 1, 2025 By Colin Bartlett In StatusGator

A status page aggregator is a powerful tool that brings together the status updates of multiple cloud services, SaaS providers, and third-party services into a single, unified view. Whether you’re tracking the health of critical dependencies like AWS, Cloudflare, or niche SaaS tools your teams rely on, a status page aggregator simplifies monitoring and helps you stay ahead of outages.

Read Post

StatusGator

Read more about Status Page Aggregator: Best Practices and Use Cases

Automate server restarts in SCOM with the Opslogix Autonomous Maintenance Mode Management Pack

Jul 1, 2025 By Jonas Lenntun In OpsLogix

Automate server restarts in SCOM with the Opslogix Autonomous Maintenance Mode Management Pack Server restarts are routine, but in SCOM they often result in unwanted alerts if not handled properly. The Opslogix Autonomous Maintenance Mode Management Pack addresses this by automatically managing maintenance mode during restarts, minimizing false alerts and improving operational efficiency.

Read Post

OpsLogix

Read more about Automate server restarts in SCOM with the Opslogix Autonomous Maintenance Mode Management Pack

Enhanced monitoring of Amazon EKS with Elastic add-on capabilities

Jul 1, 2025 By Nima Rezainia In Elastic

Easily enable Elastic add-on within the Amazon EKS Console for streamlined monitoring and quick data onboarding. Amazon Elastic Kubernetes Service (EKS) makes running Kubernetes on AWS simple and scalable. But as your workloads grow, so does the need for robust monitoring and observability. Enter Elastic Agent, a powerful, unified way to collect logs, metrics, and security data from your EKS clusters, all managed through Elastic Fleet.

Read Post

Elastic

Read more about Enhanced monitoring of Amazon EKS with Elastic add-on capabilities

Instrument NextJS with OpenTelemetry in 100 seconds

Jul 1, 2025 By SigNoz - Open Source Observability Platform In SigNoz

What if setting up observability in your Next.js app was as easy as running a few commands? In this quick guide, we show you how to instrument your Next.js application using OpenTelemetry and visualize with SigNoz — without all the headaches.

View Video

SigNoz

Read more about Instrument NextJS with OpenTelemetry in 100 seconds

Perform Distributed Tracing for your MCP system with OpenTelemetry

Jul 1, 2025 By SigNoz - Open Source Observability Platform In SigNoz

2025 has truly been the year of Agentic AI, with MCP (Model Context Protocol) emerging as one of its flashiest and most talked-about innovations. While many products have seamlessly integrated MCP servers into their systems, these servers are increasingly being labelled as black boxes: opaque components that handle critical tasks but offer little visibility into what’s happening under the hood. We prompt an agent, a tool gets invoked, and a response is generated. But what really happens in between? And when something breaks, how do we trace the failure and debug it effectively?

View Video

SigNoz

Read more about Perform Distributed Tracing for your MCP system with OpenTelemetry

PHP Monitoring Best Practices for Developers, DevOps, and SREs

Jul 1, 2025 By Mohana Ayeswariya J In Atatus

In 2025, PHP still powers over 75% of the web, from ecommerce platforms like Magento to CMSs like WordPress and Laravel-powered web apps. As user expectations rise and digital experiences become mission-critical, real-time PHP monitoring has moved from a luxury to a necessity. According to Statista, PHP continues to rank in the top 10 most-used programming languages globally. Despite the popularity of modern stacks, legacy and modern PHP coexist in thousands of production environments.

Read Post

Atatus

Read more about PHP Monitoring Best Practices for Developers, DevOps, and SREs

Why GovRAMP-authorized observability matters for state, local, and education IT teams

Jul 1, 2025 By Greg Reeder In Datadog

Building on our FedRAMP Moderate authorization and our “In Process” status for FedRAMP High, Datadog for Government is now "In Process" for GovRAMP High Authorization, giving agencies a unified observability platform that meets the toughest public-sector security bars.

Read Post

Datadog

Read more about Why GovRAMP-authorized observability matters for state, local, and education IT teams

Best Website Monitoring Systems of 2025

Jul 1, 2025 By Zoe Collins In OnPage

If you still think websites are a “set it and forget it” asset, your business is going to get left behind. Fast. Nowadays, they are known as a place where business happens, patients connect, and money moves.

Read Post

OnPage

Read more about Best Website Monitoring Systems of 2025

Kentik AI in 20 Seconds

Jul 1, 2025 By Kentik In Kentik

Network intelligence brings together the power of AI and the wealth of insight we find in all the data we collect from our networks on premises and in the cloud. And with Journeys, Kentik AI provides relevant, actionable insight to network operations and enables a natural, easy way to interrogate data at scale and with confidence.

View Video

Kentik

Read more about Kentik AI in 20 Seconds

Faster incident response through distributed tracing: Inside Glovo's use of Traces Drilldown

Jul 1, 2025 By Trevor Jones In Grafana

It’s almost 1 p.m. on a Monday afternoon and you’re hungry. You pull up your meal delivery app and select your favorite restaurant and dish. Then you go to check out and nothing happens. Your frustration mounts as you get hungrier by the minute. But there’s frustration on the other side of that transaction as well—engineers are scrambling to figure out what’s wrong as orders drop and revenue losses rise.

Read Post

Grafana

Read more about Faster incident response through distributed tracing: Inside Glovo's use of Traces Drilldown

June product updates

Jul 1, 2025 By Colin Bartlett In StatusGator

In June, we introduced several powerful updates designed to enhance your monitoring experience and keep your team ahead of downtime. From the new TrackSSL integration to an embeddable status modal for your website, here’s a quick look at what’s new.

Read Post

StatusGator

Read more about June product updates

Top 5 outages detected by StatusGator in June 2025

Jul 1, 2025 By Colin Bartlett In StatusGator

June 2025 saw several high-impact outages across popular cloud services — from infrastructure giants like Google Cloud to developer platforms like Supabase and Heroku. For IT teams, MSPs, and developers, even short service disruptions can have ripple effects across workflows and customer experience. At StatusGator, we continuously monitor thousands of services to detect issues in real time — often before they’re publicly acknowledged.

Read Post

StatusGator

Read more about Top 5 outages detected by StatusGator in June 2025

StatusGator now monitors 6,000+ services

Jul 1, 2025 By Colin Bartlett In StatusGator

Today, StatusGator monitors over 6,000 cloud services and tools — a massive expansion that reflects how far we’ve come, and how deeply embedded we are in the fabric of modern infrastructure. In today’s world, your product’s reliability depends on a web of vendors — authentication providers, analytics platforms, CDNs, payment processors, communication tools, and more. At 6,000+ services, StatusGator now reflects your entire digital supply chain.

Read Post

StatusGator

Read more about StatusGator now monitors 6,000+ services

Operations | Monitoring | ITSM | DevOps | Cloud