Monthly Archive

Sponsored Post

How to Reduce Continuous Monitoring Costs

Aug 31, 2025 By David Bunting In ChaosSearch

Continuous monitoring is a crucial practice in the fields of DevOps, cybersecurity, and compliance. It involves the proactive and ongoing process of observing, assessing, and collecting data from various systems, applications, and infrastructure components in real-time or near real-time. Continuous monitoring is closely related to observability, which goes beyond simple monitoring to provide a deep understanding of complex and dynamic systems.

Read Post

How Data Ingestion Works in Elasticsearch (Quick Guide)

Aug 30, 2025 By Elastic In Elastic

Before you can search, analyze, or visualize anything in Elasticsearch, you need data ingestion. In this quick guide, we explain how data moves from raw logs, metrics, or JSON into an index using tools like Logstash, Beats, or language clients. Learn why consistency matters more than perfection and how, once data is ingested, it’s ready for search, analysis, and insight.

View Video

95% of AI Pilots Fail - Here's How to Be the 5%

Aug 29, 2025 By Dallon Robinette In Selector

When MIT released research showing that 95% of enterprise AI pilots fail to deliver measurable business impact, it made headlines for a reason. After years of heavy investment in artificial intelligence, the vast majority of organizations still haven’t moved beyond pilots that promise much but deliver little. This doesn’t mean AI itself is broken. In most cases, the technology performs as intended.

Read Post

Why Do SSL Certificates Fail in Multi-Cloud Environments (AWS, Azure, GCP)?

Aug 29, 2025 By Simon Rodgers In WebSitePulse

SSL certificates keep websites and apps secure, but in AWS, Azure, and Google Cloud Platform (GCP), misconfigurations or expirations can still cause services to go offline. Why do these failures happen, and how can you prevent them?

Read Post

Nonsense Networking: Tech Talk #8

Aug 29, 2025 By VictoriaMetrics In VictoriaMetrics

Ever feel like getting simple data from your network is way harder than it should be? You're not alone. With so many devices, the amount of data can be overwhelming, making it tough to see what's actually happening. In this stream, we're breaking down the common frustrations with network monitoring. We'll cover: The SNMP Problem: We'll start with why the "standard" method, SNMP, is often a pain. We'll look at the challenge of finding the right MIBs and OIDs just to get tools like Telegraf or Prometheus to work.

View Video

(ServiceNow + Kentik) From Reactive to Proactive: The Rise of Agentic Networks

Aug 29, 2025 By Kentik In Kentik

Agentic AI is not just hype—it’s a force multiplier that enables infrastructure and operations teams to do more, with less effort, in less time. Importantly, it helps IT teams compress time to resolution and even proactively detect and respond to issues, before they escalate.

View Video

How to Monitor OTP-Protected Web Applications

Aug 29, 2025 By Dotcom-Monitor In Dotcom-Monitor

If you’ve ever used an online banking application to complete a transaction or gone through a checkout on an e-commerce platform, chances are you’ve interacted with an OTP-protected application. One-Time Passwords (OTPs) are at the center of most multi-factor authentication (MFA) systems. OTPs are temporary codes delivered by SMS, email, authenticator apps, push notifications, etc.

Read Post

Meet Bits: Your Always-On AI Teammate for Faster Incident Resolution

Aug 29, 2025 By Datadog In Datadog

What if you could instantly add an engineer to your team, one who knows your system and is on call 24/7? That’s Datadog Bits. From gathering context to generating and testing hypotheses, Bits helps you find root causes in minutes, not hours.

View Video

How to deploy Elastic Agents in air-gapped environments

Aug 29, 2025 By Pete Ward In Elastic

From manual downloads to automated artifact management.

Read Post

A Practical Guide to Python Application Performance Monitoring (APM)

Aug 29, 2025 By Anjali Udasi In Last9

When your Python app starts slowing down (maybe queries are taking longer, memory keeps creeping up, or API calls are lagging), basic server metrics won’t tell you why. You need to see what’s happening inside the application itself. That’s the role of Application Performance Monitoring (APM). It gives you a breakdown of database queries, external API calls, memory usage, error rates, and more, so you can connect the dots between code and performance.

Read Post

High Availability by Design | WhatsUp Gold

Aug 29, 2025 By Progress WhatsUp Gold In WhatsUp Gold

As IT environments grow more distributed and resilient, the Progress WhatsUp Gold network monitoring solution is evolving to meet the moment. Starting in early 2026, Progress will officially retire the legacy Failover Manager and usher in a new era of high availability (HA) by design. This modern, scalable approach aligns with today’s best practices in infrastructure. Find more information on High Availability by Design.

View Video

AI, IT and HR: Strategy, Risks, and the Future of Work (w/ Ben Eubanks)

Aug 29, 2025 By Nexthink In Nexthink

Tim and Tom are joined by Ben Eubanks, Chief Research Officer at Lighthouse Research & Advisory, bestselling author of Artificial Intelligence for HR, and a leading thinker on the intersection of people, technology, and the future of work. Together they explore how AI is reshaping HR — not only in how the function operates day-to-day, but also in how it redefines HR’s outward-facing strategic role in organizations.

View Video

PARALLEL SEASON FINALE: S1E5 - The Forgotten

Aug 29, 2025 By Nexthink In Nexthink

Back again for the last installment of this story! Here's the season finale of Tim Flower's brand new IT mystery series, Parallel. Knowledge is power—until you forget.

View Video

ALL NEW PARALLEL: S1E4 - The Virtual Murder

Aug 29, 2025 By Nexthink In Nexthink

Back AGAIN by popular demand! Here's episode 4 of Tim Flower's brand new IT mystery series, Parallel. The team at Zentech has been hard at work investigating a murder. This time it's actually time that's the victim, and everyone is a suspect - especially the VDI team. "The Foundation" makes another appearance, and the team gets closer to uncovering the source of many of their recurring IT issues, with the help of an AI assistant and a very persistent DEX team.

View Video

Reading the Room: the Power of Unstructured Data (w/ Tania Benade-Meyer)

Aug 29, 2025 By Nexthink In Nexthink

In this episode, Tom speaks with Tania Benade-Meyer, founder of Unstructured, about the power of unstructured data, the shift from SLAs to XLAs, and what it takes to be an “enlightened leader” in 2025.

View Video

Set up Splunk AI Assistant for SPL in Enterprise environments with Cloud Connected Integration

Aug 29, 2025 By Splunk In Splunk

Unlock the power of the Splunk AI Assistant for SPL in your enterprise environment! In this quick tutorial, we'll walk you through the entire process, from downloading the app on Splunkbase, accepting the license agreement, and installing it in your environment, to completing the cloud-connected configuration which now allows you to use the AI Assistant in even more environments!

View Video

Your New AI Assistant for a Smarter Workflow

Aug 29, 2025 By Mezmo In Mezmo

We're thrilled to announce the launch of our new AI Chatbot, now live and in production for all users. This isn't just a simple Q&A bot; it's a powerful assistant designed to streamline your workflow, provide instant context, and help you navigate the Mezmo platform more efficiently.

Read Post

Serverless Applications: Why Monitoring is Essential for Speed and Reliability

Aug 29, 2025 By Pavithra Parthiban In Atatus

Serverless applications are becoming the go-to architecture for modern developers. Startups and enterprises are building serverless applications because they offer scalability, cost-efficiency, and flexibility. However, these advantages come with unique challenges, especially when it comes to monitoring serverless applications. Traditional server monitoring tools fail to capture short-lived functions, making serverless application monitoring essential for maintaining performance and reliability.

Read Post

DORA Compliance Software Options And Use Cases

Aug 29, 2025 By OpsMatters In OpsMatters

DORA entered into application on January 17, 2025, and since then, DORA compliance software, such as Spektion, has become an essential part of many DORA-compliant workflows. However, in this article, we go beyond just one software solution and round up the most common DORA compliance software categories that covered entities are currently using. We also examine what they excel at and how they come together in the context of DORA compliance.

Read Post

Digitate Advances Native OpenTelemetry Integration, Offering Unified AIOps and Observability Platform for IT and Business Transformation

Aug 28, 2025 By Digitate In Digitate

ignio™ Platform Unites Open Standards with AI-Driven Automation to Transform Enterprise Observability from Reactive Monitoring to Autonomous Operations.

Read Post

Early Warning Signals now in Google Chat

Aug 28, 2025 By Valeria Kurolapova In StatusGator

Good news – Early Warning Signals are now available in Google Chat! This means you’ll get real-time alerts about possible outages—before providers officially acknowledge them—delivered right where your team is already chatting and collaborating.

Read Post

Top tips to keep calm when everything is needed ASAP

Aug 28, 2025 By Nandana Ann Mathew In ManageEngine

Top tips is a weekly column where we highlight what’s trending in the tech world and list ways to explore these trends. This week, we’re looking at how to keep your cool when everything lands on your desk with an ASAP tag. There’s always that day at work. Meetings stacked back-to-back, emails piling faster than you can open them, and just when you think you’ve got a handle on things, your boss drops the golden line: Can you get this done today?

Read Post

Real-time Alerting for Data Center Networks

Aug 28, 2025 By Kentik In Kentik

Kentik’s Phil Gervasi shows how modern data centers—especially those powering AI workloads—can spot and fix problems before they impact performance or budgets. See how Kentik’s Data Explorer helps you identify disruptive flows, reclaim wasted network capacity, and turn insights into real-time alerts. With monitor-only mode and integrations with systems like PagerDuty and ServiceNow, your network becomes its own early warning system—driving uptime, cost savings, and better AI performance.

View Video

Getting started with Jira dashboards

Aug 28, 2025 By Blog In Squared Up

Jira is an industry favorite when it comes to managing software projects, yet its native dashboards can sometimes leave teams wanting more insight. The default views give a general update, but often lack the connection to the day-to-day activity happening in other parts of your workflow. As organizations use a wide mix of modern tools, from code repositories and cloud services to spreadsheets and reporting apps, it’s easy for critical details to get scattered or overlooked.

Read Post

The core KPIs of LLM performance (and how to track them)

Aug 28, 2025 By Sergiy Dybskiy In Sentry

A few months ago, I built an MCP server for Toronto’s Open Data portal so an agent could fetch datasets relevant to a user’s question. I threw the first version together, skimmed the code, and everything looked fine. Then I asked Claude: “What are all the traffic-related data sources for the city of Toronto?” The tool call fired. I got relevant results. And then I hit an error: “Conversation is too long, please start a new conversation.” I had only asked one question.

Read Post

How should Prometheus handle OpenTelemetry resource attributes?

Aug 28, 2025 By Victoria Nduka In Grafana

Note: A version of this post originally appeared on the OpenTelemetry blog. Victoria Nduka is a user experience designer and open source contributor making her way into the cloud native space. She writes about design, accessibility, and open source with the same curiosity she brings to her work. On May 29, I wrapped up my mentorship with Prometheus through the Linux Foundation mentorship program.

Read Post

Micro Lesson: Why you should move to OTel

Aug 28, 2025 By Sumo Logic, Inc. In Sumo Logic

The video discusses the advantages of using OpenTelemetry Collector over Sumo Logic's installed collectors for data collection and observability. It highlights key features and serves as a guide for organizations considering the transition to OpenTelemetry.

View Video

Micro Lesson: OTel Benefits

Aug 28, 2025 By Sumo Logic, Inc. In Sumo Logic

The video highlights the advantages of using OpenTelemetry Collector, emphasizing its scalability, flexibility, and cost-effectiveness.

View Video

Reality Bytes #62: Digital Overload - The Distraction Episode

Aug 28, 2025 By Nexthink In Nexthink

In this episode of Reality Bytes, Sean, Oriana, Dina, Tim, and Tom dive into the ever-present challenge of digital distraction in the workplace. From smartphones and smartwatches to endless Teams notifications, our panelists share their personal "kryptonite" when it comes to staying focused.

View Video

Tech Talk - Splunk ITSI & Correlated Network Visibility

Aug 28, 2025 By Splunk In Splunk

In this technical session, we’ll highlight how you and your team can integrate network telemetry in ITSI to extend your visibility for faster root cause analysis, and more context into network-related service impact.

View Video

Traces and Spans, Oh my!

Aug 28, 2025 By Sentry In Sentry

What even is a trace span?

View Video

The vendor trap: why your next outage won't be your fault-but will be your problem

Aug 28, 2025 By Payal Chakraborty In Catchpoint

Today’s enterprises don’t run on singular self-contained systems—they’re intricate webs of interdependence: cloud services, APIs, CI/CD tools, DNS, CDNs, SASE vendors, identity management providers, cloud interconnects, ISPs, SaaS applications, application components, microservices, etc. A recent industry survey found that 84% of organizations suffered operational disruption from third-party risk incidents, with 66% facing adverse financial impact.

Read Post

Updated Guide: Using Tracealyzer with IAR Embedded Workbench for Arm

Aug 28, 2025 By Percepio In Percepio

Using IAR Embedded Workbench for Arm with an IAR I-jet probe? Did you know this provides an excellent data channel for Tracealyzer trace streaming? We have just updated Percepio Application Note PA-023 with a simpler setup for trace streaming over ITM/SWO, enabled by improvements in IAR’s ITM logging support. This makes it easier than ever to combine IAR’s powerful debugging with Tracealyzer’s RTOS-level insight. Read the updated guide here.

Read Post

Windows Security Event Collection for Microsoft Sentinel with Datastream

Aug 28, 2025 By VirtualMetric In VirtualMetric

Collecting Windows Security Events has always been a necessary but difficult job. Traditional methods depend on third-party collectors that must be installed, configured, and constantly maintained. They break, they lag behind updates, and they create unnecessary operational work. At the same time, they often flood Microsoft Sentinel with redundant or irrelevant data, driving up costs and slowing down investigations.

Read Post

The Right Tool for the Right Job: How to Bring CSV Data into InfluxDB 3

Aug 28, 2025 By Allyson Boate In InfluxData

Comma-separated value (CSV) files are one of the simplest formats for structured data and remain widely used across industries. From machine exports to business reports, CSVs are easy to create, edit, and share. They serve as a backbone for data management, ensuring teams can exchange information quickly and consistently. However, CSVs alone are static. When ingested into a time series database, they shift from flat files to part of a living data pipeline.

Read Post

What is Database Monitoring

Aug 28, 2025 By Anjali Udasi In Last9

Database monitoring transforms from a reactive troubleshooting exercise into a proactive optimization strategy when you have the right tools and approaches in place. This blog shares practical ways to choose monitoring solutions, set up observability for different database platforms, and design workflows that scale in modern distributed systems.

Read Post

Data Sovereignty vs Data Residency vs Data Localization

Aug 28, 2025 By Rachel Berry In eG Innovations

Awareness of data sovereignty is increasing within organizations. Geopolitical situations and recent news stories are causing many to formally evaluate their data management strategies and policies. This means that organizations are also looking at the tools and platforms they use to run and maintain key IT infrastructure and undertake tasks such as monitoring and management. SaaS and cloud-first or cloud-only tooling can often present data sovereignty challenges and complications.

Read Post

Flow Like a Pro - Episode 1: Master Flowmon Profiles and Slash Your Query Time!

Aug 28, 2025 By Progress Flowmon In Flowmon

Network analysis doesn’t have to be slow or complicated. In this episode, we’ll show you how to master Flowmon Profiles so you can dramatically cut down your query time and focus on solving problems faster.

View Video

Copyleft and Apache 2.0: From Redis to Valkey

Aug 28, 2025 By Coroot In Coroot

Valkey OSS Developer Advocate Roberto Luna Rojas explains how Valkey can help improve your toolstack, and the history it shares with Redis.

View Video

OpenTelemetry Deep Dive: Resilience & High Availability in the OTel Collector

Aug 28, 2025 By Bindplane In ObservIQ

Missed it live? Catch the full recording of OpenTelemetry Deep Dive: Resilience & High Availability in the OTel Collector — a 1-hour workshop on building telemetry pipelines that never drop a signal. We’ll show you why resilience matters, how to design high-availability architectures, and how to configure the OpenTelemetry Collector with retries, batching, and persistent queues. Plus, you’ll see live demos in both Docker and Kubernetes — including scaling Gateway collectors with an HPA — and how Bindplane makes large-scale management seamless.

View Video

Catchpoint News Catchup Episode 07

Aug 28, 2025 By Catchpoint In Catchpoint

Join Ankit and Leon as they explore recent articles about: the current state of the AI bubble; the compelling evidence for a 4-day workweek; the enduring legacy of and nostalgia for CDs;

View Video

What is HIPAA Compliance?

Aug 28, 2025 By Staff Contributor In SolarWinds

Passed in 1996, the Health Insurance Portability and Accountability Act (HIPAA) was established to improve the healthcare system’s storage and use of patient data. As health insurance and healthcare services modernize and digitalize, more health information is stored, transferred, and updated digitally.

Read Post

Built for Scale: Why Enterprises, GSIs, and MSPs Choose ScienceLogic for Intelligent Operations

Aug 28, 2025 By ScienceLogic In ScienceLogic

As companies shift towards a digital-first strategy and enterprise architectures become more complex to support it, the demands on IT operations platforms have evolved significantly. Today’s global enterprises, system integrators (GSIs), and managed service providers (MSPs) require more than traditional observability tools. They need scalable, intelligent platforms that can manage sprawling environments with consistency, speed, and precision.

Read Post

Understanding Incident Response vs Incident Remediation

Aug 28, 2025 By Jeff Darrington In Graylog

At a high level, incident remediation is a part of the incident response process. An incident response plan manages the incident lifecycle across planning, detection, investigation, and recovery. Meanwhile, incident remediation focuses on identifying root causes and implementing measures to prevent future occurrences.

Read Post

Advances in Furnace Repair Through Modern Technology

Aug 28, 2025 By OpsMatters In OpsMatters

Heating systems have long been a cornerstone of comfortable living, keeping homes and workplaces warm through the coldest months. Over time, furnace repair has shifted from manual inspection and guesswork to a field guided by technological precision. These advances not only improve repair accuracy but also reduce downtime, energy costs, and long-term maintenance burdens. Modern tools, smart diagnostics, and digital platforms have shaped an environment where technicians can provide faster, safer, and more effective care for heating systems.

Read Post

Upcoming Webinar: Discover the New NiCE VMware vSphere Management Pack 6.1

Aug 27, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

Modern VMware Monitoring. Native to SCOM. Built for Scale.

Read Post

Stay ahead of downtime: OpManager's new mobile widgets redefine on-the-go network monitoring

Aug 27, 2025 By monicaa.mn@zohocorp.com In ManageEngine

In today’s mobile-first IT world, the difference between reacting late and staying ahead lies in how quickly you access critical data. For network admins who are always on their toes, waiting to open the app to check for alarms or down devices can be a bottleneck. That’s where OpManager’s latest mobile app upgrade comes in, with interactive home screen widgets designed to deliver instant visibility and control—without even launching the app.

Read Post

Optimize application performance at the network layer: introducing HTTP Performance Insights in Frontend Observability

Aug 27, 2025 By Cedric Ziel In Grafana

Imagine you’re a frontend engineer monitoring the user experience for an e-commerce app. You notice your checkout flow has a 15% abandonment rate. Your API responses are inconsistent. Your users are frustrated, and you’re drowning in data and complex queries trying to figure out why. Sound familiar? You can use real user monitoring (RUM) to determine what has happened, looking at page load times, error counts, user sessions, etc.

Read Post

What's New in InfluxDB 3.4: Simpler Cache Management, Provisioned Tokens, and More

Aug 27, 2025 By Peter Barnett In InfluxData

Today, we’re releasing InfluxDB 3.4 for Core and Enterprise, as well as our 1.2 update for the Explorer UI. This release focuses on developer efficiency, operational automation, and targeted security enhancements, giving teams faster setup, smoother workflows, and stronger guardrails for production use. InfluxDB 3 Core is free and open source, optimized for recent data, and licensed under MIT and Apache 2.

Read Post

AI That Knows Networking: Selector vs. Generic GPT Integrations

Aug 27, 2025 By Dallon Robinette In Selector

The hype around generative AI has led many IT teams to experiment with plugging generic GPT models into their workflows. On paper, this is the beginning of true AI networking, featuring conversational interfaces, instant summaries, and faster troubleshooting. However, as we discussed in the previous post, “Why Your IT Copilot Needs Context, Not Just Data,” copilots are only as effective as the intelligence behind them.

Read Post

You don't control most of the infrastructure your digital services rely on.

Aug 27, 2025 By Catchpoint In Catchpoint

However, your customers still expect a flawless experience, every time. The complexity of modern architectures (CDNs, DNS, APIs, cloud platforms) means that even “simple” applications can break in ways you don’t see coming. So how do you stay ahead of issues you don’t even own? By monitoring the digital delivery chain as your users experience it, across networks, geographies, and third-party dependencies, and catching performance degradations before they become business problems.

View Video

Tech Talk - Aligning Observability Costs with Business Value Practical Strategies

Aug 27, 2025 By Splunk In Splunk

Learn how to tackle the challenges of growing telemetry data and optimize your observability model to maximize value while minimizing costs. This session will explore strategies to reduce log ingestion, centralize pipeline management, and gain visibility into metric usage to identify waste.

View Video

MySQL vs. NoSQL: What's the difference?

Aug 27, 2025 By Coroot In Coroot

Valkey OSS Developer Advocate Roberto explains the difference between a Relational Database (think Postgres or MySQL) and a key-value datastore. Spoiler: Key-value in-memory stores are mind-bogglingly fast.

View Video

EP #2: Valkey, Vector, Redis, and the History of Databases - The Open Source Observability Podcast

Aug 27, 2025 By Coroot In Coroot

In this episode we learn how Valkey, the lightning-speed open source key-value datastore, can help improve your observability toolstack. Dive in to learn what differentiates a NoSQL data store from a relational database, more about data structures such as HyperLogLog and Bloom Filter, and all about the history of how data is stored.

View Video

Debugging with Session Replay

Aug 27, 2025 By Sentry In Sentry

What is a Session Replay? Record video-like user sessions to debug your app. Session Replays show you what happened and when, along with providing you with all of the relevant logs and traces to debug a user issue without them ever needing to write in.

View Video

LangChain Observability: How to Monitor LLM Apps with OpenTelemetry (With Demo App)

Aug 27, 2025 By Goutham Karthi In SigNoz

LangChain has become one of the most popular frameworks for building LLM-powered applications, making it easier to create agents that can reason, plan, and take actions. But like any production-grade AI app, LangChain agents can run into performance bottlenecks, hallucinations, or tool call failures. And without proper LangChain observability, it’s hard to know where things break down.

Read Post

Full-Circle Observability: Using SigNoz to monitor a LangChain agent that queries SigNoz MCP

Aug 27, 2025 By Goutham Karthi In SigNoz

In Part 1 of this series, we explored how to instrument a LangChain trip planner agent with OpenTelemetry and send telemetry data to SigNoz. By tracing each step of the planning process: LLM reasoning, tool calls for flights, hotels, weather, and activities, and the final itinerary response, we saw how observability turns a black-box agent workflow into a transparent, debuggable system.

Read Post

Smarter Network Monitoring: Reduce Alert Noise for MSPs & IT Teams

Aug 27, 2025 By Mike Grodzki In Auvik

If you’ve ever worked in a loud office, you know the drill: A co-worker’s on a call, someone’s talking about the next Taylor Swift album in the break room, another’s constantly clearing their throat, and the HVAC sounds like a jet engine. It’s loud. Your brain tries to filter it all out, but it’s no use. Then you put on noise-canceling headphones… and suddenly, you can think again.

Read Post

Auvik

Read more about Smarter Network Monitoring: Reduce Alert Noise for MSPs & IT Teams

The inadequate guide to Rails security

Aug 27, 2025 By Starr Horne In Honeybadger

If you're like me, you got into this business because you love building awesome apps. If you've been in the development space long enough, you'll eventually have to do work on those awesome apps that doesn't feel so awesome. Security can be one of those things. Taking Rails security seriously is important, even though the Rails framework does much of the heavy lifting. Before we get too deep into the details of Ruby on Rails security, let's take a second to reflect on the good times. ...

Read Post

Honeybadger

Read more about The inadequate guide to Rails security

The business impact of Elasticsearch logsdb index mode and TSDS

Aug 27, 2025 By Tim Brophy In Elastic

The Elasticsearch storage engine team has made significant strides in improving storage efficiency and performance in Elasticsearch 8.19 and 9.1. Now that these changes are available, what impact can they have on your business? And how do you make the most of them?

Read Post

Elastic

Read more about The business impact of Elasticsearch logsdb index mode and TSDS

Status Page Snapshot

Aug 27, 2025 By Uptime Website Monitoring In uptime

Get a quick tour of the Uptime.com Status Page solution. Uptime.com has you covered regardless of your needs, from SLA accountability to Public Status updates to Internal communications.

View Video

uptime

Read more about Status Page Snapshot

Tech Talk - Mastering Data Pipelines Unlocking value with Splunk

Aug 27, 2025 By Splunk In Splunk

Watch this Tech Talk to learn how Splunk can help you unlock the value of your security and observability data by building an effective data management strategy. Understand how Splunk’s approach to federated data management can help you maximize the value of data. Build effective pipelines using our latest SPL2-powered data processing capabilities to collect, transform and route data based on your business needs. Run effective searches on data in Amazon S3 without having to ingest or index data into Splunk.

View Video

Splunk

Read more about Tech Talk - Mastering Data Pipelines Unlocking value with Splunk

Targeting hosts and services in Icinga 2 API requests

Aug 27, 2025 By Julian Brost In Icinga

Today, we are going to take a look at the Icinga 2 API and the various ways targets can be specified for different actions, such as querying information or scheduling downtimes. This post focuses on the API request payloads themselves and assumes some familiarity with sending requests to the Icinga 2 API. Please refer to our documentation for the missing details if you want to try the requests yourself. In general, specifying the objects to which an action applies works the same way for all actions.

Read Post

Icinga

Read more about Targeting hosts and services in Icinga 2 API requests

How to surface misconfigured resources by defining policies | Datadog Tips & Tricks

Aug 27, 2025 By Datadog In Datadog

Misconfigured infrastructure resources can be easy to miss, especially in multi-account or multi-cloud environments. From EKS clusters running on deprecated versions to RDS engines on extended support, these issues can disrupt services or drive up costs if left unchecked. In this video, we show you how to surface these misconfigurations by defining policies. By centralizing policies, you’ll gain a clear view of where to focus your remediation efforts.

View Video

Datadog

Read more about How to surface misconfigured resources by defining policies | Datadog Tips & Tricks

Optimize Kubernetes and Container Costs with Datadog Cloud Cost Management

Aug 27, 2025 By Datadog In Datadog

Struggling to understand the true cost of your Kubernetes workloads? With Datadog Cloud Cost Management, you can automatically allocate container costs by team, product, and service down to the pod. Instantly identify idle resources, surface optimization opportunities, and act with confidence. All in one unified platform.

View Video

Datadog

Read more about Optimize Kubernetes and Container Costs with Datadog Cloud Cost Management

Observability Without Limits - Uptrace Pricing Explained

Aug 27, 2025 By Uptrace In Uptrace

Welcome to Uptrace, the modern observability platform. Our pricing is simple: pay only for the data you ingest. Unlimited users, services, and hosts. Billed per uncompressed GB for spans & logs. Billed by active timeseries for metrics. Automatic volume discounts as your usage grows. Free trial includes 1 TB of spans & logs and 100,000 timeseries, no credit card required.

View Video

Uptrace

Read more about Observability Without Limits - Uptrace Pricing Explained

Catch core banking issues (before they impact customers and compliance)

Aug 27, 2025 By Uptrends In Uptrends

APAC customers have high expectations around instant payments, open banking, and mobile-first experiences. In March 2025, India’s real-time payment system, UPI, went down for five hours. Millions experienced payment failures, failed fund transfers, and login errors, and many vented their frustrations on social media. With banking and payment disruptions on the rise, regulators are calling for proof of resilience.

Read Post

Uptrends

Read more about Catch core banking issues (before they impact customers and compliance)

Eliminate cloud waste across AWS, Azure, and Google Cloud with Cloud Cost Recommendations

Aug 27, 2025 By Candace Shamieh In Datadog

As organizations increasingly adopt multi-cloud strategies, identifying areas to reduce cloud spend has become highly complex and time consuming. While there are many reasons that organizations choose to run their infrastructure in a multi-cloud environment, many do so to comply with regional data requirements, take advantage of best-of-breed offerings, or avoid vendor lock-in.

Read Post

Datadog

Read more about Eliminate cloud waste across AWS, Azure, and Google Cloud with Cloud Cost Recommendations

Tech Talk - The Latest Cisco Integrations With Splunk Platform

Aug 27, 2025 By Splunk In Splunk

This session will provide insights into optimizing performance, streamlining operations, and gaining deeper insights into your infrastructure. Don't miss out on this opportunity to learn how Cisco and Splunk are revolutionizing IT and security operations.

View Video

Splunk

Read more about Tech Talk - The Latest Cisco Integrations With Splunk Platform

Tech Talk - Unleash Unified Security and Observability with Splunk Cloud Platform

Aug 27, 2025 By Splunk In Splunk

On this Tech Talk we dive into the top use cases of Splunk on Azure cloud migration and Splunk AI integrations with Microsoft Co-pilot – helping customers successfully navigate the ever-increasing threat landscape.

View Video

Splunk

Read more about Tech Talk - Unleash Unified Security and Observability with Splunk Cloud Platform

Reduce cloud waste with Datadog Cost Recommendations

Aug 27, 2025 By Datadog In Datadog

Struggling to optimize your cloud spend across AWS, Azure, and Google Cloud? Datadog Cloud Cost Management highlights underutilized or legacy resources and lets engineers take immediate action using Datadog Workflows. Eliminate waste and drive savings with recommendations that your teams can trust.

View Video

Datadog

Read more about Reduce cloud waste with Datadog Cost Recommendations

Monitoring websites from the United States

Aug 27, 2025 By Bela Susan Thomas In Site24x7

Monitoring your websites from the US region is critical for serving users from the US as it helps you improve website performance and compliance-related practices, ensure business continuity, and offer a better customer experience. Just a few milliseconds of added lag specific to US connections can impact bounce rates and conversions, making localized monitoring essential.

Read Post

Site24x7

Read more about Monitoring websites from the United States

Will custobots drive $98 trillion in payments by 2027?

Aug 26, 2025 By Harsitha P In ManageEngine

It starts like this. You wake up groggily and stumble into the kitchen. The coffee machine is already brewing your favourite blend. But that’s not the surprise. It's the message on your phone: "Coffee beans restocked. $10 paid. Delivery by noon." You didn’t place an order. You didn't lift a finger. Your machine did it for you. Welcome to 2025, where your devices aren't just smart, they're economically independent. These are machine customers (custobots). They negotiate. They pay.

Read Post

ManageEngine

Read more about Will custobots drive $98 trillion in payments by 2027?

Nearly 1 in 2 UK public sector IT leaders say their cybersecurity tools fall short

Aug 26, 2025 By SolarWinds In SolarWinds

System complexity and budget limitations contribute to the reported cybersecurity gaps.

Read Post

SolarWinds

Read more about Nearly 1 in 2 UK public sector IT leaders say their cybersecurity tools fall short

Tech Talk - App Building 101 Recording

Aug 26, 2025 By Splunk In Splunk

Watch this Tech Talk to learn: What is a Splunk app, and how do I get started building one? How do I test my app and deploy it on Splunk? How can I share my app with the broader Splunk community?

View Video

Splunk

Read more about Tech Talk - App Building 101 Recording

Tech Talk - Exporting Splunk Apps

Aug 26, 2025 By Splunk In Splunk

On this Tech Talk you can learn: How to keep local copies and exported snapshots of your apps and associated app data. How to simplify the development, management and debugging of Splunk Cloud Apps. Recommended tactics with ACS's APIs, CLI, and Terraform.

View Video

Splunk

Read more about Tech Talk - Exporting Splunk Apps

Tech Talk - Build Your First SPL2 App

Aug 26, 2025 By Splunk In Splunk

Watch this Tech Talk to learn: What SPL2 is, and how it extends SPL’s capabilities for developers and admins! How to build your first app with SPL2! How SPL2 can help solve common access control challenges with run-as-owner views, and conduct data quality validation with custom type checks!

View Video

Splunk

Read more about Tech Talk - Build Your First SPL2 App

Introducing ping and TCP port monitoring (and lots of other improvements)

Aug 26, 2025 By Freek Van der Herten In Oh Dear

A couple months ago, we sent out a survey to all our users asking what they like about Oh Dear, how they use it, and how we could improve our service. One of the things that was asked a lot was ping and TCP port monitoring. The past few months we worked hard to add this kind of monitoring to our service. And while building it, we touched upon other parts of our service and improved lots of little things. And I'm proud to share that we now have shipped it all! Let's go through it!

Read Post

Oh Dear

Read more about Introducing ping and TCP port monitoring (and lots of other improvements)

We vibe coded a path tracer: Here's how we used static and dynamic analysis to fix it

Aug 26, 2025 By Bowen Chen In Datadog

When developing software, the longer you intend to keep a system around, the more important it becomes to prioritize its code quality. But as more organizations move toward microservice architectures and adopt agentic AI and LLMs into their development workflows, many engineering teams have increased their emphasis on accelerating developer velocity, often at the expense of code quality. This can often result in code that fails to meet standards for performance, reliability, and security.

Read Post

Datadog

Read more about We vibe coded a path tracer: Here's how we used static and dynamic analysis to fix it

How to measure and fix latency with edge deployments and Sentry

Aug 26, 2025 By Kyle Tryon In Sentry

A 2017 study by Google, researchers found: That was over 8 years ago. And let’s be honest, it’s not likely users have found any additional patience in that time. Web Vitals are a set of performance metrics defined by Google that measure user experience. They focus on things like LCP (how long the main content takes to load), INP (how quickly the page responds to input), and CLS (how visually stable the app is, meaning whether content shifts unexpectedly).

Read Post

Sentry

Read more about How to measure and fix latency with edge deployments and Sentry

Put Cloud Costs in Front of Engineers with Datadog Cloud Cost Management

Aug 26, 2025 By Datadog In Datadog

Tired of surprises on your cloud bills? With Datadog Cloud Cost Management integrated into the Software Catalog, engineers see cost, performance, and reliability side by side—no context switching required. Give every service owner the visibility they need to make cost-aware decisions.

View Video

Datadog

Read more about Put Cloud Costs in Front of Engineers with Datadog Cloud Cost Management

Track Cloud Unit Economics with Datadog Cloud Cost Management

Aug 26, 2025 By Datadog In Datadog

Do you know the true cost per user, API call, or checkout? Datadog Cloud Cost Management lets you break down spend by combining cost, observability, and custom business metrics—all in one place. Track cost per transaction, alert on changes, and align engineering and finance with real-time unit economics.

View Video

Datadog

Read more about Track Cloud Unit Economics with Datadog Cloud Cost Management

The Gartner 2025 Market Guide for Log Monitoring and Analysis Solutions

Aug 26, 2025 By Merylee Heggem In Sumo Logic

The zettabyte era of data is alive and well. Every tool in your tech stack now has some sort of AI functionality, while middleware sprawl and feature wars between vendors become daily battles.

Read Post

Sumo Logic

Read more about The Gartner 2025 Market Guide for Log Monitoring and Analysis Solutions

Tech Talk - Holistic Visibility and Effective Alerting Across IT and OT Assets

Aug 26, 2025 By Splunk In Splunk

Watch this Tech Talk to learn how to gain complete visibility into all hosts and their potential vulnerabilities, misconfigurations, and unpatched components in a single analytics platform; how adding Tenable asset and exposure risk context improves alert prioritization; and how joint customers use Splunk for centralized reporting.

View Video

Splunk

Read more about Tech Talk - Holistic Visibility and Effective Alerting Across IT and OT Assets

API Monitoring Basics

Aug 26, 2025 By AlertBot In AlertBot

Are you interested in learning about API monitoring? Then you’ve come to the right article. Below, we explain what APIs are and why they’re valuable, explore the basics of API monitoring, and wrap up with practical advice on how to get comprehensive API monitoring in your organization.

Read Post

AlertBot

Read more about API Monitoring Basics

Monitor Apple Silicon GPU on macOS with macmon + Hosted Graphite

Aug 26, 2025 By Benjamin Pitts In MetricFire

Your Mac’s GPU is a massively parallel processor that handles anything from animating the UI to heavy lifting in video editors, 3D tools, games, and on-device machine learning models. Think Final Cut Pro exports, Blender renders, Stable Diffusion, WebGPU demos, or shader builds in Xcode - all tasks that make heavy use of the GPU.

Read Post

MetricFire

Read more about Monitor Apple Silicon GPU on macOS with macmon + Hosted Graphite

5 DevOps Team Structures (Plus Actionable Strategies for Automation, Monitoring & Culture Change)

Aug 26, 2025 By Leo Baecker In Hyperping

An effective DevOps team is about creating the right structure, culture, and processes that enable collaboration across traditionally siloed departments. The right DevOps team structure can dramatically improve software delivery speed, reliability, and overall customer satisfaction. But what exactly makes a great DevOps team? And how can you build one that works for your organization?

Read Post

Hyperping

Read more about 5 DevOps Team Structures (Plus Actionable Strategies for Automation, Monitoring & Culture Change)

Why Even Krispy Kreme Got Hacked

Aug 26, 2025 By solarwindsinc In SolarWinds

Think only critical infrastructure gets attacked? Wrong. Ryan explains why every business—big or small—is on hackers’ radar.

View Video

SolarWinds

Read more about Why Even Krispy Kreme Got Hacked

ManageEngine recognized as a Customers' Choice in the 2025 Gartner Peer Insights Voice of the Customer for Network Management Tools

Aug 25, 2025 By Shree Harish S B In ManageEngine

We are thrilled to share that ManageEngine has been recognized as a Customers’ Choice in the 2025 Gartner Peer Insights Voice of the Customer for Network Management Tools. We are even more excited to be the only vendor positioned in the Customers' Choice quadrant for this category! This recognition is especially meaningful because it's completely based on reviews and feedback from our customers.

Read Post

ManageEngine

Read more about ManageEngine recognized as a Customers' Choice in the 2025 Gartner Peer Insights Voice of the Customer for Network Management Tools

What's new for scheduling and resource management in Kubernetes v1.34?

Aug 25, 2025 By Nicholas Thomson In Datadog

Kubernetes v1.34, which is scheduled for release August 27, 2025, focuses on improved scheduler visibility, deeper life cycle observability, and enhanced resource management. As always, the list of changes and improvements in the official changelog is extensive, and cluster operators may be wondering which changes are most important. If you're operating a monitoring platform or depend on deep Kubernetes observability, here's how a number of new features will affect your workflows.

Read Post

Datadog

Read more about What's new for scheduling and resource management in Kubernetes v1.34?

Manage your dashboards and monitors at scale

Aug 25, 2025 By Khang Truong In Datadog

In the early stages of building a system, a few well-placed dashboards and monitors can provide sufficient visibility into service health and performance. However, as infrastructure scales and teams grow, so does the complexity of the monitoring landscape. In organizations where individual teams manage their own services but rely on a central platform or observability team for tooling and guidance, this complexity can quickly multiply.

Read Post

Datadog

Read more about Manage your dashboards and monitors at scale

Visualize Logs Alongside Metrics: Complete Observability for Slow PostgreSQL Queries

Aug 25, 2025 By Benjamin Pitts In MetricFire

When latency creeps into your app, metrics tell you that performance regressed, but logs tell you why. PostgreSQL’s slow-query logging gives you the exact statement, duration, user, and database, which is perfect for hunting down missing indexes, inefficient filters, or N+1 patterns.

Read Post

MetricFire

Read more about Visualize Logs Alongside Metrics: Complete Observability for Slow PostgreSQL Queries

Caddy Webserver Data in Graylog

Aug 25, 2025 By Jeff Darrington In Graylog

If you’re running Caddy Webserver on Ubuntu, Graylog now has a new way to make your access logs more actionable without tedious parsing or manual setup. The new Caddy Webserver Content Pack, available in Illuminate 6.4 with a Graylog Enterprise or Graylog Security license, delivers ready-to-use parsing rules, streams, and dashboards so you can quickly turn raw logs into structured, searchable insights.

Read Post

Graylog

Read more about Caddy Webserver Data in Graylog

The Complete Angular Error Handling Guide for Production-Ready Apps

Aug 25, 2025 By Todd H. Gardner In TrackJS

Your Angular app just crashed in production with ‘ERROR Error: Uncaught (in promise): ’. Sound familiar? After debugging countless production fires, I’ve learned that proper error handling isn’t optional—it’s the difference between sleeping through the night and getting paged at 3 AM.

Read Post

TrackJS

Read more about The Complete Angular Error Handling Guide for Production-Ready Apps

Real User Experiences: How Auvik Network Management Transforms Remote Support

Aug 25, 2025 By Rebecca Grassing In Auvik

When distributed teams need network support, traditional approaches often fall short. The difference between a quick remote fix and hours of on-site troubleshooting can make or break productivity for organizations with dispersed infrastructure. Based on feedback from real users on PeerSpot, an enterprise technology buying intelligence platform, Auvik Network Management is changing how IT teams deliver remote support by eliminating common barriers and reducing resolution times.

Read Post

Auvik

Read more about Real User Experiences: How Auvik Network Management Transforms Remote Support

How Auvik Network Management Optimizes Network Performance: Real User Insights

Aug 25, 2025 By Rebecca Grassing In Auvik

Network performance challenges can cripple business operations, leaving IT teams scrambling to identify bottlenecks while users experience frustrating slowdowns. Without proper visibility into bandwidth utilization, latency issues, packet loss, and network availability, organizations risk reactive troubleshooting that costs time and productivity.

Read Post

Auvik

Read more about How Auvik Network Management Optimizes Network Performance: Real User Insights

OpenTelemetry API vs SDK: Understanding the Architecture

Aug 25, 2025 By Anjali Udasi In Last9

When you're instrumenting applications with OpenTelemetry, you'll encounter two core components: the API and the SDK. The API defines what telemetry data looks like and how it is created, while the SDK handles how that data is processed and exported. Understanding this split helps you build more maintainable observability and avoid tight coupling between your business logic and telemetry infrastructure.

Read Post

Last9

Read more about OpenTelemetry API vs SDK: Understanding the Architecture

Evaluate and Improve Your Site's Web Performance With Honeycomb for Frontend Observability

Aug 25, 2025 By Mae Capozzi In Honeycomb

As an engineer on Honeycomb’s frontend platform team, I’m constantly trying to understand and improve our web performance. And I have a whole lot of questions. I tried answering these types of questions without Honeycomb in the past, and it was difficult and time consuming. It used to take me days to identify performance issues and their causes, let alone fix them and confirm that they improved web performance for some subset of users.

Read Post

Honeycomb

Read more about Evaluate and Improve Your Site's Web Performance With Honeycomb for Frontend Observability

k8s-monitoring-helm Chart Office Hours (August 2025)

Aug 25, 2025 By Grafana In Grafana

In the August edition of the Kubernetes Monitoring Helm chart office hours, we discuss the version 3.3 release as well as the plan for upcoming features. Finally, we end with a packed Q&A full of great questions.

View Video

Grafana

Read more about k8s-monitoring-helm Chart Office Hours (August 2025)

Exploring our new PHP SDK, built using Saloon

Aug 25, 2025 By Freek Van der Herten In Oh Dear

Today, next to Ping and TCP monitoring, we've also launched a new PHP SDK package, which has been rebuilt from scratch using the wonderful Saloon library. Using our new SDK, you can easily use the entire Oh Dear API. In this blog post, I'd like to show you how you can use the new SDK and how it works under the hood.

Read Post

Oh Dear

Read more about Exploring our new PHP SDK, built using Saloon

Raising the bar in observability and security: Coralogix extensions at scale

Aug 24, 2025 By Mayur Moon & Yogita Grewal In Coralogix

In today’s high-velocity digital ecosystem, visibility isn’t enough. SREs and engineering leaders need real-time insights, actionable signals, and automated workflows to operate at scale. As systems grow more distributed and cloud-native, the demand for intelligent observability and security has never been higher. Extensions are solutions to get instant observability with prepackaged parsing rules, alerts, dashboards and more.

Read Post

Coralogix

Read more about Raising the bar in observability and security: Coralogix extensions at scale

Grafana Campfire - Using the Drilldown Apps (Grafana Community Call - August 2025)

Aug 23, 2025 By Grafana In Grafana

In this Campfire Community call, we will discuss the new Grafana Drilldown Apps and how they differ from Explore. We will discuss how they have been continuously evolving to become a core part of Grafana OSS, enabling users to access data easily.

View Video

Grafana

Read more about Grafana Campfire - Using the Drilldown Apps (Grafana Community Call - August 2025)

Alerting Best Practices

Aug 22, 2025 By Roman Khavronenko In VictoriaMetrics

A firing alert is like someone ringing your doorbell - it demands your immediate attention, interrupting whatever else you’re doing. It requires focus and a quick response. But imagine trying to live in an apartment where the doorbell never stops ringing. You could put in earplugs to block the noise, but that only masks the problem - it doesn’t solve it. On the other hand, disconnecting the doorbell entirely isn’t a solution either.

Read Post

VictoriaMetrics

Read more about Alerting Best Practices

High Availability by Design: WhatsUp Gold Strategic Shift from Failover

Aug 22, 2025 By Jason Alberino In WhatsUp Gold

Read Post

WhatsUp Gold

Read more about High Availability by Design: WhatsUp Gold Strategic Shift from Failover

Why Your IT Copilot Needs Context, Not Just Data

Aug 22, 2025 By Dallon Robinette In Selector

In the rush to adopt AI in IT operations, many organizations focus on feeding copilots as much data as possible. But here’s the problem: data without context is just noise. An IT copilot that can’t distinguish what matters from what doesn’t won’t reduce alert fatigue or accelerate troubleshooting.

Read Post

Selector

Read more about Why Your IT Copilot Needs Context, Not Just Data

Instrument your Azure Container Apps workloads with the new Datadog Agent sidecar

Aug 22, 2025 By Jordan Obey In Datadog

Modern application development is evolving rapidly, with serverless containers and microservices becoming the standard for scalable, resilient architectures. Azure Container Apps is at the forefront of this movement, enabling developers to deploy containerized applications without having to manage infrastructure.

Read Post

Datadog

Read more about Instrument your Azure Container Apps workloads with the new Datadog Agent sidecar

Debugging Slow PHP Applications with APM Tools

Aug 22, 2025 By Pavithra Parthiban In Atatus

A slow PHP application in production is not just a performance issue, it poses a significant risk to business operations and user satisfaction. Slow page loads frustrate users, increase bounce rates, and directly impact revenue. For developers, the bigger challenge is that these slowdowns often hide deep in the code, database queries, or external dependencies, making them hard to find.

Read Post

Atatus

Read more about Debugging Slow PHP Applications with APM Tools

Grafana Mimir: 3 reasons to run the TSDB for Prometheus on bare metal

Aug 22, 2025 By Wilfried Roset In Grafana

Wilfried Roset is an engineering manager who leads an SRE team, and he is a Grafana Champion. Wilfried currently works at OVHcloud, where he focuses on prioritizing sustainability, resilience, and industrialization to guarantee customer satisfaction. Whether it’s for efficient resource allocation, flexibility, high availability, or scalability, it makes a lot of sense to run Grafana Mimir on Kubernetes—but it’s not the only way to deploy Mimir.

Read Post

Grafana

Read more about Grafana Mimir: 3 reasons to run the TSDB for Prometheus on bare metal

How to Prove DNS Monitoring ROI to Clients (Without Getting Technical)

Aug 22, 2025 By DNS Spy In DNS Spy

Most clients don’t care how DNS works—until it breaks. But as an MSP, you know the damage a single DNS misconfiguration or unnoticed change can cause. So how do you prove the ROI of DNS monitoring to clients who don't speak in TTLs or CNAMEs? Here’s how to bridge the gap between technical benefits and business value—so your clients understand exactly why they’re paying for DNS protection.

Read Post

DNS Spy

Read more about How to Prove DNS Monitoring ROI to Clients (Without Getting Technical)

Identify slowdowns across your entire network with Datadog Network Path

Aug 22, 2025 By Cat Yao In Datadog

As modern infrastructure becomes increasingly distributed across on-premises data centers, multi-cloud environments, ISPs, and remote offices, understanding how traffic flows across your network is critical to delivering reliable performance and great user experiences. But pinpointing the source of network slowdowns remains one of the most persistent challenges for operations, network, and IT teams.

Read Post

Datadog

Read more about Identify slowdowns across your entire network with Datadog Network Path

How to Monitor WiFi Access Points: Best Practices for Business WiFi

Aug 22, 2025 By Andrii Kernitskyi In Obkio

WiFi Access points (APs) are the foundation of business WiFi. They’re the devices making sure laptops, smartphones, and even IoT gadgets connect reliably without cables. If an access point fails or becomes overloaded, the entire wireless experience can collapse, no matter how strong your Internet connection is. By keeping a close eye on your APs with the right WiFi access point monitoring software, you can catch issues before users even notice them.

Read Post

Obkio

Read more about How to Monitor WiFi Access Points: Best Practices for Business WiFi

The Convergence of ITSM and EAM: Why Unified Operations Matter More Than Ever

Aug 22, 2025 By Arpit Sharma In Motadata

The need to differentiate IT Service Management (ITSM) and Enterprise Asset Management (EAM) has now become impractical in an era of immense technological complexity and unlimited demands for operational efficiency. Organizations today increasingly rely on both digital services and physical assets to derive value; however, siloed processes and disparate data repositories lead to slow incident resolution, uncoordinated change initiatives, and hidden risks.

Read Post

Motadata

Read more about The Convergence of ITSM and EAM: Why Unified Operations Matter More Than Ever

How Tipalti mastered Elasticsearch performance with AutoOps

Aug 22, 2025 By Oz Levy In Elastic

From manual monitoring to proactive optimization, learn how Tipalti used AutoOps to save 10% in annual costs. For a global payables automation leader like Tipalti, where financial transactions are the lifeblood of the business, infrastructure performance isn't just a technical goal; it's a core business requirement. Managing a complex ecosystem of databases, including Postgres, SQL Server, MongoDB, Kafka, and Elasticsearch, with a lean team of four engineers demands efficiency and powerful tooling.

Read Post

Elastic

Read more about How Tipalti mastered Elasticsearch performance with AutoOps

When Milliseconds become Make-or-Break, Fragile Ops are a Brand Liability

Aug 22, 2025 By Matt Belanger In Digitate

A major studio drops its new episode at midnight. Millions are queued to watch. Push notifications hit, the app surges in traffic, and then: timeout. Spinning wheels. Frozen screens. Social media lights up. Customers don’t just notice; they remember. For today’s communications, media, and information (CMI) brands, digital reliability is the product. Viewers, subscribers, and enterprise users aren’t comparing your uptime to industry benchmarks.

Read Post

Digitate

Read more about When Milliseconds become Make-or-Break, Fragile Ops are a Brand Liability

The Outage You Didn't See Coming: How to Discover and Monitor Certificates Proactively

Aug 22, 2025 By Progress WhatsUp Gold In WhatsUp Gold

Progress WhatsUp Gold Certificate Discovery and Monitoring is a seamless capability included out of the box. It’s a proactive safeguard designed to help you spot certificate issues before they escalate into business problems.

View Video

WhatsUp Gold

Read more about The Outage You Didn't See Coming: How to Discover and Monitor Certificates Proactively

New privacy controls for your status page

Aug 21, 2025 By Valeria Kurolapova In StatusGator

StatusGator is beloved by teams across the world for its internal team status page that shows the status of all your third-party services. We’ve rolled out two new privacy features to give you more control over how your status page is shared and discovered.

Read Post

StatusGator

Read more about New privacy controls for your status page

Sponsored Post

Atlassian Bitbucket Monitoring on Microsoft SCOM

Aug 21, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

As part of a customer project, we developed a custom Bitbucket Management Pack for Microsoft System Center Operations Manager (SCOM). This tailored solution enables IT operations teams to monitor key performance and health metrics of Bitbucket environments, ensuring planning and bug-tracking platforms remain available and performant. With this Use Case paper, we aim to share our knowledge with the SCOM community, highlighting the possibilities of advanced monitoring on Microsoft SCOM and helping teams improve their day-to-day tasks.

Read Post

NiCE IT Mgmt

Read more about Atlassian Bitbucket Monitoring on Microsoft SCOM

NiCE VMware vSphere Management Pack 6.1 - Coming Soon

Aug 21, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

Modern VMware Monitoring. Native to SCOM. Built for Scale. Start Now.

Read Post

NiCE IT Mgmt

Read more about NiCE VMware vSphere Management Pack 6.1 - Coming Soon

How Product Managers Can Benefit From Honeycomb

Aug 21, 2025 By Rox Williams In Honeycomb

Observability tools like Honeycomb are built for engineers, not PM teams… but that doesn’t mean there’s no benefit to having your PMs in Honeycomb. Whether it’s debugging a weird customer issue or tracking how a feature is used in the wild, observability gives PMs something traditional product tools can’t: real-time answers with full context, down to a single user.

Read Post

Honeycomb

Read more about How Product Managers Can Benefit From Honeycomb

Reduce PHI Risk Exposure With a Strategy That Supports HIPAA Compliance

Aug 21, 2025 By Eoin Keenan In SolarWinds

Health Insurance Portability and Accountability Act (HIPAA) compliance is about more than firewalls and passwords. Your file-sharing solutions could be the weakest link in protecting sensitive patient data. When we think about healthcare cybersecurity, we tend to focus on large systems: electronic health records, databases, and billing platforms. But one everyday workflow that’s just as vulnerable, and often overlooked, is file transfer.

Read Post

SolarWinds

Read more about Reduce PHI Risk Exposure With a Strategy That Supports HIPAA Compliance

Anomaly detection explained: Why your monitoring needs it

Aug 21, 2025 By ManageEngine Site24x7 In Site24x7

Anomaly detection goes beyond fixed thresholds to catch the issues your monitoring might miss, such as unusual latency spikes, sudden drops in traffic, or odd system behavior that doesn’t throw an error. With Site24x7’s AI-powered monitoring, anomaly detection is built in, helping DevOps teams move from reactive fixes to proactive observability.

View Video

Site24x7

Read more about Anomaly detection explained: Why your monitoring needs it

Monitor your mobile apps with Site24x7 Mobile real user monitoring (RUM)

Aug 21, 2025 By ManageEngine Site24x7 In Site24x7

Get end-to-end visibility into how your apps perform in the real world. Quickly detect app crashes, start-up delays, slow API calls, and performance issues after updates. Drill down by device, OS, network, or geography to troubleshoot faster and deliver seamless user experiences. Stay ahead of performance issues and keep your users happy with Site24x7 Mobile RUM.

View Video

Site24x7

Read more about Monitor your mobile apps with Site24x7 Mobile real user monitoring (RUM)

Fix It Fast: Tips, Tricks & Tools for Sumo Logic Success -- Customer Brown Bag -- August 21st, 2025

Aug 21, 2025 By Sumo Logic, Inc. In Sumo Logic

Led by Sumo Logic experts Andrei and Austin, this session dives into troubleshooting dashboards, silent failure scenarios, and missing collector data—helping your team spot blind spots, catch incidents you never knew you missed, close visibility gaps, and ensure dashboards reflect the full picture for faster resolution.

View Video

Sumo Logic

Read more about Fix It Fast: Tips, Tricks & Tools for Sumo Logic Success -- Customer Brown Bag -- August 21st, 2025

The Smartest Member of Your Developer Ecosystem: Introducing the Mezmo MCP Server

Aug 21, 2025 By Mezmo In Mezmo

Building a great developer experience is about more than just the code. It’s about creating a unified ecosystem where your tools work together seamlessly. That’s been the vision behind our work on the Mezmo MCP Server, and I’m excited to share it with you. At its core, the MCP Server is a universal remote for your data pipeline.

Read Post

Mezmo

Read more about The Smartest Member of Your Developer Ecosystem: Introducing the Mezmo MCP Server

How Much Time Could You Save with Network Config Automation?

Aug 21, 2025 By ScienceLogic In ScienceLogic

If you’re a network admin reading this, you already know the feeling. You’ve probably lost track of how many hours you spend doing the same repetitive tasks week after week, month after month. Backing up configs manually. Rolling back failed changes at 2 AM. Hunting down that one switch that somehow lost its configuration. Compiling compliance reports that should take minutes but somehow eat up your entire afternoon. Yet all those “quick” tasks add up.

Read Post

ScienceLogic

Read more about How Much Time Could You Save with Network Config Automation?

Visualize Salesforce data in Grafana: flexible query options, powerful data correlations, and more

Aug 21, 2025 By Kristin Knapp In Grafana

As part of our big tent philosophy at Grafana Labs, we think you should be able to dig into your data and find meaningful insights — wherever that data happens to live. For many of our users, that data lives in Salesforce, the cloud-based customer relationship management (CRM) platform. In this post, we’ll take a closer look at how you can use the Salesforce Enterprise data source for Grafana to quickly and easily visualize your Salesforce data using Grafana dashboards.

Read Post

Grafana

Read more about Visualize Salesforce data in Grafana: flexible query options, powerful data correlations, and more

What Is Vector Search? Difference Between Vector & Semantic Search Explained [Quick Question Ep. 5]

Aug 21, 2025 By Elastic In Elastic

What is vector search? In this breakdown, learn how vector search leverages machine learning to capture the meaning and context of unstructured data by transforming it into a numeric representation that is stored in a vector database. This video also explains the difference between sparse and dense embeddings, and how vector search differs from semantic search and lexical search.

View Video

Elastic

Read more about What Is Vector Search? Difference Between Vector & Semantic Search Explained [Quick Question Ep. 5]

Master Supply Chain Resilience with Splunk

Aug 21, 2025 By Splunk In Splunk

Strengthen your supply chain with unified, real-time visibility from Splunk. Monitor, trace, and optimize every stage from sourcing to delivery to reduce risk, ensure quality, and improve resilience.

View Video

Splunk

Read more about Master Supply Chain Resilience with Splunk

How to Reduce Downtime by 90% with Proactive Monitoring Strategies

Aug 21, 2025 By Nuno Tomas In isDown

Downtime costs businesses an average of $5,600 per minute according to Gartner research. For many organizations, even a few hours of unplanned outages can mean lost revenue, damaged reputation, and frustrated customers. The good news? You can reduce downtime by up to 90% by implementing the right proactive monitoring strategies.

Read Post

isDown

Read more about How to Reduce Downtime by 90% with Proactive Monitoring Strategies

How to go from ingestion to insights in 10 minutes

Aug 21, 2025 By Ofri Grushka In Coralogix

When assessing SaaS observability solutions, customers often explore features that are built into the platform, but there is also a whole collection of deployable libraries across all SaaS vendors. At Coralogix, we lead the way in deployable assets, with 4400+ alerts, dashboards, parsing rules, metric generation rules and more. But why should you care about these deployable assets, and why do they accelerate insight generation so profoundly?

Read Post

Coralogix

Read more about How to go from ingestion to insights in 10 minutes

Common Issues in PHP Applications and How Monitoring Tools Help

Aug 21, 2025 By Mohana Ayeswariya J In Atatus

PHP has been powering the web for over two decades and continues to be a dominant server-side scripting language. From small business websites to massive enterprise applications, PHP sits at the heart of many critical digital experiences. But with great popularity comes great responsibility, and great challenges. Performance bottlenecks, security vulnerabilities, and inefficient coding practices can cripple applications, frustrate end-users, and burn out engineering teams.

Read Post

Atatus

Read more about Common Issues in PHP Applications and How Monitoring Tools Help

Your Help Desk Can Be a Powerful Ally in Maintaining HIPAA Compliance

Aug 21, 2025 By Staff Contributor In SolarWinds

Each industry has standards and regulatory compliance concerns. The health care industry arguably has the most well-known, thanks to the Health Insurance Portability and Accountability Act (HIPAA) and its efforts to keep electronic protected health information (ePHI) safe. HIPAA compliance is essential for organizations that store, maintain, or transmit ePHI, and staying on top of HIPAA regulations can be challenging.

Read Post

SolarWinds

Read more about Your Help Desk Can Be a Powerful Ally in Maintaining HIPAA Compliance

Announcing Monitor Grouping in UptimeRobot

Aug 21, 2025 By Tomas Koprusak In Uptime Robot

We’re excited to introduce Monitor Grouping, a new way to organize your monitors directly from the UptimeRobot dashboard. This feature makes it easier to keep track of large sets of monitors and quickly see the health of related services at a glance. Monitor Grouping is available on Solo, Team, and Enterprise plans starting today.

Read Post

Uptime Robot

Read more about Announcing Monitor Grouping in UptimeRobot

APM Logs: How to Get Started for Faster Debugging

Aug 21, 2025 By Anjali Udasi In Last9

When application performance monitoring detects a spike in latency or error rates, the immediate challenge is determining the underlying cause. APM logs address this by correlating performance metrics with the specific log events that occurred at the same time. Instead of switching between monitoring dashboards and manually searching through log files, APM log correlation consolidates both views.

Read Post

Last9

Read more about APM Logs: How to Get Started for Faster Debugging

How To Visualize Your Sales Data: Salesforce Enterprise Data Source for Grafana

Aug 21, 2025 By Grafana In Grafana

Learn how to monitor your organization's sales performance by connecting Salesforce with Grafana! In this quick-start tutorial, Shawn Pitts walks you through everything, from setting up your Salesforce connection to visualizing real-time data in Grafana. Whether you’re on a free Grafana Cloud plan, a paid tier, or running Grafana Enterprise on-prem, you’ll see exactly how to unlock powerful dashboards for your team.

View Video

Grafana

Read more about How To Visualize Your Sales Data: Salesforce Enterprise Data Source for Grafana

Creating & Scheduling SLA Reports

Aug 21, 2025 By Uptime Website Monitoring In uptime

Learn how to create an SLA report on Uptime.com to track uptime and performance. This guide walks you through configuring the report and selecting the right checks and date range. Get detailed metrics on uptime and response times, helping you meet service goals and client expectations with ease. Scheduled SLA reports in Uptime.com let you automatically send PDF or XLS reports to up to 100 recipients, including both Uptime.com users and external email addresses. You can schedule reports for daily, weekly, monthly, quarterly, or yearly delivery.

View Video

uptime

Monitoring

Read more about Creating & Scheduling SLA Reports

Why SaaS Startups Need PHP Application Monitoring for Scalability

Aug 21, 2025 By Pavithra Parthiban In Atatus

For SaaS startups, speed and reliability are everything. A few seconds of downtime or slow performance can turn away users, impact sign-ups, and directly affect revenue. Unlike traditional apps, SaaS platforms operate on an always-on model, which means performance and scalability must be built in from day one. PHP remains one of the most popular choices for startups due to its flexibility, cost-effectiveness, and fast development cycle.

Read Post

Atatus

Read more about Why SaaS Startups Need PHP Application Monitoring for Scalability

Energy Monitoring and Targeting: Saving Costs Through Proactive Billing Software

Aug 21, 2025 By OpsMatters In OpsMatters

In today's energy-conscious world, businesses and utility providers alike are seeking smarter ways to manage costs, improve efficiency, and promote sustainability. Energy monitoring and targeting (M&T) has emerged as one of the most effective strategies to achieve these goals. By combining accurate monitoring with actionable insights, organizations can identify inefficiencies, reduce waste, and lower utility expenses.

Read Post

OpsMatters

Read more about Energy Monitoring and Targeting: Saving Costs Through Proactive Billing Software

Sponsored Post

Status Page Aggregator: How To Stay Ahead of Outages in 2025

Aug 20, 2025 By StatusGator In StatusGator

Outages happen, and they often catch us off guard. If your team relies on multiple status pages to track cloud infrastructure, SaaS tools, or distributed systems, staying ahead of outages is essential. It's far better to know about issues with your services or dependencies before your users do, so you can act fast and stay in control. That's where a status page aggregator like StatusGator comes in.

Read Post

StatusGator

Read more about Status Page Aggregator: How To Stay Ahead of Outages in 2025

Don't Just Monitor SLAs - Validate Them Automatically

Aug 20, 2025 By Kristopher Sandoval In Speedscale

Service level agreements (SLAs) are the contractual backbone between customers and technology vendors, outlining expected service availability, performance metrics, and remedies like service credits when service providers fail to meet agreed-upon service levels. This service agreement assures both the technical quality and the service quality of the services provided, and underpins the value perspective of the client.

Read Post

Speedscale

Read more about Don't Just Monitor SLAs - Validate Them Automatically

Log Files Explained: Types, Uses, and Best Practices for IT Teams

Aug 20, 2025 By Patrick Sites In LogicMonitor

Every system in your environment—cloud, on-prem, or hybrid—generates log files. They capture everything from user actions to system failures, security events, and performance issues. But with so many log types and so much raw data, it’s easy to get buried in noise and miss what matters.

Read Post

LogicMonitor

Read more about Log Files Explained: Types, Uses, and Best Practices for IT Teams

Every second of digital downtime has a cost.

Aug 20, 2025 By Catchpoint In Catchpoint

When a site disruption hits, businesses face immediate and visible fallout: customer churn spikes, and revenue takes a direct hit. If customers can’t transact, your bottom line suffers, plain and simple. This insight comes from a recent Forrester survey commissioned by Catchpoint, where respondents revealed the real business impacts of Internet disruptions.

View Video

Catchpoint

Monitoring

Read more about Every second of digital downtime has a cost.

Extending Unit-Testing on Icinga2

Aug 20, 2025 By Johannes Schmidt In Icinga

Obviously nobody is disagreeing with this. It’s just that during ongoing development, while focusing on features and bug-fixes, testing often falls behind in priority. Especially when developers would need to write tests for existing or legacy code, teams can be hesitant to invest the time. C++ applications have to run on a diverse set of target environments, varying in OS, compilers, C/C++ standard libraries and dependency versions.

Read Post

Icinga

Read more about Extending Unit-Testing on Icinga2

React Native performance tactics: Modern strategies and tools

Aug 20, 2025 By Simon Grimm In Sentry

This is a guest post by Simon Grimm, founder of Galaxies.dev, a platform dedicated to helping developers master React Native through hands-on courses, expert guidance, and personal support. React Native performance matters more in 2025 than ever before. With the New Architecture now stable and apps competing against lightning-fast native experiences, users expect sub-second load times and buttery-smooth 60fps interactions.

Read Post

Sentry

Read more about React Native performance tactics: Modern strategies and tools

What's Hiding in Your Wiring Closets?

Aug 20, 2025 By Yann Guernion In Broadcom

Let's be provocative for a moment. You probably don't know what is actually on your network. You have the CMDB, spreadsheets, diagrams from the last big refresh, and the institutional knowledge of your veteran engineers. But is this information accurate? Is it complete? Answering that question with absolute certainty can be difficult for many who manage complex IT environments.

Read Post

Broadcom

Read more about What's Hiding in Your Wiring Closets?

The Real Cost of Choosing the Wrong Database

Aug 20, 2025 By Allyson Boate In InfluxData

Data is more than a record of what happened—it shapes what happens next. Across industries, connected devices continuously stream time-stamped data that reflects the current state of machines, environments, and systems. This steady flow gives organizations a live view of operations and the ability to catch issues early, adjust quickly, and operate more efficiently. However, capturing data alone does not create value.

Read Post

InfluxData

Read more about The Real Cost of Choosing the Wrong Database

Supercharge your Android app

Aug 20, 2025 By Ofri Grushka In Coralogix

In today’s technological landscape, mobile applications are on the rise, boosting efficiency, portability and accessibility in daily life, across a spectrum of industries, from financial services to food delivery. As mobile apps become more essential, the quality of their features, performance, and user experience is critical.

Read Post

Coralogix

Read more about Supercharge your Android app

Advanced SLO Alerting: Tracking burn rate

Aug 20, 2025 By Chris Cooney In Coralogix

Service Level Objectives (SLOs) are a cornerstone of modern software engineering. Defining alerts around SLOs has become standard practice, but many of the common patterns in use today miss the early signals that can warn a customer before an SLO breach has occurred.

Read Post

Coralogix

Read more about Advanced SLO Alerting: Tracking burn rate

vmanomaly Deep Dive: Smarter Alerting with AI (Tech Talk Companion)

Aug 20, 2025 By Marc Sherwood In VictoriaMetrics

I was thrilled to host our latest tech talk, where we got to do a deep dive into vmanomaly with the best possible guests: Fred Navruzov, the actual team lead for the product, and co-host Matthias Palmersheim. We covered a ton of ground, from high-level concepts to the nitty-gritty of configuration. For everyone who couldn’t make it, I wanted to share my personal recap of the most important technical takeaways from our conversation.

Read Post

VictoriaMetrics

Read more about vmanomaly Deep Dive: Smarter Alerting with AI (Tech Talk Companion)

Proactive Observability - Predictive Analytics Models and Algorithms for IT Systems and Metrics

Aug 20, 2025 By Srividhya Seshachalam In eG Innovations

Predictive Analytics Models and Algorithms are an important component of eG Enterprise’s AIOps engine for proactive observability. eG Enterprise collects and analyses metrics, events, logs, and traces, and uses this data, including real usage data, to forecast future system behavior and IT resource metric levels.

Read Post

eG Innovations

Read more about Proactive Observability - Predictive Analytics Models and Algorithms for IT Systems and Metrics

A Detailed Guide to Azure Kubernetes Service Monitoring

Aug 20, 2025 By Faiz Shaikh In Last9

Azure Kubernetes Service (AKS) continuously generates a high volume of telemetry, ranging from node-level CPU and memory usage to request latencies and error rates within individual pods and services. Without a structured monitoring strategy, this flood of metrics can easily become noise, leaving teams blind to early warning signs. Effective monitoring in AKS is about identifying the right signals, correlating them across layers, and acting before they impact application performance or cluster stability.

Read Post

Last9

Read more about A Detailed Guide to Azure Kubernetes Service Monitoring

Your Apps Are Green. Your Infrastructure Is Dying.

Aug 20, 2025 By Nishant Modak In Last9

Launch Week Day 3: Introducing Discover Infrastructure. Your dashboard looks perfect. APIs responding in 80ms, background jobs processing smoothly, error rates at 0.02%. Everything's green. Then production breaks. "Why is checkout so slow?" "The payment service keeps timing out!" You run kubectl get pods and discover payment-service pods restarting every 3 minutes due to OOM kills. Then you check your database host: CPU at 98% because someone forgot the new ML training job runs there too.

Read Post

Last9

Read more about Your Apps Are Green. Your Infrastructure Is Dying.

How we saved $1.5 million per year with Cloud Cost Management

Aug 20, 2025 By Qasim Jamal In Datadog

In collecting and analyzing trillions of events each day, Datadog ingests a massive amount of data. We spend substantially to process and store this data in the cloud, and teams across the organization are committed to optimizing the return on this investment. To this end, our FinOps analysts have always tracked the costs of delivering our services and identified opportunities for savings.

Read Post

Datadog

Read more about How we saved $1.5 million per year with Cloud Cost Management

Datadog governance 101: From chaos to consistency

Aug 20, 2025 By David Iparraguirre In Datadog

As your organization scales, managing observability resources and usage becomes increasingly important. More users and teams mean more dashboards, tags, API keys, and costs to manage. The job of keeping track of these resources and ensuring that they’re compliant can quickly grow in complexity.

Read Post

Datadog

Read more about Datadog governance 101: From chaos to consistency

Discover Infrastructure: Kubernetes & Hosts - Launch Week / Day 03

Aug 20, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

Stop debugging infrastructure issues across multiple dashboards. See how Last9's Discover Infrastructure monitors K8s pods and traditional hosts together—with resource analysis, pod-level debugging, and AI that correlates app problems to infrastructure root causes. One setup (K8s + host monitoring) → Complete infrastructure visibility that connects to your services and jobs. No more blind spots between application performance and underlying resources.

View Video

Last9

Read more about Discover Infrastructure: Kubernetes & Hosts - Launch Week / Day 03

How our engineers use AI for coding (and where they refuse to)

Aug 20, 2025 By Elizabeth Mathew In SigNoz

Okay, picture this: if you drew a Venn diagram of folks in tech right now, it'd probably look something like this: You'll probably find yourself in one of those circles, right? I’m guilty of falling in the intersection! Because let's be real, the 'will AI replace developers by 20xx?' debate is everywhere – Reddit, Hacker News, team Slack and even your local cafe. Well, we decided to go straight to the source.

Read Post

SigNoz

Read more about How our engineers use AI for coding (and where they refuse to)

Secure credential storage for your observability stack: Introducing secrets management in Grafana Cloud

Aug 20, 2025 By Michael Mandrus In Grafana

The more your infrastructure grows, the more likely you are to face a familiar challenge: where to safely store the API keys, passwords, and tokens that power your observability stack. Unfortunately, a common response to this dilemma is to scatter credentials across configurations, making security and management of secrets increasingly complex.

Read Post

Grafana

Read more about Secure credential storage for your observability stack: Introducing secrets management in Grafana Cloud

Grafana Cloud updates: onboard teams with new AI-powered tooling, secrets management for enhanced security, and more

Aug 20, 2025 By Kristin Knapp In Grafana

We consistently roll out helpful updates and fun features in Grafana Cloud, our fully managed observability platform powered by the open source Grafana LGTM Stack (Loki for logs, Grafana for visualization, Tempo for traces, and Mimir for metrics). In case you missed them, here’s our monthly round-up of the latest and greatest Grafana Cloud updates. You can also read about all the features we add to Grafana Cloud in our What’s New in Grafana Cloud documentation.

Read Post

Grafana

Read more about Grafana Cloud updates: onboard teams with new AI-powered tooling, secrets management for enhanced security, and more

Nginx Logs & Performance Monitoring with Loki and Telegraf | MetricFire

Aug 20, 2025 By Benjamin Pitts In MetricFire

When a web service slows down or errors spike, metrics can tell you what changed (active connections rise, error rate increases), but the root cause can sometimes be found in your logs (which IPs are hammering POST endpoints, 4XX/5XX occurrences). Put the two together and you get the full observability picture. Time-series metric trends to spot incidents, and line-level details to fix them fast.

Read Post

MetricFire

Read more about Nginx Logs & Performance Monitoring with Loki and Telegraf | MetricFire

Incident post-mortems: the complete, blameless guide

Aug 20, 2025 By Leo Baecker In Hyperping

Most companies run post-mortems like autopsies. They dissect the corpse, assign blame, and file it away. The body count keeps rising. Here's what actually works: post-mortems as learning machines. Systems thinking over finger-pointing. Patterns over pain. What you'll get: A copy-paste template, real metrics that matter, and the mindset shift that turns outages into intelligence. Who this is for: SRE leads tired of repeating incidents. Engineering managers who want learning over theater.

Read Post

Hyperping

Read more about Incident post-mortems: the complete, blameless guide

Pioneering DEX Agents and Benchmarks

Aug 19, 2025 By Samuele Gantner In Nexthink

At Nexthink, our focus is Digital Employee Experience (DEX): it’s all we do, and all we aim to be the very best at. Today, we have a unique opportunity to deliver the world’s most advanced DEX models and agents, fine-tuned and trained specifically on real DEX use cases from our thousands of users. This matters because, in our vision, most IT operations will eventually be fully automated by AI and technology.

Read Post

Nexthink

Read more about Pioneering DEX Agents and Benchmarks

A Practical Guide for Developers: Preventing PHP Mistakes with Performance Monitoring

Aug 19, 2025 By Pavithra Parthiban In Atatus

Performance is one of the most critical aspects of any PHP application. A few seconds of delay or an unnoticed bottleneck can cause users to leave your site, increase bounce rates, and reduce business conversions. For developers, ensuring top performance is not always easy. Small coding mistakes and inefficient queries can accumulate into major problems over time. Without visibility into what’s happening inside the application, it becomes difficult to identify the root cause of slowdowns or failures.

Read Post

Atatus

Read more about A Practical Guide for Developers: Preventing PHP Mistakes with Performance Monitoring

How ScienceLogic Supports Zero Trust and FedRAMP-Secure Operations

Aug 19, 2025 By ScienceLogic In ScienceLogic

Cybersecurity leaders across the public sector are facing a moment of reckoning. Whether at the Department of Defense, a federal agency, or a public university, IT teams are under pressure to defend sprawling infrastructure, detect fast-moving threats, and prove compliance across multiple frameworks—all with fewer resources and tighter timelines. This challenge has accelerated interest in Zero Trust Architecture (ZTA), a paradigm shift in how we think about security.

Read Post

ScienceLogic

Read more about How ScienceLogic Supports Zero Trust and FedRAMP-Secure Operations

Tracking Errors in Absinthe for Elixir with AppSignal

Aug 19, 2025 By Aestimo Kirina In AppSignal

GraphQL provides a powerful approach to building APIs, and Absinthe is the leading GraphQL implementation for Elixir applications. While GraphQL offers many benefits, it can introduce a set of errors and performance bottlenecks that might be challenging to track and debug. In this article, you’ll learn how to use AppSignal to monitor, debug, and resolve errors in your Absinthe-based GraphQL API.

Read Post

AppSignal

Read more about Tracking Errors in Absinthe for Elixir with AppSignal

From SEO to AEO: Why Web Performance Is the Key to AI Search Success

Aug 19, 2025 By Piril Kavlak In Catchpoint

Search isn’t what it used to be. The way people discover information online is shifting. Instead of clicking through search results, many now ask AI answer engines like ChatGPT and Perplexity to do the research for them. In March 2025, 13.1% of Google desktop searches featured AI Overviews, doubling from over 6% in January, according to Semrush analysis of 10+ million queries.

Read Post

Catchpoint

Read more about From SEO to AEO: Why Web Performance Is the Key to AI Search Success

How PHP Monitoring Helps Prevent Bugs in Production?

Aug 19, 2025 By Mohana Ayeswariya J In Atatus

When a PHP application hits production, the stakes are high. Even a minor bug can escalate into downtime, data loss, or frustrated customers. For developers, DevOps teams, and SREs, the real challenge is not just writing efficient code but ensuring that the application continues to run flawlessly in production. This is where PHP monitoring tools play a critical role.

Read Post

Atatus

Read more about How PHP Monitoring Helps Prevent Bugs in Production?

Discover Jobs - Launch Week / Day 02

Aug 19, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

Stop debugging background jobs with docker logs and prayer. See how Last9's Discover Jobs monitors async operations like APIs—with P95 latencies, error breakdowns, and operation-level traces for every job type.

View Video

Last9

Read more about Discover Jobs - Launch Week / Day 02

Part Two - Event Intelligence vs. AIOps: Key Differences, When to Use Each and Why

Aug 19, 2025 By david.arrowsmith In Interlink

The IT environments of large enterprises have become so complex that operational teams have turned to two solution categories in particular to help them improve visibility, speed up incident response, automate, and enable more effective decision-making.

Read Post

Interlink

Read more about Part Two - Event Intelligence vs. AIOps: Key Differences, When to Use Each and Why

Kafka Performance Crisis: How We Scaled OpenTelemetry Log Ingestion by 150%

Aug 19, 2025 By Dakota Paasman In ObservIQ

When your telemetry pipeline starts falling behind, the countdown to production impact has already begun. One Bindplane customer operating a large-scale log ingestion pipeline built on the OpenTelemetry Collector and Kafka hit that breaking point. Instead of keeping pace with incoming data, their pipeline was ingesting just 12,000 events per second (EPS) per partition/collector—and this Kafka topic had 16 partitions. In aggregate, that was roughly 192K EPS.

Read Post

ObservIQ

Read more about Kafka Performance Crisis: How We Scaled OpenTelemetry Log Ingestion by 150%

Improving the Developer Experience by Monitoring Third-Party Outages

Aug 19, 2025 By Hrishikesh Barua In IncidentHub

The role of third-party SaaS and cloud services in the modern software development stack needs no explanation. Primarily due to the ease of setting up and hooking them together, they make the software development lifecycle (SDLC) much easier than it was 10 years ago. No more managing the overhead of installing, configuring, maintaining, backing up, and scaling of source code repos, virtual machines, and CI/CD systems. Some services don't have any in-house options, e.g. payment gateways.

Read Post

IncidentHub

Read more about Improving the Developer Experience by Monitoring Third-Party Outages

What is Real User Monitoring

Aug 19, 2025 By Anjali Udasi In Last9

Real User Monitoring (RUM) measures how real users interact with your application in production. Unlike synthetic monitoring, which relies on scripted tests, RUM collects data from actual sessions. This means performance is observed across different devices, networks, and usage patterns. The result is a clear view of how the application behaves under real conditions, where latency is introduced, which features take longer to load, and at what points users drop off.

Read Post

Last9

Read more about What is Real User Monitoring

AI-Driven Application Monitoring with Checkly and Claude Code

Aug 19, 2025 By Checkly In Checkly

In this webinar, Stefan Judis (Developer Relations at Checkly) and Dan Giordano (VP of Marketing at Checkly) dive into how LLMs and AI tools can be used with application monitoring. You’ll see live demos of integrating Claude Code, Playwright MCP, and Checkly’s Monitoring as Code. Subscribe for more sessions on application reliability, testing, and AI-powered DevOps!

View Video

Checkly

Read more about AI-Driven Application Monitoring with Checkly and Claude Code

Why (Enriched) Flow Data Belongs in Every Network Operator's Daily Toolbox

Aug 19, 2025 By Eric Hian-Cheong In Kentik

Flow data has always held immense potential, but was often inaccessible because it lacked context and speed. Kentik removes that friction by automatically enriching flow with human-readable context, making it a daily driver for everyone, not just specialists.

Read Post

Kentik

Read more about Why (Enriched) Flow Data Belongs in Every Network Operator's Daily Toolbox

Your APIs Are Green. Your Background Jobs Are Dying.

Aug 19, 2025 By Nishant Modak In Last9

Launch Week Day 2: Introducing Discover Jobs. Your dashboard looks perfect. APIs responding in 80ms. Error rates at 0.02%. Kubernetes pods healthy. Everything's green. Then Slack explodes: "Why didn't my invoice generate?" "Where's my password reset email?" "The data export I requested yesterday is still processing?" You check your job queue. Sidekiq dashboard shows 47,000 jobs processed today. Redis looks fine. Workers are running. But somehow, your business logic is silently falling apart.

Read Post

Last9

Read more about Your APIs Are Green. Your Background Jobs Are Dying.

Ultimate Guide to PCI DSS Compliance Requirements

Aug 19, 2025 By Staff Contributor In SolarWinds

When you make a credit card transaction, the last thing you want to think about is your data getting stolen. Fortunately, credit card companies put several measures in place to make sure this doesn’t happen. For businesses dealing with customer payments, PCI DSS compliance measures are a simple and necessary step in making sure customer credit card data is well protected. Ensuring PCI compliance can be a complex undertaking.

Read Post

SolarWinds

Read more about Ultimate Guide to PCI DSS Compliance Requirements

10 Best PCI Compliance Software and PCI DSS Tools

Aug 19, 2025 By Staff Contributor In SolarWinds

PCI DSS is an industry security standard existing primarily to minimize the risk of debit and credit card data being lost. This is in the interest of both the customer and the merchant, because if data is lost or misused, the merchant could be subject to legal action. To protect yourself and your customers, you first need to understand the six PCI DSS control objectives and how to meet them.

Read Post

SolarWinds

Read more about 10 Best PCI Compliance Software and PCI DSS Tools

IT Security and Compliance Guide

Aug 19, 2025 By Staff Contributor In SolarWinds

This guide provides a comprehensive overview of IT compliance and the part it plays in IT security. It will also help you choose the right compliance reports tool for your company. As you get started, SolarWinds Security Event Manager (SEM) comes highly recommended as a near-automated IT security compliance solution that enables you to verify IT compliance and helps you perform many compliance-related IT operations.

Read Post

SolarWinds

Read more about IT Security and Compliance Guide

Early Warning Signals now available via Webhooks

Aug 19, 2025 By Valeria Kurolapova In StatusGator

We’re excited to announce that Early Warning Signals, proactive alerts that notify you of potential service issues before official acknowledgment, are now fully supported in StatusGator Webhooks. With Early Warning Signals delivered through your webhook integrations, you can detect early signs of trouble and act before a full incident is posted. This means more time to prepare, fewer surprises, and better uptime for your customers.

Read Post

StatusGator

Read more about Early Warning Signals now available via Webhooks

How to use AI tools more effectively: Tips from Datadog Engineers

Aug 19, 2025 By Bowen Chen In Datadog

A growing number of engineering organizations have adopted or are trialing agentic AI-based coding tools and LLMs in an effort to increase their teams’ development velocity. If you’re a developer, this means you’ve likely had to try out different agentic tools and models and determine how to best incorporate them into your existing workflows.

Read Post

Datadog

Read more about How to use AI tools more effectively: Tips from Datadog Engineers

How to monitor Claude usage and costs: introducing the Anthropic integration for Grafana Cloud

Aug 19, 2025 By Ishan Jain In Grafana

Generative AI is becoming a core part of modern applications, making it essential to monitor and manage how these services are used. That’s why, today, we’re excited to introduce the Anthropic integration for Grafana Cloud, a new solution that lets you connect directly to the Anthropic Usage and Cost API from within Grafana Cloud.

Read Post

Grafana

Read more about How to monitor Claude usage and costs: introducing the Anthropic integration for Grafana Cloud

The Observability Problem Isn't Data Volume Anymore-It's Context

Aug 19, 2025 By Mezmo In Mezmo

For years, the observability industry has been obsessed with one thing: data volume. We've built incredible pipelines, optimized agents, and scaled storage to handle petabytes of logs, metrics, and traces. The promise was simple: collect more data, get more visibility. But we've hit a wall.

Read Post

Mezmo

Read more about The Observability Problem Isn't Data Volume Anymore-It's Context

Elastic Powers GitHub's Seamless Developer Experience

Aug 19, 2025 By Elastic In Elastic

David Tippet, Search Engineer at GitHub, shares how Elastic powers GitHub’s massive search platform and enables a seamless developer experience. He explains how GitHub balances AI-driven semantic search with traditional keyword search, ensuring accuracy for millions of diverse users, from engineers to security researchers.

View Video

Elastic

Read more about Elastic Powers GitHub's Seamless Developer Experience

Why Alert Fatigue is a Major Challenge in Observability (2025 Survey Insights) | Grafana Labs

Aug 19, 2025 By Grafana In Grafana

Over 1,200 engineers, leaders, and teams shared their biggest observability challenges in our third annual Observability Survey, and the results are in. In this video, Marc Chipouras (Head of Emerging Products, Grafana Labs) breaks down the top insights.

View Video

Grafana

Read more about Why Alert Fatigue is a Major Challenge in Observability (2025 Survey Insights) | Grafana Labs

How to Build a Strategic Roadmap for Site Reliability Engineering Implementation

Aug 19, 2025 By OpsMatters In OpsMatters

Getting your site reliability engineering solutions in place can seriously boost how your systems perform. But implementing site reliability engineering (SRE) isn't a simple flip of a switch; it's a process. If you want to keep your systems running smoothly, with minimal downtime and top-notch performance, you need a solid, strategic plan. This roadmap should guide you step-by-step, from setting clear goals to constantly improving your processes.

Read Post

OpsMatters

Read more about How to Build a Strategic Roadmap for Site Reliability Engineering Implementation

Major Opportunities and Technologies in Business HVAC Operation

Aug 19, 2025 By OpsMatters In OpsMatters

Comfort, energy efficiency, and indoor air quality in buildings all depend on commercial HVAC systems. Office buildings, manufacturing plants, and many other facilities rely on these systems to maintain suitable environmental conditions. Yet commercial HVAC operations have their challenges as well, and a new wave of technologies is enabling operators to meet them.

Read Post

OpsMatters

Read more about Major Opportunities and Technologies in Business HVAC Operation

Agent-Gateway Pattern with the OTel Collector Explained in Kubernetes

Aug 18, 2025 By Bindplane In ObservIQ

Check out the full @bindplane community call in August.

View Video

ObservIQ

Read more about Agent-Gateway Pattern with the OTel Collector Explained in Kubernetes

Chaos Testing the OTel Collector's AWS S3 Receiver

Aug 18, 2025 By Bindplane In ObservIQ

Check out the full @bindplane community call in August.

View Video

ObservIQ

Read more about Chaos Testing the OTel Collector's AWS S3 Receiver

Discover Services - Launch Week / Day 01

Aug 18, 2025 By Last9 - Monitoring for AI Native SDLC In Last9

Stop playing detective during incidents. See how Last9's Discover Services automatically builds your service map from traces, shows real-time dependencies, and lets you debug with both conversational AI and visual dashboards.

View Video

Last9

Read more about Discover Services - Launch Week / Day 01

Investigate Problems With Mobile Frontend Observability

Aug 18, 2025 By Honeycomb In Honeycomb

You can use your mobile tools to debug errors, but are you really looking at the root cause? With end-to-end observability, powered by Honeycomb's Mobile Android and iOS SDKs, you can see everything! We'll show you how to start from a mobile launchpad, view the errors, select a trace, and find that root cause.

View Video

Honeycomb

Read more about Investigate Problems With Mobile Frontend Observability

Session Replay - See Errors Through Your Users' Eyes

Aug 18, 2025 By Rollbar In Rollbar

Introducing Session Replay from Rollbar — the fastest way to go from error to insight. Watch how you can see exactly what your users experienced leading up to an error, without switching tools. Real-time error monitoring + replay, all in one screen.

View Video

Rollbar

Monitoring

Read more about Session Replay - See Errors Through Your Users' Eyes

Scale Your Monitoring Solution With the VictoriaMetrics Ecosystem

Aug 18, 2025 By VictoriaMetrics In VictoriaMetrics

When it comes down to scaling time series monitoring solutions, things can get messy. That’s one of the reasons why VictoriaMetrics, a Silver member of the Cloud Native Computing Foundation (CNCF), started its journey some years ago. It is a simple, reliable and efficient set of Observability Solutions that's been adopted by many organizations. It's open source, with a strong community behind it, and with enterprise and managed (Cloud) options for those who need support. VictoriaMetrics plays well with many standards, including Grafana and OpenTelemetry. Apart from that, in case you didn’t know, VictoriaLogs is the new kid on the block that's seriously outperforming other solutions. In this presentation, we’ll present the VictoriaMetrics Open Source projects and how they differ from other solutions, especially when it comes to scaling from single small setups to massive cluster deployments. Come learn how VictoriaMetrics projects can help to ease Observability!

View Video

VictoriaMetrics

Monitoring

Read more about Scale Your Monitoring Solution With the VictoriaMetrics Ecosystem

What is the User Lifecycle & How Can IT Teams Manage It?

Aug 18, 2025 By Ben Botti In Auvik

It’s Monday morning, and a new hire is walking into the office for their first day. Before they can dive into the work, they need access to email, project management tools, cloud storage, and a dozen other SaaS apps their role depends on. IT has already been hard at work behind the scenes, provisioning accounts, assigning permissions, and making sure everything is ready the moment they sign in.

Read Post

Auvik

Read more about What is the User Lifecycle & How Can IT Teams Manage It?

AI-Driven Application Monitoring with Claude Code, Playwright, and Checkly

Aug 18, 2025 By Checkly In Checkly

View Video

Checkly

Read more about AI-Driven Application Monitoring with Claude Code, Playwright, and Checkly

Run Checkly Monitors Against Multiple Environments

Aug 18, 2025 By Checkly In Checkly

Learn how to run Playwright tests across different environments without rewriting them. This tutorial covers managing environment variables in Checkly for API and browser checks, handling global and group-specific settings, and integrating with CI/CD processes. Discover the best practices for setting up environment variables, duplicating test groups, and customizing alerts to ensure your checks are environment-specific.

View Video

Checkly

Read more about Run Checkly Monitors Against Multiple Environments

Large-Scale Logging Made Easy

Aug 18, 2025 By VictoriaMetrics In VictoriaMetrics

Logging at scale is a common source of infrastructure expenses and frustration. While logging is something any organization does, there is still no silver bullet: no simple, scalable solution comes without tradeoffs.

View Video

VictoriaMetrics

Read more about Large-Scale Logging Made Easy

Honeycomb Launches Integration With the Anthropic Usage and Cost API

Aug 18, 2025 By Austin Parker In Honeycomb

If your organization is anything like ours, then you’ve probably embraced using large language models like Claude. Just last week, we gave all Honeycomb employees access to Claude. Now, developers can generate AI-assisted code, product managers can perform analysis on customer usage trends, marketers can test messaging, sales can do customer discovery and we are shipping AI-powered features to improve user experience.

Read Post

Honeycomb

Read more about Honeycomb Launches Integration With the Anthropic Usage and Cost API

Choosing the Right PHP Monitoring Tools: A Practical Guide

Aug 18, 2025 By Pavithra Parthiban In Atatus

When it comes to building fast, reliable, and user-friendly PHP applications, performance and stability are everything. A small slowdown in load times, a memory leak, or unhandled errors can frustrate users, impact revenue, and harm your brand’s reputation. This is why PHP Application Monitoring has become a necessity for businesses of all sizes.

Read Post

Atatus

Read more about Choosing the Right PHP Monitoring Tools: A Practical Guide

Monitor Claude usage and cost data with Datadog Cloud Cost Management

Aug 18, 2025 By Patrick Krieger In Datadog

Managing the cost of foundation models is a critical challenge as AI adoption surges, particularly for teams using powerful models like Anthropic's Claude Opus and Claude Sonnet. Growing teams generate larger prompt volumes and escalating model complexity, making it difficult to have clear visibility, accountability, and control of cloud AI spending.

Read Post

Datadog

Read more about Monitor Claude usage and cost data with Datadog Cloud Cost Management

What is SNMP (Simple Network Management Protocol)?

Aug 18, 2025 By Jeff Edwards In WhatsUp Gold

The Simple Network Management Protocol (SNMP) sure does pack a punch for something with “simple” in its name, as it literally provides the lifeblood of network monitoring and device communications. Network admins rely heavily on SNMP because nearly every technology manufacturer supports the protocol. And, in turn, it enables them to collect information, configure devices and receive alerts about network performance and issues.

Read Post

WhatsUp Gold

Read more about What is SNMP (Simple Network Management Protocol)?

The 15 Best DevOps Monitoring Tools for Lightning-Fast Incident Response

Aug 18, 2025 By Nuno Tomas In isDown

When incidents strike, every second counts. The difference between a minor hiccup and a major outage often comes down to how quickly your team detects and responds to issues. That's why choosing the best DevOps monitoring tools for incident response can make or break your operational excellence. Modern DevOps teams need more than just basic uptime checks.

Read Post

isDown

Read more about The 15 Best DevOps Monitoring Tools for Lightning-Fast Incident Response

The Service Discovery Problem Every Developer Knows (But Pretends Doesn't Exist)

Aug 18, 2025 By Nishant Modak In Last9

Launch Week Day 1: Introducing Discover Services. Picture this: It's 2 AM, alerts are firing, and you're staring at a dashboard trying to figure out which service is causing the cascade of failures. Your service map is a six-month-old Miro board, and you have no idea what's actually talking to what in production right now. If you've been there, you're not alone. In fast-moving teams, new services get deployed faster than you can track them.

Read Post

Last9

Read more about The Service Discovery Problem Every Developer Knows (But Pretends Doesn't Exist)

Stop Your Flaky Playwright Tests.

Aug 18, 2025 By Checkly In Checkly

Watch the full webinar: https://youtu.be/C-OCuo9URGE

Read more on identifying flaky tests: https://www.checklyhq.com/learn/playwright/assertions/#identifying-flaky-tests

View Video

Checkly

Read more about Stop Your Flaky Playwright Tests.

The Starlink Outage and Its Impact on Community Gateways

Aug 18, 2025 By Doug Madory In Kentik

Last month, Starlink suffered its largest outage in years, arguably its biggest since becoming a major internet provider. In addition to the millions of individual customers around the world, the outage disconnected the Community Gateways, customers of Starlink’s new transit service. In this post, we delve into the outage and its impact on these far-flung networks.

Read Post

Kentik

Read more about The Starlink Outage and Its Impact on Community Gateways

Building a K12 IT Command Center: Monitor All Your Educational Services

Aug 17, 2025 By Nuno Tomas In isDown

Managing technology in K-12 schools has become increasingly complex. With dozens of educational platforms, administrative systems, and communication tools running simultaneously, IT teams need a comprehensive K-12 IT monitoring dashboard to maintain visibility across their entire technology ecosystem.

Read Post

isDown

Read more about Building a K12 IT Command Center: Monitor All Your Educational Services

How to Effectively Monitor Kubernetes in 2025

Aug 17, 2025 By Jade Lassery In logz.io

As Kubernetes environments continue to grow in scale and complexity, having a robust monitoring strategy is no longer just good practice; it’s essential for survival. For engineering teams in 2025, effective monitoring and observability are the bedrock of performance, reliability, and cost control. This guide dives into the critical aspects of modern Kubernetes monitoring, from key metrics to the top tools/frameworks and the rising role of AI in managing these complex systems.

Read Post

logz.io

Read more about How to Effectively Monitor Kubernetes in 2025

Taming Alert Chaos: Modern Incident Alert Management Strategies

Aug 16, 2025 By Nuno Tomas In isDown

Every IT team knows the feeling: your phone buzzes at 3 AM with yet another alert. Is it critical? Can it wait until morning? With dozens of monitoring tools and hundreds of potential failure points, incident alert management has become one of the most challenging aspects of maintaining reliable systems.

Read Post

isDown

Read more about Taming Alert Chaos: Modern Incident Alert Management Strategies

Jira plugin spotlight

Aug 15, 2025 By SquaredUp In Squared Up

Native reporting in Jira is limited. Learn how you can surface and make the most of your Jira data with SquaredUp's intuitive dashboards.

View Video

Squared Up

Read more about Jira plugin spotlight

Scale Observability, Streamline Operations with AppNeta Monitoring Policies

Aug 15, 2025 By Alec Pinkham In Broadcom

In today's sprawling enterprise environments, keeping the network running smoothly isn’t just a technical hurdle—it’s a logistical marathon. Enterprise IT environments are in constant motion. New employees come on board. Contractors rotate in and out. Departments roll out new tools. Corporate offices expand, consolidate, or close. And users demand flawless connectivity from wherever they are.

Read Post

Broadcom

Read more about Scale Observability, Streamline Operations with AppNeta Monitoring Policies

All Network Monitoring Tools Are Created Equal, Right?

Aug 15, 2025 By Yann Guernion In Broadcom

There’s a question I hear quite often in my conversations about network management: "Aren't all network monitoring tools basically the same?" Honestly, I understand why so many people feel this way. For as long as I can remember, the primary role of these tools has been to tell you when something is already broken. Your team gets an alert (a switch is down, an application is slow, a circuit is saturated) and the fire-fighting process begins.

Read Post

Broadcom

Read more about All Network Monitoring Tools Are Created Equal, Right?

How ScienceLogic Drives FedRAMP-Authorized Automated IT at Scale

Aug 15, 2025 By ScienceLogic In ScienceLogic

As Government agencies modernize IT operations, many are adopting hybrid cloud and multi-tenant environments to drive agility and resilience. But as environments scale, so does complexity, especially when aligning with overlapping frameworks like FedRAMP, NIST, and CMMC. Today’s cybersecurity landscape—rising threats, shrinking budgets, and expanding compliance demands—requires more than manual oversight.

Read Post

ScienceLogic

Read more about How ScienceLogic Drives FedRAMP-Authorized Automated IT at Scale

Integrating Deno and Grafana Cloud: How to observe your JavaScript project with zero added code

Aug 15, 2025 By Andy Jiang In Grafana

Andy Jiang is a JavaScript engineer with nearly 10 years of experience. He’s interested in making JavaScript and TypeScript simpler to use. He currently works at Deno as a product marketing manager. Outside of work, Andy likes cooking, writing, and playing tennis. Observability is essential for modern applications. Metrics, logs, and traces allow you to troubleshoot production issues, monitor performance, and understand usage patterns.

Read Post

Grafana

Read more about Integrating Deno and Grafana Cloud: How to observe your JavaScript project with zero added code

The first rule of DORA Metrics...

Aug 15, 2025 By Blog In Squared Up

DORA Metrics are widely regarded as the gold standard for measuring the performance of software development teams. The metrics themselves, though, are generic, high-level pointers – they are not an instruction manual. Adopting the DORA approach is the first step down the path to continuous improvement. The next steps are deciding how the measures should be defined in the context of your own organisation's processes and then figuring out how to retrieve (and present) the relevant data.

Read Post

Squared Up

Read more about The first rule of DORA Metrics...

Why SSL Certificate Verification Failed: All Causes, Fixes & Prevention

Aug 15, 2025 By Simon Rodgers In WebSitePulse

SSL Certificate Verification Failed errors are one of the most common and frustrating issues for developers, DevOps engineers, and system administrators. Whether you're building a Python application, running a Docker container, or managing a web server, this guide will help you.

Read Post

WebSitePulse

Read more about Why SSL Certificate Verification Failed: All Causes, Fixes & Prevention

How to Monitor Multiple School Platforms: Google Workspace, Canvas, and PowerSchool from One Dashboard

Aug 15, 2025 By Nuno Tomas In isDown

Managing technology in K12 schools means juggling dozens of critical platforms simultaneously. When Google Workspace goes down during morning classes, Canvas experiences issues during exam submissions, or PowerSchool becomes unavailable during grade entry periods, the impact ripples through entire school communities. The ability to monitor multiple school platforms from a centralized dashboard has become essential for educational IT teams.

Read Post

isDown

Read more about How to Monitor Multiple School Platforms: Google Workspace, Canvas, and PowerSchool from One Dashboard

How to Adjust Semantic and Lexical Search Weights in Elasticsearch

Aug 15, 2025 By Elastic In Elastic

In this session, we’ll show you how hybrid search using Elastic lets you assign weights to different search types, for example giving semantic search three times more influence than lexical search. This lets you fine-tune the balance between precise keyword matching and broader, context-aware results.

View Video

Elastic

Read more about How to Adjust Semantic and Lexical Search Weights in Elasticsearch

Announcing the Winner of the 2025 StatusGator Women in Tech Scholarship: Lara Djukic

Aug 15, 2025 By Colin Bartlett In StatusGator

Earlier this year, we launched the StatusGator Women in Tech Scholarship to support and empower women pursuing careers in technology. We are thrilled to announce that our 2025 scholarship recipient is Lara Djukic, an inspiring young technologist whose vision blends innovation with a deep commitment to her community. Through the Bold.org scholarship platform, we’ve awarded Lara a $3,100 scholarship.

Read Post

StatusGator

Read more about Announcing the Winner of the 2025 StatusGator Women in Tech Scholarship: Lara Djukic

Visualize Logs Alongside Metrics: A Complete Guide for Monitoring Slow MySQL Queries

Aug 15, 2025 By Benjamin Pitts In MetricFire

When a service slows down, metrics will tell you that it’s happening but logs tell you why. For MySQL, slow queries can be a silent performance killer, gradually chewing through resources until users start complaining. By enabling MySQL’s slow query log and forwarding it to Loki (via Promtail), you can visualize query-level details right alongside your metrics on Grafana dashboards. This makes it easy to correlate what is slow (metrics) with what is causing the slowdown (logs).

Read Post

MetricFire

Read more about Visualize Logs Alongside Metrics: A Complete Guide for Monitoring Slow MySQL Queries

How Elastic Powers Search in Real-Time (Explained in 52 Seconds)

Aug 15, 2025 By Elastic In Elastic

Ever wondered how Wikipedia loads answers instantly? Or how your Uber updates in real time? That’s Elasticsearch working behind the scenes. In this video, I break down how Elastic powers lightning-fast, scalable search for complex data, from ride requests to stock prices.

View Video

Elastic

Read more about How Elastic Powers Search in Real-Time (Explained in 52 Seconds)

Real-World Use Cases for Natural Language Copilots

Aug 15, 2025 By Dallon Robinette In Selector

Natural language copilots are one of the most exciting developments in AI for network operations. They allow engineers and operators to query complex environments in plain language rather than memorizing obscure CLI commands or digging through multiple dashboards. But here’s the truth: a copilot is only as good as the AI behind it. Without a purpose-built network LLM, a copilot can’t deliver the accuracy, context, and speed that real-world IT operations demand.

Read Post

Selector

Read more about Real-World Use Cases for Natural Language Copilots

Top tips: Beating notification fatigue before it beats you

Aug 14, 2025 By Shawn King Jason In ManageEngine

Top tips is a weekly column where we highlight what’s trending in the tech world today and list ways to explore these trends. This week, we’re looking at the rise of notification fatigue and how to manage alerts so they boost productivity instead of draining it. You’re in the middle of a task, fully focused, when ping!—a new email lands. You glance at it, thinking it’ll only take a second, but by the time you get back to your work, you’ve lost your momentum.

Read Post

ManageEngine

Read more about Top tips: Beating notification fatigue before it beats you

7 reasons why intelligent network automation should be on every CXO's agenda

Aug 14, 2025 By Ajay Sharma S In ManageEngine

Discover how intelligent network automation is transforming IT operations by combining AI, machine learning, and policy-based orchestration to deliver agility, reliability, and security at scale.

Read Post

ManageEngine

Read more about 7 reasons why intelligent network automation should be on every CXO's agenda

About Grafana Canvas Panel

Aug 14, 2025 By Grafana In Grafana

Learn about the different use cases of the Canvas Panel, which were discussed in our Grafana Campfire Community Call - June 2024.

View Video

Grafana

Read more about About Grafana Canvas Panel

Early Warning Signals: Now in Microsoft Teams

Aug 14, 2025 By Valeria Kurolapova In StatusGator

As promised, we’re continuing to expand our Early Warning Signals coverage. In addition to our recent integrations for Slack, SMS, and Webhooks, we’re excited to announce that Early Warning Signals now works in Microsoft Teams. This is another step toward making early outage alerts accessible wherever your conversations happen.

Read Post

StatusGator

Read more about Early Warning Signals: Now in Microsoft Teams

Catchpoint News Catchup Episode 06

Aug 14, 2025 By Catchpoint In Catchpoint

Join Payal, Denton, and Leon as they explore recent articles about AOL Dialup services, AI agents, ISP throttling, and what the heck is McDonald’s “Grimace” anyway?

View Video

Catchpoint

Monitoring

Read more about Catchpoint News Catchup Episode 06

Introducing the Coralogix Transactions processor

Aug 14, 2025 By Chris Cooney In Coralogix

Coralogix Transactions are a trace segmentation strategy, unique to the Coralogix platform. They allow users to analyze the performance, over time, of a collection of related spans, across billions of traces. Coralogix has introduced a transactions processor into the OpenTelemetry contrib image, enabling users to activate this unique feature using nothing more than OpenTelemetry configuration.

Read Post

Coralogix

Read more about Introducing the Coralogix Transactions processor

Mastering Service Configuration in Icinga Director

Aug 14, 2025 By Ravi Srinivasa In Icinga

The Icinga Director configuration tool makes it easy to define monitoring objects through the web UI and deploy them to the Icinga 2 API. In this blog post, I’ll walk you through how to configure services in Icinga Director. If you haven’t used Icinga Director yet, take a look at our introduction. I assume that most of you are already familiar with Icinga 2 and have used the DSL to define objects.

Read Post

Icinga

Read more about Mastering Service Configuration in Icinga Director

Real-Time Status Monitoring for 50+ EdTech Tools K12 IT Teams Actually Use

Aug 14, 2025 By Nuno Tomas In isDown

K12 IT departments face a unique challenge: keeping dozens of educational technology platforms running smoothly while teachers conduct lessons and students complete assignments. A single service outage can disrupt hundreds of classrooms simultaneously. That's why implementing a K12 service status dashboard has become essential for school technology teams managing complex digital learning environments.

Read Post

isDown

Read more about Real-Time Status Monitoring for 50+ EdTech Tools K12 IT Teams Actually Use

Inside the Coralogix AI Center: Solving AI's Silent Failure Crisis

Aug 14, 2025 By Andre Scott In Coralogix

Observability has always answered one core question: Is it running? But in the era of LLMs, autonomous agents, and AI-powered workflows, that’s no longer enough. We need to ask a harder, scarier question: Is it right? And right now, most teams can’t answer that. Let’s fix it. In our last post, “The AI Monitoring Crisis No One’s Talking About,” we outlined why prompt injection, hallucinations, and context drift create invisible failures.

Read Post

Coralogix

Read more about Inside the Coralogix AI Center: Solving AI's Silent Failure Crisis

What Is an MCP Server?

Aug 14, 2025 By Andre Scott In Coralogix

If you’ve been following AI development lately, you’ve probably heard whispers about “MCP Servers” floating around developer circles. The protocol has been around a little while now, and I myself have finally gotten round to using it. Boy, do we need to talk about it. MCP (Model Context Protocol) is Anthropic’s open standard that lets AI assistants connect directly to your tools and data sources, not just static documentation or code snippets.

Read Post

Coralogix

Read more about What Is an MCP Server?

Getting Started with Grafana Cloud's AI Assistant for Observability

Aug 14, 2025 By Grafana In Grafana

The pace of software delivery in 2025 is unprecedented: cloud-native apps, microservices, and AI-generated code are shipping in days, not months. But one challenge never changes: ensuring reliability and visibility when systems fail. In this video, we explore how the new Grafana AI Assistant brings true, context-aware observability to your stack. Watch as we deploy an open-source Python service with Kafka, Postgres, Kubernetes, and Prometheus, then use the AI assistant to instantly generate dashboards and alerts and to reduce unneeded telemetry volume.

View Video

Grafana

Read more about Getting Started with Grafana Cloud's AI Assistant for Observability

REST easy with REST Packs

Aug 14, 2025 By Cribl In Cribl

The countdown to CriblCon 25 is on and we’re giving you an exclusive first look at the expert insights, innovative solutions, and success stories you’ll see on the big stage. REST collector configuration can be painful, requiring navigating to multiple screens and importing multiple configuration files, but it’s about to get a lot easier. Join Cribl experts to preview how easily you can install and build new packs with new enhancements.

View Video

Cribl

Read more about REST easy with REST Packs

LLMs don't stand still: How to monitor and trust the models powering your AI

Aug 14, 2025 By Sheikh Mursaleen In Catchpoint

One Large Language Model (LLM) nails your brand’s tone but drifts after a model update. Another is lightning fast until it spikes in latency during peak hours. A third delivers brilliant answers except in specific regions where it falters.

Read Post

Catchpoint

Read more about LLMs don't stand still: How to monitor and trust the models powering your AI

Building MCP - 0 to Monitored with Vercel, Next.js and Sentry

Aug 14, 2025 By Sentry In Sentry

Build an MCP server starting from 0 using Vercel's Next.js template, and end by generating hot takes in the tone of David Cramer. Scaffold Sentry, add tool calls via Cursor CLI, and set up MCP monitoring with Sentry's new wrapMcpServerWithSentry method.

View Video

Sentry

Read more about Building MCP - 0 to Monitored with Vercel, Next.js and Sentry

Network Switch Monitoring: How to Monitor Switch Performance with SNMP

Aug 14, 2025 By Andrii Kernitskyi In Obkio

If you’ve spent any time managing networks, you know switches are the backbone that keeps everything connected, but it’s easy to take them for granted until something breaks. Monitoring network switches isn’t just “nice to have”; it’s critical if you want to avoid those sudden outages that bring everything to a halt.

Read Post

Obkio

Read more about Network Switch Monitoring: How to Monitor Switch Performance with SNMP

AI in observability at Grafana Labs: Making observability easy and accessible for everyone

Aug 14, 2025 By Mat Ryer In Grafana

Did you know that observability has been around for more than six decades? It all goes back to a Hungarian-American inventor named Rudolf Kálmán who thought about how external outputs could measure the internal state of a machine. Kálmán wrote about monitoring single-input single-output systems, but our demands are very different today. We need to observe monoliths, microservices, clusters, pods, regions, and many more.

Read Post

Grafana

Read more about AI in observability at Grafana Labs: Making observability easy and accessible for everyone

AI for Grafana onboarding: Get your teams started quicker with Grafana Assistant

Aug 14, 2025 By Maurice Rochau In Grafana

Grafana puts a powerful set of observability capabilities right at your fingertips, but onboarding entire teams to the sophisticated platform is often a nontrivial exercise—one that can slow adoption and prevent organizations from getting immediate value. We want to make the process as frictionless as possible, which is why we’re excited to tell you that Grafana Assistant is now available in public preview to all Grafana Cloud users.

Read Post

Grafana

Read more about AI for Grafana onboarding: Get your teams started quicker with Grafana Assistant

Sentry MCP server monitoring

Aug 14, 2025 By Sentry In Sentry

We just launched MCP server monitoring in beta. You can instrument most server-side JavaScript SDK based MCP servers with one line of instrumentation code within your MCP SDK implementation using: wrapMcpServerWithSentry(McpServer). See details like protocol usage, client usage, traffic, tool usage, and performance across your MCP implementation, so you can get visibility into all the sharp edges that your MCP server has (who’s using it and how it’s working, or not) and get alerted when things break.

View Video

Sentry

Read more about Sentry MCP server monitoring

MCP Server Observability is here!

Aug 14, 2025 By Sentry In Sentry

Let's be real: who really knows what's happening inside an MCP server? Well, now you can! We've been building our MCP Server for the past few months, and we've taken all the tools that we've used to debug it and turned them into a set of observability tools to help you monitor your MCP servers end to end.

View Video

Sentry

Read more about MCP Server Observability is here!

You built the MCP server. Now track every client, tool, and request with Sentry.

Aug 14, 2025 By Sasha Blumenfeld In Sentry

TL;DR - Starting today, you can instrument most server-side JavaScript SDK based MCP servers with one line of instrumentation code within your MCP SDK implementation. With this in place, you’ll be able to see details like protocol usage, client usage, traffic, tool usage, and performance across your MCP implementation.

Read Post

Sentry

Read more about You built the MCP server. Now track every client, tool, and request with Sentry.

From Chaos to Clarity: How Honeycomb Tags Are Transforming Developer Workflows

Aug 14, 2025 By Jason Harley In Honeycomb

We're thrilled to announce that Honeycomb Tags are now generally available across SLOs, triggers, and boards! Over 100 customers are already actively tagging their observability resources in Honeycomb today.

Read Post

Honeycomb

Read more about From Chaos to Clarity: How Honeycomb Tags Are Transforming Developer Workflows

Optimize Your E-Commerce Platform with PHP Performance Monitoring

Aug 14, 2025 By Mohana Ayeswariya J In Atatus

In e-commerce, seconds can mean millions. A one-second delay during checkout can slash conversion rates by 7% and send frustrated customers straight to your competitors. Most modern e-commerce platforms, such as Magento, WooCommerce, and Laravel-based solutions, run on PHP, making PHP application performance monitoring (APM) not just a nice-to-have, but a revenue-critical necessity.

Read Post

Atatus

Read more about Optimize Your E-Commerce Platform with PHP Performance Monitoring

Simplify XML log collection and processing with Observability Pipelines

Aug 14, 2025 By Micah Kim In Datadog

In Microsoft-based environments, Windows event logs capture critical security events like user logins, privilege escalations, and system changes. These logs are vital for compliance and investigations. However, they’re natively formatted in XML, a verbose and deeply nested structure that is hard to search without preprocessing and inefficient to store.

Read Post

Datadog

Read more about Simplify XML log collection and processing with Observability Pipelines

LogRocket - The Ultimate Toolkit for Front-End Insight and Performance

Aug 14, 2025 By Super Monitoring In Super Monitoring

When you need to get beyond surface-level metrics and see what users are actually going through when using your web application, LogRocket provides a potent set of tools. It was built with designers, developers, marketers, ecommerce managers, and website owners in mind. In a nutshell, it combines session replay, error tracking, product-level analytics, and AI-driven insight in one place.

Read Post

Super Monitoring

Read more about LogRocket - The Ultimate Toolkit for Front-End Insight and Performance

The IT story behind 911 emergency services

Aug 13, 2025 By Allan In ManageEngine

At 2:37am on a cold Oregon night, a fire alarm blared at a rural station. Seconds later, the call came in: a structure fire on the outskirts of Rogue Valley. But what if that alarm never reached the station? This isn't a hypothetical. For the IT team at Emergency Communications of Southern Oregon (ECSO 911), it’s the kind of emergency scenario they prepare for every day.

Read Post

ManageEngine

Read more about The IT story behind 911 emergency services

Public vs private status pages [cost analysis, security, compliance, and more]

Aug 13, 2025 By Leo Baecker In Hyperping

When your service goes down at 3 AM, how do you communicate with your customers? This question keeps DevOps teams and customer success managers awake at night, and for good reason. The way you handle incident communication can make the difference between retaining customer trust and watching it evaporate. Status pages have become the standard solution for incident communication, but there's a critical decision every organization faces: should your status page be public or private?

Read Post

Hyperping

Read more about Public vs private status pages [cost analysis, security, compliance, and more]

5 PCI DSS File Transfer Requirements You Can Meet With Serv-U

Aug 13, 2025 By Eoin Keenan In SolarWinds

Compliance with the Payment Card Industry Data Security Standard (PCI DSS) is essential for any organization that handles credit card data, and it extends far beyond databases and payment gateways. One area often overlooked is file transfer workflows, which can pose serious risks if not properly secured.

Read Post

SolarWinds

Read more about 5 PCI DSS File Transfer Requirements You Can Meet With Serv-U

LLM-powered insights into your tracing data: introducing MCP support in Grafana Cloud Traces

Aug 13, 2025 By Joe Elliott In Grafana

Distributed tracing data is a unique and powerful observability signal, allowing you to understand how your services interact and the relationships between them. Sometimes it can be difficult, however, to turn raw tracing data into actionable insights. This is exactly why we introduced Grafana Traces Drilldown, an application that lets you quickly investigate and visualize your tracing data through a simplified, queryless experience.

Read Post

Grafana

Read more about LLM-powered insights into your tracing data: introducing MCP support in Grafana Cloud Traces

How to Monitor NVIDIA GPU Metrics with Cribl Edge & Stream (Complete Tutorial)

Aug 13, 2025 By Nikhil Mungel In Cribl

If you’re running AI, ML, or data-intensive workloads on GPUs, monitoring their performance is critical. Overheating, under-utilization, or memory bottlenecks can cost you thousands in cloud bills and potential downtime. This guide walks you through collecting real-time GPU telemetry using nvidia-smi, sending it to Cribl Edge, routing it through Cribl Stream, and using Cribl Search to analyze the data—step by step.

Read Post

Cribl

Read more about How to Monitor NVIDIA GPU Metrics with Cribl Edge & Stream (Complete Tutorial)

How ELSER Transforms One Keyword into Better Search Results

Aug 13, 2025 By Elastic In Elastic

In this session, we’ll show you how Elastic's ELSER takes a single token like _“Terminator”_ and expands it into semantically related terms such as _software, alien, computer technology,_ and _Connor_ (for John Connor). This makes search results more relevant, even when the exact keyword isn’t used.

View Video

Elastic

Read more about How ELSER Transforms One Keyword into Better Search Results

Site Reliability Engineering vs DevOps: Which Approach Fits Your Organization?

Aug 13, 2025 By Nuno Tomas In isDown

Choosing between Site Reliability Engineering (SRE) and DevOps can feel like picking between two similar but distinct philosophies. Both aim to improve software delivery and system reliability, but they take different paths to get there. Understanding these differences helps you make an informed decision about which approach aligns best with your organization's goals, culture, and technical needs.

Read Post

isDown

Read more about Site Reliability Engineering vs DevOps: Which Approach Fits Your Organization?

Elastic wins 2025 Google Cloud DORA Award for Architecting for the Future with AI

Aug 13, 2025 By Brian Bergholm In Elastic

Applying DORA principles to improve software delivery and operational performance with Google Cloud. We’re thrilled to announce that Elastic has been honored with the 2025 Google Cloud DORA Award for Architecting for the Future with AI. Google Cloud DORA awards recognize organizations that have demonstrated significant advancements by applying DORA principles to improve their software delivery and operational performance with Google Cloud.

Read Post

Elastic

Read more about Elastic wins 2025 Google Cloud DORA Award for Architecting for the Future with AI

eG Innovations' AIOps Cloud Monitoring

Aug 13, 2025 By Swaminathan J In eG Innovations

I’ve previously covered how eG Innovations’ AIOps-powered monitoring benefits those working with Digital Workspaces or leveraging APM; today, I’ll cover how those same AI-powered capabilities benefit those supporting cloud-hosted architectures and workloads.

Read Post

eG Innovations

Read more about eG Innovations' AIOps Cloud Monitoring

Rails APM Quickstart: Scout vs Sentry in 2025

Aug 13, 2025 By Sarah Morgan In Scout

If you just want the gist without all the scroll.

Read Post

Scout

Read more about Rails APM Quickstart: Scout vs Sentry in 2025

Data Center VXLAN Overlay Visibility at Scale

Aug 13, 2025 By Phil Gervasi In Kentik

VXLAN overlays bring flexibility to modern data centers, but they also hide what operators most need to see: true host-to-host and service-to-service traffic. Kentik restores that visibility by decoding VXLAN from sFlow, exposing both overlay endpoints and underlay paths in a single view without the cost and complexity of pervasive packet capture. The result: faster troubleshooting, smarter capacity planning, and confident operations at scale.

Read Post

Kentik

Read more about Data Center VXLAN Overlay Visibility at Scale

Monitor the Performance of Your Node.js Fastify App with AppSignal

Aug 13, 2025 By Damilola Olatunji In AppSignal

Fastify stands out among Node.js web frameworks for its obsessive focus on performance and boasts impressive benchmarks, with throughput often 2-3x higher than Express and other popular alternatives. But here's the paradox: without proper visibility, even applications with a good foundation will degrade over time as you add features and complexity.

Read Post

AppSignal

Read more about Monitor the Performance of Your Node.js Fastify App with AppSignal

What Happens When a Security Incident Occurs

Aug 13, 2025 By solarwindsinc In SolarWinds

When ransomware strikes, you don't have much time. Watch as we dive into the intense decision-making and rapid response in a cyberattack. From adrenaline to strategy, it's like a high-speed chess match. Will you be ready?

View Video

SolarWinds

Read more about What Happens When a Security Incident Occurs

Network Visualization: 4 Ways to Visualize Computer Networks

Aug 13, 2025 By Dallon Robinette In Selector

Network visualization is the process of visually representing networks of connected entities, like devices, data flows, or relationships, using nodes and links. This technique helps in understanding complex data, identifying patterns, and improving network management by providing a clear visual overview of the network’s structure and behavior.

Read Post

Selector

Read more about Network Visualization: 4 Ways to Visualize Computer Networks

If you want to monitor reality, you have to monitor your users' perspective.

Aug 13, 2025 By Catchpoint In Catchpoint

Not from your data center. Not from your internal network. Not from your controlled environments. Real users are on hotel Wi-Fi, public LTE, spotty networks, global cloud providers. To understand their experience, your monitoring needs to reflect their reality: location, device, network, context.

View Video

Catchpoint

Read more about If you want to monitor reality, you have to monitor your users' perspective.

IT can save the planet

Aug 12, 2025 By Shawn King Jason In ManageEngine

When we think about saving the planet, we usually imagine solar panels, electric cars, or governments making sweeping climate policies. Rarely do we picture rows of blinking servers in a data center or IT admins patching endpoints. Maybe we should. In today's world, the intersection between technology and sustainability is becoming impossible to ignore, and the IT industry is right at the center of it. The truth is, IT is both part of the problem and part of the solution.

Read Post

ManageEngine

Read more about IT can save the planet

Ensure Microsoft 365 Works

Aug 12, 2025 By NiCE IT Mgmt In NiCE IT Mgmt

Currently, you’ll hear a lot about Copilot, Microsoft 365 security, and modern workplace innovation. But here’s the question: Who’s making sure it all actually works, every day, without interruption?

Read Post

NiCE IT Mgmt

Read more about Ensure Microsoft 365 Works

PHP Performance Monitoring with Atatus PHP APM

Aug 12, 2025 By Pavithra Parthiban In Atatus

PHP is used by millions of websites and applications around the world because it’s easy to work with and very flexible. But like any technology, PHP apps can run into problems like slow performance or errors that affect users and your business. Atatus PHP APM provides developers, DevOps engineers, and SREs with clear insights into what is happening inside PHP applications, helping them find and fix issues faster, improve performance, and keep things running smoothly.

Read Post

Atatus

Read more about PHP Performance Monitoring with Atatus PHP APM

What Is Log Monitoring (and Why IT Teams Are Shifting to Log Intelligence)

Aug 12, 2025 By Patrick Sites | AKA "The Logfather" In LogicMonitor

Your infrastructure isn’t confined to a single location anymore. It’s spread across clouds, containers, and on-prem systems, and every layer is spitting out logs: access attempts, performance spikes, error codes, config changes. That data is invaluable if you can find the signal in the noise. But with millions of logs flying by every day, that’s easier said than done.

Read Post

LogicMonitor

Read more about What Is Log Monitoring (and Why IT Teams Are Shifting to Log Intelligence)

2025 Buyer's Guide - Choosing Unified Infrastructure Monitoring

Aug 12, 2025 By Staff Contributor In SolarWinds

Unified infrastructure monitoring delivers a single, enterprise-grade platform to oversee hybrid environments, providing real-time insights and proactive health monitoring across on-premises, cloud, and edge systems. As 2025 brings new challenges with artificial intelligence (AI), edge computing, and hybrid complexity, SolarWinds stands out as a thought leader in unified infrastructure monitoring for enterprises.

Read Post

SolarWinds

Read more about 2025 Buyer's Guide - Choosing Unified Infrastructure Monitoring

Error Analysis in Honeycomb for Frontend Observability Now in Public Beta

Aug 12, 2025 By Elsie Phillips In Honeycomb

You just shipped your latest frontend release. It passed QA, CI ran, and it looked great in pre-production. But now it’s live and users are hitting an unexpected error: TypeError: undefined is not a function in Chrome. Your error tracking tool flags the exception. You get a stack trace, some breadcrumbs, maybe a session replay.

Read Post

Honeycomb

Read more about Error Analysis in Honeycomb for Frontend Observability Now in Public Beta

Why AIOps Isn't Optional Anymore: The Metrics That Prove It

Aug 12, 2025 By ScienceLogic In ScienceLogic

The CFO slides a single sheet of paper across the conference table, without saying a word. It’s not a budget approval or strategic roadmap—it’s a simple question written in red ink: “What’s our ROI on IT operations?” For too many IT leaders, this moment represents a reckoning. After years of investing in monitoring tools, staffing up operations teams, and implementing “best practices,” the measurable business impact remains frustratingly unclear.

Read Post

ScienceLogic

Read more about Why AIOps Isn't Optional Anymore: The Metrics That Prove It

APM vs observability: why your definitions are broken

Aug 12, 2025 By Leon Adato In Catchpoint

Recently I was asked to offer my opinions on Application Performance Management (APM) and Observability (o11y) - how they overlap, compete, and conflict. I was just one of several folks whose ideas were solicited, so (understandably) some of my thoughts were left out of the original article. HOWEVER, I'm never one to let good words (or at least a lot of words) go to waste, so I thought I'd pull them together here.

Read Post

Catchpoint

Read more about APM vs observability: why your definitions are broken

Observability trends in Brazil: insights from our localized survey

Aug 12, 2025 By Trevor Jones In Grafana

Organizations in Brazil are eager to adopt some of the latest observability trends and technologies as they look to keep their software running as smoothly as possible, according to analysis of a micro survey recently conducted by Grafana Labs. Observability is an evolving space, and this is the first time Grafana Labs has run a Brazilian version of our annual Observability Survey.

Read Post

Grafana

Read more about Observability trends in Brazil: insights from our localized survey

Advanced PHP Monitoring for Enterprise Applications

Aug 12, 2025 By Mohana Ayeswariya J In Atatus

During critical business periods, enterprise PHP applications can experience significant performance challenges, including slow page loading, workflow delays, and essential integrations timing out. As a result, operational efficiency declines, customer satisfaction decreases, and revenue streams are at risk. Enterprise PHP applications power complex business portals, SaaS platforms, internal tools, and mission-critical workflows.

Read Post

Atatus

Read more about Advanced PHP Monitoring for Enterprise Applications

Inside a Cybersecurity War Room - SolarWinds TechPod 101

Aug 12, 2025 By solarwindsinc In SolarWinds

It's CSOC o'clock! In this episode, we dive into the high-stakes world of cyber defense with the manager of cybersecurity operations at a critical infrastructure organization. From ransomware threats and zero-day exploits to the rise of nation-state-backed Advanced Persistent Threats (APTs), our guest reveals how security teams manage 24/7 threats, the mindset it takes to thrive in cybersecurity, and why community collaboration is becoming essential in cyber warfare.

View Video

SolarWinds

Read more about Inside a Cybersecurity War Room - SolarWinds TechPod 101

Grafana Pyroscope: New eBPF profiler in Alloy & Source Code Integration (Community Call August 2025)

Aug 12, 2025 By Grafana In Grafana

Christian is going to talk about the new eBPF profiler in Grafana Alloy as well as new Grafana Pyroscope Source Code Integration UI updates. Have questions? Please bring them! Can't comment in the chat? You may need to create a channel. Grafana Cloud is the easiest way to get started with Grafana dashboards, metrics, logs, and traces. Our forever-free tier includes access to 10k metrics, 50GB logs, 50GB traces and more.

View Video

Grafana

Read more about Grafana Pyroscope: New eBPF profiler in Alloy & Source Code Integration (Community Call August 2025)

Deploying a WhatsUp Gold 360 Connector for Azure

Aug 12, 2025 By Progress WhatsUp Gold In WhatsUp Gold

WhatsUp Gold 360 uses connectors to provide real-time insights into internet connectivity to your remote sites. Watch this video to learn how to create and deploy a WhatsUp Gold 360 connector to an Azure environment.

View Video

WhatsUp Gold

Read more about Deploying a WhatsUp Gold 360 Connector for Azure

Grafana Tempo: Performance Moonshots & MCP Server (Community Call August 2025)

Aug 12, 2025 By Grafana In Grafana

We'll have Marty talking about Grafana Tempo Performance Moonshots, and Joe will update us on what's new with the MCP Server! Have questions? Please bring them! Can't comment in the chat? You may need to create a channel. Grafana Cloud is the easiest way to get started with Grafana dashboards, metrics, logs, and traces. Our forever-free tier includes access to 10k metrics, 50GB logs, 50GB traces and more.

View Video

Grafana

Read more about Grafana Tempo: Performance Moonshots & MCP Server (Community Call August 2025)

Why Visibility Is the #1 IT Priority in 2025: Tackling Shadow AI and Emerging Risks

Aug 12, 2025 By Ben Botti In Auvik

AI adoption is progressing at a rapid pace. What started as a trickle of generative tools is now a flood of autonomous agents, custom copilots, and AI-powered SaaS, most of it entering the workplace faster than IT can keep track of.

Read Post

Auvik

Read more about Why Visibility Is the #1 IT Priority in 2025: Tackling Shadow AI and Emerging Risks

What is Shadow AI & What Can You do About It?

Aug 12, 2025 By Ben Botti In Auvik

Artificial intelligence (AI) is now embedded in everyday professional workflows — so much so that 46% of employees say they would continue using AI tools even if their organization banned them. The productivity gains are undeniable, but this widespread, unmonitored use of AI also introduces growing risks around data security, compliance, and governance.

Read Post

Auvik

Read more about What is Shadow AI & What Can You do About It?

Beyond the Pipeline: Data Isn't Oil, It's Power.

Aug 12, 2025 By Mezmo In Mezmo

Originally published on Medium, this piece by Winston Hearn dives into a philosophical discussion on why the "data is oil" metaphor is no longer serving the tech industry. Hearn argues that by reframing our thinking to "data is power," we can better understand and manage today's complex data systems. For more than a decade, we in the tech industry have referenced a common metaphor: data is the new oil. It’s a concept that’s easy to grasp.

Read Post

Mezmo

Read more about Beyond the Pipeline: Data Isn't Oil, It's Power.

SNMP Device Monitoring: Feature Highlight - Obkio

Aug 12, 2025 By Obkio In Obkio

Tired of noisy alerts and overcomplicated SNMP monitoring tools? Learn how Obkio’s SNMP Device Monitoring blends simplicity and intelligence, giving you fewer alerts, better insights and faster troubleshooting so you can resolve network router, switch and firewall issues in minutes. It always starts the same way. You’re managing your network; maybe it’s five devices, maybe it’s five hundred, and everything should be simple. But instead, you’re caught between two extremes.

View Video

Obkio

Read more about SNMP Device Monitoring: Feature Highlight - Obkio

Build secure and scalable Azure serverless applications with the Well-Architected Framework

Aug 12, 2025 By Jordan Obey In Datadog

Serverless platforms like Azure Functions and Azure Container Apps make it easier to scale your applications without managing infrastructure. But successful serverless apps require thoughtful planning. They must be designed to account for cold starts, unpredictable scaling behavior, and ephemeral compute lifecycles, all while ensuring secure data handling and end-to-end observability across highly distributed components.

Read Post

Datadog

Read more about Build secure and scalable Azure serverless applications with the Well-Architected Framework

Doing the math: Proactive monitoring adds up for MSPs

Aug 11, 2025 By Sara Purdon In Martello Technologies

Bells and whistles may inspire great ad headlines, but when it comes to investing in new tools, the only thing most MSPs care about is the business case. “If we buy this, how will it make us more efficient and more profitable?” Period.

Read Post

Martello Technologies

Read more about Doing the math: Proactive monitoring adds up for MSPs

Top 7 Application Performance Monitoring Tools

Aug 11, 2025 By Anjali Udasi In Last9

Your application is under constant pressure to deliver low latency and high reliability, and a smooth user experience isn’t optional. When performance drops, every second matters. Application Performance Monitoring (APM) gives you the visibility to spot issues before your users feel the impact. It also helps you understand what’s happening inside your stack, so you can track resource usage, pinpoint bottlenecks, and keep things running at peak performance.

Read Post

Last9

Read more about Top 7 Application Performance Monitoring Tools

Best Practices for Managing Multiple Vendor Dependencies

Aug 11, 2025 By Nuno Tomas In isDown

Modern businesses rely on dozens of third-party services to operate efficiently. From payment processors and cloud providers to analytics tools and communication platforms, these vendor dependencies form the backbone of your technology stack. When one fails, it can trigger a cascade of issues across your entire operation. Managing multiple vendor dependencies requires a strategic approach that combines proactive monitoring, clear documentation, and well-defined response procedures.

Read Post

isDown

Read more about Best Practices for Managing Multiple Vendor Dependencies

HTTP status codes? Here's a cheat sheet

Aug 11, 2025 By Laurens Goethals In Oh Dear

Whenever you visit a website or click on a link, there’s a whole conversation happening behind the scenes between your browser and the web server. That conversation includes something called HTTP status codes, and knowing what they mean can help you make a diagnosis, so to speak. Usually, everything goes smoothly (like a 200 OK), but sometimes things break (looking at you, 404 and 500).

Read Post

Oh Dear

Read more about HTTP status codes? Here's a cheat sheet

RUM measurements: Start with the data, discover the story

Aug 11, 2025 By Ofri Grushka In Coralogix

When something breaks in your application, a slow page, a spike in errors, or a drop in engagement, the typical response is to chase the symptoms. But what if we flipped that process? What if we started not from user complaints, but from actual performance measurements, collected from real sessions in real time? That’s exactly the idea behind Coralogix RUM Measurements.

Read Post

Coralogix

Read more about RUM measurements: Start with the data, discover the story

What Is a Telemetry Pipeline and Why It Matters in Modern IT

Aug 11, 2025 By VirtualMetric In VirtualMetric

A practical guide for IT professionals, DevOps, security teams, platform engineers, and anyone who’s dealing with logs. In contemporary distributed systems, telemetry data—logs, metrics, traces, and events—serves as the primary mechanism for understanding internal system behavior. However, as system complexity increases, so does the volume and heterogeneity of telemetry.

Read Post

VirtualMetric

Read more about What Is a Telemetry Pipeline and Why It Matters in Modern IT

Network custom dashboard video

Aug 11, 2025 By ManageEngine Site24x7 In Site24x7

Experiencing slow networks, dropped calls, or unexplained performance issues? In this video, see how Site24x7’s powerful widgets help you detect, diagnose, and resolve network issues faster.

View Video

Site24x7

Read more about Network custom dashboard video

Introducing Logz.io Open 360 AI: The Next Generation of Observability Is Here

Aug 11, 2025 By David Lotan Bolotnikoff In logz.io

Traditional observability tools can’t keep up with modern complexity. Dashboard and alert-based approaches still rely heavily on manual processes, resulting in longer troubleshooting cycles, slower decisions, and higher MTTR. Engineering teams need something better. Today we’re launching Open 360 AI, the first observability platform designed for both humans and AI agents working together.

Read Post

logz.io

Read more about Introducing Logz.io Open 360 AI: The Next Generation of Observability Is Here

How To Use Alloy and Hosted Graphite's Loki to Store and Visualize Logs

Aug 11, 2025 By Benjamin Pitts In MetricFire

In a modern DevOps environment, having just metrics or just logs is like trying to navigate with half a map because you’re missing important context that makes decisions faster and smarter. Metrics tell you what is happening (CPU spikes, request rates, failed logins) but logs tell you why it’s happening, with the timestamps to prove it.

Read Post

MetricFire

Read more about How To Use Alloy and Hosted Graphite's Loki to Store and Visualize Logs

Your APIs are up, but did the payment go through?

Aug 11, 2025 By Uptrends In Uptrends

If your challenger bank is built on composable core platforms like Mambu or Temenos, this one’s for you. Composable platforms enable API-first integration with modular services, letting you launch, adapt, and grow products quickly. That makes API health a top priority — and it shows in our State of API Reliability Report 2025 (we’ve pulled out the key fintech findings for APAC below).

Read Post

Uptrends

Read more about Your APIs are up, but did the payment go through?

Learn OpenTelemetry tracing through a grand strategy game: introducing Game of Traces

Aug 11, 2025 By Jay Clifford In Grafana

A trace always remembers! Okay, okay. I will try to keep my Game of Thrones references to a minimum throughout this post, but there is a lot of truth to that statement. In observability, a trace is the “when” and “where” of telemetry signals, allowing us to track the state of interactions between services within a microservice architecture. This makes traces the ideal observability signal for discovering bottlenecks and interconnection issues.

Read Post

Grafana

Read more about Learn OpenTelemetry tracing through a grand strategy game: introducing Game of Traces

Introducing Checkly Uptime Monitoring: A Fast and Affordable Way to Detect Infrastructure Downtime

Aug 11, 2025 By Checkly In Checkly

Learn more about Checkly, the application reliability platform designed for modern engineering teams! Discover how Checkly enables you to quickly detect, communicate, and resolve production issues and explore the newly added uptime monitoring features, including URL, TCP, and heartbeat monitors. Configure and manage your entire monitoring setup using monitoring as code!

View Video

Checkly

Read more about Introducing Checkly Uptime Monitoring: A Fast and Affordable Way to Detect Infrastructure Downtime

Why MikroTik VPS Is a Smart Choice for Network Monitoring and Management

Aug 11, 2025 By OpsMatters In OpsMatters

Managing complex, distributed networks is no longer optional; it's essential for business success. Such networks often serve remote offices and IoT deployments, and managing them without the right toolkit is a heavy burden, because uptime, security, and scalability all have to be secured without overspending. With a MikroTik VPS, you may be surprised at how these constant headaches can be handled successfully and with minimal effort, thanks to the features this technology offers.

Read Post

OpsMatters

Read more about Why MikroTik VPS Is a Smart Choice for Network Monitoring and Management

GPT-OOS: A Secure Step Forward, But Not a Free Pass

Aug 10, 2025 By Teneo In Teneo

The release of OpenAI’s new open-source model, GPT-OOS, has sparked a wave of excitement across the AI community. And rightly so. For organizations that want the benefits of generative AI without sending data out to the web, this is a compelling option. Running locally, GPT-OOS offers a level of privacy, control, and cost-efficiency that’s hard to ignore. It’s fast, lean, and, at least in its early benchmarks, surprisingly capable in coding, math, and STEM-heavy workloads.

Read Post

Teneo

Read more about GPT-OOS: A Secure Step Forward, But Not a Free Pass

How IT Leaders Can Successfully Adopt and Manage SaaS Solutions

Aug 10, 2025 By Ugo Orsi In Digitate

In recent months, there has been growing discussion among business and IT leaders around the rapid expansion of SaaS solutions. McKinsey’s recent report on the current state of SaaS notes that while the industry has experienced a slowdown, largely driven by economic factors such as rising interest rates and reduced IT spending by enterprises, it has seen a decade of rapid growth, with the market being valued at approximately $3 trillion in 2022.

Read Post

Digitate

Read more about How IT Leaders Can Successfully Adopt and Manage SaaS Solutions

The Ultimate Guide to Incident Management Tools in 2025

Aug 9, 2025 By Hrishikesh Barua In IncidentHub

Incident management tools play a key role in helping organizations to effectively handle service outages. With so many incident management tools around with different feature sets, it's often difficult to find the one that is right for your needs. In this article, we attempt to make a list of incident management software available in 2025 with their features to help you arrive at the right one. We have focused on tools that have incident management capabilities.

Read Post

IncidentHub

Read more about The Ultimate Guide to Incident Management Tools in 2025

VictoriaLogs Practical Ingestion Guide for Message, Time and Streams

Aug 8, 2025 By Phuong Le In VictoriaMetrics

This VictoriaLogs article serves as a quick way to grasp the core concepts of VictoriaLogs. It covers only the most important information from the documentation, along with common cases identified after troubleshooting many real-world scenarios. If you’re just getting started with VictoriaLogs, this is a great place to begin. For more in-depth or advanced details, refer to the official documentation.

Read Post

VictoriaMetrics

Read more about VictoriaLogs Practical Ingestion Guide for Message, Time and Streams

What Makes PHP Application Monitoring Tools Essential for Leading Industries?

Aug 8, 2025 By Pavithra Parthiban In Atatus

PHP is one of the most widely used scripting languages for web development. From e-commerce platforms to government portals, PHP powers a large share of the web. However, as web applications grow in complexity, user expectations also rise. Slow page loads, broken features, or unresponsive sites can lead to lost revenue, lower engagement, and frustrated users.

Read Post

Atatus

Read more about What Makes PHP Application Monitoring Tools Essential for Leading Industries?

Observing LlamaIndex Apps with OpenTelemetry + SigNoz

Aug 8, 2025 By Goutham Karthi In SigNoz

LlamaIndex has become a popular choice for building Retrieval-Augmented Generation (RAG) applications, helping developers seamlessly connect large language models with private or domain-specific data. But RAG workflows can be complex: slow retrieval times, irrelevant or inconsistent responses, and silent failures in the data pipeline can all degrade the user experience. That’s why observability is essential.

Read Post

SigNoz

Read more about Observing LlamaIndex Apps with OpenTelemetry + SigNoz

What is PHP memory leaks? How can you detect and resolve with APM?

Aug 8, 2025 By Mohana Ayeswariya J In Atatus

According to the 2025 PHP Trends Report, 31% of developers cited performance bottlenecks as a recurring issue and PHP memory leaks were among the top culprits identified by DevOps teams working with high-traffic applications. Imagine you're shipping an app that’s humming along smoothly during QA. But weeks after going live, you start noticing creeping latency and irregular job failures. You dig into the logs, tweak some queries, but the issue persists.

Read Post

Atatus

Read more about What is PHP memory leaks? How can you detect and resolve with APM?

How to use SQL to learn more about your Grafana usage

Aug 8, 2025 By Wilfried Roset In Grafana

Wilfried Roset is an engineering manager who leads an SRE team, and he is also a Grafana Champion. Wilfried currently works at OVHcloud, where he focuses on prioritizing sustainability, resilience, and industrialization to guarantee customer satisfaction. Grafana needs a database to store all its objects, such as users, dashboards, or even data sources. Each time a user creates a dashboard, it results in a new row created in the database.

Read Post

Grafana

Read more about How to use SQL to learn more about your Grafana usage

What Makes a Good Network LLM?

Aug 8, 2025 By Dallon Robinette In Selector

Large language models (LLMs) have transformed the way we interact with technology, impacting how we generate reports and documents, understand complex topics, and even how we search the internet. But in network operations, where every minute of downtime can mean lost revenue and productivity, a generic LLM isn’t enough.

Read Post

Selector

Read more about What Makes a Good Network LLM?

How to create Jira work items from SquaredUp alerts

Aug 8, 2025 By SquaredUp In Squared Up

A step-by-step walkthrough on how to create a Jira work item from SquaredUp alerts.

View Video

Squared Up

Read more about How to create Jira work items from SquaredUp alerts

Fix Application Performance Issues Fast With Splunk AlwaysOn Profiling

Aug 8, 2025 By Splunk In Splunk

In this video we’ll demonstrate how to use Splunk’s AlwaysOn Profiler to identify and fix a performance bug in a Java web application running on Kubernetes.

View Video

Splunk

Read more about Fix Application Performance Issues Fast With Splunk AlwaysOn Profiling

Exploring the log management dashboard in Site24x7

Aug 8, 2025 By ManageEngine Site24x7 In Site24x7

In this video, learn about Site24x7's Log Management Dashboard, why it's essential, and how to set it up.

View Video

Site24x7

Read more about Exploring the log management dashboard in Site24x7

What Is Network Jitter and How It Affects Your Connection: Causes, Tests and Solutions

Aug 8, 2025 By Isaac García In Pandora FMS

Streaming movies and series, VoIP, video conferencing, remote work, competitive gaming… the network shoulders ever more pieces of modern life, and it had better not fail; otherwise we end up like Michael Douglas in *Falling Down*. One of the issues that can make it fail is network jitter, which we’ll cover in depth here.

Read Post

Pandora FMS

Read more about What Is Network Jitter and How It Affects Your Connection: Causes, Tests and Solutions

Top tips: The secret to a better workday? It's in the little things

Aug 7, 2025 By Alsherin In ManageEngine

Top tips is a weekly column where we highlight what’s trending in the tech world and list ways to explore these trends. This week, we’ll see how fixing small inconveniences at work can make things easier and help us get more done. “It is often the small steps, not the giant leaps, that bring about the most lasting change.” – Queen Elizabeth II It's the little changes in life that bring lasting effects. Small, incremental improvements often add up to meaningful comfort over time.

Read Post

ManageEngine

Read more about Top tips: The secret to a better workday? It's in the little things

Migrating to Citrix Cloud Without Breaking the Business

Aug 7, 2025 By GripMatix In GripMatix

Not every migration is about rushing to the cloud. More often, it’s about timing, precision, and ensuring that end-users remain unaware of any underlying change. The goal isn’t just to modernize. It’s to do so without disrupting what’s already working. One of our customers, a global feed company with over 1,000 daily Citrix users, reached out to us for guidance.

Read Post

GripMatix

Read more about Migrating to Citrix Cloud Without Breaking the Business

What is Network Management?

Aug 7, 2025 By Greg Collins In WhatsUp Gold

International businesses and near-citywide college campuses require effective network management solutions to minimize downtime, optimize performance and strengthen cybersecurity. In summary, network management helps maintain the efficiency, reliability and security of a local and/or cloud-based network. However, developing a viable network management strategy requires an understanding beyond its actions.

Read Post

WhatsUp Gold

Read more about What is Network Management?

Using GreptimeDB as Prometheus Data Lake in Coroot

Aug 7, 2025 By Yiran Cui In Coroot

Coroot is excited to feature an editorial from the open source observability database GreptimeDB as an Open Source Spotlight. We hope to improve the work of our global community of SREs and DevOps professionals by sharing exciting projects like GreptimeDB, which make innovation accessible for everyone through the freedom of open source.

Read Post

Coroot

Read more about Using GreptimeDB as Prometheus Data Lake in Coroot

Get the Full Picture: AppSignal Adds OpenTelemetry Support

Aug 7, 2025 By Connor James In AppSignal

We're excited to officially launch our OpenTelemetry instrumentation. AppSignal can now extend its observability to a dozen popular languages, frameworks, and tools, giving customers the deep insights they need to monitor their entire stack. In this article, we'll show you how you can use AppSignal and OpenTelemetry to proactively monitor your app.

Read Post

AppSignal

Read more about Get the Full Picture: AppSignal Adds OpenTelemetry Support

Icinga DB Web Automation

Aug 7, 2025 By Sukhwinder Dhillon In Icinga

Icinga DB Web Automation allows you to automate monitoring tasks and integrate them directly into your systems and workflows. It is possible to issue command actions without a browser. To do so, a form needs to be submitted by a tool such as cURL. Every request you send follows the same permission rules and access restrictions defined in the web interface, so security and user roles still apply. Want to target specific hosts or services? Simply add filter parameters to the URL.

Read Post

Icinga

Read more about Icinga DB Web Automation

Introducing our new notification logs

Aug 7, 2025 By Freek Van der Herten In Oh Dear

One of the core features of Oh Dear is that we can notify you whenever we detect problems with one of your sites. Our notification system is quite powerful. We support many different channels (like email, Slack, Telegram, ... and a whole bunch more), and have fine-grained control over which events should trigger a notification. Today, we've added notification logs.

Read Post

Oh Dear

Read more about Introducing our new notification logs

What Your SD-WAN Isn't Telling You

Aug 7, 2025 By Yann Guernion In Broadcom

Your SD-WAN is constantly making decisions. It assesses path quality based on metrics like packet loss, latency, and jitter, and steers traffic for your most critical applications accordingly. For this, it is an indispensable technology. But have you ever paused to ask a fundamental question: Is the path it chooses truly the best one available, or just the best one it can see from its limited vantage point?

Read Post

Broadcom

Read more about What Your SD-WAN Isn't Telling You

How DX NetOps Topology Streamlines and Optimizes Triage

Aug 7, 2025 By Sandeep Tiwary In Broadcom

Every network operator knows the feeling: a critical alert fires, and suddenly it’s all hands on deck. But instead of jumping straight to resolution, you find yourself sifting through irrelevant alerts, flipping between tools, and trying to assemble a puzzle with missing pieces. In today’s high-stakes, hybrid environments, that kind of delay isn’t just frustrating—it’s costly. When issues arise, fast, intelligent triage is a must.

Read Post

Broadcom

Read more about How DX NetOps Topology Streamlines and Optimizes Triage

Hash, store, join: A modern solution to log deduplication with ES|QL LOOKUP JOIN

Aug 7, 2025 By Adrian Chen In Elastic

Storage reduction example on PowerShell logs with full context.

Read Post

Elastic

Read more about Hash, store, join: A modern solution to log deduplication with ES|QL LOOKUP JOIN

Deletion protection in Grafana Cloud: a simple way to safeguard your observability stack

Aug 7, 2025 By Jose Ignacio Gil Jaldo In Grafana

We’ve all had that “uh-oh” moment. You press Enter and your blood runs cold, as you realize you just deleted something critical. For engineering teams, this type of disaster takes many forms. For example, maybe you used a DELETE statement without a WHERE clause to delete a row in a database, and accidentally deleted all of them instead. To protect you from the accidental deletion of critical resources in Grafana Cloud, we’re introducing a feature called deletion protection.

Read Post

Grafana

Read more about Deletion protection in Grafana Cloud: a simple way to safeguard your observability stack

Powering What's Next: ScienceLogic's Vision for Intelligent, Outcome-Driven IT

Aug 7, 2025 By ScienceLogic In ScienceLogic

The observability market is changing rapidly. The days of simply collecting logs, metrics, and traces are giving way to something bigger: delivering actionable intelligence that actually connects IT operations with business goals. Organizations don’t just want to know what’s happening anymore; they need to understand why it’s happening, what actions to take, and whether their systems can respond independently.

Read Post

ScienceLogic

Read more about Powering What's Next: ScienceLogic's Vision for Intelligent, Outcome-Driven IT

How to Run a Page Speed Test with Uptime.com

Aug 7, 2025 By Uptime Website Monitoring In uptime

Quickly test your website’s load time and performance with Uptime.com’s Page Speed Check, powered by Google Lighthouse. Analyze key metrics like LCP, TBT, and CLS to optimize user experience and SEO. Customize tests by location, device, and throttling.

View Video

uptime

Read more about How to Run a Page Speed Test with Uptime.com

Common Unity errors and how to fix them

Aug 7, 2025 By Abuld D. In Sentry

Unity has a reputation for handing out surprises: the play-mode freeze just after a hot-reload, the sudden sea of pink materials, or the stack trace that politely reminds you your transform was null all along. Rather than letting those moments derail the rest of your sprint, this post rounds up four of the most common runtime offenders, and shows you exactly how to trigger, spot, and fix each one.

Read Post

Sentry

Read more about Common Unity errors and how to fix them

How to build reliable and accurate synthetic tests for your mobile apps

Aug 7, 2025 By Addie Beach In Datadog

Mobile applications offer increased flexibility to both users and developers. Users can access content on a wide range of devices, operating systems, and network types, while developers can leverage touch screens and orientation-based layouts to create more responsive features. However, all of these factors create new testing challenges. To ensure a good user experience (UX), developers have to test their apps across many device models and platforms, which can become costly and time-consuming.

Read Post

Datadog

Read more about How to build reliable and accurate synthetic tests for your mobile apps

Tracing asynchronous systems in your event-driven architecture: When to use parent-child vs. span links

Aug 7, 2025 By Candace Shamieh In Datadog

Asynchronous communication patterns are commonly used in distributed systems, especially in those that rely on events or messages to coordinate activity. Rather than responding to direct API calls like in a traditional request-response architecture, services in an asynchronous system produce, route, or consume events and messages independently.

Read Post

Datadog

Read more about Tracing asynchronous systems in your event-driven architecture: When to use parent-child vs. span links

Keep an eye on remote access to your Kubernetes infrastructure with Datadog Workload Protection

Aug 7, 2025 By Guillaume Fournier In Datadog

To improve efficiency and reduce cloud spending, teams frequently schedule pods on Kubernetes nodes dynamically, based on available resources. However, this practice has also introduced a new security challenge: The workloads maintained by a development team are now spread between Kubernetes nodes, exposing more hosts and increasing the blast radius when user credentials are compromised.

Read Post

Datadog

Read more about Keep an eye on remote access to your Kubernetes infrastructure with Datadog Workload Protection

Visualizing Logs Alongside Metrics: A Practical Use Case

Aug 7, 2025 By Benjamin Pitts In MetricFire

Security threats aren’t always loud and don’t always crash systems or trigger alarms. Sometimes they creep in quietly as a steady stream of unauthorized login attempts, slow brute-force probes, or unknown IPs scanning your server for vulnerabilities. These behaviors often show up in logs before they surface in metrics but if you're only watching logs or only tracking metrics, you're missing part of the story.

Read Post

MetricFire

Read more about Visualizing Logs Alongside Metrics: A Practical Use Case

AI-driven alert triage and root cause analysis (RCA) that proactively responds to production alerts

Aug 7, 2025 By Logz.io In logz.io

Watch AI transform alert management in real-time. This technical demonstration compares manual alert investigation with AI alert investigation. It shows how AI agents automatically investigate production alerts, correlate telemetry across distributed systems, and identify root cause, faster and with more insights than manual processes. Watch and learn how to shift your team from reactive firefighting to proactive system reliability management with agentic AI.

View Video

logz.io

Read more about AI-driven alert triage and root cause analysis (RCA) that proactively responds to production alerts

Getting started with Freshdesk dashboards

Aug 7, 2025 By Blog In Squared Up

Freshdesk is a popular incident management system known for its ease of use, robust ticketing system, and powerful automation capabilities as part of the Freshworks suite of tools. While Freshdesk comes with native reporting and dashboards, they can be limited in terms of customization and data correlation across different sources. Additionally, building complex visualizations in Freshdesk often requires more advanced knowledge of their reporting tools. This is where SquaredUp comes in!

Read Post

Squared Up

Read more about Getting started with Freshdesk dashboards

Weaponized AI vs. AI Driven Security Posture Management: Why the Battle Starts in Misconfigurations

Aug 6, 2025 By Teneo In Teneo

On August 5, 2025, at Black Hat 2025 in Las Vegas, Abnormal AI officially launched its Security Posture Management for Microsoft 365. This release marks a critical turning point. In an era where attackers weaponize AI to uncover and exploit misconfigured cloud environments at machine speed, reactive security simply can’t keep pace. Threat actors are now leveraging automated AI to scan systems, identify configuration drift, escalate privileges, and deploy zero‑day exploits in seconds.

Read Post

Teneo

Read more about Weaponized AI vs. AI Driven Security Posture Management: Why the Battle Starts in Misconfigurations

Balancing Speed and Safety with Continuous Delivery

Aug 6, 2025 By Luke Bond In InfluxData

The benefits of continuous delivery are well known these days: rapid feedback, speed of innovation, reduced fault recovery time, and increased confidence in release processes. Along the same lines, those who release less frequently are likely to encounter more stress. Continuous delivery is a spectrum; it doesn’t have to mean blasting every commit to all production environments at once. So, how do we strike a balance between speed and safety?

Read Post

InfluxData

Read more about Balancing Speed and Safety with Continuous Delivery

Boosting Session Replay performance on iOS with View Renderer V2

Aug 6, 2025 By Phil Niedertscheider In Sentry

After we made Session Replay GA for Mobile, adoption rose quickly and more feedback reached us. In less great news, our Apple SDK users reported that the performance overhead of Session Replay on older iOS devices made their apps unusable. So we set out to find the culprit and found a solution that yielded 4-5x better performance in our benchmarks.

Read Post

Sentry

Read more about Boosting Session Replay performance on iOS with View Renderer V2

Size-capped telemetry storage with ClickHouse and Coroot

Aug 6, 2025 By Nikolay Sivko In Coroot

Cloud platforms make it incredibly easy to store data. Object storage feels endless, and block volumes can be resized anytime. That’s great, until you check the cost. In some cases, like financial transactions, storage costs are tiny compared to the value of the data. But observability is a different story. Logs, traces, and profiles can be extremely detailed and often take up more space than the actual business data. Yes, there are situations where logs need to be kept for compliance reasons.

Read Post

Coroot

Read more about Size-capped telemetry storage with ClickHouse and Coroot

Network Visualization Tools: Key Features and Top 6 Tools in 2025

Aug 6, 2025 By Dallon Robinette In Selector

Network visualization tools are software applications that allow users to represent, explore, and analyze network structures graphically. These networks can include computer and telecommunication infrastructure, as well as social, biological, and organizational networks. Visualization is achieved by displaying nodes (entities) and edges (relationships), making complex datasets easier to interpret and manage.

Read Post

Selector

Read more about Network Visualization Tools: Key Features and Top 6 Tools in 2025

Manual vs. AI-Driven Alert Triage and RCA: Who Will Win?

Aug 6, 2025 By Seth King In logz.io

Curious to see how AI actually performs in a real-world production scenario? Watch the webinar “AI-Driven Alert Triage and RCA” with Logz.io Customer Success Engineer, Seth King. Below, we also share the main highlights of the webinar. AI claims to make engineers more efficient and agile by shortening processes and surfacing insights that help drive decisions.

Read Post

logz.io

Read more about Manual vs. AI-Driven Alert Triage and RCA: Who Will Win?

Log Format Standards: JSON, XML, and Key-Value Explained

Aug 6, 2025 By Faiz Shaikh In Last9

Your log format defines how your application records events. The structure you choose shapes how logs get parsed, indexed, and queried. It affects how quickly you can debug issues, build alerts, or control storage usage. In this guide, we'll take a look at the log formats developers typically use, the essential fields to include, and what trade-offs to consider before locking down a format for your system.

Read Post

Last9

Read more about Log Format Standards: JSON, XML, and Key-Value Explained

Prevent cloud misconfigurations from reaching production with Datadog IaC Security

Aug 6, 2025 By Roman Olynyk In Datadog

Modern infrastructure is built and deployed faster than ever, but increased speed can elevate risk. Developers who work on cloud-native applications often use infrastructure as code (IaC) to define cloud resources in configuration files, which are then shared across teams and deployed automatically. Although this approach is efficient, undetected misconfigurations in IaC can quickly introduce security risks into production environments.

Read Post

Datadog

Read more about Prevent cloud misconfigurations from reaching production with Datadog IaC Security

Introducing the Coralogix SLO Center

Aug 6, 2025 By Coralogix In Coralogix

Are you struggling to define reliability targets? Teams nowadays are turning to Service Level Objectives (SLOs), reliability targets that can be used to define how much you can play around with your systems before users are affected too much. While they're a great way of defining reliability targets, they are difficult to manage. That's why we built the SLO Center. One place to define, track, zoom into, and stay on top of all your reliability targets and error budgets - so you can be sure when you can experiment, and when it's best to stay safe.

View Video

Coralogix

Read more about Introducing the Coralogix SLO Center

AI Replay Summaries in Sentry Arrive!

Aug 6, 2025 By Sentry In Sentry

Replays in Sentry are awesome. With one property in your Sentry config you can start capturing video-like replays of user interactions with your application, but the problem is... you still have to watch them... but not anymore! AI replay summaries take your replays and run the events through an LLM to summarize the events that happened in them. They are broken up into chapters, with the breadcrumb sequences embedded in, so you can quickly get context on what's happening in every replay.

View Video

Sentry

Read more about AI Replay Summaries in Sentry Arrive!

3 Signs You've Outgrown Scripts and Spreadsheets for Network Configs

Aug 6, 2025 By ScienceLogic In ScienceLogic

In the early days of any IT operation, pragmatism rules. Most network teams start with what’s readily available—custom scripts, Excel spreadsheets, shared network drives, and tribal knowledge. It’s cost-effective and familiar. But as your organization grows, so does the complexity of your network. Devices multiply, configurations diversify, and the operational risk of keeping everything “stitched together” with manual methods increases exponentially.

Read Post

ScienceLogic

Read more about 3 Signs You've Outgrown Scripts and Spreadsheets for Network Configs

A guide to cloud unit economics

Aug 6, 2025 By David Lentz In Datadog

As you analyze your organization's cloud spending, you'll often find that stakeholders have different perceptions of what that spending brings you. This is especially true when overall costs are rising and it's hard to distinguish waste from valuable investments in growth. But when finance, engineering, and product teams can all connect cloud spending to specific business outcomes, you gain the ability to make data-driven decisions about how to maximize the value of that spending.

Read Post

Datadog

Read more about A guide to cloud unit economics

Nothing about today's Internet stays in one place... so why does your monitoring?

Aug 6, 2025 By Catchpoint In Catchpoint

Users are mobile. Apps are elastic. Traffic shifts constantly across clouds, ISPs, and geographies. Monitoring needs to adapt to that reality. You need visibility that moves with your users and your applications, wherever they go, however they connect. The Internet is now your application fabric. And your monitoring strategy should reflect that!

View Video

Catchpoint

Read more about Nothing about today's Internet stays in one place... so why does your monitoring?

Can External Data Predict System Failures?

Aug 6, 2025 By OpsMatters In OpsMatters

Something critical just went down. Again. So you troubleshoot and find out everything's clean - logs, metrics, nothing seems out of the ordinary. You didn't think to look out the window, right? Let's rewind a couple of hours. The temperature spiked 15 degrees outside, the humidity was at 90% and a storm came out of nowhere. Meanwhile, your edge device is sitting in a box on a pole somewhere; it never stood a chance.

Read Post

OpsMatters

Read more about Can External Data Predict System Failures?

Sponsored Post

AI realism (part one)

Aug 5, 2025 By JD Trask In Raygun

Emotions are running high about AI technologies. In this 2-parter, I do my best to make a rational case on the reality of AI, and how we can respond to it. This is part one; part two next week. We seem to be struggling to have pragmatic discussions about advancements in Artificial Intelligence. It's hard to hear calmer voices over the detractors and breathless enthusiasts. Today, I want to make a reasoned, evidence-based case for the potential of this technology, glance at present and future applications, and offer some practical examples for implementing AI within an organization.

Read Post

Raygun

Read more about AI realism (part one)

Pinpointing Logon Duration Issues with Precision: Game-Changing Enhancements in MetrixInsight for Citrix VAD/DaaS

Aug 5, 2025 By GripMatix In GripMatix

At GripMatix, we’re committed to giving IT teams deep, actionable visibility into their Citrix environments, going well beyond what’s available in native tools like Citrix Director or Monitor. With our latest update to the Citrix User Experience (UEX) Analyzer in MetrixInsight for Citrix VAD/DaaS, we’ve taken diagnostics and troubleshooting to the next level by introducing powerful new metrics and insights.

Read Post

GripMatix

Read more about Pinpointing Logon Duration Issues with Precision: Game-Changing Enhancements in MetrixInsight for Citrix VAD/DaaS

New Feature - Vulnerable System Drivers Monitoring

Aug 5, 2025 By Babu Sundaram In eG Innovations

Vulnerable system drivers continue to be a vector exploited by attackers to compromise systems. In eG Enterprise version 7.5 we added a number of periodic security checks to help administrators proactively identify weaknesses, including vulnerable system drivers monitoring. This new capability is supported on Windows when using a VM agent for inside view monitoring and/or when monitoring an Azure Virtual Desktop session host.

Read Post

eG Innovations

Read more about New Feature - Vulnerable System Drivers Monitoring

Leaning into AI, ML, and observability to manage your ever-growing infrastructure

Aug 5, 2025 By Ty Bekiares In Elastic

The complexity and scale of modern infrastructure require an equally intelligent set of observability tools to effectively monitor it. Remember when scaling meant ordering new servers and racking them in a data center? Remember when cloud providers first offered access to seemingly infinite virtual machines at the click of a button? Remember when Kubernetes made it trivial for infrastructure to automatically scale itself based on demand?

Read Post

Elastic

Read more about Leaning into AI, ML, and observability to manage your ever-growing infrastructure

How Business Leaders can Succeed with ValueOps Value Stream Management from Broadcom

Aug 5, 2025 By ValueOps by Broadcom In Broadcom

See how Camille and her team successfully leverage ValueOps throughout their entire organization, to help everyone stay informed, aligned and delivering efficiently.

View Video

Broadcom

Read more about How Business Leaders can Succeed with ValueOps Value Stream Management from Broadcom

Coralogix becomes first observability vendor to earn ISO/IEC 42001:2023 certification for responsible AI

Aug 5, 2025 By Coralogix Team In Coralogix

We’re proud to announce that Coralogix is now officially ISO/IEC 42001:2023 certified, becoming the first observability vendor to achieve this globally recognized standard for responsible AI management. ISO/IEC 42001:2023 is the world’s first international standard for Artificial Intelligence Management Systems (AIMS). It provides a comprehensive framework for how organizations should govern AI, focusing on transparency, ethical use, accountability, and regulatory compliance.

Read Post

Coralogix

Read more about Coralogix becomes first observability vendor to earn ISO/IEC 42001:2023 certification for responsible AI

Coralogix SLO Center & SLO Alerts are now available

Aug 5, 2025 By Chris Cooney In Coralogix

Coralogix has released a new flagship service management product, the SLO Center. The SLO Center allows customers to define service level objectives (SLOs) for their teams. SLOs can be defined across multiple services or metric streams. Powered by the Coralogix Streama engine, this unlocks full coverage SLOs for every team, regardless of volume and with very high cardinality limits.

Read Post

Coralogix

Read more about Coralogix SLO Center & SLO Alerts are now available

PostgreSQL Performance: Faster Queries and Better Throughput

Aug 5, 2025 By Faiz Shaikh In Last9

A PostgreSQL setup that performed well with 10,000 users starts to show strain at 100,000. Queries that once returned in under 50ms now take over 2 seconds. The connection pool regularly hits its limit during peak usage, leading to timeouts and degraded performance. This blog focuses on practical ways to reduce query latency by 50–80% and increase throughput for high-concurrency environments.

Read Post

Last9

Read more about PostgreSQL Performance: Faster Queries and Better Throughput

Getting started with MongoDB dashboards

Aug 5, 2025 By Blog In Squared Up

MongoDB is a popular NoSQL database used by many modern web applications. Once your web application is up and running, you might find you need to monitor the application data for operational purposes. For example, you may need to report on user sign-ups, or monitor for problems like invalid data. SquaredUp is an easy-to-use dashboard that plugs directly into your MongoDB database to visualize and monitor your data.

Read Post

Squared Up

Read more about Getting started with MongoDB dashboards

Patterns for safe and efficient cache purging in CI/CD pipelines

Aug 5, 2025 By Nicholas Thomson In Datadog

"There are only two hard things in Computer Science: cache invalidation and naming things."—Phil Karlton In the age of increasingly frequent deploys, edge caching, and Jamstack adoption, caching plays a key role across the software delivery life cycle. In build and CI pipelines, caching compiled assets or dependencies helps reduce compute costs, speed up job runtimes, and lower the environmental impact (regarding energy usage) of repeated builds.

Read Post

Datadog

Read more about Patterns for safe and efficient cache purging in CI/CD pipelines

New in Grafana Alerting: a faster, more scalable way to manage your alerts in Grafana

Aug 5, 2025 By Alejandro Fraenkel In Grafana

Effective alerting is the backbone of any observability strategy. But as your systems grow, managing hundreds or even thousands of rules can become a significant challenge. And when something goes wrong, the last thing you want is to fight with your tooling. That’s why we’re thrilled to announce the launch of our brand new alert rules list page, which we built to provide a faster, more intuitive, and scalable experience for teams of all sizes!

Read Post

Grafana

Read more about New in Grafana Alerting: a faster, more scalable way to manage your alerts in Grafana

Goodput vs Throughput: The Differences and How They Affect Your Network

Aug 5, 2025 By Andrii Kernitskyi In Obkio

Two key metrics that often come up in discussions about network performance are throughput and goodput. While these terms may seem similar, they highlight different aspects of your network’s efficiency, and misunderstanding them can lead to poor decisions that affect how you manage your network and your business’s resources.

Read Post

Obkio

Read more about Goodput vs Throughput: The Differences and How They Affect Your Network

Resilience with Zero Data Loss in High-Volume Telemetry Pipelines with OpenTelemetry and Bindplane

Aug 5, 2025 By Andy Keller In ObservIQ

This was the problem one Bindplane customer had with processing enormous S3-stored log files. Our engineering team tackled the problem head-on, enhancing the S3 event receiver with offset tracking and chaos testing methodologies.

Read Post

ObservIQ

Read more about Resilience with Zero Data Loss in High-Volume Telemetry Pipelines with OpenTelemetry and Bindplane

Secure by Design: IT Modernization for Government

Aug 5, 2025 By ScienceLogic In ScienceLogic

As government agencies modernize IT infrastructure, many are shifting to hybrid and multicloud environments. But this evolution brings heightened exposure to cyber threats. For the public sector, where data protection is tied to national security and public trust, compliance is more than a box to check—it’s the front line of defense. FedRAMP (Federal Risk and Authorization Management Program) provides a standardized framework for securing cloud services used by U.S. agencies.

Read Post

ScienceLogic

Read more about Secure by Design: IT Modernization for Government

Ten Minute Troubleshooting: Meet (and Monitor) Users Where They Are

Aug 5, 2025 By Catchpoint In Catchpoint

What do you do if your monitoring, APM, and synthetic tools tell you an application is up, but the users say it’s not? A good first question is to ask where your monitoring tools are located relative to both the users and the application itself. In this episode Mursi helps Leon identify his “red-light, green light” issue and adjust his monitoring to do a better job showing the REAL user’s experience.

View Video

Catchpoint

Monitoring

Read more about Ten Minute Troubleshooting: Meet (and Monitor) Users Where They Are

What's New at Catchpoint, episode 2

Aug 5, 2025 By Catchpoint In Catchpoint

This month, Leon breaks down the new Internet Resilience Report, enhancements to Catchpoint’s API monitoring suite, and some of the reasons for replacing Selenium with Playwright and Puppeteer as Catchpoint’s web test scripting tool. In the “New to You” segment, he reviews SSL certificate monitoring.

View Video

Catchpoint

Read more about What's New at Catchpoint, episode 2

Behind the Dashboard - Catchpoint Traceroute

Aug 5, 2025 By Catchpoint In Catchpoint

Behind the Dashboard is an ongoing series where we look under the hood of a specific Catchpoint feature. Each episode breaks down the technology itself, what’s challenging about using it for monitoring, and how we removed friction and toil to make it a valuable part of the Catchpoint platform. In this episode Leon, Brandon, and Sergey take a look at “traceroute” tests – a feature that may seem humble and unassuming, but has unexpected power and utility when it comes to identifying performance issues with your site, service, or application.

View Video

Catchpoint

Read more about Behind the Dashboard - Catchpoint Traceroute

Ten Minute Troubleshooting: BGP Black Holes

Aug 5, 2025 By Catchpoint In Catchpoint

When a website stops working all of a sudden, it could be down, or it could be something more complex. In this episode Sheldon jumps in to help Leon figure out what’s going wrong with his livestream, where they discover an unexpected black hole.

View Video

Catchpoint

Monitoring

Read more about Ten Minute Troubleshooting: BGP Black Holes

Making LLMs Observable : AI, OpenTelemetry and Rise of MCP

Aug 5, 2025 By SigNoz - Open Source Observability Platform In SigNoz

Making LLMs Observable : AI, OpenTelemetry and Rise of MCP.

View Video

SigNoz

Read more about Making LLMs Observable : AI, OpenTelemetry and Rise of MCP

Top 5 EdTech outages detected by StatusGator in July 2025

Aug 4, 2025 By Colin Bartlett In StatusGator

July 2025 saw several significant service disruptions affecting the education technology (EdTech) ecosystem. From online learning platforms to creative tools used by teachers and students, these outages caused widespread frustration. StatusGator monitored and detected these incidents, providing early alerts to help schools and organizations stay informed.

Read Post

StatusGator

Read more about Top 5 EdTech outages detected by StatusGator in July 2025

Top 5 outages detected by StatusGator in July 2025

Aug 4, 2025 By Colin Bartlett In StatusGator

Throughout July 2025, StatusGator detected several major outages impacting millions of users worldwide. From messaging services to satellite internet, these incidents disrupted critical tools and workflows. Here are the top five outages we monitored this month.

Read Post

StatusGator

Read more about Top 5 outages detected by StatusGator in July 2025

Domain Expiry and Its Impact on SEO: How to Monitor and Prevent Lapses

Aug 4, 2025 By Super Monitoring In Super Monitoring

Your domain name is your digital real estate. It is how customers find you, search engines rank you, and your brand builds trust in the digital world. Whether you run a small blog, an e-commerce store, or a large business, your domain is the foundation of your online activities. But what happens if you forget to renew it? A domain expiry can cause your site to go offline. It can also hurt your SEO rankings and affect your website traffic.

Read Post

Super Monitoring

Read more about Domain Expiry and Its Impact on SEO: How to Monitor and Prevent Lapses

The Outage You Can't Afford: Why CMI/CME Providers Need Autonomous Operations Now

Aug 4, 2025 By Matt Belanger In Digitate

Imagine if degrading network performance—not just bad code—disrupted your live stream during a high-profile event. Customers start flooding support lines. Social media lights up. Your NOC teams scramble to identify the root cause amid fragmented systems. The outage impacts not only your broadcast, but also subscriber logins, ad delivery, and mobile apps. Advertisers want refunds. Executives ask, “Why didn’t we see this coming?”

Read Post

Digitate

Read more about The Outage You Can't Afford: Why CMI/CME Providers Need Autonomous Operations Now

Introducing Cribl Guard

Aug 4, 2025 By Perry Correll In Cribl

Does sensitive data flowing through your network feel like a ticking time bomb? Well, it just might be. Legal mandates, security frameworks, and customer expectations have made the stakes higher than ever. One leaked spreadsheet of personally identifiable information (PII) can wipe out years of customer trust, rack up regulatory fines, and invite ransomware actors to your doorstep.

Read Post

Cribl

Read more about Introducing Cribl Guard

Save Hours on Troubleshooting with Automated Investigations

Aug 4, 2025 By Shyam Sreevalsan In netdata

How many times has your team stared at a dashboard, pointed to a spike, and asked a question that charts alone can’t answer? “What was the real impact of that deployment?” “Why are our Kubernetes pods in the us-east-1 cluster suddenly crashing?” “Are we wasting money on overprovisioned servers?” Answering these questions is the real work of operations and SRE.

Read Post

netdata

Read more about Save Hours on Troubleshooting with Automated Investigations

OnlineOrNot updates from June/July 2025

Aug 4, 2025 By Max Rozen In OnlineOrNot

In this latest update, I'll walk you through a few features I added that will make working with uptime checks less noisy, an alerts integration with Teams, and a few behind-the-scenes changes that will finally let me build mobile apps for OnlineOrNot.

Read Post

OnlineOrNot

Read more about OnlineOrNot updates from June/July 2025

What's the easiest way to check my website's uptime?

Aug 4, 2025 By Laurens Goethals In Oh Dear

Whether you're keeping a personal blog or managing a corporate site or online storefront, website downtime can cost money and can damage your reputation. Let alone when you're maintaining a bunch of different client sites. And while downtime can't always be prevented, it's really easy to at least keep track of things, and diagnose potential issues from there. So, let’s start with the easy part.

Read Post

Oh Dear

Read more about What's the easiest way to check my website's uptime?

What is Observability? (in 60 seconds)

Aug 4, 2025 By Coroot In Coroot

🐧🐝 Try Coroot fully #FOSS and check out the latest open source observability tips on our blog: https://t.ly/qBH9f

View Video

Coroot

Read more about What is Observability? (in 60 seconds)

140x Cheaper than Datadog: Observability for everyone

Aug 4, 2025 By Coroot In Coroot

🐧🐝 Try Coroot fully #FOSS and check out the latest open source observability tips on our blog: https://t.ly/qBH9f

View Video

Coroot

Read more about 140x Cheaper than Datadog: Observability for everyone

The MSP's DNS Security Checklist

Aug 4, 2025 By DNS Spy In DNS Spy

DNS is one of the most important and most overlooked layers in your client’s infrastructure. As an MSP, you’re often the one who gets blamed when something breaks—whether you control the DNS or not. And while many DNS problems are silent, their consequences are loud: email failures, website outages, and frustrated clients. This DNS security checklist will help you proactively identify and fix DNS risks across all your client domains.

Read Post

DNS Spy

Read more about The MSP's DNS Security Checklist

The Platform Engineer's Playbook: Mastering OpenTelemetry & Compliance with Mezmo and Dynatrace

Aug 4, 2025 By Mezmo In Mezmo

The rise of platform engineering has put a new team at the center of the developer experience. These teams are tasked with building the "paved road" for developers, which includes providing a robust, self-service observability stack. However, they face a dual mandate: provide a great developer experience and manage the ever-growing costs and complexity of the tools involved.

Read Post

Mezmo

Read more about The Platform Engineer's Playbook: Mastering OpenTelemetry & Compliance with Mezmo and Dynatrace

July product updates

Aug 4, 2025 By Colin Bartlett In StatusGator

July was a busy month here at StatusGator! We rolled out several highly requested integrations and a few handy improvements to make staying on top of service issues even easier. Here’s what’s new.

Read Post

StatusGator

Read more about July product updates

Building on the foundation of OpenTelemetry eBPF Instrumentation: what's new in Grafana Beyla 2.5

Aug 4, 2025 By Marc Tuduri In Grafana

Earlier this year, Grafana Labs donated Grafana Beyla — our open source eBPF-based, zero-code instrumentation tool — to OpenTelemetry under the new project name OpenTelemetry eBPF Instrumentation. In addition to reflecting our deep and long-standing commitment to the OpenTelemetry project, the donation marked a significant milestone in the evolution of zero-code eBPF instrumentation within the open source community at large.

Read Post

Grafana

Read more about Building on the foundation of OpenTelemetry eBPF Instrumentation: what's new in Grafana Beyla 2.5

What are Application Metrics?

Aug 4, 2025 By Anjali Udasi In Last9

Application metrics are structured, quantifiable signals that reflect how your software behaves in production. They capture key aspects of performance, response times, error rates, throughput, and resource usage, giving you a real-time view into the health of your system. Tracking the right metrics helps detect regressions early, surface latent issues before they impact users, and guide optimization decisions based on hard data, not guesswork.

Read Post

Last9

Read more about What are Application Metrics?

Tracking Safety: The Role of Mobile Monitoring in Protecting Vulnerable Family Members

Aug 4, 2025 By OpsMatters In OpsMatters

It's never been easier to stay connected with the people you care about. Thanks to smartphones and GPS technology, families now have powerful tools to protect their loved ones, whether they're across town or across the country. But these same tools raise important questions: how much should we monitor, and when is it necessary? Let's explore how mobile tracking can help safeguard the most vulnerable members of our families, from kids to grandparents, and how to use it responsibly.

Read Post

OpsMatters

Read more about Tracking Safety: The Role of Mobile Monitoring in Protecting Vulnerable Family Members

Netdata Now Troubleshoots Your Alerts for You

Aug 3, 2025 By Shyam Sreevalsan In netdata

The 2 AM pager alert. For anyone in Ops, SRE, or IT administration, those words trigger a familiar sense of dread. An alert has fired. Is it a real fire, or another false alarm waking you from a dead sleep? The pressure is on. Every minute of downtime costs money and reputation, but troubleshooting a complex system when you’re sleep-deprived is a Herculean task.

Read Post

netdata

Read more about Netdata Now Troubleshoots Your Alerts for You

How We Think About "Developer Marketing" at SigNoz

Aug 3, 2025 By Ankit Anand In SigNoz

“Developers hate marketing.” Do they, really? I often hear this thrown around on podcasts about DevTools marketing, and while it’s true that developers don’t respond to the same old marketing tactics, they do respond to genuine communication. The reason developers are hard to “market” to is that they are also the builders of the stuff you want to sell.

Read Post

SigNoz

Read more about How We Think About "Developer Marketing" at SigNoz

Incident Commander Role: Responsibilities and Best Practices

Aug 3, 2025 By Nuno Tomas In isDown

When a critical system goes down at 3 AM, the difference between a quick resolution and hours of costly downtime often comes down to one role: the incident commander. This person serves as the central coordinator during IT incidents, making crucial decisions that can save thousands of dollars per minute.

Read Post

isDown

Read more about Incident Commander Role: Responsibilities and Best Practices

Automated Seer in Under 2 Minutes

Aug 1, 2025 By Sentry In Sentry

What if you had 5 errors, and instead of coming back to 5 issues in your feed, you got 5 pull requests fixing them? Seer is Sentry's new AI debugging agent. It's able to stitch together all the context from your logs, stack traces, distributed tracing, codebase, and issues and figure out what broke, where, and how to fix it. Seer automation lets you automate that flow and end up with a nice PR waiting for you to merge if it looks good. Check it out!

View Video

Sentry

Read more about Automated Seer in Under 2 Minutes

What Are Packet Bursts: Causes, Fixes & How to Find Them

Aug 1, 2025 By Andrii Kernitskyi In Obkio

Have you ever been in the middle of an important video call, only for it to glitch or freeze out of nowhere? Or did an application suddenly slow down right when you needed it most? These frustrating moments can often be caused by something hidden in the background: packet bursts. But what exactly are packet bursts, and why do these sudden surges in data traffic catch you off guard when your network seems steady? Are they just random spikes in the data flow, or is there something deeper causing them?

Read Post

Obkio

Read more about What Are Packet Bursts: Causes, Fixes & How to Find Them

Applying AI/ML in Observability - Tech Talk #7

Aug 1, 2025 By VictoriaMetrics In VictoriaMetrics

Ready to master anomaly detection? Join us for Part 2 of our "Applying AI/ML in Observability" series, where we do a deep dive into vmanomaly! In this live stream, Mathis and Marc will be joined by a very special guest: Fred Navruzov, the lead developer and mastermind behind VictoriaMetrics' vmanomaly. If you want to move beyond the basics and unlock the full potential of AI-driven observability, this is a session you can't afford to miss.

View Video

VictoriaMetrics

Read more about Applying AI/ML in Observability - Tech Talk #7

Jaeger Monitoring: Essential Metrics and Alerting for Production Tracing Systems

Aug 1, 2025 By Anjali Udasi In Last9

Your Jaeger setup is running. Traces are coming in, and the UI is helping you spot slow services or debug broken flows. But just like any part of your observability stack, Jaeger needs some basic monitoring to stay reliable. If the collector starts queueing spans or the agent runs out of buffer, it can lead to dropped traces, sometimes without any obvious sign in the UI. This blog focuses on the operational side of Jaeger.

Read Post

Last9

Read more about Jaeger Monitoring: Essential Metrics and Alerting for Production Tracing Systems

SLF4J and Log4j - Understanding the Differences

Aug 1, 2025 By Loggly Team In SolarWinds

Good logging isn’t optional when building Java applications—it’s critical. Logs are often the first place we turn to when something breaks and are essential for performance tuning, security audits, and long-term maintainability. Two names come up in the Java logging conversation: Simple Logging Facade for Java (SLF4J) and Log for Java (Log4j). They sound similar and often work together, but they serve distinct roles.

Read Post

SolarWinds

Read more about SLF4J and Log4j - Understanding the Differences

Explore the NiCE MariaDB Management Pack in Action 2025Q3

Aug 1, 2025 By NiCE IT Management Solutions In NiCE IT Mgmt

If you’re running critical MariaDB workloads and need reliable, performance-focused monitoring, this session is for you. You’ll get a live walkthrough of the Management Pack, learn how it integrates seamlessly into SCOM, and explore real-world use cases to improve your database monitoring strategy.

View Video

NiCE IT Mgmt

Read more about Explore the NiCE MariaDB Management Pack in Action 2025Q3

Librato on Heroku is Going Away and Hosted Graphite Is the Better Next Step

Aug 1, 2025 By Benjamin Pitts In MetricFire

Librato (a SolarWinds product) is being sunset in summer 2025, and that directly affects Heroku teams who’ve relied on the Librato add-on for “good enough” visibility into dynos, routers, and Postgres. If you’re in that group, you’ll need a replacement monitoring add-on that keeps you covered on Heroku and lets you grow beyond it without re-architecting how you ship metrics.

Read Post

MetricFire

Read more about Librato on Heroku is Going Away and Hosted Graphite Is the Better Next Step

Securing the Invisible: Why Ambient AI Needs Next-Gen Security

Aug 1, 2025 By Teneo In Teneo

If, like me, you’re continuously striving to keep pace with the ever-evolving world of artificial intelligence, you’re probably hearing a lot about how Ambient AI is poised to dominate discussions and developments throughout the second half of 2025. Ambient AI refers to artificial intelligence systems that operate unobtrusively in the background of our daily environments, constantly sensing, analyzing, and responding to various inputs without explicit human interaction.

Read Post

Teneo

Read more about Securing the Invisible: Why Ambient AI Needs Next-Gen Security

Selector MCP and the Future of Modular Automation

Aug 1, 2025 By John Capobianco In Selector

In the first two parts of this series, we explored why modern network operations demand intelligent automation and how AI agents can reason, act, and collaborate to solve complex problems. We examined the frameworks – such as ReACT, LangGraph, and Pydantic – that power these agents, and how the Model Context Protocol (MCP) facilitates seamless integration with tools and services. But theory alone doesn’t improve network uptime or reduce manual toil.

Read Post

Selector

Read more about Selector MCP and the Future of Modular Automation

Top Tools for Monitoring Crypto Market Movements in 2025

Aug 1, 2025 By OpsMatters In OpsMatters

Crypto moves fast, and trading happens around the clock. Prices react in real time, instantly responding to everything from online chatter to global economic shifts. As a result, staying informed isn't just helpful, it's necessary in 2025.

Read Post

OpsMatters

Read more about Top Tools for Monitoring Crypto Market Movements in 2025
