Operations | Monitoring | ITSM | DevOps | Cloud

April 2023

Cloud Capacity Planning Is a Hit-or-Miss Exercise That Mostly Misses

The goal of capacity planning is to match resources with demand. There are essentially three outcomes from this analysis. You can underestimate the resources you need (underprovision), which can hurt performance. You can overestimate (overprovision), which adds unnecessary costs. Or you can get it just right (rightsized). And, of course, you want to be rightsized at the lowest possible cost. Because many factors go into cloud capacity planning, it can feel like more of an art than a science.

From Monolithic to Microservices: Code Instrumentation Trends

Software architectures are greatly influenced by the size and scale of the software applications. With growing size, the code base becomes complex. With scale, the deployment becomes challenging. The result: debugging becomes an increasingly time-consuming process for developers.

How Digital Accessibility Solutions Can Benefit Your Website

As the digital world continues to expand and evolve, it's becoming increasingly important for businesses to consider accessibility when designing their online presence. With this comes a variety of digital accessibility solutions that can help ensure everyone on the web can access your content with ease. Understanding what these solutions are, the benefits they offer, and how you can incorporate them into your website is key to ensuring an inclusive user experience that caters to a wide range of visitors.

Reducing Log Volume with Log-based Metrics

As the amount of telemetry being collected continues to grow exponentially, businesses are continuously seeking cost-effective ways to monitor and analyze their systems. Data collection and monitoring can be expensive, especially when dealing with large volumes of logs. One approach to maintaining visibility while reducing the amount of data collected is through creating log-based metrics.

Monitoring: The Rise of Data Observability

There’s an increasingly high cost to poor data quality today. Poor customer data costs companies six percent of their total sales, as per a UK Royal Mail study. And, as per IBM, bad data costs U.S. businesses $3.1 trillion per year. As companies transform into data-driven businesses, we witness a sharp interest in data observability.

Five worthy reads: Multi-cloud strategy in the digital era

Five worthy reads is a regular column on five noteworthy items we have discovered while researching trending and timeless topics. This week we are exploring the multi-cloud strategy and why it's the next big thing in cloud computing. In an age where digital innovations happen at breakneck speed, the cloud has become crucial to every enterprise.

Sponsored Post

What is Platform Engineering and Why Does It Matter?

In the era of cloud-native development, as businesses rely on a growing number of software tools to enable agile application delivery, platform engineering has emerged as a crucial discipline for building the technology platforms that drive DevOps efficiency. In this blog post, we explain the growing importance of platform engineering in high-performance DevOps organizations and how platform teams enable DevOps efficiency, agility, and productivity.

The Art of Proactive Network Monitoring: Stay Ahead of the Game

Imagine you're playing a game of chess, but instead of reacting to your opponent's moves, you can see five moves ahead and plan your strategy accordingly. That's the power of proactive network monitoring. Instead of waiting for issues to arise and reacting to them, you can stay several steps ahead and keep your network running smoothly. In this post, we'll explore the art of proactive network monitoring and how it can help you stay ahead of the game.

Whisper Data Migration to the Cloud

In the modern business landscape, the recent surge in cloud computing has become a game-changer, fundamentally altering how organizations manage their IT infrastructure. As businesses increasingly embrace digital transformation, migrating services and applications to the cloud has emerged as a crucial factor in guaranteeing scalability, flexibility, and cost efficiency.

Technical Education Can Drive Long Term Success for Organizations

Technology drives so many aspects of our lives in today’s fast-paced world. Keeping up with how technology helps us in our work and personal lives is challenging, and it's sometimes unclear how to go about learning new technology skills to drive business resilience across an organization.

Feature Spotlight: Dynamic Kubernetes Observability Dashboards

If you're a software engineer working with Kubernetes, you know how vital it is to have accurate, real-time information about your applications and resources. With StackState's dynamic Kubernetes observability dashboards, you can now access all the essential data you need for troubleshooting on a single screen. In this blog post, we'll discuss the key features of these dashboards, why they're valuable and how to get started with them.

Top DNS Monitoring Tools

The modern business heavily depends on an active online presence for customer engagement and revenue generation. But have you ever stopped to think about what makes these online services accessible to users around the world? Enter the Domain Name System (DNS), a crucial component of the internet infrastructure that translates domain names into IP addresses and directs traffic to the appropriate server.

Plexporters, Energize: How we monitor Plex with Grafana

As a Grafanista, you tend to find things to visualize — databases, microservices, classic video games, etc. It’s part of our “big tent” philosophy. So when our December hackathon rolled around, some of us in our internal homelab Slack channel decided to take a look at how we could get metrics out of our Plex Media Servers.

Embracing Observability with InfluxDB 3.0: Unlimited Cardinality and Native SQL Support

As the complexity of modern applications continues to increase, so too does the demand for comprehensive observability solutions. Organizations looking to enhance their applications’ performance, reliability, and scalability need powerful tools that allow them to monitor, analyze, and visualize their infrastructure. One such tool is InfluxDB 3.0, a time series database designed to handle large-scale monitoring and analytics workloads.

Root cause analysis with logs: Elastic Observability's AIOps Labs

In the previous blog in our root cause analysis with logs series, we explored how to analyze logs in Elastic Observability with Elastic’s anomaly detection and log categorization capabilities. Elastic’s platform enables you to get started on machine learning (ML) quickly. You don’t need to have a data science team or design a system architecture. Additionally, there’s no need to move data to a third-party framework for model training.

Data-Driven Decision Making: Leveraging Metrics and Logs-to-Metrics Processors

In modern business environments, where everything is fast-paced and data-centric, companies need to be able to track and analyze data quickly and efficiently to stay competitive. Metrics play a crucial role in this, providing valuable insights into product performance, user behavior, and system health. By tracking metrics, companies can make data-driven decisions to improve their product and grow their business.

ManageEngine ServiceDesk Plus awarded SERVIEW CERTIFIEDTOOL Seal of Quality for 13 ITIL 4 practices

We are delighted to announce that ServiceDesk Plus, our IT service management (ITSM) platform, has received certification for 13 ITIL 4 practices from SERVIEW GmbH, a leading IT management consulting, training and certification firm. SERVIEW is an independent management consultancy that makes solutions comparable and specializes in maximizing the performance of service organizations utilizing world-renowned best management practices.

5 Ways to Use Log Analytics and Telemetry Data for Fraud Prevention

As fraud continues to grow in prevalence, SecOps teams are increasingly investing in fraud prevention capabilities to protect themselves and their customers. One approach that’s proved reliable is the use of log analytics and telemetry data for fraud prevention. By collecting and analyzing data from various sources, including server logs, network traffic, and user behavior, enterprise SecOps teams can identify patterns and anomalies in real time that may indicate fraudulent activity.

Log monitoring and unstructured log data, moving beyond tail -f

Log files and system logs have been a treasure trove of information for administrators and developers for decades. But with more moving parts and ever more options on where to run modern cloud applications, keeping an eye on logs and troubleshooting problems have become increasingly difficult. Watch this video to learn how to go beyond tail -f and process custom and unstructured logs with Elastic.

Release features faster and track their impact with Flagsmith's integration and Datadog Marketplace offering

The pressure to release application features faster to meet the demands of customers presents a number of challenges, including unforeseen deployment delays, custom feature sets, and complex rollbacks when errors occur. To overcome these challenges, developers can use Flagsmith, an open source feature flagging and remote configuration service that allows developers to easily roll out and test new features for a specific subset of users.

Track and improve the performance of streaming data pipelines with Datadog Data Streams Monitoring

When managing queues and services in streaming data pipelines that use technologies like Kafka and RabbitMQ, SREs and application developers often struggle to determine if these pipelines are performing as expected. Visibility into the performance of a streaming data pipeline, after all, requires visibility into every component of that pipeline.

Monitor your TeamCity builds with Datadog CI Visibility

As the complexity of modern software development lifecycles increases, it’s important to have a comprehensive monitoring solution for your continuous integration (CI) pipelines so that you can quickly pinpoint and triage issues, especially when you have a large number of pipelines running.

What's New in Sysdig - March & April 2023

This month, Sysdig Secure’s Container Registry scanning functionality became generally available for all users. This functionality provides an added layer of security between the pipeline and runtime scanning stages. On Sysdig Monitor, we introduced a feature to automatically translate Metrics alerts in form-based query to PromQL. This allows you to choose between the convenience of form and the flexibility of PromQL.

Observable Frontends: the State of OpenTelemetry in the Browser

The modern standard for observability in backend systems is: distributed traces with OpenTelemetry, plus dynamic aggregations over these events. This works very well in the world of web servers. But what about the web client? This post describes the state of OpenTelemetry support for React web clients, as of early April 2023.

Observability: A Complete Guide

As technology advances, so does the need for software engineers and DevOps teams to understand the precise inner workings of the systems they create. In 2023, observability is quickly becoming a key factor in gaining success for many businesses. The study states that many businesses are in different stages of adopting observability into their arsenal of tools, and that the need for these practices is on the rise.

Live from KubeCon: Insider Insights with CNCF's Head of Ecosystem

We’ve just completed KubeCon + CloudNativeCon Europe 2023 in Amsterdam, one of the signature events of the year in the cloud native, open source and observability spaces. I was thrilled to be joined by Taylor Dolezal, the Head of Ecosystem for the hosting Cloud Native Computing Foundation (CNCF), to discuss the insider happenings of KubeCon EU live from Amsterdam for the April 2023 OpenObservability Talks podcast.

SCOM Authoring: How to use the module System.ExpressionFilter

System Center Operations Manager, SCOM, is a powerful tool for monitoring IT infrastructure, but sometimes the default monitoring criteria are not optimal to fit your specific needs. That is where the System.ExpressionFilter module comes in - it allows developers to create customized monitoring criteria tailored to fit the requirements of their particular organization.

Grafana k6 v0.44.0 release: web crypto API, Web Vitals metrics, and more!

Grafana k6 v0.44.0 has been released, featuring new experimental modules, an upgraded browser module, and tons of improvements. Here’s a quick overview of the latest k6 news from the team and the community.

Stop Viewing Cybersecurity as an Expense

Nine. Million. Dollars. Well, $9.44 million to be exact for your average data breach according to the latest report from IBM, Cost of a Data Breach Report 2022. From 2017 to 2022, that number has only continued increasing from $7.35 million, an almost 30% increase in just five years. For a small company, a security breach can be the difference between staying open or closing the business. And for a Fortune 500 enterprise, that cost will be more severe.

The Hybrid Workplace is Here to Stay: New Study Reveals Hybrid Work Trends and How to Enable Resilience

For years, many organizations have hesitated to build a monitoring strategy for Workforce Experience, focusing on their consumer counterparts instead. A recent IDC Spotlight, sponsored by Catchpoint, suggests those days are numbered. In this blog, we will explore the key takeaways from the IDC Spotlight on hybrid work trends, entitled Moving Toward a Hybrid-First Organization with Seamless Connectivity and Collaboration.

RabbitMQ Monitoring: Metrics, Tools, and Best Practices

RabbitMQ is a widely-used open-source messaging broker that facilitates message-oriented middleware for modern applications. It serves as a reliable and scalable mechanism for managing communication channels and messages between distributed applications. However, as modern distributed systems become increasingly complex, monitoring RabbitMQ's performance, availability, and other critical metrics becomes crucial for seamless operations. This is where RabbitMQ monitoring metrics come into play.

The 5Ws (and 1H) of InfluxDB Cloud Dedicated

Just like the classic Scott Bakula TV series, the new InfluxDB 3.0 is a quantum leap forward. Of course, for us it’s the evolution of the InfluxDB product suite. InfluxDB 3.0 is the designation for all products powered by the InfluxDB IOx engine. The latest product release in this new suite is InfluxDB Cloud Dedicated. Let’s jump into the basics for InfluxDB Cloud Dedicated. WHO: There are several different groups of users that should consider using InfluxDB Cloud Dedicated.

Building digital trust and fueling growth through application security

Security awareness is at an all-time high. Companies need the right tools to support innovation while building the digital trust that users demand. Learn how Cisco Secure Application can help solve this challenge. Security awareness skyrockets with every breach. In response, users are doubling down on vetting the trustworthiness of companies before transacting.

Mobile Monitoring: What is it, and is it Legal?

Mobile monitoring is a complex topic. On one hand, organizations can gather detailed data to help them identify and suggest productivity improvements through alternative or additional apps or processes, without sacrificing security. But on the other hand, monitoring done wrong can reduce employee morale and potentially violate data privacy laws. Exactly how companies should approach mobile app monitoring depends on the business problems they must solve.

10 Best Application Performance Monitoring Tools to Keep Your Apps in Top Shape

In today's digital world, applications are the backbone of many businesses. From web-based software to on-premises applications, they allow organizations to streamline their operations, increase productivity, and improve customer experiences. However, with the growing complexity of applications and the increasing demand for seamless performance, it has become crucial to monitor their performance continuously.

Coralogix Deep Dive - ML Driven Log Clustering with Loggregation

Coralogix Loggregation turns millions of log documents into a handful of templates, which are logs that are structurally similar. Loggregation will analyze both the fields in a log document and the structure of the message field, to find variables and constants. Loggregation enables engineers to prioritize their time and focus on the logs that are having the biggest impact on their system. Cut through the noise and get right to the detail, with Loggregation.

Coralogix Deep Dive - Tracking Every Interaction with Tracing and APM

Coralogix Tracing and APM functionality offers a unique troubleshooting experience, with rich information density, without the need to context switch, with native support for Serverless, Kubernetes, EC2 and many other services, as well as an elegant, living architecture diagram that gives instant visibility to even the most complex components of your system.

NetFlow Analyzer's cloud traffic monitoring: The new addition to our enterprise-grade traffic analysis tool

Given the benefits cloud computing offers such as security, flexibility, better collaboration, and data modernization, organizations of all shapes and sizes have started moving their workload to this technology. With so many organizations using workarounds to gain visibility and control of bandwidth consumption, and with cloud computing letting organizations access and manage the information, the amount of data being generated is also increasing.

A guide to mastering multiclient network management for MSPs

Managed service providers (MSPs) play a major part in increasing business continuity and enhancing the productivity of the businesses they cater to. Businesses these days rely on MSPs to take on the responsibility of managing their networks, so they need not spend most of their time on network management tasks like keeping track of network devices, troubleshooting network bottlenecks, etc.

Understanding Network Mapping with Site24x7

With cloud solutions becoming mainstream and hybrid workforces becoming the norm rather than the exception, organizations need to stay nimble and agile for network access that can occur anytime, from anywhere. A thriving organization has a flourishing network that extends its arms globally, yet delivers high speed and performance without any compromise. Hence, organizations rely heavily on their networks for uninterrupted operations.

How to Monitor Custom Metrics with AppSignal

Setting up custom metrics is an easy way to gain instant insights into the information you need (without scorching through log lines or struggling with complicated reporting tools). Supplement your application's critical monitoring data by tracking meaningful metrics to quickly identify and resolve potential issues. In this blog post, we'll show you how to set up and use custom metrics to remove your monitoring blind spots.

Introducing InfluxDB 3.0: Available Today in InfluxDB Cloud Dedicated

It’s been literally years now that I have been first tangentially, and then intimately involved with the project that has become InfluxDB 3.0. I started using it so early that one of the DataFusion upstream developers literally calls me “User0” … a moniker of which I am not-so-secretly proud.

Start Docking at the Port-The Importance of Monitoring Your Network Ports

Port monitoring is a necessary measure to make sure your network is running at an optimized level. For those unfamiliar with New England, many villages and cities began as fishing towns. These communities still do business on the sea, with multiple companies shipping out their products on massive boats. How do these boats gain access to the towns to continue doing business in them?

What is Windows Event Log?

Event logging for Microsoft Windows provides a standard, centralized way for applications and the operating system to record important software and hardware events. The event-logging service (eventlog) stores events from various sources in a single collection called an event log. The system administrator can use the event log to help determine what conditions caused an error and the context in which it occurred. TechTarget has an excellent overview of Windows event logs available.

The Best Data Science & Data Analytics Certifications for 2023: Beginner To Advanced

As data science and data analytics become integral to our everyday lives, the demand for certifications in this field is at an all-time high. There are many different paths to pursue a certification in these areas — still, some stand out above the rest. In this blog post, we will explore the top 5 data science and analytics certifications that are essential for anyone looking to build a strong career in this field.

A Short Overview: GitLab Tokens

During my recent work on extending our GitLab packaging capabilities, I came across various types of tokens that can be used to authenticate users, services, and pipelines while using GitLab CI/CD. Each token has its own features and use cases. By understanding them, you can enhance your GitLab CI/CD workflows and help ensure the security and integrity of your GitLab environment.

Grafana 9.5 release: Grafana Alerting updates, stronger security with service accounts, upgraded dashboards, and more

Grafana 9.5 has arrived! 🔥🎉 The latest Grafana release introduces new features and improvements, such as major Grafana Alerting improvements, dashboard and visualization enhancements, a redesigned navigation experience, support bundles for faster issue resolution, and much more to provide you with better insights into your data.

How to Perform AWS Monitoring: Tools and Best Practices

As more and more businesses adopt microservices and leverage cloud infrastructures, keeping track of services and resources becomes increasingly important. Comprehensive monitoring is needed to ensure optimal performance, avoid downtime, and give everyone a great user experience. But leaving monitoring to the cloud provider may not give you the insights you need.

Why Scalable Monitoring is Essential for Modern, Distributed Systems

It's becoming increasingly common to discuss the importance of scalability in monitoring solutions and how it can impact the performance and reliability of distributed systems. In today's rapidly evolving technological landscape, organizations are increasingly relying on distributed systems to power their operations. These systems consist of multiple interconnected components that work together to deliver a cohesive experience.

What Customers are Saying About Cribl Stream on Gartner Peer Insights

Since day one, Cribl has been on a mission to give users more control and more value from their observability and security data. We had a feeling that putting customers first would be the key to unlocking that value, so “Customers First, Always” went at the top of the list when the time came to talk about company values.

How to Troubleshoot Slow Network Performance and Win The Great Network Race

Welcome to the Great Network Race! You might not have realized it, but every time you connect to the internet, you're a participant in a never-ending competition. The goal? To have the fastest, most reliable network connection possible. But as we all know, that's not always the case. Slow network performance can be frustrating, time-consuming, and downright annoying.

Why is Capacity Planning crucial for running your business?

Businesses are successful when their resources are aligned with their capacity. No business can afford to have their resources run at full capacity, because at that point, any increases past the capacity would lead to increasing failures. Capacity Planning is an action plan to ensure there's enough spare capacity to handle any increases in the workload, and cater to unprecedented workload spikes.

Introducing Search by User Click for Session Replay

You run an e-commerce website and notice a drop in sales on a specific product page. You suspect that users may be encountering an issue with the “Add to Cart” button, but you’re not sure what’s causing the problem. With Sentry Session Replay’s new search by user click feature, you can easily find replays where users clicked on that button and watch their sessions to see exactly what happened.

Sponsored Post

Monitoring AIX and Linux on IBM Power Systems using Microsoft SCOM

IBM Power systems are very popular for embedded and high-performance processors, as embedded applications provide an array of uses, including satellites and the Mars rovers Curiosity and Perseverance. They are widely used in the financial sector, which depends on secure high-speed transactions. Supercomputers like the IBM Summit, Sierra, and Lassen run on IBM Power systems.

Why Am I Seeing NGINX 502 Bad Gateway: PHP-FPM?

The NGINX Error 502 Bad Gateway is a common error among website users. There are various possible reasons for this error and different ways to solve it. In this article, we will look at the main possible causes and how they can be solved by users and web developers. Use MetricFire's platform to analyze your system’s performance and troubleshoot errors.

Log Shippers: The Key to Efficient Log Management

Logs are a vital source of information for any system, providing valuable insights into its performance and behaviour. However, with the increasing complexity of modern systems and the massive amount of data generated by them, managing logs can be a daunting task. This is where log shippers come into play. Log shippers are tools designed to simplify the process of collecting and forwarding log data to a centralized location, allowing for easy analysis and troubleshooting.

The Latest Version of OpenSearch Is Now Live On Logit.io

Logit.io is pleased to introduce the latest version of OpenSearch onto the platform, with an OpenTelemetry-compliant data schema that unlocks a host of future analytics and observability capabilities. Also included in this release are improvements in threat detection for security analytics workloads, visualization tools, and machine learning (ML) models.

How German drugstore chain Müller thrives with Icinga

We are proud of our many customers and users around the globe that trust Icinga for critical IT infrastructure monitoring. That’s why we’re now showcasing some of these enterprises with their success stories. These are stories from companies or organizations just like yours, of any size and from different kinds of industries. Some of them are our long-standing customers, others have just recently profited from migrating from another solution to Icinga.

Optimize Network Performance Management with AppNeta

Regardless of an organization’s industry or size, virtually every critical business activity or service is at least partly reliant upon network connectivity. When network performance slows or issues arise that halt critical end-user services, it can be very costly to an organization. Lost employee productivity, eroded customer loyalty, and reduced sales can all be among the ramifications for even relatively brief service interruptions.

Gaming Industry: How Important are Logs for Systems?

In today’s fast-paced and highly-competitive gaming industry, providing a seamless and enjoyable gaming experience is essential to retain users. Games need to be responsive, offer high-resolution graphics, maintain continuous uptime, and handle a huge number of transactions. Having a strong log analytics solution is essential to improve performance, identify issues, and fine-tune the player experience.

Introducing Internet Performance Monitoring: How does it help?

Catchpoint and ITOps Times break down 6 critical topics you need to understand to ensure Internet Resilience for your business in this bi-weekly microwebinar series, each lasting less than 10 minutes. In this second installment, we’ll dive deeper into the world of Internet Performance Monitoring (IPM). Learn how IPM can help you proactively find and fix issues in your Internet Stack before they impact the business.

Multiple players, one stack: Inside Roblox's centralized observability stack

When you sign into the Roblox platform, you get 30 million immersive experiences, ranging from concerts to fashion shows to, of course, video games. But when the observability team at Roblox logs on, they’re not playing around. The Roblox observability engineers are responsible for keeping more than 214 million monthly users happy and engaged by making the wildly popular gaming platform highly available around the world.

Now Available: The Flight SQL Plugin for Grafana

Today we have exciting news for Grafana customers with Flight SQL data sources: Now there is a new community plugin available for Grafana that allows it to communicate with Flight-SQL-compatible databases. Flight SQL is a client-server protocol developed by the Apache Arrow community for interacting with SQL databases. It utilizes the Flight RPC framework and the Arrow in-memory columnar format.

Why Your Data-Driven Strategies for Network Reliability Aren't Working

What do network operators want most from all their hard work? The answer is a stable, reliable, performant network that delivers great application experiences to people. In daily network operations, that means deep, extensive, and reliable network observability. In other words, the answer is a data-driven approach to gathering and analyzing a large volume and variety of network telemetry so that engineers have the insight they need to keep things running smoothly.

Before Taking the Plunge, Dip Your Toes in OTel

OpenTelemetry was launched in May 2019, as a merger of the OpenCensus and OpenTracing projects. The open-source, vendor-neutral project resides within the Cloud Native Computing Foundation (CNCF), which virtually ensures its longevity and widespread adoption. In fact, OpenTelemetry has gained significant traction in recent years, with support from many major cloud providers and the tech industry.

4 Differences Between DEM & RUM You Should Know

If you want to deliver an outstanding user experience you must know the differences between DEM and RUM. In this modern world, businesses are embracing digitization to provide better services to their customers. However, customer expectations and preferences have changed drastically over time. To address customer demands, businesses have started investing in systems and applications that enhance the user experience.

Should Every Incident Get a Retro?

At a recent training session, Jeli spent a great deal of time covering incident retrospectives and what makes an incident worthy of studying. My colleague Ben Hartshorne asked a fascinating question, which I’ll paraphrase here: That caught me by surprise. We had a great discussion, and it made me consider approaches I hadn’t before.

Supercharging Grafana with the Power of Telemetry Pipelines

Grafana is a popular open-source tool for visualizing and analyzing data from various sources. It provides a platform for creating interactive, customizable dashboards that display real-time data in various formats, including graphs, tables, and alerts. When powered by Mezmo's Telemetry Pipeline, Grafana can access a wide range of data sources and provide a unified view of the performance and behavior of complex systems.

Supercharging Elasticsearch with the Power of Telemetry Pipelines

Elasticsearch has made a name for itself as a powerful, scalable, and easy-to-use search and analytics engine, enabling organizations to derive valuable insights from their data in real-time. However, to truly unlock the potential of Elasticsearch, it is essential that the right data in the right format is provisioned to Elasticsearch. This is where integrating a telemetry pipeline can add value to Elasticsearch.

Find connections and expand your data visualization with new dashboards

One of my favorite movies of all time is WarGames, which depicted a teenage hacker accidentally breaking into NORAD and starting a nuclear war simulation that almost turned into a real catastrophe. The movie featured state-of-the-art dashboards (at least for 1983) showing simulated missile launches by different countries. Now you can create Sumo Logic dashboards like the ones shown in this movie using our new Connection Map panel.

7 Quick Tips for Working with Traces in OpenTelemetry

Avoiding vendor lock-in is a ‘must’ when it comes to working with new services. Those working in ITOps, DevOps, or SRE also don’t want to be tied to specific vendors when it comes to their telemetry data. And that’s why OpenTelemetry’s popularity has surged lately. OpenTelemetry prevents you from being locked into specific vendors for the agents that collect your data.

Cost-Cutting Strategies and Smart Tooling Choices to Maximize Your Vendor Budget

Tech debt. Vendor redundancy. System fragmentation. Startups and cloud-born companies are looking at vendors for cost-cutting opportunities. But how do you balance vendor costs and value when those resources and tools bring efficiencies as high as the monthly bills? In this session, Charity Majors and Gergely Orosz share advice on managing spend in a vendor-dependent world.

Datadog Universal Service Monitoring Demo

See how you can get instant visibility into the health of your entire fleet of services—without requiring you to change a single line of code. By automatically discovering, mapping, and monitoring every service and dependency, Universal Service Monitoring allows you to detect issues faster, monitor service performance and SLOs across entire environments, and centralize all the knowledge about your services in a single place.

Distributed systems: Because a single computer can't deal with your procrastination

In our eagerness to bring you all the definitions of the world of technology, we hired a sherpa, with calves made of steel, who travels the confines of the Himalayas collecting the most exclusive and inaccessible information from temple to temple. Because, as everyone knows, the greatest secrets of technology are kept by the monks of monasteries like those of Rizong, Jammu or Kashmir, not the Internet.

How to throw custom exceptions inside Logic Apps: Using default capabilities - Extract failure information (Part II)

Welcome to the second part of this series of blog posts on How to throw custom exceptions inside Logic Apps. In this series of five blogs, I will cover throwing custom exceptions in Logic Apps. I will cover the following topics: In this second approach, we are going to do a small fine-tuning of the previous approach by adding the capability to define custom error messages for each condition and, of course, get that information inside the Catch Scope.
Sponsored Post

Understand the emerging digital employee experience market (DEX)

The Digital Employee Experience (DEX) is a term that's been gaining momentum in recent years. Simply put, DEX refers to the employee's experience of technology at work. This includes everything from the devices and applications they use to the network connectivity and support they receive. The Digital Employee Experience (DEX) is "a holistic approach to designing and managing the technology environment in which employees work." It includes a variety of components, such as digital workspaces, collaboration tools, and workflow systems. DEX is all about creating an environment that is engaging, productive, and conducive to a positive employee experience.

We're delighted to announce that ManageEngine has been named a Technology Leader in the SPARK Matrix: IT Service Management Tools, 2022 report

ManageEngine’s flagship ITSM platform, ServiceDesk Plus, has been named a Technology Leader in the latest SPARK Matrix: IT Service Management Tools report. This recognition reaffirms our efforts to make ServiceDesk Plus a capable ITSM platform that helps enterprises manage and deliver IT and business services.

Best Grafana dashboard for IoT Device monitored via MQTT metrics and Graphite

Internet-of-Things (IoT) devices and technologies have penetrated our lives. Nowadays, we can easily find IoT devices around us. Home appliance manufacturers produce IoT products for smart homes. Car makers invest in smart vehicles that can communicate with other smart cars and IoT objects nearby. Governments allocate huge resources to build smart cities with IoT technologies to improve the quality of life for people and promote economic growth.

Challenges in Choosing an APM tool for Fintech Companies in India due to RBI Guidelines

As the growth lead of an open-source APM tool, I keep interacting with developers from companies of all shapes and sizes. I recently talked with a developer from a fintech startup in India. The startup provides a payment processing platform that enables businesses to accept payments from customers worldwide. For them, monitoring is critical, but the dev shared how limited they were when exploring an APM tool for their application. The reason? Reserve Bank of India.

Alternatives to Datadog

Before we dive into the specifics of each alternative to Datadog, let's address the most critical point: scaling. Datadog is great for users who need to do a little bit of everything, but Datadog's biggest weakness is scaling. Datadog can do logs, APM, time series and more, but scaling time-series metrics, alerts, and servers will cause your monthly bill to escalate. The graph below shows what you pay at Datadog vs. MetricFire, a leading competitor.

Learn how to monitor IoT devices with Grafana

IoT devices open the door to all sorts of computing potential, but they can also produce a flood of telemetry data that users need to properly collect and monitor to ensure those devices are working properly. It’s no wonder so many individuals and businesses use Grafana for IoT use cases, whether they’re starting an aquaponic farm in South Africa, managing an industrial-scale electroplating factory in Ohio, or simply keeping tabs on Pretzel the python at its home in the UK.

Monitoring service performance: An overview of SLA calculation for Elastic Observability

Elastic Stack provides many valuable insights for different users. Developers are interested in low-level metrics and debugging information. SREs are interested in seeing everything at once and identifying where the root cause is. Managers want reports that tell them how good service performance is and if the service level agreement (SLA) is met. In this post, we’ll focus on the service perspective and provide an overview of calculating an SLA.
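
As a rough illustration of the kind of calculation involved, the sketch below computes availability over a reporting period and compares it with a target. It is a minimal example, not Elastic's method: the function names, the 30-day period, and the 99.9% target are all assumptions chosen for illustration.

```python
def availability_percent(total_seconds: float, downtime_seconds: float) -> float:
    """Return the percentage of the period during which the service was up."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    if not 0 <= downtime_seconds <= total_seconds:
        raise ValueError("downtime_seconds must be between 0 and total_seconds")
    return 100.0 * (total_seconds - downtime_seconds) / total_seconds


def sla_met(total_seconds: float, downtime_seconds: float, target_percent: float) -> bool:
    """Return True if measured availability reaches the agreed target."""
    return availability_percent(total_seconds, downtime_seconds) >= target_percent


# Hypothetical figures: a 30-day period with 30 minutes of downtime.
period_seconds = 30 * 24 * 60 * 60
downtime_seconds = 30 * 60
availability = availability_percent(period_seconds, downtime_seconds)
print(f"availability: {availability:.3f}%")
print("99.9% target met:", sla_met(period_seconds, downtime_seconds, 99.9))
```

In practice the downtime figure would come from monitoring data rather than a constant, and the agreement itself defines what counts as downtime.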

Profiling Using Java Agents

The core functionality of the Java Instrumentation API lies in its ability to modify the bytecode of classes being executed by the virtual machine. This capability allows for a range of monitoring tasks to be carried out, such as event recording and data gathering, which can provide valuable insights into an application's performance and behavior.

Introducing CrowdStream: A New Native CrowdStrike Falcon Platform Capability Powered by Cribl

We’re excited to announce an expanded partnership with CrowdStrike and introduce CrowdStream, a powerful new native platform capability that enables customers to seamlessly connect any data source to the CrowdStrike Falcon platform.

How to deploy Prometheus on Kubernetes

This is a tutorial for deploying Prometheus on Kubernetes, including the configuration for remote storage on Metricfire. This tutorial uses a minikube cluster with one node, but these instructions should work for any Kubernetes cluster. Here's a video that walks through all the steps, or you can read the blog below. You can get onto our product using our free trial, and easily apply what you learned.

Monitor your Linux web apps on Azure App Service with Datadog

Azure App Service is a fully managed platform-as-a-service (PaaS) solution for deploying web applications, event-driven functions, RESTful APIs, and more. Azure App Service enables developers to quickly build and release services that scale dynamically—without worrying about provisioning and maintaining infrastructure. Last year, we released the Datadog extension for Azure App Service for deep visibility into your Windows .NET applications.

Streamline collaboration throughout your organization with Datadog Teams

As organizations evolve and their stacks become more complex, they need increasingly robust visibility into their cloud-based infrastructure, services, and applications. Meanwhile, teams within these organizations tend to become more specialized and siloed. As they multiply across offices and time zones, these teams must be enabled to collaborate flexibly without muddling their distinct day-to-day priorities.

Lightrun Launches New .NET Production Troubleshooting Solution: Revolutionizing Runtime Debugging

Lightrun, the leading Developer Observability Platform for production environments, announced today that it has extended its support to include C# on its plugins for JetBrains Rider, VSCode, and VSCode.dev. With this new runtime support, .NET developers can troubleshoot their apps against .NET Framework 4.6.1+, .NET Core 2.0+, and .NET 5.0+ technologies.

What Network Teams Need to Know to be Successful in 2023

Join Kentik and Arelion as we discuss events affecting global connectivity, trends in RPKI compliance, the importance of BGP monitoring, and reveal how Arelion keeps an eye on its market share and competitive activity. Doug Madory, Director of Internet Analysis at Kentik and Mattias Fridström, Vice President & Chief Evangelist at Arelion share what network teams can do right now to ensure a successful 2023.

Teneo webinar Using AIOps to Optimize Your Palo Alto NGFW

Strengthen your security posture and prevent network security disruptions with the industry’s first AIOps solution for NGFWs. Palo Alto AIOps for NGFW enhances firewall operations experience with comprehensive visibility to elevate your security posture and proactively maintain deployment health. View our on demand webinar to learn how you can use AIOps for NGFW to.

Kubecon + CloudNativeCon Europe 2023 Recap

KubeCon Amsterdam was an incredible gathering of like-minded professionals, bringing together devops, software engineers, vendors, and cloud technology enthusiasts from around the world. This year’s event was the biggest KubeCon + CloudNativeCon ever, with a sold-out attendee list of 10,000 strong. The sheer scale of the event was a testament to the growing popularity of cloud native technology and the vibrant community that supports it.

Getting Data In: 4 Ways to Ingest Data into Splunk

The first step to unlocking the power of Splunk is to get access to your data. No matter what data type or structure it is, Splunk can read it. Watch this video to learn about the four main ways to get your data into Splunk: securely sending lossless data streams by installing the Universal Forwarder on your Linux or Windows host, easily ingesting cloud data sources (e.g., AWS, Azure, and GCP) via Guided Data Onboarding, creating data inputs for virtually any TCP or UDP data traffic, and using the HTTP Event Collector (HEC) to ingest web and app data.

Cloud Monitoring Console's Health Dashboard: Maximize Your Monitoring Efficiency

Are you a Splunk Cloud admin tired of sifting through various tools and dashboards to monitor the health of your Splunk Cloud deployment? Do you often find yourself wondering what actions you can take to keep your Splunk Cloud deployment running smoothly? Are you looking for ways to be alerted before something impacts your deployment performance? Look no further than the Cloud Monitoring Console's Health Dashboard!

Top 10 SaaS Performance Monitoring Tools & Techniques: Unlock the Full Potential of Your Application

SaaS (Software as a Service) monitoring refers to the process of continuously tracking and analyzing the performance of SaaS applications in order to ensure their reliability, availability, and optimal performance. SaaS monitoring involves collecting and analyzing data on various metrics, such as response times, uptime, error rates, user activity, and system resource usage, among others.

Is Your Cloud Spend Problem a Cloud Cost Tracking and Accountability Problem?

Most companies overspend on cloud services. Many factors contribute to this problem, and while organizations know that their cloud budgets are bloated, they still struggle to reel in those costs. Are they implementing the tools and processes to track their cloud costs effectively? And are they adopting controls to instill discipline and accountability in their cloud spending habits? We explored these questions in our latest State of Multi-Cloud Management research report.

How the All-In Comprehensive Design Fits into the Cribl Stream Reference Architecture

Join Cribl's Ed Bailey and Ahmed Kira as they provide more detail about the Cribl Stream Reference Architecture, which is designed to help observability admins achieve faster and more valuable stream deployment. During this live stream discussion, Ed and Ahmed will explain the guidelines for deploying the comprehensive reference architecture to meet the needs of large customers with diverse, high-volume data flows. They will also share different use cases and discuss the pros and cons of using the comprehensive reference architecture.

10 DevOps Tools for Continuous Monitoring

DevOps has become the dominant software development and deployment methodology over the past decade. In Atlassian’s DevOps Trends Survey, over half of the respondents said that their organizations had a dedicated DevOps team and 99% of respondents indicated that DevOps had a positive impact on their organization. In addition to DevOps teams, many have implemented Platform Engineering as a discipline, or designing technology platforms as a foundation for developers to build and deploy applications.

What are the costs of downtime?

Downtime can be a nightmare for any online business. As the digital landscape continues to expand, the importance of maintaining a reliable and stable online presence has never been greater. While most businesses recognize the direct costs associated with downtime, such as lost revenue and productivity, there are hidden costs that can have a lasting impact on your business.

OpManager Enterprise Edition: Optimized features for scalability

Monitoring an enterprise network is challenging for two reasons: the size of the network and the distributed architecture it follows. In the distributed model, branches spread across the globe, requiring a centralized monitoring solution. As the business grows, more devices will be added to the network, meaning your enterprise monitoring tool must be able to scale to match the performance requirements.

Distributed Database Architecture: What Is It?

Databases power all modern applications. They’re behind your Angry Birds mobile game as much as they’re behind the space shuttle. In the beginning, databases were hosted on a single physical machine. Basically, it was a computer running only one program: the database. Then we moved to running databases on virtual machines, where resources are shared among multiple operating systems and applications.

Alerting on the User Experience

When your alerts cover systems owned by different teams, who should be on call? We get this question a lot when talking about SLOs. We believe that great SLOs measure things that are close to the user experience. However, it becomes difficult to set up alerting on that SLO, because in any sufficiently complex system, the SLO is going to measure the interaction between multiple services owned by different teams.

How To Use Our AWS Integration

AWS is a popular cloud platform that is widely used in many industries. It offers many services, such as EC2, S3, and Lambda. Since AWS is so popular, being able to easily use it along with your other tools is a priority. We have an AWS integration set up with our tool so that our users have an optimal experience.

How to Mask Sensitive Data in Logs with BindPlane OP Enterprise

Logs often contain sensitive data, including personally identifiable information (PII) such as names, email addresses, and phone numbers. To maintain security and comply with data protection regulations, it’s crucial to mask this data before storing it in your log analytics tool. BindPlane OP streamlines this process with the Mask Sensitive Data processor, ensuring your logs are safe and compliant.
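
To make the idea of masking concrete, here is a minimal Python sketch that redacts email addresses and phone numbers from a log line with regular expressions. It is not how the BindPlane OP processor is implemented; the patterns, the placeholder tokens, and the `mask_sensitive` name are assumptions for illustration, and the phone pattern only covers the simple `NNN-NNN-NNNN` style.

```python
import re

# Illustrative patterns only; real PII detection needs broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def mask_sensitive(line: str) -> str:
    """Replace email addresses and phone numbers with placeholder tokens."""
    line = EMAIL_RE.sub("[EMAIL]", line)
    return PHONE_RE.sub("[PHONE]", line)


raw = "user=jane.doe@example.com phone=555-123-4567 status=ok"
print(mask_sensitive(raw))
```

Masking before the log leaves the host or pipeline means the original values never reach the analytics store. Names are deliberately not handled here, since they cannot be matched reliably with a simple pattern.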

Announcing LM Exporter

We recently introduced the LogicMonitor Exporter, which is now part of the OpenTelemetry Collector Contrib distro. This allows you to bring your collector for streaming telemetry data from your environment to LM Envision, LogicMonitor’s hybrid and multi-cloud monitoring platform. LogicMonitor associates the exported logs and traces from a single OpenTelemetry collector to simplify your application’s operations and troubleshoot issues.

Adding a Log Record Attribute

Check out how to standardize telemetry by adding metadata to the log record. With appropriate tags, you can not only enrich your data but also route it anywhere, avoiding vendor lock-in. About observIQ: observIQ is developing the unified telemetry platform: a fast, powerful and intuitive next-generation platform built for the modern observability team. Rooted in OpenTelemetry, our platform is designed to help teams reduce, simplify, and standardize their observability data.

A Guide to Regression Analysis with Time Series Data

This post was written by Mercy Kibet. Mercy is a full-stack developer with a knack for learning and writing about new and intriguing tech stacks. With the vast amount of time series data generated, captured, and consumed daily, how can you make sense of it? This data is projected to grow up to 180 zettabytes by 2025.

3-Step Approach to Eliminate the BitLocker Recovery Key Backup Issue using Nexthink

Data security and data encryption have become a vital security step for organizations worldwide and a key component of security compliance. BitLocker is one of the popular software stacks for enforcing encryption on all devices and drives. Learn how the EUC team of a large multinational IT consulting organization successfully enforced BitLocker Recovery Key Backup on 700 thousand devices using Nexthink.

8 Best IT Monitoring Tools and Software of 2023 (Updated)

Monitoring tools, also known as observability solutions, are designed to track the status of critical IT applications, networks, infrastructures, websites and more. The best IT monitoring tools quickly detect problems in resources and alert the right respondents to resolve critical issues. Response teams use observability solutions to gain real-time insights into resource availability, stability and performance.

2023 SRE Report

Now in its fifth year, The SRE Report has become the trusted source of trends and insights for reliability-as-a-feature practices. This year in partnership with Blameless, the report contains special contributions from Adrian Cockcroft and Steve McGhee and highlights findings from a global community of reliability practitioners, including SREs, managers, architects, and executives. As ever, we found some familiar trends and some thought-provoking anti-patterns.

IDC Spotlight: Moving Toward a Hybrid-First Organization with Seamless Connectivity and Collaboration

Supporting an anywhere, anytime, hybrid workforce is now a top priority as hybrid work continues to be the new norm. IDC's new Spotlight Paper provides valuable insights and actionable recommendations for implementing a robust and resilient employee experience strategy to ensure your workforce stays connected, engaged, and productive. Download the paper to gain a better understanding of.

Revolutionize Your Cloud-Native Deployments with CloudFabrix using Kubernetes and OpenTelemetry

The Cloud Native Computing Foundation (CNCF) is a non-profit organization dedicated to advancing the adoption of cloud-native technologies and practices. Established in 2015 as a part of the Linux Foundation, the CNCF has become a prominent open-source organization that aims to develop a standardized and vendor-neutral cloud-native stack. The CNCF seeks to enable the use of cloud-native computing for building scalable and resilient applications in dynamic environments.

Optimizing User Experience in Citrix VAD or DaaS Environments with Session CPU and Memory Monitors

In Citrix SBC/VDI cloud or on-premise deployments, CPU and memory usage are critical performance metrics that can impact user experience significantly. If one user's session is consuming a large amount of CPU or memory, it can negatively impact the performance of other users hosted on the same multi-session VDA machine. Therefore, it's essential for administrators to be able to quickly identify what user, application and process is causing high CPU or memory usage.

IBM Consulting and CloudFabrix partner to unify Observability, AIOps and Automation

Thanks so much Meenakshi Srinivasan! We are honored to be chosen over the competition and are excited and looking forward to helping our joint enterprise and cloud-native customers. Thanks to the IBM Consulting team for the joint Proof of Technology and joint GTM team.

From Loading to Interaction: A Guide to Time to Interactive Improvement

Have you ever visited a website that took forever to load, leaving you staring at a blank screen and clicking your mouse in frustration? If so, then you have experienced a slow Time to Interactive (TTI). TTI is the ultimate test of a website's speed and responsiveness, measuring the time it takes for a page to fully load and become interactive. A slow TTI can leave your users feeling bored, frustrated, and downright furious.

A Comprehensive Comparison of Prometheus and Nagios

Prometheus and Nagios are both open-source infrastructure monitoring solutions, created by SoundCloud engineers and Ethan Galstad respectively. Both are widely used to monitor the availability and performance of computer systems, networks, and applications. Prometheus uses a pull-based model to collect metrics and supports dynamic service discovery, while Nagios uses a push-based model built on plugins.

Grafana vs Graphite: A Comparison for Data Visualization and Analysis

Data is being generated today at an unprecedented rate. As a matter of fact, more data has been created over the past two years than ever before in the history of mankind. This confronts us with considerable complexity. How do we even manage such a huge amount of data? Where do we store it? Can it be segregated to fit our needs, and who would do that for us? The questions are endless, and so is the rate at which new data is generated.

How an Observability Pipeline Can Help With Cloud Migration

Do you want to confidently move workloads to the cloud without dropping or losing data? Of course, everyone does. But easier said than done. Cloud migration is tricky. There’s so much to think through and so much to worry about — how can you reconfigure architectures and data flows to ensure parity and visibility? How do you know the data in transit is safe and secure? How can you get your job done without getting in trouble with procurement?

Unlocking the Power of Embedded CDNs: A Comprehensive Guide to Deployment Scenarios and Optimal Use Cases

This guide explores the benefits of embedded caching for ISPs and discusses deployment optimization strategies and future trends in CDN technology. Embedded CDNs help reduce network congestion, save costs, and improve user experiences. ISPs must carefully plan their deployment strategies by considering how each of the CDNs distributes content and directs end-users to the caches. They need to know both the CDNs and their network architecture in detail to build a successful solution.

Feedback Week Results

For the third time, we initiated the Icinga Feedback Week. Why? Because your opinion matters to us, a lot. Even though we do get feedback throughout the year, our yearly Feedback Week is a chance for us to ask you specific questions on certain topics. By understanding your thoughts and feelings towards Icinga, we aim to develop the most effective monitoring tool for you. But that wasn’t all: as is tradition, we asked you to choose your Community Heroes and we have found five!

Honeycomb's Deployment Protection Rule for GitHub Actions

Today, GitHub announced the public beta of Deployment Protection Rules for GitHub Actions for GitHub Enterprise users. In support of that launch, we’ve partnered with GitHub to create the Honeycomb Deployment Protection Rule (available as a GitHub App). This rule lets you run Honeycomb queries so that you can get real-time performance feedback from your services before deciding whether to prevent deployment of your code to a specific environment.

Exciting New Additions to the eG Enterprise Mobile Application

Our latest release of eG Enterprise, version 7.2, is accompanied by significant enhancements and new features for our popular eG Enterprise mobile app for iOS and Android. These apps allow administrators to access the eG Enterprise administrator console on the go and receive meaningful alerts and push notifications with click-throughs to deep diagnosis rather than dumb text messages.

Observability overload: Insights into the rise of tools, data sources, and environments in use today

With countless observability tools, data sources, and environments to juggle, the organizations that deploy and manage today’s distributed applications often face an uphill battle to gain visibility into their application performance. That was a key takeaway from the Grafana Labs Observability Survey 2023, which incorporated input from more than 250 industry practitioners who are all too familiar with these complexities.

Revitalize Your Testing With Continuous Everything Practices to Meet DevOps Goals

Software testing is an established discipline as old as software development itself. We have recently seen a significant evolution of testing practices, driven especially by Continuous Delivery and DevOps, where testing is increasingly integrated with agile development and other software lifecycle practices.

SNMP Traps: Definition, Types, Examples, Best Practices

The Simple Network Management Protocol (SNMP) is a widely used protocol for monitoring and managing network devices. SNMP traps are a key feature of SNMP, and they’re used to notify management systems about specific events or conditions on network devices. This article will explore SNMP traps, discuss the different types and examples of traps and outline best practices for using SNMP traps in a network environment.

Use Datadog monitors as quality gates for GitHub Actions deployments

With the growing adoption of automated deployment tools, many organizations are releasing code more frequently. As releases increase, it’s important to ensure that you don’t accidentally introduce faulty deployments, which can have wide-ranging impacts on your infrastructure, application, and end-user experience, and can potentially lead to costly rollbacks.

Search your logs efficiently with Datadog Log Management

In any type of organization and at any scale, logs are essential to a comprehensive monitoring stack. They provide granular, point-in-time insights into the health, security, and performance of your whole environment, making them critical for key workflows such as incident response, security investigations, auditing, and performance analysis. Many organizations generate millions (or even billions) of log events across their tech stack every day.

Device Groups in Dashboards

Utilizing device groups inside of dashboards is extremely powerful and will allow you to present all the information collected by Service Watch for whatever you are trying to accomplish. Whether you are looking to benchmark the devices of your remote workforce or figure out which office is performing better or worse after changes, you’ll find it extremely easy to visualize this information.

IBM-i Monitoring with the M81 Plugin

There are more than 200,000 companies around the world that use the current IBM Power server technology within the IBM i operating system (formerly OS/400, i5/OS). In Pandora FMS we are aware of this need, and we can integrate this IBM-i technology with the rest of open systems thanks to the M81 plugin. This includes IBM i systems, SAP, Oracle, Windows systems, Unix systems, Linux systems, and applications from any manufacturer, as well as hardware and network equipment.

Retrace Logging Benefits

Retrace is more than just APM: logs appear inside trace requests. Centralised logging brings logs from many sources, such as servers, files, and applications, into Retrace. Search for a log, then go straight into its trace. Tagging lets you group logs (by client, developer, etc.). Search and filter with any text, tag, or regular expression, and save your searches. Retrace allows unlimited users, and all users can use saved searches. Live tailing shows what's going on across the servers at any one time.

Introducing the Sentry GitHub Deployment Gate Integration

If you have a large codebase with multiple developers shipping quickly – errors need to be caught quickly as well. To help ensure your code is performant and reliable while you’re deploying code, we partnered with GitHub to build a bridge between your CI/CD workflow and your favorite error monitoring tool (Sentry, of course).

Real user monitoring with Applications Manager

Real user monitoring (RUM) is used to collect and analyze data about user interactions with a website or application in real-time. This enables organizations to gain valuable insights into the performance and user experience of their digital products. Despite its importance, the significance of RUM is often overlooked, and many organizations fail to leverage its benefits. By employing a RUM tool such as Applications Manager (APM), you can stay vigilant by capturing real-time user interactions.

What is Synthetic Monitoring: The Secret Sauce to Network Monitoring

Picture this: You're the IT manager at a large company, and you're responsible for ensuring that your network is running smoothly. But how do you know if everything is working as it should be? You could wait for someone to report a problem, but that's reactive and not ideal. You could monitor your network constantly, but that's impractical and time-consuming. So what's the solution? Enter synthetic monitoring, the secret sauce to network monitoring.

IT Operations in 2023: Business Services Become a Viable Organizing Principle

Staring down the runway of 2023 and toward 2024, I’m wondering what will become of the digital transformation initiatives that began pre-Covid and accelerated over the past two-plus years. We’re certainly not entering a “stop transforming” period. So, what will transformation look like going forward?

Data-Driven Defense: Exploring Global Cybersecurity and the Human Factor

A data-driven approach to cybersecurity provides the situational awareness to see what’s happening with our infrastructure, but this approach also requires people to interact with the data. That’s how we bring meaning to the data and make those decisions that, as yet, computers can’t make for us. In this post, Phil Gervasi unpacks what it means to have a data-driven approach to cybersecurity.

How to Set Downtime Alerts for your Website

Learn how to monitor your website uptime proactively, setting downtime alerts to get notified immediately of any accessibility or performance issues. In this guide, we will show you how to setup downtime alerts using Dotcom-Monitor's website monitoring tool. Get real-time insights into your website's performance, and monitor multiple websites and web applications from different locations around the world.

Pandora FMS transformation: Discover its new interface. Updated, homogeneous and developing

Pandora FMS has changed a lot since its inception, and you, dear reader, may have noticed it. Through effort and hard work it has grown older and become someone strong and capable. As you know, it is a tool, hard as well as flexible, that recognizes, connects and interprets different types of technologies to present them in a single environment: system monitoring software that has gained lots of popularity in the market and has just launched its new interface.

How to troubleshoot memory leaks in Go with Grafana Pyroscope

Memory leaks can be a significant issue in any programming language, and Go is no exception. Despite being a garbage-collected language, Go is still susceptible to memory leaks, which can lead to performance degradation and cause your operating system to run out of memory. To defend itself, the Linux operating system implements an Out-of-Memory (OOM) killer that identifies and terminates processes that consume too much memory and cause the system to become unresponsive.

The TIG Stack in IIoT/OT

Many industrial operators find themselves amid yet another industrial revolution. Deeper insight through artificial intelligence (AI) and machine learning (ML) integrations characterize this fourth wave (or Industry 4.0). Data is no longer just a record occupying server space. It’s alive and providing value. Real-time insights work in tandem with historical records, painting a complete picture of the lifespan of a piece of machinery and/or its components.

Q1 Roadmap Review & Q2 2023 Look Ahead

In our recent virtual meetup, the VictoriaMetrics Founders team discussed some of our Q1 2023 highlights, including feature highlights, the 2023 roadmap for VictoriaMetrics, and a first introduction to the upcoming VictoriaLogs. In this blog post, we’d like to share a summary of these highlights and a heads up on where to find our team in the coming weeks and months - starting with our participation at KubeCon Europe 2023.

Releasing Graphite Query Language in Open Source VictoriaMetrics

As many of our users and the wider monitoring community will know, Graphite Query Language is a query language for Graphite monitoring tools, which helps analyze data stored in it. Graphite is a well-known and respected pioneer in the monitoring space, which has seen a number of next generation monitoring solutions enter the scene … such as ourselves. It’s been used by a wide range of companies, which started using monitoring tools more than a decade ago.

What Are the Main Drivers Behind Real-Time Transaction Monitoring?

Real-time transactions have become the new normal. In fact, MasterCard research found that consumers consider real-time payments more important than the Internet, next-day delivery, and utility services. As a result, banks and other financial institutions are increasingly turning toward real-time transaction monitoring. Real-time transaction monitoring solves some pain points of providing real-time payment capabilities.

Top 16 CDNs to Speed Up Your Website in 2023

A Content Delivery Network (CDN) is a network of servers or nodes spread around the world that stores and delivers the components of a website such as images, videos, or other static files. It can help both big and small websites deliver their content to users at the fastest speeds possible. If you’re new to the world of CDNs, it might seem intimidating when you’re trying to choose one that best fits your website.

Web checks in Kubernetes: a simple alternative to Prometheus Blackbox Exporter

The continued adoption of Kubernetes, a leading container orchestration platform, increases the demand to monitor these complex environments accurately and efficiently. Maintaining optimal performance and ensuring quick issue resolution are vital aspects of efficient Kubernetes management.

We're live on Product Hunt

It's official, Spectate is live and has been launched on Product Hunt! I am super excited to introduce you to Spectate, our all-in-one platform for monitoring, alerting, incident management, and status pages. Spectate started as an internal tool to satisfy our monitoring and alerting needs, but it has grown into a complete platform with the help of AI automation. This means your team can focus on what's important - getting issues resolved.

Optimize Your Prisma Queries with AppSignal and OpenTelemetry

AppSignal integrates seamlessly with Prisma via OpenTelemetry to give you invaluable insights into how your application is performing. In this blog post, we'll outline how you can use AppSignal to optimize your application's Prisma integration, mitigate inefficient database queries, spot anomalies, and improve your application's scalability.

The Biggest Website Outages of All Time

As much as we all love the internet and everything it offers, we’ve also all experienced that sinking feeling when we try to access our favorite website, only to find it’s down. If you run your own site, you know that uptime is crucial for your online success — so that sinking feeling in your chest when your own website is down is … well, even worse. But let’s face it: even the internet giants aren’t immune to outages.

Grafana Cloud is now available in AWS Marketplace

Grafana Labs is excited to announce that Grafana Cloud is now available in AWS Marketplace. With this new offering, existing AWS customers can procure, deploy, and scale the fully managed Grafana LGTM observability stack (Loki for logs, Grafana for visualization, Tempo for traces, Mimir for Prometheus metrics) with just a few clicks.

DevOps Pulse 2023: Increased MTTR and Cloud Complexity

Evolving DevOps maturity, mounting Mean-Time-to-Recovery (MTTR), and perplexing cloud environments – all these factors are shaping modern observability practices according to approximately 500 observability practitioners. While every organization faces its unique challenges, there are broadly impactful trends that arise.

Increasing Implications: Adding Security Analysis to Kubernetes 360 Platform

A quick look at headlines emanating from this year’s sold out KubeCon + CloudNativeCon Europe underlines the fact that Kubernetes security has risen to the fore among practitioners and vendors alike. As is typically the case with our favorite technologies, we’ve reached that point where people are determined to ensure security measures aren’t “tacked on after the fact” as related to the wildly-popular container orchestration system.

Elastic Common Schema and OpenTelemetry - A path to better observability and security with no vendor lock-in

At KubeCon Europe, it was announced that Elastic Common Schema (ECS) has been accepted by OpenTelemetry (OTel) as a contribution to the project. The goal is to achieve convergence of ECS and OpenTelemetry’s Semantic Conventions (SemConv) into a single open schema that is maintained by OpenTelemetry. This FAQ details Elastic’s contribution of Elastic Common Schema to OpenTelemetry, how it will help drive the industry to a common schema, and its impact on observability and security.

Lightstep from ServiceNow deepens commitment to OpenTelemetry project

At Lightstep, we’ve seen many organizations grapple with “cloud-native sticker shock” as they come to understand that these complex systems require sifting through massive amounts of data across architectures and proprietary solutions. In today’s macroeconomic environment, organizations are looking to reduce costs while driving innovation, especially when it comes to cloud-native applications.

Unite Testing and Monitoring with a Monitoring as Code Workflow, Enabled by the Checkly CLI

Siloing testing and monitoring is increasing your costs, wasting Dev and QA hours, and impacting your customer's experiences. Stefan Judis and Jonathan Canales explain how to align your QA, Dev, and Ops by enabling teams to code, test, and deploy API and Playwright-based checks with our monitoring as code (MaC) workflow, enabled by the Checkly CLI.

The Three Pillars of Observability: Metrics, Logs and Traces

Metrics, Logs and Traces are often referred to as The Three Pillars of “Observability”. The term observability has been used in control theory to refer to how the state of a system can be inferred from the system’s external outputs. Applied to IT, observability is how the current state of an application can be assessed based on the data it generates. Applications and the IT components they use provide outputs in the form of metrics, events, logs and traces (MELT).

Optimize your CI/CD Pipeline with Coralogix Tagging

Continuous Integration/Continuous Delivery (CI/CD) has now become the de-facto standard for all engineering teams seeking to keep pace with the demands of the modern economy. At Coralogix, we operate some of the most advanced build and deploy pipelines in the world. We’ve baked that knowledge into our platform with a CI/CD Observability feature called Coralogix Tagging.

Rest Assured, Cribl's Improved Webhook Can Now Write to Microsoft Sentinel

As of version 4.0.4, we are excited to announce that Cribl’s webhook can write to any destinations and APIs that require OAuth, including Microsoft Sentinel. Cribl has long supported OAuth in many destinations through native integrations, but with the enhanced Webhook we can now write to any destination that requires OAuth authentication.

A Guide to OpenTelemetry for .NET Engineers

Hey .NET engineers! Today, we’ll explore the world of OpenTelemetry, focusing on how it can benefit your .NET applications. We’ll talk about the strengths and weaknesses of OpenTelemetry, walk you through the setup process, discuss the basics, and share some best practices. Plus, we’ll touch on topics like auto-instrumentation, metrics, and more. So, let’s dive in!

Achieving Great Dynamic Sampling with Refinery

Refinery, Honeycomb’s tail-based dynamic sampling proxy, often makes sampling feel like magic. This applies especially to dynamic sampling, because it ensures that interesting and unique traffic is kept, while tossing out nearly-identical “boring” traffic. But like any sufficiently advanced technology, it can feel a bit counterintuitive to wield correctly, at first. On Honeycomb’s Customer Architect team, we’re often asked to assist customers with their Refinery clusters.

Flexible OpenTelemetry data generation for effective testing (Part 1)

Our OpenTelemetry data generator provides a seamless product-validation experience across all teams working in AppDynamics Cloud. OpenTelemetry™ is a complete telemetry system for monitoring both modern, distributed architectures in the cloud and more traditional on-prem applications.

Flexible OpenTelemetry data generation for effective testing (Part 2)

In this second (and final) segment, we continue to show how our OpenTelemetry data generator provides a seamless product-validation experience across all teams working in AppDynamics Cloud. In the first part of this two-blog series, we provided a high-level overview of OpenTelemetry™ (or OTel), a complete telemetry system for monitoring both modern, distributed architectures in the cloud and more traditional on-prem applications.

A new approach to performance: How Tilled keeps their endpoints fast and their developers sane

While timeouts and slow page loads are frustrating for users and can go unnoticed by developers, Tilled can’t afford either. As a PayFac-as-a-Service provider, fast API endpoints mean the difference between a successful business and one that flops due to dropped payments. Join us to see how the team at Tilled takes the complexity out of performance monitoring, so they can fix slowdowns faster without researching span trees or deciphering dashboards - saving developers time (and their sanity).

Beyond 6: What's Next for Wi-Fi?

Somewhere lost in the noise of ‘faster’ and ‘better’ marketing messaging around Wi-Fi in the last few years, there were also some genuine transformations in what Wi-Fi is capable of. And what’s on the horizon may be some of the most significant improvements since the inception of Wi-Fi itself. Let’s look at where Wi-Fi is at, what big changes have happened, and where we’re going next in the land of Wi-Fi technology.

Confidently Manage Multiple Projects with Sentry's New Spend Allocation and Spike Protection

Today we’re announcing our new Spend Allocation feature and updates to Spike Protection, giving you more control over how your projects consume events. While we’ve made it super easy for teams to add Sentry to their projects, we kept hearing from the community that they wanted more guardrails to ensure their noisy projects don’t eat through their event quota.

Proactively track, triage, and assign issues with Datadog Case Management

Complex systems require many different monitors to assess the health of their infrastructure and applications, creating a wealth of alerts that can be hard to track. Due to a lack of effective triage processes, many organizations page engineers for every alert that comes in, making it difficult to separate false positives from issues that actually require immediate attention.

What Is Observability? Everything a Beginner Needs to Know

Observability originated in the field of engineering and has recently gained popularity in the world of software development. Put simply, observability refers to the ability to understand the internal state of a system based on its external outputs. IBM defines it as follows: As systems have become more complex, often including remote elements in cloud-based systems, management of the systems and troubleshooting faults and downtime have also become more complex.

SLO vs SLA vs SLI: A Complete Guide for DevOps Professionals

In today’s fast-paced world of software development, DevOps professionals strive to provide high-quality and dependable services for their users. An essential aspect of achieving this objective is understanding and effectively managing service level indicators (SLIs), service level objectives (SLOs) and service level agreements (SLAs). These metrics help guarantee that a service meets its performance and reliability targets.
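As a rough illustration of how the three terms relate, here is a minimal Python sketch. It assumes an availability SLI defined as good requests divided by total requests and a 99.9% SLO target; the function names and the sample numbers are hypothetical and not taken from the guide.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """SLI: the measured fraction of events that were good."""
    if total_events == 0:
        return 1.0  # no traffic means nothing failed
    return good_events / total_events


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative once the SLO is breached)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target  # what the SLO tolerates
    actual_failure = 1.0 - sli          # what was measured
    return 1.0 - actual_failure / allowed_failure


# Hypothetical month: 1,000,000 requests, 500 of them failed.
sli = availability_sli(good_events=999_500, total_events=1_000_000)
slo_met = sli >= 0.999  # the SLO is the internal target for the SLI
budget_left = error_budget_remaining(sli, slo_target=0.999)

print(f"SLI={sli:.4%} SLO met={slo_met} error budget left={budget_left:.0%}")
```

An SLA would sit on top of this as the contractual commitment, usually set looser than the internal SLO so that the team notices a shrinking budget before a customer-facing promise is broken.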

How to get Azure Data Factory Pipeline Failure Notification?

Azure Data Factory is a cloud-based data integration tool focusing on data extraction, transformation, and loading. A pipeline in Azure Data Factory is a collection of processes that move data to a shared repository, such as a data warehouse. Why is it important to monitor Azure Data Factory pipeline failures?

Docker monitoring 101: Tools, key features, metrics, and more

Docker is a well-known open-source platform that is predominantly employed to bundle applications and their dependent components into containers for easy development and deployment. Docker is lightweight and efficient in resource consumption by operating as an executable packaged software with all the necessary framework, libraries, code, runtime, and files required to run an application.

How to monitor MySQL performance metrics in minutes

MySQL, a leading open source database for the past few decades, underpins potentially millions of applications, from tiny prototypes to internet-scale e-commerce solutions. The beauty of MySQL is not only its powerful relational database capabilities but also that it can be scaled up as the application grows. Why should you care about MySQL performance? Because MySQL is the backbone of many applications, your application performance will be inherently tied to your MySQL database performance.

A Detailed Guide to Formatting Dates in SQL

In modern software applications, time-stamped data is a common requirement. As a software developer or database administrator, you know that formatting dates is crucial for ensuring data accuracy and consistency. But with so many date formats and SQL engines out there, it can be challenging to find the right way to format dates for your specific needs.
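To make the "different engines, different formats" point concrete, here is a small sketch using SQLite through Python's built-in sqlite3 module. SQLite is an assumption chosen only because it needs no server; the table and column names are hypothetical. Other engines expose different functions for the same job (for example DATE_FORMAT in MySQL or to_char in PostgreSQL).

```python
import sqlite3

# In-memory database with one hypothetical time-stamped row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("INSERT INTO events (created_at) VALUES ('2023-04-17 09:30:00')")

# SQLite's strftime() takes a format string followed by the time value.
iso_date, day_first = conn.execute(
    "SELECT strftime('%Y-%m-%d', created_at), "
    "       strftime('%d/%m/%Y %H:%M', created_at) "
    "FROM events"
).fetchone()
conn.close()

print(iso_date)   # ISO-style date only
print(day_first)  # day-first date with hours and minutes
```

Storing timestamps in one canonical form and formatting only at query or display time keeps the stored data consistent regardless of which output format a report needs.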

NetOps for Application Developers: Understanding the Importance of Network Operations in Modern Development

True observability requires visibility into both the application and network layers. For companies reliant on multi-zonal cloud networks, the days of NetOps existing as a team siloed away from application developers are over.

Goliath Performance Monitor for ChromeOS is Verified as Citrix Ready

Philadelphia, PA – April 17, 2023 – Goliath Technologies today announced that its Goliath Performance Monitor for ChromeOS has been verified as Citrix Ready®. The Citrix Ready program helps customers identify third-party solutions that are recommended to enhance DaaS and VDI, and App Delivery and Security solutions from Citrix, a business unit of Cloud Software Group.

Secrets Management: Use Cases, Best Practices, and Tools

To provide proper visibility into the health and status of your systems, observability tools require access to the internal and external services you’re using, and Sensu is no different. In the past, this could mean exposing sensitive authentication credentials like usernames and passwords with local environment variables or even by including the secret information in your monitoring configuration.

Tutorial: Querying InfluxDB with SQL and Grafana

Thanks to a new and improved core database engine, InfluxDB now supports native SQL queries. Leveraging Flight SQL, InfluxData Developer Advocate, Jay Clifford, walks through how to set up querying InfluxDB using Grafana. He shows you how to configure Flight SQL and how to build a dashboard in Grafana for visualizing industrial IoT data.

Flowmon: How to Choose a Network Observability Platform

What are the key characteristics organisations should have to shift from network monitoring to network observability? The need is to have more of a platform approach. Let's see how to choose a Network Observability Platform to successfully manage the networks in highly distributed environments.

WhatsUp Gold: How to Choose a Network Observability Platform

What are the key characteristics organisations should have to shift from network monitoring to network observability? The need is to have more of a platform approach. Let's see how to choose a Network Observability Platform to successfully manage the networks in highly distributed environments.

How to Improve & Monitor Network Connectivity: The Step-by-Step Network Watchdog Guide

Are you tired of constantly experiencing buffering, lagging streams, or dropped calls? Have you ever wished you had a network watchdog to monitor your connection and alert you of any issues before they become a major headache? Well, you're in luck because we have just the tools and tips for you! In this step-by-step guide, we'll show you how to become a network watchdog yourself and use Network Monitoring tools to monitor your network connectivity like a pro.

Plan better and preempt bottlenecks with predict for metrics

Nothing is certain in this world except for death, taxes, and that you will eventually run out of disk space. You may have used our unique predict operator to query logs and forecast future values (we’ve even heard of customers predicting their ingest volume for Sumo Logic log data to better forecast their usage and budget!) — and wanted to do the same with metrics. With the recent general availability of the predict for metrics operator, you can.

OpenTelemetry-powered infrastructure monitoring: isolate and fix issues in minutes

The process of building and maintaining modern, cloud-based applications requires a new approach to infrastructure monitoring. Traditionally, engineers would try to isolate a specific infrastructure component causing an issue — and fix it alone, without diving into code. Today, DevOps engineers must understand how application performance is related to their infrastructure. Infrastructure, for DevOps engineers, is an enabler to deploy code.

Debugging Containerized React Apps

In your lifetime as a frontend developer that works with React, you must have come across several issues with debugging a containerized React application. I bet you can relate, you’re certainly not alone. Containerization has become an integral part of best practices for software development teams that want to create, test and deploy applications quickly and efficiently. However, despite its advantages, it also comes with new challenges for debugging and troubleshooting applications.

The Magic Behind the Lumigo Kubernetes Operator

Kubernetes is the container orchestration platform of choice for many teams. In our ongoing efforts to bring the magic experience of Lumigo’s serverless capabilities to the world of containerized applications, we are delighted to share with you the Lumigo Kubernetes operator, a best-in-class operator to automatically trace your applications running on Kubernetes.

Why do you need network monitoring?

Are you tired of dealing with network issues that slow down your business operations and create headaches for your network administrators? Be sure to fix problems you could have prevented before it's too late. Say hello to Network Monitoring! Imagine having a proactive approach to network management, where you can anticipate and prevent network issues from causing costly downtime.

Lumigo Product Training: Actionable Alerts

Get hands-on training from Lumigo's Director of Product during this live webinar on alerts and how to use them to reduce response time. Recorded on April 13, 2023. Make sure to subscribe so you don't miss out on any new livestreams and observability content! With one-click distributed tracing, Lumigo lets developers effortlessly find and fix issues in serverless and containerized environments.

Monitor your Nodejs application with OpenTelemetry and SigNoz

OpenTelemetry can auto-instrument many common modules for a JavaScript application. The telemetry data captured can then be sent to SigNoz for analysis and visualization. OpenTelemetry is a set of tools, APIs, and SDKs used to instrument applications to create and manage telemetry data (logs, metrics, and traces). For any distributed system based on microservice architecture, it's an operational challenge to solve performance issues quickly.

Best Cloud Monitoring Tools For Every Use

Nowadays, almost half of traditional small businesses (and nearly 2/3rds of small tech companies in particular) employ cloud infrastructure or hosting services, and for good reasons. Cloud computing can potentially save money, help with scalability and flexibility and make it easier to test new products and design fresh applications faster.

Using Practical Alerting to Stay on Top of Teams Call Quality - Part 1

In the modern workplace, more and more organizations are relying on tools like Microsoft Teams to stay connected and productive. However, with the rising usage of these tools comes the potential for technical difficulties that can impact call quality and overall performance. This can be frustrating for teams trying to get work done, but there are ways to stay on top of these challenges.

Reducing Your Splunk Bill With Telemetry Pipelines

With 85 of their customers listed among the Fortune 100 companies, Splunk is undoubtedly one of the leading machine data platforms on the market. In addition to its core capability of consuming unstructured data, Splunk is one of the top SIEMs on the market. Splunk, however, costs a fortune to operate – and those costs will only increase as data volumes grow over the years. Due to these growing pains, technologies have emerged to control the increasing costs of using Splunk.

Optimizing Your Splunk Experience with Telemetry Pipelines

When it comes to handling and deriving insights from massive volumes of data, Splunk is a force to be reckoned with. Its ability to index, search, and analyze machine-generated data has made it an essential tool for organizations seeking actionable intelligence. However, as the volume and complexity of data continue to grow, optimizing the Splunk experience becomes increasingly important. This is where the power of telemetry pipelines, like Mezmo, comes into play.

Check, measure, and monitor your bandwidth for an effective network infrastructure

Slow internet speeds can cause huge frustration among the users in an organization. We know how effective use of bandwidth can influence productivity in your business. Bandwidth is the volume of information that can be sent over an internet connection in a given amount of time, typically expressed in megabits per second (Mbps). For instance, let’s assume your organization’s network bandwidth is a canal and your data is the ships sailing through it.
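A quick worked example of the arithmetic behind that definition, as a Python sketch. It assumes the common convention that link speeds are quoted in megabits per second while file sizes are quoted in megabytes (8 bits per byte), and it ignores protocol overhead and congestion; the function name and figures are hypothetical.

```python
def transfer_seconds(file_megabytes: float, link_megabits_per_second: float) -> float:
    """Best-case time to move a file over a link, ignoring overhead and contention."""
    if link_megabits_per_second <= 0:
        raise ValueError("link speed must be positive")
    file_megabits = file_megabytes * 8  # 1 byte = 8 bits
    return file_megabits / link_megabits_per_second


# Hypothetical case: a 500 MB file over a 100 Mbps connection.
seconds = transfer_seconds(file_megabytes=500, link_megabits_per_second=100)
print(f"{seconds:.0f} seconds")
```

If ten users share the same link evenly, each effectively gets a tenth of the bandwidth and the same transfer takes roughly ten times as long, which is why measuring actual utilization matters more than the headline speed.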

How to Build a Culture of Data-Driven Product Management

Product-led growth (PLG) is on the rise, a discipline that relies on the product itself to drive user acquisition, expansion, conversion and retention. Today, 60% of Cloud 100 companies embrace a PLG strategy, because it’s an efficient method of growth with low customer acquisition costs. Plus, cloud-native companies have the unique opportunity to collect more data that shows exactly how customers are using their products, and where potential friction occurs.

Java performance optimization tips: How to avoid common pitfalls

In this post, we’ll be delving into Java performance optimization, providing you with essential tips to write faster and more efficient code. If you’re reading this, you’re probably already aware of the importance of performance engineering and the need to optimize your code to ensure speed for all users, but even the most seasoned and performance-aware programmers can get tripped up.

The Ultimate Raspberry Pi Network Monitor Setup for Tech Enthusiasts

Are you tired of slow Internet speeds and mysterious network hiccups? Well, fear not because we have the ultimate solution for you: a Raspberry Pi network monitor setup with Obkio! Cloud and hosted solutions are becoming more and more popular, which makes monitoring network performance more important than ever.

Observability for business decision making: bridging the gap between IT and the business

Hear from Corneile Britz, Observability and DevOps Specialist and Co-Founder of Boxfish how Observability can provide real-time data to support business decision-making and improve customer experience, enabling collaboration and trust between IT and business stakeholders.

How to Monitor Your Windows Infrastructure

Nowadays many organizations still rely on classic Windows servers and virtual machines (VMs) for their business applications. Although Kubernetes is a trending topic, not everything running in the cloud is a container-based application. When it comes to monitoring Windows applications and infrastructure, many businesses leverage OSS Prometheus to get Windows metrics via its Prometheus Windows Exporter.

Predictive Maintenance Tools - 7 Types to Check Out

In today’s business landscape, predictive maintenance has emerged as a critical strategy to optimize equipment performance, reduce downtime, and minimize maintenance costs. In this article you will learn about some tools that can be used to simplify the complexity involved with implementing a successful predictive maintenance program.

BGP Beyond Preventing Outages

Since its initial conception on the back of a napkin, BGP has been an essential part of the Internet. However, its ubiquity and simplicity also make it a potential weak spot in any organization's Internet Stack. As an open, near-universal protocol it's a vector for potentially malicious attacks. It can also cause the same amount of problems simply through misconfiguration (in fact telling the difference between the two can be a challenge in and of itself).

Securely Connecting an Amazon S3 Destination to Cribl.Cloud and Hybrid Workers

There are several reasons you may want to route to Amazon S3 destinations, including routing to object storage for archival, routing to S3 buckets to utilize Cribl.Cloud’s Search feature, and archiving data that can be replayed later. When setting up Amazon S3 destinations in Cribl, there are three authentication methods: Auto, Manual, and Secret. Using the Auto authentication method paired with Assume Role is the most secure way to connect Amazon S3 to Cribl.

Grafana Alerting: Searching for Grafana alerts just got faster, easier, and more accurate

Grafana Alerting enables users to create and customize alert rules as separate entities and link them to Grafana panels. It also supports various data sources with built-in alerting engines, such as Prometheus, Grafana Mimir, and Grafana Loki, allowing users to manage their alert rules directly from Grafana’s UI.

Log Less, Achieve More: A Guide to Streamlining Your Logs

Businesses are generating vast amounts of data from various sources, including applications, servers, and networks. As the volume and complexity of this data continue to grow, it becomes increasingly challenging to manage and analyze it effectively. Centralized logging is a powerful solution to this problem, providing a single, unified location for collecting, storing, and analyzing log data from across an organization’s IT infrastructure.

Does OpenTelemetry in .NET Cause Performance Degradation?

Contrary to Betteridge’s Law of Tabloid Headlines, the answer to the question, "does OpenTelemetry in .NET cause performance degradation?" is yes, but context is important. I get this question so often that I thought it was time to get some stats on it. I’ve heard comments like: I can only assume that these are based on previous versions, or things like OpenTracing / OpenCensus (the heritage frameworks that were the feeders for OpenTelemetry).

Migrating a Web App to AWS Lambda with Lambda Web Adapter

As developers, we all seek to build web applications that can scale seamlessly, adapt to changing needs and do so without incurring excessive costs. One way to achieve this is by migrating web applications to AWS Lambda, which can provide scalability, flexibility, and cost savings. To make this process even easier, AWS provides the Lambda web adapter, a simple and efficient tool that enables you to migrate your web apps quickly.

OpenTracing via Jaeger

Within enterprises, it used to be that applications ran on a single server. Owners could directly monitor that discrete machine, conveniently access all the logs they needed, see all the metrics that mattered, and hit the reboot button, without needing to confer with “everyone.” Those days are gone. Modern application architectures stretch the definitions of the words “federated” and “distributed.” We now have distributed applications.

Our broken links check now highlights application errors

One of the unique features of Oh Dear is that we crawl your entire site and report any broken links. Our broken links report had two main categories: In both categories, the problem is caused by something related to the site's content. In most cases, a page you're linking to was removed or archived. The solution is often letting the content manager of the site fix this. Today, we're introducing a third category in our report: internal broken links that resulted in a 5xx status code.

Dashboard Fridays: Sample VMware dashboard

These three VMware dashboards built in SquaredUp provide a full overview of the data in a VMware deployment so users can spot performance issues and fix them fast. Having all the VM performance and health metrics in one place – for your VMs, VM hosts, and guest VMs – allows engineers to pre-empt issues and fix them before they become problems for end users.

AMA: Achieving code reliability across the release cycle

Code coverage, arguably a very important measure that we as development teams don’t pay enough attention to. That is until Sentry notifies you of a frustrating/critical/oh s*#t moment. Then we all think: “how could this have been avoided?”, “why didn’t our tests catch it?”, “oh… we didn’t have any coverage on this flow.” With our new Codecov integration, you can avoid regressions (and awkward conversations) by being able to see which lines that caused an issue are covered and which ones are not - right in the stack trace we kick out on every error report.

DataOps Uncovered: A Bold New Approach to Telemetry and Network Visibility

Network telemetry and DataOps play a critical role in enhancing network visibility. By combining both, organizations can improve network visibility and gain insight to help them optimize their network performance, improve security, and enhance the overall user experience.

OpManager now integrates with Microsoft Teams

Picture this: You’re knee-deep in a project and suddenly, your network goes down. Panic sets in as you frantically try to troubleshoot the issue, sifting through piles of emails or waiting for responses on clunky chat platforms. Gone are those days! With Microsoft Teams, you can collaborate with your team, create dedicated channels for different projects, and share ideas and files seamlessly.

How to Monitor Microsoft Teams Issues & Fix Microsoft Teams "We're sorry - we've run into an issue"

Welcome to the world of Microsoft Teams! When it comes to video conferencing and messaging, Microsoft Teams is one of the most popular players in the game. When we get error messages like Microsoft Teams “We're sorry—we've run into an issue,” or “something went wrong,” it’s important to have a tool to help monitor and troubleshoot Microsoft Teams performance issues and connection issues.

How Much Bandwidth Do I Have? How Do I Check Bandwidth Speed

Understanding how to check your bandwidth speed can make all the difference when your Internet connections appear slower than usual. Sometimes, not everything works as expected. A video conference takes too long to load everyone's cameras; downloading a vital document starts at a snail's pace; webpages present the visitor with a low-resolution HTML version of their website. Why do these kinds of issues occur in the office?

The New Networking Realities in Healthcare

Systems for managing electronic health records (EHRs), clinical decision support, claims and payments, operations, and more are advancing rapidly in the healthcare industry. A more complex and dynamic mix of internal and external networks now plays an integral role in patient outcomes. Digital transformation in healthcare is swiftly moving from islands of data to interconnected hybrid environments through critical apps like Epic Systems.

Snowflake Monitoring with MetricFire

In today's world, where data plays a huge role in business success, companies must manage and analyze large amounts of information. Data warehousing has become a critical part of business operations to handle this challenge. Snowflake, a top cloud data warehousing platform, provides a scalable, secure, and flexible solution to help businesses adapt to their ever-changing data needs.

8 Top Status Page Tools: Keep Your Users Informed and Your Services Up

Free and open source status page tools have become a valuable way to keep users up to date on the status of tasks and problems. However, countless self-hosted, accessible, and open source status page platforms are available. Open source status page tools are crucial for notifying internal employees of threats and connecting with end users.

Add more context to your logs with Reference Tables

Logs provide valuable information for troubleshooting application performance issues. But as your application scales and generates more logs, sifting through them becomes more difficult. Your logs may not provide enough context or human-readable data for understanding and resolving an issue, or you may need more information to help you interpret the IDs or error codes that application services log by default.

Tracing Services Using OTel and Jaeger

At observIQ, we use the OTel collector to collect host/container-level metrics and logs from our systems. But to get more detailed monitoring of our applications (APM), we use the OTel SDK and instrumentation libraries. This post aims to provide a quick start to setting up tracing exporting to a local Jaeger instance.

Log Analytics: Everything To Know About Analyzing Log Data

Log data is big data! But that’s not why it’s such a big deal. Log data can be really useful if you know what to do with it — which is where log analysis and analytics comes in. Let’s take a look at this valuable activity, starting with what log data can tell us and moving into how we can use analytics to inform business practices. (This article was written in collaboration with Muhammad Raza.)

Get a Sneak Peek with Operator Preview in Cribl Search

At Cribl, we understand precisely what challenges our customers face when running complex searches, and the importance of getting exactly what they need with their queries. Cribl Search’s latest feature, Operator Preview, allows data analysts to test search operators without committing to a full search. It saves time, reduces costs, and streamlines your everyday data analysis.

Why the visibility gap is holding your IT operations back

Depending on your business, MTTR stands for mean time to repair or mean time to recovery – but it can also mean resolution, resolve, or restore. No matter how you define it, the basic measurement is the same: it’s the time it takes from when something goes down to when it is back and fully functional. This includes everything from finding the problem to fixing it. For ITOps teams, keeping MTTR to an absolute minimum is crucial.

Enhancing cloud native application observability on AWS with business transaction insights

With business transaction insights in AppDynamics Cloud, you can turn cloud native chaos into business context. Here’s how. In any organization, technology plays a vital role in nearly every aspect of the business — from marketing to operations to human resources. But increasingly, its role in revenue generation is taking center stage. Profitability and growth are now in the hands of CTOs and CIOs.

Log Management in the Age of Observability

The explosive growth of interconnected data across distributed systems has disrupted traditional development, DevOps, and ITOps practices and forced many organizations to rethink their cloud strategies. Higher-velocity feature development and more responsive support requests involve developers throughout the delivery cycle and require them to monitor and observe application behavior before releasing it to production.

The SolarWinds Platform

This video discusses the SolarWinds Platform, its different components, and how those components work together to monitor our customers’ diverse environments. This video is suitable for anyone who wishes to understand what the SolarWinds Platform and its components are, and what is the difference between the Orion Platform, the SolarWinds Self-Hosted Platform, Hybrid Cloud Observability, and SolarWinds Observability.

Real User Monitoring (RUM) vs. Synthetic Monitoring

Real User Monitoring (RUM) and Synthetic Monitoring are two different approaches to website and application monitoring. They both serve the same purpose of ensuring optimal performance of a website or application, but they differ in how they collect data and the types of insights they provide. Understanding the difference between the two can help you determine which approach is best suited for your specific needs.

How to collect and query Kubernetes logs with Grafana Loki, Grafana, and Grafana Agent

Logging in Kubernetes can help you track the health of your cluster and its applications. Logs can be used to identify and debug any issues that occur. Logging can also be used to gain insights into application and system performance. Moreover, collecting and analyzing application and cluster logs can help identify bottlenecks and optimize your deployment for better performance.

Syntax × Sentry MMXXIII

Today is a special day at Sentry, as today we welcome Syntax to the family. We’ve long been fans of Scott and Wes, of what they’ve built with Syntax, and of their general curiosity, drive, and hustle. As one of Sentry’s earliest partners, it’s been amazing to watch and experience their growth alongside our own. Today we’re going to talk about the next chapter of Syntax, one with increasing ambition, and one we hope you’ll be just as excited about as we are.

Distributed Tracing for AWS CDK Applications

The AWS CDK lets users build reliable, scalable, and cost-effective applications in their cloud environments as Infrastructure as Code (IaC). With the AWS CDK, developers can use various supported programming languages to create constructs (reusable cloud components) and compose them together into stacks and applications.

Top 5 important metrics you need to be monitoring in your MySQL server

MySQL is an open-source relational database management system that operates based on the client-server model by using SQL as its mode of communication. It is the second most popular database in the world owing to its flexible and scalable nature, high security, ease of use, and ability to handle large data sets seamlessly. Due to its wide range of functionalities, MySQL is employed as part of the database management system for several high-profile companies such as Facebook, PayPal, and Twitter.

The Digital Resilience Guide: 7 Steps To Building Digital Resilience

The question is: are you fully prepared to adapt to what may come…cyber incident, recession, severe weather? With unexpected events like a global pandemic, businesses see the need to improve their resilience against digital disruptions. That’s because disruption is a certainty — and resilience has a strong ROI. Digital resilience helps businesses respond successfully to these kinds of unexpected events.

Pandora FMS recognized for its excellence in 79 Top 10 reports and 10 G2 grid reports

Here we are again, there you go, efforts and perseverance bear sweet fruit sooner or later. Pandora FMS, leader in monitoring software, does it ring a bell? It has been recognized for its excellence in 79 top 10 reports and 10 G2 reports, one of the main software review platforms in the world.

Announcing: Time-Based, Revocable, Leased - Dynamic Access Credentials for InfluxDB

Today we’re excited to announce the InfluxDB add-on for Ockam Orchestrator. Through the use of the add-on, customers that are using InfluxDB Cloud can use Ockam to improve their security posture by automatically granting uniquely identifiable, least privilege, time-limited credentials for any client that needs to connect to InfluxDB Cloud.

Managing Monitoring and Alerting during IT Maintenance

Alerting is a critical feature in monitoring and observability products. The monitoring platform continually tests systems and applications for metrics crossing thresholds, key events in logs, and other warning signs of issues. Alerting ensures that help desks, administrators, and IT Ops teams know about issues, or impending issues, and handle communications and remediation rapidly.

Ukraine's Wartime Internet from the Inside

It has now been over a year since Russia invaded its neighbor to the west, leading to the largest conflict in Europe since World War II. Kentik’s Doug Madory reviews what has happened with internal connectivity within Ukraine over the course of the war in this analysis done for a collaboration with the Wall Street Journal. This February marked a grim milestone in the ongoing war in Ukraine.

4 CDN Monitoring Tools to Look At

Beyond their primary function of bringing internet content closer to client servers, CDNs also play a vital role in network security. For instance, a CDN helps you absorb traffic overloads from DDoS attacks by distributing traffic across many servers. However, the volume of servers under your CDN's control and their geographically distributed nature present their own set of risks, both operational and security-related. Choosing the best CDN monitoring tool is critical to the end-user experience.

How to monitor Microsoft SQL Server performance with Grafana Cloud

A database is one of the most critical components for almost every application. Making sure it is running with the expected read and write latencies is paramount. This can be the difference between a smooth, pleasing user experience and a slow, error-filled one that makes your customers turn their back on a product — and never come back.

Introduction to Device Groups

Device Groups help prioritize what needs to be done to improve user experience. Customers use them to group by department, line of business, geography, VIP users, and any way they want. With device groupings, it’s easy to understand who has the worst digital experience and why - in real time. Combined with built-in groupings - by network connection, type of device, even by ISP - you'll have fewer issues to deal with, and when issues pop up, you'll fix them faster.

How to Comply with Current EU Regulations NIS2 & RCE

Individual EU Member States are expected to transpose the NIS2 and RCE directives into national legislation. It will concern not only critical, essential and important entities, but also National Security Authorities and various accredited CSIRTs. In this webinar, we discuss important milestones, individual measures and obligations, and the capabilities of the Flowmon solution, which can help organisations comply with directives and paragraphs of Cyber Security Acts, especially in the area of risk-management measures, network security monitoring and incident reporting.

Migrating Graphite to the Cloud

Are you ready to take your monitoring and analysis game to the next level? As businesses increasingly shift to the cloud, migrating Graphite metrics to the cloud has become critical to unlocking the full potential of real-time performance monitoring. In this article, we'll guide you through migrating Graphite to the cloud and show you how to make the transition as smooth as possible. So grab a cup of coffee, and let's dive in!

Even faster 3 am troubleshooting with new logs search and query

As an SRE putting out fires all day, it’s nice to get a good night’s sleep. But there are times when that PagerDuty alert goes off in the middle of the night, forcing you to leap into action to fix a high-priority issue. This is where having the best log analytics tool is critical to easily search and query the log data, perform deep-dive troubleshooting and analysis and quickly come to a resolution.

How to deploy the Datadog Agent on Windows with Ansible

When your organization relies on hundreds or thousands of hosts, it can be difficult to ensure that each is equipped with the proper tools and configurations. Configuration management tools like Ansible are designed to help you automatically deploy, manage, and configure hosts across your on-prem and cloud infrastructure. In this post, we’ll show you how to use Ansible to automate the installation of the Datadog Agent on a dynamic inventory of Windows hosts.

Keeping Your Cloud Connections Clear: A Deep Dive into Azure Network Monitoring

Welcome to the exciting world of cloud computing, where everything is digital and the skies are always clear...or are they? With so many connections in the cloud, it's easy to get lost in a digital haze. That's where Azure Network Monitoring comes in! By keeping a watchful eye on your cloud connections, you can ensure that your data travels smoothly and your cloud experience remains crystal clear.

5 Ways Nexthink Amplify Strengthens Your Service Desk

What does a normal day look like for your service desk? A flood of incoming tickets. Insufficient data to quickly find the root cause of issues. And a lack of remediation power to close tickets fast. Level 1 (L1) analysts are forced to spend time jumping across tools, reaching out to end users, or relying on guesswork to solve problems.

How to get started with monitoring Apache Cassandra with Grafana Cloud

Apache Cassandra is a highly scalable, open source NoSQL database system designed to handle large amounts of data across multiple commodity servers with no single point of failure. Apache Cassandra can be run as a single node but starts making sense when it's run in a cluster setup. The system is optimized for high write throughput and is known for its ability to handle big data workloads with ease at super-low latencies.

Expanding Our Vision: Unifying Client-Side Observability Data

In 2021, we started Request Metrics as a simple and developer-friendly service to measure and improve web performance. We built an incredible platform that distilled complex data down into simple reports and recommendations. Lots of teams around the world found valuable insights in Request Metrics that they couldn’t get anywhere else. But web performance data can be very unpredictable—the web slows down in all sorts of ways.

Top Reasons to Embrace a Hybrid Multi-Cloud Strategy

It is not always possible or necessary for an organization to rely solely on cloud resources. For example, requirements might call for on-premises infrastructure for privacy reasons. Alternatively, some organizations might use both on-premises infrastructure and public cloud services provided by companies like AWS, Azure, or Google.

How to Reduce Duplicate Log Data with BindPlane OP

Are you dealing with duplicate log data? This can get expensive💰 and be difficult to parse, so how can you solve it? Check out the clip with our CEO, Michael Kelly, as he shows you how to reduce telemetry data at the edge, reducing cost but not impacting visibility👀 with BindPlane OP. #telemetry #opensource #observability

How to Measure Jitter & Keep Your Network Jitterbug Free

Has your network been bitten by the jitterbug? Are you tired of your network dancing the jitterbug? Do you find yourself constantly tapping your foot waiting for pages to load or downloads to finish? Network jitter is your network's biggest enemy when using unified communications and real-time apps like IP telephony, video conferencing, and virtual desktop infrastructure. Troubleshooting and measuring jitter helps you avoid sounding like a robot on video calls.

The Complex Reality of Multi-Cloud Environments

Most companies today have multiple cloud instances with multiple cloud service providers (CSPs) as well as an on-premises environment. It’s complex, but that doesn’t make it inherently wrong—there are usually good business reasons behind the decisions. It does, however, create management challenges.

Set up application monitoring for your Node JS app in 20 mins with open source - SigNoz

In this article, learn how to set up application monitoring for Node.js apps with our open-source solution, SigNoz. Node.js tops the list of most widely used frameworks by developers. Powered by Google's V8 JavaScript engine, its performance is incredible. Ryan Dahl, the creator of Node.js, wanted to create real-time websites with push capability. On Nov 8, 2009, Node.js was first demonstrated by Dahl at the inaugural European JSConf.

Cribl Reference Architecture Series: Scaling Effectively for a High Volume of Agents

Join Cribl’s Ed Bailey and Ahmed Kira in an insightful discussion about scaling your Cribl Stream architecture to accommodate a large number of agents. Managing high-volume agent data flows presents a unique set of challenges that must be addressed to ensure the reliable transmission of data from your endpoints to your analytics systems, meeting business resiliency requirements. Errors arising from agent scale and data volume can lead to difficult-to-diagnose and even more challenging-to-fix issues that tend to surface at the most inopportune times.

10+ Best Tools & Systems for Monitoring Ubuntu Server Performance [2023 Comparison]

Ubuntu is a Linux distribution based on Debian Linux that’s mostly composed of open-source and free software. It is released in three options: servers, desktop computers, and Internet of Things devices. Ubuntu is highly popular, reliable, and updated every 6 months, with a long-term support version released every two years. Multiple Ubuntu versions allow users to choose whether to stick with the long-term support version or the recently updated one.

Announcing support for monitoring AWS Lambda Function URLs with Datadog

AWS Lambda Function URLs make it even easier to create AWS Lambda functions that can be accessed and triggered by using HTTP/S requests, which is key for building serverless applications that are connected to and invoked from the web. Now you can generate a URL in one click that points to a specified Lambda function. Then, any HTTP/S request that a Function URL receives will trigger the Lambda function it’s assigned to.

Scrape Azure metrics and monitor AKS using Grafana Agent

As more organizations adopt cloud-based services like Microsoft Azure Kubernetes Service (AKS), it becomes increasingly important to monitor and manage the performance and reliability of these services. If you’re using AKS today, then Grafana Cloud provides the flexibility, performance, and visualizations you need to monitor your distributed applications.

A Comprehensive Comparison of New Relic and Scout APM

When it comes to getting to the root of a performance problem, nothing beats the lightning-fast speed and precision of Application Performance Management (APM). But choosing between New Relic and Scout can be like navigating a labyrinth of options and considerations. With New Relic, you'll get a top-of-the-line tool that's perfect for some situations, while Scout's streamlined approach fits like a glove in others.

Data Streaming in 2023: The Ultimate Guide

Data streaming is the backbone of so many technologies we rely on daily. Endless data sources that generate continuous data streams. Dashboards, logs and even streaming music to power our days. Data streaming has become critical for organizations to get important business insights — when you can get more data from more data sources, you might have better information to run your business. This article explains data streaming, including: Let’s get started!

Return large objects with AWS Lambda's new Streaming Response

Lambda has a size limit of 6MB on request and response payloads for synchronous invocations. This affects API functions and how much data you are able to send and receive from a Lambda-backed API endpoint. I have previously written about several workarounds on the request payload limit. But sometimes you also need to return a payload bigger than 6MB. For example, PDF or image files.

Closing the Gaps in Desktop Virtualization

In virtual desktop environments, visibility into common employee experience problems has traditionally been limited. In most virtual desktop scenarios, what’s really being delivered is an instance of Windows with a collection of Windows applications, including a browser to access various SaaS applications. EUC teams have been managing these environments for years, so this should be easy, right?

Introduction to Prefix Featuring OpenTelemetry

Prefix puts the power of OpenTelemetry in the hands of developers, supercharging performance optimization for your entire DevOps team. With unmatched observability across user environments, new technologies, frameworks and architectures, Prefix simplifies every step in code development, app creation and ongoing performance optimization for your apps and your team!

10 Best Tools to Monitor SSL Certificate Expiry, Validity & Change [2023 Comparison]

Webmasters always have their hands full with everything from user experience and search engine optimization to, last but not least, SSL certificates. While some may not prioritize SSL certificates, they are still critical to the correct operation of your websites. Because Secure Sockets Layer (SSL) certificates are so important, monitoring them is a must! To help you get started, we’ve compiled a list of the top 10 best tools for monitoring SSL certificates for validity, expiry, and change.

Sponsored Post

OpenTelemetry 101: A Non-Technical Guide to Starting Your Open Observability Journey

If you’re involved in IT Operations, you’ve probably heard of OpenTelemetry. It’s a hot topic in the observability industry, and for good reason. OpenTelemetry is a set of open-source tools and APIs that make it easy to collect telemetry data from your applications and infrastructure. This data can then be used to monitor your systems, troubleshoot problems, and improve performance.

8 reasons customers choose CloudSpend over native cloud billing tools

FinOps professionals often find it difficult to gain full visibility of cost overruns resulting from cloud wastage. ManageEngine CloudSpend is a cloud cost management tool that helps you reduce cloud wastage and save costs. If you are a sysadmin or a FinOps professional and are wondering what CloudSpend does that native cloud billing tools don’t, this article is for you.

Factors Affecting Website Page Loading Speed & Optimization Strategies

The speed at which your website pages load can affect the overall user experience. Slow page loading times can lead to higher bounce and lower conversion rates, as users may become frustrated and leave the site before engaging. Optimizing your website's page loading speed helps improve user experiences, increases search engine rankings, and drives more traffic to your website.

Sysdig Validated as AWS CloudOps Competency Launch Partner

This week AWS unveiled its new Cloud Operations Competency–aka the CloudOps Competency–designed to recognize qualified partners who help cloud customers build and manage hybrid cloud environments securely and efficiently. Sysdig is a launch partner and is now validated for the AWS CloudOps Competency in Compliance and Auditing, as well as Monitoring and Observability categories.

Forecasting and Visualizing Time Series with Tableau and InfluxDB Cloud

Data analysis is a crucial aspect of any business or organization because it helps with making informed decisions and improving overall performance. However, with the vast amounts of data generated every day, it can be overwhelming to manually analyze and derive insights from it.

Webinar Recap: Unlocking Business Performance with Telemetry Data

Telemetry data can provide businesses with valuable insights into how their applications and systems are performing. However, leveraging this data optimally can be a challenge due to data quality issues and limited resources. Our recent webinar, "Unlocking Business Performance with Telemetry Data", addresses this challenge.

Gain agility through observability

As companies navigate geopolitical challenges, macroeconomic headwinds, and the post-pandemic comedown, business leaders face intense pressure to drive software transformation, reduce costs, and compete faster in the cloud-transition era of “lift and shift.” Amid layoffs and a slowed pace of hiring, the demand for better tools, real-time insights, seamless experiences, and contextual analysis has skyrocketed.

Speeding Up the Web: A Comprehensive Guide to Content Delivery Networks and Embedded Caching

Content delivery networks are an important part of the internet, as they ensure a short path between content and its consumers. The idea of placing CDN caches inside ISPs' networks arose early in the days of CDNs. The number of CDNs with this offering is growing, and ISPs all over the world take advantage of the idea. This post explains how this works and what to look out for to do it right.

Implementing a log management program: What is best to start with?

Everything you need to know about creating a log management program. Businesses create, collect and have access to more data than ever before. Some of this log data, the record of events that occur in your digital spaces, can help DevOps and security teams assess the performance and reliability of their systems, evaluate weaknesses and troubleshoot any issues that may be occurring.

Troubleshoot faster and modernize your apps with AWS Monitoring and Observability

As a company born in the Amazon Web Services (AWS) cloud, we understand that operating at cloud scale requires balancing security, compliance, and operational safety with your commitment to innovation, speed, and agility. From cost optimization at scale to operational resiliency to application modernization, we know you’re facing various challenges and need reliable solutions.

OpenTelemetry: Why community and conversation are foundational to this open standard

While many of the popular tools for observability in software are open source, one thing they lack is open design. Most of these solutions, from Nagios to Prometheus, started as a product with an opinionated design, which happened to work well for many people. These became the de facto standards. That position of de facto standard is what every open-source project and every commercial product tries to be.

What are the best practices for log management?

Logs record digital actions within your IT system to let you know where errors or unauthorized access attempts originated. However, having only a partial log management plan — or lacking one entirely — can leave you with a mess of unstructured data that doesn’t provide the insights you need. Fortunately, following log management best practices can make tracking your digital actions or modifying your current log management plan a straightforward process.

Goliath Technologies Unveils AI-Powered Monitoring and Troubleshooting Assistant - KIP

Philadelphia, PA – April 6, 2023 – Goliath Technologies, a leader in end-user experience monitoring and troubleshooting software, today announced the introduction of KIP, an AI-powered monitoring and troubleshooting assistant. Powered by OpenAI/ChatGPT 4, KIP is designed to help IT professionals improve efficiency when monitoring and troubleshooting user experience issues.

The Future of Observability is Bright as Honeycomb Announces $50M in Series D Funding

TL;DR—This is a fundraising post! Yes, even in this economy. Here at Honeycomb, we've always focused more on the problems we help our customers solve rather than playing the meta game of posturing in startup-land—so these fundraising blog posts are usually the least fun to write (and read, probably). But this one is a little different.

Grafana Loki 2.8 release: TSDB GA, LogQL enhancements, and a third target for scalable mode

Grafana Loki 2.8 is here — and it’s at least 0.1 better than Loki 2.7! Jokes aside, this release includes a number of improvements users will appreciate. In addition to graduating our TSDB index from Experimental to General Availability, we’ve added a number of nifty LogQL features, and we’ve made the Loki deployment and management experience much easier. This also marks the release of Grafana Enterprise Logs (GEL) 1.7.

6 Steps When Your Website Gets Flagged as "Deceptive"

Seeing your website flagged as deceptive by Google or other search engines is enough to spoil anyone's day. You've spent long hours creating a site, only for users to be informed that it is a cybersecurity risk. But what can you do? Should you scrap the whole thing and start again? Today we'll explore why your website has been flagged as deceptive. We'll also look at what you can do to overcome the issue.

How Treating Testing and Monitoring as Separate Operations is Costing You Money

I’ll get right to the point: Not uniting testing and monitoring is costing you expensive engineering time, sales, and customer confidence. Below you’ll find an all too familiar scenario that outlines the problems of traditional testing and monitoring approaches and what the benefits are of a united approach to testing and monitoring through monitoring as code (MaC).

What to Expect When You Are Expecting: Cribl Data Routed to a Cribl Destination

For so many, the unknown sucks. Knowing what to expect is best. Why? Because it puts us at ease and at peace, and gives us a calm sense of knowing without having experienced it yet. That’s part of my mission here at Cribl. I talk to a lot of people and the one consistent part of these conversations is the unknown.

How To: Kubernetes Inventory Monitoring with DX UIM

Learn how to install and configure a new monitoring solution to discover, dashboard and monitor your enterprise Kubernetes deployment using DX Unified Infrastructure Management. DX UIM continues to expand its coverage of new-age technologies across hybrid, multi-cloud environments to provide full-stack observability from a single pane of glass.

Logic App Best Practices, Tips, and Tricks: #27 How to embed HTML images into emails

Today I will speak about another useful Best practice, Tips, and Tricks that you must consider while designing your business processes (Logic Apps): How to embed HTML images into your email using Logic App Designer.

How to write and install a custom Python plugin for Linux servers

This video will guide you through the process of writing and installing a custom Python plugin for Linux servers. With Site24x7's Plugin Integrations, you can monitor applications, hosts, devices, services, protocols, and more. Write your own custom plugin script to monitor any application or service in your tech stack in a few simple steps.

Sponsored Post

Streamline and Simplify SSL/TLS Certificate Monitoring

Hackers busily work night and day to find the tiniest hole in your security perimeter, so they can compromise your systems. Browsers are the most commonly used applications on your enterprise network, and they are becoming increasingly difficult to secure. Managing their security certificates became more challenging recently, but Exoprise's easy-to-deploy SSL certificate monitoring solutions close up any holes. There is no doubt that your network is constantly under attack.

Avantra: Deriving Value from Investing in Hyperautomation and AIOps

Hyperautomation and AIOps are two of the most important technologies that are driving the digital transformation of businesses across the globe. The business value of hyperautomation and AIOps is significant. By automating repetitive tasks, hyperautomation helps businesses to save time and reduce costs, while also improving accuracy and efficiency. AIOps, on the other hand, helps businesses to monitor their IT infrastructure in real-time, detect and resolve issues faster, and optimize IT operations, which ultimately leads to better business outcomes.

Monitor the Health of Your Node.js Application

Node.js is a popular choice for creating a scalable and highly performant web app. Its event-driven, non-blocking I/O model makes it well-suited for building real-time, data-intensive applications. Maintaining the health of your Node.js app includes monitoring and tracking several metrics over time to better understand how your app is performing. Monitoring your application's health is important to ensure its smooth operation and a good user experience.

12,000+ GitHub stars, better search capabilities, and a more intuitive Logs tab - SigNal 23

Welcome to our monthly product newsletter - SigNal 23! Last month, our team worked on improving the search capabilities across multiple tabs. We also made the logs tab more intuitive by adding color coding in the logs view. We presented SigNoz at an observability-focused meetup and crossed 12,000+ GitHub stars. Let’s dive in to see what humans at SigNoz were up to in the month of March 2023.

Grafana Alerting: A beginner's guide to templating alert notifications

We often see questions about how to template alerts. In Grafana, you can template information about your alerts with custom labels and annotations, and you can also template how notifications look and what information they contain with notification templates. Many users confuse the two, even though they are separate features with different use cases.

LogicMonitor Maintains Leader Rankings in G2 Spring 2023 Reports

G2’s Spring 2023 Reports were announced March 30, 2023, with LogicMonitor grabbing several number-one spots and Leader rankings. This recognition is based on the responses of real users featured in the G2 review form. “Rankings on G2 reports are based on data provided to us by real software buyers,” said Sara Rossio, Chief Product Officer at G2.

Practical Steps for Enhancing Reliability in Cloud Networks - Part I

Delivering on network reliability causes an enterprise’s data to become more distributed, introducing advanced challenges like complexity and data gravity for network engineers and operators. Learn concrete steps on how to implement cloud reliability and the trade-offs that come with it.

Unleash the Lightning: Best Practices for Turbocharging and Monitoring Your Salesforce Performance and Speed

If you're a Salesforce user, you know how important it is for your system to run smoothly and quickly. Slow page loads and poor performance can lead to frustration for users and ultimately, lost productivity and revenue for your business. In this article, we'll explore the best practices for optimizing and monitoring your Salesforce performance and speed, so you can unleash the lightning in your instance.

How to Monitor Cloudflare with OpenTelemetry

With observIQ’s latest contributions to OpenTelemetry, you can now use free open source tools to easily monitor Cloudflare. The easiest way to use the latest OpenTelemetry tools is with observIQ’s distribution of the OpenTelemetry collector. You can find it here. In this blog, the Cloudflare receiver is configured to monitor logs locally with OTLP, but you can use the receiver to ship logs to many popular analysis tools, including Google Cloud, New Relic, OTLP, Grafana, and more.

Website Monitoring for Shopify: Avoid Downtime & Lost Revenue

In today’s fast-paced e-commerce world, it’s essential for your Shopify store to remain accessible and functional at all times. Even a brief period of downtime can result in substantial lost revenue, tarnish your brand reputation, and drive potential customers to your competitors. That’s why website monitoring for Shopify should be an indispensable component of your store’s digital infrastructure.

EUC Professionals Choose Nexthink as Clear Leader in DEX

We’re honored and humbled to announce that popular Peer Reviews site, G2, has named us a Leader in DEX Management software. In a field of 18 other competitors, we’ve received the highest rating from hundreds of real EUC Professionals. We strongly believe in the adage: ‘Your Reputation is Everything’, so to receive such positive feedback from the IT community means the world to us here at Nexthink.

Citrix Cloud Troubleshooting: How to Resolve Experience Issues for End-Users

Citrix Cloud is a popular cloud-based solution that provides businesses with the ability to deliver secure and reliable access to applications and desktops to end-users from anywhere and any device. However, like any other cloud-based solution, there may be instances where end-users may experience issues while using Citrix Cloud. In this blog post, we will discuss how to troubleshoot and resolve experience issues for end-users on Citrix Cloud using a step-by-step troubleshooting workflow.

Using Elastic Anomaly detection and log categorization for root cause analysis

Elastic's machine learning supports several easy-to-use features that help with root cause analysis for logs. These include anomaly detection and log categorization, which aid analysis without requiring you to understand or know about machine learning.

Revolutionize Your Observability Data with Cribl.Cloud - Streamline Your Infrastructure Hassle-Free!

Cribl.Cloud provides control over observability data without the hassle of running infrastructure. Cribl.Cloud quickly spins up all Cribl products — Stream, Edge, and Search — in just a few minutes. Teams can get working quickly and make their observability data valuable while Cribl handles scaling and security.

Pricing & Product Hunt launch announcement

Since September, Spectate was free to use during the technical preview and later open beta phase. During this time, we have received immensely valuable feedback from all of you. Listening to feedback is something we'll never stop doing - as it makes Spectate even better bit by bit. First, let's talk about pricing. It's very hard to get this right, but we have decided on the following plans: The exact differences in features and limitations are listed on our pricing page.

Network Path Monitoring Pinpoints and Mitigates Connection Bottlenecks

An employee calls complaining about slow response time. Another one has similar trouble. No red lights are flashing on the Network Operations console, so the network is up and running. What is happening? Frankly, it could be just about anything: an overworked router, a runaway process on a laptop, a slow loading web page, or a bandwidth hog at home.

CloudDNS' geo-load balancing and redundancy ensure high availability and reliability

Cloud computing makes it easier for businesses to build and deploy applications that are accessible from anyplace in the world. But cloud computing also introduces new challenges, such as ensuring applications remain available even if there are network outages or unexpected spikes in traffic. ManageEngine CloudDNS provides a powerful geo-load balancing and redundancy feature that makes it easy to manage and distribute traffic across multiple servers and locations.

The AIOps journey: Navigating the path to proactive IT operations

In the modern IT era, most organizations rely heavily on their IT infrastructure to stay relevant and competitive. However, managing complex IT systems can be a daunting task, as the volume of data grows and IT environments become more heterogeneous. To address these challenges, many organizations are turning towards artificial intelligence for IT operations (AIOps)—an approach that leverages AI and ML to streamline IT operations, improve efficiency, and reduce downtime.

Monitor external dependencies outages in Datadog

We're excited to announce a new feature release: the integration of IsDown with Datadog, a powerful addition to your cloud monitoring and SaaS monitoring toolkit. Datadog is a leading monitoring and analytics platform that provides full visibility into your infrastructure and applications. It allows you to track metrics, traces, and logs from various sources, giving you a comprehensive understanding of your environment's performance.

Sponsored Post

Building vs. buying a digital experience monitoring tool

It's usually possible, and often tempting, to build your own tools. For engineers in particular, there's a strong appeal to having total control over a custom-built product that will perfectly meet your specific requirements. Raygun itself originally came from an internal tool we built to monitor errors in a different product. But as our business matured and we started to think more strategically, we recognized the hidden, lingering costs to our DIY approach, and began to believe that internal tools were a misuse of the time and skills of our development team.

Sponsored Post

Airlines aiming to transform need modern Observability

The last decade has been nothing but a roller coaster ride for the airline industry. The pandemic has transformed it forever and now it needs to reevaluate its digital transformation priorities on how to manage traveler expectations. Taking it a step further, travelers' buying behavior is changing further, as they will now want to book tickets while chatting with an AI interface. The transformation was already underway. In 2020, Google Cloud and Sabre announced a partnership to modernize Sabre. Recently, American Airlines announced their modern rebooking app launched in partnership with IBM. Lufthansa announced the industry's first continuous pricing tailored to suit individual customer attributes.

Webinar: Let your SCOM alerts talk with ChatGPT

Are you tired of spending endless hours sifting through alerts and troubleshooting incidents manually? We have some exciting news that could revolutionize the way you manage alerts! Our friends at TopQore are hosting a webinar on April 12th at 4 PM CEST (Europe - Amsterdam time) that will introduce the integration of SCOM with ChatGPT, a powerful AI language model developed by OpenAI.

Mastering the Art of Network Device Monitoring: A Beginner's Guide

Are you ready to embark on a journey of network discovery and device monitoring? Or are you feeling overwhelmed and lost in a sea of network cables and devices? Fear not, my fellow explorer, for we are about to embark on a quest to master the art of network device monitoring! Whether you're a seasoned IT pro or a curious newbie, this beginner's guide will take you by the hand and lead you through the maze of network devices, protocols, and monitoring tools.

Monitor OpenAI API and GPT models with OpenTelemetry and Elastic

ChatGPT is so hot right now, it broke the internet. As an avid user of ChatGPT and a developer of ChatGPT applications, I am incredibly excited by the possibilities of this technology. What I see happening is that there will be exponential growth of ChatGPT-based solutions, and people are going to need to monitor those solutions.

Troubleshooting Intermittent Failure in Amazon ECS apps

A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. The components interact in a decentralized manner and work together to achieve a common goal. Working with distributed systems is challenging, because failure often spreads between components and debugging across multiple components is difficult and time-consuming.

Citrix Logon Simulator Updates - eG Enterprise v7.2

Our latest release, eG Enterprise v7.2 has added a number of enhancements to our leading logon simulators. A large number of our Citrix customers rely on this popular tool to actively test and benchmark the speed and success of logons repeatedly even when no real users are accessing systems.

Kubernetes 1.27 - What's new?

This release brings 60 enhancements, way up from the 37 enhancements in Kubernetes 1.26 and the 40 in Kubernetes 1.25. Of those 60 enhancements, 12 are graduating to Stable, 29 are existing features that keep improving, 18 are completely new, and one is a deprecated feature. Watch out for all the deprecations and removals in this version! The main highlight of this release is actually outside Kubernetes.

Top Distributed Tracing Tools - Every Developer Should Know

Web applications have expanded over the past ten years to support millions of users and generate terabytes of data. Customers of these programmes anticipate quick responses and round-the-clock accessibility. When businesses adopt service-oriented architectures and give up monolithic workloads, they are stepping onto uncharted ground.

eBPF Explained: Why it's Important for Observability

eBPF is a powerful technical framework to see every interaction between an application and the Linux kernel it relies on. eBPF allows us to get granular visibility into network activity, resource utilization, file access, and much more. It has become a primary method for observability of our applications on premises and in the cloud. In this post, we’ll explore in-depth how eBPF works, its use cases, and how we can use it today specifically for container monitoring.

What is Generative AI? ChatGPT & Other AIs Transforming Creativity and Innovation

Upon its release in November 2022, ChatGPT stunned Silicon Valley and the world. OpenAI, a small company based in San Francisco, introduced a chatbot that mimics complex emotions, writes code and answers complex questions. Technology considered a decade away was now at everyone’s fingertips and quickly became the fastest-growing app in history. Just four months later, OpenAI launched a significant update, GPT-4, and the results of this new technology are fascinating.

Serverless Architecture Explained: Easier, Cheaper, FaaS vs BaaS & Evolving Compute Needs

Want to build websites and apps in a way that’s both easier and cheaper? Well, it’s possible even for major organizations and international companies. In this article, let’s take a look at how serverless architecture and computing is changing the game for software developers. We’ll start at the very beginning and walk through how serverless works, how we got this far, and the pros & cons of this approach.

Monitor NGINX Performance Automatically with AppSignal

Understanding how NGINX performs can be overwhelming. There are many data points to follow, and it can be tricky to know which ones are relevant to you and which ones you can ignore. In this article, we'll explain how you can use AppSignal to monitor NGINX, expanding your visibility over your application's performance.

Monitoring and troubleshooting - Apache error log file analysis

Your Apache HTTP server access and error logs contain a wealth of actionable insights about potential server configuration and web application issues. The problem is that this information is hidden within millions of log messages, so you need analytics to efficiently extract these insights so you can respond to problems before they impact your users. Apache log analysis revolves around two activities: monitoring and troubleshooting.

What is log management in DevOps?

DevOps teams are used to working with data that is spread out across lots of different systems and environments. In organizations that have achieved tight collaboration with security teams to transition to DevSecOps, this is even more true! Log management is part of how all these teams keep track of information and make vital business decisions. It’s important to take a moment to understand what is meant by log management.

What is log management in security?

Cyber crimes are expected to cost the world roughly $10.5 trillion per year by 2025, according to Cybersecurity Ventures. And these attacks don’t just cost money. Businesses impacted by these kinds of crimes can expect to experience not only financial losses but also loss of productivity, damage to their reputation, potential legal liabilities and more.

What is log management used for?

Faced with an important business decision? Do you have the data you need to make it? Odds are, you probably don’t. Or, if the data is captured somewhere, can you count on it being in one place and easily accessible? This is a common issue, easily solved by proper log management. This practice is vital for data-driven businesses, helping you maintain security, troubleshoot operations more quickly and enhance user experience.

Ship OpenTelemetry Data to Coralogix via Reverse Proxy (Caddy 2)

It is commonplace for organizations to restrict their IT systems from having direct or unsolicited access to external networks or the Internet, with network proxies serving as gatekeepers between an organization’s internal infrastructure and any external network. Network proxies can provide security and infrastructure admins the ability to specify specific points of data egress from their internal networks, often referred to as an egress controller.

Flowmon Solution Overview

Discover all the capabilities of Flowmon Network Detection and Response (NDR) and Network Performance Monitoring and Diagnostics (NPMD). In a world where technology exists for the benefit of people, secure and healthy digital environments are essential. That’s why Flowmon Networks develops an actionable network intelligence solution that enables businesses to ensure their services are running well and securely, and their workforce is productive.

MetricFire - Dashboards Tutorial

In this video, we explore MetricFire’s powerful dashboarding platform. Discover how to customize your dashboard to view the metrics that matter most to your business. With a variety of visualization options and an intuitive interface, creating and editing charts, tables, and graphs has never been easier. Share your dashboard with your team and keep everyone on the same page. Join MetricFire and start tracking your business metrics today.

ChaosSearch Pricing Models Explained

ChaosSearch was built for live analytics at scale on cloud storage. Our architecture was designed for high-volume stream ingestion and analytics at scale through the Elasticsearch and Trino APIs, on a stateless fabric that can scale to meet customers’ scale and latency requirements. Because we don’t store any data, ChaosSearch is, under the hood, basically a set of containers deployed on cloud compute instances in a VPC dedicated to each customer and managed by ChaosSearch.

El Salvador government healthcare institution saves up to $400,000 by using OpManager

Instituto Salvadoreño del Seguro Social (ISSS) is a Salvadoran government institution that provides health care services to the people of El Salvador. The institution offers insurance, medical treatment, prescription home delivery, and other health-related services. There are about 114 branches throughout El Salvador.

What is Context Propagation in Distributed Tracing?

In modern microservices-based applications, it is difficult to get an overview of how requests are performing across multiple services, infrastructure, and protocols. As companies began moving to distributed systems, they realized they needed a way to track requests in their entirety for debugging applications. Distributed tracing is a technology that was born out of this need.

5 Tools for Managing a Network from a Remote Location

If you’re responsible for overseeing a network infrastructure, but you’re not always on-site to complete tasks and tackle issues in person, you need the right tools to empower you in your admin efforts. There’s a diverse array of resources out there which will enhance your network management capabilities, even when you’re working remotely. Here are just a few examples of must-have apps for you and your team in this context.

How to Choose the Right Database in 2023

Databases are often the biggest performance bottleneck in an application. They are also hard to migrate away from once they are in production, so making the right choice for your application’s database is crucial. A big part of making the right decision is knowing what your options are. The database landscape has been changing rapidly in the past few years, so this article will try to simplify things for you by going over the following topics.

How to monitor Kafka and Confluent Cloud with Elastic Observability

The blog will take you through best practices to observe Kafka-based solutions implemented on Confluent Cloud with Elastic Observability. (To monitor Kafka brokers that are not in Confluent Cloud, I recommend checking out this blog.) We will instrument Kafka applications with Elastic APM, use the Confluent Cloud metrics endpoint to get data about brokers, and pull it all together with a unified Kafka and Confluent Cloud monitoring dashboard in Elastic Observability.

Continuous Monitoring: 5 Tools to Give You Peace of Mind

If you’re part of the DevOps, SecDevOps, or IT team, you would agree that continuous monitoring of the entire IT systems and networks is vital. And this explains why 70% of businesses treat preventing downtime as a priority to avoid causing significant issues in the future. So monitoring your IT infrastructure constantly with a continuous monitoring plan is one way of improving efficiency and productivity.

Platform Engineering 101: Origins, Goals, DevOps vs SRE & Best Practices

Platform engineering is the practice of automating infrastructure operations and enabling self-service infrastructure capabilities within collaborative Dev, Ops and QA teams. It involves designing and building platforms, technologies and workflows that enable self-service capabilities to automatically manage, provision and operate complex modern software architecture environments.

New APM Capabilities Help Optimize Application Performance Across Monoliths or Microservices

With the goal of helping you get to the “why” faster, Splunk Observability recently announced several new enhancements to reduce noise and provide more visibility when isolating problems in your environments. Specific to applications and services, whether you operate monolithic or microservices architectures, our releases help you easily investigate problems in complex environments. Here’s a roundup of the recent Splunk APM capability releases, along with links to help you get started now.

Frontend vs. backend: How to plan your performance testing strategy

There are many aspects of application performance, but they broadly fall into two categories: frontend performance and backend performance. As a tester, it’s important to know the differences between the two and how that impacts the way you approach your tests. In this blog, I’ll provide a high-level overview of frontend performance testing and backend performance testing, including pros and cons of each one.

How To Improve Performance of SaaS Applications using Nexthink

In today’s dynamic and rapidly changing environment, organizations deploy most, if not all of their business solutions using SaaS applications. This fast-paced digital transformation makes it critical for organizations to ensure the performance of SaaS applications, so that these applications can meet end user demands and provide the expected experiences. IT teams must ensure their SaaS applications are functioning well, facilitating productivity rather than inhibiting it.

Reduce time to detect with AppDynamics Cloud Log Analytics

How machine learning in AppDynamics Cloud accelerates log analysis and reduces mean time to detect. Site reliability engineers (SREs) need to investigate unknown problems reported in production. The common approach is to search and filter log files to find the root cause, and we all know how painful it is to sift through log contents. It’s like finding a needle in a haystack. A machine learning approach is essential to help SREs quickly identify the root cause.

Five Free Tools to Monitor Microsoft Teams Performance

Microsoft Teams has become a popular platform for remote communication and collaboration, especially since the outbreak of the COVID-19 pandemic. As more and more businesses rely on Microsoft Teams to communicate and work together, it’s crucial to ensure that the platform is performing optimally and to measure and improve Microsoft Teams performance. Fortunately, there are free tools available to help you monitor the performance of Microsoft Teams.

The Five Sources of Microsoft Teams Call Quality Issues

Microsoft Teams call quality issues can stem from a lot of places. People often jump to blame Microsoft when a call or video breaks up. But the reality is that while this does happen, more often than not the cause is a factor external to the Microsoft service itself. To give you some insight into where your Teams problems come from, we put together this easy primer on the five most common sources of Teams call quality issues.

Enhancing Datadog Observability with Telemetry Pipelines

Datadog is a powerful observability platform. However, unlocking its full potential while managing costs necessitates more than just utilizing its platform, no matter how powerful it may be. It requires a strategic approach to data management. Enter telemetry pipelines, a key to elevating your Datadog experience. Telemetry pipelines offer a toolkit to achieve the essential steps for maximizing the value of your observability investment. The Mezmo Telemetry Pipeline is a great example.

ITSM and monitoring: A match made in IT heaven

It has been a veeeeery long time since we discussed a technical concept through an ingenious allegory. Many people send us emails asking us why, and we have to admit that… it’s true, everything is quite a bit more fun with fantastic allegories. So be it, then! At the request of our fans, let’s talk today about ITSM and Monitoring Support through an invented event from which we can then draw a technical lesson.

Azure Resource Monitoring: The key method for Holistic Monitoring

When your organization has started to adopt Azure, the continuously increasing number of Azure resources throughout all your Development, Test, and Production subscriptions makes it hard to keep on top of the health of all those resources. So, it is important to have proactive Azure resource monitoring to know when something unexpected happens.

Spying on Your Network with AWS Network Monitoring

Are you tired of feeling like you're in the dark when it comes to your network? Do you want to know what your data is up to when you're not looking? Well, put on your spy gear, because we're about to show you how to spy on your network with AWS Network Monitoring! Forget about traditional monitoring methods that leave you feeling overwhelmed and under-informed.

How to Test Your Network Latency: Ensuring a Smooth Connection

In today's fast-paced business environment, reliable and efficient network connectivity is crucial to the success of any organization. Slow or inconsistent connections can disrupt operations, cause delays, and negatively impact productivity. One of the key factors that can affect the performance of your network is latency, or the delay between the sending and receiving of data.

How To Improve MS Teams' Poor Call Quality across the Enterprise with Nexthink

Microsoft Teams is one of the most widely used collaboration tools today. Entire enterprises rely on MS Teams for cross-functional communication, project management, and productivity. If Teams has issues, entire projects can get derailed, and business objectives could be at risk. This places pressure on IT teams to proactively measure and manage the performance of MS Teams, to ensure issues don’t inhibit employee productivity.

When Slow is Not an Option: How to Improve & Troubleshoot Your SAP Speed Performance

SAP (Systems, Applications, and Products) is a popular enterprise resource planning (ERP) software used by businesses to manage various operations such as finance, human resources, and supply chain management. As businesses rely on SAP for their critical operations, it is essential that the software performs at an optimal level. Slow SAP performance can result in frustration, delays, and ultimately, reduced productivity.

Anomaly Detection Using OSquery and Grafana

Detecting unauthorized usage and malicious applications in an instance involves analyzing OS and application logs. Doing this manually is a herculean effort because of the number of logs and the patterns one has to look for. Having a tool that can provide an aggregated view of your instance's logs and the ability to analyze them easily can greatly reduce manual effort.

Quantum Entangled Observability

As the world of technology continues to evolve, the demand for cutting-edge solutions to monitor and optimize system performance has never been higher. Today, we’re excited to introduce a revolutionary new concept in observability: Quantum Entangled Observability (QEO). This ground-breaking method leverages the peculiar properties of quantum mechanics to provide unparalleled insights into your systems’ inner workings.

Shed Some Light with Meraki Network Monitoring! Don't Be in the Dark

In today's fast-paced business environment, maintaining a reliable and high-performing network is essential. One technology that can help achieve this goal is Meraki SD-WAN (Software-Defined Wide Area Network), a cloud-based solution that enables organizations to easily manage and optimize their network performance. However, simply deploying SD-WAN is not enough.