Operations | Monitoring | ITSM | DevOps | Cloud

June 2022

What to look for in a Windows network monitoring tool

Monitoring the Windows devices in a network is difficult yet essential since the devices are tasked with the critical functioning of the network. The challenges and complexity increase multi-fold for an enterprise network because each device is associated with many events, services, and processes that must be monitored to ensure the hassle-free operation of both the devices and the network. The devices should be monitored constantly by a network monitoring tool.

Your Telemetry Data is Faster, Is Your Analysis?

Continuous intelligence (CI) platforms can be used to collect telemetry data from various sources, perform analysis on that data, make inferences about the data, and provide real-time insights that help businesses understand what’s going on. For years, network, application performance, and security monitoring were fairly passive operations. Systems collected key telemetry data, and operators received alerts when a particular metric crossed a preset threshold. Operations were limited in two ways.

Detect user pain points with Datadog Frustration Signals

Whether you run an ecommerce site, a digital publication, or any other customer-facing service, delivering optimum user experiences is key to the success of your business. Customers can grow frustrated and abandon your site when they run into hurdles such as JavaScript errors or confusing page designs, and that frustration negatively impacts your company’s bottom line.

Analyze wait events and in-flight queries with the Datadog Database List

When you’re operating databases at scale, being able to get real-time insights across all your databases is essential for addressing issues and identifying areas for optimization. Datadog Database Monitoring’s Database List allows you to monitor your entire database fleet in one place, so you can quickly identify and troubleshoot overloaded hosts and gauge the impact of problematic queries throughout your infrastructure.

Application Snapshots: A Valuable Observability Signal for Developers

Monitoring is often not the first thing on the mind of the modern developer. Yet, it’s necessary at many points of the software development lifecycle, including: before deprecating an API, before launching a new feature, after launching the feature, and more. In fact, monitoring needs can vary much more than the classic Ops monitoring.

gRPC - Monitor gRPC calls with OpenTelemetry | Explained with a Go example

OpenTelemetry only generates the telemetry data. To store and analyze that data, you need to choose a backend analysis tool. In this article, we will monitor collected data from gRPC calls with SigNoz. SigNoz is a full-stack open-source APM tool that provides metrics monitoring and distributed tracing. It is built to natively support OpenTelemetry data formats, which makes it a great choice for a backend analysis tool to combine with OpenTelemetry. On a side note, OpenTelemetry gives you the freedom to select a backend analysis tool of your choice.

3 Enterprise IT Factors That Will Make MSPs More Successful in 2022

A version of this blog first appeared in APMdigest. A new study by OpsRamp on the state of the Managed Service Providers (MSP) market concludes that MSPs face a market of bountiful opportunities but must prepare for growth by embracing complex technologies like hybrid cloud management, root cause analysis and automation.

Cloud Monitoring metrics, now in Managed Service for Prometheus

According to a recent CNCF survey, 86% of the cloud native community reports that they use Prometheus for observability. As Prometheus becomes more of a standard, an increasing number of developers are becoming fluent in PromQL, Prometheus’ built-in query language. While it is a powerful, flexible, and expressive query language, PromQL is typically only able to query Prometheus time series data.

How to monitor nginx in Kubernetes with Prometheus

nginx is an open source web server often used as a reverse proxy, load balancer, and web cache. Designed for high loads of concurrent connections, it’s fast, versatile, reliable, and most importantly, very light on resources. In this article, you’ll learn how to monitor nginx in Kubernetes with Prometheus, and also how to troubleshoot different issues related to latency, saturation, etc.

Monthly Product Update - Edge Data Replication and Sample Apps for IoT & Node.js

We love to write and ship code to help developers bring their ideas and projects to life. That’s why we’re constantly working on improving our product in sync with developer needs to ensure their happiness and accelerate time to awesome. This is the second in a blog series covering our product’s latest features — features that we think will save you time and effort when building with time series and InfluxDB.

InfluxDB Named a Leader in G2's Summer Grid Report for Time Series Databases

Industry-leading time series platform recognized for ease of setup and user satisfaction. SAN FRANCISCO, June 30, 2022 – InfluxData, creator of the leading time series platform InfluxDB, today announced it has been named a leader in the G2 Grid for Time Series Databases, as well as a leader within the inaugural Momentum Grid® Report for Time Series Databases in the Summer 2022 ratings from G2, the world’s leading business solution review platform.

Centralized Log Management for Network Monitoring

It’s been a long few years for your IT department. In the span of one month, you had to make sure that all employees and contractors could work remotely. This meant giving everyone access to all cloud resources and ensuring uptime. Then, you needed to start securing access. Now, you need to shore up all your security as the phrase “zero trust architecture” has recently entered conversations with leadership.

Get Started with Python Error Reporting

Python is one of the most industry-oriented languages. It has versatile usability and features that are great for all types of tech teams. You can make both frontend websites and backend APIs with Python. According to a report by Statista, around 48.24% of the developers worldwide use Python as one of their programming languages. Python is a backend technology in most cases, so monitoring the application is very important. The backend is the backbone of the application.

Digital Experience Monitoring from Exoprise (Intro video)

Digital Experience Monitoring (DEM) from Exoprise offers a unique solution combining proactive monitoring with synthetics and endpoint monitoring with RUM. We call it Better Together! Using the Exoprise DEM solution, enterprises can empower IT to support work-from-anywhere initiatives. Whether your team is interested in 24/7 monitoring of Microsoft 365 or complete coverage of third-party services like Zoom, Exoprise has it all.

Cloud, Visibility, and Security

Three great things that do not always work great together. In the beginning there were large computer systems that few organizations could afford. Over time these systems became smaller and cheaper and many (if not most) organizations took advantage of them. Some just at the end-user level (e.g. the IBM PC on the desk), some only at the high-end level (e.g. a mainframe in the data center with terminals on desks), and some in a combination of both (anyone remember Reflection?).

Move Faster with Rollbar Improve

Rollbar was founded with the belief that done is better than perfect. Building software is complex and it's better to move quickly and manage risk intelligently than to try to build perfect code. For the past decade, Rollbar has provided peace of mind to hundreds of thousands of developers by monitoring production environments for errors. The tool has been used to find and fix bugs in a fraction of the time and is trusted by everyone from individual developers to at-scale enterprises.

How to Save on Monitoring Costs by Using Honeycomb

Are you overspending on monitoring and APM tools? Forrester’s Total Economic Impact analysis of Honeycomb identified significant ROI in customers using us to reduce spend on less efficient APM workflows. But this isn’t about budget reallocation to a newly branded set of similar but shinier tools.

Leverage error data for automated decisions in build pipeline

Discover how you can leverage Rollbar’s Versions API to make an automated decision in a build pipeline. Rollbar is the leading continuous code improvement platform that proactively discovers, predicts, and remediates errors with real-time AI-assisted workflows. With Rollbar, developers continually improve their code and constantly innovate rather than spending time monitoring, investigating, and debugging.

Implementing Synthetic Monitoring with Telegraf and Logz.io

In my previous blog post, we explored key questions about Synthetic Monitoring, such as what it is, why it’s important, how it works, and how it compares to Real-User monitoring. Synthetic Monitoring is becoming an increasingly popular method to continuously monitor the uptime of applications and the critical flows within them so that DevOps, IT, and engineering teams are quickly alerted when issues arise. Unfortunately, a good Synthetic Monitoring tool can be expensive.

On moving over a million uptime checks per week onto fly.io

The other day, a friend told me about fly.io's nice developer experience (DX). For my day job, I work on improving wrangler2's DX, so naturally it had me curious. I went from "I'll just play around with it, maybe give it a toy workload" to "holy shit, what if I quickly rewrite my business's AWS Lambda + SQS stack to fit entirely within their free tier" in about 90 minutes. It wasn't that simple in the end, but I did manage to migrate most of my active workload from AWS Lambda to fly.io.

Authors' Cut: How Observability Differs from Traditional Monitoring

Remember the old days when an uptime of 99.9% meant you could be fairly confident everyone was having a good experience with your application? That’s not really how it works anymore. Modern, distributed systems are so complex they typically fail unpredictably, making it much harder to diagnose issues. Traditional monitoring grew out of those early days, allowing you to check the health of simpler systems.
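
For a sense of scale, the arithmetic behind a 99.9% uptime figure can be sketched as follows. This is a minimal illustration, not something taken from the article: the function name `downtime_budget_minutes` and the 30-day and 365-day periods are assumptions chosen for the example.

```python
def downtime_budget_minutes(uptime_percent: float, period_days: float) -> float:
    """Return the minutes of downtime allowed by an uptime target over a period.

    Hypothetical helper for illustration only.
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # 99.9% uptime still leaves a measurable amount of allowed downtime.
    for days in (30, 365):
        budget = downtime_budget_minutes(99.9, days)
        print(f"99.9% over {days} days allows about {budget:.1f} minutes of downtime")
```

An aggregate figure like this says nothing about which users were affected or why, which is the gap the excerpt points at.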

TL;DR Replication from Edge to Cloud with InfluxDB

Depending on your available resources, data analysis can take place at the edge or in the cloud, but businesses don’t need to choose one location over the other. There are benefits to giving the edge autonomy to collect, process, and act on data locally. Data replication helps maintain edge autonomy and makes it easier for users to get the data they need, where they need it.

Collect More Data with Windows Server Support in Cribl Edge 3.5

Cribl Edge is the easiest and most manageable agent for exploring, processing, and collecting Observability data at the edge for Linux servers. Today, we’re excited to announce that it’s not just Linux admins whose lives have been made easier with Edge. With the Cribl Software Suite 3.5.0, Cribl Edge now supports Windows Server 2016, 2019, and 2022, bringing that same intuitive experience for deploying, setting up, and collecting observability events to your Windows infrastructure.

Bring More Reliability and Insights to Your Observability Pipelines with Cribl Stream 3.5

We’ve been busy building more features for Cribl Stream, and are excited to share the new value we offer our users. Cribl Stream 3.5 is now available! This release brings some much-requested features that will help users build more robust observability pipelines, with new sources and destinations. Let’s dive into what’s new!

Cribl.Cloud Summer 2022 Release Helps You Be Even More Proud of Your Cloud

Cribl.Cloud’s Summer 2022 release is now available in an AWS cloud near you! As part of this release, we are excited to share the features we have been building, including the latest Cribl product releases (Stream 3.5 and Edge 3.5). This release brings some much-requested features that will help customers increase their compliance, reduce overall costs, and deploy a more resilient observability data pipeline.

Cribl's New Education and Certification Program Defines a Critical Role in Observability

What is an observability engineer? They build monitoring tools, right? Develop data pipelines? For time series data? Maybe distributed tracing? Ah, got it…an observability engineer is just an extension of an SRE with a wider ‘end-user’s’ perspective? But don’t they also build solutions that move telemetry for security tools? Maybe monitor and review an organization’s overall security posture?

Top-10 Cisco Live 2022 Announcements/Highlights

It was great to be back in person for the Cisco Live 2022 annual conference that happened in Las Vegas from June 12 to June 16, 2022. CloudFabrix is a Cisco solution partner and we had our booth #3645 on the show floor where we showcased our Robotic Data Automation Fabric (RDAF) and how it can help accelerate AIOps and Observability projects. We got a lot of interest from many enterprises, partners, and community members.

Integrating Moogsoft with Datadog: Key Benefit Discussion | Moogsoft Product Videos & How-Tos

Are you a Datadog customer? And do you wish to optimize your workflow further? Then this video is for you! In this video, you will walk through a sample use case that shows the key benefits of integrating Moogsoft with your Datadog environment. Don't forget to subscribe for content on DevOps, Observability, AIOps and more!

The Keys to Understanding Network Visibility from the End-User Perspective

The core network paths today’s applications take primarily live outside of IT’s control and sight, even though the burden of fixing these paths still resides with network operations. Understanding the dynamic delivery paths is critical to being able to drive successful digital transformation across the enterprise. Without this insight, there is significant risk to end-user experience, IT efficiency and the ability to support an increasingly remote workforce.

Network Basics: What Is SNMP and How Does It Work?

Simple Network Management Protocol (SNMP) is a way for different devices on a network to share information with one another. It allows devices to communicate even if the devices are different hardware and run different software. And despite any rumors you may hear, it’s not going anywhere anytime soon.

Grafana reporting: How we improved the UX in Grafana

Behind every feature in Grafana, no matter how big or small, lies a lot of hard work, commitment, and attention to detail. At Grafana Labs, we use a cross-functional team approach to come up with ideas and solutions that ultimately make our products more usable, resilient, and adaptable to user needs. To achieve this, we work collaboratively across the UX, product, and engineering disciplines.

Introducing Metric Alert notification charts and Duplicate Alerts

Unless you stare at Sentry all day waiting for error or performance problems to pop up, chances are you rely on alerts to let you know when something breaks or slows down. There are two types of alerts you can set up: 1) Issue Alerts, which are tied to issues that meet predefined criteria, and 2) Metric Alerts, which trigger when error volume or performance metrics like apdex, latency, or throughput surpass a pre-defined threshold.

Madrid improves EMT services by monitoring its bus data

Madrid's public bus service, the EMT, was one of the protagonists of Global Mobility Call, which took place on the 14th and 16th of June at IFEMA. The Madrid company presented its future plans, tracing the route to continue transforming the capital into a Smart City where sustainable mobility and autonomous vehicles acquire great relevance. Nevertheless, executing this plan would be impossible without technology that generates data continuously.

We're thrilled to be positioned in the 2022 Gartner Magic Quadrant for Application Performance Monitoring and Observability!

We are elated to announce that ManageEngine has been recognized in the Gartner® Magic Quadrant™ for Application Performance Monitoring and Observability for the tenth time! The application performance monitoring industry has grown tremendously over the past decade, and the wide range of vendors highlights this growth.

Monitor your Graviton3-powered EC2 instances with Datadog

AWS’s new Graviton3 EC2 instances are built on its third generation of custom Arm-powered processors. These instances promise up to 25 percent better performance over Graviton2 for compute-intensive workloads. This means that, for applications like distributed data analytics, machine learning, video encoding, gaming, and more, migrating to Graviton3 instances can provide better performance, cost savings, and more energy efficiency.

Monitoring Moodle Applications to Deliver High Quality Educational Services

In this article, we will cover a case study of one of our higher education customers using eG Enterprise to proactively monitor and troubleshoot their Moodle applications to ensure service availability and performance for their staff and students.

6 lessons from Cloudflare's June 2022 outage

On June 21, 2022, the US-based global content delivery network (CDN) provider and security company Cloudflare suffered an outage at 6:27 UTC that lasted until 7:42. The outage was caused by a network configuration error that affected 19 of Cloudflare's data center locations — Amsterdam, Atlanta, Ashburn, Chicago, Frankfurt, London, Los Angeles, Madrid, Manchester, Miami, Milan, Mumbai, Newark, Osaka, São Paulo, San Jose, Singapore, Sydney, Tokyo.

StackState's v5.0 Release Delivers New 4T Monitors and More: Apply the Power of Topology to Transform Traditional IT Monitoring

This week we released v5.0 of StackState’s observability and AIOps platform, which introduces a rich set of new capabilities. Our latest release contains a little something for everyone responsible for reliably running business critical workloads in dynamic environments – SREs, DevOps, central platform teams, even business teams – and for new and existing users alike.

Improve Website Performance by Checking Logs

In the early days of log analysis, application developers would use their logging libraries to write logs to files stored on a disk. After years of relying on those libraries, they found that they were unable to monitor the performance of their applications anymore because they didn’t understand the way their logging libraries worked. This led to a shift from using log files stored on a disk to using Syslog.

Have fun creating again: discover visual console and dashboard editing

The visual console editor allows the user to visually design the final layout by dragging elements with the mouse, choosing the background and the icons that represent the status of each relevant aspect you want to show. With dashboards you may define screens with different created visual elements and share them with other users or display them in full screen as slides for the whole team to see.

Exploring AWS Costs Beyond the Service Level

Honeycomb uses AWS Lambda as a core part of our query execution architecture; Lambda’s ability to quickly allocate lots of resources and charge us only for use is invaluable to keeping Honeycomb fast and affordable. Our total Lambda bill is easily accessible in the AWS Console, but how do we know which customers or application areas dominate this bill? How do we judge the cost of changes we make to our own software?

Tips for Optimizing React Native Application Performance - Part 2: Using Sentry SDK for Performance Monitoring

Monitoring performance in front-end applications is vital. It focuses on the parts of the application that users experience directly, such as slow rendering or low frame rates, network request errors, and unresponsive interfaces. Your application’s users expect the experience of using the application to be fast, responsive, and smooth. In the first article of this series, we discussed some tips for optimizing your React Native application performance.

An Introduction to Synthetic Monitoring: Monitor the Uptime of your App and Critical Flows

In a world where the customer’s digital experience is critical to business outcomes, it is crucial to understand how our applications are behaving. As businesses increasingly rely on the performance and availability of revenue-generating applications, the tolerance for downtime and slow response times has plummeted – so the response to production issues must be quick and effective.

MQTT vs Kafka: An IoT Advocate's Perspective (Part 3 - A Match Made in Heaven)

So here we are…the final chapter. In Part 2 of this series, we started to drill down into some of the concepts that make Kafka great. We concluded that although terminology between MQTT and Kafka was similar (for example topics), they behaved quite differently under the hood. We also took a brief overview of Kafka Connect and how we can use some of the enterprise connectors to stream our data to other platforms. Yet we did learn that Kafka does have some shortfalls.

Elastic Observability 8.3: Broader observability for cloud, SaaS, and big data

Note: 8.3.0 has an issue that could cause creating and accessing snapshots against Azure snapshot repositories to fail authentication when using SAS tokens. This impacts self-managed customers who have deployed 8.3.0. Elastic Cloud Azure deployments are not currently being upgraded to 8.3.0 and are not impacted as a result. Visibility is crucial for ensuring application performance, but it can be difficult to efficiently scale monitoring across all your critical infrastructures, platforms, and services.

How to monitor Tomcat with OpenTelemetry

We are constantly working on contributing monitoring support for various sources; the latest in that line is support for Tomcat monitoring using the JMX Receiver in the OpenTelemetry collector. If you are as excited as we are, take a look at the details of this support in OpenTelemetry’s repo. You can utilize this receiver in conjunction with any OTel collector, including the OpenTelemetry Collector and observIQ’s distribution of the collector.

Deconstructing AIOps: Is it even real?

This essay explores AIOps and investigates if machine intelligence applies to IT operations (ITOps). I will dive into objection handling around artificial intelligence (AI) in pop culture and address the limitations around data sets and implicit bias coded into machines. Then, I will delve into what this means for ITOps and the ways AI-based parsing utilities can help operators and developers alike. How does Sumo Logic enable anomaly detection and identify threats?

Dashboard Studio: Level-Up Your App with Dashboard Studio

Dashboards are a powerful tool for communicating a lot of information at once. Many Splunk apps are packaged with dashboards to help you make the most of your data. For example, the Microsoft 365 App for Splunk comes with a number of dashboards to provide insights around usage, incidents, and more.

Financial Impact of an Outage

In October 2021, the world’s largest social media platform suffered a massive worldwide outage affecting billions of customers. Facebook has a monthly active user base of 2.8 billion users, which increases to 3.5 billion when you include its subsidiaries such as Instagram, WhatsApp, and Oculus. The platform succumbed to a “Gigalapse,” which happens when a server can’t adequately respond to excessive demand.

The Auvik Network Device Buyer's Guide

Buying the right network devices is an essential part of network design, and can have an impact throughout the network lifecycle. Get it right, and your network is high-performing, easy to troubleshoot, and reliable. Get it wrong, and downtime, complexity, and costs add up fast. A network device buyer’s guide would probably be really helpful. So we made you one.

NextJS - Monitoring your NextJS application using OpenTelemetry and SigNoz

In this video, we demonstrate how to implement OpenTelemetry NextJS libraries for a sample NextJS application and then visualize the collected data in SigNoz. More about SigNoz: SigNoz - Monitor your applications and troubleshoot problems in your deployed applications, an open-source alternative to DataDog, New Relic, etc. Backed by Y Combinator. SigNoz helps developers monitor applications and troubleshoot problems in their deployed applications. SigNoz uses distributed tracing to gain visibility into your software stack.

Elixir - Monitor your Elixir Application with OpenTelemetry and SigNoz

SigNoz provides query and visualization capabilities for the end-user and comes with out-of-box charts for application metrics and traces. Now let’s get down to how to implement OpenTelemetry in your Elixir application.

Java Debugging: Using Tracing To Debug Applications

Write enough programs, and you’ll agree that it’s impossible to write an exception-free program, at least in the first go. Java debugging is a major part of the coding process, and knowing how to debug your code efficiently can make or break your day. And in Java applications, understanding and leveraging stack traces can be the game-changer you need to ship your application quickly. This article will cover how to debug in Java and how Java stack traces simplify it.

K-12 and Network Monitoring: Solving IT Mysteries, Meeting Challenges

To say that K-12 school systems have challenges is an understatement. COVID forced schools to make a dramatic turn towards remote learning, which meant the network was anything but insular, forcing IT to efficiently support thousands of new remote endpoints. That is on top of other K-12 network challenges. Issue number one: tight budgets. Most school systems are tight for cash, especially after the financial stresses of COVID and all the millions spent on PPE.

South East Asia-based company reduces its MTTR using Applications Manager

Founded in 1983, KFin Technologies is a leading transaction processing platform based in South East Asia. The organization serves the mission-critical needs of asset managers with clients in mutual funds, AIFs, pension, wealth management, and corporations in India and abroad.

Logit.io Announces The General Availability of OpenSearch

You may remember from our previous update (published August 2nd 2021) that we announced our initial support for the beta version of OpenSearch. Today we are pleased to announce that OpenSearch and OpenSearch Dashboards 2.0.0 are available to all platform users.

How Travel Disruptions Have Affected Website Performance

Recent travel disruptions have impacted customers in more ways than one. Cancelled flights, extended airline wait times and staff shortages have all contributed to long-awaited holidays being disrupted or cancelled, often at the last minute. As a result, there has been an erosion of consumer trust and confidence in the travel industry.

Spinning Time Series into Efficient Wind Power

Operating sustainably and promoting green practices can be more complex than you imagine. The benefits of sustainable practices can be significant for businesses, people and the planet. Companies need to ensure that they’re achieving those benefits in a way that complies with established rules and regulations to maximize the impact these initiatives have.

Democratizing Data Using a DataFabric & How it Benefits IT Enterprises

Enterprises today want real-time business insights to make decisions that improve operational efficiency and customer engagement and present newer revenue opportunities. However, the promise of the data-driven business falls short due to gaps in data management. These gaps exist because data in the modern enterprise doesn’t exist only behind firewalls and within organizational premises.

Big Relationships Grow From Big Data: UMBRiO, StackState and Splunk

Until I came to StackState seven years ago, I was selling for Splunk. Everyone that has big data knows Splunk – big data, big data lakes, observability data, data analysis. Anything to do with data, Splunk is there! While at Splunk, I had the good fortune to sign up a reseller by the name of UMBRiO. Together, with Erik Witte and his team, we sold a multi-million dollar deal to a large, multi-national bank in the Netherlands that put our partnership on the map.

How to Detect Ransomware: 12 Monitoring & Alerting Opportunities to Automate

Ransomware threatens the loss of crucial data as well as financial loss. However, with the right knowledge and tools, you can take action to protect your business from the damaging effects of ransomware. According to Coveware's Q1 2022 ransomware report, the average ransom payment was $211,529, with an average of 26 days of downtime. Ransomware continues to be a huge and costly threat to industries across the board.

Docker Stats | Understand how to monitor Docker Metrics with docker stats

Docker containers are transient (lasting for a very short time), spawning quickly and in high numbers, which causes metrics bursts. This makes monitoring a challenge due to Docker's scaling and redeployment features. Docker stats is a built-in feature of Docker containers. The docker stats command returns a live data stream of your running containers. Docker is a containerization platform that lets you separate your applications from your infrastructure to deliver software quickly.

Gzip Compression for Faster Web Pages (Apache, Nginx, WordPress)

Reducing the loading time of our website by even a second can have a drastic impact on the traffic we get. It is vital for our websites to load quickly in this fast-moving world, where all the information we need can be found in the blink of an eye. One way to achieve this is Gzip compression. The Gzip algorithm compresses and decompresses data in order to make websites load faster on a client's machine.
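
The effect is easy to see with Python's standard `gzip` module. The sketch below is only an illustration under stated assumptions: the HTML fragment is a made-up, highly repetitive payload, so real pages will compress by a different amount, and in practice the web server (Apache, Nginx) performs this step rather than application code.

```python
import gzip

# Hypothetical page body: repetitive markup, which tends to compress well.
html = ('<li class="item">Hello, world</li>\n' * 200).encode("utf-8")

# Compress as a server would before sending, then decompress as a browser would.
compressed = gzip.compress(html)
restored = gzip.decompress(compressed)

print(f"original: {len(html)} bytes")
print(f"gzipped:  {len(compressed)} bytes")
print(f"round trip intact: {restored == html}")
```

Fewer bytes travel over the network, and the client recovers exactly the same content after decompression.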

New in Grafana 9: Role-based access control (RBAC) is now GA

Role-based access control (RBAC), previously referred to as fine-grained access control (FGAC), is Grafana’s new authorization system. It was introduced as a beta feature in the Grafana 8.0 release a year ago, and we’re now excited to promote it to general availability status. With the release of Grafana 9.0 during GrafanaCONline 2022, RBAC is enabled by default for all instances. (The easiest way to get started with Grafana 9? Sign up for a free Grafana Cloud account today.)

See Current Core Web Vitals with Chrome

Google is using Core Web Vitals as a factor in search results rankings. They’ve also found that improving Core Web Vitals can lead to increased traffic, sales and ultimately conversions. But how can you see your Core Web Vitals easily? Request Metrics is the best solution for monitoring Core Web Vitals across your entire site, but if you just want a quick way to see the Core Web Vitals directly from your browser - check this out.

Sensu Integration Catalog: One Static API to Rule Them All

ICYMI: in Part 1 of this blog post, I introduced Sensu Catalog Integrations, one of the three components that make the Sensu marketplace work. In this post, I want to cover the second piece, the Catalog API generator. This is the tool that consumes the github.com/sensu/catalog repository content and renders static HTTP API content the Sensu web app can consume.

Network Configuration Management: The Benefits & Importance

Configuration management is one of those network management topics people often neglect. It’s not very exciting—but it’s incredibly important. Administrators rely on network configuration management in a variety of circumstances. Let’s discuss what network configuration is, delve further into the importance of network configuration, and explore the benefits of configuration management.

8 things you can do right now to rank better in Google

We all know it’s getting harder than ever to rank in Google, whether it’s for blogs or webpages, but all is not lost. There are some key elements that you should be focusing on in your webpages to ensure that you’re giving your website the best possible chance of ranking well. After all, the majority of organic traffic clicks on the first 3 results on the first page of Google, so if you’re not there, your competitors must be!

How to get your customers more engaged with your SaaS product

One of the biggest challenges for SaaS companies isn’t only how to get customers to buy your SaaS product but how to keep them engaged with it. It may be surprising but just because someone has bought your product or service, it doesn’t mean they interact with it in the way that they should or in such a way that they’re getting the most out of it. The downside to this? They are the customers that are more likely to churn.

To Shift Right, You Need Observability

I recently attended Sapphire Ventures' Hypergrowth Engineering Summit (thank you David Carter and Sapphire for the invitation! I wrote a separate blog post on the whole thing), and one of the sessions was a panel discussion with Rob Zuber, CTO at CircleCI and Jonathan Nolen, SVP of Engineering and Product at LaunchDarkly. The session was illuminating: they talked about how in this world of everyone shifting left, teams should actually consider shifting right.

How much was the success of Mark Gallagher's team down to risk mitigation?

Mark Gallagher is a renowned keynote speaker on a range of business topics relating to his experiences gained while working in senior leadership roles within Formula One motor racing over the last 30 years. In this session he talks about why Formula 1 teams make sure that performance is predictable and how risk management enables the team to achieve.

Infrastructure Monitoring: Definition & Best Practices

Organizations are continually increasing the number of devices and technologies used within their IT environments. To ensure these IT environments are functioning well and providing users a good experience, a company’s IT infrastructure must be tracked and maintained. This is accomplished through the use of infrastructure monitoring.

The Ultimate Crash Course on Microservices: The 5 Key Questions and Answers to Know in 2022

Over the past decade, organizations have reinvented themselves through digital transformation. Nowadays, this journey is well in its second chapter and gaining momentum – also driven by the explosion of app and service deployment, data and intelligence, digital reach, and post-pandemic customer expectations. And the newest cutting-edge technological trends – such as hybrid infrastructure and edge computing – are making it particularly difficult for traditional tools to keep up.

Deleting Production in a Few Easy Steps (and How to Fix It)

It’s the type of nightmare that leaves developers in a cold sweat. Imagine waking up to a message from your team that simply says, “We lost a cluster,” but it’s not a dream at all. InfluxDB Cloud runs on Kubernetes, a cloud application orchestration platform. We use an automated Continuous Delivery (CD) system to deploy code and configuration changes to production. On a typical workday, the engineering team delivers between 5 and 15 different changes to production.

Webinar Recap: How to Avoid Being On Call With Under-Instrumented Tools

“It’s too expensive!” “Do we really need another tool?” “Our APM works just fine.” With strapped tech budgets and an abundance of tooling, it can be hard to justify a new expense—or something new for engineers to learn. Especially when they feel their current tool does the job adequately. But, does it?

Shoulder surf with the Sensu DA: Developing a new AWS RDS integration (Part 1)

Join Jef Spaleta from the Sensu DA Team as he develops an entirely new AWS RDS integration from scratch. In Part 1 of the video, he develops the AWS RDS reference Check resource that works for his environment. In Part 2, he adapts that check into a reusable Marketplace Catalog integration.

Shoulder surf with the Sensu DA: Developing a new AWS RDS integration (Part 2)

Join Jef Spaleta from the Sensu DA Team as he develops an entirely new AWS RDS integration from scratch. In Part 1 of the video, he develops the AWS RDS reference Check resource that works for his environment. In Part 2, he adapts that check into a reusable Marketplace Catalog integration.

How to Monitor Your AWS RDS Instances

Even though NoSQL databases like Amazon’s own DynamoDB are very popular today, for many business use cases, there’s almost no way around using a traditional relational database. Amazon Relational Database Service (RDS), released back in October 2009, is one of Amazon’s first cloud services and can therefore be seen as a very mature service.

Kentik moves up the stack with Synthetic Transaction Monitoring

In our quest to provide the leading network observability solution, Kentik has been focused on developing a service for NetOps teams that empowers them to have intimate knowledge of their network traffic and the devices that route traffic. Our service helps them plan capacity, project costs, optimize routes, detect unwanted traffic, troubleshoot issues and analyze events.

Scaling Engineering Teams: Perspective from our VP of Engineering

At Catchpoint, my role can be summarized at a high level as two halves: designing and taking care of Engineering teams – and working with those teams to design and take care of the various distributed systems that run our platform. I recently attended Sapphire Ventures' Hypergrowth Engineering Summit (thank you David Carter and Sapphire for the invitation!) where the sessions focused on creating and scaling high functioning engineering.

How to send logs to Grafana Loki with the OpenTelemetry Collector using Fluent Forward and Filelog receivers

In this guide, we’ll set up an OpenTelemetry Collector that collects logs and sends them to Grafana Loki running in Grafana Cloud. We will consider two examples for sending logs to Loki via OpenTelemetry Collector. The first one shows how to collect container logs with a Fluent Forward receiver. The second one shows how to collect system logs with a Filelog receiver.

7 Essential Tips For Choosing The Best Domain Name and Why it Matters

Choosing a domain name is a task that requires your full attention and should be thought about long before you undertake your website launch checklist. An inappropriate or poorly thought-through domain name is something you'll be stuck with for the foreseeable future. Domain names are tricky to change and can lead to serious SEO complications for your brand. By researching, you'll be better informed to choose a domain that works for you by maximizing website traffic and driving business revenue.

What's With All the New Observability Tools?

Organizations struggle with getting the right visibility into their environments. Better visibility can improve performance, increase uptime (or decrease downtime, depending on your perspective) and ultimately improve customer satisfaction. Finding the right tool, however, can be a real challenge. Making matters even worse, vendors seem to be announcing new observability platforms every day.

Anomaly rate in every chart

A month ago, we introduced unsupervised ML & Anomaly Detection in Netdata, the Anomaly Advisor. Today, we’re happy to announce that we’re bringing anomaly rates to every chart in Netdata Cloud. Anomaly information is no longer limited to the Anomalies tab and will be accessible to you from the Overview and Single Node View tabs as well. This will make your troubleshooting journey easier, as you will have the anomaly rates for any metric available with a single click.

How to monitor and troubleshoot Fluentd with Prometheus

Fluentd is an open source data collector widely used for log aggregation in Kubernetes. Monitoring and troubleshooting Fluentd with Prometheus is really important to identify potential issues affecting your logging and monitoring systems. In this article, you’ll learn how to start monitoring Fluentd with Prometheus, following Fluentd docs monitoring recommendations. You’ll also discover the most common Fluentd issues and how to troubleshoot them.

Agents of Transformation: Helping IT teams achieve greatness at The University of Texas at San Antonio

With a broad, cross-functional mindset and a boost from AppDynamics, The University of Texas at San Antonio’s (UTSA) IT team has morphed from an error-resolving group into an innovation machine. Here's how they did it.

Retrace Power User Tips and Tricks - Extending APM

Retrace is the full lifecycle APM solution that includes tools and capabilities far beyond your typical APM tool. With sophisticated log management, detailed code tracing, deployment tracking and more, Retrace delivers what your DevOps team needs most to resolve issues before impacting users. By extending usability beyond traditional APM functionality, Retrace provides greater value than competitive products. But where would Retrace be without robust APM functionality?

Using the Density Function for Adaptive Thresholding with Splunk

It’s 3PM on a Friday, and your day is winding down. Suddenly, you get an urgent email from your boss asking you to set up an alert for monitoring volume. You consider this an easy task. You set a hard threshold for what you think is a low volume based on the last four hours of incoming data.

Time Series Forecasting Use Cases and Anomaly Detection

Wouldn’t it be great to peek into the future and find answers to the problems that you’re facing today? This may sound like science fiction, but many companies currently possess this capability, and they are creating strategies around it to strengthen their monitoring and analytical capabilities. One way is time series forecasting, a statistical method. You can take advantage of the insights of time series forecasting by using techniques like anomaly detection to gain.

Microservice Monitoring Tools + Best Practices

Microservices are one of the hottest app architectures in the current market. They easily solve some of the most common problems with monolithic and service-oriented architecture. The ability to split your application into multiple smaller components and develop as well as monitor them individually opens up a whole new world of possibilities. However, this also brings with it a new set of problems. Monitoring distributed applications requires thinking outside of the box.

Top 8 VScode Python Extensions

Visual Studio Code (VScode) is an open-source and cross-platform source-code editor. It was ranked the most popular development tool in the Stack Overflow 2021 Developer Survey, with 70% of the respondents using it as their primary editor. VScode allows you to use a few programming languages like JavaScript and TypeScript. Still, you need an extension if you want to use any other programming language and include extra functionalities to improve your code.

Securing the Software Development Build

Tim Brown, SolarWinds CISO and VP, Security, explains how SolarWinds is ensuring the integrity of the build process and how we share learnings with our partners, community, and customers, as well as how we're leveraging and contributing to open-source initiatives and leading by example in securing the supply chain.

How The Washington Post uses Datadog to detect and respond to traffic spikes early

Stephen Erickson, Director of Engineering at The Washington Post, discusses how Datadog's alerting and unified visibility into their entire architecture are instrumental in managing dynamic traffic cycles caused by breaking news. An early Datadog alert on the day of the Capitol riots notified the team of a spike in traffic so they were able to respond and scale their systems immediately.

Kubernetes Security Best Practices

As the container orchestration platform of choice for many enterprises, Kubernetes (or K8s, as it’s often written) is an obvious target for cybercriminals. In its early days, the sheer complexity of managing your own Kubernetes deployment meant it was easy to miss security flaws and introduce loopholes. Now that the platform has evolved and managed Kubernetes services are available from all major cloud vendors, Kubernetes security best practices have been developed and defined.

Setting Up DevOps Teams for Success

The gap between top and low-performing engineering teams is dramatic, whatever angle you look at it from. Whether you analyze their tech stacks and architecture choices, performance metrics, or cultural and team elements, the delta tends to be quite impressive. Despite the wide range of approaches, a few indicators paint a clear picture of the typical characteristics of a well-oiled setup.

Tracealyzer On the Race Track - Calgary Solar Car

With only a week to go before the start of the 2022 American Solar Challenge (ASC), here’s a good luck shout out to the University of Calgary Solar Car team! Their race car, the Schulich Elysia, not only looks stunning, it also packs a powerful solar battery that can store up to 18 kWh of energy, enough to drive the car over 300 km when there’s no sunlight. Percepio is immensely proud that our Tracealyzer tool has been involved in perfecting this battery technology.

Monitor your IoT devices at scale with Datadog Log Management

The Internet of Things (IoT) can be found in a diverse range of devices, including fleets of autonomous vehicles, automobiles, planes, electric charging stations, and voice controllers. These devices are embedded with gateways, electronics, actuators, platform hubs, and cloud-service connectivity, enabling them to exchange data across the physical, network, and application layers that constitute IoT architecture.

The CEO Rebooting AI

Few people can claim the moniker of “pioneer” in building the internet as we know it. Rami Rahim, CEO of Juniper Networks, is one of them: he had a front-row seat in the late 90s to the infant days of building the internet. At the time, I, along with other Silicon Valley CEOs, was frantically building out web 1.0 companies, which required a grueling DIY business creation approach.

The ultimate logging series: Logging using PHP functions

In part one of our PHP logging blog series, we discussed what logging is and covered the basics of creating logs in PHP applications using the PHP system logger. While the PHP system logger automatically records critical events like errors in code-execution, a more customized logging setup can be achieved using PHP functions. For part two, let's look at the basics of creating custom error logs by calling PHP functions.

APM Vision for Open Source and Security

Earlier this month, we shared exciting news with our first placement in the 2022 Gartner® Magic Quadrant™ for Application Performance Monitoring and Observability: we are in the Visionary Quadrant. This research is near to my heart, as I led this research for four years; so, I wanted to reflect on why this is an accurate placement for Logz.io. The Visionary Quadrant is designated for those organizations who are pushing the boundaries of a specific market and technology.

New in Grafana 9: Introducing the command palette

Grafana is an open source tool for people with many different perspectives and various skill levels. Many initiatives to improve the Grafana user experience start by thinking about someone who’s just getting started on their observability journey. However, late last year, a Grafana Labs hackathon team looked to improve the user experience for our power users by introducing a command palette to Grafana.

Ingesting HTTP Access Logs from AppService

Debugging application performance in Azure AppService is quite difficult using Azure’s built-in services (like Application Insights). Among the issues are the visualizations and the time it takes to be able to query data. In this post, we’ll walk through the steps to ingest HTTP Access Logs from Azure AppService into Honeycomb to provide near real-time analysis of Access Logs.

Matplotlib Tutorial - Learn How to Visualize Time Series Data With Matplotlib and InfluxDB

A time series is a sequence of data points (observations) arranged chronologically and spaced equally in time. Some notable examples of time series data are stock prices, a record of annual rainfall, or the number of customers using a bike sharing app daily. Time series data exhibits certain patterns, such as the highs and lows of hotel prices depending on season.

Delivering Outcome-Based Results at Gartner's Security & Risk Summit

It’s common for most CISOs to lead off a security conversation by comparing what other companies in the industry are spending on cybersecurity and simply matching that. After all, regardless of the results, the CISO can always tell the board of directors they’re following industry guidelines around security budgets. The problem is security outcomes are bad regardless of budgets. It’s not what you spend. It’s the results you get that matter.

Lights, Camera, Action: Lumigo Joins AWSonAir for a Big Announcement

It’s no secret that AWS has an extensive catalog of services which enable organizations to rapidly scale infrastructure. In this fast paced and self scaling cloud native world, observability across all these services has never been more critical. As a long time AWS Technology Partner, it’s always great to speak to our friends at AWS, and most recently, Lumigo CEO Erez Berkner joined AWS on Air to talk about end-to-end observability of the modern cloud application.

Trouble with Google Chrome and macOS: When Google Helper isn't Helping

How a US-based consumer goods company reduced ticket count related to performance issues on macOS devices using Chrome as part of their digital collaboration platform. Thanks to digital collaboration tools, the world continues to evolve and accept new hybrid working styles. Today, reliance on these instant messaging, video calls, and conferencing tools has turned collaboration solutions into the lifeblood of modern business productivity and collaboration.

Sematext Infrastructure Monitoring Tool | Full stack observability | Product and Feature Overview

The Sematext Infrastructure Monitoring platform is an easy and effective way to monitor a full stack. Monitoring a monolithic system was an easy task. But as software becomes more distributed, DevOps, SysAdmins, and run-of-the-mill developers need monitoring tools that are capable of monitoring dynamic distributed systems.

Log Management: A Useful Introduction

We find ourselves submerged in a sea of software applications practically all the time. Their primary job is to make life easier and help us accomplish certain tasks. However, these applications require a lot of data. What’s more, their development requires a systematic approach with proper management of that data — and its related activities. But that’s not a straightforward and simple process. What happens if these applications stop running?

Adding RUM to Your ITSI Cocktail: Content Pack for Splunk Observability V2

Want to improve your outlook with a splash of RUM? In our pursuit of connecting users to the right data at the right time, we’ve come to see Real User Monitoring as an invaluable tool for understanding the total picture when it comes to your web properties, apps, and cloud footprint. Do you find yourself asking any of these questions?

Dashboard Studio: More Maps & More Interactivity

In Splunk Cloud Platform 8.2.2203, we're continuing to expand on interactivity capabilities and visualizations for Dashboard Studio. We've added the ability to use search results and job metadata as tokens, and pass tokens through drilldowns to other dashboards. There is a new map visualization for cluster maps and UI to match strings for dynamic coloring. And finally, we've included the ability to set a Studio dashboard as your home dashboard.

What is Remote Work?

Remote work has seen a resurgence due to the pandemic, and hybrid work is here to stay. It’s excellent news for knowledge workers, but what does it mean for support teams? Employees working from their home, vacation property, or a coffee shop create a different environment for technology teams, who need to support and ensure the applications and infrastructure are working well end-to-end.

Monitor SQL Server and Azure managed databases with Datadog DBM

Datadog Database Monitoring (DBM) provides comprehensive visibility into SQL queries running on your databases. Using DBM, you can troubleshoot database performance issues by drilling into frequently used queries and analyzing historical trends in your queries’ metrics and execution plans. Whether you operate self-hosted SQL Server instances or leverage Azure’s fully managed services, DBM can provide deep visibility into the databases your application depends on.

Setup RabbitMQ in HA Mode using Kubernetes Operator

Organizations are moving from monolithic architecture (where all the code building the application exists as a single, monolithic entity) to microservices architecture as it simplifies app management, making it easier to build, deploy, update, test and scale each service independently without affecting other parts of the architecture.

Monitorama 2022 - A Q&A with Mehdi Daoudi

This year's Monitorama is quickly approaching. The event is hosted in Portland, Oregon, from June 27 to 29. It features talks from industry experts and community leaders on all things monitoring, observability, SLI/SLO, and most importantly, what practitioners are doing (not vendors and providers).

Multicloud Cost Management

More enterprises are adopting cloud computing to ensure that they can accelerate innovation, stay competitive, and enjoy cost savings. This trend has only increased in the last two years with the rise of remote work necessitated by the COVID-19 pandemic. With the rise of cloud adoption, multi-cloud and hybrid cloud deployments are increasing in popularity as well. According to a Gartner survey, 81% of survey respondents are using two or more cloud providers.

Ruby - Tracing a Ruby application with OpenTelemetry for performance monitoring

Tracing your application can give the much needed context required to troubleshoot performance issues. OpenTelemetry is an open-source project that can help you to set up an observability framework for your cloud-native applications. In this tutorial, we will use SigNoz as our backend analysis tool. SigNoz is a full-stack open-source APM tool that can be used for storing and visualizing the telemetry data collected with OpenTelemetry. It is built natively on OpenTelemetry and works on the OTLP data formats.

NestJS - Monitoring your NestJS Application using OpenTelemetry and SigNoz

Monitoring your NestJS application is critical for performance management. But setting up monitoring for NestJS applications can get cumbersome requiring multiple libraries and patterns. That's where OpenTelemetry comes in. In this tutorial, we will use SigNoz as a backend. SigNoz is an open-source APM tool that can be used for both metrics and distributed tracing. Let's get started and see how to use OpenTelemetry for a NestJS application.

Making the World's AWS Bills Less Daunting

Armed with a Ph.D. from UC San Diego, our guest started off with internships at Google and Microsoft before gaining valuable experience as a VP and a highly sought-after consultant for startups and SMBs. Now he’s one of the world’s foremost experts on wrangling vast data sets and maximizing efficiency.

Getting Started with OpenTelemetry for Observability

This article was published in The New Stack. For most developers, software development means there is an API for almost everything, hardware is provisioned via the cloud and the core focus is on building only the features most crucial to your business. Of course, all these integrations and modern distributed architectures create their own set of problems. Having full insight into your application has become even more important and is now commonly known as observability.

Cribl.Cloud: Are You Ready to Fly Solo?

Many years ago, I attained my private pilot’s license. This entailed completing a very structured program, similar to how most companies introduce a product to a new user. Let’s be honest, there is a really good reason for this – to avoid the crash and burn. With flight training, it’s literal, while with products it’s a bit more figurative (except when you YOLO something into production–that can cause a crash and burn–and leave a bad first impression).

Introducing the new and improved Grafana BigQuery plugin

We are happy to announce that an official Google BigQuery data source plugin for Grafana has arrived! Based on the popular DoiT International BigQuery DataSource community plugin, the new Grafana BigQuery plugin brings a new and improved query editor experience plus support for all BigQuery data types, Grafana Alerting, and query caching.

Tech Talk: DevOps Edition - Monitor and Alert on Your Kubernetes Clusters in Seconds

Watch Monitor and Alert on Your Kubernetes Clusters in Seconds to learn how Splunk Observability can help demystify challenges with monitoring distributed microservices. You’ll also view a demonstration on how to correlate application and infrastructure behavior to streamline troubleshooting and alerting on-premises and in the cloud.

Honeycomb Supports Service Ownership

The software industry is moving toward teams that own the services they build. This concept encloses principles and possibilities from movements toward microservices, DevOps, Agile, and Project to Product. In these paradigms, a team of people delivers software that provides valued capabilities. These capabilities help customers get their work done, support business operations, or enable other software to do these.

SPOTcon 2022 Recap

SPOTcon 2022 is an annual conference hosted by Scout APM that empowers developers with solutions that drive leading-edge transformation in application development and observability. This year’s event took place virtually and was an educational couple of days filled with insights into the current and future states of application monitoring and observability. Didn’t make it to SPOTcon 2022 this year? Not a problem! Here’s a recap of everything that went down.

3 Tips to Deliver Microsoft Teams Service Excellence

As we continue to see Microsoft Teams usage skyrocket, now more than ever, users are depending on Microsoft Teams service excellence to maintain productivity. But it can be challenging to deliver a reliable user experience in today’s modern workplace. There are many factors in the IT environment impacting Microsoft Teams performance, and IT teams typically don’t have full visibility into them, or the service quality delivered to end users.

Problem solving and pinball with Jay Adelson | Network AF Episode 18

Chairman and Co-Founder of Scorbit, Jay Adelson, sits down with Network AF host Avi Freedman to talk about his history as a serial entrepreneur. Jay founded Equinix, Revision3, Opsmatic, and was the CEO of Digg. Throughout the conversation Jay and Avi touch on problems founders encounter, and discuss their mutual joy for gaming. Highlights of the conversation include.

Prometheus vs Nagios vs Pandora FMS: Never before has such combat been seen!

You already know that in this house we love comparisons. Somehow you have to elucidate which is the best monitoring tool on the market, right? Well, this time we bring you the final battle between three great ones. Prometheus vs Nagios vs Pandora FMS. Nothing like that had ever been seen before in the ring! Let the bell ring!

Debugging Gson, Moshi and Jackson JSON Frameworks in Production

Parsing bugs are the gift that keeps giving in the age of APIs. We use a service; it works perfectly in debugging, QA, etc. Then some user input that made its way to the web request returns a result we just can’t parse. Unfortunately, there isn’t much we can do at this stage. We need to understand why the failure occurred and how we can work around it and fix it.

Cloudflare outage? The Domino Effect!

This day started a bit abruptly, with several services experiencing outages due to a Cloudflare outage. It started at approximately 06:34 AM UTC. Check the official announcement. What came next was a domino effect across many popular internet services, including major ones like Gitlab, Notion, Hubspot, Digital Ocean, Monday, Recurly, and many more. We registered incidents from 230 services between the time the outage was published and the time it was marked as resolved.

What's New In IBM MQ 9.3

The latest long-term support (LTS) and continuous delivery (CD) release of IBM MQ will be released for the distributed platforms on June 23, 2022. MQ 9.3, again, has a focus on securely powering cloud-native applications across hybrid-multicloud as well as making it easier to get started. MQ 9.3 includes: Nastel has been participating with the development team on this technology and analyzing the impact and benefits to our customers as part of our own cloud and container initiatives.

Optimizing Static HTML And Images With Webpack

Webpack is great for building JavaScript applications, but did you know it can optimize static HTML assets too? In this article, we use Webpack to process HTML files while automatically optimizing their images too. Hey! Don’t want to read all the ins and outs of bending Webpack into shape? Jump to the final webpack configuration.

The Silent Digital Transformation Killer

Digital transformation, no matter what form it takes within your organization, is a high-stakes initiative to deliver strategic impact to the business. The cloud is a pivotal enabler to that effort. But there’s a flip side—challenges related to migrating and managing workloads in the cloud can have a negative impact on the success of your transformation efforts.

All you need to know about SSL certificate expiration

With copious amounts of data getting added across online platforms, safeguarding data and ensuring a secure environment are concerns among business entities. To offer a secure and reliable service, you need to identify loopholes, implement preventive measures to thwart attacks, and ensure customer data privacy. You need a valid Secure Sockets Layer (SSL) certificate to secure your online presence.
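
One way to keep an eye on certificate validity is to compute how many days remain before the expiry date. The sketch below assumes the expiry timestamp is already available as a string in the `notAfter` format that Python's `ssl` module reports (for example from `SSLSocket.getpeercert()`); the function name `days_until_expiry` and the sample dates are made up for illustration.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Return whole days between `now` and a certificate's notAfter time.

    `not_after` uses the "Mon DD HH:MM:SS YYYY GMT" format used by the ssl module.
    A negative result means the certificate has already expired.
    """
    expires_ts = ssl.cert_time_to_seconds(not_after)
    expires = datetime.fromtimestamp(expires_ts, tz=timezone.utc)
    return (expires - now).days


# Hypothetical values: a certificate expiring at the end of June 2022,
# checked at the start of the month.
sample_not_after = "Jun 30 12:00:00 2022 GMT"
checked_at = datetime(2022, 6, 1, 12, 0, 0, tzinfo=timezone.utc)

remaining = days_until_expiry(sample_not_after, checked_at)
print(f"certificate expires in {remaining} days")
```

A monitoring job could run a check like this on a schedule and raise an alert once the remaining days drop below a chosen threshold.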

Prometheus vs Elasticsearch stack - Key concepts, features, and differences

Prometheus and the Elasticsearch stack are both used for monitoring applications. But while Prometheus is primarily meant to monitor metrics, the Elasticsearch stack, or ELK stack, is mainly used to collect, store, analyze, and visualize application logs. In this article, we will see what Prometheus and the ELK stack are and compare their differences. Prometheus is a time-series metrics monitoring tool. Prometheus enables you to capture time-series data as metrics.

9 Ways to Improve Node.js Performance

Node.js is the most popular tool for developing server applications in JavaScript, the world's most widely used programming language. Node.js is now regarded as a critical tool for all types of microservices-based development and delivery, as it has the capabilities of both a web server and an application server. In any web application, performance is critical. Faster performance in any web application improves the user experience and leads to increased revenue, which makes business owners pleased.

The Best Production Monitoring Tools & Software For 2022

As a software engineer running applications in production, it is essential to monitor this environment to maintain the health of your applications. Production monitoring software and systems are used to improve observability so that you can better understand your operating environment and visualise performance issues easily.

GrafanaCONline 2022 Day 4 recap: Grafana Labs technical docs, citizen science with Grafana Cloud, load testing with Grafana k6, and more

GrafanaCONline 2022 wrapped up on Friday, and the big finish featured sessions that covered important changes in Grafana Labs technical documentation, Grafana Cloud’s role in activist engineering projects, and the benefits of load testing with k6. There was also a great success story out of East Africa, where a major bank switched to a business monitoring system with Grafana and improved its customer satisfaction and revenue.

Integrate with AppDynamics | Moogsoft Product Videos & How-Tos

After watching this video, you will be able to set up a template in AppDynamics to send data to Moogsoft, configure a JSON payload to map AppDynamics data to Moogsoft event fields, and define an AppDynamics policy to forward health rule violations and other issues to Moogsoft.

Observation Is More than Monitoring | Discovering Observability: Session 2

You’re ready to take the leap and move your existing Orion Platform infrastructure to Hybrid Cloud Observability, but we know you’ll have questions. So, we’ve put together some people who can answer these questions in a way that’s easily digestible for current Orion Platform users.

Status page aggregation: How to aggregate status pages?

If you’re managing IT or infrastructure these days, you almost certainly depend on dozens of hosted services or cloud applications. From major cloud services like Amazon, Azure, or Google, to customer service tools and marketing platforms, your business depends on the uptime of others. Each of these services publishes a status page where they warn about maintenance, outages, and performance issues, but can you keep track of all of those at the same time?

Filtering Metrics with the observIQ OpenTelemetry Collector

In this post, we will address the common monitoring use case of filtering metrics within the observIQ OpenTelemetry (OTEL) collector. Whether the metrics are deemed unnecessary, or they are filtered for security concerns, the process is fairly straightforward. For our sample environment, we will use MySQL on Red Hat Enterprise Linux 8. The destination exporter will be to Google Cloud Operations, but the process is exporter agnostic.

Short and Exciting Journey of M1 Build Agent Configuration

Back in November 2020, Apple’s M1 chip was introduced, and as end users moved to M1-based Macs, it became mandatory to build applications that are compatible with the new technology. The M1 chip has incredible improvements and features, but I won’t cover them in this post. There are many resources on the internet covering this, and I encourage you to explore them. In this post, I will cover several challenges I tackled while setting up an M1 build.

OpenTracing vs. OpenTelemetry

Monitoring and observability have increased with software applications moving from monolithic to distributed microservice architectures. While observability and application monitoring share similar definitions, they also have some differences. The purpose of both monitoring and observability is to find issues in an application. However, monitoring aims to capture already known issues and display them on a dashboard to understand their root cause and the time they occurred.

Customer success story: How AASCC built network resilience during the pandemic using OpManager

Al Ain Sports & Culture Club (AASCC) is at the forefront of sporting excellence in the UAE, striving to bring unmatched success to the nation in all sports’ fields. AASCC has achieved an impressive track record across sports during the past four decades, with soccer as its flagship. AASCC has brought its local community together, cultivated a rich history, and promoted economic development in the UAE.

June 2022 update: We are back with improved up/down emails, reports, affiliate system and more

Hi folks! It’s been a while since the last update post. But don’t worry, it’s not because we were lazy. We’ve just been really busy preparing new, interesting things. Let’s take a closer look at them.

What Is Synthetic Monitoring? | The Benefits of Running Synthetic Tests - Sematext

Find out what synthetic monitoring is and how it works. Discover the benefits of using synthetic testing tools for website performance and how to choose the right one for your use case. Synthetic monitoring (also known as synthetic testing and active or proactive monitoring) is one of the many tools developers use to oversee their websites. Synthetic testing removes the user and their device as variables and lets you test your deployed site. This helps ensure the website and all its third-party APIs are accessible and functioning as they should.

Inside Grafana Labs hackathons: how they work and what projects ended up on the product roadmap

Three times a year, Grafanistas around the world step away from their daily responsibilities for one week and put their creative energy into what has quickly become a cultural touchstone at Grafana Labs: Our company-wide hackathon.

GrafanaCONline 2022 Day 3 recap: Alerting in Grafana 9, Loki developments, dashing dashboards, and more!

GrafanaCONline 2022 is still going strong, with sessions that covered alerting in Grafana 9, developments in Grafana Loki, and some winning Loki use cases. Plus, there was a talk about building an F1 telemetry analysis solution that uses Grafana Cloud, along with plenty of dashboard discussions.

Authors' Cut-Structured Events Are the Basis of Observability

At its core, observability is understanding the internal state of your systems based on the telemetry they output so you can effectively troubleshoot, debug, and tune performance. However, there’s a tendency to reduce observability to a collection of logs, metrics, and traces, which strips away much of the visibility you need to understand what’s going on.

Network Configuration, Monitoring and Management Explained

Network configuration entails setting up your network to support local or remote network communication. This configuration allows for wired or wireless connection and entails the installation of network hardware, software, and devices. In this article, let’s take a look at network configuration, including the benefits, various types and tools, and how to monitor and manage the network.

Get immediate visibility into Azure Kubernetes Service with Datadog's powerful AKS dashboard

We’re pleased to announce a new out-of-the-box dashboard for Azure Kubernetes Service (AKS) that allows you to immediately visualize the health and performance of your AKS clusters. This dashboard organizes and highlights the most critical information from the standard AKS metrics, while also incorporating log data to provide observability into the control plane.

What's Up with Synthetic Monitoring?

If you are running a website or have a live application, you will have to ensure that the digital experience for your end-user is seamless. That is where Synthetic Monitoring can help. Any bad experience, delay, or even a glitch could hurt your budget. Industry experts unanimously agree that it is best to ensure that the web pages load as quickly as possible.

New Browser APIs for Detecting Javascript Performance Issues in The Production

Users nowadays demand the greatest possible experience, which implies top-notch performance. Smooth scrolling, prompt interaction responses, a fast page load time, and flawless animations are all things they anticipate. Local profiling to identify performance issues is convenient, but it only provides a limited amount of information. While things may run smoothly on our high-end developer machines, the user may be dealing with poor hardware and a bad experience.

Rust - Implementing OpenTelemetry in a Rust application for performance monitoring

In this tutorial, we will use OpenTelemetry to instrument a Rust application for telemetry data. OpenTelemetry can be used to trace Rust applications for performance issues and bugs. OpenTelemetry is an open-source project under the Cloud Native Computing Foundation (CNCF) that aims to standardize the generation and collection of telemetry data. Telemetry data includes logs, metrics, and traces. More about SigNoz.

Recommended AppSignal Setup

We're launching our new Getting Started page. This feature helps first-time users to set up their monitoring with AppSignal, as soon as they've signed up. Before we dive in, we'd love to share our beliefs about onboarding. All developers share these same “first-time” moments: Many of our customers start monitoring their applications for the first time with AppSignal, or experience new types of issues when scaling an application.

An Introduction to Windows Event Logs

The value of log files goes far beyond their traditional remit of diagnosing and troubleshooting issues reported in production. They provide a wealth of information about your systems’ health and behavior, helping you spot issues as they emerge. By aggregating and analyzing your log file data in real time, you can proactively monitor your network, servers, user workstations, and applications for signs of trouble.

GrafanaCONline 2022 Day 2 recap: Grafana 9, Grafana Mimir, Grafana Tempo demos, new hackathon projects, and more

The excitement around GrafanaCONline 2022 continues to soar after another day filled with demos of new features and functionalities in Grafana 9, Grafana Mimir, and Grafana Tempo. Plus we learned how a mini arcade turned into a Grafana display; how Grafana transformed into a health tracker, and how, yes, Grafana can run Doom.

How low-level API calls can stabilize your end-to-end tests

We’re heavy end-to-end monitoring users here at Checkly and always experiment with how to architect our tests the best way. Over the past months, we’ve settled on a few workflows that make it much easier to spin up new tests, avoid code duplication, and make the entire test setup easier to manage. One of those strategies is to strictly separate concerns in our tests.

Announcing Logz.io Alert Manager for Metrics

Logz.io alerts are a critical capability for our customers monitoring their production environment. By keeping a watchful eye for data that indicates an issue – like spiking memory metrics or 3xx-4xx response codes – alerting quickly notifies engineers that something is going wrong. Setting an actionable alert to immediately notify engineers of oncoming problems can be the difference between a minor issue and a major event with widespread customer impact.

Why and how to monitor Amazon API Gateway HTTP APIs

API gateways are part of every modern microservice architecture. As their name already suggests, they are the gateway into your system; everyone who wants to access your service has to go through a gateway. In 2019, AWS announced HTTP APIs for its API Gateway (APIG) service. This was a big step to add more flexibility and lower latency to APIG. Before this release, you could only build REST APIs with APIG, which only helped when you wanted to create an API based on the REST architecture.

10 Web Monitoring Tips for Redundant Systems

As your team grows, so do the rules and regulations you use to keep things organized. The same is true for systems, which grow in complexity as they grow in size. That complexity is difficult to manage on its own without the natural turnover that occurs in tech. Those who built and managed legacy systems eventually go on to bigger and brighter things, either within the company or toward other opportunities.

Going On Call for the First Time

I've never been on call before, and I'm not sure what to expect, or how I can best prepare for it. Will I need to upend my life just in case the pager goes off? And how should I best cope with getting paged? I've read Charity's piece on the opposite problem of wanting to stop being on call, but it didn't quite answer my question.

Learn Effective Policy Compliance Management with Restorepoint

In the two previous blogs from this series, we showed you how Restorepoint enables you to minimize MTTR to mitigate the impact of change management and remediate after a network breach. The third and final blog of the series walks you through policy compliance management—demonstrating the value of creating a single pane of glass where you can see all relevant information from a single location.

Proactively Deliver a Better Microsoft Teams Experience

Legendary wine and spirit merchants Berry Brothers & Rudd arm their IT team with deep Microsoft Teams visibility to speed issue resolution and deliver an exceptional user experience. Berry Bros. & Rudd (BBR) is a family-run British wine and spirits merchant founded in London in 1698. Over the years, the company has grown to include six offices worldwide, including Japan, Singapore and Hong Kong.

Monitoring your Windows performance metrics with hosted Graphite

Windows is the most popular operating system in the world. It powers not only personal computers and laptops at home but also enterprise servers and systems. Due to the vast use of the software, many solutions to monitor Windows have been developed. In this article, we will discuss an efficient way to monitor Windows and the reason why we want to track the metrics of the operating system. But before we start, check MetricFire.

Take control of your telemetry data with Datadog Observability Pipelines

Many organizations manage applications that are supported by a large number of services in multiple environments, ranging from the cloud to their own data centers across the globe. As these organizations scale and accelerate service adoption, the volume of telemetry data in their environments multiplies every year.

Beware the 'Secret Agent' Cloud Middleware

New open source database details the software that cloud service providers typically silently install on enterprises’ virtual machines — often unbeknownst to customers. If cloud services weren’t complicated enough for the typical business today to properly configure and secure, there’s also a lesser-known layer of middleware that cloud providers run that can harbor hidden security flaws.

How can Guidance Report for AWS help you make data-driven business decisions?

The world is moving into a post-covid era and a cloud boom is on the horizon. Businesses require a higher ROI for every penny invested in a cloud service like AWS. To accomplish this, Site24x7 provides a personalized cloud assistant to get the most out of your cloud investment. Site24x7's Guidance Report for AWS helps you adopt industry best practices in AWS and make informed decisions.

What's the secret to choosing the right SAP HANA migration solution for your organization?

As a revolutionary column-based in-memory database, SAP HANA has been praised as one of the most notable technology innovations in many years. Organizations entirely new to SAP can implement and benefit from using its comprehensive ERP solution SAP S/4 HANA almost right away – immediately gaining a boost thanks to rapid scalability, consistent efficiency and airtight security.

Apache Error Log Explained in Detail

Apache has been around since 1995 and is one of the most important web technologies. Many businesses nowadays run on Apache servers. Different servers operate in different ways and have different features and functions. To simplify debugging, servers keep logs. Understanding how the server works is essential. All errors encountered by the server while receiving or processing requests are recorded in the Apache error logs.

You can now monitor your domain name using Oh Dear

When registering a domain name, you could assume it is yours forever. Unfortunately, this is false, and most domains must be renewed periodically. If you fail to do this, you risk losing your domain, and ownership could be transferred away from you. Oh Dear's new Domain check can send you a notification days before your domain expires. This way, you still have time to renew it.

GrafanaCONline 2022 Day 1 recap: Grafana 9 release, Grafana OnCall open source, Grafana and Grafana Loki in space, and more!

GrafanaCONline 2022 is off to a great start with exciting news from around the Grafana-verse and a jam-packed day filled with dashboards showcasing how Grafana is used in space, in industrial IoT, at live events, and even in an effort to prevent food waste.

Metric Correlations on the Agent

As of v1.35.0 the Netdata Agent can now run Metric Correlations (MC) itself. This means that, for nodes with MC enabled, the Metric Correlations feature just got a whole lot faster! The Netdata Metric Correlations feature uses a Two Sample Kolmogorov-Smirnov test to look for which metrics have a significant distributional change around a highlighted window of interest.
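
To make the statistic behind this concrete, here is a minimal sketch of a two-sample Kolmogorov-Smirnov statistic in plain Python. It is not Netdata's implementation; the function name `ks_statistic` and the sample values are assumptions made for illustration, and the sketch computes only the statistic (the largest gap between the two empirical distribution functions), not a significance level.

```python
from bisect import bisect_right


def ks_statistic(baseline, window):
    """Two-sample KS statistic: the largest vertical gap between the
    empirical CDFs of the two samples (0.0 = identical, 1.0 = disjoint)."""
    if not baseline or not window:
        raise ValueError("both samples must be non-empty")
    a = sorted(baseline)
    b = sorted(window)
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to compare them at every distinct value in either sample.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)  # share of baseline values <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of window values <= x
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical metric values: a baseline period and a highlighted window.
baseline_cpu = [1, 2, 3, 4]
window_cpu = [3, 4, 5, 6]
print(ks_statistic(baseline_cpu, window_cpu))
```

A metric whose highlighted window produces a statistic close to 1.0 has shifted sharply relative to its baseline, while a value near 0.0 suggests little distributional change.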

Monitoring and Troubleshooting Containerized Applications with Lumigo

Modern applications are designed to leverage cloud native technologies like serverless and containers to run at an unprecedented scale, moving the focus away from machines to the actual service. Lumigo’s observability platform was purpose-built for these evolving cloud environments, and we’ve been delivering the most advanced automated distributed tracing for serverless applications since 2019.

InfluxData Announces InfluxDB Edge Data Replication

SAN FRANCISCO, June 15, 2022 – InfluxData, creator of the leading time series platform InfluxDB, today announced Edge Data Replication, a new capability for centralized business insights in widely distributed environments. Edge Data Replication enables developers to collect, store and analyze high-precision time series data in InfluxDB at the edge, while replicating all or subsets of this data into InfluxDB Cloud.

Announcing InfluxDB Edge Data Replication: Combining the Power of the Cloud with the Precision of the Edge

There are technical and business reasons to have a time series data presence both at the edge and in the cloud – InfluxDB has always played a key role in both contexts. Today, we’re announcing Edge Data Replication, a new feature that combines these two deployment strategies. With this announcement, InfluxData begins a greater initiative to accommodate both edge and cloud data workloads in one unified solution.

PHP - Monitoring a PHP application with OpenTelemetry and SigNoz

In this tutorial, we will use OpenTelemetry to instrument a PHP application for telemetry data. It’s essential to monitor your PHP application for performance issues and bugs. Application owners need good telemetry data from their application in order to monitor it effectively. That’s where OpenTelemetry comes into the picture. OpenTelemetry provides client libraries for many programming languages, including PHP, which can be used to instrument applications.

How to monitor Elasticsearch with OpenTelemetry

Some popular monitoring tools in the market can complicate and create blind spots in your Elasticsearch monitoring. That’s why we made monitoring Elasticsearch simple, straightforward and actionable. Read along as we dive into the steps to monitor Elasticsearch using observIQ’s distribution of the OpenTelemetry collector. To monitor Elasticsearch we will configure two OpenTelemetry receivers, the elasticsearch receiver and the JVM receiver.

Monitoring: The ROI of Build vs. Buy

I’ve written before about building your own monitoring systems versus buying a monitoring tool like Redgate SQL Monitor. There I talked about the time that someone tasked with managing and maintaining data gets back in their day when they purchase a monitoring solution. However, that’s not where the business focuses. The business frequently wants to know one thing and one thing only: what’s the return on this investment (ROI).

Anodot Supports the FinOps Foundation Mission

As a member of the FinOps organization, Anodot is excited to sponsor the upcoming FinOps X event in Austin, TX. Anodot’s mission has always been to help organizations solve one of the most recognized challenges associated with public cloud adoption — cost control and optimization. Every feature of our Anodot cloud cost management platform has been built by taking a core FinOps market concern and working backward to deliver a capability that fills that need.

Accurately Forecasting Cloud Costs

Most companies today have a “cloud first” computing strategy. According to Foundry’s April 2022 report outlining their 2022 Cloud Computing research, 92% of businesses globally have moved to the cloud. What’s more, the percentage of companies with most or all of their IT infrastructure in the cloud is expected to leap from 41% today to 63% in the next 18 months. As companies move more workloads onto various cloud platforms, cloud budgets continue to increase.

How To: Roll Your Own Cribl Pack

Cribl Packs are, in my opinion, our most exciting feature. Packs encapsulate the deep log processing capabilities and enable sharing of the best practices with customers, Worker Groups/Fleets, and the Community. Ease of sharing enables consistent configurations across distributed deployments of Cribl Stream or Cribl Edge. All users can leverage Packs–and should! If you collect Microsoft Windows Logs, use Palo Alto Networks or share logs via Syslog, Packs are for you.

Understand Source Code - Deep into the Codebase, Locally and in Production

Say you have a new code base to study or picked up an open source project. You might be a seasoned developer for whom this is another project in a packed resume. Alternatively, you might be a junior engineer for whom this is the first “real” project. It doesn’t matter! With completely new source code repositories, we still know nothing… The seasoned senior might have a leg up in finding some things and recognizing patterns.

How to improve your Crash Free Users score in minutes

If you’re reading this blog, you likely already know the importance of quality software. But with the overwhelming number of metrics that can be monitored and improved, development teams are struggling with what metrics they should prioritize to have the most significant impact. The Crash Free Users score in Raygun is a perfect place for development teams who care about software quality to focus their efforts.

Windows 11 Upgrade? Try Digital Experience Monitoring

It feels like yesterday, but believe it or not, it’s been over six months since Windows 11 was officially launched in the market. To be precise, the operating system came out on October 5, 2021. Compared to Windows 10, Windows 11 is packed with enhanced security features and provides faster access to services you already use, such as Microsoft Teams, Skype, the (new) Edge browser, and more. Most importantly, the OS centers on hybrid work and digital experience to empower remote learning.

Site Reliability Engineering (SRE) Survey Now Open for 2022 - Calling All Reliability Practitioners and Leaders

In its fifth year, Catchpoint sponsors The SRE Survey, in partnership with Blameless, to uncover new trends and challenges for teams focused on advancing the reliability of digital products.

Annual Study: Hybrid IT Acceleration Has Increased Network Complexity and Lowered Tech Pros' Network Management Confidence

SolarWinds IT Trends Report 2022-Getting IT Right: Managing Hybrid IT Complexity examines the current state and areas of opportunity for technology professionals managing increased complexity as hybrid IT accelerates. The continued shift to hybrid IT drives increased levels of IT management complexity, but tech pros feel a lack of confidence in how to best manage it. Nearly half (44%) of tech pros said their organization manages hybrid IT complexity through training staff and adopting IT monitoring/management tools (37%).

Monitor and diagnose network performance issues with SNMP Traps

Monitoring your on-premise or hybrid infrastructure means keeping track of potentially thousands of devices, any one of which could be a point of failure. Additionally, silos between application and network teams can create visibility gaps that complicate troubleshooting. For network engineers investigating bottlenecks, being able to view real-time infrastructure health and performance data alongside application metrics is essential for ensuring their organizations meet key SLOs.

Using High Availability Capabilities to Make Migration of the Monitoring System Simple

A monitoring tool and its backend database: Monitoring platforms such as eG Enterprise collect large numbers of metrics and data points about the applications and infrastructure being monitored. As the complexity of the applications, the number of tiers, and the scale of the infrastructure grow, so does the number of metrics that need to be analyzed. Even in a mid-sized IT infrastructure, there may be hundreds of thousands of metrics collected and analyzed over time.

Outage in Egypt impacted AWS, GCP and Azure interregional connectivity

On Tuesday, June 7, internet users in numerous countries from East Africa to the Middle East to South Asia experienced an hours-long degradation in service due to an outage at one of the internet’s most critical chokepoints: Egypt. Beginning at approximately 12:25 UTC, multiple submarine cables connecting Europe and Asia experienced outages lasting over four hours. As I show below, the impacts were visible in various types of internet measurement data to the affected countries.

OpenObservability Talks Second Year at a Glance

I can’t believe that the OpenObservability Talks podcast is already celebrating its second anniversary. It feels like just yesterday I wrote the summary of the first year, sharing the hectic times of starting a podcast in the midst of the COVID-19 global pandemic. The pandemic has been with us most of this year too, but it didn’t stop us from bringing the latest on the best of breed open source observability.

Recapping SLOconf 2022: SLOs are for everyone!

Did you get to attend the excellent SLOconf last month? With four different tracks and over 60 talks - covering everything from defining an SLO to the financial framing of error budgets, you, like us, may have missed a couple of things. In this handy recap, we take you through some of the juiciest sessions and point you to a few you may have overlooked. Luckily, SLOconf 2022 was designed for while-you’re-working participation and all the talks are still available.

Netdata Agent release v1.35

The latest Netdata Agent release v1.35 introduces massive improvements for the machine learning-powered Anomaly Advisor, Metric Correlations, Kubernetes monitoring, and much more. Anomaly Advisor & on-device Machine Learning: This release features the launch of the flagship machine learning (ML) assisted troubleshooting tool, Anomaly Advisor. Unsupervised ML models are trained for every metric, at the edge, on your devices, enabling real-time anomaly detection across all your systems and applications.

GrafanaCONline 2022: A guide to all the big announcements from Grafana Labs

We have lift off! GrafanaCONline 2022 officially launched today with the opening keynote featuring Grafana Labs CEO and Co-founder Raj Dutt, Chief Grafana Officer and Co-founder Torkel Ödegaard, and Senior Engineering Manager Myrle Krantz. Along with previewing the much-anticipated release of Grafana 9.0, we revealed some exciting news for our open source community. Below is a summary of all the major headlines that mark one small step for Grafana, one giant leap for the Grafana community.

Grafana 9.0: Prometheus and Grafana Loki visual query builders, new navigation, improved workflows, heatmap panels, and more!

GrafanaCONline, our annual community event designed for Grafana open source users and dashboarding enthusiasts, also marks the general availability of Grafana’s latest and greatest release. Grafana 9.0 is now available to both open source and Grafana Enterprise users, and is being rolled out to Grafana Cloud users incrementally. (The majority of instances have already been upgraded!) New Grafana Cloud users will immediately get the Grafana 9.0 experience.

Grafana Alerting: Explore our latest updates in Grafana 9

Grafana 8 marked a major redesign in the way we do alerting. We created a unified alerting experience that implemented a workflow that operates across all of our products and combined Grafana panel alerts and Prometheus-style alerts into a single pane of glass. We built this as an open source feature first to make sure you could opt in and try it out from day one, regardless of which flavor of Grafana (OSS, Cloud, or Enterprise) works best for you.

Introducing Grafana OnCall OSS, on-call management for the open source community

Last November, we announced the launch of Grafana OnCall, an easy-to-use on-call management tool that helps reduce toil through simpler workflows and interfaces tailored for developers. Born out of Grafana Labs' acquisition of Amixr Inc., Grafana OnCall began as a cloud-only solution that became generally available to all Grafana Cloud users, on both paid and free plans, in February.

An Observability Guide From Someone with a Precarious Grasp on the Topic

I’m Phillip, a product manager here at Honeycomb. After eleven-ish months of working on our product, I totally understand observability, right? ...Kinda? Sorta? Maybe? I'm not sure—but, I have been sitting in this space long enough to be a little better than clueless. Here's my guide on the topic. I hope it helps, especially if you’re passionate about exploring alternative ways you or your team can manage today’s cloud-native applications.

Highlights From the 2022 Gartner Market Guide for AIOps Platforms

Trent Fitz, CMO at Zenoss, covers some of the key highlights in the recently released Gartner Market Guide for AIOps Platforms. Having lived in the AIOps world since its inception, Trent offers insights on how to interpret key observations in the research, including the contrast between different types of AIOps tools, common pitfalls, and the direction of the market itself.

Learn the Latest from the Research Roundup for Modernizing Infrastructure and Operations in a Hybrid World by Gartner

Gartner breaks down modernizing I&O into five key focus areas you won’t want to miss. Unsure of how to keep up with infrastructure reliability in an ever-changing, hybrid world? With so many cloud infrastructure and platform services (CIPS) available, do you find yourself asking, ‘How can I make sure my data center infrastructure and its operations are maximized and successful?’ If you are feeling a little challenged in any of these areas, this report will be informative.

Stackify Earns SD Times 100: Best in Show for Performance Monitoring

Stackify earned top honors for the 4th year in a row from SD Times for Performance Monitoring. This year’s SD Times 100 looks at the best of the software development companies from how they performed in 2021 — a year like no other due to … Improve Your Code with Retrace APM: Stackify’s APM tools are used by thousands of .NET, Java, PHP, Node.js, Python, & Ruby developers all over the world. Explore Retrace’s product features to learn more.

Introduction to Cloud-native Monitoring and Observability - Civo

Our applications and infrastructure provide us with vast amounts of data about their performance. To gain control over this amount of data, we must process it to be useful information. This information can offer you insights into where your application could be performing better, with more efficiency, or with fewer errors. Your insights can only be as good as the data you gather and how you organize it. More is not (always) better - this is where observability comes in.

Defining the Edge for IoT with InfluxDB

The 'edge' is the place where the physical world meets the digital world. More and more businesses rely on workloads at the edge, especially in the IoT and IIoT spaces. Define the edge to fit your needs. InfluxDB has the tools and resources to use data at the edge and in the cloud, and to create reliable, durable data pipelines between them.

What Is JMX Monitoring?

The Java Management Extensions (JMX) framework is a well-known tool for any experienced Java developer. The purpose of the JMX framework is to simplify the management of local and remote Java applications while providing a user-friendly interface. The primary advantages of the JMX framework are that it’s highly reliable, scalable, and easy to configure. However, it’s also known for introducing the concept of MBeans, which unlocks the capacity for real-time Java application management.

The best brand reactions to website downtime

We have to admit that customers and brands alike are able to put a good spin on website downtime, and social media managers are undoubtedly having all their Christmases arrive at once! We’ve found some of the best reactions to website downtime online, albeit the majority are from the notorious Facebook, Instagram and WhatsApp downtime experienced globally. Feel free to actually LOL.

Heartbeat monitoring

Heartbeat monitoring (reverse monitoring) enables you to monitor your internal services that are not publicly accessible (for example, for security reasons). You may ask, “How is this possible?” It is accomplished by sending a simple HTTP GET request to our unique heartbeat URL. You periodically send us those “pings” as long as your application is healthy.
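
As a rough illustration of the idea, the sketch below sends a periodic HTTP GET to a heartbeat URL for as long as a health check passes. The URL, the `is_healthy` callback, and the interval are hypothetical placeholders chosen for this example, not part of any particular provider's API.

```python
import time
import urllib.request

# Hypothetical placeholder; a real service would issue its own unique URL.
HEARTBEAT_URL = "https://example.com/heartbeat/your-unique-token"


def send_heartbeat(url, timeout=5.0):
    """Send one HTTP GET "ping"; return True if the server answered 2xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers URLError/HTTPError and socket errors. The ping is simply
        # missed, which is what the monitoring side is expected to notice.
        return False


def run_heartbeat_loop(url, is_healthy, interval_seconds=60.0, max_pings=None):
    """Ping `url` every `interval_seconds` while `is_healthy()` returns True.

    `max_pings` bounds the loop (useful for testing); None means no limit.
    Returns the number of pings the server acknowledged.
    """
    acknowledged = 0
    sent = 0
    while is_healthy() and (max_pings is None or sent < max_pings):
        if send_heartbeat(url):
            acknowledged += 1
        sent += 1
        time.sleep(interval_seconds)
    return acknowledged
```

A call such as `run_heartbeat_loop(HEARTBEAT_URL, my_health_check)` would typically run in a background thread or a scheduled job. When the application stops being healthy, the pings stop, and the missing pings are what signals a problem.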

SAP Maintenance Planner: A quick guide to help SAP maintenance planning

SAP maintenance planning is a helpful way to keep track of scheduled maintenance activities in your SAP environment. If you’re in the process of updating or planning for the upcoming year, it can be helpful to determine when your SAP systems will be down for maintenance. In this post, we’ll explore how you can use your maintenance planners to stay on top of your SAP maintenance requirements.

Bboxx Taps Time Series Data to Light Up the Developing World

Using technology to help businesses thrive is always a thrill, but it doesn’t compare to the sense of accomplishment on both a personal and organizational level when you see your tech used to positively impact humanity. It’s one thing to function in a support role for these initiatives, but it’s also important to acknowledge the businesses at the vanguard, building their primary mission around positive human and global impact.

Optimizing Images for Web Performance with NGINX

Images are a constant source of pain when developing websites. There are many formats and resolutions a developer must consider in order to maximize web performance. You’ll often end up with a Cartesian explosion of the same image in different sizes and formats to support different scenarios. For example, you don’t want to send a high-res image meant for high-DPI screens to a low-DPI screen: you’d be wasting bandwidth and burning time. Using the right file format is equally important.

Key Takeaways - Logz.io Named a Visionary in 2022 Gartner Magic Quadrant for Application Performance Monitoring and Observability

I’m thrilled to announce today that Logz.io has been named a Visionary in the 2022 Gartner® Magic Quadrant™ for Application Performance Monitoring and Observability. Gaining this recognition from these leading industry experts, in my opinion, is an outstanding accomplishment for our entire organization – the product of years of hard work and putting the needs of our 1,300-plus customers first.

Optimize Continuous Delivery of Micro-Services Applications with Continuous Performance Testing

I often hear from customers who complain about how “classic” performance testing (i.e., end-to-end testing with a high volume of virtual users) of their applications before release slows down the cycle time by several weeks. In addition, the testing consumes significant people and infrastructure (hardware and software license) resources.

Installing and Configuring the OpenTelemetry Collector

The scope of the OpenTelemetry project encompasses how telemetry data is collected, processed, and transmitted. The OpenTelemetry project is not involved with how the data is stored, displayed, or used beyond the collection and transmission phases. The OpenTelemetry Collector is an application written in Go.

Implementing OpenTelemetry in React applications | Tutorial

More about SigNoz: SigNoz - Monitor your applications and troubleshoot problems in your deployed applications, an open-source alternative to DataDog, New Relic, etc. Backed by Y Combinator. SigNoz helps developers monitor applications and troubleshoot problems in their deployed applications. SigNoz uses distributed tracing to gain visibility into your software stack. If you need any clarification or find something missing, feel free to raise a GitHub issue with the label documentation or reach out to us at the community slack channel.

The Key to Achieving Trustworthy AIOps

Sean McDermott of the Find Flow podcast, brought to you by Windward, interviews guest speakers from Zenoss: Trent Fitz, chief marketing officer, and Ani Gujrathi, chief technology officer. In this insightful episode, they dive into how AIOps has evolved and continues to evolve. We’ve come a long way from Gen 1 AIOps, which was based mainly on root-cause analysis, or pinpointing a problem for the IT team. Gen 2 AIOps is here, and it is the next step up in AIOps. It provides faster insights with topology and connectivity built into the AIOps system.

Elastic recognized as a Visionary in the 2022 Gartner Magic Quadrant for APM and Observability for the second consecutive year

We are excited to announce that Elastic has been recognized as a Visionary in the 2022 Gartner Magic Quadrant for APM and Observability for the second year in a row. In addition, the Elastic solution scored among the Top 3 vendors in five out of six use cases in the 2022 Gartner Critical Capabilities for APM and Observability.

Enable Operational Analytics with Cribl Stream and Snowflake

Every enterprise collects and stores massive amounts of security and observability data but struggles to get value outside of operations and security teams. These datasets can offer enormous value to business operations and enterprise reporting teams if they have access to the data in their toolsets. BizOps needs to optimize batch planning and the enterprise reporting teams need to reconcile how many assets the enterprise owns versus the number it has under support contracts.

A Service-Oriented Architecture for the Automotive Industry

The pace of change in the automotive industry is at an unprecedented level. Not only are cars quickly switching from fossil fuel to electric, but the vehicle itself also is playing an ever-increasing role in advanced driver-assistance systems (ADAS) to improve driving comfort, efficiency, and safety.

Datadog named Leader in 2022 Gartner Magic Quadrant for APM and Observability

Gartner® has published the 2022 Magic Quadrant™ for APM and Observability, an annual report that evaluates vendors in this category. We’re honored that Datadog has been recognized as a “Leader” within this Magic Quadrant report for the second consecutive year, with the highest position for Ability to Execute.

Find out how to check my ping and the best way to check ping speed

Many users are familiar with the concept of network ping. Especially those who like online games. If you are having connection problems, the most reliable way to check it is to use the "ping" command. But not everyone can answer what a ping really is. In simple words, it is a measure of the time it takes for a network request to get from user to server and back.
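
To make the "there and back" idea concrete, here is a minimal sketch that times a round trip. It assumes a TCP handshake as a stand-in for a real ICMP ping, because raw ICMP sockets usually need elevated privileges. The function names and the default port are illustrative choices, and the number it reports will not match the `ping` command exactly.

```python
import socket
import time


def tcp_round_trip_ms(host, port=443, timeout=3.0):
    """Time a TCP handshake to (host, port), in milliseconds.

    This is an approximation, not an ICMP ping: a handshake takes roughly
    one round trip, plus DNS lookup time if `host` is a name rather than
    an IP address. Returns None if the connection fails.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # Connection established; we only care how long it took.
    except OSError:
        return None
    return (time.perf_counter() - start) * 1000.0


def average_round_trip_ms(host, port=443, attempts=4):
    """Average several samples, skipping attempts that failed."""
    samples = [tcp_round_trip_ms(host, port) for _ in range(attempts)]
    good = [s for s in samples if s is not None]
    return sum(good) / len(good) if good else None
```

Averaging a few samples, as the `ping` command itself does, smooths out one-off spikes. A result of `None` means every attempt failed, which is itself a useful signal when diagnosing connection problems.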

SAP on premise to AWS cloud migration: The complete guide

SAP on premise to AWS cloud migration is a key focus area for many businesses. This is no surprise given that SAP is without a doubt among the leading ERPs. SAP HANA, available both on premise and in the cloud, offers a powerful relational database ideal for data storage, retrieval and analytics in real time. The on premise iteration of SAP HANA features an SAP environment hosted on your servers.

Setting up Route 53 Health Checks

We live in an age where the internet and digital data drive modern day markets, which results in huge amounts of data being generated and consumed. Hence, it has become very important for online platforms to manage this traffic and serve their customers more efficiently. In this blog we will explore the Amazon Route 53 service and see how it addresses domain name system routing and health check problems.

OpenTelemetry PHP | Monitoring a PHP application with OpenTelemetry

PHP is a widely popular server-side language and enjoys the top spot in terms of market share. Many world-famous organizations like Facebook have their applications written in PHP. WordPress, which powers 43% of all websites, is also built on PHP. In this tutorial, we will use OpenTelemetry to instrument a PHP application for telemetry data. It’s essential to monitor your PHP application for performance issues and bugs.

Going faster with intelligent automation

Avantra is an AI powered solution that turns your complex SAP landscape into accelerated hyperautomation. By providing an automation platform it helps SAP ops teams better manage complex mission-critical legacy environments so they can release capacity from the drag of managing complex legacy systems and redirect resources to power hyperautomation success. Shifting the focus from just managing and ‘keeping the lights on’ to creating value.

Honeycomb Cements Its Position as a Leader in 2022 Gartner Magic Quadrant

Honeycomb ruffled the first of many feathers nearly seven years ago when we coined the term “observability” in talking about production code. Today, we get to celebrate a major victory in our push for the term to break free from its unsuitable parent category, Application Performance Monitoring (APM).

DEVCOM Uses InfluxDB to Connect the Field and the Lab

The U.S. Army Combat Capabilities Development Command (DEVCOM) Army Research Laboratory (ARL) faces unique challenges in working with data to develop new technologies. It needs the ability to seamlessly analyze data both in the field and in the lab. Connectivity in the field can also be very unpredictable. Without a database that can handle intermittent connectivity, the systems become inefficient and waste time and money.

Sematext Synthetic Monitoring | Release and Features | The Best Website Monitoring Tools

Sematext’s Synthetic monitoring tool is a website monitoring solution that lets you track the availability and performance of your websites. Monitor your entire website or an individual HTTP request, including 3rd party APIs. Get the best website monitoring tools with Sematext’s synthetics and Real User monitoring tools.

Getting Your Clouds Under Control: Part 2-Cloud Governance

Given the strategic importance of the cloud and the size of cloud expenditures, it’s critical for enterprises to have solid controls in place to manage it all. According to our latest research, however, while most organizations agree with that sentiment, very few have put it into practice. There are distinct but related disciplines that come into play: FinOps and cloud governance. In this two-part series, we explore the current state of each.

1000+ Community Members, Async APIs for retention settings & improved UI - SigNal 13

Every month, our team works on two major fronts: shipping new features asked by users and iterating on shipped features to make them better for existing users. Last month, our team worked closely with users to ship a lot of product improvements both in UI and backend performance. Alongside that, we have been working on metrics builder and log management, two major feature upgrades to SigNoz.

Managing Variable Log Retention

As systems become more complex and distributed, the total amount of machine data emitted by those systems continues to skyrocket. While teams may need access to an ever-increasing scope of machine data to gain insights into their increasingly complicated systems, that same need also creates cost concerns. Those concerns can grow into cost emergencies quickly.

You don't know anything about Google Cloud monitoring

The fact that data centers have evolved a lot is undeniable. This has enabled storage evolution and the execution of online applications. Now we often talk about hybrid clouds. Yes, we don’t even take the time to explain what digital clouds are anymore, and we even assume that everyone has their own; small, but they have them. But when it comes to doing things big, it’s unavoidable to mention the giant Google Cloud®!

Django Performance Improvements - Part 1: Database Optimizations

The main goal of optimization in Django projects is to make database queries fast while ensuring that your projects make the best use of system resources. A properly optimized database reduces response time and therefore improves the user experience. In this four-part series, you will learn how to optimize the different areas of your Django application. This part will focus on optimizing the database for speed in Django applications.

Analyzing Log Data: Why It's Important

From production monitoring to security concerns, businesses need to analyze and review their logs on a daily basis to make sure their system is up to par. Here are the reasons why analyzing your log data is so important. If you landed here, chances are you probably know what logs are, but we’ll start off with a short explanation of what they are.

How Healthcare IT Improves Clinical Efficiency Through Digital Experience

Healthcare providers are under more pressure than ever to provide better care and improve patient outcomes despite clinical resource scarcity, high turnover, and burn-out. Staffing shortages of physicians and nurses, growing populations in need of care, and rising costs create real barriers to exceptional care.

Real User Monitoring - Definition, Benefits & 5 Best Practices

What is Real User Monitoring (RUM), and how does it help developers improve their applications? Real User Monitoring is a technology that analyzes the user experience on your web apps. Tracking how visitors interact with your pages, RUM tools capture how fast your pages load, your app’s responsiveness and its functionality. Compiled statistics help you gauge your users’ level of satisfaction with your web app.

Rein in Operational Disruptions with AIOps

The Wall Street Journal recently described the current market for tech talent as “insane.” Well-heeled enterprises with gobs of cash competing for workers with more traditional employers who lack the resources, brand cachet, and trendy perks of the better-known organizations have driven demand for technology employees.

What the Hell is Activity Anyway?

I use .NET and I keep seeing something called `Activity` but in OpenTelemetry there is only talk about “Span” and “Trace,” why? And what should I be using? This is understandable, and has caused confusion since that decision was made by Microsoft back in 2018/19 (I believe). I’ll do my best to provide some guidance on what the distinction is, and also when each is useful.

Log Observability and Analytics Guide

Monitoring and analyzing log files to identify and resolve issues make up log observability. Log analytics is the process of extracting insights from log data. Logs are a valuable source of information for IT operations teams, as they provide insight into what is happening on a system or network. Logs can monitor system performance, troubleshoot problems, and identify security incidents. Logs are a vital part of application performance management.

IP Blacklisting: Your Beginner's Guide

IP blacklists contain individual IP addresses, or ranges of addresses, that you want to block. A blacklist can be used in combination with firewalls, intrusion prevention systems (IPS), and other traffic filtering tools. With the recent developments in cyber security, organizations are increasingly relying on IP blacklisting to protect their networks.
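
As a minimal sketch of how such a list can be checked, the example below uses Python's standard `ipaddress` module to test an address against a mix of single addresses and ranges. The entries are illustrative documentation addresses, and the function names are assumptions made for this example rather than any product's API.

```python
import ipaddress

# Illustrative entries only (reserved documentation ranges, RFC 5737).
BLACKLIST = ["203.0.113.0/24", "198.51.100.7"]


def build_blacklist(entries):
    """Parse single addresses and CIDR ranges into network objects.

    A bare address such as "198.51.100.7" becomes a one-address network.
    """
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_blocked(address, networks):
    """Return True if `address` falls inside any blacklisted network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


networks = build_blacklist(BLACKLIST)
print(is_blocked("203.0.113.42", networks))  # inside the /24 range
print(is_blocked("192.0.2.1", networks))     # not listed
```

A linear scan like this is fine for short lists. A firewall or IPS handling large lists would normally rely on a more efficient lookup structure.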

Node.js Performance Testing and Tuning: Step by Step Approach

Node.js is well-known for its lightning-fast performance. However, as with any programming language, you might develop Node.js code that performs poorly for your users. Appropriate performance testing is required to combat this. Node.js can be used for a variety of tasks, including scripting to do tasks, running a web server, and serving static files, such as a website. Today, we'll go over the procedures to test a Node.js HTTP web API.

Beginner's Guide to RabbitMQ Logging: How to View, Locate, and Analyze Logs

RabbitMQ is one of the most popular open-source message brokers available. Its ability to be deployed in various configurations and on various platforms makes it a widely used tool; it also supports all major messaging protocols, making it very versatile. Still, debugging issues with a tool like RabbitMQ can be challenging, especially when it’s deployed on a large cluster. RabbitMQ logs are one way to go, as they help you backtrack to an earlier point while debugging.

Cloud storage provider Koofr loves Icinga’s flexibility and simple handling

We’re proud of our many customers and users around the globe that trust Icinga for critical IT infrastructure monitoring. That’s why we’re now showcasing some of these enterprises with their success stories. These are stories from companies or organizations just like yours, of any size and from different kinds of industries. Some of them are our long-standing customers, others have just recently profited from migrating from another solution to Icinga.

How to Monitor IT Infrastructure when adopting IaC for VDI and Digital Workspaces

IaC (Infrastructure-as-Code) is becoming ubiquitous in the EUC (End User Computing) community and within the datacenter. Automation and declarative infrastructure for on-premises VDI and cloud digital workspaces, such as Microsoft AVD (Azure Virtual Desktop) or AWS WorkSpaces, are now mainstream. Vendors such as Citrix now advocate the use of technologies such as Terraform and Ansible for deployments.

Nameserver Explained: What it is And How it Works

The Internet is a vast network of servers and devices connecting people worldwide. It allows for near-instant data communication between machines, which is presented in a readable format for the end-user. From a human perspective, it's a simple case of typing the URL of your desired website and pressing enter. Of course, computers talk to one another in their own numerical language, using lengthy IP addresses to refer to domains instead of URLs.

What the Heck is an AIOp?

AIOps is one of the current buzzwords (buzz-initialisms?) that is hot in the monitoring space. Everyone seems to be talking about it. How you have to have it, how much better it will make everything if only you just had it, etc. But how much of that is real and how much of that is wishful thinking? Let’s take a look and see if we can separate the buzz from the words.

Design Thinking and Beautiful UX: Do They Matter for AIOps?

In this post, Pad Warrier, product manager at Zenoss, interviews Natalia Bakusevych and Carl Camera about the Zenoss design system. A beautiful user experience is the result of a purposeful, principled, and process-driven approach that begins with customer discovery. Natalia and Carl describe Zenoss' continual evolution with before and after examples that highlight increased productivity and efficiency for both Zenoss and our customers.

What Does Observability Mean for Developers?

Monitoring is often not the first thing on the mind of the modern developer. Yet, it’s necessary at many points of the software development lifecycle, including: before deprecating an API, before launching a new feature, after launching the feature, and more. In fact, monitoring needs can vary much more than the classic Ops monitoring. My podcast guest Liran Haimovitch is the co-founder and CTO of Rookout, a live data collection and debugging platform.

Customers First, Always!

Software exists to make your job easier, not to suck the joy out of your work. It should be there when and if you need it, but be completely out of the way when you don’t — you’re at work to get a job done, not to use any particular product. If you’re forced into using the same underperforming, over-customized, difficult to implement, or just generally terrible software each and every day, it can really put a damper on the quality of your work and quality of life.

Anatomy of a Supply Chain Attack Detection and Response

In today's world of global supply chains, a breach never stops at a supplier level but cascades all the way up the chain. So being able to detect and stop a supply chain attack at an early stage before an attacker exfiltrates confidential company data or damages company operations and reputation is critical to your organization's survival. Luckily, hackers always leave a trace, so proper detection can help you stop breaches at an early stage before hackers achieve their goals.

Sensu Integration Catalog: Engineering an Open Marketplace

ICYMI, the recent release of Sensu 6.7.0 introduced the Sensu Integration Catalog – an open marketplace for Sensu Go. Along with the release, we also hosted a webinar highlighting how the Sensu Integration Catalog unlocks self-service infrastructure monitoring. In the webinar, we talked about the “what and why?” – what problems does the Catalog solve, and why we decided to solve them with a marketplace.

Introducing the Mezmo Exporter for OpenTelemetry

At Mezmo, we see a massive opportunity to reduce Mean Time to Detection (MTTD) and Mean Time to Resolution (MTTR) by making log data more valuable and actionable. Today, we’re thrilled to announce the release of the Mezmo Exporter for OpenTelemetry, the first step in our continued work with the project to further simplify the ingestion of log data and make that data more actionable with enrichment of key OpenTelemetry attributes.

New StackPod Episode: Defining and Executing a Clear Product Strategy With Andreas Prins

We are happy to announce our latest StackPod episode featuring guest Andreas Prins! With over five years of experience in different product management and product strategy roles, Andreas is currently the VP of product here at StackState.

Sumo Logic - Challenging the status quo

As the applications we support evolve, so too must the services that keep them reliable and secure. And, evolve they have! Sumo Logic started life over a decade ago by solving the difficult problem of log management. Our cloud-native architecture eliminated the hassle of managing on-premise log management solutions while scaling on-demand to handle a significant volume of high-cardinality data. Powerful search made exploratory investigation fast and efficient for customers. This was a game changer!

How to build a dashboard for any AWS CloudWatch data

Looking to keep an eye on logs and metrics from AWS or CloudWatch? There are several reasons you might want to build a CloudWatch dashboard somewhere outside of the CloudWatch console: Whatever the reason, we’ve put together a write-up to help you plug into CloudWatch to surface any logs or metrics in one place, for easy alerting and sharing, using the SquaredUp observability portal. To get your own SquaredUp account, just head over to squaredup.io/get-started to sign up for free.

Apache Pulsar vs. Apache Kafka 2022 Benchmark

You will have seen the Nastel Messaging Middleware Performance Benchmark Report comparing the performance of the commonly used messaging middleware platforms – IBM MQ, RabbitMQ, Apache Kafka, ActiveMQ Classic, ActiveMQ Artemis, TIBCO EMS, and Apache Pulsar. Nastel produced this using its CyBench performance benchmarking technology. StreamNative has also produced its own benchmark report focused on comparing Apache Pulsar vs. Apache Kafka using the Linux Foundation Open Messaging benchmark.

The ultimate logging series: Using the PHP system logger

Logging is essential to application development. Logs provide exhaustive, robust information that is useful for tracking all the changes made to an application's code. PHP logs help you track the performance of the method calls within your application, the occurrence of a particular event, and the errors in your application. With proper PHP logging techniques, you can track and optimize an application's performance.

Regression testing your Java Agent Plugin

For developers who’ve created their own instrumentation with the Java Agent plugin, the next phase of the process is regression testing. By performing regression testing, you can ensure that your plugin functions the way it’s supposed to after you’ve made code changes or updates. You’ll have your own plugin, but to illustrate regression testing in this article, I’ll use the plugin in our example repo.

New Feature: Sitemap Monitoring

Our feature development has always been heavily influenced by our users. Some of our most popular features were directly requested by people using our products on a daily basis, which we believe is one of the best ways of developing a product such as ours. If one person wants to monitor something, chances are others will too. That is how our new feature, Sitemap Monitoring, came about!

Minimize MTTR to Mitigate Impact of Change Management

In the first blog of this demo series, we showed you how to use Restorepoint to remediate after a network breach. In our second blog of this three-part series, we walk you through a change management instance, showing how to speed problem resolution and how to mitigate the impact of poor change management to minimize MTTR.

Use Kubernetes to Manage Elasticsearch Clusters for Logging | Sematext at Kubecon 2022

Radu Gheorghe (Sematext Group) & Ciprian Hacman (polypoly) explain how Kubernetes can be used to manage and autoscale Elasticsearch clusters. They will explain the use cases for such a setup, what are the pros and cons of using Kubernetes to manage Elasticsearch for logging, and what operators are currently available to accomplish this task.

A quick guide to load testing Grafana Loki with Grafana k6

As a software engineer here at Grafana Labs, I’ve learned there are two questions that commonly come up when someone begins setting up a new Loki installation: “How many logs can I ingest into my cluster?” followed by, “How fast can I query these logs?” There are two ways to find out the answers.

Monitoring Ruby on Rails with InfluxDB

Time series databases like InfluxDB are databases that specialize in handling time series data, which is data that is indexed by time. Unlike traditional databases, time series databases are optimized for reading and writing data with less performance consideration for updating or deleting data. Due to the time-dependent nature of time series data, time series databases are handy for application monitoring.

What Are Preload Resource Hints?

Preloads are a powerful optimization technique that can make significant improvements to crucial performance metrics such as Core Web Vitals. I have written on prefetching a DNS lookup or even preconnecting to a domain. Preloading is a much more powerful extension of these concepts because it enables you to download entire resources in advance. In this article, let’s look at.

Tracing errors and surfacing collateral damage across your code base

Frontend technologies typically talk to several services in your backend, and those services talk to other services. At the root of every issue is a single event that causes a domino effect. A domino effect that impacts every operation from the first experience on the frontend to the backend API call. Sentry can show you how these exceptions and latency issues impact every one of your services. For example, take the ever common and seemingly simple to resolve 500 - Internal Server Error.

How Calixa protects developers' time with a custom integration

Whether you want to improve how your team uses Sentry or publish an integration, we’re making it easier to build on Sentry. Join Thomas Schiavone (CEO and Co-founder of Calixa) and Gautham Chundi (PM at Sentry) as they walk through how Calixa built a custom integration on Sentry’s platform. The Calixa team’s goal was to remove distractions for their developers. They ended up building something both their developers and customers could use. No developer will have to hear, “Is it fixed yet?” again.

Receiving PagerDuty alerts from MetricFire

One of the most critical aspects of monitoring your digital assets is getting a timely alert when something goes wrong. Even if you have finished building a monitoring stack and exposed metrics on a beautifully designed dashboard, your monitoring system does not serve its purpose if you cannot notice abnormal behavior and take pre-emptive or follow-up action swiftly.

How Status Pages can help you build better relationships with your customers

Uptime monitoring. You keep hearing us talking about it and you know why it’s important, hey, you might even have a StatusCake account. But do you know what to do if you do experience website downtime? Let’s do a little quiz. Your website has suffered two hours of downtime. Do you: If you answered a, you might be a lost cause (I’m only joking, you should just definitely read to the end of this post), and if you answered d, crack out the sales bell and start dinging!

Spring Boot Performance Workshop with Vlad Mihalcea

A couple of weeks ago, we had a great time hosting the workshop you can see below with Vlad Mihalcea. It was loads of fun and I hope to do this again soon! In this workshop we focused on Spring Boot performance but most importantly on Hibernate performance, which is a common issue in production environments. It’s especially hard to track since issues related to data are often hard to perceive when debugging locally.

Sponsored Post

How much could software errors be costing your company?

Errors are an inevitable part of building software. But while you can't eradicate them, you can definitely mitigate them. If you don't measure, track or resolve errors, you're ignoring a loss in revenue. It's time to pay attention to how much software errors are costing your company and take action, catching them early with methods like smarter testing and crash reporting. Using a few industry averages, you can put a number to the real cost of software errors in your company and start to plug cash leaks like wasted developer time and lost customers.

MEVN stack tutorial | Build a CRUD app using Vue 3, Node, Express & MongoDB

The MEVN stack is a JavaScript software stack that has become very popular in recent times for building powerful web applications. It combines four technologies to build an end-to-end web application: MongoDB, Express.js, Vue.js, and Node.js.

Build vs Buy: What's The Best Route for Observability Pipelines?

If there was any question about whether an enterprise needed an observability pipeline in 2019 or 2020, we now know the answer is: yes. The observability data management methods of the 2010s aren’t going to work in the 2020s. Data is growing too fast for us to ignore, and the need to gain intelligence from said data continues to grow in importance. Data (and access to it) is becoming a competitive edge for many enterprises today.

Why Quick Troubleshooting and Self-Healing Automation are Essential in a Citrix Environment

Citrix environments can be highly complex, with multiple back-end components like Storefront, Netscalers, Delivery Controllers, FAS, License servers, Cloud Connectors, and databases. The front end can be equally complex, with VDAs running in different parts of the world across cloud, on-premises, and hybrid deployments. With all this complexity, it’s essential to have the correct troubleshooting, monitoring, and self-healing automation tools.

How I installed Grafana Mimir on my homelab cluster

When I started at Grafana in January, I was accustomed to working with private clouds and on-prem infrastructure, so nearly everything in my role here as a senior software engineer for the Grafana Mimir customer squad was new to me. I was new to Golang, Docker, Kubernetes, gRPC, public cloud services, etc. Kubernetes has been especially challenging. In my work on Grafana Mimir and Grafana Enterprise Metrics, I experience k8s in one of two extremes.

How Companies Are Using InfluxDB and Kafka in Production

Hulu, the entertainment streaming platform, needed a solution to scale up its internal application and infrastructure monitoring platform as it grew beyond 1 million metrics per second. The solution it created combines two open source tools— InfluxDB, a time series database, and Kafka, an event-streaming platform. It’s not just global enterprises like Hulu that have access to world-class tools and infrastructure to achieve their business goals.

Business Application Monitoring: Why Your Company Needs APM

Imagine this scenario. Your company has been laboring for months on an application and is finally releasing it. Your team has worked out all the potential issues and has created an all-encompassing project, but something is not right. There is an issue, and your team is getting some much-needed R&R. You come in the next day and see a ton of irate emails. You have to go back to the drawing board to figure out what went wrong before you release a fix. What I explained was a nightmare.

Simplify DigitalOcean Application and Infrastructure Monitoring

Since early 2021, DigitalOcean and SolarWinds® Papertrail™ have worked together to improve the ease and efficiency of log management for infrastructure running on the DigitalOcean platform. The most exciting new development of this partnership is the availability of the Papertrail software as a service (SaaS) Add On in the DigitalOcean Marketplace.

Something IT professionals need to know and miss

At Pandora FMS we have IT professionals on an altar. Literally, one next to the water dispenser in the office. I’m serious! It even has its blessed liturgical cloth, its flattering parsley, its candles and the incense! But there are still things that these people miss. Here’s a hint: it’s related to ITOM.

CDN Log Analysis

Since the beginning of the Internet, the speed of delivering content has been an issue. While processor enhancements, network acceleration, and web frameworks have brought drastic improvements to performance, the goalposts have continued to shift further away; devices operate on wireless connections with limited bandwidth, and the Internet is accessed from every corner of the globe.

Is it Time to Retire Your VPN?

VPNs: not just sponsors of YouTube tech videos! If you think of Virtual Private Networks (VPNs) as data privacy tools or a way to access geographically restricted content, then the idea they’re dying out probably seems far-fetched. But if we’re talking about internet-based VPNs, the kind that enables access to corporate networks, there’s been plenty of prognostication that the last days of VPN may be upon us.

Auditing Capabilities in IT Monitoring Tools for Security and Compliance

It is critical that access to monitoring platforms, and any configuration changes or management actions made on them, is logged and traceably audited. In this article, I will help you learn how to discover the auditing capabilities in IT monitoring tools. You will learn how to audit and manage the monitoring platform itself and make sure that it is being used appropriately.

Debug Logs and Analyze Trends with Log Data Restoration

Everyone in your organization needs logs to perform critical functions of their job. Developers need them to debug their applications, security engineers need them to respond to incidents, and support engineers need them to help customers troubleshoot issues. These various use cases create general requirements for enriched log data and often include the need to access insights from outside typical retention windows.

Monitoring Cloud Native Microservices

Today’s modern applications contain a broad set of microservices, with containers and serverless becoming the architectures of choice for many cloud applications. Both architectures facilitate highly scalable systems, and while which approach to take is routinely debated, containers and serverless technologies are being used in tandem more and more.

Monitoring Ubuntu 20.04 and Activating ML with Netdata

Sometimes a hat is just a hat, the truth is just the truth, and the clearly most popular example of a category is plain to see. In this case, Ubuntu is the most popular Linux distribution currently available. With the operating system’s superior popularity also comes an amazing amount of community support.

Deploying a MkDocs documentation site with GitHub Actions

In the previous post in this series about building documentation sites with MkDocs, I showed you how to host a site on GitHub Pages. We briefly touched upon GitHub Actions, the integrated build and deployment server available on GitHub. In this post, I'll continue the example and get a real deployment pipeline set up.

5 websites that have experienced website downtime in the first half of 2022

There’s a common myth that you may have heard – “only small companies’ websites go down”. This is a classic, especially since it couldn’t be more wrong. Thousands of websites go down, regardless of their size (or the size of their IT team, for that matter), and even Google can, and has, suddenly experienced the dreaded monster that is “the outage”. So far, we are 6 months into 2022 and we’ve already seen a ton of websites go down.

Test Driving Machine Learning (ML) Anomaly Advisor

Netdata’s new Anomaly Advisor feature lets you quickly identify potentially anomalous metrics during a particular timeline of interest. This results in considerably speeding up your troubleshooting workflow and saving valuable time when faced with an outage or issue you are trying to root cause.

Expanding Vision: OpenSearch Dashboards Advance Open Source Observability

From the moment Elastic announced plans to abandon a pure open source license for its Elasticsearch engine and Kibana dashboards in early 2021, there’s been a massive effort underway to create clear alternatives for the global community of active users. Logz.io has been an outspoken advocate and contributor to this work – fully embracing it as part of our product roadmap to best serve the needs of our customers, and preserve our long-term commitment to open source observability.

Build or Buy? Developer Productivity vs. Flexibility

A common debate in software development focuses on whether to use already-available tools or services, which offer better developer productivity, or stick with lower-level tools or custom-built solutions, which offer more control and potentially better performance and flexibility. This can be boiled down to the decision of whether to build or buy. These two approaches are at the root of many current tech industry ideological conflicts.

Building Efficient Pipelines in Cribl Stream

An old colleague of mine once said to me, “It doesn’t matter how inefficiently something DOESN’T work.” This was a joke used to make a point, so it stuck with me. It also made me consider that it does matter how efficiently something DOES work. Sometimes, when we have tools like Cribl Stream making things like routing, reducing, and transforming data so easy, we can forget that there might be a more efficient way to do it.

Ask Miss O11y: As a developer, how can I try out observability?

What's the first small thing to do in o11y that would teach me something, bring something valuable, and open the way for something else? Observability doesn’t have to be a big, company-wide project. It can be useful locally and individually. A little playing around can get you some crucial insight into how your software works. Try it as a team, or in a pair, or by yourself. It takes 3 steps: Step 1 is easy. The other two might take ten minutes, or maybe more like a day.

Grafana dashboards: A complete guide to all the different types you can build

There is one universal truth about using Grafana: Dashboards are easy to create, but not-so-easy to organize. As organizations scale, there’s a high risk of unchecked dashboard sprawl, when dashboards become an unmanageable mess. As the number of users increases, so does their dashboard output. Our guide to dashboard management gives an overview of features that help with organizing dashboards, but there are still two pain points.

Multi-Step Monitoring: Why it's Essential and How it Works

The term “essential” is thrown around pretty loosely these days. That new show about the hospital (no, not that one… not that one either… yeah that one) is advertised as essential viewing. A newly-released track by a hip hop artist that describes how little they need to release new tracks in order to live much, much better than the rest of us? That’s essential listening.

Solving slow web applications - Kentik Synthetics Page Load Test

Kentik's Phil Gervasi talks with Product Manager Sunil Kodiyan to discuss what synthetic testing is and how it can be used to troubleshoot issues with web applications. Sunil demonstrates the use of the Kentik Synthetics Page Load test, which provides a granular breakdown of how every component on the page is loading. Using that information, web app developers can easily track down exactly what’s impacting a site’s performance—whether it’s a site hosted on-premises, in the cloud, or by a SaaS provider.

Getting Your Clouds Under Control: Part 1-FinOps

Given the strategic importance of the cloud and the size of cloud expenditures, it’s critical for enterprises to have solid controls in place to manage it all. According to our latest research, however, while most organizations agree with that sentiment, very few have put it into practice. There are distinct but related disciplines that come into play: FinOps and cloud governance. In this two-part series, we explore the current state of each.

Integrating API Monitoring Into Your Performance Management Strategy

APIs have existed nearly as long as websites themselves. But because APIs are primarily consumed by programs instead of people, they tend to be less visible than applications or sites directly accessed by users. The result: APIs often receive far less attention from a site reliability engineering (SRE) and monitoring perspective than other parts of application environments.

Monitor Datazoom telemetry with Datadog

Modern video streaming workflows are composed of many different services, including encoders, origins, ad servers, content delivery networks (CDNs), and more. This wide range of options enables organizations to choose the tools that best fit their needs, but it also introduces considerable observability challenges. For instance, you may have limited access to the log data from each layer of your video workflow, and the data you can access likely isn’t standardized.

Express middleware: A complete guide

In this guide, we’ll explore the basics of using Express.js middleware. We’ll create a simple Express API from scratch, then add the middleware to it and demonstrate how to use each tool. The Express middleware tools we’re going to discuss are must-haves for your initial Express.js app setup. We’ll show you how to get started with them, and you can further configure them according to your application’s unique needs.

Solving slow web applications with the Kentik Page Load test

Kentik Synthetics is all about proactive monitoring. With synthetic monitoring, you can investigate users’ digital experience by peeling back, layer by layer, exactly what’s going on in every aspect of the digital experience from the network layer all the way to application. Because synthetic tests can be so granular, the results provide different information than you can get from flows, streaming telemetry or other observability data.

OpenTelemetry in a C# .NET application | Implementation guide

C# (pronounced C-Sharp) is a simple, modern, object-oriented, and type-safe programming language. ASP.NET is one of the top frameworks for building modern applications using C#, F#, or Visual Basic. OpenTelemetry is one of the popular CNCF projects. Some other notable projects under CNCF include Kubernetes, Helm, and Fluentd. The OpenTelemetry project aims to create an open source web standard for instrumenting cloud-native applications.

What's next in Kubernetes monitoring, Prometheus histograms, observability, and more: KubeCon EU 2022 in review

In May, a team from Grafana Labs descended on Valencia, Spain, to share their latest insights on the cloud native landscape at KubeCon + CloudNativeCon EU 2022. Along with diving into the future of Kubernetes monitoring with kubectl alpha events and multi-cloud deployments, Grafanistas presented an overview of the Prometheus ecosystem with an eye towards how sparse high-resolution histograms are going to change the game.

IT & The Flow State: 5 Ways IT Can Facilitate The Flow State at Work

Let’s start with a concept you’re probably familiar with: how it feels to get into a flow state at work. Maybe you were creating a new graphics package for a client deliverable. Maybe you were building a new website, or working on a coding sprint for the next product release. Maybe you were digging into some script automations for common technical issues. It doesn’t really matter what you were doing; what’s really important is how you felt while you were doing it.

How to Monitor Docker Container Logs | 5 Minute Docker Log Monitoring Setup with Sematext

Monitoring Docker logs is critical to ensure the performance of your containers. However, setting up a centralized logging solution may be a daunting task. But it doesn’t need to be. Follow along with this short Docker tutorial to learn how to start monitoring your container logs now!

Cribl and GitOps: Go From Development to Production

Git integration has always been at the foundation of Stream. In the fall 2021 release of Cribl Stream (both on-prem software and Cloud), our Enterprise users received a set of APIs to separate the development and deployment of Stream. Stream GitOps connects with your favorite git-based versioning platforms and leverages their PR, approve/reject, and CI/CD workflows to push production-ready changes from a development branch into a main branch or release.

The State of Availability Today: Availability Monitoring & Management

At first glance, availability monitoring may seem like one of the more mundane responsibilities of site reliability engineering (SRE) and IT operations teams. Determining whether an application is available may appear to be relatively straightforward, especially for teams that focus simply on monitoring certain transactions or services. This may have been true in the past.

In 2022, should we still be wrestling with slow computers?

The invention of computers gave us the promise of high speed, a new user experience, and extreme reliability. Early research by Robert B. Miller, who helped design some of the earliest IBM computers, discussed the transactions between humans and computers. Miller believed users require a response from a machine within two seconds to maintain their concentration and productivity.

See How Restorepoint Helps You Remediate After a Network Breach

This is our first blog in a three-part series, where we demonstrate the many features and benefits of using Restorepoint and the ScienceLogic SL1 Platform together. Today’s demo takes you through a network breach scenario—showing how to identify and remediate following an unplanned network device change.

What the Heck is Network Observability Anyway?

When it comes to monitoring and specifically IT Operations Monitoring (ITOM), everyone is saying monitoring is dead – you need observability. Vendors are jumping on the observability bandwagon. There’s a lot of noise about observability, network observability, full-stack observability and every other kind of observability you can imagine. This is a topic we have touched on in the past.

Adding Redis & MySQL to AppSignal for Node.js with OpenTelemetry

We've simultaneously launched 4 new integrations for Node.js: Redis, ioredis, MySQL, and MySQL2. This means that you can now see all the details of a query in the Event Timeline and Slow Query screens in AppSignal. Because we are a small and bootstrapped team, we've chosen to embrace OpenTelemetry as a means of expanding AppSignal's offering in the Node.js ecosystem.

Observability strategies that work - and some that don't

Creating an observability strategy is a lot like playing with Legos: It takes small building blocks to create a bigger picture, but the slightest mistake could throw off an entire build — and often you realize it very late in the process and have to rip and repair the Hogwarts castle infrastructure you spent many days creating.

The Power of ChaosSearch Alerts

How can you derive value from data? One answer is to generate alerts based on the data. Alerts help your team stay on top of a variety of potential challenges – like application performance issues, security risks, disruptions to the CI/CD delivery chain and beyond. ChaosSearch’s flexible alerting system makes it easy to generate alerts relevant to your organization’s needs.

Building a search experience with Elastic

We’re excited to share an end-to-end demo that showcases how Elastic empowers developers to build rich search solutions. The demo provides mechanisms to run in your own environment, to ingest data into Elastic Enterprise Search using the Enterprise Search Python Libraries, and to create a modern UI in React, using the free and open source tool Search UI.

Service Level Objectives as Code: Terraforming Honeycomb SLOs

In March, we announced official support for a Honeycomb Terraform Provider. Today, we’re announcing additional support for managing Honeycomb Service Level Objectives (SLOs) with Terraform. This furthers Honeycomb’s support for configuration as code and it gives you programmatic control for an immensely popular Honeycomb feature.

Announcing LM Envision

LogicMonitor’s unified observability platform brings clarity to Enterprise IT. 2022 is another exciting year for LogicMonitor. Today, LogicMonitor brought together customers, partners, industry analysts, and visionary thought leaders in New York City for our annual user conference, LM Elevate, to discuss how to “elevate” their monitoring, their digital transformation, and their industries.

Coralogix - Announcing Our Series D Funding Round

While 2020 and 2021 were significant years for us, we’ve entered this year ready to give more to our users! We’re delighted to share we have raised $142 million in a Series D funding round! So what does that mean for you exactly? Over the past few months, our team has been working hard to build custom mapping for metrics, an advanced tracing UI, and more into our platform. The world is our oyster, and we can’t wait for you to see what else we have planned!

The Internet Is Buzzing About Cribl Search and Our Series D Funding!

It’s been an exciting few days at Cribl. A week ago, we announced our $150 Million Series D funding round led by Tiger Global, with participation from existing investors IVP, CRV, Redpoint Ventures, Sequoia Capital, and Greylock! We also announced an exciting new product: Cribl Search! We’ve been blown away by the excitement from our customers thus far.

Lumigo Receives Frost & Sullivan's Technology Innovation Leadership Award

Since our founding, Lumigo has worked hard to build innovative technology that meets the real-world needs of our customers in a cloud-first world. Today, we’re excited to be recognized for our work in serverless operations and the AI market by the experts at Frost & Sullivan, who have awarded Lumigo with the prestigious Best Practices Technology Award in Europe and Israel.

Synthetic Monitoring Phases & Strategies

Synthetic monitoring tools have long formed a core part of application performance management and monitoring toolsets. Yet no matter how familiar you are with synthetic monitoring, there is likely room to get more out of it than you currently are. Indeed, the default approach to synthetic monitoring tends to involve using it reactively: problems occur in production, and your team uses synthetic monitoring to help understand and remediate them.

Sponsored Post

What is Business Transaction Tracking?

When conducting business online it is vital to be able to track all parts of any financial transactions conducted by your organization. Where different companies are working together on different steps in the process, this isn't always simple to achieve. It is important to ensure that there is an appropriate level of observability at the heart of all of your systems and that it is easy to track the result of business decisions to real-world actions.

Sponsored Post

3 Reasons You Need a Digital Experience Monitoring Solution

Exoprise customers are already enjoying the full benefits of 24*7 active monitoring for their enterprise applications. Don't believe us? Take a look at one of our case studies. While synthetic monitoring (aka Active Monitoring) is great for proactively detecting SaaS, network, and Internet outages, the IT world has now switched to Digital Experience Monitoring (DEM) solutions. Thanks to Covid, that took the world by storm.

Sponsored Post

IT Automation is a Key to Reckoning the Challenges of The Great Resignation

The Great Resignation has been a time of reckoning and disruption for many businesses. According to the U.S. Bureau of Labor Statistics, more than 40 million people voluntarily left a job in 2021, the highest such rate in more than four decades.

How to Install Apache ActiveMQ on Debian 11

Apache ActiveMQ is a free and open-source message broker developed by Apache Software Foundation. It is one of the well-known message brokers that supports multiple protocols such as AMQP, MQTT, Stomp, and OpenWire. It is written in Java and fully compliant with JMS 1.1 standards. Apache ActiveMQ is one of the most popular message brokers that support different types of programming languages that can be deployed on multiple platforms.

Coming soon! A sneak peek at Raygun Alerting's Slack integration

The Slack integration is our #1 feature request for our Alerting feature, and the Raygun team has been busy at work making this feature available to all our customers. We expect this feature to be available around early to mid-July 2022. Many of you, our curious customers, have asked for a preview of what’s to come. So we thought we’d share some specs and screenshots with you as we progress through the work.

Is monitoring dead yet?

No, it is not! How can it possibly be dead? In my opinion, monitoring is very important for building a successful IT infrastructure. If monitoring is dead for an IT infrastructure, then either that infrastructure no longer exists, or it is so robust that problems like failures, errors, and utilisation issues never occur, which is simply not possible.

PCI DSS 4.0: Protecting Payment Card Processing

PCI? PCI SSC? PCI DSS 4.0? Need these acronyms explained? Well, this blog is for you. Read on to find out how the new PCI DSS 4.0 (a set of security standards created to ensure companies maintain a secure financial environment) will affect how you transact online, monitor your website payment gateways and more.

Introducing Opportunities & Experiments on WebPageTest: Take the Guesswork out of Performance

As many of you will know, Catchpoint and WebPageTest joined forces 18 months ago with the goal of building stronger alignment across the IT org to ensure performance issues can be caught in QA, staging and development, not just once new features are released to production.

KubeCon EU 2022 - Trends & Highlights

KubeCon EU returned to Spain, this time to Valencia, city of paella and horchata and, of course, a great place for big events. We had a great time meeting you all in person, and attending the talks. Here are our hot takes from the event. The main event started on Wednesday, but before that different co-located events took place: eBPF Day, Cloud Native SecurityCon, and PrometheusDay among others. These events gathered a large number of attendees.

Engineering Levels at Honeycomb: Avoiding the Scope Trap

It has been seven years since Rent the Runway posted their engineering ladder, kicking off a veritable trend of engineering teams open sourcing their ladders. Interestingly, nearly all of them seem to have coalesced around “area of scope” as a useful proxy for level. At first glance, “area of scope” does seem to make sense. Senior engineers should be able to work across larger areas of the organization. In addition, your area of influence should expand as you gain experience.

Anomaly Advisor Case Study - K6 Load Test

In this video, our Analytics & ML Lead, Andrew Maguire, walks through an example case study using the K6 load testing platform to run a load test against some of our demo servers running Netdata. Watch in real-time as the Anomaly Advisor reacts to the load test and painlessly surfaces the most anomalous metrics, making it easy to just "see" the load test and how it plays out on the servers.

DX UIM Team Practices DevSecOps for Secure Development, Delivery, and Deployment

DevOps is a composition of enhanced engineering practices that reduce lead time and increase the frequency of delivery. The primary goal of DevOps is to ensure operations team members are engaged and collaborating with development from the very beginning of a project or product development. Within many enterprises, teams are being compelled to reassess the security of their DevOps implementations. Recent news on vulnerabilities like Sunburst and Log4j underscores why this is so critical.

Security Teams Are Struggling, and Cribl Is Here to Help

Many cybersecurity teams are drinking from multiple firehoses without solutions in place to deal with the onslaught of data. And with 70 percent of companies experiencing over one hundred attacks each day, it’s not slowing down. Teams are overwhelmed with data from multiple sources and formats with continuous requests to pull in more and more.

Understanding Kubernetes Cost Drivers

Optimizing Kubernetes costs isn’t an easy task. Kubernetes is as deep a topic as cloud (and even more complex), containing subtopics like: That’s a lot for a busy DevOps team to understand and manage, and doesn’t even consider that line-of-business stakeholders and finance team members should have some understanding of each cost driver’s function and importance to contribute to a successful FinOps Strategy.

Synthetic Monitoring vs Real User Monitoring: What's The Difference?

It’s important for both technical and business teams to understand the different web performance monitoring options that are available as well as their various use cases and the benefits of each. In this article, we’ll compare synthetic monitoring and real user monitoring (RUM).

Synthetic Monitoring Tools: 5 Must-Have Features

There are many different classes of web performance tools, from synthetic monitoring to application performance monitoring (APM), to real user monitoring (RUM), and more. These different classes exist because each has its own strengths and weaknesses. When evaluating open-source or enterprise-grade synthetic monitoring tools, you want to look for capabilities that maximize the strengths of synthetic monitoring.

Telegraf Best Practices: SNMP Plugin

Telegraf has now reached 300+ plugins and is deployed in a wide variety of use cases. In January we released a blog post covering the golden rules for creating configs and optimizing your Telegraf agent. It’s now about time we got our hands dirty covering some of the plugins the community uses the most. In this post, we are going to cover the SNMP Input Plugin.

Identify Vulnerable Devices From Vendor Recalls and Security Notices

When network hardware vendors issue device recalls, field notices, or security alerts, the implication can be massive for MSPs. Take the 2017 clock signal issue, for example. That huge recall of Intel microchips was a large-scale vulnerability for tons of devices—and left MSPs scrambling to figure out which devices were affected on which client sites.

Tips for Optimizing React Native Application Performance: Part 1

React Native is an amazing framework for building cross-platform mobile applications. It helps you provide a high-quality, native-looking application that runs on multiple platforms, using a single codebase. The current React Native architecture uses the UI (or main) thread and the JavaScript thread. The JavaScript thread is used to run business logic, make API calls, process touch events, etc. The UI thread is where animations and transitions are done.