February 2022

Five worthy reads: Private 5G-Your fastest way to successful digital transformation

Five worthy reads is a regular column on five noteworthy items we’ve discovered while researching trending and timeless topics. This week, we explore how Private 5G is impacting organizations worldwide. Digital transformation (DX) integrates digital technology into different areas of an organization to help change the way it operates and deliver value to its customers. Organizations that offer a higher value to their customers gain a competitive advantage over others in the marketplace.

8 trends that will define IT digital dominance in 2022

The pandemic struck the IT industry hard, upending many businesses. The businesses that adjusted best were those with a digital transformation process already in effect. Cut to 2022 and most organizations have started the digital transformation. And the ones that already had established digital transformation initiatives sped up the process. So, what’s next? Digital dominance is the trend; the objective might be transformation, but the goal is beyond just process and strategy improvement.

Advanced Oracle Monitoring on Microsoft SCOM within Minutes

Monitor your Oracle databases, instances, tablespaces, pluggable databases, processes, system global area (SGA), clusters, automatic storage management (ASM), listeners, disk groups, and more within minutes. Discover Oracle objects automatically and track the overall health of your Oracle environment easily. Spot and act on Oracle latency issues as they appear. Generate detailed Oracle performance reports with a mouse-click. Use built-in Oracle tasks to speed up operations.

Lightrun Releases KoolKits - Debugging Toolkits for Kubernetes

KoolKits (Kubernetes toolkits) are highly-opinionated, language-specific, batteries-included debug container images for Kubernetes. In practice, they’re what you would’ve installed on your production pods if you were stuck during a tough debug session in an unfamiliar shell. To briefly give some background, note that these container images are intended for use with the new kubectl debug feature, which spins up Ephemeral containers for interactive troubleshooting.

Sponsored Post

How MSPs can benefit from AIOps adoption/strategy and add value-added services

According to Gartner, enterprise usage of AIOps is set to surge from a mere 5% in 2018 to a whopping 30% in 2023. To survive in an increasingly competitive market, MSPs must not only respond well to customer expectations but anticipate them. Another Gartner report states that by 2025, over 80% of public cloud managed and professional services deals will require both hybrid and multi-cloud capabilities from the provider, up from below 50% in 2020.

KoolKits - Highly-opinionated, batteries-included Kubernetes debugging toolkits

KoolKits (Kubernetes toolkits) are language-specific container images that contain a (highly-opinionated) set of tools for debugging applications running in Kubernetes pods. You can read more about the motivation behind this project here. Those images are intended for use with the new kubectl debug feature, which spins up Ephemeral containers for interactive troubleshooting. A KoolKit will be pulled by kubectl debug, spun up as a container in your pod, and have the ability to access the same process namespace as your original container.

Quickly troubleshoot application errors with Error Reporting

Are you familiar with the four golden signals of Site Reliability Engineering (SRE): latency, traffic, errors, and saturation? Whether you’re a developer or an operator, you’ve likely been responsible for collecting, storing, or analyzing the data associated with these concepts. Much of this data is captured in application and infrastructure logs, which provide a rich history of what is happening behind the scenes in your workloads.

How to manage cardinality with out-of-the-box dashboards in Grafana Cloud

When there’s a cardinality explosion, it can cause problems: It’s a surprise, it’s noise, and it can increase your costs or cause performance degradation of your systems. Over the past year, we’ve improved our time series storage systems so that under normal use, high cardinality is no longer an issue. But as the operator of an observability platform, you should have the tools you need to help protect that infrastructure.
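
To make the idea of a cardinality explosion concrete, here is a minimal sketch in Python. It assumes the common definition of cardinality as the number of distinct label-value combinations (time series) behind a metric. The label names and counts are hypothetical, and the product is only an upper bound, since not every combination necessarily occurs in practice.

```python
from math import prod

def max_series(label_value_counts):
    """Upper bound on the time series one metric can produce.

    label_value_counts maps each label name to the number of distinct
    values that label can take (hypothetical numbers for illustration).
    """
    return prod(label_value_counts.values())

# A modest metric: 3 methods x 5 status codes x 10 instances.
baseline = {"method": 3, "status": 5, "instance": 10}

# Adding one unbounded label, such as a user ID, multiplies the total.
with_user_id = {**baseline, "user_id": 10_000}

print(max_series(baseline))      # 150
print(max_series(with_user_id))  # 1500000
```

A single high-variety label is usually enough to turn a small metric into millions of potential series, which is why the growth tends to come as a surprise.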

Monitoring system performance metrics with Graphite

In this article, we will explain what system performance metrics are and why you need to monitor them. Then we will look at Graphite and Grafana monitoring systems, which make it easy to collect, save and visualize metrics. Finally, we will consider why you should choose MetricFire to monitor your system’s metrics. If you would like to learn more about the benefits of MetricFire, book a demo with our experts or sign up for a free trial today.

Distributed Tracing and Suspect Spans

At the root of every performance issue, there is most often a single event that creates a domino effect of excruciatingly slow load times. With distributed tracing, we give you all the context to see what actually matters and help you solve what’s urgent faster. However, in some cases, you might want, or really need, a shortcut. And this is where Suspect Spans come into play.

10 Microsoft Teams Performance Use Cases for IT Admins

Dependence on Microsoft 365 and Teams has never been greater, and the pressure is on for IT teams to deliver exceptional user experiences – anytime, anywhere. The modern workplace sees users connecting from the office, home, and pretty much any place in between. This hybrid work model has a significant impact on IT, the network and the overall quality of service perceived by the users.

Latest Release of Our Network Monitoring Software Delivers AI-Driven Log Analytics

If you manage a network, every network device generates a large volume of logs. These logs are extremely important and narrate a story about both events and the sequencing of those events within your network. This capability is critical for any network monitoring software, helping you easily understand network activities, user actions, security breaches, and much more.

VirtualMetric Presents: Database Monitoring

VirtualMetric provides a powerful tool to observe your database performance and health. With VirtualMetric's tool, you can monitor database transactions, database statistics, performance metrics, and inventory from a single dashboard. Drill down into detailed statistics to troubleshoot your database performance and optimize it. Ensure compliance and guarantee the security of stored data with advanced visibility across datasets.

How to Test Salesforce Multi-Factor Authentication

Assuming you have correctly configured the user IDs for MFA in Salesforce, end users should see the following screen when trying to log in to the CRM application. The TOTP-based verification code is generated in third-party authenticator apps (Google or Microsoft) on your mobile device when you first scan the QR code or enter the key manually in the app. In this article, we’ll guide you through all the steps you need to set up our Salesforce MFA Web Sensor in your environment.

Full-Cycle Observability With Instana and Lightrun

Understanding everything that happens inside a production environment is a notoriously difficult task. Instana’s solution helps developers and DevOps become aware of problems quickly – problems that are rooted in both infrastructure-level information and application-level information. Lightrun, on the other hand, enables practitioners to drill deeper into line-by-line, debugger-grade information from your production systems – enriching the existing information Instana delivers.

Lightrun Announces GA Support for Visual Studio Code

Lightrun is the world’s first IDE-native observability platform. A developer-first product, Lightrun enables engineering teams to connect to their live applications and continuously identify critical issues without hotfixes, redeployments, or restarts. We are proud to announce the general availability of the Lightrun extension for Visual Studio Code, the popular IDE from Microsoft.

Jaeger Tracing: A Friendly Guide for Beginners

Written by @thetomzach @ Aspecto. In this guide, you’ll learn what Jaeger tracing is, what distributed tracing is, and how to set it up in your system. We’ll go over Jaeger’s UI and touch on advanced concepts such as sampling and deploying in production. You’ll leave this guide knowing how to create spans with OpenTelemetry and send them to Jaeger tracing for visualization. All that, from scratch.

Best alternative to CloudWatch: Cost-Effective, Simple, Complete AWS Monitoring

Trace the root cause of performance issues with AWS application monitoring. Get flexible deployment models for hybrid infrastructure. eG Enterprise supports all the major technologies you use for infrastructure, digital workspaces, and applications with AWS monitoring. It's a single pane of glass that monitors your entire IT deployment, understands all your interdependencies, and gets you to the root cause of performance problems faster.

The Role of Service Desk in Digital Transformation

Almost every organization is moving towards or planning to move towards digital transformation and IT is a significant part of it. The pandemic has made digital transformation a necessity to achieve maximum customer and employee engagement and satisfaction, bridge cultural and talent gaps, and promote IT as a valuable business process.

What is API Monitoring? Ways to Monitor API, Best Practices, Tools, and More

To provide a fast, seamless, and highly available experience for the end-user, modern applications increasingly rely on third-party services. Due to this rising complexity, it's become important for IT employees to ensure that these services are up and running and communicating as they should. API monitoring has thus become a must-have for DevOps teams.

Dashboard Fridays: Sample Pingdom dashboard

Join Adam Kinniburgh and Ashley Thompson in this latest Dashboard Fridays episode on Pingdom. This dashboard gives an overview of Pingdom checks using PowerShell scripts against the Pingdom API. In this short video, we'll demonstrate how this dashboard was built using SquaredUp dashboards, the challenges it solves, and how you can easily replicate it in your own environment.

Network AF, Episode 10: Navigating venture capital and networking with Alan Cohen

In this episode of Network AF, your podcast host Avi Freedman chats with networking investor, advisor and VC partner, Alan Cohen. Alan brings a hilarious, witty and nonconformist attitude to the talk, exploring Silicon Valley in the 90s, the joy of moving from large enterprises to small disruptors, and generously sharing secrets of the trade with Avi and podcast listeners.

How we optimized Python API server code 100x

Python code optimization may seem easy or hard depending on the performance target. If the target is “best effort”, carefully choosing the algorithm and applying well-known common practices is usually enough. If the target is dictated by the UX, you have to go down a few abstraction layers and hack the system sometimes. Or rewrite the underlying libraries. Or change the language, really. This post is about our experience in Python code optimizations when whatever you do is not fast enough.

Webinar Recap: Streamline Connections with LogStream QuickConnect

Feature Highlights is a new addition to our ongoing series of webinars. As the name suggests, it’ll focus on specific product features with anonymized customer use cases taking center stage. In other words, how Cribl customers actually use the features to get the job done, sometimes in unintended ways. QuickConnect was the first act with a session “Streamline Connections w/ LogStream QuickConnect”.

How to Optimize Your WordPress Site With Pingdom Real User Monitoring

To keep their applications and websites available and accessible, today’s businesses put a lot of emphasis on infrastructure monitoring to ensure their servers are healthy and running. Amid the hustle, many companies overlook the need to monitor a different aspect of their application: user experience.

Revisiting The Things Network: Connecting The Things Network V3 to InfluxDB

Back in 2019, David Simmons created an awesome blog introducing LoRaWAN devices and The Things Network. He also showed you how easy it was to connect The Things Network V2 to InfluxDB. Since then, a few things have changed and I thought it was time to revisit The Things Network with a new project.

Ask Miss O11y: OpenTelemetry in the Front End: Tracing Across Page Load

Ah, good question! TL;DR: store the start time of the span, and then create the span on the new page. Usually, you want to start a span, do some work, and then end the span. The whole span gets sent to your OpenTelemetry collector (and thence to Honeycomb) when you end it. But when a page load happens, that span object is lost. Honeycomb never hears about it because span.end() wasn’t called. How can we deal with this? Create the span only on the new page, where you can end it. But!
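
The approach described above can be sketched without any tracing library. This is a minimal illustration in Python, not the OpenTelemetry API: the `storage` dictionary stands in for something that survives navigation (such as browser session storage), and the function names and span fields are hypothetical.

```python
import time

def before_navigation(storage, name, now_ns=None):
    """On the old page: persist only the span's name and start time."""
    storage["pending_span"] = {
        "name": name,
        "start_ns": time.time_ns() if now_ns is None else now_ns,
    }

def after_page_load(storage, now_ns=None):
    """On the new page: build the whole span from the stored start time and end it."""
    pending = storage.pop("pending_span", None)
    if pending is None:
        return None  # nothing was started before navigation
    end_ns = time.time_ns() if now_ns is None else now_ns
    return {
        "name": pending["name"],
        "start_ns": pending["start_ns"],
        "end_ns": end_ns,
        "duration_ns": end_ns - pending["start_ns"],
    }

storage = {}  # stand-in for storage that outlives a page load
before_navigation(storage, "page-transition", now_ns=1_000)
span = after_page_load(storage, now_ns=4_500)
print(span["duration_ns"])  # 3500
```

Because no span object exists until the new page has loaded, nothing is lost during navigation; only the name and start timestamp need to survive it.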

What is Digital Experience Monitoring, and How Can it Help Companies?

Customer-centered business practices have become a major focal point in IT innovation. The advent of massive communication has brought forth a big challenge, however: how does one provide a top-notch customer experience when there are so many factors to account for? The market’s answer to this conundrum is monitoring software.

9 popular JavaScript frameworks (and how to choose one for your project)

Choosing a JavaScript framework for a new project can be a daunting task. There’s always a new one getting hype from the community, while the established players still have a lot to offer. So you need to do your homework and make sure the framework you choose is the right one for your specific requirements. Popularity alone is never the best indicator, but a review of the most widely-used options should help you decide which way to go.

Create and navigate a documentation library with Notebooks

Datadog Notebooks enable your teams to create and manage key reports and documentation as they build out, monitor, and maintain their infrastructure. Notebooks can include both text and graphs of any telemetry data you have collected in Datadog, and they support collaborative editing so that multiple team members can edit and leave comments simultaneously.

How to monitor Starlink with Prometheus

In this article, you’ll learn how Starlink works in a domestic environment, and how to monitor Starlink connection with Prometheus. SpaceX’s Starlink uses satellites in low-earth orbit to provide high-speed Internet services to most of the planet. During the beta, Starlink expects users to see data speeds vary from 50Mb/s to 150Mb/s and latency from 20ms to 40ms. It’s also expected that there will be brief periods of no connectivity at all.

SAP HotNews and CVE kernel patch: Securing your SAP systems

New ICMAD bugs require immediate attention and patching for SAP systems. The dust has not yet settled on the Log4j security vulnerability, rated CVSSv3 10.0, that hit in December 2021. Last week, a new group of three security vulnerabilities was published by SAP, all of which relate to SAP’s Internet Communication Manager (ICM or ICMAD). Once again, one of these vulnerabilities has a CVSS v3.0 base score of 10/10. In contrast to Log4j, the latest threats only impact SAP customers, but they need immediate attention.

How to publish messages through Kafka to Grafana Loki

Back in November 2021, Grafana Labs released version 2.4 of Grafana Loki. One of the new features it included was a Promtail Kafka Consumer that can easily ingest messages out of Kafka and into Loki for storing, querying, and visualization. Kafka has always been an important technology for distributed streaming data architectures, so I wanted to share a working example of how to use it to help you get started.

Introducing BGP monitoring from Kentik

Designed at the dawn of the commercial internet, the Border Gateway Protocol (BGP) is a policy-based routing protocol that has long been an established part of the internet infrastructure. Historically, BGP was primarily of interest to ISPs and hosting service providers whose revenue depends on delivering traffic.

Monitoring Application Response Times

The world is going digital with lightning speed. According to a report by Statista, at the beginning of January 2021, there were 4.66 billion active internet users in the world, which accounts for 59.5 percent of the world's population, and out of this 4.66 billion, around 4.32 billion (92.6%) users have accessed the internet by mobile. All companies, regardless of the sector in which they operate, want to use the internet to expand their business.

Fantastic Cribl Packs and How to Export Them

In LogStream 3.0, we introduced a framework that provides a way for LogStream customers to build, reuse, and share configuration modules – including pipelines, lookups, data samples, and knowledge objects – called Packs. While each Pack has its own “context” containing custom pipelines, routes, lookups, variables, etc., it still retains access to built-in LogStream configuration that is shipped with the product.

The AppScope Origin Story

Since we introduced AppScope in 2021, we’ve been relentlessly working towards the production-ready milestone. Last week we released AppScope 1.0. It’s been a long haul getting to this point. Not really sure if it took this long because we solved difficult problems, or if we’re just that slow. Someone told me that what we are doing would go a lot faster if we use a modern high-level language. Maybe … Can you imagine doing this in TypeScript? Yeah, me either.

Using Lambda Extensions to Streamline Observability

Lambda is a top-rated compute service available on the AWS cloud service network. Its popularity largely derives from its ease of use, allowing users to run Lambda functions reliably without provisioning or managing servers. Lambda can be triggered manually or by any linked events in the AWS network, including DynamoDB streams, SQS, Kinesis, and more.

Have a Worry-Free Upgrade

The waiting can be intensely stressful. You are mid-way through a critical production upgrade during the weekend. The schedule is tight. Suddenly there is an unexpected problem you aren’t able to resolve. You need help. So, you call in a support ticket. And that’s when the waiting starts. While you’re waiting for the support team to review and get back to you, questions race through your mind: How quickly will they respond to the ticket?

Deep Dive into the App Start Experience

Our customers rely on Splunk’s mobile apps when they are on-call and troubleshooting in high-stress situations. Splunk’s customer base includes 96 of the Fortune 100, many of whom rely directly on Splunk’s mobile app to help them solve outages or large scale performance problems. Therefore, they need a reliable quality of experience with our products and services. My team and I work on two mobile apps at Splunk: 1.

How to monitor ActiveMQ logs and metrics

ActiveMQ is a message-oriented middleware, which means that it is a piece of software that handles messages across applications. It acts as a broker that can help facilitate asynchronous communication patterns like publish-subscribe and message queues. The main goal of those servers is to create a scalable and reliable message bus that different components can use to communicate with each other.

Logging Blindspots: Top 7 Mistakes that are Hindering Your Log Management Strategy

Today, virtually everyone who manages infrastructure or applications relies on logging to understand what is happening within their environments. But some teams do logging better than others. Although there is no one right – or wrong – approach to log management, there are a variety of logging mistakes that engineers commonly make when deciding what to log, how to log it, and how to work with their log data.

Sponsored Post

Strategies to avoid downtime and maintain business continuity

In March 2019, Facebook experienced a 14-hour outage that cost the company $90 million. In July 2018, Amazon lost up to $99 million on Prime Day after experiencing downtime. While these critical financial crises greatly impacted these industry leaders, both companies were able to recover from them eventually; however, many smaller organizations may not have the means to overcome a similar incident. According to Gartner, downtime costs on average $5,600 per minute; since IT operations vary from business to business, downtime could cost $140,000 per hour on the lower end or $540,000 per hour on the higher end.
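
As a quick check on how the per-minute and per-hour figures relate, here is a small worked example in Python. It uses only the numbers quoted above; the function name is hypothetical and the calculation is a simple linear estimate, not a model of real outage costs.

```python
def downtime_cost(minutes, cost_per_minute=5_600):
    """Estimated cost in dollars of an outage lasting `minutes`."""
    return minutes * cost_per_minute

per_hour = downtime_cost(60)
print(per_hour)  # 336000

# The average falls inside the quoted hourly range.
print(140_000 <= per_hour <= 540_000)  # True

# Per-minute rates implied by the quoted hourly range.
print(round(140_000 / 60), round(540_000 / 60))  # 2333 9000
```

At the quoted average, one hour of downtime comes to $336,000, which sits between the lower and higher hourly estimates.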

Monitor MongoDB Atlas for Government with Datadog

MongoDB Atlas is a fully managed cloud database service for modern applications. Earlier this year, the MongoDB team released MongoDB Atlas for Government, a dedicated environment for US federal agencies and state, local, and education (SLED) entities that need to meet stringent security and compliance requirements.

Use Log Analytics to gain application performance, security, and business insights

Whether you’re investigating an issue or simply exploring your data, the ability to perform advanced log analytics is key to uncovering patterns and insights. Datadog Log Management makes it easy to centralize your log data, which you can then manipulate and analyze to answer complex questions.

Introducing exemplar support in Grafana Cloud, tightly coupling traces to your metrics

We’ve talked in previous posts about why we think the concept of exemplars is so valuable: They make it easy to jump from metrics into exactly the right traces, eliminating the needle-in-a-haystack problem. We were enthusiastic enough about the idea that we helped contribute the necessary code changes to bring this functionality to the Prometheus ecosystem.

reCAPTCHA: Easy for Humans and Hard for bots

Captchas are used on many websites to protect user accounts from bots and other automated programs, preventing them from accessing the website. According to Imperva's research, harmful bots generated 25.6% of all web traffic in 2020. They are used by spammers to send automated messages to users, and by hackers to attack websites with automated scripts that often wreak havoc on the site’s performance.

8 Best Network Monitoring Tools to Try-on

Network monitoring tools are an important part of any network management strategy since they give you valuable information about network-related issues that could damage your business. Overloaded networks, router difficulties, downtime, cybercrime, and data loss are all dangers that can be mitigated by frequently monitoring networks.

SAP Cloud Security: A strong defense is the first step

Securing an SAP environment is critical to any organization and as the use of public clouds grows for these environments, so does the concern of SAP security. These hyperscalers all have teams of people working around the clock to ensure their solutions are impenetrable. Third party cybersecurity solutions are often used as well to make sure these environments are safe and secure.

Use Real User Monitoring to Optimize Real User Experience on Websites and Applications

What if I told you one of the most common mistakes businesses make is reporting on website performance without understanding user experience? What if I said there was much more to website performance monitoring than simply alerting you when your site is experiencing an outage? No two websites – or baselines – are exactly the same.

Why is Causation Important in AIOps?

Modern IT environments have become much more complex to manage thanks to hybrid infrastructures and comprehensive instrumentation that generate metrics, alerts and events data constantly. ITOps (IT Operations) and SRE (Site Reliability Engineering) teams are tasked with providing superior performance and user experience for numerous applications while not letting the budget get out of hand.

Implementing distributed tracing in a nodejs application

In this article, we will implement distributed tracing for a nodejs application based on microservices architecture. To implement distributed tracing, we will be using open-source solutions - SigNoz and OpenTelemetry, so you can easily follow the tutorial. In modern microservices-based applications, it is difficult to understand how requests are performing across multiple services, infrastructure, and protocols.

Slack Outage of 2/22/22 - Good Morning! Here's 16 Minutes of Stress!

For some time now, people have understood the importance of early warning systems, whether for detecting earthquakes and tsunamis, military defense, or business and financial crises. Why should service providers, especially those delivering software as a service (SaaS), be any different? In a world where time is money and minutes mean millions, it is vital for organizations to keep a very close eye on the supply and delivery chain of their service to their end users, both business and consumer.

Adapting Icinga Web modules To Icinga DB

Icinga DB Web has a better layout and is more user-friendly. This makes monitoring simpler. Hence, it would be nice if we could adapt all the Icinga modules to Icinga DB. In this blog post, I will discuss how to adapt Icinga Web modules to Icinga DB. To do this, first and foremost, install and enable the Icinga DB module. Currently, the monitoring backend is the default backend for all the modules.

FAQ - Netreo Azure and AWS Monitoring Capabilities

Netreo SaaS delivers a single solution for simplifying how IT organizations optimize today’s hybrid blend of on-premises, public and private clouds that are common in complex, global enterprise infrastructures. With the release of cloud monitoring enhancements coming in March, Netreo SaaS will provide even greater multi-cloud monitoring capabilities and extended functionality for Microsoft Azure and AWS cloud customers.

SNMP Traps: The 90's Want Their Monitoring Technology Back

How do you monitor your network? There are a myriad of technologies and tools out there, each providing different benefits and challenges. Today we are going to focus on one specific area, Simple Network Management Protocol (SNMP) Traps. That’s right, we are going narrow here, not just focusing on SNMP but on one specific portion of the protocol: namely the ability of devices that support SNMP to send alert information to collectors.

How Infrastructure Monitoring Can Support Your Digital Transformation Journey

Cloud deployments have overtaken on-premises deployments in the enterprise application software market since 2020, and Gartner expects they will be double the size of on-premises by 2025. These changes reflect the fast evolution in IT infrastructure due to new technology, business models, and market demands. The movement toward cloud, mobility, and IoT continues to surge forward.

Top B2B & B2C UX Design Examples

33% of consumers will leave a brand they love after just one bad experience. And almost 80% of American consumers say that convenience and speed are among the most important factors to a good customer experience (according to PwC). So yes, good UX design can make or break your business’ product and customer relationships. Hence, products are more people-focused than ever before.

Does A Multi-Cloud Strategy Mean Compromising on Application Performance?

Based on the Teneo customers I’ve spoken with in the last 12 months, adopting a ‘Cloud-First’, Multi-Cloud strategy is often a top priority for Infrastructure and Operations (I&O) teams. However, many organizations have multiple clouds and cloud services mixed with physical data centers and Co-Lo (co-locations). And as a result of running multiple applications and services stitched together, user experience suffers.

New feature alert: OpManager MSP's NCM add-on for a seamless configuration and compliance management

According to ABC News, there has been a 600% rise in security intrusions during the COVID-19 pandemic, which is expected to double before 2025. In many circumstances, admins and technicians either intentionally or unintentionally play a part in the process of derailing the organization’s strategy for success. In order to prevent such mishaps, MSPs need a recovery plan to recuperate from any unfortunate accidents or cyberattacks.

Hybrid Monitoring for Azure and Microsoft 365

Business analytics and business continuity go hand in hand. Microsoft 365 as well as Azure services are used by companies around the globe to achieve better business outcomes. Learn why a hybrid approach to Azure and Microsoft 365 monitoring can be beneficial and what steps to take to get clear insights into SLA fulfillment.

Disk performance collection on virtual appliances in VMware

A virtual appliance is a pre-integrated, self-contained system that combines a software program (e.g., server software) with just enough operating system to run correctly on industry-standard hardware or a virtual machine, such as VMware. Virtual appliances offer many advantages for enterprise IT, including the ability to package a solution as a single product to quickly create ready-to-use systems in the cloud or on-premises, with little to no setup.

How secure is your Grafana instance? What you need to know

One of Grafana’s most powerful features is the ability to funnel data from hundreds of different data sources (i.e., services or databases) into a single dashboard without migrating the data from where it lives. You can connect and correlate data from Grafana’s curated observability stack for metrics, logs, and traces, or third-party services, such as Splunk, Elasticsearch, GitHub, Jira, and many more.

Tracking Stability in a Bluetooth Low Energy-Based React-Native App

For most of my career I’ve worked with health and wellness startups. Most of these companies have a wearable that tracks movement, heart rate, body weight or stimulates a body part. The common denominator between these apps is their use of sensor data to determine physiological progress an athlete is making. Problem is, your Bluetooth Low Energy (BLE) device does not have an internet connection and cannot send diagnostics anywhere if there are errors.

APM vs. Logging: Do I Need Both?

The final stage of the popular Software Development Lifecycle after planning, analysis, design, and implementation is maintenance. This is where a full-fledged application running in production is constantly looked after and taken care of. Bugs, bottlenecks, slow database queries, security loopholes, and other issues are discovered and fixed before deploying the updated code. Log records and Application Performance Monitoring (APM) tools play a crucial role in software development maintenance.

The Complete UX Audit Checklist 2022: for CIOs and Self Auditors

It’s essential to perform a UX Audit on your website from time to time. It helps improve the quality of your site by reviewing its strengths and weaknesses. Chief Information Officers often carry out this job. However, you can perform the UX Audit yourself if the company is small. Either way, you’re going to need a complete UX Audit Checklist that makes sure you get the most valuable insights from your audit.

IoT Security: How Important are Logs for System?

IoT has rapidly moved from a fringe technology to a mainstream collection of techniques, protocols, and applications that better enable you to support and monitor a highly distributed, complex system. One of the most critical challenges to overcome is processing an ever-growing stream of analytics data, from IoT security data to business insights, coming from each device. Many protocols have been implemented for this, but could logs provide a powerful option for IoT data and IoT monitoring?

Debugging Node.js Memory Leaks: How to Detect, Solve or Avoid Them in Applications

In this article, you’ll learn how to understand and debug the memory usage of a Node.js application and use monitoring tools to get a complete insight into what is happening with the heap memory and garbage collection. Here’s what you’ll get by the end of this tutorial. Memory leaks often go unnoticed. This is why I suggest using a tool to keep track of historical data of garbage collection cycles and to notify you if the heap memory usage starts spiking uncontrollably.

End-to-End Application Performance Monitoring

In this blog, I’ll cover a real-world example of application performance troubleshooting a Java web app, hosted on JBoss Wildfly using Microsoft SQL as the backend database, including details of the analysis and diagnosis we had to perform in order to identify the root-cause of, and resolve, the performance issue.

How (not) to test signup and keep your CEO happy

One of Checkly's strengths is the capability to monitor key transactions on your site. It'd be a missed opportunity if we didn't reuse it to monitor our own product! But for some important flows, that comes with a couple of pitfalls. In this post, we'll take a closer look at how we monitor one of our top key flows: signup.

[Infographic] AWS SNS from a serverless perspective

The Simple Notification Service, or SNS for short, is one of the central services to build serverless architectures in the AWS cloud. SNS itself is a serverless messaging service that can distribute massive numbers of messages to different recipients. These include mobile end-user devices, like smartphones and tablets, but also other services inside the AWS ecosystem. SNS’ ability to target AWS services makes it the perfect companion for AWS Lambda.

Moving in Concert: What is SD-WAN, and Where is it Going?

Software-defined WAN (SD-WAN) is a software overlay that decouples WAN configuration and management from the underlying network transport medium (5G/LTE, MPLS, xDSL, etc) and dynamically routes WAN traffic. To the uninitiated, that’s a mouthful. But it’s one of the most interesting trends of the last few years. Let’s jump on the bandwagon! Time for a primer on SD-WAN, why it matters, and where it’s heading now and into the future.

Ship software faster by removing bottlenecks and keep work flowing

We know customers and users today demand new features to be frequently released to their favorite apps. Plus they expect any bugs or issues hindering a great user experience to be fixed—and fast. Here we're going to cover new capabilities built to help you keep up with the business by measuring how well your team works in small batches and identifying previously invisible cross-team dependencies in your development and delivery processes.

The Truth About "MEH-TRICS"

A long time ago, in a galaxy far, far away, I said a lot of inflammatory things about metrics. “Metrics are shit salad.” “Metrics are simply nerfed dimensions.” “Metrics suck,” “metrics are legacy,” “metrics and time series aggregates will fucking kneecap you.” I cannot tell a lie; Twitter will testify that I’ve spent the past six years ragging on metrics.

How We Used Our Own Platform Capabilities to Prevent Log4j Attacks and Protect Customers

In December, information security researchers discovered a serious vulnerability in the popular open-source logging library, Log4j. If exploited, this vulnerability, known as Log4Shell, could allow malicious attackers to execute code remotely on any targeted computer. Millions of computers use Log4j. According to one study, 93% of all cloud environments are affected by the vulnerability.

Uptime.com Real User Monitoring Report

Take an in-depth tour of the Uptime.com RUM report. Skip to each aspect of reporting with the timestamps below: Comprehensively understand your users – and your baselines. Organize RUM data by URL(s) or group URL(s) to track subdomains; segment data by devices, operating systems, browsers, countries, other geographies – to compare metrics within specific time windows to your website or application’s performance monitoring baselines.

Your guide to the key steps of capacity planning and management

If you’re going to move assets to Azure – or any public cloud, you’re going to need some help. As a cloud consulting firm with a top-notch infrastructure performance monitoring application, we help enterprises navigate obstacles on the path to the cloud all the time. We’ve also felt the pain of sizing and pricing in our cloud journey, too. That’s why we created Galileo Cloud Compass (or GCC as we sometimes call it).

6 Simple Steps to an Easy Cloud Migration

If you’re going to move assets to Azure – or any public cloud, you’re going to need some help. As a cloud consulting firm with a top-notch infrastructure performance monitoring application, we help enterprises navigate obstacles on the path to the cloud all the time. We’ve also felt the pain of sizing and pricing in our cloud journey, too. That’s why we created Galileo Cloud Compass (or GCC as we sometimes call it).

Service Level Objectives: Where do we start?

Most of us have heard about SLOs and what they mean but always found it hard to start adopting them across our teams. This video is a way to demystify the journey of adoption of SLOs, with examples of how several large companies like Disney adopted them. Whether you are new to the DevOps/SRE world or an experienced developer, you will learn a fresh approach to making software more reliable!

What I learned running a SaaS for a year

This time last year, I showed the internet a little prototype uptime checker I built using Next.js as the frontend, with services running on AWS Lambda. I gave myself one week to put it together. I wrote a few articles about how the business was going throughout the year. The gist of my approach is as follows: I started with a single Lambda function that checks whether static websites are still online, added an email alert for when a site goes offline, wrapped authentication around it, integrated Stripe, and shipped it.

Logging Practices: Know What to Log

Logging is an essential component of many applications. Every application has a different logging technique. You may prefer certain logging implementations, but you must also consider what to log, when to log, how much to log, and how to control logging. System administrators and developers, particularly the support team, benefit greatly from a well-designed logging system. For both the support team and the developers, logs save a lot of time.

Protecting your SAP systems from new vulnerabilities

New ICMAD bugs require immediate attention and patching for SAP systems. The dust has not yet settled on the CVSSv3 10.0 score Log4j security vulnerability, which has been keeping IT employees across all businesses very busy since December 2021. Read this article to learn more about the CVSSv3 10.0 score Log4j security vulnerability.

How to Detect Network Congestion

We’ve all been stuck in traffic congestion on the road at some point in our lives. Traffic congestion may happen when there are too many cars on the road, or when there’s an accident or a closed street. Network congestion isn’t too different from that, but instead of cars causing congestion, it’s a different type of traffic. That’s why, in this article, we’re running you through how to detect network congestion with Network Monitoring tools.

Track down high PVS device rates with release MetrixInsight for CVAD - v1.4.22004.x

Track down high Citrix Provisioning Services device retry rates with our new MetrixInsight for Citrix Virtual Apps and Desktops (CVAD) SCOM Management Pack release. MetrixInsight for CVAD is a CITRIX® Ready SCOM Management Pack for monitoring Citrix Virtual Apps and Desktops, Citrix License Server, Citrix Provisioning Services, Citrix StoreFront and Application Delivery Controller, formerly known as NetScaler.

How to Handle Java Lang OutOfMemoryError Exceptions

All the applications that you’re trying to execute require memory. It doesn’t matter whether the application was developed using assembly language, a low-level programming language like C, or a language compiled to bytecode like Java. Running the application requires memory for the code itself, the variables, and the data that the code processes. Depending on your usage, the memory requirements will vary.

Icinga Director v1.9 released: Improved permissions, new config options and more

This release of Icinga Director includes a bunch of new options to make your daily monitoring business easier and more comfortable. It includes many fixes as well as some new features for Sync Rules, Configuration Baskets and Permissions. Check out the Changelog for a detailed list of all changes. You can get started with Icinga Director by just adding the module to your Icinga installation: Follow the installation guide.

Insights from the 2022 Gartner Report on AI for CSP Networks and how Autonomous Network Monitoring Fits In

Last month Gartner published its first ever “Market Guide for AI Offerings in CSP Network Operations,” and we’re excited to share that Anodot has been identified as a Representative Vendor in the report. According to the Gartner report, “CSPs are focusing on automation of their network operations to improve efficiency and customer experience, and mitigate security concerns.” The market guide presents many new and actionable insights.

Why Website UX "Edge Cases" Lead to Visitor Frustration - and What to Do About It

The year was 1993. Beanie Babies invaded the planet. Dinosaurs dominated cinemas worldwide when they escaped from Jurassic Park. Seinfeld won the Emmy for Outstanding Comedy Series (you might say that Jerry & co. were masters of their domain). And righteous rockers Aerosmith extolled the virtues of “living on the edge.” A lot — and we are talking A LOT — has changed since 1993; especially that advice about living on the edge.

Continuous Test Data Management for Microservices, Part 2: Key Steps

In my prior blog, Continuous Test Data Management for Microservices, Part 1, we offered an introduction to the key approaches for applying continuous test data management (TDM) to microservices. The continuous TDM process for microservices applications is similar to that for general continuous TDM (see figure below), but tailored to the nuances of the architecture. In this post, I’ll outline the key steps for applying TDM across the lifecycle.

Move away, Pandora FMS WP is coming!

Three funny facts that you may not have known: 1) Elvis Presley and Johnny Cash were colleagues. 2) Jean-Claude Van Damme was Chuck Norris’s security staff. 3) Pandora FMS has a plugin for WordPress. That’s right! Pandora FMS has a monitoring plugin for WordPress that has been totally renewed and prepared for you! Get to know Pandora FMS WP!

Leading Prometheus Monitoring Tools For 2023

Prometheus is one of the leading open-source monitoring frameworks around today. It is well known for its operational simplicity, and for being highly available by default. It is the second project hosted by the Cloud Native Computing Foundation (CNCF) and was accepted into the programme on May 9, 2016. Prometheus is completely based on time series data collection and uses both a dimensional data model and flexible query language (PromQL).

The most shocking websites that experienced website downtime in 2021

2021 saw some of the biggest websites on the internet experience outages that rippled across the globe. If you thought that “large” companies couldn’t experience website downtime, unfortunately, you were wrong. Website downtime can happen to any website, small, medium, or large, and it can happen when you least expect it. Thinking that the proof is in the pudding? Check out the most shocking websites that went down in 2021.

Crossing K8s Monitoring and Observability Gaps With Change Intelligence

Recently we had the privilege of being named a Gartner Cool Vendor in the Monitoring and Observability category. The funny thing is, while this is definitely the closest Gartner category for our solution, we aren’t really used to thinking about Komodor as a monitoring and observability tool.

Top 13 Site Reliability Engineer (SRE) Tools

The role and responsibilities of a site reliability engineer (SRE) may vary depending on the size of the organization, and as such, so do site reliability engineer tools. For the most part, a site reliability engineer is focused on multiple tasks and projects at one time, so for most SREs, the various tools they use reflect their ever-evolving responsibilities.

Understanding Service Management and Its Benefits

Organizations are increasingly reliant on their internal IT teams to supply business-critical services and operations in today’s business climate. As IT activities become more integrated into business operations, more and more IT departments are opting to implement service management best practices, to meet the evolving requirements of the businesses. But what is service management? Let’s look!

SquaredUp 5.4: New ODBC data source

We just released the new SquaredUp 5.4 with some brilliant new features. Taking center stage was the new ODBC data source. (If you missed the release announcement you can catch up by reading the quick overview blog post where you can also watch the full release webinar.) With SquaredUp 5.4, you can now instantly visualize any data from almost any database with the addition of ODBC.

Apica Quick Guides - Common Check Configurations

Have you ever wondered what that one checkbox does, where that button takes you or what a specific function does? These quick guides are designed to explain every function as quickly and precisely as possible so you can continue your monitoring without any disturbance. This guide will explain all of the common check configurations that are used for the majority of our check types, and their individual functions.

Dashboard Fridays: Sample SQL Page Timeframe dashboard

When writing a SQL query, getting the date and time to format correctly can be a real pain. What is the current date and time, and how is it formatted? What happens if I want the date and time from 30 days ago? This SQL Page Timeframe dashboard demonstrates how the current date is picked up from the server hosting SquaredUp and how the Page Timeframe button impacts the date and time in a SQL query. Several different examples are demonstrated.

Node Congress Lightning Talk: Monitoring errors and slowdowns with a JS frontend and Node backend

We've got a JavaScript frontend hitting a Node (Express.js) backend. Join Chris Stavitsky in this quick 7-min demo as he goes through how to know which party is responsible for which error, what the impact is, and all the context needed to solve it. This lightning talk took place at Node Congress on Feb. 17, 2022.

Node Congress Workshop: Tracking errors and slowdowns in Node + JavaScript using Sentry

Join Neil Manvar, Sales Engineer Manager, as he sets up Sentry step-by-step to get visibility into our frontend and backend. Once integrated, he will show you how to track and triage errors + transactions surfaced by Sentry from our services to understand why/where/how errors and slowdowns occurred within the application code. This workshop took place live at Node Congress on February 15, 2022.

LogicMonitor APM is now generally available to Enterprise customers

Over the past two years, we have been on a journey to provide the tools you need in order to achieve unified, end-to-end observability in real-time across your entire business. We believe that true observability gives you the confidence to embark on your cloud and digital transformation initiatives. LM APM empowers ITOps and DevOps teams with the context they need to continue delivering quality user experiences while seamlessly correlating all of this data in one easy-to-use platform.

Ask me Anything with WUG Ninja

You may have heard about an ‘Ask Me Anything’ and its popularity across social media platforms like Instagram and Reddit. In our first Ask Me Anything webinar for WhatsUp Gold, our resident WUG Ninja, whom you’ve seen in many other webinars, will be here to answer your questions about WhatsUp Gold! In this on-demand session, we’re turning it over to you to ask us questions. But there’s one catch: you have to submit the questions ahead of time (in the form on the right). You can ask us whatever you want, but here are some sample questions to get your creative juices flowing.

SCOM 2022 coming soon: the most exciting updates

Great news for the SCOM community – SCOM 2022 is going to be released in the spring! SCOM isn’t going anywhere and it’s only getting better. We saw this proven in the Big SCOM Survey Results 2021 where more than half of respondents said they were going to increase their SCOM deployment to monitor more of their existing and new infrastructure.

Limit the risk of ransomware with OpsLogix VMware MP

Ransomware is not a new concept within IT security. However, much focus is now being brought to it as the scale, number, and cost of these attacks are increasing worldwide. Though these attacks are aimed at organizations, the outcome can significantly impact consumers and individuals as well. The impact these attacks have on healthcare systems, schools, and power providers is significant and can have devastating consequences.

Monitor your GitHub Repos with Graphite and Grafana

In this article, we will explore the main metrics of a GitHub repository and why it is important to monitor them. We will learn how to get GitHub data in a convenient format, process, then visualize it for further analysis. Finally, we will analyze the main advantages of using data monitoring tools such as Hosted Graphite and Grafana by MetricFire.

What Is Government Digital Transformation?

The U.S. federal government knows it has not kept pace with technology innovation. Recent legislation and a $1 billion modernization fund aim to bring the federal government up-to-date. What does government digital transformation mean, and what are federal IT leaders doing to modernize their agency’s IT?

Grafana 8.4 release: new panels, better query caching, increased security, accessibility features, and more!

Grafana 8.4 is here! This release includes a variety of updates focused on making Grafana easier to use, improving performance, and keeping your data secure. For a full list of new features and capabilities, check out our What’s New in Grafana 8.4 documentation. You can get started with Grafana in minutes with Grafana Cloud. We have free and paid Grafana Cloud plans to suit every use case. Sign up for free now.

Cut Out the Noise: Issue Grouping and Alerting Best Practices

We’re drowning in emails and Slack notifications. As our eyes glaze over, we start bulk-archiving everything into folders we most likely never go into again - missing critical bugs, crashes, or slowdowns sometimes weeks too late. Learn from Dustin Bailey, Solutions Engineer at Sentry, and Phillip Jones, Ecosystem Product Manager, as they share issue grouping and alerting best practices to help cut out the noise so you can start taking action on issues faster.

How Cribl LogStream Doctors QRadar

We know the old adage: All data is security-relevant. But at what cost? Many organizations are still trying to get their arms around existing data flows and tooling to say nothing of new apps and data sources coming into play as we continue to migrate to the cloud. Working to get a complete picture of their security environments, many CISOs are forced to make painful decisions between staying within budget and getting complete security event visibility.

Ask Miss O11y: Making Sense of OpenTelemetry-Context

“What is up with the Context in OpenTelemetry? Why do I need to mess with it at all? Why, when I set a span as active, don’t subsequent spans just use it as a parent?” Oh, yikes, yeah. The Context abstraction in OpenTelemetry is hard to understand. Here are several ways it’s tricky.

Defending Your Network Infrastructure Against Attack

News over the last few years has been thick with reports of major data breaches on corporate network infrastructure. In the cases of the Panama Papers, the OPM leak, and the Hacking Team leak, the results were catastrophic leaks of extremely confidential information. In truth, a determined and well-resourced attacker can always find a way in.

6 Engagement Strategies to Avoid a Sluggish Windows 11 Adoption

Every digital interaction behind a screen depends on the Operating System (OS). It is the most fundamental element of the Digital Employee Experience (DEX). But IT pros are understandably apprehensive about their inevitable Windows 11 migration as so many things can go wrong; and with every failure comes additional delays, costs and frustrations – both for IT and for employees.

Troubleshoot From Anywhere with PanSift

This article was written by Donal O Duibhir, Founder & CTO, PanSift. In 2015 I gave a brief talk at the Wireless LAN Professionals conference in Berlin about remotely troubleshooting client performance and Wi-Fi at scale. The solution I described then was rough and didn’t yet use a time series database (TSDB), but the requirements and goals are still valid and even more vital today.

SCOM 2022: the most exciting updates

Great news for the SCOM community – SCOM 2022 is here! SCOM isn’t going anywhere and it’s only getting better. We saw this proven in the Big SCOM Survey Results 2021 where more than half of respondents said they were going to increase their SCOM deployment to monitor more of their existing and new infrastructure.

How To Detect and Prevent Zero-Day Vulnerabilities With Smart Infrastructure Monitoring Tool

“End of life, end of support, pandemic-induced shipping delays and remote work, scanning failures: It’s a recipe for a patching nightmare,” federal cybersecurity CTO Matt Keller says. Ensuring a high level of security for your IT infrastructure, and being sure you have not missed something, is hard these days. A zero-day exploit happens when hackers identify a software weakness or a security gap and take advantage of it to perform a cyberattack.

Introducing Grafana k6 Cloud for Education, a free program to help teach performance testing

Grafana k6 is our open source tool to help you ship reliable applications by doing performance testing in a modern and developer-friendly way. Performance testing is still unknown to many, but it is not a new topic. In fact, performance testing courses are everywhere — even at colleges and universities. One of our passions is to educate others on the best practices of performance testing, working together with the Grafana k6 community.

Enhanced Network Monitoring with Progress Flowmon

Ensuring that networks and the applications they enable are performing as well as they should is a full-time and challenging task for system administrators. We've all encountered scenarios in which end-users complain that an application is slow. Then the network team says it's not their problem, and the development team (or third-party application vendor) says it's not their problem either.

Updates to Dashboards and Stats

Between planning, triaging tickets, negotiating requirements with external stakeholders, and actually building software, it’s hard to take the time to make dashboards or even think about the most important metrics your team needs to track. To make it easier for you to get insights into your team effectiveness and project health, we made a few updates to Dashboards and Stats that you just might like.

Coralogix - On-Demand Webinar: Achieving Scale and Compliance During a Global Expansion

Armis is the first agentless, enterprise-class security platform to address the new threat landscape of unmanaged and IoT devices. With a hybrid environment of both single and multi-tenant infrastructures generating massive amounts of data, the team needed a powerful solution to centralize and manage their log data. In this session, Armis’s Head of DevInfra Roi Amitay discusses how his team leverages Coralogix’s unique capabilities together with custom-built dev tools to streamline the development and debugging of microservices on multiple EKS clusters.

Coralogix - On-Demand Webinar: Decoupling Streaming Data Pipelines at Scale

In this session, Harel Ben-Attia, Chief Architect at Coralogix shares the model we have implemented in order to create a resilient and scalable streaming data pipeline and how we had to rethink our entire approach to message processing from the ground up in order to achieve our goals.

9 Best Practices For Salesforce Performance Monitoring

Salesforce influences the productivity of entire organizations. That is why monitoring your Salesforce performance in a timely and professional manner is imperative. With GermainAPM’s help, your Salesforce response times and behaviors can be proactively monitored in many ways. Many enterprises depend on the mission-critical capabilities of Salesforce. One minute of downtime in your Salesforce app can cost you dearly.

7 best session replay tools for analyzing user behavior

How well your website works and how your visitors experience it can make or break your business if you run your business digitally. Consequently, your lead generation and eventually your sales can be negatively affected by bad user experiences. Almost every website owner has asked: are visitors enjoying using my website, store and other web applications? Do they find the information they anticipated? Are they finding the interface friendly enough?

How Many Tools Do ITOps Teams Need to Observe?

In the recent past, every enterprise has had to deal with an outage, leading to war rooms where ITOps teams are put on the spot. While they take on the burden of ensuring 100% uptime, it is often the tools they employ which don’t live up to their promises. Especially in the wake of the pandemic, with working norms being redefined, ITOps teams have been under even greater pressure to deliver. While they strive to be efficient and rely on cutting-edge technology, uptime is often elusive.

10 Orion Platform Connected Use Cases You Should Know

IT environments are getting increasingly complex and can often straddle on-premises gear, private cloud, public and/or multi-cloud, and SaaS environments... oh, and it’s a moving target. Your legacy applications may have to reincarnate in one of these forms with minimal disruption. And when things go wrong, you need all the help and pointers you can get to identify the root source of the problem and what it’s impacting.

Announcing official Icinga packages for RHEL, Amazon Linux 2 and SLES

We are pleased to announce the general availability of Icinga installation packages for Red Hat Enterprise Linux, Amazon Linux 2 and SUSE Linux Enterprise Server. We extend the list of supported operating systems to give you even more options where you can run Icinga. At the same time we respond to changes and requirements by operating system vendors.

How to Use the For-Each Feature with DX Unified Infrastructure Management's Monitoring Configuration Service

For-Each is a new feature added to the DX Unified Infrastructure Management’s (DX UIM) Monitoring Configuration Service (MCS) that uses device attributes with one or multiple values. MCS will loop through each value and create a profile for each one. If that attribute does not exist for a device, no profile will be created. Similarly, if a value is added to or removed from a device, MCS will reevaluate and add or remove profiles.

Optimizing Mobile App Startup with Splunk Real User Monitoring

One of the most challenging and rewarding things I do as a Principal Software Engineer in our Splunk Mobile division is ensuring our customers’ experience meets the quality and standards we promise to keep. My team and I are part of an on-call rotation that is committed to measuring and optimizing key Service Level Indicators (SLIs) using Splunk Real User Monitoring (RUM) and Splunk On-Call (iOS & Android) mobile apps.

IT Service Intelligence (ITSI) Comes to Splunk Mobile and TV

Why should only Dashboard Studio users get all the fun new features on Splunk Mobile and Splunk TV? To spread the cheer this new year, we brought the latest and greatest Mobile and TV features to IT Service Intelligence (ITSI) Glass Tables, so that you can view your ITSI data anywhere at any time!

User Experience Web Monitoring Software Guide

After the long process of designing and creating your website, your journey in its development has only started. Now you need to move forward, watching, analyzing and fixing any problems that may occur (and probably will) that could ruin your customer’s website experience. But how can you learn if there’s anything that’s wrong with your website? How can you measure user experience and learn where improvements are needed? Well, we have got an answer for you!

How to Create Low-Code Workflow Automations with Pipedream and InfluxDB

A big part of modern software development involves working with APIs. While using 3rd party services can speed up development, moving data around and gluing things together can be pretty dull. Luckily, there are a growing number of tools that help deal with the boring stuff so you can focus on more interesting things. One of these tools is Pipedream.

SCOM is a great addition for monitoring Kubernetes - and this is why!

Kubernetes is one of the most prominent container orchestration platforms available today. As cloud-native and container solutions gain attention, so does Kubernetes. The platform that Google open-sourced in 2014 has even become the standard for container management for private and public cloud. With the new approach towards cloud-native application development, where microservices and containers are essential, there is a big focus on software development and how to migrate to the cloud.

The NetOps Expert - Episode 5: Broadcom Software and AppNeta - Part 1

Jeremy Rossbach, Head of DX NetOps Product Marketing and Alec Pinkham, Head of AppNeta Product Marketing discuss the recent acquisition of AppNeta by Broadcom and the reasons why the combination of both network monitoring solutions sets it apart from the industry and why our customers should be excited for the future of their network visibility.

Why is Python so Popular?

Despite several widely acknowledged flaws, Python remains one of the most popular development languages worldwide. The sole fact that for years Python had two different and incompatible versions existing in parallel should have spelled the end for Python given the numerous alternatives available in the market. But Python overcame this conflict. Developers also criticized Python’s design and functionalities. Python is known to be slow and inadequate at dealing with memory-intensive operations.

5 Best RMM Software Tools to Use in 2022

As the world has shifted towards remote work in the wake of Covid-19, the need for remote management and monitoring tools has increased. Managed service providers (MSPs) use RMMs to provide off-site, automated and streamlined support to their clients for their IT needs. Having an efficient RMM can reduce workload and give MSPs a competitive edge over their rivals. So let’s take a more detailed look at what RMM software can do and then a look at the best RMM software tools for MSPs in 2022.

What are cardinality spikes and why do they matter?

At Grafana Labs, we spend a lot of time talking to our customers, and something we’ve heard from people in a wide range of organizations is that they want to be able to better manage sudden spikes in cardinality. Here we will give you a basic overview of what cardinality is and why it’s an important factor in your observability setup, especially when there is a dramatic uptick.

Optimized Traffic Mirroring Examples - Part 2

In a previous post, we looked at an example of a fictional bookstore company and recommended mirroring strategies for that specific scenario. In this post, we’ll be looking at a fictional bank and recommended mirroring strategies for their network traffic. For a list of the most commonly used strategies, check out our traffic mirroring tutorial.

Log4j vulnerability highlights the value of a combined security and observability approach

When we launched AppDynamics with Cisco Secure Application in early 2021, it was the industry’s first integrated application performance management (APM) and runtime application security offering. We made a bold bet that consolidated monitoring would become increasingly important and provide significant benefits such as improved security capabilities and reduced costs. It was the right bet.

Minimize the Risk of Logging Over the Internet: How LogStream Cloud Can Be Paired With Cloudflare

With the proliferation of security SaaS platforms, such as Cloudflare, Proofpoint, and PingOne, enterprises must figure out how to integrate third-party data shipped over the internet into their analytics and SIEM platforms. This requirement to integrate third-party data raises a host of security, infrastructure, and data quality questions. Enterprises can lower risk, and complete projects faster, by using Cribl LogStream Cloud to solve their challenges in managing third-party SaaS platform data.

What Can We Learn from AWS's December Outagepalooza?

2021’s slew of Internet outages and disruptions shows how connected and relatively fragile the Internet ecosystem is. Case in point: December’s trifecta of Amazon Web Services (AWS) outages, which really brought home the fact that no service is too big to fail. The reality is that the next outage is a question not of if, but of when, where, and for how long. Pretending outages don’t exist or won’t happen is not only pointless but harmful to your business.

Distributed network visibility, the ultimate weapon against chaos

2022, the world is the technological paradise you always dreamed of. Space mining, smart cities, 3D printers to make your own Darth Vader mask… Just a little problem, society is based on digitization and communications and you have no idea about the visibility of distributed networks. Something of vital importance considering the rise of cybercrime. Well, don’t worry, we’ll help you.

How to Simplify Monitoring for Complex Network Software

Digital transformation is causing the IT ecosphere to evolve, and the evolution is being accelerated by competitive necessity. Enterprises are using digital technology to increase revenue and lower costs. Failure to compete effectively will have devastating consequences. Digital transformation requires that IT evolve from a cost center to a value creator. FinOps and DevOps are processes that include the entire enterprise in value creation.

AWS EC2 Cost Optimization Best Practices

Amazon Elastic Compute Cloud (EC2) is one of the core services of AWS, designed to help users reduce the cost of acquiring and reserving hardware. EC2 represents the compute infrastructure of Amazon’s cloud service offerings, providing organizations a customizable selection of processors, storage, networking, operating systems, and purchasing models.

Cover Your DRaaS: Everything you need to know about Disaster Recovery

Unplanned downtime carries a hefty price tag for enterprises. In 2020, critical server outages cost enterprises on average at least $10,000 per hour, with 95% of respondents stating that the cost was $200,000 per hour or more. 40% said that the average cost was closer to $1 million per hour, and 17% lost $5 million or more for every hour offline. Those are some sobering statistics that demonstrate the importance of being prepared for the worst. But you’re thinking, “We back up everything!

InfluxData Announces New Customers and Accelerated Momentum in Industrial Data and Internet of Things

InfluxData today announced accelerated momentum in Industrial Data and Internet of Things (IoT) driven by new customers, product enhancements and expanded industrial partnerships fueled by the growth of time series data. Customers including Tesla, Rolls Royce, Airbus, Teck, Graco and Graphite Energy are using InfluxDB to collect industrial data from devices and sensors.

Graphite Energy Uses Time Series Data to Drive Industrial Decarbonization Efforts

One major challenge with decarbonization of industrial heat is converting the variability of renewable energy into the reliability required by process plants. Solar panels only generate energy when the sun is out, and wind turbines generate energy when the wind blows. Industry, however, has a consistent and persistent need for energy. Graphite Energy, based in Australia, recognized this disconnect and set out to create a solution to it.

A Platform Gaining Momentum: Announcing New InfluxDB Features for Industrial IoT

Data – specifically time series data – continues to be the key ingredient for successful digital transformation. No matter the industry, time series data helps companies understand the activities and output of people, processes and technologies impacting their business. The effective management and use of time series data has emerged as the best path towards this goal.

4 Tools to Drive More Traffic to Your Law Website

Today, having a website is essential for all types of business, including law firms. A website gives you an ideal platform to engage with your existing and potential clients, thereby boosting your company’s image in the long run. Given that significance, you should take your time to create a visually appealing law website that converts possible leads into paying clients. With that said, what should you do if your website is barely generating enough traffic?

DevOps State of Mind Episode 8: What do DevSecOps and Formula 1 have in common?

Josh Minthorne is the co-founder and global technology director of Axcelinno, an IT technology consultancy and professional services company that helps organizations define and implement their DevSecOps adoption and cloud migration. Today, we're talking about why the security landscape has made companies hesitant to move to the cloud and what they can do to migrate with confidence.

Tips to implement AIOps the right way in 2022

A lot has changed in recent years. From the way we work to the way IT operations are executed, business strategies have changed overnight with emerging advances like machine learning, automation, and artificial intelligence. These technologies have changed present-day applications and IT operations, and with AI and ML on board, IT organizations can take on more complex tasks and resolve issues across complex infrastructures.

Learn how to get started with Grafana Cloud, Grafana OnCall, Grafana Tempo, and the Grafana Stack

Are your metrics, logs, and traces playing hard to get in your current observability setup? Feel like your on-call messages are left on read? Is the heatmap between your data sources fizzling out on your dashboard?

Broadcom Software Launches Cloud-Based Log Analytics Service for Data-Driven Network Visibility

Human operators using traditional network monitoring software with methods like SNMP, ping, or flow tracking are still limited to diagnosing and triaging issues within the four walls of the on-premises data center. But with increased adoption of cloud, SD-WAN and “work from anywhere,” application workloads are getting more distributed and creating network monitoring visibility gaps.

Minimize downtime, and improve performance for Verizon 5G Edge applications with Sumo Logic

It is safe to say that customers and enterprises have come to expect their digital experiences to be near instantaneous. Fifty-three percent of consumers will wait no more than three seconds for a web page to render before abandoning the site. But new technologies, like connected vehicles, AR/VR, and industrial automation, are pushing the limits of what traditional architecture can handle when it comes to delivering ultra-low latency.

The Future Workplace Reading List: 6 Impactful Books for IT Leaders

Business leaders are always on the hunt for that life-changing book – the one that deepens their understanding of the workplace and provides an inspiring roadmap for the way forward. But with thousands of highly-praised books to choose from, it’s difficult to know which ones are retreading the same old ground and which contain truly groundbreaking insights. If you can relate, you’ve come to the right place.

Managed VictoriaMetrics announcement

VictoriaMetrics is a fast and easy-to-use monitoring solution and time series database. It integrates well with existing monitoring systems such as Grafana, Prometheus, Graphite, InfluxDB, OpenTSDB and DataDog - see these docs for details. We are glad to announce the availability of Managed VictoriaMetrics at AWS Marketplace - try it right now!

Network and Infrastructure Engineers: Here's How to Up Your Cloud Networking Game

It's easy to get caught flat-footed on things like collecting logs and metrics at cloud-scale, making sense of cloud network performance and health using old-school data like port numbers and IP addresses, and automating processes like troubleshooting and remediation when you don't own the underlying infrastructure.

The Network Pro's Guide to the Public Cloud

Top 3 gotchas in AWS, and how network observability helps you avoid them. As you transition apps and services to the cloud, whether fully public or hybrid, the networking component quickly gets complicated. Your team not only has to manage on-premises networking, but also cloud networks now - which requires a different approach to handle things like VPCs and connectivity back to on-prem environments and other clouds. Poor performance and high costs can quickly hinder cloud projects from being successful.

What is AIOps. 4 Types of AIOps Platforms. How to Effectively Navigate the AIOps Landscape.

AIOps, or Artificial Intelligence for IT Operations, refers to a set of technologies that augment human decisions with autonomous decisions driven by AI and machine learning that learn patterns and relationships from data. AIOps is the term originally coined by Gartner, and pictorially illustrated in the following way.

Datadog Serverless Monitoring for Amazon API Gateway, SQS, Kinesis, and more

Many organizations leverage AWS to build fully managed, event-driven applications, which break down complex workloads into APIs, event streams, and other decentralized services in order to improve performance and scalability. This type of architecture relies primarily on AWS Lambda functions to process synchronous and asynchronous requests as they move between a workload’s resources, such as Amazon API Gateway and Amazon Kinesis.

How to take action from Datadog Apps

Engineers who support production environments are tasked with resolving new issues as quickly and efficiently as possible. But as they look to carry out these responsibilities, their remediation workflows tend to take on the following pattern: For example, someone on your team might discover in a log analysis tool that a user is flooding a key service by making an abnormal number of requests.

Dashboard Fridays: Sample SQL AdventureWorks Dashboard

Join Adam Kinniburgh and Shawn Williams in this latest Dashboard Fridays episode on SQL AdventureWorks. Microsoft provides a common dataset called AdventureWorks when learning how to use SQL Server. Using SquaredUp, the AdventureWorks dataset can easily be visualized in any organization. In this short video, we'll demonstrate how this dashboard was built using SquaredUp dashboards, the challenges it solves, and how you can easily replicate it in your own environment.

Transforming application logs into metrics with Istio and Grafana Cloud

Do you actually know what your customers are looking for? A way to uncover new business opportunities is to analyze your system, collect what you really need, and visualize it through a comprehensive graph! Log traces are a great place to start because they usually contain useful information on your customers' interests. You just need to transform them.

Can your AIOps platform do Log Noise Reduction in addition to Alert Noise Reduction? If not, it is time to re-evaluate your AIOps

One of the core value propositions of AIOps platforms is to increase IT efficiency & productivity by applying AI & ML techniques to perform Alert Noise Reduction. This in turn translates to direct cost reduction due to savings in IT man-hours. In this approach, the AIOps platform becomes a gatekeeper for all the IT alerts/events, and it can effectively reduce and correlate such events so as to send meaningful incidents to the NOC or Service Desk.

How to Troubleshoot Networks with Employees Working From Home | Obkio

With many employees now working remotely, IT teams have had to change the way they manage their networks and services. Intermittent network issues are hard to troubleshoot and more so with remote users working from home. Many of our customers have used Obkio to identify and troubleshoot network issues for their remote employees working from home. From working with these customers, we’ve encountered a lot of similar issues that remote employees often encounter.

Network AF, Episode 9: Learning from great mentors and by breaking things with Hank Kilmer

In a new episode of the Network AF podcast, your host Avi Freedman interviews Hank Kilmer, VP of IP engineering at Cogent. Hank has been running major internet backbones since the early 90s. He joined Cogent in 2011, and prior to that, held leadership positions with UUNET (now Verizon), Sprint, Digex, Abovenet and Terrapin Communications.

Server Uptime Monitoring: What, Why, and How?

In an earlier blog post, we discussed how server performance monitoring is not just about monitoring CPU, memory, and disk resources anymore. There is more to server performance monitoring than just three resources or metrics. That blog post covered several key performance indicators (KPIs) that IT teams must track to ensure that their servers are performing well. In this blog post, we focus on another KPI – server uptime.
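As a rough illustration of the uptime KPI, the sketch below computes an uptime percentage from observed downtime and the downtime budget implied by a target. The function names and the 30-day window are assumptions made for this example, not part of any particular monitoring product.

```python
def uptime_percent(total_seconds: float, downtime_seconds: float) -> float:
    """Share of the measurement window during which the server was up."""
    return 100.0 * (total_seconds - downtime_seconds) / total_seconds


def allowed_downtime_seconds(target_percent: float, total_seconds: float) -> float:
    """Downtime budget that still meets the given uptime target."""
    return total_seconds * (1 - target_percent / 100.0)


THIRTY_DAYS = 30 * 24 * 60 * 60  # assumed measurement window, in seconds

# One hour of downtime in a 30-day window.
observed = uptime_percent(THIRTY_DAYS, 3600)

# Downtime budget for a 99.9% target over the same window.
budget = allowed_downtime_seconds(99.9, THIRTY_DAYS)

print(f"{observed:.2f}% uptime, budget {budget:.0f} s")
```

Under these assumptions, a single hour of downtime already puts a 30-day window below a 99.9% target, which allows only about 43 minutes.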

ICYMI: Achieving Visibility in Your CI/CD Pipeline With Honeycomb + CircleCI

Before continuous integration came to be, setting up builds was no fun because the complexity and overhead involved in a release cycle were compounded by inflexible, manual processes. The release cycle was slow and often resulted in breaking changes. Continuous integration and continuous delivery (CI/CD) has changed much of that through pipelines that automate how we build and test software. Today, we can deploy, have builds fail, and resolve any errors faster than ever.

New Research: The State of Cloud-Driven Transformation

Over the last couple years, cloud transformation has become increasingly critical, evolving from a preferable priority to an urgent imperative. In our rapidly changing world, organizations have had to innovate at unprecedented rates — and those most successful are harnessing the power of cloud to move faster and smarter. But it’s more than a simple migration.

Datadog acquires CoScreen

At Datadog, we’re dedicated to building a platform that helps teams detect, troubleshoot, and resolve issues in their applications and infrastructure. We know that our customers need to be able to debug issues, explore ideas, and manage incidents efficiently, and that means having access to tools that can help them seamlessly share information and leverage the expertise of their distributed teams.

Traceroute software-the troubleshooting tool your network needs

The need for in-depth network monitoring is growing exponentially as organizations expand in size and more companies are established. Increased monitoring needs demand a feature-rich tool to simplify networks and get a clear view of their underlying infrastructure. Diagnosing network faults and ensuring a well-balanced operation of all network devices is the primary task of a network admin any day.

Best CDNs: United States

The online marketplace grows more competitive by the day, especially for content-rich websites and applications. As America has the third-largest number of internet users (after China and India), a sizeable portion of global online activity is generated from the United States. America also happens to be the world’s current largest national economy.

SCOM Notifications: Kill your legacy email subscriptions in favor of Microsoft Teams

Microsoft Teams had already started to gain popularity before the pandemic and, during it, helped employees communicate and collaborate in their new remote offices. Teams has precursors in Microsoft's portfolio: Skype for Business was retired last year, and since then the number of Teams users has skyrocketed. What sets Teams apart from some of the other communication alternatives is its many functions, which allow for much more than a chat, call, or virtual meeting.

Usual Performance Suspects: Introducing Suspect Spans

A trace is the end-to-end journey of one or more connected spans, and a span is an operation or “work” taking place on a service. So when it comes to debugging a performance issue, being able to pick slow spans out of a lineup is the fastest way to see the root cause and know how to solve it. Suspect Spans surfaces a list of spans that correspond to where the most time in a transaction is spent.
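The general idea of ranking spans by time spent can be sketched as follows. This is an illustrative simplification and not the product's actual algorithm: the `Span` type, the `suspect_spans` helper, and the sample trace are all hypothetical, and span nesting is ignored.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str           # operation performed
    service: str        # service the operation ran on
    duration_ms: float  # time spent in this span


def suspect_spans(spans: list[Span], top_n: int = 3) -> list[Span]:
    """Return the top_n spans with the longest duration, slowest first."""
    return sorted(spans, key=lambda s: s.duration_ms, reverse=True)[:top_n]


# Hypothetical spans belonging to a single trace.
trace = [
    Span("GET /checkout", "web", 120.0),
    Span("SELECT orders", "db", 850.0),
    Span("charge card", "payments", 430.0),
    Span("render template", "web", 35.0),
]

slowest = suspect_spans(trace, top_n=2)
for span in slowest:
    print(f"{span.service}: {span.name} took {span.duration_ms} ms")
```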

Bootstrapping a cloud native multi-data center observability stack

Bram Vogelaar is a DevOps Cloud Engineer at The Factory, and he recently delivered an intro to observability talk during our Grafana Labs' EMEA meetup. When I talk to customers, they might tell me about how their applications are running in two data centers, but when we probe a little further, it turns out that their observability stack is only available in one of them. This revelation hit close to home last March.

Logstash: Path to ECS for 8.0

The Elastic Common Schema is a community-driven effort to provide consistent semantic meaning to datasets so that data from disparate sources can be meaningfully used together. In Logstash 8.0, ECS compatibility is on-by-default — this is a pretty major change to how many plugins operate. In this talk, we outline the rationale behind the transition and also highlight how to opt-OUT of the transition with a simple pipeline setting.

Launching a labor of love, Kentik Market Intelligence

When it comes to the internet, understanding the global ecosystem can be tough. There’s a lot of manual work that service providers and digital businesses have traditionally put into finding the best way to reach customers over IP networks. And more work is needed for benchmarking against competitors and finding the best relationships for peering.

Webinar Recap: Force Multiply Your Security Operations Teams with Cribl LogStream

We hosted a webinar a few weeks back on using Cribl LogStream to make your security operations more scalable, efficient, and cost-effective. The turnout was fantastic and, while we answered most of the audience’s questions live, we couldn’t get to all of them. So I’ll go through the questions we couldn’t get to and offer some answers. Along the way, I’ll also share the results of two polling questions we asked during the webinar.

Azure AD Monitoring Tips and Strategies

The Azure Active Directory (Azure AD) is Microsoft’s cloud-based identity and access management (IAM) service and an identity provider (IdP). Azure AD is the backbone for authentication in Microsoft 365 and for thousands of cloud-based SaaS applications. Azure AD provides several features for your organization and one of the features is the Microsoft Identity Platform.

Icinga Web - Not just Black and White

Most of you know that Icinga Web can be customized with themes. Some of you have also made some! Icinga Web itself has shipped with several themes since the early days. Now with the upcoming major update v2.10 we’ll take themes to their next evolution. But since we’ve postponed this feature, much additional work has gone into it, which I want to outline today. There will be some general hints for module/theme development as well.

Ask Miss O11y: Making Sense of OpenTelemetry-Tracer and TracerProvider

OpenTelemetry is a strong standard for instrumentation because it is built of careful, well-thought-out abstractions created by experts in the space. OpenTelemetry feels painful to start using because it’s full of abstractions that make sense to experts in the space. For a developer who wants to think about their own software and not spend a month becoming an expert in telemetry, this is hard. For high-level conceptual description, there’s the OpenTelemetry specification.

Best Splunk Alternatives [2023]

Every business, from large enterprises through to small startups, needs some level of log management in its day-to-day operations. For large-scale enterprises, Splunk has quickly become one of the most popular log management solutions globally. Splunk was developed for enterprise-level log analysis and Security Information and Event Management (SIEM). The tool can also be used by medium-size enterprises as long as your organisation generates large volumes of machine data and log files.

Getting Started with Arduino and InfluxDB

Time series data differs from “normal” data in an interesting way. The essential characteristic is that the data’s primary point of reference is a timestamp showing at which point in time a sample of data was measured. Time series databases like InfluxDB are helpful for situations that involve this kind of data.

Making a More Accessible navigation

I’m Tim, a Product Design Manager at LogDNA. My team is responsible for creating a beautiful and easy-to-navigate user interface so that you can easily access, and gain value from, your logs. We’ve been working on making our product’s navigation more accessible and are rolling out a mixture of subtle and more noticeable changes.

JavaScript Mutators & The Programmable Observability Pipeline

JavaScript mutators shine among the improvements in Sensu Go 6.5 – they are both more effective and more efficient at transforming Sensu event data than pipe mutators. This post explores the advantages of JavaScript mutators and includes an example, but first, a brief review. In the Sensu observability pipeline, checks generate events, which Sensu then filters, transforms, and processes. A mutator is a component that transforms the event data.

Monitor SAP NetWeaver with Agentil's offering in the Datadog Marketplace

SAP delivers a suite of solutions for managing business operations, such as enterprise resource planning (ERP), customer relationship management (CRM), and supply chain management. Many of these solutions, including SAP S/4HANA, run on top of the SAP NetWeaver application development and integration platform. Enterprises and SAP managed providers often operate hundreds or even thousands of SAP NetWeaver systems, and Agentil provides a centralized, low-overhead way to monitor their entire fleet.

Synthetic testing: A definition and how it compares to Real User Monitoring

Performance monitoring is critical for a healthy software application. If you don’t have synthetic testing or real user monitoring in place, opportunities for performance optimizations are slipping through the cracks. With the guidance of a monitoring tool, on the other hand, you could be fixing problems such as slow-loading pages within the hour. The two main types of application monitoring are Real User Monitoring (RUM) and synthetic testing (or synthetic monitoring).

Tagging in a monitoring tool: what is it and how can it benefit your team?

As you start to have responsibility for more than a handful of SQL Server instances, you’ll need to get more organised. Everyone around you benefits if you’ve recorded basic things like what the server does and who is responsible for it, and we think that a great place to do this is in your monitoring tool (and, better still, if that’s SQL Monitor!).

Real-time drone tracking and management with Grafana

The number of internet-connected assets around us that are powering services and utilities in a wide array of sectors is rising at an exponential rate. As a result, it’s becoming critical for businesses that provide such services and utilities to have an observability stack tailored to the type of physical hardware devices that are generally deployed in swarms.

Financial Services Customer Maintains 99.99% Uptime With LogicMonitor

In this case study video, LogicMonitor is joined by Abrigo, a software company for financial institutions, to discuss the evolution of the financial technology space through the digital transformation era. From supporting PPP loans throughout the pandemic to consolidating a plethora of monitoring tools into one platform for greater visibility and ease of use – LogicMonitor provides Abrigo with the enterprise-grade SaaS monitoring solution it needs to support its customers 24/7, around the globe.

GCISD Accelerates Digital Transformation With LogicMonitor

In this case study video, LogicMonitor is joined by Grapevine-Colleyville Independent School District to discuss the evolution of the education space through the digital transformation era. From keeping tabs on thousands of devices, consolidating a plethora of monitoring tools into one platform for greater visibility and ease of use, and leveraging AI powered alerting and forecasting, LogicMonitor provides GCISD with the enterprise-grade SaaS monitoring solution it needs to support its students 24/7, wherever they are learning.

DeveloperWeek 2022: Front-end Code Observability: Errors, Performance, Web Vitals

Good user experience requires a well performing frontend application. Code observability on a frontend application—to understand errors and their relevancy, performance of transactions, and Web Vitals to quantify website quality—is complex. By watching this video on-demand, you'll learn more about the tools that are available to aggregate and organize relevant frontend data to provide necessary visibility on errors and performance to keep users engaged.

Parallelizing Queries with Rails 7's `load_async`

As you're likely well aware, Rails 7 was released last month, bringing a number of new features with it. One of the features we're most excited about is load_async. This feature allows multiple Active Record queries to be executed in parallel, which can be a great tool for speeding up slow requests. Since Rails introduces an entirely new infrastructure for load_async, Skylight's existing integration wasn't capturing all of these queries.

Datadog on Profiling in Production

Depending on your chosen programming language and stack, you may have never used a profiler in production. The very idea of using a profiler in production for a web service may seem unrealistic, due to the amount of overhead involved. After all, aren’t profilers extremely computationally expensive to run? Despite that reputation, many programming languages have examples of profilers built to run in production. Seeing how your application behaves in production is critical to understanding how it performs in the real world.

Share Progress and Celebrate Wins With Demo Days

In my last blog post, I talked about the cadence of product planning and delivery at Honeycomb. Tucked away in there was a mention of “demo day”—and I’m back to tell you all about that because it’s a pretty big deal around here, and I want to encourage you to give it a try as a way to see progress on new feature development and get folks excited about what’s on the horizon.

Gartner Market Guide for IT Infrastructure Monitoring Tools and What It Means for Broadcom's DX Unified Infrastructure Management Customers

The increasing adoption of modern and cloud-native architectures is enabling enterprises with IT infrastructure that is more dynamic and ephemeral, and thus more resilient. This trend drives infrastructure monitoring tools to transition from simply “keeping the lights on” to providing advanced insights such as predictive analytics for infrastructure workload optimization. Infrastructure monitoring that was once art has become science.

NEW: Splunk Synthetic Monitoring Adds Single Sign-On (SSO) and Security Improvements

Splunk customers are security conscious organizations demanding enterprise-grade features for their global workforce. Today, we are excited to announce several Splunk Synthetic Monitoring updates, including: support for Single Sign-On (SSO) via SAML 2.0, Concealed Global Variables, and an updated synthetic browser version (Chrome 97).

PromQL cheat sheet

PromQL is the dedicated query language for the metrics and monitoring stack known as Prometheus. PromQL is well known for having a steep learning curve. Because of this, we've created a cheat sheet as a reference to help you understand the most common PromQL queries. Please feel free to save the sheet below and share it with any team members that you think would appreciate learning some of the most important queries of PromQL.

What Happens When Digital Transformation Comes to a Complete Standstill?

When an SCCM update failed, this organization found a creative workaround to install MS Teams on 1,200 devices in 48 hours. While every employee is more reliant on their digital experience in remote and hybrid workplaces, the future of work has brought about an increasing number of communication and support challenges for EUC teams. This gap is only exacerbated by the growing pressure on IT teams to deliver unmatched digital employee experience in the ongoing effort to retain talent.

When technology goes wrong: Tesla's outage case study

You’d be right to think that Tesla’s technology surely wouldn’t go wrong, especially with the huge amounts of media coverage it gets. But in 2021, Tesla suffered a few awkward technological faults. You may have read that Tesla went offline, which led to customers around the world reporting issues gaining access to their cars.

Slash MTTR, avoid costly downtime with improved cross-team Collaboration

Every second counts when IT teams are called upon to resolve business impacting issues. In modern enterprises, poor communication, fragmented toolchains and spiralling IT complexity can conspire to slow down incident response, putting service availability and ultimately customer satisfaction in peril.

How to Test Multi-Factor Authentication for Microsoft Teams

Our last blog introduced Multi-Factor Authentication (MFA) for synthetics and discussed how MFA works. Most of our customers use Microsoft Teams as their go-to messaging and collaboration application. So in today’s article, I will show you how to deploy the Teams Audio Video sensor in your environment with an MFA configuration. This will enable testing MFA while at the same time testing the performance of a Teams audio video conference.

UI Breadcrumbs for Android Error Events

When a crash happens in your Android application, you want more context on what occurred before the issue, kind of like following breadcrumbs to the exception. Our SDKs automatically report breadcrumbs for activity lifecycle events, system events, HTTP requests, and many more. Now, Android developers will also see UI events listed as breadcrumbs and get the full picture of what happened without ever having to recreate the issue.

How Your Web Monitoring Benefits From Multi-Channel Alerting

Have you ever had to purchase a CPU or a GPU? If so, you have probably come across the term “bottlenecking”. There is a certain threshold where output exceeds ability to process, and that can prevent optimal system functionality. One of the methods used in computing to overcome these bottlenecks is multi-threading, where requests are processed simultaneously by multiple threads. We can apply a similar principle to downtime monitoring.
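The multi-threading analogy can be sketched as dispatching one alert to several channels at the same time instead of one after another. The channel functions below are hypothetical stand-ins that only return a string; a real setup would call an email, SMS, or chat service instead.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical channel senders, each standing in for a real integration.
def send_email(message: str) -> str:
    return f"email: {message}"


def send_sms(message: str) -> str:
    return f"sms: {message}"


def send_chat(message: str) -> str:
    return f"chat: {message}"


def alert_all(message: str, channels) -> list[str]:
    """Send the same alert through every channel concurrently.

    Results are returned in the same order as the channels list.
    """
    with ThreadPoolExecutor(max_workers=len(channels)) as pool:
        futures = [pool.submit(channel, message) for channel in channels]
        return [future.result() for future in futures]


results = alert_all("site is down", [send_email, send_sms, send_chat])
print(results)
```

With this shape, a slow or unresponsive channel does not delay the others from being attempted, which is the point of the comparison with bottlenecking.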

What to Know About Azure SQL Database Serverless Compute Tier

Over the past several years, I've helped numerous customers migrate SQL Server workloads to Azure SQL, including Azure SQL Database, Azure SQL Managed Instance, and Azure SQL Virtual Machines. In this article, I'll explain some of the challenges of optimizing the compute cost for an Azure SQL Database deployment and review how the serverless compute tier can greatly simplify it.

The Observability Lake: Total Recall of an Organization's Observability and Security Data

Enterprises are dealing with a deluge of observability data for both IT and security. Worldwide, data is increasing at a 23% CAGR, per IDC. In 5 years, organizations will be dealing with nearly three times the amount of data they have today. There is a fundamental tension between enterprise budgets, growing significantly less than 23% a year, and the staggering growth of data.
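The "nearly three times" figure follows from compounding 23% a year over five years. A small Python sketch of that arithmetic, where the helper name `growth_multiple` is ours and not from the article:

```python
def growth_multiple(annual_rate, years):
    """Return how many times larger a quantity is after compounding annually."""
    return (1 + annual_rate) ** years

# 23% CAGR over 5 years, the figures quoted in the excerpt above.
data_multiple = growth_multiple(0.23, 5)
print(f"{data_multiple:.2f}x")  # prints 2.82x
```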

Releasing Icinga for Windows v1.8.0 - The Power of Rework

Today we are happy to announce that after months of work we can finally release Icinga for Windows v1.8.0. As discussed in our live Icinga for Windows Q&A on our YouTube channel, we spent lots of time resolving issues reported by our community and customers and in general improved the performance as well.

Are Network Problems Hard to Find? Not for you!

In our daily life we can face different difficulties, from spilling coffee on our clean shirt just before leaving home to not finding an emoji that satisfies us when replying to someone we like. These are stupid little things compared to how difficult it sometimes is for an external IT provider to identify network problems.

10 Best Practices to Get the Most Out of Your IT Infrastructure Monitoring

IT infrastructures are in a constant state of change. From centralized mainframe systems to distributed serverless multi-cloud environments, these changes have happened relatively quickly. And nothing is stopping it. Gartner predicts that by 2023, over 90% of IT organizations will have most of their staff working remotely. This is largely due to companies shifting to using more cloud services. IT operations teams have had to find ways to keep up by implementing effective IT infrastructure monitoring.

5 Capabilities You Should Look for in an Application Performance Monitoring Tool

With the rapid pace of change in applications and the infrastructure they run on, it’s more important now than ever to monitor what’s happening. Using an application performance monitoring (APM) tool can help provide the necessary visibility and insight you need for proactive and reactive problem resolution. But whether you’re looking for a replacement or have no monitoring in place, where do you start?

What is a Supply Chain Attack (and What Can You Do About It)?

Any cybersecurity breach is damaging to individual companies. But when it becomes a supply chain attack, the results can be chaotic and widespread. While most businesses overlook the dangers of supply chain cyber attacks, hackers have not. Malicious actors are continuously looking for, and finding, new ways to invade company networks. With these looming threats, companies must know how to prevent supply chain attacks and find new means of securing against cybersecurity breaches.

InfluxData Named a Winner in 2021-22 Cloud Awards

InfluxDB Cloud wins Best Use of the Cloud in the Internet of Things category. SAN FRANCISCO, February 8, 2022 – InfluxData, creator of the leading time series platform InfluxDB, today announced InfluxDB Cloud was named a winner in the 2021-22 Cloud Awards category for Best Use of the Cloud in the Internet of Things. Now in its 10th year, the Cloud Awards recognize innovation and excellence in cloud computing among startups and global enterprises.

3 reasons top-notch infrastructure performance is critical for service providers

Brocade, a Broadcom Company, named 2/8/2022 as End-of-Support (EOS) for Brocade Network Advisor (BNA), the collection mechanism for Brocade fabrics. Broadcom recommends Brocade SANnav as a replacement for BNA. To continue providing industry-leading infrastructure intelligence, Galileo’s new v2 agent for Brocade will use the REST functionality to collect all the configuration and performance metrics required. Read on for all the details you need to know.

Percepio Releases Tracealyzer 4.6 with Improved Zephyr and ThreadX Support

Percepio®, the leader in visual trace diagnostics for embedded systems and the Internet of Things (IoT), today released Tracealyzer 4.6 with official support for Zephyr RTOS and Microsoft Azure RTOS ThreadX. The new release also includes Percepio’s next generation trace recorder library with improved support for snapshot trace.

New release: SquaredUp 5.4 - Bring your answers to the surface

SquaredUp 5.4 is here and it’s got some brilliant new features and upgrades to help you troubleshoot and find answers faster than ever. You’ll find the upgrades across all three SquaredUp editions – for SCOM, Azure, and the free Community Edition. Check out our release webinar for a detailed walkthrough and demo.

Advanced filters on the upcoming Traces tab, 40+ PRs and getting featured - SigNal 09

Hola! Welcome to SigNal 09, where I will run you through the updates of the first month of 2022! The focus of the month was our upcoming brand new Traces page. It will enhance the application debugging experience manifold with powerful filters to see your data across different dimensions. We also launched our Technical Writer Program. The idea is to educate our community more about SigNoz, OpenTelemetry, and distributed tracing among other things.

A Beginner's Guide for Grafana Loki (Open-source Log Aggregation by Prometheus)

Many logging solutions are available on the market to deal with log data, each focusing on a different part of the logging issue, including log aggregation. These solutions include open-source and proprietary software as well as tools incorporated into cloud provider platforms, and they offer a variety of capabilities to fulfill your requirements. Grafana Loki is a new industry solution, so let's take a closer look at what it is, where it originated, and whether it can suit your logging requirements.

Going off-label with Grafana Loki: How to set up a low-cost Twitter analysis

The term “off-label” is used to describe when a product is being used successfully for something other than its intended purpose. It’s a quite common occurrence in the pharmaceutical industry, but it can also happen in the world of software. Grafana Loki was written as — and is marketed as — a simple, Prometheus-friendly logging backend with a very low total cost of ownership.

A collection of 24 great 404 HTTP error pages

The 404 error is one of the most common web errors experienced by users. There are a number of different reasons that the server might not be able to find the resources requested by the user. For example, if a link on your site points to a non-existent page, a 404 error will be generated by the server. Here is a collection of 24 great 404 pages for your inspiration.

What is Uptime Monitoring and Why You Need It for Your Website

Your website is the lifeblood of your business. It’s how you connect with your customers and market your product or service. You want to know that it’s running smoothly at all times, but that may not always be the case. Sites go down due to many different events such as DDoS attacks, hardware failures, and human error. Luckily, there are ways to monitor your site for downtime and take precautions before it reaches critical status.

Active Directory Auditing Best Practices

Active Directory (AD) is a foundational element of any Microsoft Windows environment because of the part it plays in authentication, access management, account management, and authorization. To ensure the health and efficiency of your Active Directory, it’s crucial for you to engage in proper Active Directory auditing and reporting best practices. This article will detail essential Active Directory auditing best practices and provide recommendations for the best Active Directory auditing tools.

Getting Started with Google Cloud Logging Python v3.0.0

We’re excited to announce the release of a major update to the Google Cloud Python logging library. v3.0.0 makes it even easier for Python developers to send and read logs from Google Cloud, providing real-time insights into what is happening in your application. If you’re a Python developer working with Google Cloud, now is a great time to try out Cloud Logging! If you're unfamiliar with the `google-cloud-logging` library, getting started is simple.

Kubernetes Monitoring: A Beginner's Guide

Kubernetes monitoring involves tracking application performance and resource utilization across cluster components, such as pods, containers, and services. The goal is to gain visibility into the health and security of your clusters. Kubernetes provides built-in features for monitoring, including the resource metrics pipeline, which tracks metrics such as node CPU and memory usage, and a full metrics pipeline.

Is AIOps NoOps? No, But It's the Closest We'll Come

Making IT operations simpler – which AIOps does by helping teams to make smarter, more informed decisions about complex monitoring and APM problems – is great. But what would be even greater is eliminating the need for IT teams to make decisions at all – a prospect known as NoOps. By automating application management to the point that human involvement is no longer necessary, NoOps offers tantalizing possibilities for the IT operations teams of the future.

How to Configure the OpenTelemetry Collector to Begin Collecting Metrics

OpenTelemetry enables Observability, and building observable systems requires you to understand the various ways in which they can fail. Jumping from one possible fix to another and one change to another without fully recognizing the impact on the system can be a significant hindrance to a successful customer experience. In this post, I’ll explain how to get started with OpenTelemetry to help you make your systems more observable.

How to monitor Amazon Kinesis

We live in a world that becomes more connected with each passing day. Public cloud hosts like Amazon Web Services (AWS) provide platforms with a wide array of capabilities that quickly scale based on demand. As a result, we’ve seen an explosion of new applications and services that continue to change our daily lives for the better. Data is a critical component of all of these systems. They can ingest vast amounts of data, process or transform it, and then pass it on.

Is your SAP world ready for the cloud?

Running SAP systems in the cloud can provide a litany of benefits to users at every level of the organization. However, many hidden, unexpected and expensive challenges can quickly arise before, during and after migrating mission-critical SAP systems to the cloud. In this white paper you'll discover the three main challenges plaguing both Enterprise IT operations and Managed Service Providers (MSPs) today, and learn how to overcome them.

How to prevent SAP security vulnerabilities

SAP creates some of the world's most popular products for managing information, with more than 400 million users worldwide. But SAP connectivity presents one of the biggest security risks for your company. In this ebook, we will consider some of the steps you can take to secure your SAP systems: we'll explore how SAP systems can be compromised, plus we will investigate some of the ways to prevent this from happening.
Sponsored Post

An SNMP-enabled temperature + humidity sensor for under $110

Monitoring temperature and humidity in a server room is quite important if you want to reduce the risk of expensive equipment failure. Yet, many server rooms either aren’t monitored at all or rely on ancient wall-thermostats that, in case of a problem, only emit desperate beeps that nobody will hear.

6 ways your organization can benefit from a network management solution

In today’s world, businesses depend on the internet and networks for nearly all their operations. Most large-scale corporations from banks to IT services have their critical operations built around a network. With network types ranging from wired and wireless to virtual environments, network management has only become increasingly complex, and network administrators need all the help they can get.

Use Datadog's Sourcegraph extension to navigate code and visualize service dependencies

Sourcegraph is a universal code search tool that enables you to easily navigate and understand all of your code, regardless of the number of repositories you have and where they’re hosted. Its built-in code intelligence feature lets you jump to the definition and references of functions and variables, helping you learn new codebases faster.

Boost productivity across teams using new monitor permissions

For IT managers, the workload only seems to keep piling up. That trend was glaringly evident in a recent IT industry survey, in which 86% of respondents said their workloads have increased post-pandemic. Just because workloads are increasing doesn’t mean they have to fall into the lap of one or two individuals if you’re using monitor permissions to your advantage.

Dashboard Fridays: Sample au2mator Services Dashboard

Our au2mator customers heavily use the au2mator Self Service Portal to present automation as a delegated task to the Service Desk, Users, and Admins. This dashboard shows how au2mator were able to visualize all their services within a single SquaredUp dashboard. Using SQL, PowerShell, Azure Log Analytics, and Web Content, au2mator made a dashboard that looks simple but visualizes a lot. Join Adam Kinniburgh and special guest Michael Seidl from au2mator as they demonstrate how this dashboard was built, the challenges it solves and how you can get the dashboard pack!

A look at how the U.S. Department of Defense deploys the Grafana stack

In September 2021, the U.S. Department of Defense’s Iron Bank formally authorized Grafana, Grafana Enterprise, and Grafana Loki, allowing the 100,000 employees and contractors who work on DoD software, both classified and unclassified, to easily select and immediately deploy Grafana Labs software without additional approvals and security certifications. In our first-ever government session at ObservabilityCon 2021, former U.S.

How to Monitor SD-WAN Networks

To truly see the performance & promise of your SD-WAN network, you need to monitor it. With the increasing use of cloud-based apps, businesses are more reliant than ever on the Internet to deliver WAN traffic. So they're migrating from MPLS networks to hybrid WAN architectures and SD-WAN technology. But many businesses lack visibility of their SD-WAN network. So here's how to achieve the visibility you need to monitor your SD-WAN service.

From eBPF to CI/CD: 12 emerging trends in observability

As businesses accelerate digital transformations and cloud adoption to better serve customers and employees in the face of the global pandemic, operational complexity has also mounted. To untangle these complexities and enable executive visibility into the IT ecosystem, business leaders are increasingly looking to observability solutions as a strategic investment.

How a company saved 32k hours of IT support and $1.6M on their Windows Migration

A Windows migration is like taking out the trash–you can always delay it, but you’ll have to do it at some point (and you’ll be so happy once it’s done). Unlike taking out the trash, however, the risk of failure throughout the entire migration process is extremely high, especially when you consider all the moving parts.

Getting Started with the InfluxDB API

This article was written by Nicolas Bohorquez. Time series databases, like InfluxDB, index data by time. They are very efficient at recording constant streams of data, like server metrics, application monitoring data, sensor reports, and any data containing a timestamp. Data in a time series database is always appended as new, more recent values; previous values are not updated.

Website Change Monitoring - What is it & why your business needs it?

Website change detection is a technique that alerts relevant people when a website is modified or updated. A web crawler can review a website on a regular basis to see whether there have been any changes since the previous time it was checked. Monitoring a web page is considered an important step in marketing, sales and advertising, and product support initiatives.
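One common way to implement the check described above is to store a hash of the page from the previous visit and compare it with a hash of the current content. The following Python sketch assumes the page content has already been fetched as a string; the function names are illustrative and not taken from any specific tool.

```python
import hashlib

def fingerprint(page_content: str) -> str:
    """Return a SHA-256 digest that identifies this exact version of the page."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()

def has_changed(previous_fingerprint: str, current_content: str) -> bool:
    """Compare the stored fingerprint with the fingerprint of the current content."""
    return fingerprint(current_content) != previous_fingerprint

# Example: the stored fingerprint comes from the previous crawl.
stored = fingerprint("<h1>Pricing</h1><p>$10 per month</p>")
changed = has_changed(stored, "<h1>Pricing</h1><p>$12 per month</p>")
```

A plain hash flags any byte-level difference, so a production crawler would typically strip timestamps, ads, and other volatile markup before fingerprinting.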

Tracing on the Race Track

Today is test day at Curborough Sprint Track, and the University of Nottingham’s Race Team is taking its creation out for a spin. Frankie is their fully electric 2WD vehicle, able to achieve speeds of up to 80mph (129 km/h). The Team uses Tracealyzer to test the functionality of their embedded software: while writing the code, to record the trace while Frankie is running, and to review the data afterwards. And with great results.

Sponsored Post

What is MTTR? Resolve incidents faster through ops, alerting and documentation

When downtime strikes any distributed software deployment or platform, it's all hands on deck until the lights are green and service is restored. This process, from the recognition of a problem to a deployed solution, has most commonly been defined as MTTR - mean time to resolution. In just the last few years, DevOps and site reliability (SRE) professionals have developed sophisticated new models for how they work and audit their successes. In 2022, MTTR is one of the most widely-used software performance success metrics.
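To make the metric concrete, here is a minimal Python sketch that computes MTTR as the average time between recognizing a problem and deploying a fix. The incident timestamps are made-up sample data, and the function name is our own.

```python
from datetime import datetime, timedelta

def mean_time_to_resolution(incidents):
    """Average the (resolved - detected) duration over a list of incidents."""
    durations = [resolved - detected for detected, resolved in incidents]
    # sum() needs a timedelta start value because the default start is the int 0.
    return sum(durations, timedelta()) / len(durations)

# Made-up sample incidents as (detected, resolved) pairs.
incidents = [
    (datetime(2022, 2, 1, 9, 0), datetime(2022, 2, 1, 9, 30)),    # 30 minutes
    (datetime(2022, 2, 3, 14, 0), datetime(2022, 2, 3, 15, 30)),  # 90 minutes
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # prints 1:00:00
```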

Introducing Datadog Application Security

Securing modern-day production systems is expensive and complex. Teams often need to implement extensive measures, such as secure coding practices, security testing, periodic vulnerability scans and penetration tests, and protections at the network edge. Even when organizations have the resources to deploy these solutions, they still struggle to keep pace with software teams, especially as they accelerate their release cycles and migrate to distributed systems and microservices.

Monitor mainframe performance with mainstorconcept's offering in the Datadog Marketplace

mainstorconcept’s z/IRIS software provides performance monitoring solutions for IBM mainframe z/OS systems, so you can assess your mainframes’ health and their impact on mission-critical services. With support for OpenTelemetry, z/IRIS creates integrable observability data from your mainframe systems.

How to monitor your uptime with OnlineOrNot

Jumping into monitoring software for the first time can be pretty overwhelming. If you're not in an exploring mood, it can be easy to get lost when you're not entirely sure what all these knobs and buttons do. To help lighten this feeling for OnlineOrNot, I thought it might be useful to let folks know how I use OnlineOrNot to monitor OnlineOrNot (as part of running OnlineOrNot day to day). Also, our friends at DebugBear wrote a similar article about how DebugBear uses DebugBear to keep their site fast.

Defining your naming conventions: The key to a structured SCOM environment

When it comes to sophisticated software like System Center Operations Manager (SCOM), where structure is vital to maintain your environment, your naming conventions are the key to long-term success. Many different people are involved in the process of monitoring. To maintain documentation and procedures, you must define how you should name the parts used in SCOM: everything from the management group name to groups, management packs, override management packs, views, and folders.

10 Website Performance Statistics Every SRE Should Know For 2022

Two major shifts are simultaneously taking place in the world of website monitoring: the acceleration of digital dependence has increased the need for high-performing websites and the frequency (and severity) of downtime outages continues to climb. These shifts have made it more important than ever for businesses of all sizes and industries to monitor uptime and page speed.

Pro tip: How to use semi-relative time ranges in Grafana

If you’re even the slightest bit familiar with how Grafana dashboards work, you’ve probably realized that the time range selector is one of the most important features. After all, when you’re using Grafana to visualize time series and logs, defining a time range is required for metrics and logs queries.

What Challenges Does a "Single Pane of Glass" Bring to Enterprise Data?

If I had a penny for each time someone asked for a single pane of glass view across my 20 years in the application monitoring (now observability) space, I would be retired instead of writing this blog. But, on the other hand, I’d be in big trouble if I paid out each time we failed to finish that ask.

You can judge your monitoring by the tools you use

Whether you are a DIY ace or a master at roast beef, a decorated luthier or the best seamstress in the neighborhood, we all love to work with good tools, right? This includes, of course, good IT professionals, because IT monitoring tools are fundamental when it comes to supervising a network infrastructure and applying the corresponding policies and security measures. Even so, not every monitoring tool is perfect; in fact, some could even get to the point of harming us. Let’s take a look!

Ask Miss O11y: Observability vs BI Tools & Data Warehouses

Yes! While data is data (and tools exist on a continuum, and can be and often are reused or repurposed to answer questions outside their natural domain), observability and BI/data warehouses typically exist on opposite ends of the spectrum in terms of time, speed, and accuracy, among others.

Code coverage for eBPF programs

I bet we all have heard so much about eBPF in recent years. Data shows that eBPF is quickly becoming the first choice for implementing tracing and security applications, and Elastic is also working relentlessly on supercharging our security solutions (and more) with eBPF. However, one major challenge is that the eBPF ecosystem lacks tooling to make developers' lives easier. eBPF programs are written in C but compiled for a specific ISA that is later executed by the eBPF virtual machine.

The Question Isn't Whether You're Overspending in the Cloud, It's by How Much

Everyone is doing it. No, I am not talking about the latest TikTok challenge… The thing that everybody is doing (every company, that is) is spending more money in the cloud than they need to. In fact, 82% of respondents in our own recent survey admitted that their organizations have incurred unnecessary cloud costs.

SRE: How the role is evolving

The growth of site reliability engineering (SRE) has demonstrated that the need for SRE implementations is here to stay for the foreseeable future. LinkedIn voted SRE jobs as the second most promising positions in the US in 2019, and now as we head into 2022, you can be sure to see the evolution of SRE continue to grow and expand. Below, we’ll get into what SRE is, what SRE engineers do, and how SRE will continue to evolve into the future.

How to Get Started with ChaosSearch

ChaosSearch activates your cloud object storage for analytics at scale via multi-API access, with no data movement, no sharding nor re-indexing, and no data retention trade-offs. To help engineers and IT leaders experience the power of ChaosSearch for themselves, we’ve made it easier than ever to get started with our free trial experience.

Get the most out of your Hyper-V infrastructure using ManageEngine OpManager

Virtualization is the technique of creating a software-based virtual version of something, whether that be computers, storage, networking, servers, or applications. Virtualization creates a virtual layer over the hardware, enabling the creation of virtual machines (VMs), which are virtual computers, several of which can run on a single piece of hardware.

Know your network needs: A simple guide to why you need a bandwidth monitoring tool

Understanding the needs of your network is vital to keep your network up and running. In the wake of the remote work era, it’s important to monitor and plan your bandwidth utilization. Recent surveys have reported a 45% increase in VoIP and video traffic as the need for telecommuting has doubled since the pandemic. Business Wire, a broadband provider, also reported a 30% spike in data traffic and a 50% rise in voice traffic since mid-March.

Sponsored Post

Just How Important Is Your Integration Infrastructure?

Most companies take their integration infrastructure for granted. I'm talking about middleware such as IBM MQ, Kafka, Solace, ActiveMQ, RabbitMQ. These form the basis of most enterprise-level businesses. One of our electronic manufacturing customers was building products worth $40K per minute. A failure in one of the factory floor's automated systems brought manufacturing operations to a complete halt.

Analyze Ruby code performance with Datadog Continuous Profiler

Ruby is an object-oriented programming language celebrated for its simple and easy-to-read syntax. It powers Ruby on Rails, the open source web development framework that streamlines common development tasks involved in building web applications. We’re pleased to announce that our Continuous Profiler, which provides low-overhead, code-level performance insights, is now generally available for Ruby applications.

How to get the optimal image size for web

If you’ve ever started a project to improve the load times for your website, web app, or mobile app, your heart is in the right place — but your efforts might not be. For many technology leaders, the first instinct is to blame code and infrastructure. They dive deep into optimizing front-end code, scale infrastructure resources, or migrate to a new type of database-as-a-service offering that promises to process requests a few milliseconds faster.

How Do You Manage A Multi-Platform Infrastructure Quickly and Efficiently?

Today, I would like to simplify the technical advantage that Nastel Technologies offers its clients. In a nutshell, Nastel is the leader in i2M (Integration Infrastructure Management) by managing a multi-middleware-platform infrastructure (MQ, Tibco, Kafka, Solace, …) from one interface.

10 tips for log shipping using Fluentd

Fluentd is an open-source data collector that unifies data collection and consumption. It has different types of plugins that retrieve logs from external sources, parse them, and send them to log management tools like Site24x7 AppLogs. tail, forward, udp, tcp, http, syslog, exec, and windows_eventlog are common input plugins.

Atomic User Journeys

The temptation with synthetic user journeys is to create a single, long-running journey that checks everything in one go and run it every 5 minutes. This may sound like a good idea because one journey is cheaper than five or because there are fewer scripts to maintain, but it will make your life much harder and is likely to still cost as much due to the total time it takes to run. Therefore a good approach to user journey monitoring is to create atomic journeys.

Backed by $2.5B valuation, Sysdig goes channel first

It’s an exciting day at Sysdig as we announce our channel-first approach to doing business. What does this mean exactly? Going forward, we will be conducting sales for all customers outside of the Global 500 through a channel partner. For more than three decades, customers have leveraged channel partners as trusted advisors for vendor-agnostic IT consultation and expertise. Our channel-first approach moves Sysdig in line with how customers buy.

Announcing Grafana Incident, smart incident management for your teams

A huge challenge when dealing with incidents is the coordination and communication needed to put things right. What’s happened so far? Who has tried what query? Did we remember to keep stakeholders informed? What is the severity of the incident? Does this affect customers? Figuring this out requires a lot of back and forth as new team members join the incident.

Grafana Incident: First look at the smart incident management tool

Announcing Grafana Incident, the smart incident management tool for your teams. Grafana Incident allows teams to start collaborating immediately by automatically setting up all the essential spaces and resources needed for incident response, from Zoom meetings and Slack channels to a tracker for important tasks and TODO items. A chatbot offers a command-line interface for managing incidents, and provides the ability to instantly embed Grafana queries, dashboards, and metadata, GitHub issues and pull requests, and more. Grafana Incident is available in preview for Grafana Cloud users.

Get Proactive with Nexthink - Part 1

With the rapid development of innovation within the IT space, teams are dealing with an endless influx of support tickets. In order to provide a consistently exceptional digital experience, IT teams must keep up with this evolution. This entails advancing beyond reacting to incidents that have already occurred – and starting to proactively solve issues before they ever make an impact on employees.

Difference Between Public, Private, and Hybrid Cloud

Cloud computing is vast. It encompasses a huge range of architectural styles, classifications, and types. This complex computing network has transformed the way we work and is a crucial part of our daily lives, both at home and at work. For organizations, there are many ways to “cloud”, but let’s start with the basics of cloud computing; the internet cloud.

Get Proactive with Nexthink - Part 2

With the rapid development of innovation within the IT space, teams are dealing with an endless influx of support tickets. In order to provide a consistently exceptional digital experience, IT teams must keep up with this evolution. This entails advancing beyond reacting to incidents that have already occurred – and starting to proactively solve issues before they ever make an impact on employees.

Get Proactive with Nexthink - Part 3

With the rapid development of innovation within the IT space, teams are dealing with an endless influx of support tickets. In order to provide a consistently exceptional digital experience, IT teams must keep up with this evolution. This entails advancing beyond reacting to incidents that have already occurred – and starting to proactively solve issues before they ever make an impact on employees.

Sneak Peek at Nexthink Engage

Nexthink Engage allows employees to cut through the digital workplace noise with two-way communication. Attention grabbing notifications further reduce the inefficiency caused by emails and ensure employees only respond by sending messages relevant to their digital experience. Nexthink Engage combines employee feedback with Nexthink technical data to solve problems that matter in the workplace.

What is OpenTelemetry and Why is Scout All In?

Before we talk about OpenTelemetry, we should talk about telemetry. Telemetry is: And an instrument is: For the purpose of measuring running computer software and systems, our instruments are virtual instruments. That is to say, code that measures other code. It sounds simple: read a measurement and send it to a remote location. In practice, to make that telemetry data useful in today’s cloud-native and ever more complex environments, there are huge logistical and technical hurdles to overcome.

Grafana OnCall is now generally available on Grafana Cloud, with a generous free tier

Today we’re announcing the general availability of Grafana OnCall on Grafana Cloud for all paid and free plans. A big part of delivering great software is ensuring the right people get the right information when the inevitable incidents occur. We want to help you do that with Grafana OnCall, an easy-to-use, developer-first on-call management tool that’s built on top of the Grafana stack you know and love.

Monitoring and Managing Azure Active Directory Users

This blog post is part 2 of our Monitoring Microsoft Azure Active Directory series. Managing identity is a big challenge in a cloud environment, especially when users can potentially log in from anywhere. Additionally, users can often use different types of devices to log in and access cloud-hosted resources. Without a central authentication and authorization source, it is very difficult to manage who can log in to what and who can do what with a cloud resource.

Icinga L10n - The Future is Here

It’s been almost two years since I introduced you to Icinga L10n. Back then, I talked about a place online to ease collaboration. So, without further ado, let me introduce you to translate.icinga.com! We’re using Weblate there and hope that interested users may already be familiar with it. If not, the basic tools are easy to grasp, and it has extensive user documentation.

Tapping Into the Hive Mind: Sharing Query History

Ever wonder how your teammates go about debugging? When you use Honeycomb, you’re not only getting observability into your systems; you’re also getting observability into how your teammates use Honeycomb! Very meta, no? You’re never alone when writing or running Honeycomb queries. Opening up the right sidebar will show you the queries your teammates have recently run on the same dataset.

Beyond IT Operations: Why Developers Need AIOps, Too

To date, AIOps has been a solution first and foremost for IT operations teams. In other words, AIOps has been used primarily to help IT teams manage what happens in the post-deployment part of a CI/CD pipeline, when they need to detect and remediate issues in production environments. That doesn’t mean, however, that AIOps leaves developers out of the picture. Although the conversation surrounding AIOps hasn’t paid a lot of heed to developers so far, it’s perhaps time to change that.

Getting Started with Dart and InfluxDB

You just launched your application and it’s attracted more users than you were expecting. Your web server is bombarded with data. Now you need to know more about your users: what is the dominant device they’re using, and how long are they staying on the app? A time series database will help you answer these questions. It allows you to save data for a given point over a specified period of time, which gives you insight into what type of usage you’re getting and when.

Introducing Multi-Factor Authentication for Synthetics

Multi-Factor Authentication (MFA) provides an enhanced security mechanism for your entire organization by requiring multiple methods of authentication credentials. Using traditionally managed passwords for accessing your apps, services, and networks is no longer a secure methodology. Indeed, cyber threats are on the rise. Hackers today employ sophisticated techniques such as spear-phishing or pharming to gain unauthorized access to corporate accounts.

Sponsored Post

Best Practices for IT to Support Hybrid Work in 2022

I hate to say this, but #Omicron is at the doorstep. According to the CDC website, there have been over 60M cases in the US so far. As a result, companies like Google and Apple are delaying returning to the office, while some now call the return date 'history'. Although we cannot predict the nature of the virus, we have some best practices to help our customers and IT manage their employee experience in a hybrid distributed environment.

Containerization and Kubernetes Monitoring

As cloud-native solutions are gaining recognition and becoming a common approach to developing applications, more attention has been directed towards container orchestration and Kubernetes. Both concepts within the realm of IT have been around for a while. Thanks to the technologies' maturing and cloud adoption, they've recently gained significant attention. We all know that software containers are far from traditional shipping containers, yet they function similarly: they standardize and combine.

Tutorial: Auto-instrumentation of a Java app by OpenTelemetry for K8s Environment

This tutorial demonstrates how to easily auto-instrument a Java app with OpenTelemetry for Kubernetes, using a sample Java app. It also shows how to connect it to the hosted collector and trace the transactions in Sumo Logic. Learn the prerequisites and the detailed step-by-step auto-instrumentation process in this tutorial. Reference Links: Links to refer to or download useful material to try the steps independently.

An advanced guide to network monitoring with Grafana and Prometheus

In your career, if your role has ever included the monitoring or managing of any network infrastructure devices such as switches, routers, firewalls, etc., you’ve very likely heard of SNMP. In case you haven’t, SNMP stands for Simple Network Management Protocol, and, contrary to what its name suggests, it is anything but simple. It is a standard protocol for collecting information from network devices and organizing it in a way that humans can (sort of) understand.

Network Testing: How to Test Network Performance

Our networks are always changing and evolving. Testing the network for higher speeds and better application and network device performance, and again after a new service deployment or migration, helps us understand the impact of changes on our network. In this article, we’re running you through how to test network performance using Network Testing and Monitoring tools.

Better together: Rollbar and Datadog

Modern software development is a high-pressure affair. Competition means getting to market faster with higher-quality code and being able to release software more quickly, monitor it, and find and fix problems fast. By using modern tools and building a new approach and workflow that allow for monitoring, observability, and intelligent, actionable alerts, it is possible to achieve faster release cycles with higher code quality.

Navigating venture capital and networking with Alan Cohen | Network AF Episode 10

Alan Cohen, partner at venture capital firm DCVC, sits down with Avi to talk about his experience working in networking and security. During the conversation, the two discuss Alan's history working for Nicira, Cisco, Illumio, and VMware. They also cover the advent of virtualization and multi-cloud, and strategies he has learned throughout his venture capital days to reach and grow entrepreneurs' businesses.

2021 Pingdom Web Performance Year in Review

Welcome to 2022! We hope it will be a great year. First, the team at SolarWinds® Pingdom® would like to thank all Pingdom users for their support and loyalty in 2021. Thank you for trusting us to monitor your websites, servers, and services in a more volatile internet ecosystem. And while we look back at 2021 web performance in review, we can’t wait to monitor, report, and analyze more online events in 2022.

Data Lakes and Beyond: Complementing the New AWS CloudTrail Lake Service With LogStream

AWS announced CloudTrail Lake on January 5th, 2022, as a fully-managed solution for storing and querying CloudTrail logs. At first glance, it is straightforward to set up, can be enabled for all your organization’s accounts with a radio button, and keeps data for up to seven years by default! It’s a huge time saver and headache eliminator for many, as getting CloudTrail from all organization accounts to a SIEM can be tedious and time-consuming. But all this comes with a cost.

Living Your Stream: Build Your Observability Data Pipeline with Cribl LogStream Free

Our mission at Cribl is to unlock the value of all your observability and telemetry data, regardless of source or destination. We aim to give you choice and control over your data—because we know data has different value to different stakeholders at different times in the data lifecycle. Users are just scratching the surface in terms of the ways they are finding value from Cribl LogStream.

Six reasons why SAP monitoring is important for CIOs

Gartner reported that unplanned IT system downtime costs $5,600 per minute, or roughly $300,000 per hour. IDC, on the other hand, estimates that system outages can cost a company between $500,000 and $1 million or more per hour (“DevOps and the Cost of Downtime: Fortune 1000 Best Practice Metrics Quantified”). With more than 345,000 customers in more than 180 countries and 15,000 partnering companies globally, SAP is the world’s third largest independent software manufacturer.

AIOps: What It Is, and How It Can Streamline IT Services

In recent years, the adoption of artificial intelligence has been on the rise. Different sectors of service providers are witnessing a massive integration of AI within their workflow. This singular action has given birth to a better work pattern and greater service delivery. This is because artificial intelligence is changing the narrative and dictating the pathway for the future of work.

AppNeta is Now Part of Broadcom and Will Lead Industry in Network Visibility Anywhere

Broadcom officially closed on the acquisition of AppNeta on Jan 31, 2022. This marks a new beginning for AppNeta and the Broadcom network monitoring software business. AppNeta will take the lead in our vision to enable Network Visibility Anywhere, focusing especially on operational blind spots and experience in the last mile. We aim to ensure a quality digital experience anywhere while working, transacting, communicating and automating.

STOP PRESS: How website downtime affects your brand

Website downtime happens to the best of us; even mammoth websites like Amazon, Facebook, and Twitter experience it. Luckily for these big companies, they’re so established and have such a huge customer base that downtime is unlikely to make them lose a large proportion of their customers. It will, however, cost them large amounts of money for every second that their website is down. Take Amazon as a prime example.

A Guide to Systematically Identify and Reduce False Positives

False positives waste time, cause alert fatigue, and can be extremely expensive. Any time spent by the ITOps teams on false positives is an avoidable cost affecting the company's top line. ITOps teams regularly identify alert fatigue as a cause of overwhelm, so much so that they mentally shut the alerts off. They become desensitized to the alerts and begin to ignore them, consciously or otherwise.

How HEAL Augments Your Monitoring Setup

In 2021, having too many monitoring tools doesn't necessarily mean you have 100% uptime. In this ebook, we discuss the gaps in what the industry needs out of an AIOps/APM tool and why current technologies are failing. We will also give a primer on how HEAL bridges these gaps to help you achieve the holy grail of 100% uptime with proactive, preventive AIOps.

Hybrid Cloud and the Network Observability Gap

Organizations are adopting cloud in a big way. Some went whole hog right away, but most took a hybrid approach for security, compliance, or just to move more cautiously. No matter the reason, hybrid clouds can leave network pros a little, well, foggy. No longer can you see, fix, and run your network across your data center, private clouds, public clouds, and SaaS. Ai ai ai! Never fear! Read this short white paper on how to achieve network observability nirvana in your hybrid cloud.

How to Continuously Monitor Critical Cloud Services with Synthetic Testing

A guide to assuring performance and availability of critical services across public and hybrid clouds and the internet. You're responsible for monitoring the performance and availability of critical cloud services. If your users start complaining about slow response times, it's tough to get started when you can't even see where the problem is. This is where synthetic testing comes in, to help you find problems before they affect your users. This means fewer complaints and happier customers!