Operations | Monitoring | ITSM | DevOps | Cloud

September 2021

Why Windows service monitoring is vital for optimal network performance

A network service is an application running at the network application layer and above that provides data storage, manipulation, presentation, or communication. It is often implemented using client-server or peer-to-peer architecture based on application-layer network protocols. Windows services are critical processes that support vital server functionality. Sometimes these services fail to start or they stop working.

Rollbar Pro Tips: Telemetry

Telemetry allows you to view a 'breadcrumb' trail of events leading up to an exception directly in the Item details view. Telemetry is currently available in the JavaScript, iOS, .NET, and Node.js SDKs, with more to come! Rollbar is the leading continuous code improvement platform that proactively discovers, predicts, and remediates errors with real-time AI-assisted workflows. With Rollbar, developers continually improve their code and constantly innovate rather than spending time monitoring, investigating, and debugging.

ScienceLogic Acquires Restorepoint, Expanding Portfolio Into NetOps & SecOps Domains

Hi, my name is Erik Rudin, and I have the privilege of leading our technical alliances and ecosystem team here at ScienceLogic. We are excited to announce that ScienceLogic has acquired the network configuration and change management vendor Restorepoint. With this acquisition, we’re expanding our IT operations business into the Network Operations (NetOps) and Security Operations (SecOps) domains.

AppDynamics vs. Datadog vs. Scout APM

The world currently generates an unprecedented amount of data. Research shows that more data was generated in the last two years than in all of prior human history. The most critical part is analyzing the generated data and observing trends. This is where tools like Scout APM, Datadog, and AppDynamics become crucial. Working with an unprecedented amount of data means tackling thousands or millions of distinct data points.

How Lightbend uses Grafana Cloud to monitor a platform-as-a-service launch

When top companies have needed platforms for their demanding, globally distributed, cloud native application environments and streaming data pipelines, they’ve turned to Akka, the most popular implementation of the Actor Model for cloud native applications running on Kubernetes. The company behind Akka is Lightbend, a leader in the world of cloud native applications and architectures.

We are growing ... and hiring. Come join us!

We are GripMatix, and we believe in no-nonsense tooling that helps organizations improve the quality of their IT infrastructure and applications and ensure Business Continuity. We think that quality tooling does not need too much explanation. What you see is what you get. We provide CITRIX® Ready Monitoring, and because we have many years of experience in IT consulting ourselves, we provide value by offering tooling made by the experts, for the experts.

Q3 Roundup - What's New With Logs? Let's Log About It!

When we launched LM Logs in November of 2020, we knew the product would aid in reducing troubleshooting time and identifying root causes to enable a more proactive approach to not only monitoring and planning, but also taking action. After talking with our customers and understanding what they needed in order to accelerate their business transformation, we focused on a few key enhancements in Q3. TL;DR – Lumber Bob highlights the top features: Want the long version?

How to Troubleshoot Clinician Issues with George Spiers

Check out this Health IT webinar with George Spiers: Citrix CTP and certified Epic Client Systems Administrator. He will share how to troubleshoot and resolve performance issues for clinicians using Citrix and EHR applications (Epic, Cerner, MEDITECH, Allscripts).

Evolve hybrid cloud security to meet COVID-19 challenges

Remote work poses new demands for hybrid cloud security. Here’s how organizations can protect themselves from security threats in this complex landscape. With employees continuing to work from home to meet the social distancing requirements of COVID-19, more people are juggling business with personal engagements, at times outside normal business hours. Often, they’re using their mobile business laptop, as well as a plethora of additional wireless mobile devices.

Salesforce Application Performance Monitoring

Can you imagine trying to keep track of all your prospect- and customer-related activities on a spreadsheet? What about ye olde days of rolodexes (do people still remember what those are?!)? Thank goodness for Salesforce, the Customer Relationship Management (CRM) solution that revolutionized sales, marketing, and customer care - and how we interact with customers in general. Salesforce is a critical component for many businesses.

Planning Ahead to Maximize the Value of Attending VMworld

Attending an event as magnificent as VMworld with a plan of attack, versus without one, is a night-and-day difference. I’ve had the good fortune of attending this event as both a speaker and an attendee, and I look back with fondness on the interactions and experience. However, both those visits were in person. As I prepared to attend VMworld virtually for the first time, I wondered if my plan of attack would change. Turned out, not so much.

Understanding Lambda Sleep Cycles With CONCURRENCY

It started with a simple question: Why did one query take 10 seconds, while another almost identical query took 5? At Honeycomb, we use AWS Lambda to accelerate our query processing. It mostly works well, but it can be hard to understand and led us to wonder: What was really going on inside this box called Lambda? These questions kicked off the development of CONCURRENCY, a new aggregate in the Query Builder that lets us look at how many spans are active at once.
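
To make "how many spans are active at once" concrete, here is a minimal sketch of counting peak concurrency from a list of span intervals. It is not Honeycomb's implementation of CONCURRENCY; the function name `max_concurrency` and the `(start, end)` tuple layout are assumptions made only for illustration.

```python
def max_concurrency(spans):
    """Return the peak number of spans active at the same moment.

    `spans` is an iterable of (start, end) timestamps (hypothetical layout).
    """
    events = []
    for start, end in spans:
        events.append((start, 1))   # a span becomes active
        events.append((end, -1))    # a span finishes

    # Sorting by (time, delta) processes an end before a start at the same
    # instant, so back-to-back spans are not counted as overlapping.
    events.sort()

    active = 0
    peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak
```

Plotting the running `active` count over time, rather than only its peak, would give the kind of curve that shows when work is spread thin and when it is running in parallel.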

Updated: Let's Encrypt's root certificate expiry and what it means

You might have seen the name “Let’s Encrypt” across the internet for the past week and it’s because their root certificate expires on 30th September. It’s been planned for a good long while, with Let’s Encrypt providing users with updates on the expiry and new certificate since 2020.

The Complete Guide to Server Monitoring

IT structures form the backbone of many companies in today's world. Servers allow connected devices to share information within a network by analyzing, processing, and recording information. They store all the company's data and enable users to access their information from any computer connected to the network. It is essential to use server monitoring tools that ensure the servers keep running effectively and smoothly to prevent crashes and keep data safe.

Announcing Support for AWS Lambda Functions running on AWS Graviton2 processors

AWS Graviton2 processors use the Arm architecture to provide high-efficiency, low-cost computing. AWS already offers the ability to provision EC2 instances powered by Graviton2, and Datadog is proud to partner with them for the launch of new Graviton2 compute resources for Lambda functions. In this post, we’ll discuss how Datadog can provide deep visibility into your Lambda functions across whichever platform you’re using.

What is OpenTelemetry: A guide to understanding OpenTelemetry and the way forward

OpenTelemetry is a vendor-neutral approach that enables DevOps and developers to collect performance metrics in a standardized manner. Currently a Cloud Native Computing Foundation (CNCF) sandbox project, OpenTelemetry was conceived by merging OpenCensus, Google's open-source method of collecting metrics and traces, and OpenTracing, a vendor-neutral API to collect traces.

Log Observability and Log Analytics

Logs play a key role in understanding your system’s performance and health. Good logging practice is also vital to power an observability platform across your system. Monitoring, in general, involves the collection and analysis of logs and other system metrics. Log analysis involves deriving insights from logs, which then feeds into observability. Observability, as we’ve said before, is really the gold standard for knowing everything about your system.

Request Metrics Released! What We Got Right.

It's been a long time since we produced a new Request Metrics video and we wanted to give an update! Things have been going well and the product is out there! We made some good choices. Some not so good choices. And we've made many enhancements since launch. Watch Part 1 of our Request Metrics Released series to see how things are going, and what we did right!

New Feature: Slug URLs

Healthchecks.io pinging API has always been based on UUIDs. Each Check in the system has its own unique and immutable UUID. To signal a start, a failure, or a particular exit status, clients can add more bits after the UUID: This is conceptually simple and has worked quite well. It requires no additional authentication. The UUID value is the authentication, and the UUID “address space” is so vast nobody is going to find valid ping URLs by random guessing any time soon.
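
The UUID-plus-suffix scheme described above can be sketched as a small URL builder. This is an illustration under stated assumptions, not the service's client code: the `hc-ping.com` host and the `start`, `fail`, and numeric exit-status suffixes are taken from the public Healthchecks.io documentation as best understood, and the helper name `ping_url` is hypothetical.

```python
import uuid

PING_BASE = "https://hc-ping.com"  # assumed ping host


def ping_url(check_id, signal=None):
    """Build a ping URL for a check.

    check_id: the check's UUID (string or uuid.UUID).
    signal:   None for a plain success ping, "start", "fail",
              or an integer exit status in 0..255.
    """
    # uuid.UUID raises ValueError for malformed input, which guards
    # against building a URL that can never match a check.
    check = uuid.UUID(str(check_id))
    url = f"{PING_BASE}/{check}"

    if signal is None:
        return url
    if isinstance(signal, int):
        if not 0 <= signal <= 255:
            raise ValueError("exit status must be in 0..255")
        return f"{url}/{signal}"
    if signal in ("start", "fail"):
        return f"{url}/{signal}"
    raise ValueError(f"unsupported signal: {signal!r}")
```

Because the URL itself is the credential, a client only needs to issue an HTTP request to the returned address; no extra authentication header is involved.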

Using the Resource Timeline in Request Metrics

The Resource Timeline in Request Metrics is a heat map of all the files requested by your pages. It shows the range of resource load time and critical load events experienced by all users, not just a single point load. The data allows you to see which resources impact your page metrics as well as the variability in their load time.

Let's Encrypt DST Root CA X3 certificate set to expire

If you've been using Let's Encrypt for a while, you may have noticed that their certificates are signed by a root certificate titled DST Root CA X3. That root certificate is set to expire in a few hours. Any certificates still signed by that root will no longer be valid. Luckily, that shouldn't pose a problem for most Let's Encrypt users. For a while now, Let's Encrypt has issued new certificates that chain to both DST Root CA X3 (the one that is about to expire) and ISRG Root X1.

The U.S. Department of Defense formally authorizes Grafana, Grafana Enterprise, and Loki for its 100,000 developers

Not so long ago, development teams working for the U.S. Department of Defense could take anywhere from three to ten years to deliver software. “It was mostly teams using waterfall, no minimum viable product, no incremental delivery, and no feedback loop from end users,” Nicolas M. Chaillan, Chief Software Officer of the U.S. Air Force, said in a CNCF case study. “Particularly when it comes to AI, machine learning, and cybersecurity, everyone realized we have to move faster.”

What Is Distributed Tracing?

Modern software development is evolving rapidly, and while the latest innovations allow companies to grow through greater efficiency, there is a cost. Modern architectures are incredibly complex, which can make it challenging to diagnose and rectify performance issues. Once these issues affect customer experience, the consequences can be costly. So, what is the solution? Observability — which provides a visible overview of the big picture.

Next Generation AWS Lambda Functions Powered by AWS Graviton2 Processors

Modern computing has come a long way in the last couple of years and the introduction of new technologies is only accelerating the rate of advancements. From the immense compute power at our disposal to lightning-fast networks and ready-made services, the opportunities are limitless. In such a fast-paced world, we can’t ignore economics.

Then and Now: Distributed Systems Alerting and Monitoring

Distributed systems are everywhere. Although many teams don’t think of their applications as distributed systems, if they’re developing using container-based microservices and serverless functions instead of a monolith, they’re creating a distributed system. This change also means that monitoring needs are becoming more complex.

SQL Server Storage Best Practices: Choosing Storage Options

Storage is one of the most critical components for any relational database management system, and getting the right storage configuration affects reliability, availability, and performance. When it comes to SQL Server storage best practices, choosing between storage hardware options has changed significantly over the last decade, but that doesn’t necessarily make choosing the correct storage options for SQL Server any easier.

How to monitor a web server running NGINX|httpd

Web servers are software services that store resources for a website and then make them available over the World Wide Web. These stored resources can be text, images, video, and application data. Computers that interface with the server, mostly web browsers (clients), request these resources and present them to the user. This basic interaction determines every connection between your computer and the websites you visit.

Graviton-Based Lambda Functions, What It Means For You

AWS just announced support for AWS Lambda functions powered by AWS Graviton2 processors. These are 64-bit Arm-based processors that are custom built by AWS and offer a better price-to-performance ratio. In this post, let me take you through what we have learnt about this new option and what it means for you.

Troubleshooting Outages at 3 AM with Alert Response

Imagine you are an on-call engineer who receives an alert at 3 AM informing you that customers are experiencing high latency on your website and are unable to shop. Being an incident response coordinator myself at Sumo Logic, I can tell you, I don’t envy that engineer. If this alert fired, this is what would likely follow: The biggest challenge is how to gather this information quickly, so you can decide whether to jump out of bed or go back to sleep.

Sumo Logic Extends Monitoring for AWS Lambda Functions Powered by AWS Graviton2 Processors

Organizations are constantly trying to maintain pace with users' expectations and desires from a digital experience. These users expect an experience that constantly changes based on their preferences and behavior, which means innovating quickly and improving software is critical to user happiness and driving business success.

Introducing Sensu Plus: An Integrated Analytics Solution for All Sensu Users

Following Sumo Logic's recent acquisition of Sensu, Sensu Go is now part of the Sumo Logic Continuous Intelligence Platform, empowering enterprises and developers to quickly get real-time insights from unstructured data for troubleshooting, performance improvement, and security across dynamic multi-cloud infrastructure.

Taming Hybrid Cloud Complexity: One Platform for Monitoring Your IT Universe

The transition to the cloud continues unabated, along with the dramatic increase in operational complexity. Unfortunately, legacy monitoring tools only compound this complexity. This white paper examines how today's hybrid cloud infrastructures pose unprecedented challenges and require modern management approaches.

Making Hybrid Infrastructure Monitoring Work For You

To deliver information, transactions, and interactions quickly and efficiently to your customers, you need to rely on a vast collection of interconnected technologies that work seamlessly together. But as transactions grow in complexity, so does your IT infrastructure. This eBook examines how today's hybrid cloud infrastructures pose unprecedented challenges in complexity and what you can do to meet these challenges with a modern approach to monitoring.

What's new in Sysdig - September 2021

Welcome to another monthly update on what’s new from Sysdig! Happy Janmashtami! Shanah Tovah! 中秋快乐! With lockdown lifting by varying degrees across the world, we hope you had a safe but pleasant holiday! It has certainly been long overdue. Here at Sysdig, we celebrated Labor Day in the USA with an extended weekend and a well being day for the team.

Automate, Group, and Get Alerted: A Best Practices Guide to Monitoring your Code - Part 1

As companies grow, so do their products, teams, and the number of external tools. For engineers, that can mean code sprawl, data silos, notification fatigue, and some “what the…?” moments along the way as they try to make sense of it all.

How Splunk IT Service Intelligence Assures Business Service Performance for Financial Institutions

With an influx of data and technology, financial institutions are transforming their digital services to adapt to shifting regulations, customer expectations and geopolitical trends. They need to digitally transform their business while protecting service performance and availability of their critical business services. Splunk IT Service Intelligence (ITSI) is a premium analytics solution that empowers these teams to gain visibility across their environments and predict incidents before they impact customers. Unlike legacy IT or point-monitoring solutions, Splunk ITSI correlates and applies machine learning intelligence to monitoring data for 360° service visibility, predictive analytics and streamlined incident management.

Troubleshooting the Internet: If You Can't Have Authority, Visibility Will Do

The vast majority of businesses aren’t all on-premises or all-cloud, but rather in some form of hybrid IT middle ground. That means “who owns the system” (and a closely related issue: “who owns the problem”) continues to be a thorn in the side of many IT pros. In this conversation, SolarWinds Technical Content Manager for Community Kevin M. Sparenberg and Head Geek Leon Adato break down some ways technical teams can get around the lack of authority needed to solve problems and keep critical services up and running.

With the Salesforce plugin for Grafana, easily visualize your SFDC data and correlate it with other data sources

Good news for Salesforce users: With the new Salesforce plugin for Grafana, available now with an Enterprise license, you can instantly visualize your SFDC data in Grafana dashboards. Plus, Grafana allows you to visualize the Salesforce data alongside all sorts of other data. One interesting use case is correlating sales data to system metrics and logs, which would be valuable if your company uses any software systems at all to help generate revenue.

An Inside Look at the Exoprise Sales Team with Mark Yohai, VP Sales & Business Development

Interview with Mark Yohai, VP Sales & Business Development at Exoprise. During this video interview, Mark discusses who Exoprise is and what they do, the details of their platform, how the sales process works, the roles they are hiring for, what to expect during the interview process, Exoprise's culture, and more!

Rollbar Tip of the Day: Auto-Resolve

Simplify your error management workflow and configure Rollbar to automatically resolve items on deploy, or set an inactivity timer to resolve items if they have not occurred in a certain number of days. Rollbar is the leading continuous code improvement platform that proactively discovers, predicts, and remediates errors with real-time AI-assisted workflows. With Rollbar, developers continually improve their code and constantly innovate rather than spending time monitoring, investigating, and debugging.

Career and Networking Evolution with BGPMon's Founder Andree Toonk | Network AF Podcast Ep. 1

In our first episode of Network AF, our host Avi Freedman sits down with BGPMon Founder Andree Toonk to discuss the world of networking. Andree is the Senior Engineering Manager at Cisco and has 20 years of experience in network infrastructure. In today's episode, you'll hear insights from Andree that will help you drive your career growth by taking control of your learning experience and finding your mentors. In terms of networking, you'll hear about some of the trends in disaggregation and cloud-scale networking. Listen now for a deep dive into network engineering!

Backbone Engineering and Interconnection with Nina Bargisen | Network AF Podcast Ep. 2

Today's conversation is with Nina Bargisen, Director of GTM Strategy for the Service Provider customer segment at Kentik. She is an experienced interconnection specialist, and today she shares her journey through the networking industry. You'll get to hear about hot topics within the industry, with insights into what we can be on the lookout for in the future. Most importantly, Nina discusses how mentoring, learning processes, and community helped her get to where she is now, and how they can help you get into the world of networking as well. Lastly, Avi and Nina discuss the critical topic of inclusivity and how we can be better. Listen now!

Listen up: The Network AF podcast is here

Hear here! Today we’re very excited to share that our Co-founder and CEO Avi Freedman launched a new podcast, Network AF. If you like nerding out on all-things networking, cloud and the internet, this podcast is for you. If you like networking how-tos, best practices and biggest mistakes, this podcast is for you. If you want to up your poker game, well… this podcast might also be for you.

Boss-Level Log Management for WordPress Site Administrators

WordPress is the most dominant content management system (CMS) in the enterprise website market today. Its open-source nature, thousands of plugins, and wide adoption by commercial hosting providers have bolstered its success. In addition, it’s highly compatible with other website technologies like web servers, database servers, or middleware.

What Is Data Observability and Why Do You Need It?

The word observability has its root in control theory. R.E. Kálmán in 1960 defined it as a measure of how well you can infer the internal states of a system from knowledge of its external outputs. Observability is such a powerful concept because it allows you to understand the internal state of a system without the complexity of the inner workings. In other words, you can figure out what’s going on just by looking at the output.

What Exactly is a Website Monitoring "False Alarm" and Why You Should Care About It

You know what falsehoods are. You know what false teeth are. You may even know some falsehoods about false teeth. But do you know what a website monitoring false alarm (also known as a “false positive”) is? If not, then please keep reading to find out, because it’s a very big deal.

Logit.io Announces The Beta Launch Of Hosted Grafana

We are pleased to announce the beta launch of hosted Grafana in addition to our existing ELK as a Service & hosted Open Distro services. As organisations around the world constantly look for ways to uphold compliance, speed up Mean Time To Repair (MTTR), and reduce the risk of DDoS attacks, managed Grafana plays a vital role in improving metrics observability across the entirety of your infrastructure.

The Cost of Going Before You Know

Like many things in life, when you’re new to the cloud you don’t know what you don’t know. Given that migrating workloads to the public cloud is often a key component of a business transformation initiative, you want to avoid a long, expensive learning curve—especially since accelerating time-to-value is often a major impetus for the move.

Avoid This SLA Fail - Q&A w/ Neil Keating (Chief Experience Officer, Bright Horse)

Recently I got the chance to speak with Neil Keating, Co-Founder and Chief Experience Officer at Bright Horse, a full-service IT experience consulting and training company. Neil’s candor and deep knowledge about IT Operations and digital experience was obvious from the start. Find a brief clip of our conversation here and several helpful nuggets for IT leaders in the text below!

Apache Kafka Tutorial: Use Cases and Challenges of Logging at Scale

Enterprises often have several servers, firewalls, databases, mobile devices, API endpoints, and other infrastructure that power their IT. Because of this, organizations must provide resources to manage logged events across the environment. Logging is a factor in detecting and blocking cyber-attacks, and organizations use log data for auditing during an investigation after an incident. Brokers such as Apache Kafka ingest logging data in real time, then process, store, and route it.

Introducing Pre-Installed Logz.io Metrics Dashboard Bundles

We are proud to announce the launch of direct dashboard uploads with Logz.io. These new metrics dashboard templates are available for 25 different tools, with more to come. Each of these templates is now available to Logz.io customers and covers the gamut of popular monitoring tools used by DevOps teams. Some of these tools also include multiple options. The process is simple. Open the Logz.io app and head to your metrics account.

How to visualize real-time data from an IoT smart home weather station with Grafana dashboards

One of the experiences I’ve truly enjoyed over my first year as a senior solutions engineer here at Grafana Labs has been learning from our community and customers about their own Grafana journeys. I’ve been impressed by some remarkable dashboards for home automation, personal health data visualizations, family Minecraft statistics, and energy usage projects.

We Will All Be Remembered Forever - And There's Nothing You Can Do About It

I want to be remembered. I think a lot of us do. At least, that’s what I used to think. Now I am not so sure. I have a bad habit of looking at the universe through an existential lens where value is measured by impact. Impact, meaning the measurable change created by specific action. Since everything physical ultimately decays, the longest lasting impacts are those that linger in our collective memory. Great works, great triumphs, great discoveries, and great inventions – great impacts.

Micro Lesson: Introduction to Observability Solution

This video describes what observability is, why we need observability, and how it is different from monitoring. The video also explains how Sumo Logic's Observability Solution helps in all the stages of the incident remediation process to ensure the production apps are functioning reliably.

This Month in Datadog: September 2021 (Episode 5)

Datadog is constantly elevating the approach to cloud monitoring and security. This Month in Datadog updates you on our newest product features, announcements, resources, and events. This month we put the Spotlight on Session Replay, go around the Water Bowl with Decalog, introduce a bunch of new features, and give you more information about our flagship conference Dash.

Getting Started with Citrix Remote Monitoring and Management Services

Brian Diamond, CEO of LANStatus, has dedicated the past 25 years to helping educate and support people just like you in finding economical, custom-fit solutions that complement existing teams, not eliminate them. Brian is interviewed by Goliath Technologies CEO Thomas Charlton.

Is Application Sprawl in Government Really a Big Problem?

When considering whether to add more applications or monitoring components to your IT system, the answer should always be quality over quantity. Agencies often fall into the trap of application sprawl by adding more and more to their systems—more applications, more tools—without realizing this actually has the potential to reduce system effectiveness. Instead of simply adding more apps, consider interoperable ones, or ones you can plug into a common platform.

Ingest data directly from Google Pub/Sub into Elastic using Google Dataflow

Today we’re excited to announce the latest development in our ongoing partnership with Google Cloud. Now developers, site reliability engineers (SREs), and security analysts can ingest data from Google Pub/Sub to the Elastic Stack with just a few clicks in the Google Cloud Console. By leveraging Google Dataflow templates, Elastic makes it easy to stream events and logs from Google Cloud services like Google Cloud Audit, VPC Flow, or firewall into the Elastic Stack.

Extending Observability to App Infrastructure

We know organizations today rely on software applications to drive their digital transformation, providing customers with the tools, features, and experience end users have come to expect when transacting, working, and communicating, to name a few. Ensuring a great user experience, however, means making sure the various elements making up a usable application are running smoothly and reliably.

TL;DR InfluxDB Tech Tips: Multiple Aggregations with yield() in Flux

The yield() function determines which table inputs should be returned in a Flux script. The yield() function also assigns a name to the output of a Flux query. The name is stored in the default annotation. For example, if we query the following table: Without the yield function: The following Annotated CSV output is returned. Notice the default annotation is set to _results by default. Now if we add the yield() function: The following Annotated CSV output is returned.

5 priorities for CISOs to regain much needed balance in 2022

Here’s what security leaders need to do in the face of rising stress levels and cyberattacks Nearly 9 out of 10 CISOs say their existing systems secured their enterprise through a shift to remote work, an ongoing labor shortage, and a huge spike in cybersecurity attacks. But that success came with a price: 64% say they’re more stressed out than they were a year ago. How can CISOs navigate a new set of challenges in 2022, while also regaining some much needed balance?

Leveling up your IT management game: 4 best practices for IT infrastructure management

IT infrastructures are constantly evolving, meaning conventional management processes have become outdated and inadequate to tackle complex IT issues. A study by ESG found that 75% of IT decision-makers admit that complexity of IT infrastructures has increased drastically from two years ago. This rapid surge in complexity has disrupted admins’ understanding of network behavior and decreased the chances of foreseeing unanticipated network issues.

News Roundup September 24, 2021

On this day in 1979, CompuServe (CIS) offered one of the first dial-up online services to the masses. It was the dominant internet service provider through the 1990s. By 1981, it had 10,000 subscribers. Within a decade, that number was in the millions. Speaking of how technology makes life easier, here’s the latest news in AIOps, ITOps, and IT infrastructure monitoring.

How to Monitor Multiple Websites With Uptime.com

Monitoring a website can already mean hundreds of checks on all sorts of different pathways, URLs, and other services. Monitoring multiple websites is an ever growing web that can make you start to feel like you’re trapped in an episode of Law & Order. The format of the show (I am talking about the real Law & Order, not its offshoots) involves the crime from occurrence to trial outcome and every beat and interrogation in between.

How to monitor Redis with Prometheus

Redis is a simple – but very well optimized – key-value open source database that is widely used in cloud-native applications. In this article, you will learn how to monitor Redis with Prometheus, and the most important metrics you should be looking at. Despite its simplicity, Redis has become a key component of many Kubernetes and cloud applications. As a result, performance issues or problems with its resources can cause other components of the application to fail.

Intro to distributed tracing with Tempo, OpenTelemetry, and Grafana Cloud

I’ve spent most of my career working with tech in various forms, and for the last ten years or so, I’ve focused a lot on building, maintaining, and operating robust, reliable systems. This has led me to put a lot of time into researching, evaluating, and implementing different solutions for automatic failure detection, monitoring, and more recently, observability. Before we get started: What is observability?

Getting Started with Citrix Remote Monitoring and Management Services

Brian Diamond, CEO of LANStatus, has dedicated the past 25 years to helping educate and support people just like you in finding economical, custom-fit solutions that complement existing teams, not eliminate them. Brian is interviewed by Goliath Technologies CEO Thomas Charlton.

NodeJS Application Manual Instrumentation for Distributed Traces

In this blog series, we are covering application instrumentation steps for distributed tracing with OpenTelemetry standards across multiple languages. Earlier, we covered Java Application Manual Instrumentation for Distributed Traces, Golang Application Instrumentation for Distributed Traces, and DotNet Application Instrumentation for Distributed Traces. Here we are going to cover the instrumentation for NodeJS.

5 Best Tools for Log Collection and Archiving With Guide

Collecting and archiving logs is an essential practice for any organization looking to maintain the performance and security of their network. Logs are like a diary for your devices. They record every message sent from any of your network systems. This information can prove essential for everything from understanding the daily activities of your infrastructure, to improving functionality across your platforms, to identifying and troubleshooting issues.

Catchpoint Digital Experience Score Is An Industry-First

Catchpoint recently announced the Digital Experience Score. This score is the first all-encompassing metric to represent all essential drivers of digital end-user experience. With pressure on IT teams ever growing to fix the IT issues of a remote workforce, we wanted to make troubleshooting as straightforward as possible. The score provides IT teams tasked with improving employee experience with a quantifiable measurement of what each employee is experiencing digitally.

Observability: The 5-Year Retrospective

Two years ago, I wrote a long retrospective of observability for its third anniversary. It includes a history of instrumentation and telemetry, a detailed explanation of the technical spec, and why the whole “three pillars” thing is nonsense. At the time, it’s what was needed to steer conversations away from silly rabbit holes about data types and back to what matters: how we understand our systems.

How the French Ministry of Agriculture deploys Elastic to monitor the commercial fishing industry

Within the French Ministry of Agriculture and Food (the Ministry), our team of architects in the Methods, Support and Quality office (BMSQ) evaluate and supply software solutions to resolve issues encountered by project teams that affect various disciplines. As data specialists, one area we’ve been involved in includes reconfiguring the traceability of activities for the commercial fishing industry.

Elastic APM iOS agent technical preview released

We are proud to announce the preview release of the Elastic APM iOS agent! This release is intended to elicit feedback from the community, while providing some initial functionality within the Elastic Observability stack and is not intended for production use. Now is your chance to influence the direction of this new iOS agent and let us know what you think on our discussion forum. If you find an issue, or would like to contribute yourself, visit the GitHub repository.

Streamline Migration and Application Onboarding in DX APM with EasySeries

To realize the full potential of APM, many customers are migrating from their existing APM 10.7 clusters to DX APM. In addition, they continue to onboard new applications for monitoring. These efforts require a series of steps, including the configuration of experience views, universes, and DX Operational Intelligence services.

Grafana vs Kibana: The Updated Guide For 2022

If you have any experience with comparing open source data visualisation tools then it is very likely you will have encountered both Kibana and Grafana during your research and discovery phase. As two of the most popular solutions for logs and metrics analysis, it can be difficult to distinguish between the two and make the choice to use either Grafana or Kibana depending on the analysis task at hand.

Why LogDNA Received the EMA Top 3 Award for Observability Platforms

We’re honored to be included in Enterprise Management Associates’ EMA Top 3 Award for Observability Platforms. This award recognizes software products that help enterprises reach their digital transformation goals by optimizing product quality, time to market, cost, and ability to innovate—all the things we’re passionate about at LogDNA.

3 ways OpUtils' IP address tracker fosters effective IP management

How a network’s IP address space is structured, scanned, and managed differs based on the organization’s size and networking needs. The bigger your network is, the more IPs you need to manage, and the more complex your IP address hierarchy gets. As a result, issues such as IP resource overutilization and address conflicts become challenging to avoid without an IP address management (IPAM) solution in place.

Tutorial: Setting up AWS CloudWatch Alarms

AWS CloudWatch is a service that allows you to monitor and manage deployed applications and resources within your AWS account and region. It contains tools that help you process and use logs from various AWS services to understand, troubleshoot, and optimize deployed services. I’m going to show you how to get an email when your Lambda logs over a certain number of events.

Using Jaeger for your microservices

Jaeger is a popular open-source tool used for distributed tracing in a microservice architecture. In a microservice architecture, a user request or transaction can travel across hundreds of services before serving what a user wants. Distributed tracing helps to track the performance of a transaction across multiple services. Before we deep dive into how Jaeger accomplishes distributed tracing for microservices-based architecture, let's take a short detour to understand distributed tracing.

Telegraf Integrations with Logz.io

Logz.io is proud to announce a slew of new integrations via Telegraf. Logz.io utilizes Prometheus in its product, but aims to support compatibility across common DevOps tools. A number of our customers, and the community in general, are strong users of Telegraf and its companion apps in the TICK Stack (which includes InfluxDB). Telegraf is not as popular as Prometheus, but it’s a strong element in the DevOps toolbox.

How Sitech builds modern industrial IoT monitoring solutions on Grafana Cloud

Chemelot is an industrial park in the Netherlands with more than 150 companies in chemical and process industries that are working to build the most sustainable and competitive chemical site in Western Europe. Sitech Services is part of making that happen. The Dutch technology firm brings together maintenance and engineering specialists with data scientists to create multidisciplinary solutions that achieve optimal safety, efficient infrastructure, and efficient processes for the plants.

Tracing Issues in Your Application

Imagine that you are receiving a support ticket that your application is not working. You read the attached stack trace and now it's time to solve the mystery—what did the user do that led to triggering this exception? Is it possible to find all the logs from all the applications that correspond to this user's business operation? What if the user is complaining that the system is slow? How can you decide which concrete operation is the culprit? Is there any way to visualize the latency?

Monitoring compute infrastructure with the Cloud Ops Agent

How can you improve observability for workloads that use compute infrastructure directly and run on Google Compute Engine instances? In this episode of Engineering for Reliability, we show how you can use the Cloud Operations agent to do just that. Watch to learn about the Cloud Operations Agent, how to install it manually and automatically, and how to use the data it collects to improve the reliability of your services - and keep your users happy!

Monitoring PostgreSQL With pgmetrics and pgDash

I am currently trialing pgmetrics and pgDash for monitoring PostgreSQL databases. Here are my notes on it. pgmetrics is a command-line tool you point at a PostgreSQL cluster and it spits out statistics and diagnostics in a text or JSON format. It is a standalone binary written in Go, and it is open source. Here is a sample pgmetrics report. Rapidloop, the company that develops pgmetrics, also runs pgDash – a web service that collects reports generated by pgmetrics and displays them in a web UI.

Unexpected Parallels Between Yoga and Observability

Yoga is to ideal human health what observability is to an application’s ideal functioning. It is well established that observability is a critical factor for the successful implementation and maintenance of cloud-native, serverless, cloud-agnostic, and microservices-based applications. Well-established observability helps DevOps and development teams cross the boundaries of complex systems and get complete visibility into their functioning.

What is Proactive Monitoring?

In the realm of monitoring products, proactive monitoring usually means identifying potential issues within IT infrastructure and applications before users notice and complain, and initiating actions to prevent the issue from becoming noticeable to users and impacting the business. Proactive monitoring means a business is continuously searching for signs that indicate a problem is about to happen.

What to expect from an Icinga Fundamentals training

Let’s set the scene: You just started out with Icinga, maybe because you have realised your need for monitoring or you have inherited an environment. Maybe your boss just decided that this is what you are going to do now. So you are now sitting in front of the documentation, maybe started an installation process. But there are all of those terms that you don’t know, things are looking complicated and you don’t even know where to get started in your journey. And that’s okay!

UX Research: The Power of Customer Feedback

Gathering data doesn’t just help a business understand what’s working and not working. It also shows the way forward, enabling teams to make the right improvements to benefit both the business and the people they serve. And in many cases, the most powerful data a business can collect comes directly from its customers. Here at Nexthink, the UX Research team talks to customers and partners to learn about their goals and challenges.

A look at the upcoming improvements to LINQ in .NET 6

When .NET Framework 3.5 was released back in 2007 it included a new feature known as Language Integrated Query, or LINQ for short. LINQ allows .NET developers to write efficient C# code using arrow functions to query collections of objects or even databases using libraries like Entity Framework Core. Like all things with .NET, LINQ continues to evolve over time. The upcoming release of .NET 6 brings a number of really interesting features, including a suite of new LINQ capabilities.

Gearing up your IT ecosystem for the Apple WWDC21 updates

The September Apple Event is one of the most important events for any IT admin because it is preceded by the Apple Worldwide Developers Conference. It witnesses the release of new hardware like the iPhone and, more importantly for enterprises, the release of the latest versions of its operating systems. iOS 15, iPadOS 15, and tvOS 15 were announced. iOS and iPadOS updates rolled out on September 20, while the new macOS will roll out later this year.

What is Distributed Network Monitoring for SaaS and SD-WAN

Nowadays, companies are embracing flexibility. Many businesses are embracing remote offices and working from home, storing their data in the Cloud, ditching centralized data infrastructures, and moving towards networks using SaaS and SD-WAN. With distributed architectures becoming the new normal, it’s important to have a distributed monitoring solution that can keep up. In this article, we’re running you through everything you need to know about how distributed network monitoring works.

Datadog vs. Grafana: Compare Use Cases and Features

The current big data world allows even tiny IT environments to produce massive amounts of information. After determining how to open up various data generation sources, a business analyzes the information. Here, the analysis method you leverage varies depending on the data, the tools/equipment used, and the use case. A good practice is to visualize the data, whether traces, logs, or metrics.

Top 10 Questions About Uptime Monitoring

Monitoring for uptime is becoming increasingly necessary as SaaS and Always-On services integrate deeper with our professional and personal lives. When bottom lines and infrastructure requirements are tied so closely to 24/7 accessibility, making sure your websites are UP becomes priority one. We’ve scoured our support tickets, talked to our users, and kept an ear to the ground to compile the top 10 questions surrounding uptime monitoring and break down the answers.

DBAle 31: Monitoring matters for modern data management

Is it a beer, is it a muffin or is it a Panda Pop? Who knows but at 9% strength, producer Louise joins our Chris duo as plan B, to monitor proceedings. Very fitting as our discussion focuses on monitoring for the modern data age. We talk busyness, hybrid estates, tooling, Multi-RDBMS, and a surprising amount about car mechanics. In The News we debate scrape or breach, and the potential maximum fines for the latest Facebook scandal. So, grab yourself a beer and tune in – cheers.

With Grafana and InfluxDB, CSS Electronics visualizes CAN IoT data to monitor vehicles and machinery

Martin Falch, co-owner and head of sales and marketing at CSS Electronics, is an expert on “CAN bus” data. Martin works closely with end users, typically OEM engineers, across diverse industries (automotive, heavy-duty, maritime, industrial). He is passionate about open source software and has been spearheading the integration of the CANedge with InfluxDB databases and Grafana telematics dashboards.

Rollbar Tip of the Day: Item status & Severity level

Control the state and priority level of your incoming items, and learn how you can quickly act upon them via Slack! Rollbar is the leading continuous code improvement platform that proactively discovers, predicts, and remediates errors with real-time AI-assisted workflows. With Rollbar, developers continually improve their code and constantly innovate rather than spending time monitoring, investigating, and debugging.

Top 6 AWS Lambda Monitoring Tools

Monitoring AWS Lambda performance plays a crucial part in your everyday AWS Lambda usage. Monitoring helps you identify any performance issues, and it can also send you alerts and notify you of anything you might need to know. The world is slowly getting to a point where machines and computers will be flawless, but until then, if we let them perform various tasks for us, we could at least monitor their performance.

Expanded Customer Adoption Lands Splunk First in 2020 IT Operations Analytics Market Share Report

For the 7th year in a row, IDC has ranked Splunk as #1 in ITOA*. We’re thrilled with this news, but let me start by saying that our success is due to the continued success of our customers, and we’re very grateful for the opportunity to be a part of it. Need a refresher on ITOA? We know, we know: another day, another acronym. ITOA is IT Operations Analytics. IDC derived this market from portions of their IT Operations Management (ITOM) software market.

The Challenge of Monitoring a Distributed Network

It’s surprisingly difficult to find information related to monitoring a distributed network. I think this is because, in part, network pros take the term for granted. We all intuitively know what a distributed network is, and the term is pretty common in conversation. But when you start to think about a precise definition, or even search for one online, things get fuzzy. What exactly makes a network distributed? Is a distributed network fundamentally different from a decentralized network?

The Fine Line Between Confidence and Delusion in the Cloud

In a recent survey of hybrid cloud decision makers, we uncovered a disconcerting trend. The vast majority of respondents reported that they are confident in their existing tools and capabilities for a whole host of activities necessary to manage cloud performance and spend … and yet they don’t have many of the tools that are actually needed to perform those tasks.

Industry 4.0 Defined and Explained

With Industry 4.0 fundamentally transforming manufacturing systems and processes through IIoT technologies, manufacturers large and small are seeking the most efficient ways to reap its benefits. Potential gains include optimizing operations, generating data-driven insight, creating new revenue streams, and accelerating innovation. To paint the big picture, let’s start with a definition of Industry 4.0, followed by an explanation of what adopting it involves.

Getting Started with OpenTelemetry and VMware Tanzu Observability

Modern application architectures are complex, typically consisting of hundreds of distributed microservices implemented in different languages and by different teams. As a developer, SRE, or DevOps engineer, you are responsible for the reliability and performance of these complex systems. But while you might have metrics that will help you debug when there’s an issue, metrics alone can’t help you narrow down and ultimately identify the root cause.

3 Ways to Improve DataPower Performance

IBM DataPower Gateway is the latest variant of DataPower and helps organizations meet the security and integration needs of a digital business in a single multi-channel gateway. It provides security, control, integration and optimized access to a full range of mobile, web, application programming interface (API), service-oriented architecture (SOA), B2B and cloud workloads.

Build Vs. Buy Discussion

This is a complex question with deep implications that touch many areas of the business, including the executive team, finance, procurement, and human resources. In this short webinar, we will discuss some of the areas that must be considered and provide you with some strategic ideas to improve your business. This presentation is gleaned from the hundreds of man-years of experience our experts have had in answering this exact question.

How we use Grafana and Prometheus to monitor the traffic of our many GitHub repositories

If you want to understand the popularity of your GitHub repositories, knowing the number of stars isn’t enough. GitHub understands this, and that’s why the team released traffic insights. Anyone with push access to a repository can view these insights, which include: full clones, visitors from the past 14 days, referring sites, and popular content in the traffic graph.

Splunk Delivers Real-Time Salesforce Visibility with New Streaming API Integration

You might already be using Splunk to manage your Salesforce environment with the help of the Splunk App for Salesforce and the Splunk Add-on for Salesforce that allows a Splunk administrator to collect different types of data from Salesforce using REST APIs. This solution is great and the events give you an idea of how users interact with Salesforce. These events can range from Apex executions to page views.

Comparing gRPC performance across different technologies

gRPC is an open-source Remote Procedure Call system focusing on high performance. There exist several gRPC benchmarks including an official one, yet we still wanted to create our own. Why would we torture ourselves doing such a thing? So with those points in mind, we created a completely open-source benchmark where everyone is welcome to contribute and which could be run with a single command, having only Docker as a prerequisite.

Using the Flux VS Code Extension for IoT Application Development

InfluxData prides itself on its effort to prioritize developer happiness. This included providing developers with a variety of tools to interact with InfluxDB v2 OSS or InfluxDB Cloud, so they can pick the development style that works best for them. This article assumes you’re using the InfluxDB Cloud Free tier, which is the easiest way to get started and maintain InfluxDB. You can use any of the following tools for your IoT application development.

Roman Khavronenko | Open-source strategy at VictoriaMetrics

Building a company around a free software product is not something new. What's less common is creating a company in order to build a free software product. This talk will cover our story of creating a time series database, the lessons we learned, and the mistakes we made. The free software world has changed over the last few years. One thing remains essential: the importance of community, the people who use the product.

observIQ Cloud and the OpenTelemetry Collector

Our log agent is powerful, efficient, and highly adaptable. Now, with OpenTelemetry setting new standards in the observability space, we wanted to incorporate that collaboration into our log agent and offer our users the ability to take advantage of the OpenTelemetry ecosystem. Starting today, you can upgrade the log agents in your observIQ account to the new Open Telemetry-based observIQ log agent with a single click.

The More You Monitor: What Are the Three Pillars of Observability?

A common way to discuss observability is to break it down into three types of telemetry: metrics, traces, and logs. These three data types are often referred to as the three pillars of observability. In this episode of The More You Monitor, Product Manager Chris Sternberg breaks down the three pillars of observability and how they can help you gain better control and visibility of your infrastructure, applications, and networks. It’s important to remember that although these pillars are key to achieving observability, they are only the telemetry and not the end result.

The Confident Commit | ep. 11 Observability and CI/CD: meaningful measurement with Charity Majors

Rob sits down with Charity Majors to discuss the journey to creating Honeycomb, business building practices, and the importance of proper CI/CD and monitoring. Charity gives us the latest insights on observability and the necessities for engineering team success. What metrics are meaningful for your team to measure? Which ones are not? Tune in today to find out. Watch, learn, and leave us a comment with your thoughts, questions, or ideas for future podcast episodes.

Understanding Managed Service Providers (MSPs) and What They Do

The field of information technology has advanced at a breakneck pace in the last 20 years. Hence, it has become imperative for any business to know and adopt technologies that can make them productive and more competent at the same time. However, not all firms have the resources to expand their team of IT professionals for several reasons, particularly within small and medium-sized businesses.

ChaosSearch Named "Most Likely to be the Next Boston Unicorn" in Startup Boston's Community Awards

This week, ChaosSearch announced some exciting news -- we're honored to have been named Most Likely to be the Next Boston Unicorn in Startup Boston’s first-ever Community Awards! The award celebrates companies that have made extraordinary achievements in the startup ecosystem in Boston, and represents grassroots recognition from tech entrepreneurs, employees, investors, journalists, and educators in the region.

By 2030, e-Waste Will Reach 2.5 Million Metric Tons Unless We 'Sober Up'

Let’s admit something without shame: we love to use new technology. Maybe the latest smart phone, or a new lightweight laptop or tablet that makes it easier to work from anywhere, it’s all fair game. And when we are at our desk, we enjoy creating expansive workspaces with multiple monitors, a full-size keyboard, and other extras. When we look across our household, we might have duplicate, perhaps triplicate resources growing our technology real estate exponentially.

Model-driven observability: Embedded Alert Rules

This post is about alert rules. Operators should ensure a baseline of observability for the software they operate. In this blog post, we cover Prometheus alert rules, how they work and their gotchas, and discuss how Prometheus alert rules can be embedded in Juju charms and how Juju topology enables the scoping of embedded alert rules to avoid inaccuracies.

Sponsored Post

Kent RO sees 50% faster MTTR with Applications Manager

Kent RO is an Indian multinational healthcare product company and a leader in the reverse osmosis (RO) water purifier category. Founded in 1999, Kent RO pioneered and brought the revolutionary RO technology to India. With a vision to enhance the quality of everyday living, Kent RO has evolved as a market leader that provides technologically advanced healthcare products ranging from water purifiers, air purifiers, to water softeners.

Find the right person at the right time to fix the right issue with SCIM for Okta, Code Owners with GitHub, and more

If you know someone who actually likes managing work across projects, we’d love to meet this mythical being. Because we can’t imagine who enjoys hand-sifting through digital piles of notifications, prioritizing issues, then tracking down the right developer to assign the issue to. And once you’re done with that detective work, your engineer-of-the-hour may not even have access to the right tools to resolve the issue. Who’s got time for all this org chart spelunking?

UptimeRobot August 2021 Update: Configurable Timeout, support improvements and Plesk extension

We’ve been busy this summer and introduced two new features last month – longer intervals and a grace period for recurring jobs monitoring (Heartbeat monitors). There is some good news this month too, so let’s get right into it!

Authentication and Authorization for RESTful APIs: Steps to Getting Started

Why do APIs require authentication in the first place? Users don't always need keys for read-only APIs. However, most commercial APIs require permission via API keys or other ways. Users might make an unlimited number of API calls without needing to register if your API had no security. Allowing limitless requests would make it impossible to develop a business structure for your API. Furthermore, without authentication, it would be difficult to link requests to individual user data.

Avoid dropped logs due to out-of-order timestamps with a new Loki feature

Dropped log lines due to out-of-order timestamps can be a thing of the past! Allowing out-of-order writes has been one of the most-requested features for Loki, and we’re happy to announce that in the upcoming v2.4 release, the requirement to have log lines arrive in order by timestamp will be lifted. Simple configuration will allow out-of-order writes for Loki v2.4.

Scale for fully automated Kubernetes monitoring in minutes

Monitoring your Kubernetes clusters within Splunk Infrastructure Monitoring has never been easier. Just click data setup and select Kubernetes to begin learning more about your Kubernetes environment and workloads. Test drive your free trial of Splunk Infrastructure Monitoring today to seamlessly get your data in to navigate effortlessly and pinpoint problems in real time.

Monitoring password protected sites using Oh Dear

Keeping an eye on your site and sending you a notification when it goes down is one of the core features of Oh Dear. Under the hood, we'll send a request to your site and take a look if the response code is in the 200-299 range, which is the default response code range to indicate that everything is ok. Some of our users are monitoring password protected sites. In such cases, the web server might reply with status code 401 (unauthorised).
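The status-code rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Oh Dear's implementation or API: the function name `is_up` and the `extra_ok_codes` parameter are hypothetical, and the assumption is that a password-protected site would be considered healthy when it answers with 401.

```python
def is_up(status_code, extra_ok_codes=()):
    """Return True if the response status counts as healthy.

    Any 2xx code is healthy by default. extra_ok_codes (a hypothetical
    parameter) lets a check accept other codes as well, such as 401 for
    a site that sits behind a password prompt.
    """
    return 200 <= status_code <= 299 or status_code in extra_ok_codes


print(is_up(204))                        # True
print(is_up(401))                        # False
print(is_up(401, extra_ok_codes={401}))  # True
```

Keeping the accepted codes as a per-check setting means the default 200-299 rule stays strict for every other site.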

What Are JSON Web Tokens?

A JSON Web Token (JWT) is a URL-safe method of transferring claims between two parties. The JWT encodes the claims in JavaScript Object Notation (JSON) and optionally provides space for a signature or full encryption. The JWT proposed standard has started to see wider adoption, with frameworks like OAuth 2.0 and standards like OpenID Connect leveraging JWTs.
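To make the structure concrete, here is a minimal sketch of a signed JWT using only the Python standard library. It assumes the HS256 (HMAC-SHA256) signing algorithm and a shared secret; the helper names `encode_jwt` and `decode_jwt` are illustrative, not taken from any library. It does not check expiry or other registered claims, so real services would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 with the trailing "=" padding removed.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # Restore the padding that b64url() stripped before decoding.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def encode_jwt(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def decode_jwt(token: str, secret: bytes) -> dict:
    """Verify the signature and return the claims, or raise ValueError."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # compare_digest avoids leaking timing information during comparison.
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("signature mismatch")
    return json.loads(b64url_decode(payload_b64))


token = encode_jwt({"sub": "user-1", "role": "admin"}, b"demo-secret")
print(decode_jwt(token, b"demo-secret"))
```

The three dot-separated segments contain only URL-safe characters, which is why the token can travel in a header or query string.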

Automate Your Way to a Better Citrix End User Experience

See how IT generalists can deliver exceptional end-user experience using Citrix, regardless of their individual level of Citrix expertise. Technology is the solution - Goliath’s embedded intelligence and automation automatically discovers and maps your Citrix Delivery Infrastructure, monitors it based on best practices, and then alerts on any issue that may impact end-user experience, so root cause can be isolated and fixed quickly.

Elastic and StackState Team Up to Pull Needles Out of Haystacks

Delivering great performance and reliability for your critical applications just keeps getting harder, doesn’t it? Between microservices, mercurial cloud resources, containers spinning up and down, distributed teams, specialized teams, and developers making changes, it’s an increasingly complex environment. With so many moving parts, if something goes wrong, how do you know what happened where, and what your environment looked like at the precise moment the problem began?

How Refinery Helps With Sampling Complex Event Data

Sampling is the practice of extracting a subset of data from a dataset to make conclusions about that larger dataset. It’s far from a perfect solution, but when it’s implemented with Refinery, Honeycomb’s trace-aware sampling proxy, sampling can help you manage very high volumes of complex event data.
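As a rough illustration of what "trace-aware" can mean, the sketch below makes the keep-or-drop decision from a hash of the trace ID, so every event in a trace shares the same fate. This is one common approach and is not a description of how Refinery works; the function names and the `trace_id` and `sample_rate` event fields are assumptions made for the example.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: int) -> bool:
    """Keep roughly 1 in sample_rate traces, decided by the trace ID hash."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket % sample_rate == 0


def sample(events, sample_rate):
    """Return the kept events, each annotated with the rate it was sampled at."""
    kept = []
    for event in events:
        if keep_trace(event["trace_id"], sample_rate):
            # Recording the rate lets later analysis scale counts back up.
            kept.append({**event, "sample_rate": sample_rate})
    return kept


events = [
    {"trace_id": f"trace-{i % 1000}", "name": f"span-{i}"} for i in range(5000)
]
kept = sample(events, sample_rate=10)
print(len(kept), "of", len(events), "events kept")
```

Because the decision depends only on the trace ID, a kept trace is never missing spans, which random per-event sampling cannot guarantee.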

Elastic named EMA Top 3 Award winner in Automatic End-to-End Observability

We are excited to announce that Elastic Observability has earned the Enterprise Management Associates Top 3 Award for Observability in 2021, a recognition of our commitment to empowering customers with products and features that advance digital transformation and solve real-life problems. This award is driven by EMA’s exhaustive, quantitative research into the top challenges and use cases facing developers, DevOps, SREs, IT professionals, and business professionals.

A simplified stack monitoring experience in Elastic Cloud on Kubernetes

To monitor your Elastic Stack with Elastic Cloud on Kubernetes (ECK), you can deploy Metricbeat and Filebeat to collect metrics and logs and send them to the monitoring cluster, as mentioned in this blog. However, this requires understanding and managing the complexity of Beats configuration and Kubernetes role-based access control (RBAC). Now, in ECK 1.7, the Elasticsearch and Kibana resources have been enhanced to let us specify a reference to a monitoring cluster.

What Is Distributed Tracing and Why You Need It

It is no surprise that monitoring workloads are top of mind for many organizations to ensure a successful customer experience. As our applications become more distributed and cloud-native, we find that monitoring can become more complex. A single user transaction fans out to interact with tens or hundreds of microservices, each one requesting data from backend data stores or otherwise interacting with each other and other parts of your infrastructure.

Nexthink Named Customers' Choice in 2021 Gartner Voice of Customer Report

The Nexthink team is excited to announce that we have been recognized in the September 2021 Peer Insights ‘Voice of the Customer’ report. Customers submitted over 150 reviews with an overall star rating of 4.6 (out of 5) and an impressive 4.7 rating (out of 5) for ‘Support Experience’ as of July 2021. Our team at Nexthink takes great pride in this distinction, as customer feedback continues to shape our products and services.

Service Watch Digital Experience Monitoring

Digital Experience Monitoring (DEM) solutions like Exoprise Service Watch provide the ultimate measure of how IT services and assets are enhancing - or not - the productivity of the business and employees no matter where they work. In this day and age, monitoring is crucial. The future of work is all digital, all connected, and all kinds of complicated.

Microsoft Teams Monitoring by Exoprise

Whether migrating from Skype for Business or embarking on an enterprise collaboration and communications upgrade, Microsoft Teams deployments require planning, assessment and continuous visibility for network optimization. CloudReady Teams Monitoring executes synthetic video, audio and messaging conference calls against an Exoprise AV Bot for end-to-end Quality of Service (QoS) and WebRTC metrics. Easily deployed for 24x7, steady-state visibility into your infrastructure.

It's Here! Monitor Microsoft Teams Audio Video Conferencing

Exoprise released its long-awaited Teams Audio Video Conferencing sensor. This sensor fully tests Audio/Video end-to-end capacity, throughput, and network performance through the actual underlying Microsoft Teams and Azure infrastructure. The Teams AV sensor provides deep insight into a network’s capability to handle the Teams/Skype Unified Communications (UC) platform.

Cannot connect to a website in Vietnam? Try these steps if your website is not accessible.

On September 4, 2021 a major submarine cable broke down in Vietnam causing network connectivity issues for a large portion of the population. Organizations hosted online and those with data centers outside those perimeters were hit the worst with most of their applications down or running extremely slow.

How the new Datadog plugin enhancements extend interoperability for our customers

At Grafana Labs, we have a “big tent” philosophy: We believe our users should be able to determine their own observability strategy and choose their own tools, so we help users bring different data sources together. Datadog is a powerful product used by many teams, and we hear a lot from customers about how we should further embrace and support this critical data source — which is why we created the Datadog data source plugin a few years ago.

How to Debug Cloudflare Workers with AppSignal

In this article, you’ll learn how to capture error logs in your Cloudflare Workers application using AppSignal. We’ll build a simple workers project and integrate AppSignal’s code to collect the necessary metrics. We’ll also learn how to utilize AppSignal’s dashboard to analyze and track errors. Let’s get stuck in!

Catchpoint Co-Founders Q&A: What Better Way To Celebrate Our 13th Birthday?

To celebrate our 13th birthday today, I sat down with Catchpoint's co-founders and my friends, Mehdi Daoudi, Chief Executive Officer, Drit Suljoti, Chief Product and Technology Officer, and J. Scotte Barkan, Chief Technology Officer (dialing in from Long Island after a long week of patch fixes), for an informal chat. We looked back to the days when they all met at DoubleClick prior to the three of them (along with Veronica Ellis, now a Principal Engineer at Eventbrite) founding Catchpoint.

What's new in Elastic Maps: Maps tailored to your geospatial data

Sysadmins, cartographers, and dashboard designers can now personalize Elastic Maps to create richer geodata stories. The 7.14 release of Elastic Maps has the geo capabilities to highlight points of interest, hide unnecessary details, and help you explore new trends in your data. Elastic Maps is available now on Elastic Cloud — the only hosted Elasticsearch offering to include all of its latest features.

Network monitoring: The build vs. buy dilemma

AI-based monitoring and anomaly detection is the key to ensuring that businesses can keep pace with the high level of service required for mission-critical applications. Early, contextual detection is a basic requirement for speedy resolution. AI-based monitoring creates more visibility and provides the agility needed to mitigate the outages, blackouts, glitches and issues that do and will happen.

Proactively monitoring customer experience and network performance in real-time

According to recent Capgemini research, fewer than half (48%) of consumers feel that the connectivity services that they have today adequately meet their remote needs. Still, many CSPs openly admit that they often hear about service issues via social media and sites like Downdetector. And with the fixed/mobile convergence, a negative home broadband experience now has the potential to cause churn for CSPs’ mobile customers too.

Increase approval rates with AI-based payment transaction monitoring

The financial world is being increasingly digitized and decentralized with many more networks, data sources and movements that need to be reconciled and monitored — not to mention satisfy compliance requirements with various regulatory bodies. The digital transformation and decentralization of the Fintech segment has resulted in increasing channel complexities, more third party applications and a higher volume and velocity of payments data that need to be monitored in real-time.

Payment gateway analytics for payment service providers

Payment gateway analytics tracks the payment processing journey and related event data across all payment gateways. When used efficiently, payment gateway analytics can benefit businesses by providing insights into their revenues, payment trends, and customer behavior. Payment gateway analytics provides much needed visibility into the payments environment to enable the fast detection of transaction performance issues, anomalies or trends.

How to Retain Customers on your SaaS Platform: Four Key Techniques

New customer logos may be the lifeblood of top-line revenue growth and the focus of sales and marketing teams. But renewals have emerged in the last few years as a key growth driver for SaaS companies. A recent McKinsey study indicated that existing customers can drive between a third and a half of new revenue growth, even at startups. And we’ve all seen the data that says new customer acquisition costs 5X as much as customer retention.

Why a Slow Website Hurts Your Conversion Efforts

Six years ago, Microsoft found that our global attention span had shrunk from twelve to eight seconds in just five years. This was back in 2015 when Instagram had 400 million users (it currently has over 1 billion), and TikTok, the 15-second-video king, wasn't even born. Yes, our patience is becoming shorter and shorter, and the internet is flooded with websites and content. But all is not lost. If you make sure your website loads faster than average, you may have a chance to bag a potential customer.

Compare and optimize your code with Datadog Profile Comparison

Code profilers offer detailed insight into the efficiency of application code by measuring things like the execution time and resource utilization of a service. Datadog’s always-on, low overhead Continuous Profiler provides snapshots of code performance for a service that are tagged with key metadata (e.g., region, service, release), so you can easily identify and optimize inefficient code.

AWS X-Ray vs Jaeger - key features, differences and alternatives

Both AWS X-Ray and Jaeger are distributed tracing tools used for performance monitoring in a microservices architecture. Jaeger was originally built by teams at Uber and then open-sourced in 2015. On the other hand, AWS X-Ray is a distributed tracing tool provided by AWS specifically focused on distributed tracing for applications using Amazon Cloud Services. Jaeger is a popular open-source tool that graduated as a project from Cloud Native Computing Foundation.

What's new in Grafana Cloud for September 2021: New panels, query caching, synthetic monitoring updates, and more

Here at Grafana, we’re constantly shipping new features to help our users get the most out of Grafana Cloud. Over the last few months, we’ve made it even easier to get started with out-of-the-box dashboards and new visualizations in Grafana Cloud. We also introduced capabilities like query caching, a “prettify JSON” option and commands for cortex-tools to make your data, dashboards, and queries more efficient.

Overprovisioned and Overspent: Optimize Before You Lift and Shift

Do you know what’s wrong with “lift and shift?” Everything. But why? In this video, SolarWinds Head Geek Leon Adato and Technical Content Manager for Community Kevin M. Sparenberg dig into what goes wrong during on-prem to cloud migration of applications and systems in many organizations and how monitoring can help not only avoid those problems but improve the overall outcome as well.

Logging Gitlab Runners for MacOS and Linux

GitLab is the DevOps lifecycle tool of choice for most application developers. It was developed to offer continuous integration and deployment pipeline features on an open-source licensing model. GitLab Runner is an open-source application that is integrated within the GitLab CI/CD pipeline to automate running jobs in the pipeline. It is written in Go, making it platform agnostic. It can be installed on any supported operating system, in a locally hosted application environment, or within a container.

Open-Source Monitoring With SolarWinds AppOptics

In software terms, “open source” means applications and their source code are available for the public to download and modify free of cost. Anyone can access, edit, and supplement the code to create an enhanced version of the application. Vendors often do this by forking the source code to create their own version of the application, marketing their version commercially.

Server Health and Health Checks: A Detailed Guide

Undeniably, monitoring your servers is extremely important. Not only does it help you stop issues daily, but it also helps you with tasks like scaling and capacity planning. But no matter how advanced your monitoring is, it always starts with a simple server health indication. Actually, maybe “simple” isn’t the best word here. “Server health” usually gives you a “healthy/not healthy” indication.
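
The "healthy/not healthy" indication described above can be sketched as a few threshold checks collapsed into one boolean. This is only an illustrative sketch in Python: the `HostMetrics` fields, the threshold values, and the function names are assumptions made for the example, not something taken from the guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostMetrics:
    """Hypothetical snapshot of a server's resource usage, in percent."""
    cpu_percent: float
    memory_percent: float
    disk_percent: float


# Assumed limits for illustration only; real limits depend on the workload.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "memory_percent": 90.0,
    "disk_percent": 85.0,
}


def failing_checks(metrics: HostMetrics) -> List[str]:
    """Return the names of all metrics that exceed their threshold."""
    return [
        name
        for name, limit in THRESHOLDS.items()
        if getattr(metrics, name) > limit
    ]


def is_healthy(metrics: HostMetrics) -> bool:
    """Collapse the individual checks into one healthy/not healthy answer."""
    return not failing_checks(metrics)
```

Keeping `failing_checks` separate from `is_healthy` means the single boolean can drive an alert while the list of failed checks explains why the server was flagged.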

No more searching for a needle in a haystack: A world where Elastic & StackState team up

Meeting the goal of delivering great performance and reliability in the face of our ever-changing, increasingly autonomous IT environments is fundamentally challenged by a data problem. Sure, there’s lots of it - logs, metrics, and APM traces - but it is exceedingly hard to extract actionable information when there are so many fast moving parts.

Broadcom Awarded Highest Vendor Score in EMA Radar Report For NPM

Broadcom is proud to be named a “Value Leader” in the 2021 EMA Radar Report For Network Performance Management. Broadcom received the highest vendor strength score and was selected as having the best alert and alarm management. We believe this recognition validates our strong NetOps vision and our ability to speed the delivery of new network monitoring software innovations that help address the network transformation challenges of our customers.

What is QUIC? Everything You Need to Know

When I hear QUIC, my immediate reaction is, “The QUICk brown fox jumps over the lazy dog.” That sentence has been ingrained in me since my first typing classes decades ago! I doubt the creators of QUIC were going for this type of reaction when they put together the name… but the good news is that this isn’t an article on typing. We’re diving into what the QUIC protocol is, how it works and how it’s used, and how it’s going to impact web traffic in the future.

Why ZE PowerGroup chose Applications Manager to monitor its data analytics platform

ZE PowerGroup Inc. is a British Columbia-based software company. It offers ZEMA, an award-winning data management, analytics, and integration platform. Although ZEMA was created in-house, the developers at ZE were never successful at measuring the performance of the application during the initial years. They tried a few third-party tools, but measuring the actual application performance continued to be a dilemma until they evaluated ManageEngine’s Applications Manager.

Auto-Instrumenting Ruby Apps with OpenTelemetry

In this tutorial, we will go through a working example of a Ruby application auto-instrumented with OpenTelemetry. To keep things simple, we will create a basic “Hello World” application, instrument it with OpenTelemetry’s Ruby client library to generate trace data and send it to an OpenTelemetry Collector. The Collector will then export the trace data to an external distributed tracing analytics tool of our choice.

Status Dashboards: Now with dark mode!

Since we first launched our customizable, brandable, public status dashboards, customers have been asking us for a dark mode version. We’re excited to announce dark mode has arrived! StatusGator is the premier status page aggregator that collects the status of all the services you depend on and organizes them into a handy public dashboard you can send to your team, users, or stakeholders. And now it won’t blind you with its bright white background!

Status Dashboards now automatically refresh!

Many customers have requested that StatusGator’s customizable, brandable, aggregated status dashboards automatically refresh. Well your auto refreshing dreams have just come true because StatusGator status dashboards now update every 5 minutes automatically! Each dashboard will now refresh every 300 seconds, otherwise known as 5 minutes, automatically. And, of course, you can still refresh your browser yourself to get the latest content.

Logz.io's New Lookz is Generally Available!

Back in June, we announced the Public Beta for Logz.io’s New Lookz – which is a new UI that completely changes the way users navigate across Logz.io products and features. The Public Beta gave users the option to toggle between the old and new UIs to see which one they liked better. And the answer from our users was as clear as it could be.

You can now customize the names of services in StatusGator

StatusGator is the easiest way to publish a unified status page featuring the status of all the services you depend on. Our public status dashboards have become a favorite feature allowing schools, startups, and enterprises alike to publish a quick and easy page showing the status of all their cloud services. One commonly requested feature has been the ability to customize the name of each status page listed in your dashboard.

How to extend the Geth collector

This is the last of a 2-part blog post series regarding Netdata and Geth. If you missed the first, be sure to check it out here. Geth is short for Go-Ethereum and is the official implementation of the Ethereum Client in Go. Currently it’s one of the most widely used implementations and a core piece of infrastructure for the Ethereum ecosystem. With this proof of concept I wanted to showcase how easy it really is to gather data from any Prometheus endpoint and visualize them in Netdata.
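
To give a feel for what gathering data from a Prometheus endpoint involves, here is a minimal Python sketch that parses the Prometheus text exposition format. It is not how the Netdata collector is implemented. The metric names in `SAMPLE` are invented for illustration and are not real Geth metrics, and the parser deliberately ignores escaped characters in label values and optional timestamps.

```python
import re
from typing import Dict, List, Tuple

# One sample line looks like: metric_name{label="value"} 42
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

# Hypothetical scrape output; the metric names are made up for this example.
SAMPLE = """# HELP demo_peers Number of connected peers (hypothetical).
# TYPE demo_peers gauge
demo_peers{direction="inbound"} 12
demo_peers{direction="outbound"} 8
demo_head_block 13200000
"""


def parse_exposition(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    """Turn exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples
```

In practice the text would come from an HTTP GET against the endpoint's metrics path, and a maintained parser would be a safer choice than hand-written regular expressions.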

How Do You Know Your Website is Down?

Did you know that California was one of the earliest adopters in the world of automated earthquake detection? The early systems were rudimentary, literally horns strapped to government buildings, but the idea was simple: sound an alarm the moment that an earthquake could be confirmed. The critical period of warning that residents get can make all the difference when it comes to finding shelter and securing your family. In a land where earthquakes level buildings, detection was critical.

Introducing our new support bubble

Like most SaaS products, Oh Dear is a living platform. We add new features proposed by our users, fix bugs that get reported, and, regrettably, also sometimes introduce new bugs. Most users use email to communicate with us. Even though sending an email is often perceived as friction-free, it can be a minor hurdle. We've introduced a little support bubble at the bottom of every page to make it easier for our users to pass us feature requests and report bugs.

3 Ways to Use the xMatters and Google Operations Suite Integration

Not too long ago, you would have needed development experience to oversee the delivery of scalable and reliable software. But with the rise of low-code and no-code tools, that requirement is now obsolete. What used to be hours of coding has turned into a few minutes of dragging and dropping.

Workload Pricing and SVCs: What You Can See and Control

The Cloud Monitoring Console (CMC) lets Splunk Cloud Platform administrators view information about the status of a Splunk Cloud Platform deployment. For workload pricing, the CMC lets you monitor usage and stay within your subscription entitlement. From the CMC you can see both ingest and SVC usage information and can gain insight into how your Splunk Cloud Platform deployment is performing.

What is Splunk Virtual Compute (SVC)?

A Splunk Virtual Compute (SVC) unit is a powerful component of our workload pricing model. Historically, we priced purely on the amount of data sent into Splunk, leading some customers to limit data ingestion to avoid expense related to high volumes of data with low requirements on reporting. With Splunk workload pricing, you now have ultimate flexibility and control over your data and cost.

Destination Transformation: Planning Your Cloud Migration Journey

If the cloud is a destination you have planned for any of your enterprise workloads, then you need to be prepared to navigate the journey that is the cloud migration process. It’s not unlike planning for a physical trip to a fabulous destination (or maybe we’re just really really ready to start traveling again). Either way, we’ve got some travel tips to ensure that your cloud-bound workloads have a great trip.

Greater Expectations: How the Smartphone Put Employees in Charge

The first time I was introduced to a real, proper PC was in the early 90’s: my uncle’s work PC, which I used instead to play games. In fact, my first experience with any kind of tech support service was when I phoned the Codemasters help line after getting stuck on a particularly knotty puzzle in the game ‘Dizzy’. No matter how loudly I shouted into the phone, the machine continued to route me back to the beginning of its automated script.

Getting Started with PHP and InfluxDB

This article was written by Cameron Pavey, a full-stack dev living and working in Melbourne. Scroll down for a picture and bio. As a developer, it is likely that you will eventually run into a situation where a traditional relational database or document store doesn’t quite cut it. If you need to store points of data over time, you’ll likely need a time series database.

Logz.io Extends Alert Communications via Microsoft Teams Integration

If you’re a DevOps practitioner working in a Microsoft-centric environment, you’ll be pleased to learn that Logz.io recently added support for the popular Teams communications hub to help broadcast pressing alerts and other monitoring data. The integration comes on the heels of making the Logz.io platform directly available from within the Azure Console and expands organizations’ abilities to communicate and share notifications about everything from log data to security events.

Jaeger vs Elastic APM - key differences, features and alternatives

Jaeger is an open-source end-to-end distributed tracing tool for microservices architecture. On the other hand, Elastic APM is an application performance monitoring system that is built on top of the ELK Stack (Elasticsearch, Logstash, Kibana, Beats). In this article, let's explore their key features, differences, and alternatives. Application performance monitoring is the process of keeping your app's health in check. APM tools enable you to be proactive about meeting the demands of your customers.

OP5's network monitoring as an alternative to SolarWinds' Orion

An infamous cyberattack in late 2020 made SolarWinds a household name in the tech industry after it was discovered to be at the center of a supply-chain attack on its Orion network management tool. That attack allowed state-sponsored actors to push a malicious update to nearly 18,000 customers, including U.S. government agencies and about 100 large private enterprises.

Integrate SCOM with your tools using Webhooks Webinar 9-Sep-2021

In this webinar, we demonstrate how our latest product, Connection Center, uses Webhooks to seamlessly integrate SCOM with any of your other IT tools. Now you can integrate SCOM with anything using Webhooks. Our SCOM integrations are 100% code-free, meaning you can make your SCOM alerts truly actionable using real-time synchronization. We'll demonstrate the simple setup process: our plug-and-play integrations mean it only takes a matter of minutes to get SCOM working in unison with your enterprise applications.

New in Grafana 8.1: Gradient mode for Time series visualizations and dynamic panel configuration

Grafana 8 brought with it many exciting new features, including the launch of a new alerting system and the expansion of Grafana’s live and streaming data functionality. We didn’t stop there. In Grafana 8.1, alongside new additions like the Geomap and Annotations panel, we introduced some new features to the Time series panel as well as two transformations to help make panel configuration more dynamic.

Product Explainer Video Short: Splunk Infrastructure Monitoring for Real-time Cloud Monitoring

Wherever you are in your cloud journey and whatever your environment looks like, Splunk Infrastructure Monitoring is a purpose-built metrics platform to address real-time cloud monitoring requirements at scale. Get real-time observability for data from any cloud, any vendor, and any service.

Proactive Microsoft Teams Service Monitoring

The IT department of an organization is tasked with helping maintain the productivity of their Microsoft Teams and Microsoft 365 services, facilitating effective service delivery for all users. The executives of the business (VIPs) are sensitive to optimized performance as they must regularly conduct very important calls and meetings.

observIQ Releases First PnP Solution for monitoring arm-based Kubernetes

Arm-based Kubernetes clusters have been in use for a while, albeit mostly for niche uses, by enthusiasts, and DIY hobbyists. But that is changing. Arm architecture offers an efficiency and scalability that other architectures do not, and that makes it appealing to businesses.

Node.js Performance Monitoring

Software developers use the Node.js environment to develop robust and innovative applications. But the bigger the goal, the higher the risk. Learn about Node.js performance monitoring to ensure quality and risk-free software products. Part of diligent software development is making sure all system applications work well individually and as a whole.

Mean Time to Innocence: Avoiding the Blame Game with Desktop Virtualization

Mean time to innocence (MTTI) is a term used by IT teams to prove that their respective domain is not the source of a particular issue. In other words, it’s a fancy term to avoid blame when something goes wrong. Each team has its own domain-specific tools to prove the issue is not their fault. With respect to desktop virtualization, here are just some of the domains that are relevant when diagnosing issues.

Green IT and DEX: Impactful Change Through Employee Experience.

The fast-paced adoption of digital workplace technologies provides fantastic opportunities to improve digital employee experience (DEX). But it comes at a cost. Digital technology has a serious, and increasing, impact on our environment, with a carbon footprint of about 4% of global carbon emissions (that’s more than the aviation industry’s 2.5% contribution).

Flux Aggregation in InfluxDB: Now or Later

Aggregations are a powerful tool when processing large amounts of time series data. In fact, most of the time you’re going to care more about the min, max, mean, count or last values of your dataset than you will about the raw values you’re collecting. Knowing this, InfluxDB and the Flux language make it as easy as possible to run these aggregations, whenever and wherever you need to, and sometimes that leads people to running them in ways that aren’t as efficient as they could be.
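
As a rough illustration of the aggregates named above, the following Python sketch buckets raw points into fixed time windows and computes min, max, mean, count, and last for each window. It is plain Python rather than Flux, and the function name, the `(timestamp, value)` tuple layout, and the window size are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def aggregate_windows(
    points: Iterable[Tuple[int, float]], window_seconds: int
) -> Dict[int, Dict[str, float]]:
    """Group (timestamp, value) points into windows and summarize each one."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    # Sorting by timestamp makes "last" mean the most recent value per window.
    for timestamp, value in sorted(points):
        window_start = timestamp - (timestamp % window_seconds)
        buckets[window_start].append(value)
    return {
        start: {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "count": len(values),
            "last": values[-1],
        }
        for start, values in buckets.items()
    }
```

Calling `aggregate_windows([(0, 1.0), (30, 3.0), (60, 5.0), (90, 2.0)], 60)` yields two windows, starting at 0 and 60. Each summary is far smaller than the raw points it replaces, which is why the point at which an aggregation runs matters for efficiency.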

Why You Need Error Grouping

Anything we can do to make debugging more efficient will lead to major cost and time savings. Currently, developers have to sift through alerts to pinpoint the root cause of errors and these alerts don't always contain meaningful data to help solve the problem. By grouping errors, developers have the opportunity to clear away the noise and focus on the bugs that are truly causing the most problems, making the task faster and more efficient.
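
One common way to group errors is to derive a fingerprint for each occurrence and count occurrences per fingerprint. The Python sketch below shows that idea under stated assumptions: the `type`, `location`, and `message` keys and the digit-masking rule are hypothetical choices for illustration, not the grouping logic of any particular product.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

Fingerprint = Tuple[str, str, str]


def fingerprint(error: Dict[str, str]) -> Fingerprint:
    """Build a grouping key that ignores variable details such as IDs."""
    # Replace runs of digits so "user 42" and "user 97" group together.
    normalized = re.sub(r"\d+", "<n>", error["message"])
    return (error["type"], error["location"], normalized)


def group_errors(errors: Iterable[Dict[str, str]]) -> Counter:
    """Count how many occurrences fall under each fingerprint."""
    return Counter(fingerprint(error) for error in errors)
```

Sorting the result with `most_common()` puts the noisiest groups first, which matches the goal of focusing on the bugs that cause the most problems.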

Best practices for getting started with Datadog Network Performance Monitoring

Whether running on a fully cloud-hosted environment, on-premise servers, or a hybrid solution, modern services and applications are heavily reliant on network and DNS performance. This makes comprehensive visibility into your network a key part of monitoring application health and performance. But as your applications grow in scale and complexity, gaining this visibility is challenging.

Jaeger vs New Relic - Key differences, use-cases and alternatives

Jaeger and New Relic are tools used in the application monitoring and observability domain. While Jaeger is an open source tool under Cloud Native Computing Foundation, New Relic is a SaaS vendor in the observability domain. Let us explore the key differences between Jaeger and New Relic in this article. New Relic is an extensive SaaS tool and provides application performance as well as infrastructure monitoring. Jaeger provides an open-source solution for end-to-end distributed tracing.

A Comparison of 6 Airbrake Alternatives for Error Monitoring

We all need an application that can run smoothly, but this is not what we always get. After creating an application and putting it to use, you need to check and know when it faces exceptions/errors. This explains why the current market features several error tracking tools. Airbrake is among the top-notch error monitoring tools used for log analysis and log management. However, it’s not without its problems.

Smart gardening with a Raspi and Prometheus

Let’s build a smart gardening system with Prometheus and a Raspberry pi. Having plants at home can reduce your stress levels and make your home look more delightful. Seeing your indoor oasis growing gives us a sense of accomplishment and makes us feel proud… until you see that first brown leaf. That’s when you start doubting your green fingers.

OpenSearch Queries: Query DSL and Beyond

OpenSearch has evolved rapidly since its fork from the source code of the last truly open source version of Elasticsearch. So far, the community’s work has focused on removing proprietary code from Elastic, including a number of things that were never purely open source themselves. These include some aspects of the querying languages and capabilities of Elasticsearch.

How Streamlining ITSM Operations Can Reduce Service Remediation Costs

When using Microsoft 365 services, the main benefit of a monitoring tool that can assess performance quality and identify issues is that it sends alerts into a ticketing tool such as ServiceNow to initiate the process of remediating the problem. Without a monitoring tool in place, support tickets aren’t sent automatically, and users must identify issues and submit tickets manually with little to no information on where the problem came from.

How to use AWS IoT SiteWise Edge and Grafana to collect and monitor industrial data on-premises

The AWS IoT SiteWise plugin for Grafana was created to enable AWS IoT SiteWise customers to visualize and monitor industrial equipment data using Grafana dashboards. Industrial customers use AWS IoT SiteWise to collect, process, and monitor their industrial data at scale. This plugin allows them to use Grafana dashboards to monitor this data, stored by AWS IoT SiteWise in the AWS Cloud.

Use Suspect Tags to improve App Performance

When you’re trying to optimize your application for performance, it helps to understand not only the number of people affected, but also user conditions of the slowest transactions, such as OS, browser type, and even connection type. When you’re looking at performance data, it can be hard to see the forest through the trees.

How LogicMonitor Can Use Errors From Airbrake in Dashboards and Alerts

LogicMonitor continues to grow its visibility of critical business infrastructure and applications. We acquired Airbrake to help see more of what is happening in a customer’s environment at the code level. Simply integrating Airbrake’s dashboard metrics within LogicMonitor alongside your existing networks, services, and cloud infrastructure enables you to gain access to errors and the context you need to effectively troubleshoot issues impacting your systems and users.

What is IT Automation?

Sogeti, a Managed Service Provider (MSP) that provides tech and engineering resources worldwide, had a crucial IT challenge to solve. The MSP operates in more than 100 locations globally and was using six different monitoring tools to monitor its customers’ environments. It was a classic example of tool sprawl and needing to scale where multiple teams of engineers relied on too many disparate tools to manage their customers’ environments.

Why SQL Server Monitoring Is the First Step in Improving Performance

SQL Server monitoring is continuous collection and analysis of usage, performance, and event metrics for Microsoft SQL Server. It’s the first step in optimizing performance for applications that depend on your data platform. Highly effective monitoring gives a bird’s-eye view of your entire data estate. It also provides the deep analytics necessary to perform root cause analysis on the most challenging performance problems.

Incident Review - What Was Behind the September 7 Spectrum Outage: A Case of Dr. BGP Hijack or Mr. BGP Mistake?

September 7, 2021, 16:36 UTC: an outage hit Spectrum cable customers in the Midwest of the U.S., including Ohio, Wisconsin and Kentucky. Users of their broadband and TV services hit social media to voice their annoyance at the disruption it was causing. Everything was resolved at around 18:11 UTC, and services were restored to users.

De Watergroep and Devoteam build Elastic Observability pipeline to deliver water to millions

De Watergroep is responsible for the supply of water to more than 3 million customers and hundreds of companies in Belgium. An organisation operating in the public sector, De Watergroep's main goal is to continuously ensure the availability of high-quality drinking water. De Watergroep also is constantly engaged in technological innovation, focusing on keeping distribution costs low, and making maintenance more cost efficient.

Secure your deployments on Elastic Cloud with Google Cloud Private Service Connect

We are pleased to announce the general availability of the Google Cloud Private Service Connect integration with Elastic Cloud. Elastic Cloud VPC connectivity is now available to all customers across all subscription tiers and cloud providers (AWS, Microsoft Azure, and Google Cloud).

All You Need To Know About HAProxy Log Format

HAProxy is one of the fastest and most widely-used load balancing solutions available today. If you’re already using HAProxy, or if you’re considering using HAProxy in your environment, then this is a great place to start. On this page, we discuss HAProxy logging and why logging is such a vital component of the load balancer implementation. We then take a deep dive into the logging offered by HAProxy.

Data Lakes Are Gaining Maturity, According to 2021 Gartner Hype Cycle for Data Management

IT leaders’ experiences with data lakes have been a roller coaster ride since their inception in 2010. To some, that roller coaster ride might resemble the canonical Hype Cycle graphic, trademarked by Gartner to show the maturity curve of technologies in a given category over time. This year’s Hype Cycle for Data Management report was just released, revealing that modern data lakes are poised to exit the Trough of Disillusionment and enter the Slope of Enlightenment in 2022.

Monitor your Netlify sites with Datadog

Netlify is a Jamstack web development platform that lets customers build and deploy dynamic, highly performant web apps. By uniting popular JavaScript frameworks, developer tools, and APIs into streamlined workflows, Netlify helps teams rapidly spin up and ship common Jamstack use cases, including e-commerce stores, SaaS applications, and corporate sites. Netlify supports these deployments with an integrated CI/CD tool, global multi-cloud edge network, and serverless backend.

Announcing support for EKS Anywhere

Amazon Elastic Kubernetes Service (EKS) is a cloud-based compute platform that includes a fully managed Kubernetes control plane in order to simplify cluster operations. AWS introduced EKS Anywhere to bring the operational ease of EKS to organizations that manage on-premise environments (e.g., to meet data sovereignty requirements).

Jaeger vs OpenTracing - Key differences, use-cases and alternatives

Jaeger and OpenTracing are both open-source projects. Jaeger was originally built by teams at Uber and then open-sourced. The OpenTracing project was also started by teams at Uber, and hence they are compatible with each other. While Jaeger is an end-to-end distributed tracing tool, OpenTracing is a set of APIs and libraries that can be used to instrument your application.

Top 7 Dynatrace Competitors to Know in 2021

Dynatrace is a publicly-traded global technology company that provides a software intelligence platform based on artificial intelligence (AI) and automation to monitor and enhance application performance, development and security, IT infrastructure, and user experience for enterprises and government organizations around the world. The headquarters of Dynatrace is in Waltham, Massachusetts. Dynatrace's CEO is John Van Siclen.

Maintaining reliable services with advanced Cloud Logging features

We’ve covered ingesting, routing, storing, and viewing logs from your services in Cloud Logging already, but what else can you do with all that data? In this episode of Engineering for Reliability, we show how you can use advanced features like alerting on logs, logs-based metrics, and capturing application exceptions in Error Reporting. Watch to learn how you can find issues faster, make your services more reliable, and keep your users happy.

InfluxDB IOx Tech Talks - Observability of InfluxDB IOx: Tracing, Metrics and System Tables

The September 2021 edition of InfluxDB IOx Tech Talks is now available to watch on-demand. InfluxDB IOx Tech Talks are one-hour community sessions that provide a chance to interact directly with Influxers about all things InfluxDB IOx and time series, and to get your questions answered in the live Q&A.

Automate and Virtualize the NOC: A Gannett/USA TODAY Network Case Study

Mission creep is a phenomenon that occurs after a project begins and gains momentum, but then gradually grows beyond the original, intended scope. One day you wake up and realize that, instead of an efficient, manageable project, you’ve got a monster on your hands. For enterprises in the midst of dynamic growth, IT infrastructure is often beset by mission creep. The incumbent organization acquires smaller operations, integrates their technology, and soon things are out of control.

How to Perform Python Remote Debugging

Debugging is the process of identifying, analyzing and removing errors in software. It can start at any stage of software development, even as soon as the code has been written. Sometimes, remote debugging is necessary. In the simplest terms, remote debugging is debugging an application running in a remote environment such as production or staging.

How to Clear Up Alert Storms by 90%?

Alerts are notifications from AIOps monitoring tools that indicate that there is an anomaly. IT teams get these alerts on their monitoring dashboard via emails or enterprise collaboration tools such as Slack or Teams. Service level agreements expect IT teams to analyze every alert within a specific timeframe and take appropriate action.

How to Control Alert Fatigue?

Alerts are indispensable to any IT operations system today. Site reliability engineers (SREs) or ITOps executives set up several monitoring tools for their IT landscape. When there is a change, high-risk action, or outage in any of these incidents, the monitoring tool triggers an automated alert. This could happen on the monitoring tool’s dashboard itself, via email, or enterprise collaboration tools like Slack or Teams.

The Five Data Pillars of Effective Root-Cause Analysis

The most effective way to understand an incident, resolve it and prevent it from occurring again is root-cause analysis. Simply put, root-cause analysis is the study performed by ITOps teams or site reliability engineers (SREs) to pinpoint the exact element/error that caused the unexpected behavior. Based on this, they plan remediation. Accurate and timely root-cause analysis can have a direct impact on the company’s top and bottom line.

Spotting and Avoiding Database Drift

Managing any database ecosystem is difficult enough: taking backups, maintaining statistics, and doing performance tuning all tax the time of the DBA or database developer. The job is complex even without considering the work you do to manage the various schema and data drifts that can occur. Unless you operate in a vacuum or within a single person organization (and even then, schema drift can occur), drift is going to manifest naturally and as the size of the environment expands.

Metrics now generally available in Honeycomb

Starting today, Honeycomb Metrics is now generally available to all Enterprise customers. You’ve adopted our event-based observability practices, in part to overcome the debugging roadblocks you hit when using custom metrics to identify application issues. But metrics do still provide value at the systems level. Now, you can easily see and use your metrics data alongside your event data in Honeycomb—all in one interface.

Tracing AWS Lambdas with OpenTelemetry and Elastic Observability

OpenTelemetry represents an effort to combine distributed tracing, metrics and logging into a single set of system components and language-specific libraries. Recently, OpenTelemetry became a CNCF incubating project, but it already enjoys quite a significant community and vendor support. OpenTelemetry defines itself as “an observability framework for cloud-native software”, although it should be able to cover more than what we know as “cloud-native software”.

Painting a Complete Network Monitoring Picture: Why Context is Critical

In order to produce their masterpieces, artists like van Gogh, Rembrandt, Picasso, and Monet painted with more than just one color. Being able to choose from multiple colors (not to mention an abundance of talent, inspiration, and creativity) is what allowed these artists to see their complete vision come to life on canvas. However, if you’re relying on a single set of data to troubleshoot network issues, it’s like you’re stuck painting with one color.

5 Best IT Experience Practices Your Team Can Make Today

If you were to put 100 enterprise tech leaders in a room together and ask them if they think their company’s employee experience is dependent upon IT, I’m certain all would agree it is. But I’m also certain those 100 wouldn’t know: For IT decision-makers, the devil is in the details. Many are judged by uncompromising Service Level Agreements (SLAs) and shoddy survey data, not comprehensive digital experience trends and indexes.

CheckMK and Enterprise Alert - a scripted heartbeat check

A few days ago I received an inquiry about a scripting problem from one of our longtime partners, to be exact our DCP Marc Handel from IT unlimited AG. In the exchange with Marc I realized that his idea to use the Enterprise Alert Scripting Host, the Windows Task Scheduler and CheckMK to realize a roundtrip monitoring could be interesting for the whole community. Especially for all our CheckMK customers.

Cloud or On-Prem? With Monitoring, It's Both-And, Not Either-Or

Despite the migration of services and systems to cloud (either all or in part), many of the fundamental aspects of the day-to-day work IT practitioners do haven’t changed. They’ve just moved. In this session, SolarWinds Head Geek Leon Adato and Technical Content Manager for Community Kevin M. Sparenberg discuss that state of affairs, as well as what monitoring can do to help view those resources as a contiguous whole, despite possibly being split across the on-prem/cloud divide.

Introducing the Lightstep Metrics plugin for Grafana

Chris Sackes is a Software Engineer at Lightstep. A New Yorker by birth, he loves public transportation, architecture photography, and urban exploration. He’s spent the last five years engineering delightful user experiences for a variety of applications. Lightstep’s powerful metrics reporting and analysis are now available for Grafana users. Using the new Lightstep Metrics plugin for Grafana, you can view metrics data reported to Lightstep directly in your Grafana instance.

Monitoring Amazon CloudFront with Graphite via Graphite APIs

MetricFire offers a complete system, infrastructure, and application monitoring using a suite of open-source monitoring tools. With MetricFire, you can monitor all your infrastructure on a single dashboard. The platform displays metrics on the dashboard using either Hosted Prometheus or Graphite-as-a-Service.

How Lowe's SRE reduced its mean time to recovery (MTTR) by over 80 percent

The stakes of managing Lowes.com have never been higher, and that means spotting, troubleshooting and recovering from incidents as quickly as possible, so that customers can continue to do business on our site. To do that, it’s crucial to have solid incident engineering practices in place. Resolving an incident means mitigating the impact and/or restoring the service to its previous condition.

Finding the Gaps in Your Data Causing Data Drift

When drift happens within a database, it can occur at a couple of different levels. Drift refers to entities—tables, views, or even data—out of synchronization with each other. This could be a difference in schema structure, data, or even operational metadata like permissions. Often, drifts happen between two different environments like development and staging databases.

The Ultimate Guide To Telemetry

If you’re anywhere in the Queensland region of northern Australia, look out. There’s an eight-foot-nine-inch-long (2.65 meters) crocodile, deceptively named Danny-Boy, who might be looking for a snack. Specifically, if you’re anywhere near -12.975388, 141.987344, you should stay on your toes. That’s the last place Danny-Boy was sighted. So unless you want your pipes to be calling, keep your eyes peeled.

Using Satellite Server for distributed environment monitoring

Today we will talk about one of the most versatile elements that Pandora FMS Enterprise offers us for monitoring distributed environments, the Satellite server. It will allow you to monitor different networks remotely, without the need to have connectivity directly from the monitoring environment with the computers that make it up.

What Is a Traffic Analysis Attack?

The times when it was enough to install an antivirus to protect yourself from hackers are long gone. We actually don’t hear much about viruses anymore. However, nowadays, there are many different, more internet-based threats. And unfortunately, you don’t need to be a million-dollar company to become a target of an attack. Hackers these days use automated scanners that search for vulnerable machines all over the internet. One such modern threat is a traffic analysis attack.

Understanding Cardinality in a Monitoring System and Why It's Important

The journey to becoming cloud-native comes with great benefits but also brings challenges. One of these challenges is the volume of operational data from cloud-native deployments — data comes from the cloud infrastructure, ephemeral application components, user activity, and more. The increased number of data sources does not only increase datapoint volume – it also requires that monitoring systems store and query against data with higher cardinality than ever before.
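To make the idea of cardinality concrete, here is a minimal sketch. It assumes a monitoring system in which each distinct combination of label values is stored as its own time series; the label names and values are hypothetical and not taken from any particular product.

```python
def max_series_cardinality(label_values):
    """Upper bound on distinct series: the product of value counts per label."""
    total = 1
    for values in label_values.values():
        total *= len(values)
    return total


def observed_cardinality(datapoints):
    """Count the distinct label combinations actually seen in the data."""
    return len({tuple(sorted(labels.items())) for labels in datapoints})


# Hypothetical labels attached to one metric.
labels = {
    "region": ["us", "eu"],
    "service": ["api", "web", "db"],
    "status": ["200", "500"],
}
print(max_series_cardinality(labels))  # 2 * 3 * 2 = 12

# Key order does not matter; the first two points share one series.
points = [
    {"region": "us", "service": "api"},
    {"service": "api", "region": "us"},
    {"region": "eu", "service": "api"},
]
print(observed_cardinality(points))  # 2
```

Adding one more label with many values (for example a per-container ID) multiplies the upper bound, which is why ephemeral components tend to drive cardinality up so quickly.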

The 95th Percentile: How to Manage Capacity Before You Run Out

One of the largest challenges with network bandwidth metering is the way traffic flows. Traffic comes in bursts. It’s never a constant, predictable stream of data you can measure once, spec hardware for and be done with (wouldn’t that be nice?!). Instead, you need to account for the dynamic nature of bandwidth utilization and its impact on performance. You’ll never be able to predict every burst of traffic your network experiences.
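As an illustration of why a 95th percentile is less sensitive to bursts than a peak value, the sketch below uses the nearest-rank method on a list of bandwidth samples. The sample values are invented, and the exact percentile method used by any given metering tool may differ.

```python
def percentile_95(samples):
    """Return the 95th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # 1-based nearest rank: ceil(0.95 * n), computed with integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical utilization samples in Mbps: steady traffic, one moderate
# spike, and one large burst.
samples = [10] * 18 + [50, 900]

print(max(samples))            # 900
print(percentile_95(samples))  # 50
```

Here the single 900 Mbps burst falls in the top 5% of samples and is ignored, so the result reflects the sustained load rather than the rare peak.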

Visualizing Your Time Series Data with the Highcharts Library and InfluxDB

If you’re building an IoT application on top of InfluxDB, you’ll probably use a graphing library to handle your visualization needs. Today we’re going to take a look at the charting library, Highcharts, to visualize our time series data with InfluxDB Cloud. However, I also encourage you to take a look at Giraffe, a React-based visualization library that powers the data visualizations in the InfluxDB 2.0 UI.

Taming Rails Logging with Lograge and LogDNA

Rails is a classic on Ruby for a reason. The framework is powerful, intuitive and the language has a low entry bar. However, being designed when systems existed on a single server, standard Rails logging is excessively fractionalized. Even on a single server, a straightforward call can quickly turn into seven unique, unconnected logs.

Build an Uptime Monitoring System in Ruby with GCE, Cloud Storage, and PubSub

Google Cloud Platform provides developers with many tools to build scalable apps in a way friendlier than AWS. In this article, Olasubomi Oluwalana shows us how we can use the Google Compute Engine, Cloud Storage, and PubSub offerings to build an uptime monitoring system in Ruby.

Metrics first look, more robust frontend and much more - Signal 04

Folks! Great to have you over for our monthly product update aka Signal #04. This month we made great strides in both our frontend and backend pods. Metrics ingestion, testing frameworks, improved tracking features for gRPC calls and much more! We also crossed 200+ members on our Slack community 🎉🎉🎉 Let's dive in to see what humans at SigNoz have been up to!

Monitoring Load Balancers with Grafana

Load balancers play an important role in distributed computing. With load balancers, you can distribute heavy workloads across multiple resources, which allows you to scale horizontally. Since they sit in front of computing resources, they need to endure heavy traffic and allocate it to the right resources fast. For this to happen, monitoring the health and performance of load balancers is key. In monitoring, visualization helps users to view various metrics quickly.

Monitoring Network Switches with Grafana

In monitoring, the target system or device is a deciding factor in designing your monitoring stack. You will have to consider various aspects, from how and at what frequency you want to collect data to how you want to surface metrics to end users. You will have to take this strategic approach when you want to monitor your network infrastructure. In this article, we will discuss how Grafana, an open-source visualization tool, can help you to monitor network switches.

Observability 101 using OpenTelemetry & SigNoz @ Kubernetes Community Day

In this workshop, we will learn about the basics of observability and its benefits. We will take a hands-on approach and actually instrument an application with OpenTelemetry, which is a vendor-neutral instrumentation library. Then we will visualise the data sent by OpenTelemetry with SigNoz, which is a full-stack observability platform. In the last section, we will take an example of a real-world issue and show how this observability stack can be used to find its root cause.

How to Build a Multi-Cloud Mindset

We often talk about migrating applications to THE cloud, or running workloads in THE cloud, as if the cloud is one, homogeneous environment. The reality is, of course, far more complex. There are private clouds and public clouds, and different public cloud service providers (CSPs) that each have their own particular capabilities and strengths. Modern, digitally transformed businesses usually leverage a combination of these clouds.

Top 10 Tools For Google Cloud Architect

Google Cloud Platform is a complex suite of services aimed at satisfying clients’ computing, storage, and application operation needs. App Engine, Cloud SQL, Cloud Speech API, and Deployment Manager are just a few of GCP’s flagship services. All of them are designed to optimize business operations and to make business-client relationships smoother, helping to close deals and raise conversions.

What Is Network Monitoring?

Network monitoring is the practice of making sure the network as a whole functions optimally by keeping watch over all of its endpoints. The network is at the heart of any business’s routine functioning, so any discrepancy in the form of a breach or slowdown could prove costly. Proactively monitoring networks helps administrators identify and prevent potential issues that could occur at any time.

OpenTelemetry tracing - things you need to know before implementing

Setting up observability and robust monitoring for distributed systems is a challenging task. Engineering teams need access to different pieces of information to understand what's happening with their application. Is OpenTelemetry a step in the right direction for distributed tracing? Let's find out. Nothing can guarantee how your systems will behave in production. Things will go wrong, and it's critical to monitor your application for any signs that need troubleshooting.

9 Best Practices for Application Logging that You Must Know

Have you ever glanced at your logs and wondered why they don't make sense? Perhaps you've misused your log levels, and now every log is labelled "Error." Alternatively, your logs may fail to provide clear information about what went wrong, or they may divulge valuable data that hackers may exploit. It is possible to resolve these issues!!!

Sponsored Post

Announcing: Bitbucket for APM

Raygun's latest integration with Bitbucket gives you code-level insights into your traces, directly in APM. Today, Raygun expands its suite of integrations for APM, introducing the latest addition - Bitbucket. Once your Raygun account is integrated with Bitbucket, you'll be able to see method source code pulled directly from your repository when inspecting a method in APM. If this sounds interesting to you but you use GitHub instead of Bitbucket, don't worry, we've got you covered for that too. Gain greater context into code execution and get to the root cause of slow performance, faster.

Assign Read-Only Access to Users in Logz.io

Cloud monitoring and observability can involve all kinds of stakeholders. From DevOps engineers, to site reliability engineers, to Software Engineers, there are many reasons today’s technical roles would want to see exactly what is happening in production, and why specific events are happening. However, does that mean you’d want everyone in the company to access all of the data?

Root cause analysis using Metric Correlations

As the complexity of systems and applications continues to evolve and change, the number of metrics that need to be monitored grows in parallel. Whether you’re on a DevOps team, an SRE, or a developer building the code yourself, many of these components may be fragmented across your infrastructure, making it increasingly difficult to identify the root cause when experiencing downtime or abnormal behavior.

Logging Agents Vs Log Libraries

Log management has been around for a long time, but how we manage our logs has changed profoundly over the years. For effective log management, there are times when you may have to trade off the new for the old, and vice versa. A clear understanding of log agents and log libraries will help assess what works best for different applications and infrastructures.

How Uptime.com Can Help Troubleshoot a Server Outage

Everyone has heard about the 3 AM wakeup call, but what about those troublesome issues that dig at your team and eat away at your SLA hours? Hard-to-diagnose issues can strike at any time. They leach from your team, hurt morale, impede the customer experience… it’s just a whole mess. These kinds of incidents are ones that test what “response” really means to your organization, as fixing them is not always a simple task. Something has gone wrong.

Eliminate Blind Spots by Monitoring Citrix Cloud and Cloud Connectors

Learn how our embedded intelligence proactively monitors Citrix Cloud Connectors and automatically alerts you if a cloud connector is down, all while taking remediation actions to avoid disruption in end-user experience.

Infrastructure as Code - IAC for Azure

Infrastructure as code, along with automated deployment and scale-up/down in Azure, is becoming the new normal. Solution architects and system administrators are becoming coders, and scripting is becoming part of their day-to-day job. In parallel, a raft of vendors is providing products that try to reduce this need to script and to address the shortage of staff with the skills to script and code this now-necessary functionality.

Incident Review - AWS Outage Led To Spikes In Response Times For Applications Using AWS Services

On Tuesday August 31, users across large parts of the West coast (US-West-2 region) were impacted by major spikes in response time. Some of AWS’ most critical services were affected, including Lambda and Kinesis. SRE teams care about Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and this practice is a must for SRE teams.

What to Do About Java Memory Leaks: Tools, Fixes, and More

Memory management is Java’s strongest suit and one of the many reasons developers choose Java over other platforms and programming languages. On paper, you create objects, and Java deploys its garbage collector to allocate and free up memory. But that’s not to say Java is flawless. As a matter of fact, memory leaks happen and they happen a lot in Java applications. We put together this guide to arm you with the know-how to detect, avoid and fix memory leaks in Java.

Why integrate SCOM with anywhere?

While SCOM is a valuable monitoring tool, you may also be using a suite of monitoring tools, such as SolarWinds to monitor network devices, VROps to monitor VMware, and Nagios to monitor your Linux devices, as all these tools are best in class. But, you don’t want to be looking in numerous different consoles to gather all your monitoring data!

Bi-Directional Integration for SCOM & your ITSM Tools

Bi-directional sync enables data to be sent to and from SCOM and your ITSM tools, in the following ways: a) OUTBOUND Notifications (PUSHES alerts from SCOM to another tool) b) INBOUND Notifications (PULLS updates on alerts into SCOM from another tool) This means you can choose which SCOM alerts to send across to your ITSM tools (Cherwell or ServiceNow), they are then raised as incidents, and then using bi-directional sync, info relating to the incidents is pulled back into SCOM (Incident ID, Configurat

AppDynamics vs. Dynatrace vs. Scout | A Side-by-Side Comparison

Choosing the perfect Application Performance Monitoring tool for your business is always a tricky decision. There are so many options in the market, and each alternative has its own set of features and flaws. Sometimes, the profiles of two solutions overlap, which makes it even harder to decide which one to opt for.

Difference Between IPv4 and IPv6: Why haven't We Entirely Moved to IPv6?

IPv4 and IPv6 are the two versions of IP. IPv4 was first released in 1983 and is still widely used to address a variety of systems. It identifies systems in a network by means of a 32-bit address, which allows roughly 4.3 billion (2^32) unique addresses. Despite this limit, it remains the most widely used internet protocol, carrying the vast bulk of internet traffic. IPv6 was created in 1994 and is referred to as the "next generation" protocol.
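The difference in address size can be checked with Python's standard `ipaddress` module. This is only an illustrative sketch; the two addresses come from ranges reserved for documentation and were chosen for this example.

```python
import ipaddress

# Documentation-only example addresses (not taken from the article).
v4 = ipaddress.ip_address("192.0.2.1")
v6 = ipaddress.ip_address("2001:db8::1")

# max_prefixlen is the address width in bits for each protocol version.
print(v4.version, v4.max_prefixlen)  # 4 32
print(v6.version, v6.max_prefixlen)  # 6 128

# A 32-bit address space holds 2**32 distinct addresses.
print(2 ** v4.max_prefixlen)  # 4294967296

# Under the dotted notation, an IPv4 address is just a 32-bit integer.
print(int(v4))  # 3221225985
```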

Reduce costs and increase performance with query caching in Grafana Cloud

We are excited to announce the launch of query caching in Grafana Cloud, which can significantly reduce load times and costs of your most popular Grafana dashboards. Now, when the same query is submitted repeatedly, the results will come back from the cache rather than the data source itself. This not only lowers load times for popular dashboards, but will also reduce API costs for your data sources and decrease the likelihood that those APIs will rate-limit or throttle requests.
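The general idea of answering repeated queries from a cache can be sketched with a small time-to-live (TTL) cache. This is not Grafana Cloud's implementation; the `QueryCache` class, the `slow_data_source` function, and the 60-second TTL are hypothetical names and values chosen for illustration.

```python
import time


class QueryCache:
    """Serve repeated queries from memory until their entry expires."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch      # function that really hits the data source
        self._ttl = ttl_seconds  # how long a cached result stays valid
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # query -> (time stored, result)

    def get(self, query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: the data source is not called
        result = self._fetch(query)
        self._entries[query] = (now, result)
        return result


calls = []


def slow_data_source(query):
    """Stand-in for a costly or rate-limited data source API."""
    calls.append(query)
    return f"rows for {query}"


cache = QueryCache(slow_data_source, ttl_seconds=60.0)
first = cache.get("SELECT 1")
second = cache.get("SELECT 1")
print(first == second, len(calls))  # True 1
```

The second identical query never reaches the data source, which is the effect that lowers both load time and API usage. The trade-off is that results may be up to one TTL old.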

Overcoming Information Barriers in Microsoft Teams

The phrase “Teams is slow” means that somewhere something isn’t working. But who in the organization should lead the charge to address the problem – especially when the problem isn’t Teams? This blog will examine the different information barriers in Microsoft Teams and how to overcome them.

Investigating the Database Family Tree

Investigating your family tree can be an interesting experience. For example, what if you discovered you were related to a famous person who won a Nobel Prize or performed a heroic act? Conversely, what if you realized you had an ancestor who was an infamous criminal? Much like examining your genealogy can be an exciting adventure, looking at the family tree of your database can prove to be just as rewarding. Databases occasionally undergo a phenomenon known as drift.

Incident Review - Akamai Performance Degradation Slows Down Major Websites Worldwide

This summer has seen a series of outages and performance degradations from some of the world’s most widely used CDNs, including the June 8, 2021 Fastly outage (owing to DNS or configuration issues) and an Akamai outage on July 22, 2021 (also likely caused by DNS failure).

An Introduction to Distributed Tracing

There’s no strict definition of a distributed system. But generally speaking, if you have reached a point where you’re running more than five interdependent services at once, that means you’re running a distributed system. It also means you are more than likely experiencing difficulties when troubleshooting using traditional debugging tools. Unfortunately, pulling up multiple tools, each built for a monolithic world, doesn’t help pinpoint the problem.

Smarter CPU Testing - How to Benchmark Kaby Lake & Haswell Memory Latency

Modern CPUs are complex beasts with billions of transistors. This complexity in hardware brings indeterminacy even in simple software algorithms. Let’s benchmark a simple list traversal. Does the average node access latency correspond to say, a CPU cache latency? Let’s test it! Here we benchmark access latency for lists with a different number of nodes. All the lists are contiguous in memory, traversed sequentially, and have a 4 KB padding between the next pointers.

Full visibility of Microsoft Azure cloud service health - resolve issues before they impact on your customers.

For many large digital enterprises, Microsoft Azure and the private cloud offering Azure Stack Hub have emerged as the strategic cloud platforms of choice. Azure offers an open and flexible platform on which to quickly build, deploy and manage applications at scale.

NiCE Domino Management Pack 8.10

The NiCE Domino Management Pack is an enterprise-ready Microsoft SCOM add-on for advanced HCL Domino monitoring. It supports Domino system and application administrators in centralized Domino health and performance monitoring to improve user experience and business results. The Management Pack provides clear and precise performance indicators and timely alerts enriched by pinpointing problem identification and troubleshooting information.

Sponsored Post

6 Use Cases for Digital Experience Monitoring

If you live and breathe in the technology industry, chances are you are hearing about Digital Experience Monitoring a lot these days. So what is Digital Experience Monitoring (DEM), and why is IT obsessed with it? With a remote-first culture brewing in every company, IT needs to ensure that employees on their machines are productive and satisfied with the performance of typical enterprise applications such as Microsoft 365, Salesforce, Workday, etc. A DEM solution collects application and desktop user experience (UX) insights holistically, giving IT a broader context for troubleshooting performance issues. Let's discuss six use cases for DEM.

How to Handle Exceptions in Java: Complete Tutorial with Examples and Best Practices

As developers, we would like our users to interact with applications that run smoothly and without issues. We want the libraries that we create to be widely adopted and successful. All of that will not happen without the code that handles errors. Java exception handling is often a significant part of the application code. You might use conditionals to handle cases where you expect a certain state and want to avoid erroneous execution – for example, division by zero.

11 Best Tips to Node.js Debugging that You Didn't Know

When people hear the term "Node.js Debugging," they immediately think of the function "console.log()." They also assume that's how pros debug Node.js applications. Nah!!! That's not good enough, mate. You'll need more than the console.log() function to debug your Node.js application like a pro. If the proper technique is not adopted before testing, debugging a Node.js application might be difficult. Testing is an essential part of the development process for any application, software, or website.

Best Practices for Logging in Node.js

Good logging practices are crucial for monitoring and troubleshooting your Node.js servers. They help you track errors in the application, discover performance optimization opportunities, and carry out different kinds of analysis on the system (such as in the case of outages or security issues) to make critical product decisions. Even though logging is an essential aspect of building robust web applications, it’s often ignored or glossed over in discussions about development best practices.

What is Synthetic Monitoring and Why is it Important for Your Business

As a business, whichever industry you are in, there is a fair chance that you depend upon online assets such as mobile applications or APIs for conducting operations. If you want to ensure their availability, correct functioning and quick response at all times, it is important to use synthetic monitoring for a better customer experience.

What is Forensic Analysis and Why is it Important for the Security of Your Infrastructure

With the advent of cybercrime in recent years, tracking malicious online activities has become imperative for protecting operations in national security, public safety, law and government enforcement along with protecting private citizens. Consequently, the field of computer forensics is growing, now that legal entities and law enforcement have realized the value IT professionals can deliver.

Shortcut to Value With Loggly

In this video, we will show you how Loggly is laid out and demonstrate the major functions that will have you leveraging the out-of-the-box functions immediately. The SolarWinds® Loggly® log management service integrates into the engineering processes of teams employing continuous deployment and DevOps practices to reduce mean time to resolution (MTTR), improve service quality, accelerate innovation, and better use valuable development resources.

Modern Security Monitoring Demands an Integrated Strategy

The ultimate success of any security monitoring platform depends largely on two fundamental requirements: its ability to accurately and efficiently surface threats and its level of integration with adjacent systems. In the world of SIEM, this is perhaps more relevant than in any other element of contemporary IT security infrastructure.

Cost of ELK

Do you know how much your ELK stack costs? Managing and analyzing your data is a critical part of your business. However, the true cost of an ELK stack can be hard to calculate, and the truth is you may be spending a lot more than you think. Elasticsearch wasn't designed to work efficiently at the scale required by today's data volume, especially the growth of log data. As your data grows, your ELK stack becomes more expensive to scale and maintain, leaving you with the headache and the tab. Well, ChaosSearch has the answer.

Monitoring Cisco Webex with Grafana

In this article, we will take a look at what Cisco Webex is, how it works, and why it is great for your business. Then we will explore how to monitor Cisco Webex metrics using beautiful and customizable Grafana dashboards. We’ll also look at the most popular data sources Grafana uses. Finally, we will figure out how MetricFire simplifies the task of monitoring metrics for us and what its main advantages are.

Java Application Manual Instrumentation for Distributed Traces

In this blog series, we are covering application instrumentation steps for distributed tracing with OpenTelemetry standards across multiple languages. Earlier, we covered Golang Application Instrumentation for Distributed Traces and DotNet Application Instrumentation for Distributed Traces. Here we are going to cover the instrumentation for Java.

IBM MQ 9.2.3 Streaming Queues: Your Integration Infrastructure (i2) Data Win

By now, you have probably seen the announcement for IBM MQ 9.2.3. The first thing to mention is that Nastel had support for 9.2.3 right away. Nastel Technologies is an integration infrastructure management (i2M) solutions company, and IBM MQ is at the heart of most enterprises’ integration infrastructure (i2). Nastel works with the IBM teams to ensure we are ready for any new releases and changes. One key enhancement included in 9.2.3 is the streaming queue.

Monitoring HAProxy Logs and Metrics with Sumo Logic

HAProxy is one of the world’s most innovative and highest-performing load balancing solutions. The load balancer is critical for enabling high availability and supporting the dynamic scaling of infrastructure within modern applications. Because of its importance, engineers need tools that can quickly and effectively diagnose any problems with the load balancer if they arise.

SharePoint, SharePoint Online Monitoring

If your organization leverages SharePoint for content, communication and collaboration, then you must continuously test and monitor your SharePoint installations. Whether it's SharePoint Online in Office 365, Hosted SharePoint on Azure or your own SharePoint 2016 installation, synthetic uptime and performance monitoring will help you find and fix problems, detect changes and plan your network and servers for capacity.

Skype for Business Monitoring

CloudReady Skype for Business Sensors continuously execute real conversations between Skype endpoints and capture metrics to determine the health of your network, ISPs, Wi-Fi, Edge and Media Servers. Deploy sensors across multiple branch and cloud locations to model your different topologies, network conditions and scenarios for always-on monitoring and end-to-end testing.

Seismic Shifts in NetOps: The Case for Modern Tools

In recent months, a lot has changed in network operations (NetOps). Networks, architectures, and entire operational models have shifted dramatically, disrupting and destabilizing digital business services. Unfortunately, most NetOps teams are still relying on legacy tools and approaches. This white paper offers a look at how the world has changed and the new capabilities your team needs to succeed.