OpsMatters was always meant to be a platform for everyone. When our founder, Gerald Curley, first had the idea to create a one-stop shop for all news and information related to operational tools and applications, he did not want to make it an elite club. Decision makers in these industries have a wide range of skills and backgrounds, so if you find yourself a bit overwhelmed with all the different options, we don’t blame you.
To help us all play a bit of catch up, we thought it could be helpful to publish some more informational pieces about the various industries we cover, starting with one of the most prominent and evolving areas of operations: monitoring.
Now, as you can imagine, this industry is broad. Just about anything can be monitored, and in the world of tech you truly can find a monitor for nearly every piece of your business. With that in mind, this will be a 3-part series, starting today with application monitoring. Let’s break this down into smaller pieces and discuss the key terms and major players in this dynamic industry.
One of the bigger areas within this type of monitoring is performance monitoring: keeping you up-to-date with the performance of your website, server, application, network, and more. Performance monitoring tools are great if you need to be constantly updated with business-critical information as it happens. Here are a few of the terms you may hear when researching them.
Application Performance Management (or APM) is “the art of managing the performance, availability, and user experience of software applications.” By monitoring the speed at which transactions are performed by end users or any part of your system itself, you can more quickly understand why these transactions are slow or failing.
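The core idea behind APM, timing each transaction and flagging the slow ones, can be sketched in a few lines. This is a toy illustration, not any vendor's agent; the `monitor` decorator, the threshold, and the in-memory `timings` dict are all invented for the example (a real agent would ship measurements to a backend):

```python
import time
from functools import wraps

# Collected timings: transaction name -> list of durations in seconds.
timings = {}

SLOW_THRESHOLD = 0.5  # seconds; purely illustrative

def monitor(name):
    """Decorator that records how long a 'transaction' takes."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                timings.setdefault(name, []).append(elapsed)
        return wrapper
    return decorator

@monitor("checkout")
def checkout():
    time.sleep(0.01)  # stand-in for real work
    return "ok"

checkout()
# Transactions whose worst-case latency breaches the threshold.
slow = [n for n, ds in timings.items() if max(ds) > SLOW_THRESHOLD]
print(slow)
```

The same shape of data (per-transaction latencies over time) is what lets an APM backend answer "why is this transaction slow or failing?"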
The general consensus of late seems to be that the term “APM” has become diluted as competition increased and more vendors from unrelated backgrounds (e.g. network monitoring, application instrumentation, systems management, etc.) entered the market. The term has evolved into “a concept for managing application performance across many diverse computing platforms, rather than a single market.”
As I mentioned, the competition and number of vendors in the APM market have exploded in recent years, but a few of the big players are Dynatrace, AppDynamics, Datadog, Stackify Retrace, New Relic APM, Raygun, AlertSite and eG Innovations.
Also known as transaction monitoring, business transaction management is essentially the supervision of the bigger picture by looking at the smaller details. These tools keep an eye on “critical business applications and services by auditing the individual transactions that flow across the application infrastructure.” By measuring the response time performance of each component of a system (along with the links between each component), these tools provide operations teams the data they need to see precisely where a performance impact is happening.
If you are familiar with call stacks, you know they show the flow of execution (Method A called Method B, which called Method C), along with details and parameters about each call. Call stacks are great for debugging monolithic architectures or services that run in a single process, but what about everything else?
Distributed tracing is “the equivalent of call stacks for modern cloud and microservices architectures, with the addition of a simplistic performance profiler thrown in.” Examples of companies and projects in this space include Datadog, LightStep, SignalFx and JaegerTracing.
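The idea behind distributed tracing, one trace ID shared by spans across services, can be sketched roughly like this. The `Span` class and the two "services" are hypothetical stand-ins; real tracers propagate IDs between services in request headers (e.g. the W3C `traceparent` header) and export spans to a collector:

```python
import time
import uuid

spans = []  # a real tracer would export these to a collector

class Span:
    """A tiny span: a shared trace_id ties spans from different services together."""
    def __init__(self, name, trace_id=None, parent_id=None):
        self.name = name
        self.trace_id = trace_id or uuid.uuid4().hex
        self.span_id = uuid.uuid4().hex
        self.parent_id = parent_id

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.duration = time.perf_counter() - self.start
        spans.append(self)

def frontend_request():
    with Span("frontend") as root:
        # In a real system the trace/span IDs travel over the network;
        # here we just pass them as arguments.
        backend_call(root.trace_id, root.span_id)

def backend_call(trace_id, parent_id):
    with Span("backend", trace_id=trace_id, parent_id=parent_id):
        time.sleep(0.01)  # stand-in for a database query

frontend_request()
for s in spans:
    print(s.name, s.trace_id, s.parent_id, round(s.duration, 4))
```

Stitching spans back together by trace ID is what turns this into the "call stack for microservices" described above, with per-span durations providing the lightweight profiling.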
The latest buzzword in monitoring, observability, is actually not new at all: the term comes from the world of engineering and control theory. Wherever it came from, some have said it is “basically monitoring, but on steroids.” It is up to you to decide if you agree with that notion, but here’s the gist of it: whereas monitoring is something we actually do (a verb), observability (a noun) is more of a property of a system. It is a measure of “how well internal states of a system can be inferred from knowledge of its external outputs.”
Moving on from performance monitoring, let’s take a look at logs. In a computing context, logs are documentation of events relevant to a specific system. They are automatically produced when something notable happens and each one has a timestamp.
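Python's standard `logging` module shows what a timestamped log event looks like in practice. This is a minimal sketch: the `payments` logger name and the messages are made up, and the in-memory buffer stands in for a real log destination such as a file or a log shipper:

```python
import io
import logging

# Send log records to an in-memory buffer so the example is self-contained.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
# The formatter adds the timestamp and severity to every event.
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))

logger = logging.getLogger("payments")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("charge succeeded order_id=1234")
logger.warning("retrying gateway timeout order_id=5678")

print(buffer.getvalue())
```

Each call produces one timestamped line, which is exactly the raw material that the log management and analytics tools below operate on.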
Log management is “the collective processes and policies used to administer and facilitate the generation, transmission, analysis, storage, archiving and ultimate disposal of the large volumes of log data created within an information system.” Although the term is fairly general, effective log management is an essential piece of security and compliance: regulations such as HIPAA, the Gramm-Leach-Bliley Act and the Sarbanes-Oxley Act all have specific rules relating to audit logs.
Log analytics relates to the analysis of computer-generated records, aka logs. The term is pretty self-explanatory, but its implications are important: this branch of analytics “helps in reducing problem diagnosis, resolution time and in effective management of application and infrastructure,” thereby providing essential support for any existing or new data source.
Mobile Log Management
As our world continues to become more mobile-focused, so must our tools. As with the general “log management” we discussed earlier, mobile log management involves the collection and analysis of important log events. However, being specific to mobile apps, these tools have the ability to understand and track log data from a source (e.g. a smartphone) that is increasingly becoming many users’ main contact with some companies.
These days it seems like everyone and their mother has a website. Not all sites are created equal, though, and if site owners want to ensure theirs is running smoothly for all visitors, odds are they use website monitoring tools. There are many types of tools within this area, so let’s dive into a few.
If you use the web, it is a near guarantee that many of the sites you visit are monitoring you. End User Experience Monitoring, or EUEM, “enables teams to monitor the impact of application and device performance from the end user’s point of view.” There are three primary approaches to EUEM:
1. Synthetic monitoring is a method that involves simulating users’ interactions with your application. Sometimes referred to as “robotic testing,” these scripted checks are run from various locations at regular intervals, ensuring all systems are running smoothly for users in each geographical area. Examples of synthetic monitors include Uptrends, Monitive, Pingdom and Site24x7.
2. Real User Monitoring (RUM) is “a passive monitoring technology that records all user interaction with a website or client interacting with a server or cloud-based application.” Unlike synthetic monitoring, this method captures metrics that reflect real end users’ experience. Some RUM tools are AppDynamics, Stackify and Catchpoint.
3. Device-based EUEM, or Device Performance Monitoring (DPM), uses lightweight agents to track the health and performance of end users’ PCs, laptops, and virtual desktops. While these metrics definitely relate to the experience of the end user, they do not provide any visibility into how the end user actually experiences the applications they are using. Products in this space are not yet covered on OpsMatters, but if you are seeking more information, please visit Riverbed SteelCentral or Nexthink.
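A synthetic check like the ones described in item 1 boils down to a script that fetches a URL, times the response, and reports pass or fail. Below is a minimal sketch with a throwaway local server standing in for the monitored site; the `check` function and its thresholds are invented for illustration (a vendor would run checks like this from many geographic locations on a schedule):

```python
import http.server
import threading
import time
import urllib.request

def check(url, timeout=5.0, slow_after=2.0):
    """Fetch a URL, time it, and report pass/fail like a synthetic monitor."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except OSError:  # covers URLError, HTTPError, timeouts
        return {"ok": False, "status": None, "latency": None}
    latency = time.perf_counter() - start
    return {"ok": status == 200 and latency < slow_after,
            "status": status, "latency": latency}

# Throwaway local server so the sketch runs without touching the network.
server = http.server.HTTPServer(("127.0.0.1", 0),
                                http.server.SimpleHTTPRequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

result = check(f"http://127.0.0.1:{server.server_port}/")
print(result["ok"], result["status"])
server.shutdown()
```

Scheduling this from several regions and alerting when `ok` flips to `False` is, in essence, what the synthetic monitoring products above productize.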
Website monitoring services often include other related services such as certificate monitoring, domain monitoring, content monitoring, DNS monitoring and blacklist monitoring.
Exception and Crash Reporting
Another facet of website (and app) monitoring, exception and crash reporting allows you to “measure the number and type of caught and uncaught crashes and exceptions that occur.” While some people may picture a desktop application terminating abruptly or a blue screen, on the web a crash usually looks more like a mobile app suddenly shutting down, an unresponsive website or even a strange server error. Tools that can help you monitor crashes and exceptions include Raygun, Sentry and Firebase Crashlytics.
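Counting caught and uncaught exceptions can be sketched with Python's process-wide exception hook. This is a toy example, not how Raygun or Sentry actually work; the `report` function and the counter are invented (a real SDK would also capture stack traces and device or browser context and ship them to a backend):

```python
import sys
from collections import Counter

# Tallies keyed by (exception type, caught/uncaught).
counts = Counter()

def report(exc, handled):
    key = (type(exc).__name__, "caught" if handled else "uncaught")
    counts[key] += 1

# Hook into uncaught exceptions process-wide.
_original_hook = sys.excepthook
def reporting_hook(exc_type, exc, tb):
    report(exc, handled=False)
    _original_hook(exc_type, exc, tb)
sys.excepthook = reporting_hook

# Caught exceptions are reported explicitly at the catch site.
try:
    1 / 0
except ZeroDivisionError as e:
    report(e, handled=True)

print(dict(counts))
```

Aggregating these tallies by type and release is what lets a crash-reporting dashboard show whether a new deploy introduced a spike in errors.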
An API, or Application Programming Interface, is a set of routines, protocols and tools for building software applications. In the simplest terms, an API specifies how software components should interact. Knowing this, it is a bit easier to understand what API monitoring is: the practice of monitoring APIs, “most commonly in production, to gain visibility into performance, availability and functional correctness.” These tools are designed to help users analyze how their applications are performing and, in turn, improve any of their poorly performing APIs. Because API services, like customer websites, are expected to be “always on,” reliable monitoring and 24/7 uptime are of critical importance.
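The "functional correctness" part of API monitoring, checking that a response has the right status and shape rather than just that the server is up, can be sketched as a schema check. The `check_response` function and the `/users`-style schema below are hypothetical, invented for this illustration:

```python
import json

def check_response(body, expected_status, actual_status, schema):
    """Validate an API response body against a minimal expected schema
    (field name -> expected Python type)."""
    if actual_status != expected_status:
        return False, f"status {actual_status} != {expected_status}"
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False, "body is not valid JSON"
    for field, typ in schema.items():
        if field not in data:
            return False, f"missing field {field!r}"
        if not isinstance(data[field], typ):
            return False, f"field {field!r} has wrong type"
    return True, "ok"

# A passing and a failing check against a hypothetical user endpoint.
schema = {"id": int, "name": str}
ok, msg = check_response('{"id": 7, "name": "ada"}', 200, 200, schema)
bad, why = check_response('{"id": "7"}', 200, 200, schema)
print(ok, msg)
print(bad, why)
```

Running assertions like these against production endpoints on a schedule catches the failure mode uptime checks miss: the API responds with 200 but returns the wrong data.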
As you may have guessed, API monitoring is an offshoot of website monitoring, as the underlying protocols and interactions are the same. However, it commonly utilizes more secure (and complex) authentication mechanisms such as Open Authorization (or OAuth). Additionally, API monitoring is becoming increasingly important these days due to the trend toward delivering services as remote APIs via SaaS platforms and gateways.
Products in this space have a variety of features, but often include monitoring, testing and load testing capabilities. If you are looking to implement some API monitoring tools, check out Apigee, SmartBear, API Fortress and Checkly.
As you can see, the application monitoring field is quite broad and very deep. It is a crowded space with many offerings and seemingly endless competitors, but we hope that this piece (and the rest of the series — stay tuned!) can help to demystify some of the associated terms and offer a good place to start. OpsMatters has over 200 contributing organisations and more than 6,500 articles and videos (with more being added every day) to help you research the best tools and applications to fit your needs.
As always, if you have any questions or comments regarding this piece or the OpsMatters platform, please leave a comment below or reach out to us at firstname.lastname@example.org.