Modern Monitoring: Infrastructure Monitoring Decoded

By Lauren Detweiler

Jan 24, 2019

6 minutes

OpsMatters

Infrastructure monitoring (or server monitoring) tools help you review and analyze your server for operations-related processes like availability, operations, performance, and security. Since your application’s health depends largely on the health of your underlying server, it is essential to use great server monitoring tools that “ensure your server machine is capable of hosting your applications.”

When used in conjunction with other monitoring data from your application, the data you receive from server monitoring can help you get a true glimpse into how your system is working at the core.

Ready to look into some tools? Below, we’ll break down the different areas within infrastructure monitoring and discuss key players in the industry.

Network Monitoring

A subset of network management, network monitoring services are used “to detect whether a given Web server is functioning and connected properly to networks worldwide.” With so many moving pieces in your network, it helps to have a tool that can spot slow or failing network components, such as crashed servers, failing routers, or failed switches.

Along with monitoring the health of a network and searching for trends, these systems also track and log network parameters, such as throughput, error rates, uptime, downtime, and use-time percentages.

SolarWinds, Auvik, and Observium are great examples of network monitoring tools that can help you monitor the health and reliability of your network.

Database Monitoring

Databases are the core foundation of many enterprises’ key business processes. While they may have been simply data storage spaces in the beginning, databases have evolved and having proper management systems and monitoring tools is of the utmost importance. Database monitoringinvolves the “tracking of database performance and resources in order to create and maintain a high performance and highly available application infrastructure.”

With applications becoming more complex and IT infrastructures diversifying, database monitoring that can resolve issues quickly and accurately is critical in helping IT troubleshoot problems before they ever reach the end user.

Database monitors like Quest Foglight, Datadog, ManageEngine Applications Manager, and NiCE IT Management Solutions help to check things like your database health and query latency. Whether your organization runs on Microsoft SQL Server, Oracle, IBM DB2, or MongoDB, it is essential that you know your system is prepared should business start to skyrocket. If your database’s workload suddenly increases drastically, will it crash? Will queries take too long and turn away end users? Monitoring your database can ensure your team finds these issues before they turn into major situations.

Cloud and DevOps Monitoring

ICYMI, the cloud is kind of a big deal. You most likely have at least some cloud assets within your various systems, and ensuring that a cloud infrastructure or platform is performing optimally is of vital importance. Cloud monitoring is “the process of reviewing, monitoring and managing the operational workflow and processes within a cloud-based IT asset or infrastructure.”

ICYMI #2, DevOps is also kind of a big deal. As an enterprise capability enabling continuous delivery and continuous deployment of software, it drastically reduces the amount of time your team needs to address customer feedback. Whereas Dev and Ops were once siloed, they now work cohesively to improve agility and speed up the entire application lifecycle.

Monitoring in the DevOps space should be proactive, not just reactive. These monitoring tools often find ways to “improve the quality of your applications before problems even show up,” and can help improve your toolchain by spotlighting areas that may need more automation.

The “new wave” of infrastructure monitoring also monitors the clouds themselves (e.g. pulling metrics from CloudWatch, Azure, or GCP and adding anomaly detection and alerting). The most modern tools also now support containers (such as Docker and Kubernetes).

Another example? Cost monitoring. To many operations managers it feels like cost management in the cloud is a never-ending exercise. However, there are tools to ease this burden: if you use something like AWS (Amazon Web Services) billing, systems such as Metricly can help.

Some of the big players in the Cloud, DevOps, and Container monitoring space include AWS CloudWatch, Google Stackdriver, Azure Monitor, Wavefront, Scout, and Sysdig.

Serverless Monitoring

“The serverless revolution” has been gaining steam amongst developers in recent years, and it shows no signs of slowing down. This means that the uptick in serverless monitoring tools is now a major trend as well. These tools monitor resources and workloads rather than servers, and complement “observability” by also monitoring the modern “serverless” platforms that are increasingly being used to execute highly distributed architectures such as microservices.

The serverless development method cuts costs drastically and makes it easier for startups to develop awesome software for a fraction of the cost. However, what needs to be altered for monitoring these systems to be a possibility? Traditional methods of monitoring will not work.

A new mindset is needed, and there are some great serverless monitoring tools ready to save the day.

Most of the early adopters are based on AWS Lambda and Azure, but other cloud providers are sure to follow in the near future. The main players in this space are Thundra, Dashbird, Epsagon, and IOPipe.

Specialised Infrastructure Monitoring

There are a few areas within infrastructure monitoring that are either 1) their own breed or 2) overlap with other categories. For the sake of being thorough, I would like to mention a few of them just briefly.

Virtual Desktop Infrastructure monitoring (VDI monitoring) is the process of “reviewing, monitoring and managing the operations of a VDI environment for the purpose of performance management, troubleshooting and/or security.” Companies use this to ensure that virtual desktops are performing as expected for their users by monitoring the operation of the desktop infrastructure as a whole. Examples of VDI providers are Citrix and VMware vCenter.

Some of the monitoring systems for these VDI providers are Goliath Technologies, eG Innovations, and Correlsense.

Microsoft Office 365 monitoring includes each of the products in the suite such as Exchange Online, SharePoint Online, and Skype for Business Online. To be effective, this needs to take into account on-premise/datacenter deployments and cloud apps as well as the networks and ISPs connecting them.

Key players include Exoprise, GSX, and NiCE IT Management.

Job monitoring is the automated monitoring of the scheduled tasks (or jobs) running within a system. For example, let’s look at Cron — a time-based job scheduler in Unix-like computer operating systems, used to schedule commands at specific times. Cron is generally used for “running scheduled backups, monitoring disk space, deleting files (for example log files) periodically which are no longer required, running system maintenance tasks and a lot more.”

Job monitoring applications are able to alert users for jobs that either do not run on schedule or run for longer than expected. Without such monitoring it is possible critical jobs could fail “silently” or not run at all, threatening the smooth running of the system.

Looking for job monitoring tools? Check out Cronitor, Healthchecks, Dead Man’s Snitch, and PushMon.

DNS monitoring: The DNS, or Domain Name System, is a service that converts user-friendly domain names like opsmatters.com into a computer-friendly IP address such as 64.233.160.0. DNS monitoring uses network monitoring tools “to test connectivity between your authoritative name servers and local recursive servers.” Monitoring these records helps you ensure that the DNS can continue properly routing traffic to your websites, services, and electronic communications.

Some of the key players in this space are DNS Spy and DNS Check.

Full Stack Monitoring

The Holy Grail in any part of the tech industry is referred to as “full stack,” aka the all-in-one combination of services that reduces the number of separate tools you have to keep track of. The benefit of a full stack solution is that it provides a single source of truth for all teams to fully monitor your environment.

Now, many companies may refer to their service as “full stack,” but not all of them actually live up to the name. According to Dynatrace, these capabilities are mandatory in a full stack monitoring solution. “If anything is missing from this list, it’s not full stack.”

End user experience

Real user — browser
Real user — mobile
Synthetic transactions

Application performance monitoring

Transaction tracing
Code-level visibility
Business transactions
SQL performance

Infrastructure monitoring

Network monitoring

Log file monitoring

Before you sign on with any full stack monitoring solution, you may want to see if it has these additional key capabilities as well. While not mandatory, Dynatrace asserts that an enterprise-caliber full stack monitoring solution should still include them.

User sessions and click-actions
Container monitoring
Microservices monitoring
Log analytics
Cloud & platform monitoring
Mainframe monitoring
3rd party performance

Finally, it should be a single all-in-one product. If multiple products are required for the coverage to be considered full-stack, then it really is not full stack at all.

Infrastructure monitoring has endless competitors and offerings to keep track of, and that is just one silo of monitoring overall. In this Modern Monitoring series, we hope to continue explaining the jargon and major players of the industry in understandable terms so that you can make the best operations decisions possible. (ICYMI: make sure to check out our first article in this series, Modern Monitoring: Application Monitoring Demystified, and stay tuned for article #3 on security monitoring: coming soon.)

OpsMatters has over 200 contributing organisations and more than 7,000 articles and videos (with more being added every day) to help you research the best tools and applications to fit your needs.

If you have any questions or comments regarding this piece or the OpsMatters platform, please leave a comment below or reach out to us at enquiries@opsmatters.com.