Monitoring

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Service-Aware AIOps and finding answers to the question of 'what can I automate?'

Based on our interactions with buyers evaluating vendors in the AIOps market, much of what we’re hearing chimes with this quote from Stephen Wolfram, computer scientist and physicist: “What will AI allow us to automate? We'll be able to automate everything that we can describe. The problem is: it's not clear what we can describe.”

How to perform simple testing of critical network assets state

Changes in services or devices can easily go unnoticed; failing to recognize even a subtle change can have unpleasant consequences later. Below we list several examples of such incidents; the checks described are lightweight and can be run frequently against critical network assets. The cases below assume that any change in a device’s current state should be treated as a security issue.
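
As a rough illustration of the idea (not the checks from the article itself), a state-change test can be sketched as a comparison between a stored baseline snapshot and the current one. The snapshot fields (`open_ports`, `ssh_host_key`, `firmware`) and the helper names `fingerprint` and `changed_keys` are hypothetical, chosen only for this example; in practice the snapshots would come from your own probes.

```python
import hashlib
import json


def fingerprint(state: dict) -> str:
    """Return a stable SHA-256 digest of a device-state snapshot."""
    # Sorting keys makes the digest independent of dictionary ordering.
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_keys(baseline: dict, current: dict) -> list:
    """List the keys whose values differ between the two snapshots."""
    keys = set(baseline) | set(current)
    return sorted(k for k in keys if baseline.get(k) != current.get(k))


# Hypothetical snapshots, hard-coded here purely for illustration.
baseline = {"open_ports": [22, 443], "ssh_host_key": "SHA256:abc", "firmware": "1.2.3"}
current = {"open_ports": [22, 443, 8080], "ssh_host_key": "SHA256:abc", "firmware": "1.2.3"}

# Any difference from the baseline is treated as a potential security issue.
if fingerprint(baseline) != fingerprint(current):
    print("ALERT: state changed:", changed_keys(baseline, current))
```

Comparing digests keeps the frequent check cheap; the per-key diff is only computed once a change has been detected.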

GrafanaCONline: The People and Business of Grafana Labs and Closing

The GrafanaCONline schedule is here: https://grafana.com/about/events/grafanacon/2020/#schedule
Presenters will take and answer questions after the presentations in our #grafanaconline channel in our public Slack: https://slack.grafana.com/
Can't hear us? Can't see us? Ask for help in the #grafanaconline Slack channel or DM us on any social network.

Through the crisis: Nexthink customer stories (AXA IM)

For a lot of Nexthink’s technical professionals, 2020 has been one of the most challenging and rewarding periods of their professional lives. Our customer base was impacted globally by COVID-19, and we were honored to be able to support them whenever and however we could, working closely to leverage their existing investment, as well as developing and distributing tailored services for those that needed them.

Hospital's IT Almost Wastes $900k on Hardware

After working in enterprise IT for over 20 years, I’ve come to the realization that most departments suffer from the same underlying contradiction. By nature, we IT professionals are a logic-seeking, detail-oriented bunch. Much of our work can take months, if not years, of meticulous planning and research. We find comfort in gazing upon complex, multi-colored scrum boards and searching for answers to problems that any sane person would avoid.

Top 5 Network Performance Metrics | Obkio

Continuous network performance monitoring can help you ensure that your network is always performing at its highest level. The best way to measure network performance is by measuring certain performance metrics. In this short video, learn about the top 5 network performance metrics that you should be monitoring.

Top Industry Performers in Unplanned Server Downtime | Q1 The Uptime Report

Can you be incompetent and still stay in business? Not as far as your web infrastructure is concerned. All the studies show that when a website is unavailable, or even just slow to load, customers go elsewhere—and often they don’t come back. After all, if you can’t keep a website up and running, why should people trust you to deliver any other product or service? So it’s worth asking: how reliable is your website relative to the top brands in your industry?

Tools for debugging apps on Google Kubernetes Engine

Editor’s note: This is a follow up to a recent post on how to use Cloud Logging with containerized applications running in Google Kubernetes Engine. In this post, we’ll focus on how DevOps teams can use Cloud Monitoring and Logging to find issues quickly. Running containerized apps on Google Kubernetes Engine (GKE) is a way for a DevOps team to focus on developing apps, rather than on the operational tasks required to run a secure, scalable and highly available Kubernetes cluster.

Tracking COVID-19 Data in South America Using Telegraf and InfluxDB

I wanted to better understand how COVID-19 has been developing in South America. As I’ve recently started playing with InfluxDB, the open source time series database, I created a dashboard of cases and deaths using InfluxData’s platform. I usually use InfluxDB, Chronograf, Grafana, Zabbix and other similar solutions to monitor services and systems. However, until this point, I hadn’t used them to process and visualize other kinds of data.