Latest Blogs

Uptime Monitoring: A One-week Project, a Decade In the Making

We recently released uptime monitoring, a pretty big addition to our set of features. Our customers have often requested it, and adding it to our app was a logical next step. In today’s post, we’ll explain how we went from considering uptime monitoring impossible to build, to building it in a week. We’ll break down how what looks like over-engineering can really pay off in the end.

Investigating Network Anomalies - A sample workflow

Network anomalies vary in nature. While some are easy to understand at first sight, others require investigation before they can be resolved. The MITRE ATT&CK framework, introduced in Kemp Flowmon ADS 11.3, streamlines the analysis process and gives security analysts additional insight by using knowledge of adversaries' techniques to explain network anomalies from the ATT&CK point of view.

Event-driven autoscaling in Kubernetes

In modern cloud architecture, applications are broken down into independent building blocks, usually microservices. These microservices allow teams to be more agile and deploy faster. Microservices form distributed systems in which communication between services is critical to creating a unified system. A good practice for such communication is to implement an event-driven architecture.

How to build a team that demands metrics

When we talk about metrics in software delivery, a lot of developers think of execution metrics — things like throughput, delivery and number of deploys. But in reality, those metrics don’t motivate anyone — at least not without connecting them to a bigger picture. I’ve worked in software for 23 years. I’m a three-time founder and four-time CTO, responsible for leading a 200+ member distributed engineering organization.

Error Budgets Explained (And How to Make One for Your Team)

Wondering what error budgets (EBs) are and how they are useful? We explain what they are, how they are defined, and how they can help your team. An error budget is the amount of acceptable unreliability a service can have before customer happiness is impacted. If a service is well within its budget, the developers can take more risks in their releases. If not, developers need to make safer choices.
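
The arithmetic behind an error budget can be sketched in a few lines. The example below assumes the common convention that the budget is the complement of an availability target (an SLO) over a fixed window. The function names, the 30-day window, and the 99.9% target are illustrative assumptions, not details from the post.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unreliability in the window for a given SLO target.

    Assumes the budget is (1 - SLO) of the total time in the window.
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, window_days: int,
                     downtime_minutes: float) -> float:
    """Budget left after the downtime already spent (negative if overspent)."""
    return error_budget_minutes(slo_target, window_days) - downtime_minutes


if __name__ == "__main__":
    # Hypothetical service: 99.9% target over 30 days, 12 minutes of downtime so far.
    budget = error_budget_minutes(0.999, 30)
    left = budget_remaining(0.999, 30, 12.0)
    print(f"budget: {budget:.1f} min, remaining: {left:.1f} min")
```

A positive remainder corresponds to the "well within its budget" case, where riskier releases are acceptable. A negative remainder is the signal to make safer choices.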

Mattermost plugins: The server side

In the first article in this series, we explained how to set up your developer environment to begin creating Mattermost plugins. In the second, we examined the structure of server-side and web app plugins and how to deploy them. Now, it’s time to dive deeper into the server side of the application, which is written in Golang.

How to Troubleshoot Network Issues-Guide and Recommended Tools

You’re going to run into network issues during normal operations, in part because so many kinds of errors can cause noticeable problems in your network. Identifying the root cause of each issue is critical, and to do so successfully, you want to make sure you have the right network troubleshooting solutions in your arsenal before wading in. This helps ensure you have a clear understanding of the scope of the problem before you attempt any network troubleshooting steps.

Do You Need an Alert for Your Alerts? Building Smarter Monitoring Systems

Traditional systems monitoring solutions poll various counters (typically via Simple Network Management Protocol [SNMP]), pull in data, and react to it. If an issue requiring attention is found, an event is triggered, perhaps an email to an administrator or the firing of an alert. The admin then responds as needed. This centralized pull approach is resource-intensive. Because data is only pulled at intervals, it also results in data gaps and data that may not be granular enough.
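
The data-gap problem with polling can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions: the per-second readings and the 5-second polling interval are made up for illustration, and no real SNMP library is involved.

```python
def poll(readings: list[int], interval: int) -> list[int]:
    """Return the values a poller sees when it reads every `interval` ticks."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return readings[::interval]


# Hypothetical CPU utilisation (%), one reading per second, with a brief spike.
readings = [10, 10, 10, 95, 10, 10, 10, 10, 10, 10]

seen = poll(readings, interval=5)

if __name__ == "__main__":
    print("true peak:  ", max(readings))
    print("polled peak:", max(seen))
```

The spike falls between two polls, so the poller never observes it. Shortening the interval narrows the gap, but at the cost of more requests against every monitored device.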