How to create an on-call schedule that doesn't suck.

Many tech companies struggle to create an effective, efficient on-call schedule for their products and services, which leads to much longer downtime when something goes wrong. They also often overburden team members with repeated on-call duty, which leads to fatigue. Here’s how to create an on-call schedule that your team might love.

Detecting CVE-2020-0601 Exploitation Attempts With Wire & Log Data

Editor’s note: CVE-2020-0601, unsurprisingly, has created a great deal of interest and concern. There is so much going on that we could not adequately provide a full accounting in a single blog post! This post focuses on detecting the vulnerability using network logs, specifically Zeek, as well as endpoint data. If you are collecting vulnerability scan data and need to keep an eye on your inventory of systems that are at risk, then check out Anthony Perez’s blog.

The Future of Cortex: Into the Next Decade

The Cortex project, a horizontally scalable Prometheus implementation and CNCF project, is more than three years old and shows no sign of slowing down. Right now, there are a lot of things going on in Cortex, but sometimes it’s not clear why we’re doing them. So I want to provide some clarity for both the Cortex community and the wider Prometheus community regarding our intentions, especially with regard to the Thanos project.

LogicMonitor and Unomaly can pre-empt business problems with AIOps

Curious about AIOps these days? You’re not alone. AIOps (Artificial Intelligence for IT Operations) is all about analyzing and automating your IT operations using artificial intelligence and machine learning algorithms. These operations include end-to-end workflows that bring monitoring, analytics, incident management, and automation systems together with a common goal of optimizing and automating operational tasks.

Observability vs Monitoring

Observability is a hot subject right now, stirring a great deal of debate among IT admins. This report aims to shed some light on the question “What is the difference between monitoring and observability?” Enterprise IT is complex, as infrastructure solutions are delivered from enormous datacenters in remote locations.

Automating Sentry Releases with CircleCI

Continuous integration tools like CircleCI let developers automate builds and tests, so that teams can merge changes into their codebase quickly and frequently. In this article, we’ll take a look at how to combine Sentry’s command line interface with CircleCI to automatically create Sentry releases. This will unlock some of our best features, like identifying suspect commits that likely introduced new errors, applying source maps to see the original source code within Sentry, and more.

Introducing Netdata's step-by-step tutorial

Health monitoring and performance troubleshooting aren’t easy. That’s exactly why we’re building Netdata, to democratize monitoring and make it accessible to anyone interested in learning more about their systems and applications. Of course, teaching a complicated topic isn’t easy either. Until recently, the only resource to help new users after installation has been our getting started guide.

Attention leaders: there's something your team isn't telling you

You’re a manager. You’re not one to pat yourself on the back, but you’ve got to admit: your team is doing pretty darn well. At least, that’s how it seems to you. Big projects are being checked off the list, and deadlines are being met. You haven’t had to referee any heated conflicts, and your inbox has no complaints. You’re good! But is that the whole story? Our research suggests your direct reports might describe the situation a wee bit differently.