Operations | Monitoring | ITSM | DevOps | Cloud

Server Monitoring and Alerts - Getting Past Common Obstacles

Keeping a server running optimally on a consistent basis involves managing multiple system elements simultaneously. Automated scripts and specialized software can handle the tasks your server needs to complete on a daily basis—but when one of these experiences an error, it can throw the entire system off.

The case for boring tech

Solutions for challenging technical problems shouldn’t result in a whole set of new ones. Sometimes, we make things harder on ourselves by choosing the new hotness to tackle technical problems (such as scaling infrastructure). We may be solving our problem in an interesting and fun way, but we bring on more complexity (and more problems) as a result of that technology choice.

Splunk Remote Work Insights - Now Available on Mobile!

The way we work has fundamentally changed in recent months due to the impact of the global COVID-19 pandemic. As more employees are working remotely, organizations are looking at new ways to ensure their workers can stay productive and secure. We released Splunk Remote Work Insights (RWI) to help IT and security teams have insight into the systems that their employees rely upon while working remotely.

How to protect your IT infrastructure from a Maze ransomware attack

Pitney Bowes, a global package delivery giant, has been hit by a second ransomware attack in less than seven months, according to ZDNet. Those responsible for the attack have released screenshots portraying directory listings from inside the company’s network. What is Maze ransomware and what makes it so special?

Kubernetes Log Management: The Basics

Log messages help us to understand data flow through applications, as well as spot when and where errors are occurring. There are a lot of resources for how to store and view logs for applications running on traditional services, but Kubernetes breaks the existing model by running many applications per server and abstracting away most of the maintenance for your applications. In this blog post, we focus on log management for applications running in Kubernetes by reviewing the following topics.

COVID-19's Impact On Infrastructure Security

It’s no secret that COVID-19 is negatively impacting businesses of all sizes in a number of ways. Some more obvious than others. Unless you are in IT, you’re probably not thinking of how COVID-19 can affect the infrastructure security of your organization, but the truth is that as businesses make the tough decision to layoff employees in order to stay in business, basic security hygiene can easily be overlooked.

Optimizing Your Alerting Escalation Policy

Reacting to alerts can be a pain, however, there are ways to be proactive and decrease frustration concerning IT Alerting. Developing an alerting strategy saves IT Operations and Development teams time, money, and eliminates notifications from low priority alerts. Keep reading for more information on routing and escalation chains, fielding alerts, and how to communicate an alerting strategy to management.

Monitoring AWS Lambda with Prometheus and Sysdig

In this post, we will show how it’s easily possible to monitor AWS Lambda with Sysdig Monitor. By leveraging existing Prometheus ingestion with Sysdig, you will be able to monitor serverless services with a single-pane-of-glass approach, giving you the confidence to run these services in production.

Prometheus Metric Federation with Thanos

Prometheus is a CNCF graduated project for monitoring and alerting. It is one of the most widely used monitoring and alerting tools in the Kubernetes ecosystem. Rancher users can leverage Prometheus quickly by using the built-in monitoring stack. Prometheus stores its metrics as a time series database on the local disk. Prometheus local storage is limited by the size of the disk and amount of metrics it can retain.