Monitoring

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

When It Comes to Security of the Platform, We Mean Business. Here's How.

At Splunk, we understand that a secure platform is a trustworthy one. We strive to implement a protected foundation for our customers to turn data into action, and part of that effort is giving you more frequent insight into the security enhancements that we’ve made to the platform. In this blog series, we’ll share the latest enhancements to Splunk Enterprise, review our security features in depth, and explain why these updates are important for you and your organization.

5 Great Reasons to Store and Analyze Centralized Logs

Whether you’re trying to troubleshoot a problem, defend against attacks, or simply optimize your environment, event logs are your best source of information. More than that, failing to log, or ignoring your logs, is like not checking your blind spot when you’re changing lanes: sooner or later you’re going to seriously regret it, because the effects will be disastrous.

Introducing Prometheus-style alerting for Grafana Cloud

Hi! My name’s Richard Lam, and I’m the new product manager for Grafana Cloud. I’m really excited for my first contribution to this community, both so I can introduce myself to you all, and so I can highlight an awesome new Grafana Cloud feature that’s coming your way! Happy reading, and hopefully this is just the start of many more communications from me.

Server Monitoring with OpsRamp

For decades, compute or server infrastructure has been the backbone of the IT world. Compute has gradually evolved from on-premises hardware to programmable compute in the form of software containers. Technology operators need to constantly monitor the performance of their Windows, Linux, and container infrastructure so that they can optimize their compute environments to match workload demands.

TrackJS for Node

TrackJS error monitoring, on your servers. We’re thrilled to announce official support for Node environments and the 1.0.0 release of our Node agent. We’ve actually had Node support since sometime last year, but we’re finally formalizing it as a first-class citizen and fully supported part of TrackJS! Here are some of the cool things you can do with TrackJS for Node.

TL;DR InfluxDB Tech Tips - How to Extract Values, Visualize Scalars, and Perform Custom Aggregations with Flux and InfluxDB

In this post, we learn how to use the reduce(), findColumn(), and findRecord() Flux functions to perform custom aggregations with InfluxDB. This TL;DR assumes that you have either registered for an InfluxDB Cloud account – registering for a free account is the easiest way to get started with InfluxDB – or installed InfluxDB 2.0 OSS. In order to easily demonstrate how these functions work, let’s use the array.from() function to build an ad hoc table to use in the query.

Using Machine Learning for Root Cause Analysis

From a security breach to a complete system outage, when an incident occurs and your network or service is impacted, it’s typically the result of a chain of events. A problem with one service impacts another service, and so on, until finally you’re facing a problem that’s compromising availability and damaging your customer experience. In the event of a serious incident, your team’s immediate response is to focus on identifying the root cause and restoring service.

7 Wins That Helped IT Save its Work-From-Anywhere Program

I think I have been reading way too many “doom and gloom” articles this year about IT. For many companies, the switch to a prolonged work-from-anywhere (WFA) model has exposed serious cracks within their IT infrastructure. To be honest, though, the cards have been stacked against IT for some time, and 2020 was just the tipping point. Employees often resist new work technologies, and there’s mounting evidence that IT tends to overestimate how well its services are received.

How To Succeed When Adopting A Multi Cloud Environment

Today, the vast majority of companies work with multiple cloud providers. But moving IT operations to the cloud has significant consequences that they need to deal with. Discover how Broadcom helps customers manage critical workloads in multi-cloud environments, simplifying and accelerating the deployment of new business services.

How Automation Helps The Site Reliability Engineer

Automation has been with us for decades now, and with years of experience and experimentation, we are arriving at a best practice known as site reliability engineering. Site reliability engineering seeks to manage the risk imposed by multiple agile changes in order to protect business revenues and sustain positive customer experiences.