Latest Posts

Operational Intelligence: 6 Steps To Get Started

The ability to make decisions quickly can mean the difference between success and stagnation. Of course, quick decisions aren’t necessarily the right decisions. The right decisions are the best informed, and the best way to get informed is through data. That’s what operational intelligence is all about. In this article, we’re diving into all things operational intelligence (OI), including key benefits, goals and how to get started.

Distributed Systems Explained

Distributed systems might be complicated, but luckily the concept is easy to understand! A distributed system is simply any environment where multiple computers or devices work on a variety of tasks and components, all spread across a network. Components within a distributed system split up the work and coordinate their efforts to complete a given job more efficiently than a single device could.
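To make the idea of splitting up work concrete, here is a minimal sketch in Python. It is only an illustration: a thread pool on one machine stands in for separate networked devices, and the function names (`split_into_chunks`, `partial_sum`, `distributed_sum_of_squares`) are hypothetical, not taken from the article.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, num_workers):
    """Divide items into num_workers contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), num_workers)
    chunks, start = [], 0
    for i in range(num_workers):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def partial_sum(chunk):
    """The work one 'node' performs: sum the squares of its own chunk."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(items, num_workers=4):
    """Split the job, run each part on a worker, then combine the results."""
    chunks = split_into_chunks(items, num_workers)
    # Assumption: threads simulate independent devices; a real distributed
    # system would send each chunk over the network to another machine.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(distributed_sum_of_squares(list(range(10)), num_workers=3))
```

The pattern is the same at any scale: partition the input, let each worker handle its share independently, and merge the partial results at the end.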

Mean Time to Repair (MTTR): Definition, Tips and Challenges

The availability and reliability of any IT service ultimately govern end-user experience and service performance, both of which have significant business impact. These two concepts — availability and reliability — are particularly relevant in the era of cloud computing, where software drives business operations, but that software is often managed and delivered as a service by third-party vendors.

Big Data Analytics: Challenges, Benefits and Best Tools to Use

Imagine yourself with a folder containing millions of gigabytes of data. If you were asked to process it with an Excel spreadsheet, you wouldn’t need to be a data expert to know that’s impossible. We refer to that amount of data as “big data”. Big data requires advanced techniques, tools, and methods beyond what regular data analytics entails, which is where big data analytics comes in.

Splunk and the Four Golden Signals

Last October, Splunk Observability Evangelist Jeremy Hicks wrote a great piece here about the Four Golden Signals of monitoring. Jeremy’s blog comes from the perspective of monitoring distributed cloud services with Splunk Observability Cloud, but the concepts behind the Four Golden Signals apply just as readily to monitoring traditional on-premises services and IT infrastructure.

Code, Coffee, and Unity: How a Unified Approach to Observability and Security Empowers ITOps and Engineering Teams

In today's fast-paced and ever-changing digital landscape, maintaining digital resilience is no longer just a technical challenge but a business imperative. But when observability teams work in silos, with their own tools, processes, and policies disconnected from those of the security teams, it becomes more challenging for companies to achieve digital resilience.

IT Orchestration vs. Automation: What's the Difference?

As modern IT systems grow more elaborate, encompassing hardware and software across hybrid environments, the prospect of managing these systems often grows beyond the capacity an IT team can handle. Automation is one great way to help. But it's important to know that not all automation is the same — chatbots are probably not the solution your team is looking for to handle these incredibly complex systems.

What Is ITOps? IT Operations Defined

IT operations, or ITOps, refers to the processes and services that an organization's IT staff provides to its internal or external clients. Every organization that uses computers has a way of meeting the IT needs of its employees or clients, whether or not it calls that function ITOps. In a typical enterprise environment, however, ITOps is a distinct group within the IT department. The IT operations team plays a critical role in accomplishing business goals.

Developing the Splunk App for Anomaly Detection

Anomaly detection is one of the most common problems that Splunk users are interested in solving via machine learning. This makes sense, as one of the main reasons our Splunk customers ingest, index, and search their systems’ logs and metrics is to find problems in their systems, whether before, during, or after the problem takes place. In particular, one type of anomaly detection that our customers are interested in is time series anomaly detection.
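As a rough illustration of what time series anomaly detection can look like, here is a minimal rolling z-score sketch in Python. It is a generic textbook technique and is not the method used by the Splunk App for Anomaly Detection; the function name `rolling_zscore_anomalies` and the `window` and `threshold` defaults are assumptions chosen for the example.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate strongly from the recent past.

    Each point is compared against the mean and sample standard deviation
    of the `window` points immediately before it.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: flag any departure from it.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical metric samples with one obvious spike at index 7.
    samples = [10, 11, 10, 9, 10, 11, 10, 50, 10, 11]
    print(rolling_zscore_anomalies(samples))
```

A simple detector like this reacts only to sudden spikes relative to a short window; seasonal patterns or slow drift would call for more sophisticated models.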

Introducing the Splunk App for Behavioral Profiling

Splunk is the platform for a million use cases, used to investigate operational data across security, observability, fraud, business intelligence and many other domains. But, in my time at Splunk, I’ve come to realize that all of our customers face challenges that stem from the same core problem: finding, within exploding data volumes, the anomalously behaving entities that are most threatening to the resilience of their organization.