Operations | Monitoring | ITSM | DevOps | Cloud

Splunk

Using AI & ML for Application Performance (APM)

Today, IT and site reliability engineering (SRE) teams face pressure to remediate problems faster than ever, within environments that are larger than ever, while contending with architectures that are more complex than ever. In the face of these challenges, artificial intelligence has become a must-have feature for managing complex application performance or availability problems at scale.

Cloud Log Management Strategy & Best Practices

For IT Operations and Site Reliability Engineering (SRE) teams, logging is nothing new. In fact, collecting and analyzing logs is one of the oldest cornerstones of performance management. Logs have been part and parcel of APM workflows for decades. Yet the logging strategies that worked in eras past often fall short today. That’s thanks to the advent of cloud-native computing, which has ushered in fundamental new challenges in the way teams aggregate, analyze, and manage logs.

Synthetic Monitoring for CI/CD Pipelines

For DevOps teams, delivering quality software has long required reconciling a major tension: In a perfect world, you’d catch every issue in each new release of your application before you deployed the release into production. But in the real world, doing so is tricky, not least because it’s hard to collect data about application performance before the application is actually deployed.

On-Premises Application Monitoring: An Introduction

In the present age of cloud-native everything, it can be easy to forget that some applications still run on-premises. But they do and managing the performance of on-premises apps is just as important as monitoring those that run in the cloud. With that reality in mind, here’s a primer on how to approach on-premises application performance monitoring as part of a broader cloud-native performance optimization strategy.

Distributed Tracing Best Practices for Microservices

The management of modern software environments hinges on the three so-called “pillars of observability”: logs, metrics and traces. Each of these data sources provides crucial visibility into applications and the infrastructure hosting them. For many IT operations and site reliability engineering (SRE) teams, two of these pillars — logs and metrics — are familiar enough.

SRE vs DevOps: What's The Difference?

Whether you’ve heard of or fully jumped on the DevOps or SRE bandwagon, you may have also wondered how the two relate. What’s the difference? Are they really just different ways of looking at the same problem? The term DevOps hit the market first, but SRE wasn’t too far behind. And though they have different origin stories, they both focus on autonomy, automation, and iteration. So why do these paradigms exist? And why do we need both? Let’s look at this further.

Splunk Operator 1.1.0 Released: Monitoring Console Strikes Back!

The latest version of the Splunk Operator builds upon the release we made last year with a whole host of new features and fixes. We like Kubernetes for Splunk since it allows us to automate away a lot of the Splunk Administrative toil needed to set up and run distributed environments. It also brings a resiliency and ease of scale to our heavy-lifting components like Search Heads and Indexer Clusters.

What is Splunk? (2022)

How do you thrive in today’s unpredictable world? You keep your digital systems secure and resilient. And above all, you innovate, innovate, innovate. Splunk is the extensible data platform that processes data from any cloud, any data center and any third party tool. At massive scale. We’re ready to help you accelerate your digital transformation and pave the way for incredible innovation.