
"Why Are My Tests So Slow?" A List of Likely Suspects, Anti-Patterns, and Unresolved Personal Trauma

“Lead time to deploy” means the interval from when the code gets written to when it’s been deployed to production. It has also been described as “how long it takes you to run CI/CD.” How important is it? It’s nigh-on impossible to have a high-performing team if you have a long lead time, and shortening your lead time makes your team perform better, both directly and indirectly.

Welcome to the Future of Data Search & Exploration

You have more data coming at you than ever before. Over the next five years, the total amount of digital data is going to be more than twice the amount of data created since the advent of digital storage. With the success of your company often determined by how you anticipate and respond to threats – and leverage meaningful insights – you need the ability to quickly search and find insights in your data, despite this increasing deluge of information.

Automating Common Diagnostics for Kubernetes, Linux, and other Common Components

This is the second piece in a series about automated diagnostics, a common use case for the PagerDuty Process Automation portfolio. In the last piece, we talked about the basics around automated diagnostics and how teams can use the solution to reduce escalations to specialists and empower responders to take action faster. In this blog, we’re going to talk about some basic diagnostics examples for components that are most relevant to our users.

Best Practices to Maximize Cloud ROI

As businesses shift to a digital-first environment, cloud computing will play a dominant role in delivering greater flexibility and faster innovation. In a recent report by Deloitte, nearly 90% of US-based senior decision makers proclaim cloud to be the cornerstone of their digital strategy. Covid accelerated cloud migration initiatives, and they show no signs of slowing down. Gartner forecasts worldwide end-user spend on public cloud services will grow by 20.4% in 2022 to a total of $494.7 billion.

Network as Code Explained: How Ansible & Automation Support Agile Infrastructure

When considering application source code, the way you maintain consistency throughout environments is mostly straightforward. You write application code, commit it to source control, and then build, test and deploy via a CI/CD pipeline. Since the application is defined by the source code living in source control, the build will be identical in all environments to which it’s deployed. But what about the infrastructure on which an application runs?

Exporting Splunk Data at Scale: See a Need, Fill a Need

The Core Splunk platform is rightfully recognized as having sparked the log analytics revolution when viewed through the lenses of ingest, search speed, scale, and usability. Its original design leveraged a MapReduce approach, and it still stores the ingested data on disk in a collection of flat files organized as “buckets.” These immutable buckets are not human-readable and largely consist of the original raw data, indexes (.tsidx files), and a bit of metadata.

Retrace Power User Tips and Tricks - Advanced Metrics and Reporting

Monitoring and reporting on your most important business metrics is a fundamental part of any APM or ITIM solution. Our Retrace Power User Tips and Tricks series has already looked at “Error and Log Management” functionalities. We’ve discussed useful, advanced features for monitoring app performance in our “Extending APM” post. In this latest edition, let’s take a look at how power users capture advanced server and application metrics.