Latest posts

Game Day: Stress-testing our response systems and processes

At incident.io, we deal with small incidents all the time—we auto-create them from PagerDuty on every new error, so we get several of these a day. As a team, we’ve mastered tackling these small incidents since we practice responding to them so often. However, like most companies, we’re less familiar with larger and more severe incidents—like the kind that affect our whole product, or a part of our infrastructure such as our database, or event handling.

Multi-Cloud Deployment: Deploying Consistent Infrastructure Across AWS, GCP, Azure + More

Moving to the cloud is no small feat, especially for enterprise-scale infrastructure. And just imagine the complications when you need to deploy across more than one cloud. Multi-cloud deployment is sometimes characterized by slow, error-prone workloads, lack of consistency, and inflexibility that holds users back. In this blog, we’ll provide clear definitions, use cases, benefits, challenges, and factors to consider for multi-cloud deployment.

Using Cribl Search for Anomaly Detection: Finding Statistical Outliers in Host CPU Busy Percentage

In this video, we'll demonstrate how to use Cribl Search for anomaly detection by finding statistical outliers in host CPU usage. By monitoring the "CPU Busy" metric, we can identify unusual spikes that may indicate malware penetration or high load/limiting conditions on customer-facing hosts. The best part? This simple but powerful analytic is easily adaptable to other metrics, making it a versatile tool for any data-driven organization.

Using Cribl Search for Anomaly Detection: Finding Statistical Outliers in Host CPU Busy Percentage

In this blog post, we’ll demonstrate how to use Cribl Search for anomaly detection by finding statistical outliers in host CPU usage. By monitoring the “CPU Busy” metric, we can identify unusual spikes that may indicate malware penetration or high load/limiting conditions on customer-facing hosts. The best part? This simple but powerful analytic is easily adaptable to other metrics, making it a versatile tool for any data-driven organization.

It had to be said: Teleworking promotes productivity and employee satisfaction

Teleworking, which spread across many companies during the lockdowns caused by the pandemic, has given rise to a new model of labor relations: a system now called “hybrid” because it combines teleworking with conventional physical presence.

Rethink your Cloud strategy in 2023

Gartner forecasts that worldwide end-user spending on public cloud services will grow 20.7%, from $490.3 billion in 2022 to $591.8 billion in 2023. Gartner also predicts that by 2026 the public cloud market will double from its current size to $1 trillion. AWS, Microsoft Azure, and Google Cloud all maintained double-digit growth in Q4 2022; Google Cloud in particular grew 32% to $7.32 billion. CDN giant Akamai also unveiled Akamai Connected Cloud and new cloud computing services on Feb 14, 2023.

Exploring DORA: Why creating a path to resilience maturity is a critical success factor for financial services organisations

DORA (the Digital Operational Resilience Act) recently came into force and will soon impact thousands of financial services organisations across the European Union (EU). In this blog, my colleague Clara Lemaire and I share some insights about the requirements of DORA, as well as how Splunk can support financial services organisations on their resilience journey. Let’s explore DORA!

How to choose and track your security KPIs

There's no denying that Key Performance Indicators (KPIs) can be critical for any security program, and many of us are fully aware of that. In practice, however, confusion remains about which security KPIs are crucial to track and how to choose the right ones to measure and improve the robustness of your security program. Here we'll propose a few ideas about how to select and track the right KPIs for your organization.