Efficiently retrieve old logs with Datadog's Log Rehydration

Logs provide invaluable information about issues you need to troubleshoot. In some circumstances, that means looking back at old logs. For example, you may be running a security audit and need to analyze months-old HTTP request logs for a list of specific IP addresses over a period of time. Or you might need to investigate why a scheduled service never ran, or run an exhaustive postmortem on incidents that happened over a couple of months but that you suspect are related.

How and When to Inform Website Users of a Data Breach

Data breaches don’t wait for a convenient time to strike. They sometimes take months to uncover. They are complicated beasts, and once you’ve uncovered one, complex rules kick in that determine when you need to report it. Reporting a breach can be a daunting prospect. In most cases you’ll need to make a public statement; you may also need to report the breach, and legal requirements may apply.

Behind the Grafana UX: Redesigning the Thresholds Editor

As part of building the new Gauge panel in React, we also wanted to update the panel controls, especially the thresholds control. A threshold in the context of Grafana is simply a value that, when exceeded, triggers a condition. An example would be a single stat panel with a green background that changes its background color to red when a threshold is breached.
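As a rough illustration of that definition, here is a minimal Python sketch of a single threshold switching a panel color. It is not Grafana's implementation; the function name `threshold_color` and its parameters are hypothetical, chosen only to mirror the green-to-red example above.

```python
def threshold_color(value, threshold, base="green", breached="red"):
    """Return the breached color once value exceeds the threshold.

    Hypothetical helper for illustration only; not part of Grafana.
    """
    # "Exceeded" is read strictly here: a value equal to the
    # threshold still counts as not breached.
    return breached if value > threshold else base


if __name__ == "__main__":
    print(threshold_color(42, 80))
    print(threshold_color(95, 80))
```

Whether a value exactly equal to the threshold counts as a breach is a design choice; this sketch assumes it does not.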

How to Secure a Kubernetes Cluster

Kubernetes is one of the most advanced orchestration tools that currently exist in the software world. It provides out-of-the-box automation for environment maintenance and simplifies deployment and upgrade processes. It comes in different implementation types (on-premises, cloud-managed, hybrid, and more), has multiple open-source supporting tools, and supports a wide range of configuration options.

Achieve better AWS security with just 10 CloudTrail log alerts

CloudTrail logs track actions taken by a user, role, or an AWS service, whether taken through the AWS console or API operations. In contrast to on-premises infrastructure, where something as important as network flow monitoring (NetFlow logs) could take weeks or months to get off the ground, AWS can track flow logs with a few clicks at relatively low cost.

Summit Day Two: New Integrations and Developer Platform to Bring Real-Time Work to More People

Yesterday, we kicked off PagerDuty Summit by launching new features that support the themes of Visibility and Intelligence. If you missed the keynotes or want to know more, check out this blog post. Today, we are making several announcements around two other themes that our CEO Jennifer Tejada touched on during her keynote yesterday: Platform and People. In fact, these themes are so closely related that we refer to them as one—that PagerDuty is a platform for people to do real-time work.

CIO Dive Playbook: AIOps Brings Calm to Overwhelmed IT Ops Teams

Much has been said about how Artificial Intelligence (AI) is already proving its ability to transform business, as well as the way most people live. In fact, according to Accenture’s “ExplAIned: A Guide for Executives,” AI is on par with such life-changing innovations as electricity and the internal combustion engine, and is no longer science fiction.

Slack Loses $8M to Outages

On July 22, 2019, Slack was in the middle of deploying an update to its desktop app. The update was supposed to decrease memory consumption and improve load time, but instead the company suffered a significant, widespread outage on a global scale. After approximately 40 minutes of downtime, the service was back up. But in the meantime, the company whose motto is ‘where work happens’ essentially stopped working.