Back in the good old days of monolithic applications, most developers and application owners relied on tribal knowledge to know what performance to expect. Although applications could be incredibly complex, understanding of their inner workings usually rested with relatively few people in the organization. Application performance was managed informally and measured casually.
How is the incident response process set up at your organization? At PagerDuty, our approach is to look holistically at your infrastructure, your customer-facing applications, and your products. We describe each of these as a “service,” and related services roll up into a “business service.” This setup helps teams manage their services so that when incidents do happen, responders can gain context much faster. But how?
Complex problems can often be solved with simple, practical solutions. That’s what our team at Kenshoo discovered when we realized we needed a way to easily and proactively track specific log messages that indicate mission-critical events.
Manual ticket creation can often be a pain. It’s difficult enough handling the barrage of incoming alerts, let alone opening tickets and copying and pasting alert details into them. In this post, we discuss a simple way to ease this pain and share a video showing how to do it.
Alerting is fundamental to Elastic's use cases. Since Watcher (our original suite of alerting features for Elasticsearch) was introduced back in 2015, we’ve received a lot of feedback that’s helped refine our understanding of what an alerting system needs to be and what the user experience should entail.
Diamond mining is recognized as a dangerous occupation, causing serious accidents for mineworkers across the globe. These incidents often turn fatal because the victim didn’t receive immediate care from first responders. However, large international organizations are making significant strides to minimize the impact of these accidents.
Software developers and IT professionals alike are spending more time in production environments, detecting performance anomalies and fixing issues in real time. Instead of writing code and deploying updates on a monthly, quarterly, or even yearly basis, software companies are now deploying multiple times each day.
Today, technology problems can alter the trajectory of a business. Minutes of downtime or latency (slow is the new down) cost organizations dearly in lost revenue and can jeopardize customer relationships. However, there’s an even more important consequence of technology problems than top-line risk: reduced innovation as teams are forced into reactive fire drills that take time away from product development.