Log analytics and dashboarding in Datadog

Achieving optimal performance can be challenging when you depend on separate platforms to monitor service health and to manage your logs. When data about your systems is spread across multiple platforms, investigating issues—and ultimately resolving them—takes longer and requires expertise with more tools. It takes more effort to identify real customer impact, as well as to verify that your responses to an incident are having the desired effect.

Managing Python Processes with PM2

PM2 is a production-grade process manager that makes managing background processes easy. In the Python world we could compare PM2 to Supervisord, but PM2 has some nifty features you might like. With PM2, rolling restarts, monitoring, checking logs and even deploying applications have never been simpler. We really value CLI UX, so PM2 is really simple to use and master.

Monitoring Social Signals to Reduce Alert Fatigue With SignalFx and PagerDuty

“I need to be notified if there’s a significant event ongoing with SignalFx.” This is what I tell my team. However, despite being the CTO of a monitoring company, creating the right set of alerts for me to stay informed of incidents in progress or potential issues was harder than it seemed at first glance. Why?

Massachusetts Natural Gas Explosions - A Lesson in The Importance of Alert Automation

The pressure in the natural gas pipelines under three Massachusetts communities spiked to 12 times its normal level last week, just before the explosions and fires that destroyed dozens of homes and killed an 18-year-old man. Columbia Gas came under fire for its mismanagement of the incident. The NTSB says a Columbia Gas control room in Columbus, Ohio, registered pressures of 6 pounds per square inch (PSI) last Thursday in pipelines that are intended to carry just 0.5 PSI.

Saving lives by ensuring uptime of mission-critical IT at Gift of Hope

Gift of Hope Organ & Tissue Donor Network is a non-profit organ procurement organization (OPO) that coordinates organ and tissue donation and provides public education on donation in Illinois and northwest Indiana. As one of 58 OPOs that make up the nation’s donation system, Gift of Hope works with 180 hospitals and serves 12 million people in its donation service area.

Alert fatigue, part 2: alert reduction with Sensu filters & token substitution

In my previous post, I talked about the real costs of alert fatigue — the toll it can take on your engineers as well as your business — and some suggestions for rethinking alerting. In part 2 of this series, I’ll share some best practices for fine-tuning Sensu to help reduce alert fatigue.

Sentry + Microsoft Azure DevOps: Error-Tracking, Crash-Reporting, & More

Sentry is updating our key integrations for Azure DevOps (formerly VSTS). With these tightly-woven integrations, developers (like you) can unlock enhanced release tracking, informative deploy emails, and assignee suggestions for new errors. Route alerts to the right person based on the Azure DevOps commit that caused the issue, cutting remediation time to five minutes.

ManageEngine Strengthens Endpoint Security with the Launch of Browser Security Plus at London User Conference

LONDON - Sept. 18, 2018 - ManageEngine, the real-time IT management company, today announced its launch of Browser Security Plus, a browser management solution that helps organisations secure their corporate data in the cloud and protect their networks from web-based cyberattacks. Available immediately, Browser Security Plus provides organisations with a layer of management capabilities for browsers and their add-ons to maintain robust enterprise security.

Connect Insights to Real-Time Action With PagerDuty Visibility

Have you ever gotten that dreaded text from your boss: “The site is down”? Maybe you were meeting with a customer. Or having dinner with your family. Maybe you were presenting at a conference. Doesn’t matter. Whatever else you were doing, now you’re doing emergency incident communication too. You check in with your team leads and confirm there is a problem. You let your boss know the response is under way.