VictorOps

victorops

Identifying the Best IT Monitoring Software of 2020

What’s the best IT monitoring software of 2020? It’s an interesting question. IT monitoring could mean a whole lot of things to a whole lot of different organizations. To some teams, a robust application performance monitoring (APM) solution might be more important than their network performance monitoring (NPM) tools.

The War Room for Major Incident Response and Remediation

Critical application errors and infrastructure incidents are bound to happen. Highly interconnected systems, microservice architectures and containers mean developers, sysadmins, technical support and IT security analysts can’t simply work in silos. A simple alert about client-side latency may affect more than just frontend development teams.

When and Why to Adopt Feature Flags

What if there were a way to deploy a new feature into production but not actually turn it on until you’re ready? Well, there is. These tools are called feature flags (or feature toggles or flippers, depending on whom you ask). Feature flags are a powerful way to gain fine-tuned control over which features are enabled within a software deployment. That said, feature flags aren’t the right solution in all cases.
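To make the idea concrete, here is a minimal sketch of a feature flag in Python. It assumes a simple in-memory flag store; the `FeatureFlags` class, the `new_checkout` flag name and the `checkout` function are all hypothetical names chosen for illustration, not part of any particular feature flag product.

```python
class FeatureFlags:
    """Hypothetical in-memory flag store; unknown flags default to off."""

    def __init__(self):
        self._flags = {}

    def enable(self, name):
        self._flags[name] = True

    def disable(self, name):
        self._flags[name] = False

    def is_enabled(self, name):
        return self._flags.get(name, False)


def checkout(flags):
    # The new code path is deployed but stays dark until the flag is turned on.
    if flags.is_enabled("new_checkout"):
        return "new checkout flow"
    return "legacy checkout flow"


flags = FeatureFlags()
before = checkout(flags)      # flag is off by default
flags.enable("new_checkout")  # turn the feature on without redeploying
after = checkout(flags)
print(before, "->", after)
```

In a real deployment the flag state would more likely live in a config service or database so it can change without a restart; the in-memory dictionary here only keeps the sketch self-contained.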

The Comprehensive Site Reliability Engineering (SRE) PDF

Site reliability engineering (SRE) isn’t a new concept or role, but it isn’t old either. With the growth in Agile, DevOps practices, and remote engineering teams, as well as changes to the traditional NOC and SOC models, SRE is helping fill the need for software engineers dedicated to IT infrastructure and application resilience.

The Importance of Log Monitoring for Incident Response

A high-functioning incident response strategy is critical to a development organization’s application quality. That said, efficient incident response hinges on an organization’s ability to effectively use all the information at its disposal. As anyone who has ever had the pleasure of troubleshooting application and system problems will tell you, log files are one source of information that can’t be overlooked.

Key Methods for Optimizing the Software Testing Lifecycle

Software testing, both automated and manual, is essential for QA, DevOps and IT practitioners looking to maintain CI/CD pipelines without hurting the reliability of their underlying applications and services. Testing can be incorporated across all aspects of software development and delivery, not simply maintained in a silo by your QA and testing team. Testing is like overall service reliability in DevOps – everyone is accountable for its success.

VictorOps Release Notes

The new War Room interface pops out from the incident pane, allowing teams to better digest incident context and mobilize on-call teams. From this view, incident responders can track critical incident metrics such as time to detection/acknowledgment and find related integrations such as ServiceNow tickets and Slack channels associated with the incident.

Using Data and Automation to Help Engineering Teams Avoid Coronavirus

Nothing seems to unite humans more than this widespread virus epidemic. COVID-19, the current coronavirus, is top of mind for everyone right now – and it’s something we wish we didn’t have to think about. As cases grow, people are already thinking about ways to keep themselves, coworkers, friends and family out of harm’s way. The easiest answer is to limit travel and human contact. But as they say in show business, the show must go on.

Using Data and Automation to Help Engineering Teams Work Remotely

Working remotely is at the top of everyone’s mind right now. But working remotely across distributed teams and engineering disciplines is easier than it’s ever been. And as they say in show business, the show must go on. Organizations need to strongly consider the power of automation and data to help engineering teams collaborate and improve transparency for remote workers.

5 Incident Response Metrics and How to Use Them

Two areas a software organization should always strive to improve are application quality and incident response. Data analysis is one way such an organization can improve the efficiency of incident management and overall application quality. However, the questions remain – which metrics should be collected, and how can analysis of these metrics facilitate these improvements? Read on to learn about five key incident response metrics.
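As a rough illustration of what analyzing such a metric can look like, the sketch below computes mean time to acknowledge from incident timestamps. Time to acknowledgment is mentioned in the release notes above; whether it is one of the five metrics the article covers is an assumption. The incident records and the `mean_time_to_acknowledge` helper are hypothetical sample data and naming, not output from or an API of any monitoring tool.

```python
from datetime import datetime, timedelta


def mean_time_to_acknowledge(incidents):
    """Average gap between an incident triggering and being acknowledged."""
    deltas = [ack - triggered for triggered, ack in incidents]
    return sum(deltas, timedelta()) / len(deltas)


# Hypothetical (triggered_at, acknowledged_at) pairs for two incidents.
incidents = [
    (datetime(2020, 3, 1, 9, 0), datetime(2020, 3, 1, 9, 4)),
    (datetime(2020, 3, 2, 14, 30), datetime(2020, 3, 2, 14, 36)),
]

mtta = mean_time_to_acknowledge(incidents)
print(mtta)
```

Tracking this value over time, rather than looking at a single number, is what makes it useful for spotting whether on-call changes are actually helping.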