
What the Big Brother Approach to IT Monitoring and Incident Management May Be Missing

We asked in a recent poll which popular TV show your IT team resembles the most. Big Brother came out on top, with almost 40% of respondents saying that their incident resolution process most resembled this show. Would you compare your incident management process to an episode of Big Brother? If so, it's likely that your IT environment is highly monitored, but incidents still seem to slip through the cracks.

SLA vs SLI vs SLO: Know the differences between them.

SLA stands for Service Level Agreement. It’s a formal agreement between you and your customer that describes the reliability of your product or service. For example, an SLA might state that your product will be online 99 percent of the time annually, and that if you fail to meet that objective, you will refund 30% of the customer’s annual license fee. SLAs also spell out such penalties in the contract.
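To make the arithmetic behind such an agreement concrete, here is a minimal sketch in Python. It uses the 99 percent target and 30% refund from the example above. The function names, the 10,000 annual fee, and the measured availability figures are hypothetical values chosen only for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_target: float) -> float:
    """Hours of downtime per year that still meet the availability target."""
    return HOURS_PER_YEAR * (1 - availability_target)


def sla_refund(measured_availability: float,
               availability_target: float,
               annual_fee: float,
               refund_rate: float) -> float:
    """Refund owed if the measured availability misses the target, else 0."""
    if measured_availability < availability_target:
        return annual_fee * refund_rate
    return 0.0


# 99 percent annual availability leaves roughly 87.6 hours of downtime.
print(round(allowed_downtime_hours(0.99), 1))

# Hypothetical customer paying 10,000 per year, with 98.5% measured uptime.
print(sla_refund(0.985, 0.99, annual_fee=10_000.0, refund_rate=0.30))
```

A real contract would usually define how availability is measured and over which window, so treat this as a starting point rather than a complete model.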

The U.S. COVID Vaccine Distribution Plan: Challenges and Solutions

As coronavirus (COVID-19) continues to spread and new virus strains emerge, the public is frantically looking for answers regarding the U.S. government’s vaccine distribution plan. A sound vaccine distribution plan is especially crucial in times like these. States across the U.S., from coast to coast, are experiencing a vast number of COVID-related deaths and hospitalizations. The dire situation underscores the importance of having an effective, accelerated vaccine delivery process.

New Feature: Incident types

Incidents are inevitable, and the reality is that some of them are going to repeat themselves. FireHydrant has always strived to make the entire incident response lifecycle smooth, but up until today, declaring common types of incidents was slightly burdensome for our customers. We decided it was time to make declaring incidents simpler with easy-to-use templates, which we’re calling Incident types.

OnPage Corporation Continues To Grow Despite the 2020 Pandemic

WALTHAM, Mass., Jan. 25, 2021 - OnPage Corporation, a Boston-based incident management and pager replacement company, today unveiled its fiscal 2020 year in review. OnPage delivered another year of strong results despite the uncertainty that COVID-19 brought upon the world. Last year’s results were driven by existing customers that rely on OnPage for critical notifications and had to expand their deployments.

How to build your own incident management process

IT incident management is a fundamental operational process designed to ensure rapid service restoration. This process is typically assigned to the help desk but is also very much entrenched in the day-to-day of DevOps. When incident management goes right, service is restored quickly and the impact on productivity, continuity, and customer satisfaction is minimal.

7 Tips On Building And Maintaining An SRE Team In Your Company

In today's "always on" world, Reliability is a primary business KPI. Plant the culture of Reliability by implementing these 7 simple tips to build a solid SRE team in your organization. Many of today’s hottest jobs didn’t exist at the turn of the millennium. Social media managers, data scientists, and growth hackers were unheard of back then. Another relatively new job role in demand is that of a Site Reliability Engineer, or SRE. The profession is quite new.

The Key Differences between SLI, SLO, and SLA in SRE

To incentivize reliability in your platform, there should be shared goals across your team to measure and quantify the capabilities of your product or service along with customer experience. Define the path to "Always-On" services by understanding a few key SRE fundamentals and their implications: SLIs, SLOs, and SLAs. Framing SRE metrics for building or scaling a product is quite a daunting task.
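As a rough illustration of how an SLI relates to an SLO, the sketch below computes a request-based availability SLI and the share of the error budget still unspent. The `Window` type, the function names, the request counts, and the 99.9% target are all assumptions made for this example and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts observed over one measurement window (hypothetical)."""
    good_requests: int
    total_requests: int


def sli(window: Window) -> float:
    """Availability SLI: fraction of requests that were served successfully."""
    if window.total_requests == 0:
        return 1.0  # no traffic, so nothing failed
    return window.good_requests / window.total_requests


def error_budget_remaining(window: Window, slo_target: float) -> float:
    """Fraction of the error budget left; negative means the SLO is breached."""
    allowed_bad = (1 - slo_target) * window.total_requests
    actual_bad = window.total_requests - window.good_requests
    if allowed_bad == 0:
        return 1.0 if actual_bad == 0 else float("-inf")
    return 1 - actual_bad / allowed_bad


# Hypothetical month: 1,000,000 requests, 500 of them failed, SLO of 99.9%.
month = Window(good_requests=999_500, total_requests=1_000_000)
print(f"SLI: {sli(month):.4%}")
print(f"Error budget remaining: {error_budget_remaining(month, 0.999):.0%}")
```

In this framing the SLI is what you measure, the SLO is the internal target you compare it against, and the SLA is the external contract, usually set looser than the SLO so the team has room to react before penalties apply.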

Why AlertOps is the best PagerDuty alternative

We will compare AlertOps to PagerDuty in 3 broad areas. On-call management: Whether your on-call management needs are basic or complex, AlertOps has a solution for you. Creating on-call schedules is simple whether there is one person on-call, two or more people on-call, or even multiple teams on-call. Escalations: Automatic escalations based on your on-call schedules. Expand the possibilities with Workflows and Escalation Rules.

4 Essential Types of MSP Tools (in 2021)

Managed service providers (MSPs) need the right tools to get the job done quickly and securely. MSP tools dictate control over everything from virtual machine (VM) management and database administration to application and server monitoring. They can also help MSPs oversee IT infrastructure. MSP tools are valuable, but not all tools are created equal.

2021 is the Year of Reliability

There’s no better time than now to dedicate effort to reliable software. If it wasn’t apparent before, this past year has made it more evident than ever: People expect their software tools to work every time, all the time. The shift in the way end users think about software was inevitable once everyday applications entered our lives, much as water and electricity entered our homes.

OnPage Recognized in Gartner's Market Guide for Emergency Mass Notification Solutions

Gartner’s Market Guide for Emergency Mass Notification Solutions (EMNS) is a trusted report for security and risk management leaders. It provides insight into effective crisis communication procedures and identifies solutions that help perfect emergency management plans. The EMNS Market Guide has a large, loyal readership in several industries, including state and local government, healthcare, IT support, and higher education.

Best Practices for Incident Management: A Checklist

If productivity is the engine that helps optimize how a business operates, then being proactive is the oil, and knowing how to effectively maintain productivity is regularly checking and replacing said oil. Whenever a service outage occurs, it throws a wrench into the whole process and can put an entire organization in flux, mainly because the outage.

The True Cost of Building your Own Incident Management System (IMS)

Is your organization on the lookout for an incident management tool? If so, you may wonder: am I better off building my own? Our latest blog outlines some of the key factors to consider when choosing whether to build or buy incident management software.

Incident Communications With Alina Anderson

Incidents happen. They’re disruptive, they can be stressful, and if they aren’t managed well, they can cause chaos on your team. How your team manages incidents is only half the battle. How you let other stakeholders know what is going on is the other half. Alina Anderson from Smartsheet joined the Community team in our booth this year at PagerDuty Summit to talk about Incident Communications, and we’ve shared that conversation as an episode of our Page It to the Limit podcast.

What's in store for IT Ops in 2021? Top execs from leading enterprises share their predictions

2020 is (finally) over, and it’s safe to say that this very challenging year taught us once again that (as the old Danish proverb says) it’s difficult making predictions, especially about the future. Who would have imagined in January 2020 that we would find ourselves where we are today… And yet, as Tim Harford once wrote in the Financial Times, predictions are like Pringles: nobody thinks that there’s any great virtue in them but we find them hard to resist.

A look back at 2020

2020 was, needless to say, not the best. Looking on the brighter side, in December, FireHydrant turned 2, and in spite of it all, we grew quite a bit. We raised our $8M Series A in May, our team grew nearly 4x in size, and we added some amazing features, such as conditions that make FireHydrant Runbooks even more powerful, plus great integrations, which you can find here. But even better, we got to work with all of you!

Building and Scaling Your SRE Team

Building Site Reliability Engineering (SRE) teams is hard! There are so many articles and explanations of what SRE means, it’s easy to get lost. Going beyond understanding what the individual SRE role is into building and scaling a team of SREs is more of a challenge. It’s important to find the right information that will help you take your SRE team to the next level.

5 Steps to Building a Robust Incident Response Plan for your MSP

Today’s organizations face ransomware, malware, and other cyber attacks, and managed service providers (MSPs) need an incident response plan (IRP) to mitigate these threats. In a recent survey of 200 MSPs, 74% of respondents said they have suffered a cyber attack, and 83% noted their small and medium-sized business (SMB) customers experienced one as well. Yet, with an IRP, MSPs can protect themselves and their customers against cyber attacks.

Seamless CMDB Provisioning Gives Responders the Data They Need to Respond Faster

We knew that the most loved feature in our ServiceNow 7.0 release would be the CMDB features. And in our ServiceNow 7.5 release (available now), we’ve expanded our CMDB capabilities even further—based on your feedback—around the importance of reducing the effort it takes to re-create the same services within PagerDuty.

2020 Year in Review: OnPage Continues to Grow Despite the Pandemic

2020 was an unpredictable year that presented several challenges, such as the outbreak of the coronavirus (COVID-19) pandemic. As part of the “new normal,” the world has adopted infection prevention procedures. The 2020 calendar year was defined by face coverings, constant sanitization and physical distancing. At its core, the year was an exhausting, surreal 12-month period for many.

Better incident management while working remotely: The Squadcast way

As the pandemic wears on, remote incident management has become the norm worldwide for businesses. Here we share some best practices that helped us to address remote incidents and make on-call less stressful. With the onset of remote work due to Covid-19, remote incident management has become the norm for businesses worldwide. Organisations that were earlier used to having war rooms now find themselves having to coordinate teams through Slack, MS Teams or other collaboration tools.

Four key metrics for responding to IT incidents and failures

If you’re a veteran in this space, you probably understand the many incident response metrics and concepts, along with the many (at times exasperating) acronyms. For those new to the space, or even those with years of experience, the terminology is often overwhelming. If you’re one of those people who’s struggling to navigate through the world of DevOps metrics, we’ve created this article for you.

G2 Recognizes Squadcast as Momentum Leader in Incident Management

We are thrilled to begin the year on a high note! Squadcast has been recognized in the Incident Management and IT Alerting categories in G2's Winter Report 2021. “We are honoured to be recognised as a Momentum Leader in the IT Incident management category by G2. We have always strived to create the fastest and easiest Incident Response experience for Engineering and DevOps teams that enables organisations to better monitor their IT infrastructure and applications.

Leverage MSP Automation to Drive Profitability (in 2021)

Managed service providers (MSPs) require automation so they can deliver fast, efficient IT services that meet customer expectations. But MSP automation can be difficult, and the longer it takes an MSP to automate IT service management (ITSM), the further it falls behind its competitors. Today’s MSPs face several challenges relative to automation, including: 1. Complex Scripting Language: IT technicians may need to learn a complex scripting language to leverage an ITSM platform.

Incident Ready: How to Chaos Engineer Your Incident Response Process - FireHydrant

We’re pretty sure using a real incident to test a new response process is not the best idea. So, how do you test your process ahead of time? In this video, FireHydrant CEO, Robert Ross, will share how FireHydrant customers leverage best practices to break, mitigate, resolve, and fireproof incident processes. We’ll show you how to use chaos engineering philosophies to stress test 3 critical parts of a great process.
Sponsored Post

Boost IT Savings with CloudReady and Incident Workflow

Companies love data. Aggregating data from multiple sources makes decision-making easier and brings new depth to the conversation in business meetings. But all of this is at the management level. IT managers and administrators also search for data from multiple sources to ensure that the ecosystem works. Companies demand the continued maintenance and availability of mission-critical applications. Without a framework or incident workflow, revenue can suffer and customers can churn if the company does not proactively address problems that arise in its infrastructure.

Segment and SIGNL4: Know your Customer's Actions, Anywhere and Anytime

Do you have a website, app, online shop, or SaaS offering? Then you have plenty of user actions, such as visiting a certain page, signing up for a service, or canceling a subscription. Wouldn’t it be great to know in real time when an important customer action takes place? This would allow your sales, customer service, or technical teams to act immediately, no matter where they are.