Blog

We can now notify you through PagerDuty

When we detect a problem with your site, we can notify you via mail, a Slack message, a webhook, or any of our other notification channels. This is enough for most of our users, but those who work in larger teams often need more flexibility. Today, we are launching our PagerDuty integration. PagerDuty is a cloud-based incident management platform that helps organizations improve operational reliability by providing real-time alerts, on-call scheduling, and incident tracking.

Common Causes of Outages and Tips to Prevent Them

Recently, Ron DeSantis used Twitter Spaces to launch his presidential campaign. At least, he tried to. As you may have heard, the event was marred by technical difficulties, resulting in false starts, confused hosts, glitches, echoes, and the “melting” of servers. Of the more than 600,000 Twitter users who initially tuned in, fewer than half remained by the time the hosts relaunched the event using a different account.

On-call management on the go: Introducing the Grafana OnCall mobile app

We’ve all been there: Sleeping peacefully in bed over the weekend, finally getting rest after a long week at your computer making AI-generated memes (er, writing code). Then at 3 a.m., your phone makes an ungodly sound, and you wake up startled, frazzled, and confused. When you finally type in your passcode to unlock your phone (because facial recognition doesn’t register your bleary-eyed, squinty face), you see an alert, and all dreams of sleep are over.

ITIL Service Transition, Explained

ITIL Service Transition is the third stage of the service lifecycle. It involves moving the services that were created and developed in strategy and design, the first and second stages of the lifecycle, into the production environment effectively, efficiently, and safely. This stage deals with everything from preparing for change, to documenting the components of the assets that make up the service, to creating knowledge articles for support teams and end users.

SD-WAN: Monitoring Blind Spots, and What to Do About Them

The adoption of software-defined wide area network (SD-WAN) technologies continues to pick up pace. By employing SD-WAN technologies, organizations have the potential to realize a range of advantages. Teams can achieve better performance at lower cost by using commercially available technologies. For example, teams can use public internet services rather than more expensive private WAN technologies, such as MPLS.

How Honeycomb Monitors Kubernetes

While Kubernetes comes with a number of benefits, it’s yet another piece of infrastructure that needs to be managed. Here, I’ll talk about three interesting ways that Honeycomb uses Honeycomb to get insight into our Kubernetes clusters. It’s worth calling out that we at Honeycomb use Amazon EKS to manage the control plane of our cluster, so this document will focus on monitoring Kubernetes as a consumer of a managed service.

A Guide To Ensuring Profitability For Your SaaS Company

This isn’t the “good old days” anymore. It used to be that cloud companies could pursue growth at all costs and still garner the support of venture capitalists and other investors. In 2023, however, with costs rising and margins getting narrower every day, investors are now favoring companies that can ensure long-term profitability rather than just growth. Now, more than ever before, it’s crucial to drive toward SaaS profitability as the ultimate goal.

Logic App Best Practices, Tips, and Tricks: #30 How to validate if a JSON structure is an Array or a single object

In the last two posts, we addressed validating whether a string or an array was null or empty. Today we will continue on the same topic, validations, and I will cover another best practice, tip, and trick that you should consider while designing your business processes (Logic Apps): how to validate if a JSON structure is an array or a single object.
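
The underlying check can be sketched outside Logic Apps as well. The snippet below is a generic Python illustration, not the approach from the post itself: it parses a JSON string and wraps a single object in a list so that downstream code can always iterate. The function name `normalize_to_list` is hypothetical and chosen only for this example.

```python
import json


def normalize_to_list(raw: str) -> list:
    """Parse a JSON string and always return a list of items.

    Hypothetical helper for illustration: a JSON array is returned as-is,
    a single JSON object is wrapped in a one-element list, and anything
    else (string, number, null, ...) is rejected.
    """
    data = json.loads(raw)
    if isinstance(data, list):
        # Already an array: nothing to do.
        return data
    if isinstance(data, dict):
        # Single object: wrap it so callers can loop uniformly.
        return [data]
    raise TypeError(
        f"expected a JSON array or object, got {type(data).__name__}"
    )


if __name__ == "__main__":
    for item in normalize_to_list('{"id": 1}'):
        print(item["id"])
```

Normalizing to a list at the boundary keeps the rest of the workflow free of repeated "array or object?" branches.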

Manage your incidents with the new ilert integration

Hello, SREs, DevOps engineers, and developers! We have some news! At Checkly, we understand the importance of proactive monitoring and quick incident resolution in maintaining your apps’ reliability and performance. Have you heard of ilert? ilert is the incident response platform made for DevOps teams. It helps organizations efficiently respond to, communicate about, and resolve incidents in real time by offering advanced alerting, on-call management, and status pages.