
The latest News and Information on DevOps, CI/CD, Automation and related technologies.

The 2022 Managed Kubernetes Showdown: GKE vs AKS vs EKS

Kubernetes may provide an abundance of benefits, but those who use it are likely well aware that running the platform independently often takes considerable effort and skill. So, rather than managing it all on their own, organizations can pay for a managed Kubernetes service instead. This is where Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), and Amazon Elastic Kubernetes Service (EKS) come in.

A new channel per incident - helpful or harmful?

I caught the tail end of a Twitter thread the other day that centred on the use of Slack channels for incidents, and whether creating a new channel for each new incident is helpful or harmful. It turns out this is a much more contentious subject than I thought, and since I have opinions, I thought I'd share them!

Uptime + Squadcast Integration: Routing Alerts Made Easy

Uptime is a site monitoring solution that checks various endpoints and notifies users via push notifications when downtime is detected. It collects and stores downtime and response-time data, which is then made available to users as reports. If you use Uptime for your monitoring needs, you can now integrate it with Squadcast to route detailed alerts from Uptime to the right users in Squadcast. The steps below will help you set up the Uptime and Squadcast integration.

See the big picture with the Service Dependency Graph

Understanding the impact and scope of an incident when degradation occurs is critical to bringing your service back online. This requires modeling the many upstream and downstream relationships between your services. Our new Service Dependency Graph provides a shortcut: a way to surface dependencies quickly, understand the relationships between services, and determine the scope or impact of an incident.
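To illustrate the general idea of scoping an incident from a dependency graph, here is a minimal sketch that models services as an adjacency list and finds every service that directly or transitively depends on a degraded one. It is not Squadcast's implementation; the service names, the `DEPENDS_ON` mapping, and the `impacted_services` helper are all hypothetical, and the technique shown is a plain breadth-first search over reversed dependency edges.

```python
from collections import deque

# Hypothetical service topology: each key depends on the services in its list.
DEPENDS_ON: dict[str, list[str]] = {
    "web-frontend": ["checkout-api", "search-api"],
    "checkout-api": ["payments", "inventory-db"],
    "search-api": ["search-index"],
    "payments": ["payments-db"],
    "inventory-db": [],
    "search-index": [],
    "payments-db": [],
}


def impacted_services(degraded: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `degraded`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents: dict[str, list[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first search outward from the degraded service; the
    # `impacted` set also guards against revisiting nodes in cyclic graphs.
    impacted: set[str] = set()
    queue = deque([degraded])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted


if __name__ == "__main__":
    # A failing database ripples up through everything that relies on it.
    print(sorted(impacted_services("payments-db", DEPENDS_ON)))
```

Running it for `payments-db` lists `checkout-api`, `payments`, and `web-frontend` as affected, while `search-api` and the other leaf services stay out of scope. Inverting the edges once up front keeps each lookup cheap, which matters when the same graph is queried repeatedly during an incident.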

geeks+gurus: Rise of SRE - Survey Insights

Site Reliability Engineering (SRE) continues to rise in adoption. Teams that leverage SRE "good" practices are benefiting, individuals are excited about their jobs, and IT and the business are collaborating more efficiently. Does that sound interesting? We hope so, as there are a few key insights you should know. Join us to learn more about the exciting journey of SRE. We have partnered with DevOps Institute (DOI) to conduct their inaugural 2022 Global SRE Pulse Survey, and we are excited to share the pulse on SRE.

Top 5 IoT challenges and how to solve them

Enterprises in the IoT sector face a number of challenges, including achieving a short time to market, ensuring airtight security, providing a versatile update mechanism for hardware and software, and mastering device management. The more planning and practical steps taken to address these key considerations, the faster an IoT project can get to market and make an impact on the world.

What is Chaos Engineering? A Guide on Its History, Key Principles, and Benefits

Many organizations invest in high availability and disaster recovery for their key applications. Too many of these organizations, however, forgo the most important aspect of this process: regularly testing failover. Whether gripped by the fear of downtime or dreaded DNS problems, development teams are frequently hesitant to test what they've built in the real world.