Operations | Monitoring | ITSM | DevOps | Cloud

May 2023

Exploring Key Concepts of Site Reliability Engineering (SRE)

Site Reliability Engineering is a process of automating IT infrastructure functions, including system management and application monitoring using software tools. It is used by businesses to guarantee that their software applications are reliable even when they receive frequent upgrades from development teams. SRE allows engineers or operations teams to automate the activities that are traditionally performed by operations teams manually to manage production systems and handle issues.

Building A DevTools Saas Company Today! Incidentally Reliable Podcast | Zenduty

Catch Rajesh Tilwani talking about Building a DevTools SaaS Company, and everything reliability only on the Incidentally Reliable Podcast, live now on all major platforms! About Zenduty: Zenduty is a revolutionary incident management platform that gives you greater control and automation over the incident management lifecycle. With the Zenduty API, you can supplement and deploy Zenduty in sync with other tools and services, allowing you to create and update incidents, users, teams, services, integrations, schedules etc. and automate your workflows using simple scripts.

Insights into Observability Tools: Commercial vs. Open-Source

Observability has become a critical aspect of modern software development and operations, allowing organizations to gain insights into the health and performance of their applications and systems. One of the key decisions when implementing observability is choosing between commercial or open-source tools. We spoke to several professionals who shared their experiences and insights on this topic, shedding light on the pros and cons of each approach.

Major Incident Management with Zenduty, Grafana, Slack and Zendesk

In the current fast-paced world, businesses are seeking methods to increase their efficiency and simplify their processes. But, there are times when teams are unaware of an issue at the initial stage, leading to a bad customer experience. For example, you are a part of the Infrastructure team, where your primary responsibility is to check resources and notify when they reach their maximum capacity. Let's say due to an anomalous traffic load, our resource CPU utilization goes above 90%.

Insights on Hiring Engineers with Different Tech Stacks

In the world of software engineering, the choice of programming languages, frameworks, and technologies is constantly evolving. As a result, hiring engineers who have experience in different tech stacks has become a common practice for many companies. However, this practice also raises questions and concerns about the potential challenges and advantages of hiring engineers who work in predominantly different stacks.

How to Manage Customer Support Channels in Slack: A Step-by-Step Plan

As more and more teams transition to remote work, collaboration tools like Slack have become increasingly popular. Slack's chat-based communication platform makes it easy to keep teams connected and informed, but it can also create challenges when it comes to managing support channels. In this post, we'll explore different approaches to building a Slack-based support system and provide some tips for success.