Zenduty

Bangalore, India
2019
  |  By Shubham Srivastava
I was in sunny Austin this February, speaking at Civo Navigate 2024. The event was jam-packed with amazing talks, and it was great meeting so many people with long and fascinating careers in engineering and Site Reliability. I had the privilege of meeting Bob Lee, who currently leads DevOps at Twingate, a cloud-based service that provides secure remote access and is poised to replace VPNs.
  |  By Anjali Udasi
The digital world comes with advantages and inherent risks. IT incidents, which can encompass cyberattacks, system outages, and data breaches, can have a devastating impact. Beyond financial losses, they disrupt business operations, damage reputations, and erode customer trust. During an outage, having a well-prepared Incident Response Team (IRT) is essential to reduce downtime and improve response times.
  |  By Menahi Shayan
We’ve come a long way from the first on-call schedule editor we built, and this new release takes the user experience for Zenduty Schedules a hundred steps further! (…103 steps further, to be precise, says our internal changelog)
  |  By Anjali Udasi
The pressure to constantly innovate and release new features can often clash with the need for a stable and reliable product. While there might be some temporary cutbacks in testing time to achieve high feature velocity, ensuring reliability doesn't have to be an afterthought. We reached out to industry experts to gather their insights on ensuring reliability during phases that demand high feature velocity. Here's what they had to say.
  |  By Anjali Udasi
External dependencies and third-party services play a crucial role in powering modern applications. These components bring a wealth of benefits, ranging from access to specialized tools and resources to the ability to offload non-core tasks, allowing development teams to focus on delivering value-added features.
  |  By Anjali Udasi
Alert fatigue is a state of exhaustion caused by receiving too many alerts. This can happen when the alerts are not actionable, are irrelevant, or are too frequent. Misconfigurations, or configurations built on wrong assumptions or lacking service-level objectives (SLOs), can have a dual impact: alert fatigue and, more alarmingly, the risk of overlooking critical alerts. We spoke with more than 200 teams using Prometheus Alertmanager. Many face alert fatigue from trivial, nonactionable alerts.
  |  By Anjali Udasi
Securing reliable system operation necessitates building a formidable Site Reliability Engineering (SRE) team. However, a critical strategic decision confronts every organization: do we cultivate SRE talent internally or venture into the external talent pool? Both approaches possess distinct advantages and disadvantages, each impacting the composition, skillset, and overall effectiveness of the SRE team.
  |  By Anjali Udasi
Site Reliability Engineers (SREs) and DevOps teams often deal with alert fatigue. It sets in when you get so many alerts that it's hard to keep up, making it tougher to respond quickly and adding extra stress to existing responsibilities. According to a study, 62% of participants noted that alert fatigue played a role in employee turnover, while 60% reported that it resulted in internal conflicts within their organization.
  |  By Anjali Udasi
Non-Abstract Large System Design (NALSD) is an approach where intricate systems are crafted with precision and purpose. It holds particular importance for Site Reliability Engineers (SREs) due to its inherent alignment with the core principles and goals of SRE practices. It improves the reliability of systems, allows for scalable architectures, optimizes performance, encourages fault tolerance, streamlines the processes of monitoring and debugging, and enables efficient incident response.
  |  By Zenduty
We're about to drop a major revamp to one of your most used Zenduty features. Get ready to experience scheduling like never before! Join our YouTube Premiere Live to see the new on-call schedules that'll make your on-call life smoother and better! P.S. Zenduty is a revolutionary incident management platform that gives you greater control and automation over the incident management lifecycle.
  |  By Zenduty
Dive into an in-depth conversation on how software has become the backbone of everything, and get access to extraordinary reliability nuggets with Piyush. Zenduty is a revolutionary incident management platform that gives you greater control and automation over the incident management lifecycle.
  |  By Zenduty
Catch Piyush Verma, Co-Founder and CTO at Last9, in conversation with Ankur Rawal, Co-Founder and CTO at Zenduty, discussing what reliability means to the modern consumer, why SREs make excellent decision-makers, and the current state of observability. Exclusively on The Incidentally Reliable podcast, made by SREs for SREs and hosted by Zenduty. Zenduty is an advanced incident management platform that gives you greater control and automation over the incident management lifecycle.
  |  By Zenduty
Settle in and listen to Suresh Kumar Khemka (Head of Platform & Infra at apna) talk about platform engineering, balancing bureaucracy and velocity at startups and tech giants, and the ripple effects of downtime at an e-commerce company. Exclusively on The Incidentally Reliable podcast, made by SREs for SREs and hosted by Zenduty.
  |  By Zenduty
Grab some popcorn and catch Viraj talk about his experiences and BookMyShow's journey from its inception in the early 2000s to the entertainment behemoth it is today, their stints innovating at the forefront of the mobile and e-commerce revolutions, and their harmony with reliability engineering in the colourful, ever-changing yet challenging world of movies and online ticketing. Exclusively on The Incidentally Reliable podcast — made by SREs for SREs, hosted by Zenduty.
  |  By Zenduty
Incidentally Reliable Episode 4 dropping this Thursday the 14th, chatting about BookMyShow's journey from inception to the entertainment behemoth it is today, their experience innovating at the forefront of the mobile and e-commerce revolutions, and their harmony with reliability in the colourful yet challenging world of movies. Zenduty is a revolutionary incident management platform that gives you greater control and automation over the incident management lifecycle.
  |  By Zenduty
Zenduty is an end-to-end incident management platform that gives you greater control and automation over the incident management lifecycle.
  |  By Zenduty
Zenduty is a revolutionary incident management platform that gives you greater control and automation over the incident management lifecycle.
  |  By Zenduty
Watch Ankur Rawal and Dheeraj Reddy talk about how to choose the right metrics for low-noise K8s alerting, with insights and suggestions based on the mistakes made by hundreds of companies while implementing Prometheus Alertmanager in their production systems, and learn how much bad monitoring could be costing you. This talk was delivered at PromCon 2023 in Berlin.

Zenduty is a collaborative incident management system for the management of always-on services, helping teams orchestrate incident response for creating better user experiences and brand value. Zenduty centralizes all incoming alerts through predefined notification rules to ensure that the right people are notified at the right time.

Zenduty supports over 100 integrations, through which IT teams receive contextual notifications from the services of their choice to speed up resolution of potentially damaging downtime:

  • Assign predefined incident roles along with highly customizable task templates to empower teams to rapidly resolve crises with minimal noise and confusion.
  • Customizable escalation policies define your internal alerting rules as per your company's on-call schedules to notify the right responders.
  • Leverage rich contextual data to perform rapid RCAs.
  • Customizable post-mortem insights streamline processes and institutionalize a culture of continuous improvement and world-class reliability.

Modern on-call and incident response platform for SRE, DevOps, ITOps and Support teams.