Zenduty

Tutorial: Integrating Prometheus with Zenduty

Aug 8, 2024 By Zenduty In Zenduty

Zenduty is a distributed, end-to-end major incident management platform for production engineering teams that helps you minimize downtime, implement scalable incident response processes, and institutionalize site reliability within your organization. Alertmanager is the component of the Prometheus ecosystem that handles alerts: it deduplicates and groups them, then routes them to the appropriate receiver integrations, such as email, Slack, or custom webhooks.
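To make the deduplicate, group, and route steps concrete, here is a toy Python sketch. It is not Alertmanager's implementation or configuration format; the function names, the label names, and the `zenduty-webhook` receiver name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def group_alerts(alerts, group_by=("alertname", "service")):
    """Drop alerts with an identical label set, then group the rest.

    Each alert is assumed to be a dict with a "labels" dict of strings.
    """
    seen = set()
    groups = defaultdict(list)
    for alert in alerts:
        # The full label set acts as the alert's identity.
        fingerprint = frozenset(alert["labels"].items())
        if fingerprint in seen:
            continue  # same label set already seen: treat as a duplicate
        seen.add(fingerprint)
        # Alerts sharing the group_by labels end up in one group.
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


def pick_receiver(labels, routes, default_receiver):
    """Return the receiver of the first route whose matchers all match."""
    for matchers, receiver in routes:
        if all(labels.get(k) == v for k, v in matchers.items()):
            return receiver
    return default_receiver
```

In this sketch, two alerts that differ only in an `instance` label land in the same group, and a route such as `({"service": "api"}, "zenduty-webhook")` sends that group to the hypothetical webhook receiver.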

Purpose and Goals of Daily Stand-up Meetings

Aug 5, 2024 By Ankur Rawal In Zenduty

Stand-up meetings are a cornerstone of any engineering team. When done right, they can make a huge difference in keeping everyone on the same page, fostering collaboration, and building a strong team culture. However, getting them right can be a bit tricky. Drawing on our experience running engineering stand-ups at Zenduty and on insights from some of the best engineering managers in my network, I'd love to share some tips for making your stand-ups effective.

[New] Schedule Overrides is now live for every team member!

Jul 26, 2024 By Vishwa Krishnakumar In Zenduty

We are excited to announce a significant enhancement to our scheduling feature based on your valuable feedback! At Zenduty, we understand the importance of flexibility and efficiency in managing on-call schedules and ensuring seamless incident response. Previously, only team managers had the capability to edit schedules and add overrides. This meant that non-manager team members had to reach out to their managers to request override coverage, potentially delaying critical adjustments.

OpenTelemetry, AI, and the Future of Observability with Andreas Grabner

Jul 19, 2024 By Anjali Udasi In Zenduty

Shubham Srivastava from our team had the pleasure of meeting Andreas Grabner at KubeCon + CloudNativeCon Europe earlier this year. Andreas wears many hats in his daily work, primarily serving as a DevOps Activist at Dynatrace, where he has dedicated over 16 years to shaping the observability solutions we see today. He is also a Developer Advocate at Keptn, where he helps teams automate and orchestrate their deployments end to end, and he plays an active role as an Ambassador in the CNCF community.

Why First-Call Resolution Is Non-Negotiable in Modern Business

Jul 1, 2024 By Ashwin Hariharan In Zenduty

In 1750 BCE, in the bustling heart of ancient Mesopotamia, a copper merchant named Ea-nāṣir thought he had closed another routine sale of copper ingots. Little did he know, his customer wasn't exactly thrilled. In fact, the customer was so displeased that he decided to write Ea-nāṣir a strongly worded letter. Yes, you heard that right! A literal clay tablet of dissatisfaction, complaining about the shoddy grade of copper and a delivery mishap.

MTBF, MTTR, MTTF, MTTA: Incident Metrics Explained

Jun 26, 2024 By Anjali Udasi In Zenduty

When it comes to managing incidents and ensuring operational efficiency, understanding key metrics is crucial. Among the most important are MTBF (Mean Time Between Failures), MTTR (Mean Time To Repair), MTTF (Mean Time To Failure), and MTTA (Mean Time To Acknowledge). In this blog, we'll explore these metrics along with some best practices and practical applications.
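As a rough illustration of how three of these metrics can be computed from incident records, here is a minimal Python sketch. The record fields (`started`, `acknowledged`, `resolved`) and the function name are assumptions made for this example, and definitions vary between teams: here MTTA and MTTR are both measured from the start of the incident, and MTBF is uptime in the observation window divided by the number of failures.

```python
from datetime import datetime, timedelta


def mean_delta(deltas):
    """Average a non-empty list of timedeltas."""
    return sum(deltas, timedelta()) / len(deltas)


def incident_metrics(incidents, window_start, window_end):
    """Compute MTTA, MTTR, and MTBF for incidents inside one window.

    Each incident is assumed to be a dict with datetime values under
    "started", "acknowledged", and "resolved" (hypothetical field names).
    """
    mtta = mean_delta([i["acknowledged"] - i["started"] for i in incidents])
    mttr = mean_delta([i["resolved"] - i["started"] for i in incidents])
    # Uptime = length of the window minus the time spent in incidents.
    downtime = sum((i["resolved"] - i["started"] for i in incidents), timedelta())
    uptime = (window_end - window_start) - downtime
    mtbf = uptime / len(incidents)
    return {"mtta": mtta, "mttr": mttr, "mtbf": mtbf}
```

For example, two incidents in a 30-day window, acknowledged after 5 and 15 minutes and resolved after 60 and 120 minutes, give an MTTA of 10 minutes, an MTTR of 90 minutes, and an MTBF of 358.5 hours under these definitions.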

The Science of Building Cloud Native DevTools - Incidentally Reliable with Ramiro Berrelleza

Jun 21, 2024 By Zenduty In Zenduty

Catch Ramiro Berrelleza, Founder and CEO at Okteto, talk about how impactful DevTool startups are built, the importance of investing in Developer Experience, and the emerging issues with the Cloud Native ecosystem.

Four Golden Signals: Key Indicators for System Reliability

Jun 3, 2024 By Anjali Udasi In Zenduty

System reliability is crucial for providing seamless user experiences and enabling effective business operations. The "4 Golden Signals" (latency, traffic, errors, and saturation) offer a comprehensive view of system performance and potential issues. In this blog, we take a deep dive into system reliability and explore these four key metrics for monitoring system health and ensuring optimal performance.
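As a small illustration of what the four signals can look like in numbers, here is a hedged Python sketch that derives them from a batch of request records. The record fields (`latency_ms`, `status`), the choice of a 95th-percentile latency, the use of 5xx responses as errors, and CPU usage as the saturation measure are all assumptions for this example, not something the post prescribes.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def golden_signals(requests, window_seconds, cpu_used, cpu_capacity):
    """Summarize one observation window as the four golden signals.

    Each request is assumed to be a dict with "latency_ms" and "status".
    """
    latencies = [r["latency_ms"] for r in requests]
    errors = sum(1 for r in requests if r["status"] >= 500)
    return {
        "latency_p95_ms": nearest_rank_percentile(latencies, 95),  # latency
        "traffic_rps": len(requests) / window_seconds,             # traffic
        "error_rate": errors / len(requests),                      # errors
        "saturation": cpu_used / cpu_capacity,                     # saturation
    }
```

A percentile is used for latency rather than a mean because a few slow requests can hide behind a healthy-looking average.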

Credit-Worthy Reliability - Incidentally Reliable with Krishnendu Majumdar

May 30, 2024 By Zenduty In Zenduty

Catch Krishnendu Majumdar (CPTO at Yubi) talk about his journey in the dynamic Indian startup ecosystem, strategies to build for scale from Day 1, and insights into building sustained user trust via exceptional product performance in high-governance industries like credit and finance.

The Reliability Stories You Won't Hear on LinkedIn

May 24, 2024 By Anjali Udasi In Zenduty

We had the pleasure of meeting Ponmani Palanisamy, a Staff Site Reliability Engineer at LinkedIn, at a recent SRE Meetup in Bangalore. Ponmani gave an insightful talk on "Improving data redundancy and rebalancing data in HDFS." We were captivated by his talk and eager to learn more about his experience in the reliability space. We talked about everything including his journey, experiences, and of course, his most memorable war room stories over a steady career of 17 years. Here's what he had to share.
