Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Sprint planning - How to prioritize urgent production issues?

Members of a small engineering team wear a lot of hats while working on a product. When a sprint is already planned and in place, it becomes hard to prioritize and deal with issues that arise in production. This not only makes sprints harder to plan but also reduces accountability. How do you tackle this problem and make sure your engineering team does not burn out at the same time? Let’s list a couple of characteristics of such teams that are quite common across the board.

Announcing our $1.9M round of funding

It is with a great deal of anticipation and excitement that I’m announcing our $1.9M round of funding, led by StartupXSeed Ventures along with participation from marquee enterprise SaaS investors Powerhouse Ventures, Secure Octane fund, Kwaish Ventures, Supermorpheus, Titan Capital, 100X Entrepreneurs, Viral Bajaria (CTO, 6Sense), Premal Shah (SVP, 6Sense), Hitesh Chawla (CEO, SilverPush), Sumit Jain (CTO, BirdEye) and existing investors Anand Chandrasekaran (EVP, Five9), Rajesh Sawhney (GSF), Ashish To

Preventing your teams from burning out while working from home

Over the past year of COVID-related working from home, we have seen more and more burnout in engineering teams worldwide. More and more devs are partially checked out and may not be giving 100% to team activities (planning, grooming, code review, quality checks). In these testing times, we have found some ways to keep your team motivated.

FAANG proofing your Job Applications

There is one thing that hurts more than being rejected by a hiring manager - being rejected because you’re not ex-FAANG. This was not always the case though - FAANG’s combined engineering workforce is currently at 330,000+ and growing at an astounding 20% YoY. This means that at any given point in time, there are tens of thousands of FAANG engineers active in the job market vying for spots in great up-and-coming companies.

Escalating Prometheus alerts to SMS/Phone/Slack/Microsoft-Teams via AlertManager and Zenduty

Prometheus is by far one of the most popular open-source monitoring tools, used by millions of engineering teams globally, with a robust community and continued adoption and evolution. We at Zenduty shipped our Prometheus integration a while back, and we’re happy to report that its adoption has been absolutely through the roof!

Site reliability engineering - what is SRE?

As companies race to build site reliability engineering (SRE) practices within their engineering teams, site reliability engineering has become one of the hottest and highest-paying jobs in tech. The term was coined by Google engineer Benjamin Treynor in 2003, when he was tasked with making sure that Google services were reliable, secure and functional.

Difference between a team lead and an engineering manager and how to transition between these roles

Transitioning from a team lead role to an engineering manager role is tough, and you will experience many changes along the way. What happens when you become an engineering manager?

The difference between Event Logging and Tracing in Observability

I have noticed that a lot of folks confuse event logging with tracing. In terms of building out a generic SDK for devs to report observability data, should Event APIs be distinct from Trace APIs? Is an Event just a single Trace Span? If you look at Honeycomb’s implementation, an Event seems to be equivalent to a single-span trace. The middleware wrapper creates a Honeycomb event in the request context as a span in the overall trace.
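To make the "event as a single-span trace" idea concrete, here is a minimal sketch in Python. It is not Honeycomb's API or data model; the `Span` class, its field names, and the `new_event` and `child_span` helpers are all hypothetical, chosen only to illustrate the relationship described above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical span record; a trace is the set of spans sharing a trace_id."""
    name: str
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None
    attrs: dict = field(default_factory=dict)


def new_event(name: str, **attrs) -> Span:
    """Model a standalone event as a trace that contains exactly one root span."""
    return Span(name=name, trace_id=uuid.uuid4().hex, attrs=dict(attrs))


def child_span(parent: Span, name: str, **attrs) -> Span:
    """Create a span that joins the parent's trace instead of starting a new one."""
    return Span(
        name=name,
        trace_id=parent.trace_id,
        parent_id=parent.span_id,
        attrs=dict(attrs),
    )


# A request-level event (root span) and one unit of work nested under it.
request = new_event("GET /checkout", status=200)
query = child_span(request, "db.query", rows=3)
```

Under this assumption, an Event API and a Trace API can share one record type: an event is simply a span with no parent, and it becomes part of a larger trace as soon as child spans reference its IDs.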

Attaching incident playbooks to Azure Monitor alerts for rapid remediation

Incident response playbooks are sets of actions that your incident responders need to execute depending on the nature of the outage. Having well-defined incident response playbooks can be critical, especially during high customer impact events that you would typically classify as Sev-0 incidents.

On-call compensation models

Providing customers with a world-class and seamless user experience is critical for the success of any business. It is therefore important that you have a robust on-call strategy that optimizes the availability of the right subject matter experts, on-call engineers, and support engineers to resolve critical, user-impacting incidents as soon as possible.