Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Efficient task management for remote/work-from-home teams

As COVID-19 continues to impact communities globally, with health care professionals working tirelessly to prepare for emergencies and prevent the further spread of the pandemic, technology companies are also doing their part. Twitter, Google, and Amazon have issued directives instructing employees to work from home, and the companies themselves have moved to pull out of tech events while hosting their own events virtually.

Incident management for remote/WFH teams

As the world tries to battle COVID-19, most of our customers here at Zenduty have started implementing social distancing measures within their companies by asking all their employees, including the NOC, SRE, ITOps, Support, and software engineering teams, to work remotely or from home. While that may appear to be a drastic change in your day-to-day operations, it need not disrupt your reliability and support operations.

5 reasons why Zenduty is a great alternative to Opsgenie, PagerDuty, and VictorOps

Every integration (with a monitoring source) within O/P/V sends alerts according to a single escalation policy linked to that monitoring source (service), regardless of the nature of the alert, the time of day or week it is triggered, its severity, or the component affected.

Incident management with Microsoft Teams and Zenduty

Teams is Microsoft’s versatile chat and collaboration solution for enterprise communication. Teams comes bundled with Office 365, offering chat, file sharing, and a host of other collaborative features. The platform also integrates with a host of popular project management applications, chatbots, and alert management platforms, which makes it a favorite of production teams.

Application performance monitoring with Datadog 2020

Datadog is an application performance monitoring and analytics SaaS for cloud infrastructure. Datadog enables DevOps teams, SREs, and IT operations teams to optimize their systems for uptime and availability. Modern services generate massive amounts of data across many different services and technologies. Datadog supports over 400 integrations and collects data to improve visibility across dynamic production environments.

Incident Response - how great companies do it

An incident response plan is a predefined course of action for IT teams on how to respond to critical IT events efficiently. As modern applications continue to grow in scale and complexity, more people will be working on more interdependent systems. Consequently, the question is not if a system will fail, but when, and how best to respond.

Monitoring with New Relic - Everything you need to get started

DevOps is an organizational philosophy that enables continuous delivery and continuous deployment with a focus on continuous testing, automation, and collaboration among development, business, and operations teams. Consequently, continuous monitoring is also a key phase of the DevOps lifecycle, which is where application performance monitoring (APM) tools come into the picture. APM tools enable developers to monitor user experience in real time with an eye on the health and stability of their applications.

Grafana - Everything you need to know

Grafana is an open-source platform for data visualization, monitoring, and analysis. It is designed around context-rich visualizations, mainly through graphs, but it also supports other ways of presenting data through a pluggable panel architecture. Every dashboard is versatile and can be custom-built for a specific software development project or business requirement. Grafana’s beautiful dashboards are one of the reasons it is so popular with users.

Site reliability engineering - Predictions for 2020

As we head into 2020, it's clear that DevOps has finally crossed the divide and gone mainstream. With DevOps firmly ingrained as a standard practice, we now look at how it will evolve. DevOps is driving more overall alignment between development and operations teams than has ever existed in the past. For developers, that means building and delivering impeccable apps to market quickly.

Incident Alert Routing - Getting woken up only by alerts that matter to you

Site reliability engineers have one of the toughest roles, if not the toughest, in any organization. While dealing with incidents is one part of the job, the other is to build reliable systems. Google’s SRE book sums up this approach nicely. One of the most important challenges for an SRE when it comes to balancing work between firefighting and toil reduction is the issue of alert noise.