How to build your own incident management process

IT incident management is a fundamental operational process designed to ensure rapid service restoration. This process is typically assigned to the help desk but is also very much entrenched in the day-to-day of DevOps. When incident management goes right, service is restored quickly and the impact on productivity, continuity, and customer satisfaction is minimal.

ServiceNow addresses vaccine management challenges

We all have words to describe 2020, and few of them ring with nostalgia. COVID-19 has caused pain, loss, and disruption on a scale not seen in generations. The economy has also been a casualty: some companies have transformed and thrived, while many others have stumbled and dissolved. But then, just as 2020 was bowing out, hope emerged. Three promising vaccines had produced better-than-expected results in clinical trials, and governments around the world rushed to approve them.

Is the New Elasticsearch SSPL License a Threat to Your Business?

The recent change to the Elasticsearch license could have consequences for your intellectual property. On 14 January 2021, Elastic announced on its blog that Elasticsearch and Kibana would move to the Server Side Public License (SSPL). This change, effective from Elasticsearch version 7.11, has rightly concerned business owners who rely on the ELK stack.

Get ready for SCOMathon 2021 | The Big Survey

SCOMathon 2020 was one of last year's highlights among community-driven Microsoft SCOM events. Over a 16-hour marathon on all things SCOM, high-class tutorials were delivered to an excited audience of over 1,000 participants who were eager to learn the latest hot topics and deepen their SCOM knowledge. And then there was the overwhelming runner's high of crossing the finish line alongside so many like-minded people.

7 Tips On Building And Maintaining An SRE Team In Your Company

In today's "always on" world, reliability is a primary business KPI. Instill a culture of reliability by following these 7 simple tips for building a solid SRE team in your organization. Many of today's hottest jobs didn't exist at the turn of the millennium: social media managers, data scientists, and growth hackers were unheard of. Another relatively new role in demand is the Site Reliability Engineer (SRE).

Building powerful tailored dashboards: end users, management, infrastructure

In my position, I get to work with a wide variety of organizations, each at a different level of monitoring maturity. But I've noticed an emerging pattern that I'll call the 'Critical Service Offering' or 'Executive Level Status' dashboard. At their most basic level, these dashboards should communicate the current health of the application, provide some historical context, and, most importantly, not be tied to infrastructure monitoring.

Take the first step toward SRE with Cloud Operations Sandbox

At Google Cloud, we strive to bring Site Reliability Engineering (SRE) culture to our customers, not only through training on organizational best practices but also with the tools you need to run successful cloud services. Part and parcel of that is comprehensive observability tooling (logging, monitoring, tracing, profiling, and debugging), which can help you troubleshoot production issues faster, increase release velocity, and improve service reliability.

Truly doubling down on open source #2

Earlier this week, I wrote a blog post stating our intention to fork Kibana and Elasticsearch. This was a huge decision on our end, and not one we took lightly. A few days have passed since the announcement, and I want to share how humbled and excited we are by the responses from companies and individuals eager to participate and contribute.