7 Tips On Building And Maintaining An SRE Team In Your Company

In today's "always on" world, reliability is a primary business KPI. Plant a culture of reliability by implementing these 7 simple tips for building a solid SRE team in your organization. Many of today’s hottest jobs didn’t exist at the turn of the millennium. Social media managers, data scientists, and growth hackers were unheard of. Another relatively new role in demand is that of the Site Reliability Engineer, or SRE.

Building powerful tailored dashboards: end users, management, infrastructure

In my position, I get to work with a wide variety of organizations that each have a different level of monitoring maturity. But I’ve noticed an emerging pattern that I’ll call the ‘Critical Service Offering’ or ‘Executive Level Status’ dashboard. At their most basic level, these dashboards should communicate the current health of the application, provide some historical context and, most importantly, not be tied to infrastructure monitoring.

Take the first step toward SRE with Cloud Operations Sandbox

At Google Cloud, we strive to bring Site Reliability Engineering (SRE) culture to our customers not only through training on organizational best practices, but also with the tools you need to run successful cloud services. Part and parcel of that is comprehensive observability tooling—logging, monitoring, tracing, profiling and debugging—which can help you troubleshoot production issues faster, increase release velocity and improve service reliability.

Truly Doubling down on open source #2

Earlier this week, I wrote a blog post stating our intention to fork Kibana and Elasticsearch. This was a huge decision, one that we did not take lightly. A few days have passed since the announcement, and I wanted to share how humbled and excited we are by the responses from companies and individuals who are eager to participate and contribute.

Building powerful tailored SCOM dashboards with Enterprise Applications (Part 1)

In my position, I get to work with a wide variety of organizations that each have a different level of monitoring maturity. But I’ve noticed an emerging pattern that I’ll call the ‘Critical Service Offering’ or ‘Executive Level Status’ dashboard. At their most basic level, these dashboards should communicate the current health of the application, provide some historical context and, most importantly, not be tied to infrastructure monitoring.

3 Keys to Customer Satisfaction through Visibility

Did you know that retaining a customer is five times cheaper than acquiring a new one? Customer satisfaction is vital to business success. Your business strategy for 2021 probably includes generating more leads, but it should also include retaining your current customers. According to the World Bank, the 2020 recession has been one of the worst since the Great Depression, which was a decade-long economic slowdown.

Troubleshooting Kubernetes Job Queues on DigitalOcean, Part 2

Kubernetes work queues are a great way to manage the prioritization and execution of long-running or expensive menial tasks. DigitalOcean's managed Kubernetes service makes deploying a work queue straightforward. But what happens when your work queues don’t operate the way you expect? SolarWinds® Papertrail™ advanced log management complements the monitoring tools provided by DigitalOcean and simplifies both debugging and root cause analysis.

Top 10 Metrics to Track when Monitoring Microsoft IIS Performance

Microsoft Internet Information Services (IIS, formerly known as Internet Information Server) is extensible web server software created by Microsoft for use with the Windows family. IIS supports various protocols, including HTTP, HTTP/2, HTTPS, FTP, FTPS, SMTP, and NNTP. According to the most recent ranking by W3Techs, Microsoft IIS is the second most popular web server technology behind Apache.

How to Save Hundreds of Hours on Lambda Debugging

Although AWS Lambda is a blessing from an infrastructure perspective, using it still means facing perhaps the least-wanted part of software development: debugging. To fix issues, we need to know what is causing them, and in AWS Lambda that can be a curse. But we have a solution that could save you dozens of hours. TL;DR: Dashbird offers a shortcut to everything presented in this article.