Take the first step toward SRE with Cloud Operations Sandbox

At Google Cloud, we strive to bring Site Reliability Engineering (SRE) culture to our customers not only through training on organizational best practices, but also with the tools you need to run successful cloud services. Part and parcel of that is comprehensive observability tooling—logging, monitoring, tracing, profiling and debugging—which can help you troubleshoot production issues faster, increase release velocity and improve service reliability.

Truly Doubling down on open source #2

Earlier this week, I wrote a blog post stating our intention to fork Kibana and Elasticsearch. This was a huge decision on our end, one that we did not take lightly. A few days have passed since this announcement and I wanted to share how humbled and excited we are with the responses from companies and individuals who are eager to participate and contribute.

Building powerful tailored SCOM dashboards with Enterprise Applications (Part 1)

In my position, I get to work with a wide variety of organizations that each have a different level of monitoring maturity. But I’ve noticed an emerging pattern that I’ll call the ‘Critical Service Offering’ or ‘Executive Level Status’ dashboard. At their most basic level, these dashboards should communicate the current health of the application, provide some historical context and, most importantly, not be tied to infrastructure monitoring.

3 Keys to Customer Satisfaction through Visibility

Did you know that retaining a customer is five times cheaper than acquiring a new one? Customer satisfaction is vital to business success. Your business strategy for 2021 probably includes generating more leads, but it should also include retaining your current customers. According to the World Bank, the 2020 recession has been one of the worst since the Great Depression, which was a decade-long economic slowdown.

Troubleshooting Kubernetes Job Queues on DigitalOcean, Part 2

Kubernetes work queues are a great way to manage the prioritization and execution of long-running or expensive menial tasks. DigitalOcean's managed Kubernetes service makes deploying a work queue straightforward. But what happens when your work queues don’t operate the way you expect? SolarWinds® Papertrail™ advanced log management complements the monitoring tools provided by DigitalOcean and simplifies both debugging and root cause analysis.

Top 10 Metrics to Track when Monitoring Microsoft IIS Performance

Microsoft Internet Information Services (IIS, formerly known as Internet Information Server) is extensible web server software created by Microsoft for use with the Windows family. IIS supports various protocols, including HTTP, HTTP/2, HTTPS, FTP, FTPS, SMTP, and NNTP. According to the most recent ranking by W3Techs, Microsoft IIS is the second most popular web server technology behind Apache.

How to Save Hundreds of Hours on Lambda Debugging

AWS Lambda is a blessing from the infrastructure perspective, but when using it we still have to face perhaps the least-wanted part of software development: debugging. In order to fix issues, we need to know what is causing them. In AWS Lambda that can be a curse. But we have a solution that could save you dozens of hours of time. TL;DR: Dashbird offers a shortcut to everything presented in this article.

Walkthrough to Set Up the Deep Learning Toolkit for Splunk with Amazon EKS

The Splunk Deep Learning Toolkit (DLTK) is a very powerful tool that allows you to offload compute resources to external container environments. Additionally, you can use GPU or Spark environments. In the last Splunk blog post, The Power of Deep Learning Analytics and GPU Acceleration, you can learn more about building a GPU-based environment. Splunk DLTK supports Docker as well as Kubernetes and OpenShift as container environments.

Get to Know Splunk Machine Learning Environment (SMLE)

One of our most exciting new projects at Splunk is coming to life. Over the past year, we have been hard at work putting together our vision: a place where Splunk admins, NOC/SOC teams, data analysts, and data scientists can collaborate, experiment, and operationalize their work, all in a single environment inside the Splunk ecosystem. We call it Splunk Machine Learning Environment (SMLE).

Best Website Performance Testing Tools

What are the usual criteria for choosing an online store? It should have reasonable prices, sell quality products, and most of all, it should have a fast loading time. A website’s performance is essential. A two-second delay can make a big difference to your website and revenue as well. In fact, Neil Patel reported that a mere one-second delay may cost an e-commerce site up to $2.5 million in sales annually.