Operations | Monitoring | ITSM | DevOps | Cloud

Easily view your old queries with Cloud Logging recent queries

As you analyze your logs for application performance, infrastructure errors, system events, and more, sometimes you may need to look back to logs you were previously analyzing to help correlate events and identify the root cause of a problem. To help, we are excited to introduce Google Cloud Logging recent queries, to make it easy to track and run your past searches as you deep dive on your log data.

Building and Using a 2020 Status Page with Uptime.com

A hosted status page gives you the peace of mind that users can always answer one simple question: is it up or down? Hosted status pages work because they offer third-party confirmation that your services are up. If your site goes down, the third party is likely still up, and you can point users to it for your status. Status pages are your personal 24-hour news cycle. Whether you’re up or down, customer service fields fewer support tickets, and users praise your transparency.

New in Grafana 7.2: $__rate_interval for Prometheus rate queries that just work

What range should I use with rate()? That’s not only the title of a true classic among the many useful Robust Perception blog posts; it’s also one of the most frequently asked questions when it comes to PromQL, the Prometheus query language. I made it the main topic of my talk at GrafanaCONline 2020, which I invite you to watch if you haven’t already. Let’s break the good news first: Grafana 7.2, released only last Wednesday, introduced a new variable called $__rate_interval.
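To make the question concrete, here is a minimal Python sketch of how a safe range for rate() could be chosen. It assumes the rule described in Grafana's documentation for $__rate_interval, namely the larger of the dashboard interval plus the scrape interval and four times the scrape interval. The function names and the metric name are hypothetical and used only for illustration.

```python
def rate_interval(dashboard_interval_s: float, scrape_interval_s: float) -> float:
    """Return a rate() window in seconds.

    Assumed rule (taken from Grafana's documentation, not from this post):
    max($__interval + scrape interval, 4 * scrape interval).
    """
    return max(dashboard_interval_s + scrape_interval_s, 4 * scrape_interval_s)


def rate_query(metric: str, window_s: float) -> str:
    """Build a PromQL rate() expression for a hypothetical metric name."""
    return f"rate({metric}[{int(window_s)}s])"


if __name__ == "__main__":
    # A 15s dashboard step with a 15s scrape interval falls back to 4 scrapes.
    window = rate_interval(dashboard_interval_s=15, scrape_interval_s=15)
    print(rate_query("http_requests_total", window))
```

The lower bound of four scrape intervals keeps several samples inside the window even when the dashboard is zoomed in far enough that the step is shorter than the scrape interval.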

You spoke, Microsoft listened. Ignite 2020 SCOM takeaways

Have you ever looked at SCOM on User Voice, Microsoft’s way of collecting feedback from end users about SCOM and its future? Well, good news: the top two items are going to be addressed! In the System Center session at Ignite 2020, Dianna Marks (SCOM Product Marketing Manager) told us that Microsoft has heard your feedback and will be taking action.

Best practices for monitoring AWS CloudTrail logs

Engineering teams that build, scale, and manage cloud-based applications on AWS know that at some point in time, their applications and infrastructure will be under attack. But as applications expand and new features are added, securing the full scope of an AWS environment becomes an increasingly complex task. To add visibility and auditability, AWS CloudTrail tracks the who, what, where, and when of activity that occurs in your AWS environment and records this activity in the form of audit logs.
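As a rough illustration of the who, what, where, and when in an audit log, the Python sketch below pulls those four pieces out of a single CloudTrail-style record. The field names (userIdentity, eventName, sourceIPAddress, eventTime) follow the CloudTrail record format; the sample record, its values, and the helper name summarize_event are made up for this example.

```python
import json


def summarize_event(record: dict) -> dict:
    """Reduce one CloudTrail-style record to who / what / where / when."""
    identity = record.get("userIdentity", {})
    return {
        # Fall back to the identity type when no ARN is present.
        "who": identity.get("arn") or identity.get("type", "unknown"),
        "what": record.get("eventName", "unknown"),
        "where": record.get("sourceIPAddress", "unknown"),
        "when": record.get("eventTime", "unknown"),
    }


# Hypothetical log file content; the values are illustrative only.
SAMPLE_LOG = json.loads("""
{
  "Records": [
    {
      "eventTime": "2020-09-24T18:00:00Z",
      "eventName": "DeleteBucket",
      "awsRegion": "us-east-1",
      "sourceIPAddress": "203.0.113.10",
      "userIdentity": {
        "type": "IAMUser",
        "arn": "arn:aws:iam::123456789012:user/example-user"
      }
    }
  ]
}
""")

if __name__ == "__main__":
    for rec in SAMPLE_LOG["Records"]:
        print(summarize_event(rec))
```

A summary like this could feed an alerting rule or a dashboard, though which events are worth flagging depends on the environment.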

Five worthy reads: The evolving employee experience

Five worthy reads is a regular column on five noteworthy items we have discovered while researching trending and timeless topics. This week, we explore how the employee experience is evolving with the new normal. The employee experience (EX) isn’t about ping pong tables or bring-your-pet-to-work days anymore. The new normal of working remotely has brought in a paradigm shift in the way businesses and employees operate.

Americaneagle.com and ROC Commerce stay ahead with Retrace

As a digital agency, the last thing you need is production issues for your ecommerce clients. The stakes are even higher when your ecommerce clients are running Super Bowl ads for millions to see. Instead of enjoying the game, you are faced with troubleshooting a dumpster fire. The development teams at Americaneagle.com and ROC Commerce rely heavily on Application Performance Management (APM) tools, especially on high-stakes game days.

Leveraging logs to better secure cloud-native applications

With the growing popularity of cloud computing, security incidents related to it have been on the rise. Logs are indispensable resources for countering these threats, and they can be utilized for alerting, taking remedial action, and even preventing future attacks. In this post, we will examine ways to better secure cloud-native applications using logs.

Harnessing the Transformative Power of Disruption

They say that necessity is the mother of invention. I believe that the aphorism has a business corollary: Disruption is the mother of transformation. I’ve seen it prove out over and over again throughout my career. In fact, I’d even go one step further to say that lack of disruption can actually stand in the way of successful change—and I have the scars to prove it.

Incident Review - Google Outage

When something as ubiquitous as Google goes down, there is a lot of online frenzy, with users tweeting and searching for updates on the issue. That’s exactly what we witnessed today, from 9/24/2020 17:59:44 PST to 9/24/2020 18:23:20 PST. Multiple Google services, including Mail, Drive, Meet, and Hangouts, experienced downtime. Frustrated users took to Twitter to report the outage, and the tweets were captured by Websee. Users trying to access Google services got a 502 error screen.