Google Operations

Aug 4, 2021   |  By Charles Baer
Visualizing trends in your logs is critical when troubleshooting an issue with your application. Using the histogram in Logs Explorer, you can quickly visualize log volumes over time to help spot anomalies, detect when errors started and see a breakdown of log volumes. But static visualizations are not as helpful as having more options for customization during your investigations.
Jul 20, 2021   |  By Rahul Harpalani
Running and troubleshooting production services requires deep visibility into your applications and infrastructure. While basic logs and metrics are available out of the box with Google Compute Engine (GCE), capturing advanced data used to require the installation of both a metrics agent and a logging agent.
Jul 15, 2021   |  By Alisa Goldstein
Being alerted to an issue with your application before your customers experience undue interruption is a goal of every development and operations team. While methods for identifying problems exist in many forms, including uptime checks and application tracing, alerting on logs is a prominent method for issue detection. Previously, Cloud Logging only supported alerts on error logs and log-based metrics, but that was not robust enough for most application teams.
Jul 1, 2021   |  By Joy Wang
Setting up Cloud Monitoring dashboards for your team can be time-consuming because every team's needs are different. Picking the right metrics, using the right visualizations to represent these metrics, deciding what metrics can go on the same chart, and determining the right pre-processing steps for metrics requires background and experience that may not yet exist among your development and operations teams.
Jun 29, 2021   |  By John Day
When you grow your peak concurrent users by 5x nearly overnight, ensuring that your operations can support that growth can make or break your success. Rocket League is a popular online multiplayer game created by Psyonix, described as arcade-style soccer and vehicular mayhem. In the summer of 2020, the game maker decided to switch the game's business model from an upfront purchase to a free-to-play model.
Jun 23, 2021   |  By Joy Wang
The observability of metrics is a key factor for a successful operations team, allowing for increasingly effective visualizations, analysis, and troubleshooting. Google Cloud works with third-party partners, such as Grafana Labs, to make it easy for customers to create their desired observability stack leveraging a combination of different tools. More than two years ago, we collaborated with Grafana Labs to introduce the Cloud Monitoring plugin for Grafana.
Jun 9, 2021   |  By Rahul Harpalani
Customers need scale and flexibility from their cloud and this extends into supporting services such as monitoring and logging. Google Cloud’s Monitoring and Logging observability services are built on the same platforms used by all of Google that handle over 16 million metrics queries per second, 2.5 exabytes of logs per month, and over 14 quadrillion metric points on disk, as of 2020.
Jun 7, 2021   |  By Vivek Balivada
At Lowe’s, we’ve made significant progress in our multiyear technology transformation. To modernize our systems and build new capabilities for our customers and associates, we leverage Google’s SRE framework and Google Cloud, which helps us meet their needs faster and more effectively. With these efforts, we’ve been able to go from one release every two weeks to 20+ releases daily—about 20X more releases per month.
May 27, 2021   |  By Charles Baer
We know that developers or operators troubleshooting applications and systems have a lot of data to sort through while getting to the root cause of issues. Often there are fields like error response codes that are critical for finding answers and resolving those issues. Today, we’re proud to announce log field analytics in Cloud Logging, a new way to search, filter and understand the structure of your logs so you can find answers faster and more easily than ever before.
May 17, 2021   |  By Irene Abezgauz
Network traffic analysis is one of the core ways an organization can understand how workloads are performing, optimize network behavior and costs, and conduct troubleshooting—a must when running mission-critical applications in production. VPC Flow Logs is one such enterprise-grade network traffic analysis tool, providing information about TCP and UDP traffic flow to and from VM instances on Google Cloud, including the instances used as Google Kubernetes Engine (GKE) nodes.
Aug 4, 2021   |  By Google Operations
To optimize GKE costs, it’s important for cluster administrators to understand how much of their cluster resources are used by workloads in their clusters. In this video, we’ll show you what steps to take to ensure that fewer cluster resources are idle or wasted.
Aug 2, 2021   |  By Google Operations
In this video, we show you how to use monitoring systems specific to GKE to help establish cost optimization as a discipline.
Jun 30, 2021   |  By Google Operations
Cloud Operations can help you quickly isolate or eliminate infrastructure issues from a limited set of data, but how can you identify problems with your service itself? And when there's a problem, how can you quickly fix it? In this episode of Engineering for Reliability, we’ll show how you can manage your services running on GKE with Cloud Operations.
Jun 15, 2021   |  By Google Operations
How can you easily debug a Kubernetes application? In this episode of Kubernetes Essentials, we show how you can use the kubectl command line tool to identify and resolve bugs within your application. Watch to learn how you can use this tool to easily debug and gain greater observability over your Kubernetes application!
May 26, 2021   |  By Google Operations
If your system's security has been breached, what can you do to stop the attack without making the situation worse? In this episode of Cloud Security Basics, we show how you can use Cloud Operations Suite to check for security breaches. Watch to learn some best practices for handling malicious attacks!
May 14, 2021   |  By Google Operations
APIs are packages of data and functionality that contain business-critical information. However, as API programs scale, it becomes impossible to individually manage each API. In this video, we demo how Apigee helps simplify API operations and allows you to deliver seamless and connected experiences for your customers.
May 3, 2021   |  By Google Operations
Cloud Logging is a real-time log management tool that allows you to securely store, search, analyze, and alert on all of your log data and events. In this video, we show you what Cloud Logging is and how you can use it to convert logs to log-based metrics for monitoring, alerting, analyzing, and visualizing your applications and infrastructure.
Apr 10, 2021   |  By Google Operations
Almost every app and digital interaction today depends on APIs, so it’s important to be able to find and fix issues fast. Apigee’s API monitoring can alert you to live issues, give you in-depth details for every problem, and recommend a course of action. Take a look at this API monitoring demo from the Apigee team to keep your APIs running smoothly!
Mar 15, 2021   |  By Google Operations
APIs are great tools since they provide developers a simplified way to consume data and functionality that resides in backend systems. However, they are targets for malicious attacks because they contain business-critical information. In this video, we demo how Google Cloud can help you better secure your APIs with Apigee and Cloud Armor. Watch to learn how these tools offer security at multiple levels for your APIs!
Feb 6, 2021   |  By Google Operations
Want to visualize your monitoring data like never before? In this episode of Stack Doctor, we show you how to use the new Dashboard Editor to easily visualize your Cloud Monitoring data. Specifically, we’ll show you how to create a dashboard using gauges, scorecards, and text widgets and how you can utilize the new layouts and chart configuration modes to closely monitor the health of your services!

Monitoring and management for services, containers, applications, and infrastructure.

Operations aggregates metrics, logs, and events from infrastructure, giving developers and operators a rich set of observable signals that speed root-cause analysis and reduce mean time to resolution (MTTR). Operations doesn’t require extensive integration or multiple “panes of glass,” and it won’t lock developers into using a particular cloud provider.

Operations is built from the ground up for cloud-powered applications. Whether you’re running on Google Cloud Platform, Amazon Web Services, on-premises infrastructure, or with hybrid clouds, Operations combines metrics, logs, and metadata from all of your cloud accounts and projects into a single comprehensive view of your environment, so you can quickly understand service behavior and take action.