Google Operations

May 20, 2022   |  By Alexander Losovsky
We’re thrilled to announce a new update to the Cloud Logging library for Node.js. The key new features are improved error handling and the ability to write structured logs to standard output, which comes in handy if you run applications in serverless environments like Google Cloud Functions!
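To illustrate the idea of structured logging to standard output, here is a minimal sketch in Python. It does not use the Node.js library announced above and does not reproduce its API. It only shows the general pattern of printing one JSON object per line, which a logging agent in a serverless environment can then parse. The helper names `format_entry` and `log`, and the `orderId` field, are hypothetical and chosen for illustration; `severity` and `message` are assumed here to be the field names the platform recognizes.

```python
import json
import sys


def format_entry(severity: str, message: str, **fields) -> str:
    """Build a one-line JSON log entry (hypothetical helper).

    `severity` and `message` are assumed to be the keys the logging
    agent recognizes; any extra keyword arguments are added as-is.
    """
    entry = {"severity": severity, "message": message, **fields}
    return json.dumps(entry)


def log(severity: str, message: str, **fields) -> None:
    """Write the structured entry to standard output, one entry per line."""
    print(format_entry(severity, message, **fields), file=sys.stdout, flush=True)


# Example: an error entry with one custom field (illustrative values).
log("ERROR", "payment failed", orderId="A-123")
```

Writing one JSON object per line keeps each entry independently parseable, so no client connection to a logging backend is needed inside the function.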
May 16, 2022   |  By Latav Dudley
With Cloud SQL for SQL Server, you can bring your existing SQL Server on-premises workloads to Google Cloud. Cloud SQL takes care of infrastructure, maintenance, and patching so you can focus on your application and users. A great way to take better care of your application is by monitoring the SQL Server error log for issues that may be affecting your users such as deadlocks, job failures, and changes in database health.
May 16, 2022   |  By Lee Yanco
Prometheus is considered the de facto standard for Kubernetes application metrics, but running it yourself can strain engineering time and infrastructure resources when your usage grows. In March, we announced the general availability of Google Cloud Managed Service for Prometheus to help you offload that burden, and today, we’re excited to announce a new low-cost, high-usage pricing tier designed for customers who are moving large volumes of Kubernetes metrics over to the service.
May 13, 2022   |  By Roy Arsan
We’re thrilled to announce several new observability features for the Pub/Sub to Splunk Dataflow template to help operators keep tabs on their streaming pipeline performance. Splunk Enterprise and Splunk Cloud customers use the Splunk Dataflow template to reliably export Google Cloud logs for in-depth analytics for security, IT, or business use cases.
May 4, 2022   |  By Ayelet Sachto
Setting up Service Level Objectives (SLOs) is one of the foundational tasks of Site Reliability Engineering (SRE) practices, giving the SRE team a target against which to evaluate whether or not a service is running reliably enough. The inverse of your SLO is your error budget — how much unreliability you are willing to tolerate.
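The relationship between an SLO and its error budget can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `error_budget`, the 99.9% target, and the 30-day window are assumptions chosen for the example, not values from the article.

```python
def error_budget(slo_target: float, window_minutes: int) -> tuple[float, float]:
    """Return (budget_fraction, allowed_bad_minutes) for an availability SLO.

    slo_target is a fraction such as 0.999 for "99.9% of minutes are good".
    The error budget is whatever the SLO leaves over: 1 - slo_target.
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    budget_fraction = 1.0 - slo_target
    return budget_fraction, budget_fraction * window_minutes


# Assumed example: a 99.9% availability SLO over a 30-day window.
fraction, minutes = error_budget(0.999, 30 * 24 * 60)
print(f"{fraction:.4%} of the window, about {minutes:.1f} minutes")
```

Under these assumptions, the service can be unreliable for roughly 43 minutes in the 30-day window before the budget is spent.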
Apr 25, 2022   |  By Charles Baer
When you’re troubleshooting an issue, finding the root cause often involves finding specific logs generated by infrastructure and application code. The faster you can find logs, the faster you can confirm or refute your hypothesis about the root cause and resolve the issue! Today, we’re pleased to announce a dramatically simpler way to find logs in Logs Explorer.
Apr 5, 2022   |  By Eyamba Ita
Building new applications is a lot of fun, but troubleshooting and fixing the crashes that can come with app development is not. While many organizations are quickly adopting the DevOps model, there are still some legacy frameworks where developers and operations teams are separate. Developers build and submit apps to their ops team, who in turn deploy and maintain the production stack. A common issue with this workflow is the time it takes to find and resolve crashes.
Apr 4, 2022   |  By Haskell Garon
When IT operators and architects begin their journey with Google Cloud, Day 0 observability needs tend to focus on infrastructure and aim to address questions about resource needs, a plan for scaling, and similar considerations. During this phase, developers and DevOps engineers also make a plan for how to get deep observability into the performance of third-party and open-source applications running on their Compute Engine VMs.
Mar 29, 2022   |  By Alizah Lalani
When you are dealing with a situation that fires a bevy of alerts, do you instinctively know which alerts are the most pressing? Severity levels are an important concept in alerting to aid you and your team in properly assessing which notifications should be prioritized. You can use these levels to focus on the issues deemed most critical for your operations and triage through the noise.
Mar 7, 2022   |  By Leonid Yankulin
Today it is even easier to capture logs in your Java applications. Developers can get more data with their application logs using a new version of the Cloud Logging client library for Java. The library implicitly populates every ingested log entry with the current execution context. Read this if you want to learn how to get HTTP request and tracing information, plus additional metadata, in your logs without writing a single line of code.
May 16, 2022   |  By Google Operations
Cloud SQL provides a managed service for MySQL, PostgreSQL, and SQL Server databases as well as backups, high availability, maintenance, and so much more! In this episode of Networking End to End, Lorin Price discusses networking concepts from implementation and security to connectivity on Cloud SQL. Watch along to learn about the options for deploying Cloud SQL and tips on how to determine who and what can access your Cloud SQL instance.
Apr 29, 2022   |  By Google Operations
Learn from our 2021 Google Cloud DevOps Award winner, Broadcom Software, on how they have been utilizing IT operations to drive business decisions.
Mar 30, 2022   |  By Google Operations
When you run applications in production, you need to monitor the infrastructure they run on and collect important signals about application health, like error rates and latency. In this episode of Engineering for Reliability with Google Cloud, Yuri will demonstrate how to instrument your service to expose application-specific telemetry with Prometheus and how to configure Google's managed service for Prometheus to collect those metrics.
Jan 26, 2022   |  By Google Operations
In this video, we talk about the importance of CI/CD and different strategies for it, and show how to design a CI/CD workflow on Google Cloud, with a demonstration.
Jan 25, 2022   |  By Google Operations
In this video we help you understand the different services in the Google Cloud Operations Suite with a focus on how they help you. We conclude with a customer success story (Krikey gaming).
Dec 15, 2021   |  By Google Operations
We covered best practices for ingesting, centralizing, and managing cloud logs in our previous episode. But how can you quickly find the logs you're looking for when troubleshooting? And how can you manage and optimize your logging costs? In this episode, we'll show you how to use advanced log queries to find the exact logs you're looking for and how to manage logging costs.
Dec 1, 2021   |  By Google Operations
In our last episode, we covered how to best deploy and use Cloud Monitoring. This week, we answer the most important questions about Cloud Logging: What’s the best way to ingest logs? And how do you centralize logs and manage access? Watch this episode of Engineering for Reliability to learn best practices for using Cloud Logging, and how to keep your services reliable and your users happy.
Nov 17, 2021   |  By Google Operations
In our last episode, we covered best practices for deploying and using Cloud Operations in an enterprise environment. But we still left some questions unanswered. How should you monitor your services? How should you deal with alerts? And what about managing cost? In this episode of Engineering for Reliability, Yuri discusses best practices for setting up and using Cloud Monitoring and optimizing monitoring costs.
Nov 5, 2021   |  By Google Operations
Learn about innovations in cloud network security over a global network. This includes Google Cloud innovations released this year, from DDoS protection and Web Application Firewall (WAF) capabilities in Google Cloud Armor to Google Cloud firewalls and Google Cloud IDS, the newest network-based intrusion detection solution.
Nov 3, 2021   |  By Google Operations
How can you get the most value out of Cloud Operations, especially as your Cloud footprint grows? In this episode of Engineering for Reliability, we look at the enterprise best practices for setting up and using Cloud Operations. Watch to learn how to improve the security of your services, better manage capacity, and keep your users happy!

Monitoring and management for services, containers, applications, and infrastructure.

Operations aggregates metrics, logs, and events from infrastructure, giving developers and operators a rich set of observable signals that speed root-cause analysis and reduce mean time to resolution (MTTR). Operations doesn’t require extensive integration or multiple “panes of glass,” and it won’t lock developers into using a particular cloud provider.

Operations is built from the ground up for cloud-powered applications. Whether you’re running on Google Cloud Platform, Amazon Web Services, on-premises infrastructure, or with hybrid clouds, Operations combines metrics, logs, and metadata from all of your cloud accounts and projects into a single comprehensive view of your environment, so you can quickly understand service behavior and take action.