Monthly Archive

Drilling down into Stackdriver Service Monitoring

Jul 30, 2018 By Jay Judkowitz In Google Operations

If you’re responsible for application performance and availability, you know how hard it can be to see your applications through the eyes of your customers and end users. We think that will change with last week’s introduction of Stackdriver Service Monitoring, a new tool that monitors how your customers perceive your applications and then lets you drill down to the underlying infrastructure when there’s a problem.

Transparent SLIs: See Google Cloud the way your application experiences it

Jul 27, 2018 By Jay Judkowitz In Google Operations

Like all good IT organizations, you religiously measure the performance and availability of your services and applications. But if those apps run in the cloud, critical components are often delivered by a third party or the cloud provider. In the case of a service disruption or degraded performance, how do you know where the problem lies: in your code, the network, or the provider? And, if the problem is with the service provider, how do you convince them to take action as quickly as possible?

Centralized Logging Solution for Google Cloud Platform (Cloud Next '18)

Jul 27, 2018 By Google Operations In Google Operations

In this session, we’ll give practical guidance on consolidating and managing your logs, share tips on both what to log and what not to log, discuss logging agents and their potential pitfalls, and show you how to extract value from your log entries for reporting and alerting.

Improving Reliability with Error Budgets, Metrics, and Tracing in Stackdriver (Cloud Next '18)

Jul 26, 2018 By Google Operations In Google Operations

Members of the Stackdriver and Customer Reliability Engineering teams will demonstrate how Stackdriver tooling, inspired by the needs of SREs at Google, helps you run services more reliably and with fewer false-positive signals by tracking and alerting on error budgets, and by debugging with the exemplar technique during an outage.

DevOps

Google SRE for Availability, Reliability, and Scalability (Cloud Next '18)

Jul 26, 2018 By Google Operations In Google Operations

In this session, we’ll discuss how you can use the Stackdriver APM product suite to supercharge your development and operations teams, at an incredibly attractive price.

DevOps

Visualizing Network Topologies and Traffic (Cloud Next '18)

Jul 26, 2018 By Google Operations In Google Operations

In this session, we will look at which network monitoring and management use cases are relevant in a cloud environment and which data Google Cloud Platform provides to help you gain insights. We will then demo how to visualize traffic flows and topologies using a mix of Google and open source tools.

Using Cloud Audit Logs to Help Manage Insider Risk (Cloud Next '18)

Jul 26, 2018 By Google Operations In Google Operations

You’ll learn how you can alert on audit logs using Stackdriver, or process audit logs and act on them using Cloud Functions.

Optimizing and Troubleshooting Your Application, the Google Way (Cloud Next '18)

Jul 26, 2018 By Google Operations In Google Operations

In this session, you’ll learn about the value of these kinds of tools and how you can automatically extract telemetry from your app with OpenCensus, and you’ll see a demonstration of how to solve customer issues in a multi-cloud deployment with Stackdriver APM and other tools supported by OpenCensus.

SRE fundamentals: SLIs, SLAs and SLOs

Jul 19, 2018 By Jay Judkowitz In Google Operations

Next week at Google Cloud Next ’18, you’ll be hearing about new ways to think about and ensure the availability of your applications. A big part of that is establishing and monitoring service-level metrics, something that our Site Reliability Engineering (SRE) team does day in and day out here at Google.

DevOps
Blog
