Self Service Monitoring at Planet Scale: Waze Case Study (Cloud Next '18)

You’ve built a successful app that serves millions of users. Great! Now how do you manage hundreds of microservices that run in multiple clouds and are owned by different teams across the org? In this session, we'll share the Waze team’s stories from their transition to zero-config, self-service monitoring for their dev teams.

Release with Confidence: Testing, Debugging, and Monitoring in a Serverless World (Cloud Next '18)

Identifying the cause of a bug in a serverless system can sometimes be difficult. We'll show you how to tame your bugs with testing, and how to diagnose and mitigate problems in production.

Drilling down into Stackdriver Service Monitoring

If you’re responsible for application performance and availability, you know how hard it can be to see it through the eyes of your customers and end users. We think that’s really going to change with last week’s introduction of Stackdriver Service Monitoring, a new tool that monitors how your customers perceive your applications and then lets you drill down to the underlying infrastructure when there’s a problem.

Transparent SLIs: See Google Cloud the way your application experiences it

Like all good IT organizations, you religiously measure the performance and availability of your services and applications. But if those apps run in the cloud, critical components are often delivered by a third party or the cloud provider. In the case of a service disruption or degraded performance, how do you know what the problem is: your code, the network, or the provider? And, if the problem is with the service provider, how do you convince them to take action as quickly as possible?

Centralized Logging Solution for Google Cloud Platform (Cloud Next '18)

In this session, we’ll give practical guidance on consolidating and managing your logs, share tips on both what to log and what not to log, discuss logging agents and their potential pitfalls, and show you how to extract value from your log entries for reporting and alerting on logs.

Optimizing and Troubleshooting Your Application, the Google Way (Cloud Next '18)

In this session, you’ll learn about the value of these kinds of tools and how you can automatically extract telemetry from your app with OpenCensus, and you’ll see a demonstration of how to solve customer issues in a multi-cloud deployment with Stackdriver APM and other tools supported by OpenCensus.

Improving Reliability with Error Budgets, Metrics, and Tracing in Stackdriver (Cloud Next '18)

Members of the Stackdriver and Customer Reliability Engineering teams will demonstrate how Stackdriver tooling, inspired by the needs of SREs at Google, helps you run services more reliably and with fewer false-positive signals. They’ll cover tracking and alerting on error budgets and debugging with the exemplar technique during an outage.
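
For readers new to error budgets, the arithmetic behind them can be sketched in a few lines of Python. This is a minimal illustration of the general SRE concept, not Stackdriver's implementation or anything shown in the session; the function names, the request-based budget, and the example numbers are all assumptions made for this sketch.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Requests allowed to fail in the window while still meeting the SLO.

    slo_target is a fraction, e.g. 0.999 for a 99.9% availability SLO.
    (Hypothetical helper for illustration only.)
    """
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent; negative if overspent."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        # A 100% SLO leaves no budget: any failure overspends it.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / budget


if __name__ == "__main__":
    # Assumed example: 99.9% SLO, one million requests, 250 failures.
    allowed = error_budget(0.999, 1_000_000)
    remaining = budget_remaining(0.999, 1_000_000, 250)
    print(f"allowed failures: {allowed:.0f}")
    print(f"budget remaining: {remaining:.0%}")
```

Alerting on how quickly this remaining fraction shrinks, rather than on every individual error, is one common way to cut down on false-positive alerts.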