Visualizing Network Topologies and Traffic (Cloud Next '18)

In this session, we will look at which use cases in the field of network monitoring and management are relevant in a cloud environment and which data Google Cloud Platform provides to gain insights. We will then demo how to visualize traffic flows and topologies using a mix of Google and Open Source tools.

Optimizing and Troubleshooting Your Application, the Google Way (Cloud Next '18)

In this session, you’ll learn about the value of these kinds of tools and how to automatically extract telemetry from your app with OpenCensus. You’ll also see a demonstration of how to solve customer issues in a multi-cloud deployment with Stackdriver APM and other tools supported by OpenCensus.

Improving Reliability with Error Budgets, Metrics, and Tracing in Stackdriver (Cloud Next '18)

Members of the Stackdriver and Customer Reliability Engineering teams will demonstrate how Stackdriver tooling, inspired by the needs of SREs at Google, helps you run services more reliably and with fewer false-positive signals by tracking and alerting on error budgets, and by debugging with the exemplar technique during an outage.
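
To make the error budget idea concrete, here is a minimal sketch of the arithmetic, not Stackdriver's implementation. It assumes a request-based SLO; the function name, the 99.9% target, and the request counts are hypothetical values chosen for illustration.

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    slo_target is the fraction of requests that must succeed (e.g. 0.999).
    A result of 1.0 means the budget is untouched, 0.0 means it is fully
    spent, and a negative value means the SLO has been violated.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The budget is the number of requests that are allowed to fail.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical month: 1,000,000 requests, 250 failures, 99.9% SLO.
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.0%}")
```

Alerting on how quickly this fraction shrinks, rather than on every individual error, is one way such tooling can reduce false-positive signals.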

Announcing our new partnership with Slack

When we announced Stride in September 2017, we said, “It’s time we rethink the way we’re working. We believe that teams can stay connected and keep moving forward.” We still believe that. We knew we were taking a risk by entering an already competitive real-time team communications market, but we were willing to do the hard work necessary to build a great product. And we believe we were on that path.

How to Ensure Your App Is Online and Working Properly

Your application is deployed. You checked a few endpoints, and they work as expected. You can log in and see the generic home page. There aren’t any exceptions in the logs—or at least any new ones. Great! But what does that mean for your customers and partners? Does everything work for them?

When is a website considered down ...as opposed to just slow?

When you visit a webpage that is down, most of the time you'll see an error: a 404 if the page can't be found, or a 503 if the server is unavailable. Although this is not what you want to see, it is helpful. You know that the site is down and have a rough idea why. But sometimes you don't see an error... just a spinning wheel.
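
One way to turn the down-versus-slow distinction into a check is sketched below. It is an illustration under stated assumptions, not a method from the article: the 3-second "slow" threshold, the 10-second timeout, and the names `classify` and `probe` are all hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Optional

SLOW_THRESHOLD_S = 3.0  # assumed: a slower successful response counts as "slow"
TIMEOUT_S = 10.0        # assumed: no response within this window counts as "down"


def classify(status: Optional[int], elapsed_s: float) -> str:
    """Map an HTTP status (None if no response arrived) and latency to a state."""
    if status is None or status >= 400:
        # Covers explicit errors such as 404 and 503, plus the
        # "spinning wheel" case where the request never completes.
        return "down"
    if elapsed_s > SLOW_THRESHOLD_S:
        return "slow"
    return "up"


def probe(url: str) -> str:
    """Fetch the URL once and classify the outcome."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=TIMEOUT_S) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except OSError:
        status = None      # timeout, DNS failure, refused connection, etc.
    return classify(status, time.monotonic() - start)
```

Calling `probe` with a page's URL returns "up", "slow", or "down". Keeping `classify` separate from the network call makes the thresholds easy to adjust and to test without touching a real site.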