
Experience at 35,000 Feet w/ Derek Whisenhunt (Southwest Airlines)

This week we bring you another special “live from the road” episode of the DEX Show, in which we sat down with Southwest Airlines’ Derek Whisenhunt ahead of his amazing talk at Experience Everywhere in New York City! If you’ve ever wondered what separates a best-in-class airline from the rest of the pack, this episode’s for you.

Diagnose Any Microsoft Teams Problem in 3 Clicks or Less!

In this video, we demonstrate how to diagnose Microsoft Teams problems with real-time, hop-by-hop analysis and insights, all in three clicks or less. Whether your users work from home, the office, or anywhere in between, a superb call-quality experience is a must. How can IT operations staff ensure that? By combining synthetic monitoring with real user monitoring (RUM), support teams can get comprehensive visibility into Teams performance and use those insights for optimization.

"Managing OpenTelemetry Through the OpAMP Protocol" by Mike Kelly, observIQ

Managing thousands of data collection agents across just as many servers can overwhelm DevOps teams. Open Agent Management Protocol (OpAMP) is a new network protocol from the OpenTelemetry project that enables remote management of OpenTelemetry Collectors: agents report their status to a server and receive configuration and agent package updates from it. This eliminates the need to create new custom distributions and redeploy, drastically simplifying agent management.

geeks+gurus: How Ulta Beauty digital services shine for the holidays

For many online retailers, the bulk of sales happen during the holiday season, so it is critical that everything goes off without a hitch. In this session, longtime digital services veteran Omar Koncobo, IT Director of Ecommerce/Digital and Marketing Systems at Ulta Beauty, discusses his top tips for successful holidays, learned from seasons past: when to start preparing for holiday traffic spikes; lessons learned at Ulta on scalability; spotting problems and fixing them fast when things go wrong; and managing costs and preparedness.

I've Made a Huge Mistake: Implementing Agile on Infrastructure Teams

Bad planning methods can damage team morale and prevent teams from improving the systems they maintain. In this talk, Sam Handler from Shopify explains how his attempts to fix poor infrastructure planning processes through Agile methods failed. Drawing from this experience, he offers several principles that can help infrastructure teams improve the way they work.

Scaling Up, One Network Bottleneck at a Time

Processing data at scale involves moving packets through a network—but what happens when that network isn't cooperative? Anatole Beuzon, a Software Engineer at Datadog, discusses how he investigated and resolved network issues in Datadog’s larger data-processing apps and how you can apply these same methods to your own production workloads.

Ask a Site Reliability Engineer (SRE)

Site reliability engineering (SRE) can be complicated, and at Datadog, we’ve spent a lot of time thinking about SRE and refining how we implement it. Join Datadog’s Brandon West and Rick Mangi as they provide a brief overview of SRE and its core concepts. This video also contains a Q&A session from the live taping of this panel.