Scaling AI Infrastructure at OpenAI

Published Aug 5, 2019

In the last year, OpenAI trained a team of five neural networks to defeat the reigning world champion esports team, achieved state-of-the-art results on a variety of domain-specific language modeling tasks, released a public demonstration of combining multiple musical styles using unsupervised learning, and more.

To deliver these results, OpenAI operates a wide range of complex infrastructure, including some of the largest Kubernetes clusters in the world. Many of the workloads running on this infrastructure don't follow commonly accepted practices for building or operating software.

This talk covers the infrastructure patterns and techniques OpenAI uses to scale its research successfully. Audience members will come away with ideas and practices to help their own ML/AI teams in both development and deployment, from research through to production.