Ray: A System for Scalable Python and ML |SciPy 2020| Robert Nishihara

Distributed computing is becoming the norm, a trend driven largely by the computational requirements of machine learning applications. However, building distributed applications today requires significant expertise. Ray aims to make programming a cluster of machines as easy as programming a laptop. The goal is to enable many more software developers to take advantage of advances in machine learning to solve harder problems, without having to build systems infrastructure or acquire expertise in distributed computing. Ray is a rapidly growing open source project used in production by large companies as well as startups, and in cutting-edge research. In addition to the core distributed system, Ray encompasses a collection of state-of-the-art libraries targeting scalable machine learning, including libraries for hyperparameter tuning, distributed training, reinforcement learning, and model serving. This talk will discuss lessons learned from developing Ray and deploying it in production, the key architectural decisions that enable Ray's performance, and how companies are using Ray for scalable machine learning.





Comments


  1. What's the AWS cluster that you set up? EC2 instances? ECS? 30 seconds for provisioning 10 units (hosts?) is really good. Surprisingly good.
