Accelerate Model Training with an Easy to Use High-Performance AI/ML Stack for the Cloud
E145 | Sun 16 Jul 3 p.m.–4 p.m.
Presented by
-
Michael Clifford is a Data Scientist at Red Hat working in the Office of the CTO on Emerging Technologies, where he works primarily on exploring tools, methodologies and use cases for cloud native data science.
-
Erik Erlandson
@manyangled
http://erikerlandson.github.io/
Erik Erlandson is the Data Science team lead at Red Hat Emerging Technologies, where he explores tools, methodologies and use cases at the intersection of data science workloads and the Kubernetes ecosystem.
Abstract
The advent of large-scale machine learning models has exacerbated the ongoing problem of resource and infrastructure management for ML practitioners. How can a data scientist with little or no DevOps knowledge train and deploy models that require compute clusters with dozens or hundreds of nodes and GPU resources? In this talk, Michael Clifford will discuss how members of Red Hat’s Emerging Technologies team use two open source projects, Ray and Open Data Hub, to simplify distributed training and cloud-based resource allocation. We will cover:
* An overview of Open Data Hub and Ray
* A detailed discussion of how we’ve integrated Ray with Open Data Hub to improve the user experience for developing large machine learning models
* A demonstration of a real-world use case where Ray is used to accelerate an AI/ML workload on Open Data Hub
* A discussion of Project CodeFlare, the open source project developing this work to improve ML workflow tooling in the cloud
By the end of this talk, attendees will have a better understanding of how to build high-performance and scalable AI/ML systems.