Scaling MLOps on NVIDIA DGX Systems (Special guest from NVIDIA)

Presented by

Yochay Ettun, Michael Balint (NVIDIA)

About this talk

Developing, experimenting with, and deploying ML models at scale requires substantial tooling, scripting, tracking, versioning, and monitoring. Data scientists want to do data science, yet they are slowed down by MLOps and DevOps tasks. They lack the user-friendly tools needed to track experiments, attach resources, manage datasets, and launch multiple ML pipelines.

In this webinar, CEO Yochay Ettun will host a special guest from NVIDIA, Michael Balint, Sr. Product Manager for NVIDIA DGX systems, to discuss how to optimize the use of any NVIDIA DGX or NVIDIA GPU asset, whether on-prem or in the cloud, with the machine learning platform. We will show best practices for reaching high utilization of NVIDIA DGX systems while conducting meta-scheduling across multiple heterogeneous Kubernetes, OpenShift, and Linux server clusters. In addition, we will introduce the concept of production flows, which automate hundreds of models from the data hub to deployment. We will wrap up with a real-life demo of flows, running many experiments across DGX platforms.

What you will learn:
- Creating a data science flow, from data to deployment, while attaching different NVIDIA DGX Kubernetes clusters to each step of the flow
- The concept of a meta-scheduler: scheduling experiments across dispersed resources or other schedulers to achieve high utilization at scale
- How the NVIDIA DGX ecosystem, together with the platform, makes GPU assets easy to consume with one click, bypassing the complexity of MLOps
- How to leverage NGC containers in ML pipelines
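The meta-scheduler idea mentioned in the abstract can be illustrated with a toy sketch. This is not the platform's scheduler or API: the `Cluster` class, the `place` and `utilization` functions, and the greedy best-fit policy are all hypothetical, chosen only to show how experiments might be spread across several GPU clusters while keeping utilization high.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """A hypothetical compute target, e.g. a DGX Kubernetes cluster."""
    name: str
    total_gpus: int
    used_gpus: int = 0

    @property
    def free_gpus(self) -> int:
        return self.total_gpus - self.used_gpus


def place(experiments, clusters):
    """Greedy best-fit placement (an assumed policy, for illustration only).

    experiments maps an experiment name to the number of GPUs it requests.
    Each experiment goes to the cluster with the fewest free GPUs that can
    still hold it, which keeps larger clusters open for larger jobs.
    Returns {experiment_name: cluster_name or None}.
    """
    placement = {}
    # Place the largest requests first so small jobs do not fragment capacity.
    for name, gpus in sorted(experiments.items(), key=lambda kv: -kv[1]):
        candidates = [c for c in clusters if c.free_gpus >= gpus]
        if not candidates:
            placement[name] = None  # stays queued in this toy model
            continue
        target = min(candidates, key=lambda c: c.free_gpus)
        target.used_gpus += gpus
        placement[name] = target.name
    return placement


def utilization(clusters):
    """Fraction of all GPUs currently allocated across every cluster."""
    total = sum(c.total_gpus for c in clusters)
    return sum(c.used_gpus for c in clusters) / total if total else 0.0
```

For example, with an 8-GPU cluster and a 4-GPU cluster, calling `place({"exp1": 4, "exp2": 8, "exp3": 2}, clusters)` assigns the 8-GPU request to the larger cluster, the 4-GPU request to the smaller one, and leaves the 2-GPU request queued. A real meta-scheduler would also have to account for queues, priorities, and the underlying schedulers it delegates to.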

More from this channel

Learn from data science and engineering experts about key topics for successful machine learning. This channel provides hands-on tutorials and workshops on the top methods in data science team management and MLOps, getting you from research to production. Stay ahead with the latest developments in auto-adaptive machine learning and CI/CD for machine learning. Learn the latest methods for machine learning model management and deployment with open source tools. Find answers on how to enhance team collaboration in your data science department and smoothly bridge science and engineering.