Deep Learning and Multi-GPU Training
05 Jun 2018
5 June 2018 | A practical course for AI practitioners and for the support staff responsible for the HPC systems they use.

Price: FREE

Capacity: 35 attendees

Registration page: http://www.cvent.com/d/wgqqhr

Registration closes: 29 May 2018

Lecturer: Dr Adam Grzywaczewski, Deep Learning Solution Architect, NVIDIA Ltd.

Best for: The course is designed with two main groups in mind:

  • AI practitioners: people who develop neural networks on a day-to-day basis and want to learn how to scale their code to accelerate their work. Attendees should understand the fundamental principles of training neural networks, including concepts such as the cost function, backpropagation, and Stochastic Gradient Descent, together with their mathematical foundations.
  • Support staff responsible for the systems, so that they can understand the infrastructure requirements of the AI training process.

Prerequisites: The course assumes prior deep learning experience: attendees should already have developed neural networks in at least one programming language or deep learning framework. They will need to be able to read code comfortably and modify it following the instructions.

Joining requirements for the NVIDIA online portal: All participants need to create an account on the following portal: https://nvlabs.qwiklab.com/

Please use the same email address that you use to register for the course, so that you can be given access to the appropriate class.

Objectives and Learning Outcomes: The goal of the class is threefold: to outline the basic principles of data-parallel training of neural networks, the algorithmic challenges involved in large-scale training, and the associated engineering challenges of building and operating AI-focused infrastructure.
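
For a flavour of what data-parallel training looks like in practice, the sketch below uses PyTorch's DistributedDataParallel; the framework, model, data, and hyperparameters are illustrative assumptions, not material from the course.

    # A minimal sketch of data-parallel training, assuming PyTorch's
    # DistributedDataParallel and a launch via `torchrun --nproc_per_node=<gpus>`.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group(backend="nccl")      # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])   # set by the torchrun launcher
    torch.cuda.set_device(local_rank)

    # Placeholder model; each process holds an identical replica.
    model = torch.nn.Linear(1024, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    for step in range(100):
        # Each process would normally read a different shard of the data set.
        x = torch.randn(32, 1024, device=local_rank)
        y = torch.randint(0, 10, (32,), device=local_rank)
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()  # gradients are averaged across GPUs here
        optimizer.step()                 # every replica applies the same update

    dist.destroy_process_group()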

Agenda:

09:30 Registration and coffee

10:00 to 12:00 Part 1: Theory of Data Parallelism (an introduction aimed at both groups)

12:00 Lunch break

13:00 to 15:00 Part 2: Algorithmic Challenges of Multi-GPU Training (a detailed discussion of the algorithmic implications of large-batch training; it is more focused on AI practitioners, but also helps support staff understand how algorithm design decisions affect the performance of the infrastructure; see the learning-rate sketch after the agenda)

15:00 Coffee break

15:30 to 17:30 Part 3: Engineering Challenges of Multi-GPU Training (a detailed discussion of the engineering aspects of the problem, more focused on the support staff; it also covers a set of algorithmic improvements that the AI team will need to introduce in order to reduce the load on the infrastructure; see the traffic estimate after the agenda)
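
As a taste of the large-batch discussion in Part 2, the sketch below shows the linear learning-rate scaling rule with gradual warm-up that is commonly applied when the global batch size grows with the number of GPUs; the base rate and warm-up length are illustrative assumptions.

    # A sketch of linear learning-rate scaling with warm-up for large-batch,
    # multi-GPU training; all hyperparameter values are illustrative.
    def scaled_lr(base_lr: float, num_gpus: int, step: int,
                  warmup_steps: int = 500) -> float:
        target_lr = base_lr * num_gpus          # scale with the global batch size
        if step < warmup_steps:                 # ramp up to avoid early divergence
            return target_lr * (step + 1) / warmup_steps
        return target_lr

    # Example: 8 GPUs and a base rate of 0.1 give a target rate of 0.8.
    print(scaled_lr(0.1, 8, step=1000))         # 0.8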
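
For the engineering discussion in Part 3, a back-of-envelope estimate of the gradient traffic per training step illustrates why the interconnect matters; the model size, precision, and GPU count are illustrative assumptions, while the 2*(N-1)/N factor is the standard cost of a ring all-reduce.

    # Rough per-step interconnect traffic under a ring all-reduce: each GPU
    # sends and receives about 2 * (N - 1) / N times the gradient size.
    params = 25_000_000        # e.g. a ResNet-50-sized network (assumption)
    bytes_per_param = 4        # FP32 gradients
    num_gpus = 8

    grad_bytes = params * bytes_per_param
    per_gpu_traffic = 2 * (num_gpus - 1) / num_gpus * grad_bytes
    print(f"~{per_gpu_traffic / 1e6:.0f} MB exchanged per GPU per step")  # ~175 MB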


Contact: Alexandrova, Evguenia (STFC, DL, HC)