Course Detail Information
ECE6903J – Distributed Machine Learning Systems
Instructors:
Credits: 3
Pre-requisites: Graduate Standing
Description:
Machine learning (ML) techniques are being adopted at a rapidly increasing pace across industry verticals. However, designing the systems that support ML models in real-world deployments, and applying appropriate ML models to solve practical system problems, remain significant challenges. Machine learning systems has emerged as an interdisciplinary research area at the intersection of traditional systems and artificial intelligence. This is a high-level, research-oriented course that introduces a wide range of applications, problems, techniques, and solutions in the machine learning systems field. The content covers two complementary and equally important directions: ML for systems and systems for ML. Topics include distributed training, compression, edge computing, federated learning, and promising applications such as video streaming and task scheduling. Students will engage with the latest developments in the field, learn how to design efficient machine learning algorithms under practical system constraints, and learn how to model real-world problems and solve them through the lens of machine learning.
Course Topics:
This course introduces a wide range of applications, problems, techniques, and solutions in the machine learning systems field. The detailed topics include:
Machine learning system history
Machine learning foundations: deep models and workflow
Distributed system foundations: modularity and layering, fault-tolerance, consistency
Distributed ML training: architecture, framework, and algorithms
Federated learning and analytics
Edge learning: quantization, pre-processing, model splitting
ML for system applications: video streaming
ML for system applications: task scheduling