Dimension Reduction & Matrix Completion
- Course type
- STATISTICS
- Course contact
- François PORTIER
- Unit
- UE-MSD01: Machine Learning
- Number of ECTS
- 2
- Course code
- MSD 01-3
- Distribution of courses
- Lecture hours: 18
- Language of teaching
- English
Objectives
Modern datasets often record a large number of variables. To ensure good statistical performance, one must circumvent the so-called "curse of dimensionality" by applying dimension reduction techniques. The key notion for understanding when dimension reduction works is sparsity, understood in a broad sense: the phenomenon under investigation has a low-dimensional intrinsic structure. Sparsity is also at the core of compressive sensing for data acquisition. The simplest notion of sparsity is defined for vectors, where it opens the way to high-dimensional linear regression (the Lasso) and, through regularization techniques, to non-linear regression such as high-dimensional generalized linear models. These methods extend to the estimation of low-rank matrices, which arise for instance in recommender systems as the matrix completion problem. Sparsity is also helpful for highly non-linear machine learning algorithms, such as clustering. While clearly stating the mathematical foundations of dimension reduction, this course focuses on the methodological and algorithmic aspects of these techniques.
– Understand the curse of dimensionality and the notion of sparsity.
– Know the definition of the Lasso and its main variants, as well as its main algorithmic implementations.
– Understand the tuning of the Lasso and know the main techniques.
– Know how to regularize a high-dimensional generalized linear model.
– Understand the matrix completion problem and the collaborative filtering approach.
– Know how to use the SVD and solve a low-rank matrix estimation problem.
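As an illustration of the Lasso objective listed above, here is a minimal sketch of one common algorithmic implementation, cyclic coordinate descent with soft-thresholding. It is not taken from the course material: the function names `soft_threshold` and `lasso_cd`, the parameter `lam`, and the 1/(2n) scaling of the squared loss are assumptions chosen for this example.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the (assumed) Lasso objective
        (1 / (2 n)) * ||y - X b||_2^2 + lam * ||b||_1.
    Illustrative sketch: fixed number of sweeps, no intercept, no stopping rule.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * ||x_j||^2 for each column
    r = y - X @ b                       # current residual
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                # skip all-zero columns
            # Correlation of column j with the residual that ignores b[j].
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new_bj)  # keep the residual in sync
            b[j] = new_bj
    return b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    beta = np.zeros(20)
    beta[:3] = [3.0, -2.0, 1.5]         # sparse ground truth (synthetic)
    y = X @ beta + 0.1 * rng.standard_normal(100)
    print(np.round(lasso_cd(X, y, lam=0.1), 2))
```

In this sketch, larger values of `lam` set more coefficients exactly to zero. In practice `lam` is usually tuned, for example by cross-validation, which is what the tuning objective above refers to.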
Outline
– High-dimensional linear regression.
– High-dimensional generalized linear models.
– Low-rank matrix estimation.
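The low-rank matrix estimation topic can be illustrated with a small sketch of matrix completion by iterated singular value soft-thresholding, a scheme often called Soft-Impute. It is one possible approach, not necessarily the one taught in the course. The names `svd_soft_threshold` and `soft_impute`, the parameter `tau`, and the synthetic rank-2 data are assumptions made for this example.

```python
import numpy as np


def svd_soft_threshold(M, tau):
    """Shrink the singular values of M by tau (proximal map of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt


def soft_impute(Y, mask, tau, n_iter=300):
    """Estimate a low-rank matrix from the entries of Y where mask is True.

    Each iteration fills the unobserved entries with the current estimate,
    then applies singular value soft-thresholding. Illustrative sketch with a
    fixed number of iterations and no convergence check.
    """
    Z = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        filled = np.where(mask, Y, Z)   # observed data + current guesses
        Z = svd_soft_threshold(filled, tau)
    return Z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic rank-2 "ratings" matrix with about 80% of entries observed.
    A = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
    mask = rng.random(A.shape) < 0.8
    Z = soft_impute(np.where(mask, A, 0.0), mask, tau=1.0)
    err = np.linalg.norm((Z - A)[~mask]) / np.linalg.norm(A[~mask])
    print("relative error on unobserved entries:", err)
```

Here `tau` plays the same role as the regularization parameter of the Lasso: it penalizes the sum of singular values rather than the sum of absolute coefficients, which favors low-rank estimates.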
Prerequisites
Basic statistics, linear algebra and probability.