Unbiased estimates for linear regression via volume sampling
Seminar: May 27, 2019 at 11:00
Room 204, Settore Didattico, via Celoria 20
Speaker: Manfred Warmuth, UC Santa Cruz, USA
Contact person: Nicolò Cesa-Bianchi
Consider the following basic machine learning task: Given a fixed set of n input points in a d-dimensional linear regression problem, we wish to predict a hidden response value for each of the points. We can only afford to obtain the responses for a small subset of the points, which are then used to construct linear predictions for all points in the dataset. The performance of the predictions is evaluated by the total square loss on all responses. We show that a good approximate solution to this least squares problem can be obtained from just d responses (where d is the input dimension) by using a joint sampling technique called volume sampling. Moreover, the least squares solution obtained for the volume-sampled subproblem is an unbiased estimator of the optimal solution based on all n responses. This unbiasedness is a desirable property that is not shared by standard subset selection techniques.
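The unbiasedness property can be illustrated numerically. Below is a minimal sketch (not from the talk) that enumerates all size-d subsets by brute force on a tiny problem, samples each subset S with probability proportional to det(X_S)^2 (the squared volume spanned by the corresponding rows), and verifies that the exact expectation of the subproblem solution equals the full least squares solution; practical algorithms avoid this exponential enumeration.

```python
import itertools
import numpy as np

# Tiny synthetic instance (hypothetical sizes chosen for illustration).
rng = np.random.default_rng(0)
n, d = 8, 2
X = rng.normal(size=(n, d))   # n input points in d dimensions
y = rng.normal(size=n)        # hidden responses

# Volume sampling: P(S) proportional to det(X_S)^2 over size-d subsets S.
# By the Cauchy-Binet formula, the normalizer equals det(X^T X).
subsets = list(itertools.combinations(range(n), d))
vols = np.array([np.linalg.det(X[list(S)]) ** 2 for S in subsets])
probs = vols / vols.sum()

# Full least squares solution using all n responses.
w_star = np.linalg.lstsq(X, y, rcond=None)[0]

# Exact expectation of the subproblem solution X_S^{-1} y_S under volume
# sampling: summing over all subsets shows it coincides with w_star,
# which is the unbiasedness property described in the abstract.
w_exp = sum(p * np.linalg.solve(X[list(S)], y[list(S)])
            for p, S in zip(probs, subsets))
```

In practice one would draw subsets from `probs` and solve only the d-by-d subproblem per draw; the exact sum above is just a compact way to check that the estimator's mean matches the full solution.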
Motivated by these basic properties, we develop a theoretical framework for studying volume sampling, which leads to a number of new expectation formulas and proof techniques with applications in experimental design, second-order optimization, low-rank matrix reconstruction and determinantal point processes. We present several extensions of volume sampling, including a regularized variant, and we propose the first efficient algorithms which make this technique a practical tool in the machine learning toolbox.
Manfred Warmuth is a Professor Emeritus at UC Santa Cruz working in Machine Learning. He is well known for analysing algorithms that are online in the sense that they continuously adapt to a changing data stream. He developed a technique for motivating and analysing online algorithms using Bregman divergences, and devised (with coauthors) some key new algorithms such as the Weighted Majority, the Exponentiated Gradient, and the Last Step Minimax algorithm. He also introduced the Matching Loss and Compression Schemes, and framed a number of core open problems in Machine Learning. He is currently visiting Google Brain in Zurich.