8:00am: Registration
8:45am: Introductory Remarks
Dan Huttenlocher, Inaugural Dean of the MIT Stephen A. Schwarzman College of Computing, MIT
9:00am: Plenary Talk: “On Dynamics-Informed Blending of Machine Learning and Game Theory”
Michael Jordan, Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science & the Department of Statistics, University of California, Berkeley
Abstract: Statistical decisions are often given meaning in the context of other decisions, particularly when there are scarce resources to be shared. Managing such sharing is one of the classical goals of microeconomics, and it is given new relevance in the modern setting of large, human-focused datasets, and in data-analytic contexts such as classifiers and recommendation systems. I’ll discuss several recent projects that aim to explore the interface between machine learning and microeconomics, including leader/follower dynamics in strategic classification, the robust learning of optimal auctions, and a Lyapunov theory for matching markets with transfers.
9:45am: Plenary Talk: “Mitigation of Inherent Biases in Data Science”
Elisa Celis, Assistant Professor in the Statistics & Data Science department, Yale University
Abstract: It is now evident that data-driven algorithms and policy decisions can be discriminatory due to the inherent biases in the data and algorithms on which they rely. In this talk, I will present vignettes from my recent work that illustrate how these biases arise, the hurdles in overcoming them, and mitigation strategies for rectifying the vital problems that arise.
10:30am: Break
11:00am: Plenary Talk: “A Multi-resolution Theory for Approximating Infinite-p-Zero-n: Transitional Inference, Individualized Predictions, and a World Without Bias-Variance Tradeoff”
Xiao-Li Meng, Whipple V. N. Jones Professor of Statistics, Harvard University
Abstract: Transitional inference is an empirical concept, rooted and practiced in clinical medicine since ancient Greece. Knowledge and experiences gained from treating one entity (e.g., a disease or a group of patients) are applied to treat a related but distinctively different one (e.g., a similar disease or a new patient). This notion of “transition to the similar” gives individualized treatments an operational meaning, yet its theoretical foundation defies the familiar inductive inference framework. The uniqueness of entities is the result of potentially an infinite number of attributes (hence p = ∞), which entails zero direct training sample size (i.e., n = 0) because genuine guinea pigs do not exist. However, the literature on wavelets and on sieve methods for nonparametric estimation suggests a principled approximation theory for transitional inference via a multi-resolution (MR) perspective, where we use the resolution level to index the degree of approximation to ultimate individuality. MR inference seeks a primary resolution indexing an indirect training sample, which provides enough matched attributes to increase the relevance of the results to the target individuals and yet still accumulate sufficient indirect sample sizes for robust estimation. Theoretically, MR inference relies on an infinite-term ANOVA-type decomposition, providing an alternative way to model sparsity via the decay rate of the resolution bias as a function of the primary resolution level. Unexpectedly, this decomposition reveals a world without variance when the outcome is a deterministic function of potentially infinitely many predictors. In this deterministic world, the optimal resolution prefers over-fitting in the traditional sense when the resolution bias decays sufficiently rapidly.
Furthermore, there can be many “descents” in the prediction error curve, when the contributions of predictors are inhomogeneous and the ordering of their importance does not align with the order of their inclusion in prediction. These findings may hint at a deterministic approximation theory for understanding the apparently over-fitting resistant phenomenon of some over-saturated models in machine learning.
11:45am: Lightning Talks
12:30pm: Poster Session and Lunch
1:30pm: Plenary Talk: “Are Dark Matter Signals Hiding In Gamma-Ray Data?”
Tracy Slatyer, Associate Professor of Physics, MIT
Abstract: Studies of data from the Fermi Gamma-Ray Space Telescope have revealed surprising excesses of gamma-ray emission toward the heart of the Milky Way, including giant structures known as the Fermi Bubbles and a central glow often called the Galactic Center Excess. I will discuss how these signals were discovered, and what they may be telling us about the astrophysics of our Galaxy or even the nature of dark matter. The origin of the Galactic Center Excess in particular remains an open question, with many recent attempts to statistically disentangle different possible contributions to the signal. I will discuss a number of these approaches, including possible pitfalls and paths forward.
2:15pm: Plenary Talk: “Min-norm interpolation and boosting: a precise high-dimensional asymptotic theory”
Pragya Sur, Assistant Professor in the Statistics Department, Harvard University
Abstract: Modern machine learning algorithms regularly produce classifiers that generalize well while interpolating the training data (that is, achieving zero training error). This counter-intuitive phenomenon has spurred enormous interest in interpolating classifiers in recent machine learning research. However, a precise understanding of the generalization performance of interpolants in contemporary high-dimensional settings is far from complete. In this talk, we will focus on min-L1-norm interpolants and present a theory for their generalization behavior on high-dimensional binary data that is linearly separable (in an asymptotic sense). We will establish this in the common modern context where the numbers of features and samples are both large and comparable.
Subsequently, we will study the celebrated AdaBoost algorithm. Utilizing its classical connection to min-L1-norm interpolants, we will establish an asymptotically exact characterization of the generalization performance of AdaBoost. Our characterization relies on specific modeling assumptions on the underlying data; however, we will discuss a universality phenomenon that allows one to apply our results to certain settings precluded by the prior assumptions. As a byproduct, our results formalize the following crucial fact for AdaBoost: overparametrization helps optimization. Furthermore, these results improve upon existing upper bounds in the boosting literature in our setting, and can be extended to min-norm interpolants under geometries beyond the L1. Our analysis is relatively general and has potential applications for other ensembling approaches. Time permitting, I will discuss some of these extensions. This is based on joint work with Tengyuan Liang.
3:00pm: Break
3:15pm: Poster Award Ceremony
3:30pm: Closing Remarks
Ankur Moitra, Director, Statistics and Data Science Center, MIT