What's a good imputation to predict with missing values? - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, Year: 2021


Abstract

How can one learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and then learning on the completed data to predict the outcome. Yet this widespread practice has no theoretical grounding. Here we show that, for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all missing-value mechanisms, in contrast with classic statistical results that require missing-at-random settings to use imputation in probabilistic modeling. Moreover, it implies that perfect conditional imputation is not needed for good prediction asymptotically. In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn. Crafting the imputation instead so as to leave the regression function unchanged merely shifts the problem to learning discontinuous imputations. Rather, we suggest that it is easier to learn imputation and regression jointly. We propose such a procedure, adapting NeuMiss, a neural network that captures the conditional links across observed and unobserved variables whatever the missing-value pattern. Experiments with a finite number of samples confirm that joint imputation and regression through NeuMiss outperforms various two-step procedures.
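To make the two-step "impute-then-regress" procedure concrete, here is a minimal sketch assuming only NumPy. It is not the authors' code and not NeuMiss. The synthetic Gaussian data, the 20% completely-at-random masking, mean imputation with appended missingness indicators, the hand-written k-nearest-neighbours regressor (a standard example of a flexible, universally consistent learner) and `k=30` are all illustrative choices made here. The helper names `mean_impute_with_indicators` and `knn_predict` are hypothetical.

```python
import numpy as np


def mean_impute_with_indicators(X, means):
    """Step 1 (hypothetical helper): replace NaNs by per-feature means
    and append 0/1 missingness-indicator columns."""
    mask = np.isnan(X)
    X_imp = np.where(mask, means, X)
    return np.hstack([X_imp, mask.astype(float)])


def knn_predict(X_train, y_train, X_query, k):
    """Step 2 (hypothetical helper): predict each query point as the mean
    outcome of its k nearest training points (Euclidean distance)."""
    # Squared distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k smallest distances in each row (unordered).
    nearest = np.argpartition(d2, k, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


rng = np.random.default_rng(0)

# Illustrative synthetic data: two correlated Gaussian features and a
# linear outcome with small noise.
n = 2000
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X_full = rng.multivariate_normal([0.0, 0.0], cov, size=n)
y = X_full[:, 0] + 2.0 * X_full[:, 1] + 0.1 * rng.standard_normal(n)

# Hide 20% of entries completely at random (chosen only for simplicity).
X = X_full.copy()
X[rng.random(X.shape) < 0.2] = np.nan

X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

# Step 1: fit the (deliberately crude) imputation on the training set only.
means = np.nanmean(X_train, axis=0)
Z_train = mean_impute_with_indicators(X_train, means)
Z_test = mean_impute_with_indicators(X_test, means)

# Step 2: learn the regression on the completed data.
y_pred = knn_predict(Z_train, y_train, Z_test, k=30)
r2 = 1.0 - np.mean((y_test - y_pred) ** 2) / np.var(y_test)
print(f"test R^2: {r2:.3f}")
```

Replacing the mean by a better imputer changes only step 1. The abstract's asymptotic argument concerns the combination of an imputation with a sufficiently flexible learner in step 2, whereas for finite samples the authors report that learning imputation and regression jointly (NeuMiss) works better than such two-step pipelines.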
Main file
LeMorvan2021_ImputeThenRegress.pdf (1.41 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03243931, version 1 (31-05-2021)
hal-03243931, version 2 (29-11-2021)

Identifiers

Cite

Marine Le Morvan, Julie Josse, Erwan Scornet, Gaël Varoquaux. What's a good imputation to predict with missing values? NeurIPS 2021 - 35th Conference on Neural Information Processing Systems, Dec 2021, Virtual, France. ⟨hal-03243931v2⟩
