Learning models from data with measurement error: Tackling underreporting

Roy Adams, Yuelong Ji, Xiaobin Wang, Suchi Saria

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Measurement error in observational datasets can lead to systematic bias in inferences based on these datasets. As studies based on observational data are increasingly used to inform decisions with real-world impact, it is critical that we develop a robust set of techniques for analyzing and adjusting for these biases. In this paper we present a method for estimating the distribution of an outcome given a binary exposure that is subject to underreporting. Our method is based on a missing data view of the measurement error problem, where the true exposure is treated as a latent variable that is marginalized out of a joint model. We prove three different conditions under which the outcome distribution can still be identified from data containing only error-prone observations of the exposure. We demonstrate this method on synthetic data and analyze its sensitivity to near violations of the identifiability conditions. Finally, we use this method to estimate the effects of maternal smoking and heroin use during pregnancy on childhood obesity, two import problems from public health. Using the proposed method, we estimate these effects using only subject-reported drug use data and refine the range of estimates generated by a sensitivity analysis-based approach. Further, the estimates produced by our method are consistent with existing literature on both the effects of maternal smoking and the rate at which subjects underreport smoking.

Original languageEnglish (US)
Title of host publication36th International Conference on Machine Learning, ICML 2019
PublisherInternational Machine Learning Society (IMLS)
Pages95-104
Number of pages10
ISBN (Electronic)9781510886988
StatePublished - Jan 1 2019
Event36th International Conference on Machine Learning, ICML 2019 - Long Beach, United States
Duration: Jun 9 2019Jun 15 2019

Publication series

Name36th International Conference on Machine Learning, ICML 2019
Volume2019-June

Conference

Conference36th International Conference on Machine Learning, ICML 2019
CountryUnited States
CityLong Beach
Period6/9/196/15/19

ASJC Scopus subject areas

  • Education
  • Computer Science Applications
  • Human-Computer Interaction

Fingerprint Dive into the research topics of 'Learning models from data with measurement error: Tackling underreporting'. Together they form a unique fingerprint.

  • Cite this

    Adams, R., Ji, Y., Wang, X., & Saria, S. (2019). Learning models from data with measurement error: Tackling underreporting. In 36th International Conference on Machine Learning, ICML 2019 (pp. 95-104). (36th International Conference on Machine Learning, ICML 2019; Vol. 2019-June). International Machine Learning Society (IMLS).