Consistent estimation of functions of data missing non-monotonically and not at random

Ilya Shpitser

Research output: Contribution to journalArticle

Abstract

Missing records are a perennial problem in analysis of complex data of all types, when the target of inference is some function of the full data law. In simple cases, where data is missing at random or completely at random [15], well-known adjustments exist that result in consistent estimators of target quantities. Assumptions underlying these estimators are generally not realistic in practical missing data problems. Unfortunately, consistent estimators in more complex cases where data is missing not at random, and where no ordering on variables induces monotonicity of missingness status are not known in general, with some notable exceptions [13, 18, 16]. In this paper, we propose a general class of consistent estimators for cases where data is missing not at random, and missingness status is non-monotonic. Our estimators, which are generalized inverse probability weighting estimators, make no assumptions on the underlying full data law, but instead place independence restrictions, and certain other fairly mild assumptions, on the distribution of missingness status conditional on the data. The assumptions we place on the distribution of missingness status conditional on the data can be viewed as a version of a conditional Markov random field (MRF) corresponding to a chain graph. Assumptions embedded in our model permit identification from the observed data law, and admit a natural fitting procedure based on the pseudo likelihood approach of [2]. We illustrate our approach with a simple simulation study, and an analysis of risk of premature birth in women in Botswana exposed to highly active anti-retroviral therapy.

Original languageEnglish (US)
Pages (from-to)3152-3160
Number of pages9
JournalAdvances in neural information processing systems
StatePublished - Jan 1 2016

Fingerprint

Identification (control systems)

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing

Cite this

Consistent estimation of functions of data missing non-monotonically and not at random. / Shpitser, Ilya.

In: Advances in neural information processing systems, 01.01.2016, p. 3152-3160.

Research output: Contribution to journalArticle

@article{3b36151292104d0ea170b692cf4098df,
title = "Consistent estimation of functions of data missing non-monotonically and not at random",
abstract = "Missing records are a perennial problem in analysis of complex data of all types, when the target of inference is some function of the full data law. In simple cases, where data is missing at random or completely at random [15], well-known adjustments exist that result in consistent estimators of target quantities. Assumptions underlying these estimators are generally not realistic in practical missing data problems. Unfortunately, consistent estimators in more complex cases where data is missing not at random, and where no ordering on variables induces monotonicity of missingness status are not known in general, with some notable exceptions [13, 18, 16]. In this paper, we propose a general class of consistent estimators for cases where data is missing not at random, and missingness status is non-monotonic. Our estimators, which are generalized inverse probability weighting estimators, make no assumptions on the underlying full data law, but instead place independence restrictions, and certain other fairly mild assumptions, on the distribution of missingness status conditional on the data. The assumptions we place on the distribution of missingness status conditional on the data can be viewed as a version of a conditional Markov random field (MRF) corresponding to a chain graph. Assumptions embedded in our model permit identification from the observed data law, and admit a natural fitting procedure based on the pseudo likelihood approach of [2]. We illustrate our approach with a simple simulation study, and an analysis of risk of premature birth in women in Botswana exposed to highly active anti-retroviral therapy.",
author = "Ilya Shpitser",
year = "2016",
month = "1",
day = "1",
language = "English (US)",
pages = "3152--3160",
journal = "Advances in neural information processing systems",
issn = "1049-5258",

}

TY - JOUR

T1 - Consistent estimation of functions of data missing non-monotonically and not at random

AU - Shpitser, Ilya

PY - 2016/1/1

Y1 - 2016/1/1

N2 - Missing records are a perennial problem in analysis of complex data of all types, when the target of inference is some function of the full data law. In simple cases, where data is missing at random or completely at random [15], well-known adjustments exist that result in consistent estimators of target quantities. Assumptions underlying these estimators are generally not realistic in practical missing data problems. Unfortunately, consistent estimators in more complex cases where data is missing not at random, and where no ordering on variables induces monotonicity of missingness status are not known in general, with some notable exceptions [13, 18, 16]. In this paper, we propose a general class of consistent estimators for cases where data is missing not at random, and missingness status is non-monotonic. Our estimators, which are generalized inverse probability weighting estimators, make no assumptions on the underlying full data law, but instead place independence restrictions, and certain other fairly mild assumptions, on the distribution of missingness status conditional on the data. The assumptions we place on the distribution of missingness status conditional on the data can be viewed as a version of a conditional Markov random field (MRF) corresponding to a chain graph. Assumptions embedded in our model permit identification from the observed data law, and admit a natural fitting procedure based on the pseudo likelihood approach of [2]. We illustrate our approach with a simple simulation study, and an analysis of risk of premature birth in women in Botswana exposed to highly active anti-retroviral therapy.

AB - Missing records are a perennial problem in analysis of complex data of all types, when the target of inference is some function of the full data law. In simple cases, where data is missing at random or completely at random [15], well-known adjustments exist that result in consistent estimators of target quantities. Assumptions underlying these estimators are generally not realistic in practical missing data problems. Unfortunately, consistent estimators in more complex cases where data is missing not at random, and where no ordering on variables induces monotonicity of missingness status are not known in general, with some notable exceptions [13, 18, 16]. In this paper, we propose a general class of consistent estimators for cases where data is missing not at random, and missingness status is non-monotonic. Our estimators, which are generalized inverse probability weighting estimators, make no assumptions on the underlying full data law, but instead place independence restrictions, and certain other fairly mild assumptions, on the distribution of missingness status conditional on the data. The assumptions we place on the distribution of missingness status conditional on the data can be viewed as a version of a conditional Markov random field (MRF) corresponding to a chain graph. Assumptions embedded in our model permit identification from the observed data law, and admit a natural fitting procedure based on the pseudo likelihood approach of [2]. We illustrate our approach with a simple simulation study, and an analysis of risk of premature birth in women in Botswana exposed to highly active anti-retroviral therapy.

UR - http://www.scopus.com/inward/record.url?scp=85019196782&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85019196782&partnerID=8YFLogxK

M3 - Article

SP - 3152

EP - 3160

JO - Advances in neural information processing systems

JF - Advances in neural information processing systems

SN - 1049-5258

ER -