Inference for respondent-driven sampling with misclassification

Isabelle S. Beaudry; Krista J. Gile; Shruti H. Mehta

doi:10.1214/17-AOAS1063

Bloomberg School of Public Health

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Respondent-driven sampling (RDS) is a sampling method designed to study hard-to-reach human populations. Beginning with a convenience sample, each participant receives a small number of coupons, which they distribute to their contacts who become eligible. RDS participants are asked to report on their number of contacts in the target population. Also, a set of characteristics is observed for each participant. Current prevalence estimators assume that these attributes are measured accurately. However, ignoring misclassification may lead to biased estimates. The main contribution of this paper is to discuss two approaches to correct for the bias introduced by the misclassification on nodal attributes for existing RDS estimators. The two approaches leverage misclassification rates assumed to be available from external validation studies. Most importantly, our analysis identifies circumstances for which the performance of the correction methods is impaired in the specific context of RDS. The two methods that are discussed are an analytical correction for estimators of the Hájek estimator style and the Simulation Extrapolation Misclassification (SIMEX MC) approach. Extended methodology to estimate the uncertainty of the corrected estimators is also presented. The performance of the proposed methods is assessed under varying levels of known or uncertain misclassification error across simulated social networks of varying features. Finally, the methods are used to estimate HIV prevalence among people who inject drugs (PWID) and men who have sex with men (MSM) in India.
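
As a rough illustration of the kind of analytical correction the abstract refers to, the sketch below computes an inverse-degree weighted, Hájek-style prevalence estimate and then adjusts it with the standard sensitivity/specificity correction for a binary attribute. It is an assumption made for illustration, not the authors' estimator: the function names, the choice of inverse-degree weights, and the sample data and misclassification rates are all hypothetical.

```python
def hajek_prevalence(observed, degrees):
    """Inverse-degree weighted (Hajek-style) proportion of observed positives."""
    # Assumed weights: 1 / reported number of contacts (hypothetical choice).
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, observed)) / sum(weights)


def correct_for_misclassification(p_observed, sensitivity, specificity):
    """Solve p_obs = se * p + (1 - sp) * (1 - p) for the true proportion p."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    p = (p_observed - (1.0 - specificity)) / denom
    # Clip to a valid proportion; sampling noise can push p outside [0, 1].
    return min(1.0, max(0.0, p))


# Hypothetical sample: observed (possibly misclassified) status and degrees.
observed = [1, 0, 0, 1, 0]
degrees = [2, 4, 4, 1, 4]

p_obs = hajek_prevalence(observed, degrees)
# Hypothetical rates, standing in for values from an external validation study.
p_corr = correct_for_misclassification(p_obs, sensitivity=0.9, specificity=0.95)
```

The correction is undefined when sensitivity plus specificity equals 1 and becomes unstable as the sum approaches 1, which is why the sketch rejects such inputs rather than returning a value.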

Original language	English (US)
Pages (from-to)	2111-2141
Number of pages	31
Journal	Annals of Applied Statistics
Volume	11
Issue number	4
DOIs	https://doi.org/10.1214/17-AOAS1063
State	Published - Dec 2017

Keywords

Hard-to-reach population sampling
Misclassification
Network sampling
SIMEX MC
Social networks

ASJC Scopus subject areas

Statistics and Probability
Modeling and Simulation
Statistics, Probability and Uncertainty

Cite this

@article{da64e8389c734fad9f50a06f991e8da8,

title = "Inference for respondent-driven sampling with misclassification",

abstract = "Respondent-driven sampling (RDS) is a sampling method designed to study hard-to-reach human populations. Beginning with a convenience sample, each participant receives a small number of coupons, which they distribute to their contacts who become eligible. RDS participants are asked to report on their number of contacts in the target population. Also, a set of characteristics is observed for each participant. Current prevalence estimators assume that these attributes are measured accurately. However, ignoring misclassification may lead to biased estimates. The main contribution of this paper is to discuss two approaches to correct for the bias introduced by the misclassification on nodal attributes for existing RDS estimators. The two approaches leverage misclassification rates assumed to be available from external validation studies. Most importantly, our analysis identifies circumstances for which the performance of the correction methods is impaired in the specific context of RDS. The two methods that are discussed are an analytical correction for estimators of the H{\'a}jek estimator style and the Simulation Extrapolation Misclassification (SIMEX MC) approach. Extended methodology to estimate the uncertainty of the corrected estimators is also presented. The performance of the proposed methods is assessed under varying levels of known or uncertain misclassification error across simulated social networks of varying features. Finally, the methods are used to estimate HIV prevalence among people who inject drugs (PWID) and men who have sex with men (MSM) in India.",

keywords = "Hard-to-reach population sampling, Misclassification, Network sampling, SIMEX MC, Social networks",

author = "Beaudry, {Isabelle S.} and Gile, {Krista J.} and Mehta, {Shruti H.}",

note = "Publisher Copyright: {\textcopyright} Institute of Mathematical Statistics, 2017.",

year = "2017",

month = dec,

doi = "10.1214/17-AOAS1063",

language = "English (US)",

volume = "11",

pages = "2111--2141",

journal = "Annals of Applied Statistics",

issn = "1932-6157",

publisher = "Institute of Mathematical Statistics",

number = "4",

}

TY - JOUR

T1 - Inference for respondent-driven sampling with misclassification

AU - Beaudry, Isabelle S.

AU - Gile, Krista J.

AU - Mehta, Shruti H.

N1 - Publisher Copyright: © Institute of Mathematical Statistics, 2017.

PY - 2017/12

Y1 - 2017/12

AB - Respondent-driven sampling (RDS) is a sampling method designed to study hard-to-reach human populations. Beginning with a convenience sample, each participant receives a small number of coupons, which they distribute to their contacts who become eligible. RDS participants are asked to report on their number of contacts in the target population. Also, a set of characteristics is observed for each participant. Current prevalence estimators assume that these attributes are measured accurately. However, ignoring misclassification may lead to biased estimates. The main contribution of this paper is to discuss two approaches to correct for the bias introduced by the misclassification on nodal attributes for existing RDS estimators. The two approaches leverage misclassification rates assumed to be available from external validation studies. Most importantly, our analysis identifies circumstances for which the performance of the correction methods is impaired in the specific context of RDS. The two methods that are discussed are an analytical correction for estimators of the Hájek estimator style and the Simulation Extrapolation Misclassification (SIMEX MC) approach. Extended methodology to estimate the uncertainty of the corrected estimators is also presented. The performance of the proposed methods is assessed under varying levels of known or uncertain misclassification error across simulated social networks of varying features. Finally, the methods are used to estimate HIV prevalence among people who inject drugs (PWID) and men who have sex with men (MSM) in India.

KW - Hard-to-reach population sampling

KW - Misclassification

KW - Network sampling

KW - SIMEX MC

KW - Social networks

UR - http://www.scopus.com/inward/record.url?scp=85042672284&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85042672284&partnerID=8YFLogxK

U2 - 10.1214/17-AOAS1063

DO - 10.1214/17-AOAS1063

M3 - Article

AN - SCOPUS:85042672284

SN - 1932-6157

VL - 11

SP - 2111

EP - 2141

JO - Annals of Applied Statistics

JF - Annals of Applied Statistics

IS - 4

ER -
