TY - JOUR
T1 - Investigation of bias in continuous medical image label fusion
AU - Xing, Fangxu
AU - Prince, Jerry L.
AU - Landman, Bennett A.
N1 - Funding Information:
This project was supported by National Institute of Neurological Disorders and Stroke grants 1R01NS056307 and 1R21NS064534 (http://www.ninds.nih.gov/). The author who received the funding is BL. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2016 Xing et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2016/6/1
Y1 - 2016/6/1
AB - Image labeling is essential for analyzing morphometric features in medical imaging data. Labels can be obtained by either human interaction or automated segmentation algorithms, both of which suffer from errors. The Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm for both discrete-valued and continuous-valued labels has been proposed to find the consensus fusion while simultaneously estimating rater performance. In this paper, we first show that the previously reported continuous STAPLE in which bias and variance are used to represent rater performance yields a maximum likelihood solution in which bias is indeterminate. We then analyze the major cause of the deficiency and evaluate two classes of auxiliary bias estimation processes, one that estimates the bias as part of the algorithm initialization and the other that uses a maximum a posteriori criterion with a priori probabilities on the rater bias. We compare the efficacy of six methods, three variants from each class, in simulations and through empirical human rater experiments. We comment on their properties, identify deficient methods, and propose effective methods as solutions.
UR - http://www.scopus.com/inward/record.url?scp=84973459172&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973459172&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0155862
DO - 10.1371/journal.pone.0155862
M3 - Article
C2 - 27258158
AN - SCOPUS:84973459172
VL - 11
JO - PLoS One
JF - PLoS One
SN - 1932-6203
IS - 6
M1 - e0155862
ER -