Investigation of bias in continuous medical image label fusion

Fangxu Xing, Jerry Ladd Prince, Bennett A. Landman

Research output: Contribution to journal, Article

Abstract

Image labeling is essential for analyzing morphometric features in medical imaging data. Labels can be obtained by either human interaction or automated segmentation algorithms, both of which suffer from errors. The Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm for both discrete-valued and continuous-valued labels has been proposed to find the consensus fusion while simultaneously estimating rater performance. In this paper, we first show that the previously reported continuous STAPLE, in which bias and variance are used to represent rater performance, yields a maximum likelihood solution in which bias is indeterminate. We then analyze the major cause of the deficiency and evaluate two classes of auxiliary bias estimation processes: one that estimates the bias as part of the algorithm initialization, and another that uses a maximum a posteriori criterion with a priori probabilities on the rater bias. We compare the efficacy of six methods, three variants from each class, in simulations and through empirical human rater experiments. We comment on their properties, identify deficient methods, and propose effective methods as solutions.
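The indeterminacy of bias under maximum likelihood can be illustrated with a small numerical sketch. It assumes a simple additive model, chosen here for illustration and not taken from the paper's exact formulation: each rater's label equals the true value plus a constant rater bias plus Gaussian noise. Under that assumption, adding a constant to every true value and subtracting it from every rater bias leaves the likelihood unchanged, so the data alone cannot pin down the bias. The names `log_likelihood`, `truth`, `bias`, and `sigma` are hypothetical.

```python
import math
import random


def log_likelihood(obs, truth, bias, sigma):
    """Gaussian log-likelihood for the assumed model
    obs[j][i] = truth[i] + bias[j] + noise, noise ~ N(0, sigma[j]^2)."""
    total = 0.0
    for j, row in enumerate(obs):
        var = sigma[j] ** 2
        for i, d in enumerate(row):
            residual = d - truth[i] - bias[j]
            total += -0.5 * math.log(2 * math.pi * var) - residual ** 2 / (2 * var)
    return total


# Simulated data (illustrative values only): 3 raters label 50 locations.
random.seed(0)
truth = [random.uniform(0, 10) for _ in range(50)]
bias = [0.5, -0.3, 1.2]
sigma = [0.2, 0.4, 0.3]
obs = [[t + b + random.gauss(0, s) for t in truth] for b, s in zip(bias, sigma)]

# Shift the truth up by c and every rater bias down by c.
c = 2.0
shifted_truth = [t + c for t in truth]
shifted_bias = [b - c for b in bias]

ll_a = log_likelihood(obs, truth, bias, sigma)
ll_b = log_likelihood(obs, shifted_truth, shifted_bias, sigma)

# The two parameter sets explain the data equally well.
print(math.isclose(ll_a, ll_b, rel_tol=1e-9, abs_tol=1e-9))
```

Because the two parameter sets are indistinguishable by likelihood, some extra information is needed to fix the bias, which is the role of the initialization-based and maximum a posteriori processes described in the abstract.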

Original language: English (US)
Article number: e0155862
Journal: PLoS One
Volume: 11
Issue number: 6
DOI: 10.1371/journal.pone.0155862
State: Published - Jun 1 2016

Fingerprint

Labels
Fusion reactions
Medical imaging
Diagnostic Imaging
Labeling
Maximum likelihood
methodology
image analysis
Experiments

ASJC Scopus subject areas

  • Agricultural and Biological Sciences (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Medicine (all)

Cite this

Investigation of bias in continuous medical image label fusion. / Xing, Fangxu; Prince, Jerry Ladd; Landman, Bennett A.

In: PLoS One, Vol. 11, No. 6, e0155862, 01.06.2016.

@article{4f1eaffe4db54c938b11c5a7be3b386a,
title = "Investigation of bias in continuous medical image label fusion",
author = "Xing, Fangxu and Prince, {Jerry Ladd} and Landman, {Bennett A.}",
year = "2016",
month = "6",
day = "1",
doi = "10.1371/journal.pone.0155862",
language = "English (US)",
volume = "11",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "6",

}

TY - JOUR

T1 - Investigation of bias in continuous medical image label fusion

AU - Xing, Fangxu

AU - Prince, Jerry Ladd

AU - Landman, Bennett A.

PY - 2016/6/1

Y1 - 2016/6/1

UR - http://www.scopus.com/inward/record.url?scp=84973459172&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84973459172&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0155862

DO - 10.1371/journal.pone.0155862

M3 - Article

C2 - 27258158

AN - SCOPUS:84973459172

VL - 11

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 6

M1 - e0155862

ER -