TY - JOUR
T1 - Addressing Inaccurate Nosology in Mental Health
T2 - A Multilabel Data Cleansing Approach for Detecting Label Noise From Structural Magnetic Resonance Imaging Data in Mood and Psychosis Disorders
AU - Rokham, Hooman
AU - Pearlson, Godfrey
AU - Abrol, Anees
AU - Falakshahi, Haleh
AU - Plis, Sergey
AU - Calhoun, Vince D.
N1 - Funding Information:
The research reported in this work was supported by the National Institute of Mental Health (Grant Nos. R01EB005846 and 1R01MH104680 [to VDC]).
Publisher Copyright:
© 2020 Society of Biological Psychiatry
PY - 2020/8
Y1 - 2020/8
N2 - Background: Mental health diagnostic approaches are seeking to identify biological markers to work alongside advanced machine learning approaches. It is difficult to identify a biological marker of disease when the traditional diagnostic labels themselves are not necessarily valid. Methods: We worked with T1 structural magnetic resonance imaging data collected from 1493 individuals comprising healthy control subjects, patients with psychosis, and their unaffected first-degree relatives. Specifically, the dataset included 176 bipolar disorder probands, 134 schizoaffective disorder probands, 240 schizophrenia probands, 362 control subjects, and 581 patient relatives. We assumed that there might be noise in the diagnostic labeling process. We detected label noise by classifying the data multiple times using a support vector machine classifier and then flagging the individuals whom all classifiers unanimously misclassified. Next, we assigned a new diagnostic label to these individuals based on the biological data (magnetic resonance imaging), using an iterative data cleansing approach. Results: Simulation results showed that our method was highly accurate in identifying label noise. The diagnostic and biotype categories showed about 65% and 63% noisy labels, respectively, with the largest amount of relabeling occurring between the healthy control subjects and individuals with bipolar disorder and schizophrenia as well as in unaffected close relatives. The extraction of imaging features highlighted regional brain changes associated with each group. Conclusions: This approach represents an initial step toward developing strategies that need not assume that existing mental health diagnostic categories are always valid but rather allows us to leverage this information while also acknowledging that there are misassignments.
AB - Background: Mental health diagnostic approaches are seeking to identify biological markers to work alongside advanced machine learning approaches. It is difficult to identify a biological marker of disease when the traditional diagnostic labels themselves are not necessarily valid. Methods: We worked with T1 structural magnetic resonance imaging data collected from 1493 individuals comprising healthy control subjects, patients with psychosis, and their unaffected first-degree relatives. Specifically, the dataset included 176 bipolar disorder probands, 134 schizoaffective disorder probands, 240 schizophrenia probands, 362 control subjects, and 581 patient relatives. We assumed that there might be noise in the diagnostic labeling process. We detected label noise by classifying the data multiple times using a support vector machine classifier and then flagging the individuals whom all classifiers unanimously misclassified. Next, we assigned a new diagnostic label to these individuals based on the biological data (magnetic resonance imaging), using an iterative data cleansing approach. Results: Simulation results showed that our method was highly accurate in identifying label noise. The diagnostic and biotype categories showed about 65% and 63% noisy labels, respectively, with the largest amount of relabeling occurring between the healthy control subjects and individuals with bipolar disorder and schizophrenia as well as in unaffected close relatives. The extraction of imaging features highlighted regional brain changes associated with each group. Conclusions: This approach represents an initial step toward developing strategies that need not assume that existing mental health diagnostic categories are always valid but rather allows us to leverage this information while also acknowledging that there are misassignments.
KW - Data cleansing
KW - Deep learning
KW - Label noise
KW - Machine learning
KW - Psychosis disorders
KW - Structural MRI
UR - http://www.scopus.com/inward/record.url?scp=85088823732&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088823732&partnerID=8YFLogxK
U2 - 10.1016/j.bpsc.2020.05.008
DO - 10.1016/j.bpsc.2020.05.008
M3 - Article
C2 - 32771180
AN - SCOPUS:85088823732
SN - 2451-9022
VL - 5
SP - 819
EP - 832
JO - Biological Psychiatry: Cognitive Neuroscience and Neuroimaging
JF - Biological Psychiatry: Cognitive Neuroscience and Neuroimaging
IS - 8
ER -