Addressing Inaccurate Nosology in Mental Health: A Multilabel Data Cleansing Approach for Detecting Label Noise From Structural Magnetic Resonance Imaging Data in Mood and Psychosis Disorders

Hooman Rokham; Godfrey Pearlson; Anees Abrol; Haleh Falakshahi; Sergey Plis; Vince D. Calhoun

doi:10.1016/j.bpsc.2020.05.008

School of Medicine

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Background: Mental health diagnostic approaches increasingly seek biological markers that can work alongside advanced machine learning methods. Identifying a biological marker of disease is difficult, however, when the traditional diagnostic labels themselves are not necessarily valid.

Methods: We worked with T1-weighted structural magnetic resonance imaging data collected from 1493 individuals comprising healthy control subjects, patients with psychosis, and their unaffected first-degree relatives. Specifically, the dataset included 176 bipolar disorder probands, 134 schizoaffective disorder probands, 240 schizophrenia probands, 362 control subjects, and 581 patient relatives. We assumed that there might be noise in the diagnostic labeling process. We detected label noise by classifying the data multiple times using a support vector machine classifier and flagging the individuals whom all classifiers unanimously mislabeled. Next, we assigned a new diagnostic label to these individuals, based on the biological data (magnetic resonance imaging), using an iterative data cleansing approach.

Results: Simulation results showed that our method was highly accurate in identifying label noise. About 65% of labels in the diagnostic categories and about 63% in the biotype categories were identified as noisy, with the largest amount of relabeling occurring between healthy control subjects and individuals with bipolar disorder and schizophrenia, as well as among unaffected close relatives. The extraction of imaging features highlighted regional brain changes associated with each group.

Conclusions: This approach represents an initial step toward strategies that need not assume that existing mental health diagnostic categories are always valid, but rather allow us to leverage this information while acknowledging that there are misassignments.
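The flag-and-relabel idea described in the Methods can be illustrated with a small, hedged sketch. It is not the authors' implementation: the abstract reports a support vector machine, whereas this example swaps in a nearest-centroid classifier so that it runs with the Python standard library alone. The function names (`consensus_flags`, `iterative_cleanse`, `nearest_centroid_fit_predict`), the cross-validation scheme, the majority-vote relabeling rule, and the toy two-cluster data are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def nearest_centroid_fit_predict(train_X, train_y, test_X):
    """Stand-in classifier (NOT the paper's SVM): pick the class with the closest mean."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(train_X, train_y):
        prev = sums.get(label, [0.0] * len(x))
        sums[label] = [s + v for s, v in zip(prev, x)]
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return [predict(x) for x in test_X]


def consensus_flags(X, y, n_rounds=5, n_folds=5,
                    fit_predict=nearest_centroid_fit_predict, seed=0):
    """Return {index: proposed_label} for samples mislabeled in every round."""
    rng = random.Random(seed)
    votes = [[] for _ in X]
    for _ in range(n_rounds):
        order = list(range(len(X)))
        rng.shuffle(order)  # a fresh random fold split per round
        for k in range(n_folds):
            test_idx = order[k::n_folds]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            preds = fit_predict([X[i] for i in train_idx],
                                [y[i] for i in train_idx],
                                [X[i] for i in test_idx])
            for i, p in zip(test_idx, preds):
                votes[i].append(p)
    flagged = {}
    for i, v in enumerate(votes):
        if all(p != y[i] for p in v):  # unanimous disagreement with the given label
            # Assumed relabeling rule: most frequent held-out prediction.
            flagged[i] = Counter(v).most_common(1)[0][0]
    return flagged


def iterative_cleanse(X, y, max_iters=10, **kwargs):
    """Repeat flag-and-relabel until no sample is flagged or max_iters is reached."""
    labels = list(y)
    for _ in range(max_iters):
        flagged = consensus_flags(X, labels, **kwargs)
        if not flagged:
            break
        for i, new_label in flagged.items():
            labels[i] = new_label
    return labels


# Toy data: two well-separated 2-D clusters with two deliberately flipped labels.
rng = random.Random(42)
true_labels = [0] * 20 + [1] * 20
X = [[rng.gauss(10 * c, 1.0), rng.gauss(10 * c, 1.0)] for c in true_labels]
noisy = list(true_labels)
noisy[0], noisy[25] = 1, 0

flagged = consensus_flags(X, noisy)
cleansed = iterative_cleanse(X, noisy)
print(sorted(flagged))
```

Because `fit_predict` is a plain callable, a support vector machine from a library of the reader's choice could be substituted without changing the flagging logic. Requiring unanimity across rounds is a conservative choice: a sample is relabeled only when no classifier ever agrees with its original label.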

Original language	English (US)
Pages (from-to)	819-832
Number of pages	14
Journal	Biological Psychiatry: Cognitive Neuroscience and Neuroimaging
Volume	5
Issue number	8
DOIs	https://doi.org/10.1016/j.bpsc.2020.05.008
State	Published - Aug 2020

Keywords

Data cleansing
Deep learning
Label noise
Machine learning
Psychosis disorders
Structural MRI

ASJC Scopus subject areas

Clinical Neurology
Biological Psychiatry
Cognitive Neuroscience
Radiology, Nuclear Medicine and Imaging

Cite this

Rokham, H., Pearlson, G., Abrol, A., Falakshahi, H., Plis, S., & Calhoun, V. D. (2020). Addressing Inaccurate Nosology in Mental Health: A Multilabel Data Cleansing Approach for Detecting Label Noise From Structural Magnetic Resonance Imaging Data in Mood and Psychosis Disorders. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(8), 819-832. https://doi.org/10.1016/j.bpsc.2020.05.008

@article{23a54ca613dd467eba61d2e9beb13758,

title = "Addressing Inaccurate Nosology in Mental Health: A Multilabel Data Cleansing Approach for Detecting Label Noise From Structural Magnetic Resonance Imaging Data in Mood and Psychosis Disorders",

keywords = "Data cleansing, Deep learning, Label noise, Machine learning, Psychosis disorders, Structural MRI",

author = "Hooman Rokham and Godfrey Pearlson and Anees Abrol and Haleh Falakshahi and Sergey Plis and Calhoun, {Vince D.}",

note = "Funding Information: The research reported in this work was supported by the National Institute of Mental Health (Grant Nos. R01EB005846 and 1R01MH104680 [to VDC]). Publisher Copyright: {\textcopyright} 2020 Society of Biological Psychiatry",

year = "2020",

month = aug,

doi = "10.1016/j.bpsc.2020.05.008",

language = "English (US)",

volume = "5",

pages = "819--832",

journal = "Biological Psychiatry: Cognitive Neuroscience and Neuroimaging",

issn = "2451-9022",

publisher = "Elsevier Inc.",

number = "8",

}

TY - JOUR

T1 - Addressing Inaccurate Nosology in Mental Health

T2 - A Multilabel Data Cleansing Approach for Detecting Label Noise From Structural Magnetic Resonance Imaging Data in Mood and Psychosis Disorders

AU - Rokham, Hooman

AU - Pearlson, Godfrey

AU - Abrol, Anees

AU - Falakshahi, Haleh

AU - Plis, Sergey

AU - Calhoun, Vince D.

N1 - Funding Information: The research reported in this work was supported by the National Institute of Mental Health (Grant Nos. R01EB005846 and 1R01MH104680 [to VDC]). Publisher Copyright: © 2020 Society of Biological Psychiatry

PY - 2020/8

Y1 - 2020/8

KW - Data cleansing

KW - Deep learning

KW - Label noise

KW - Machine learning

KW - Psychosis disorders

KW - Structural MRI

UR - http://www.scopus.com/inward/record.url?scp=85088823732&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85088823732&partnerID=8YFLogxK

U2 - 10.1016/j.bpsc.2020.05.008

DO - 10.1016/j.bpsc.2020.05.008

M3 - Article

C2 - 32771180

AN - SCOPUS:85088823732

SN - 2451-9022

VL - 5

SP - 819

EP - 832

JO - Biological Psychiatry: Cognitive Neuroscience and Neuroimaging

JF - Biological Psychiatry: Cognitive Neuroscience and Neuroimaging

IS - 8

ER -
