The validity of three-class Hotelling trace (3-HT) in describing three-class task performance

Comparison of three-class volume under ROC surface (VUS) and 3-HT

Xin He, Eric Frey

Research output: Contribution to journal › Article

Abstract

In order to describe multiclass classification performance, several figures of merit (FOM) have been proposed. Among the earliest and most widely known of these is the three-class Hotelling trace (3-HT). The goal of this paper is to present theoretical and empirical data demonstrating the failure of 3-HT as a measure of three-class task performance. To help do this, we contrast it with a newly proposed three-class FOM, the volume under the three-class receiver operating characteristic (ROC) surface (VUS). The VUS is obtained from a decision-theory-based three-class ROC analysis method which has been proved to extend the decision theoretic, linear discriminant analysis (LDA), and psychophysical foundations of binary ROC analysis to a three-class paradigm. We demonstrate empirically that the VUS and 3-HT do not have a monotonic relationship in general when describing three-class task performance. Numerical experiments demonstrated that the VUS provided reasonable results, while the 3-HT failed to distinguish the case where all objects could be perfectly classified from the case where only one pair of the classes could be perfectly classified. We have provided theoretical explanations of this failure of 3-HT. The significance of this work goes beyond merely demonstrating the problems of the 3-HT; it demonstrates that a FOM that is mathematically correct and has a strong theoretical basis can provide results that violate a common-sense understanding of three-class task performance. This fact raises the question of "how to evaluate a classification performance evaluation method?" We believe the answer to this question lies in the theoretical foundations of binary ROC analysis. We have thus contrasted the two FOMs in terms of three fundamental theories underlying binary ROC analysis: decision theory, binary linear discriminant analysis, and the equivalence of two psychophysical classification procedures. These theoretical investigations demonstrated the importance of extending and unifying all the fundamental theories of binary classification in the development of a three-class FOM; violating one of these fundamental binary classification theories may, as it did for the L-HT, provide predictions of three-class task performance that do not agree with a common-sense understanding of three-class task performance.
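
The kind of behavior described above can be illustrated with a small numerical sketch. It is not the paper's experiment: it assumes the textbook multiclass Hotelling trace, J = tr(S_w^{-1} S_b), with equal priors and identity covariances, and the function name `hotelling_trace` and the two class-mean configurations are hypothetical choices made only for this illustration. Under those assumptions, a configuration in which two classes overlap completely can produce the same trace as one in which all three classes are well separated.

```python
import numpy as np


def hotelling_trace(means, covs, priors=None):
    """Textbook multiclass Hotelling trace J = tr(S_w^{-1} S_b).

    Assumed definitions (not taken from the paper):
      S_b = sum_i P_i (mu_i - mu)(mu_i - mu)^T, with mu = sum_i P_i mu_i
      S_w = sum_i P_i Sigma_i
    """
    means = np.asarray(means, dtype=float)   # shape (n_classes, dim)
    covs = np.asarray(covs, dtype=float)     # shape (n_classes, dim, dim)
    n_classes = means.shape[0]
    if priors is None:
        priors = np.full(n_classes, 1.0 / n_classes)  # equal priors
    priors = np.asarray(priors, dtype=float)

    grand_mean = priors @ means
    centered = means - grand_mean
    s_b = (priors[:, None] * centered).T @ centered   # between-class scatter
    s_w = np.tensordot(priors, covs, axes=1)          # within-class scatter
    # Solve S_w X = S_b instead of forming the inverse explicitly.
    return float(np.trace(np.linalg.solve(s_w, s_b)))


identity_covs = np.stack([np.eye(2)] * 3)

# Hypothetical case A: all three class means are far apart.
means_all_separated = [[-10.0, 0.0], [0.0, 0.0], [10.0, 0.0]]

# Hypothetical case B: classes 1 and 2 coincide (indistinguishable),
# while class 3 is moved far enough away to match the trace of case A.
far = np.sqrt(300.0)
means_one_pair_overlapping = [[0.0, 0.0], [0.0, 0.0], [far, 0.0]]

j_all_separated = hotelling_trace(means_all_separated, identity_covs)
j_one_pair_overlapping = hotelling_trace(means_one_pair_overlapping, identity_covs)

print(f"J, all classes separated:  {j_all_separated:.3f}")
print(f"J, classes 1 and 2 overlap: {j_one_pair_overlapping:.3f}")
```

With identity covariances the trace reduces to the prior-weighted sum of squared distances from each class mean to the grand mean, so one distant class can compensate for two overlapping ones. Both configurations evaluate to 200/3 here even though classes 1 and 2 in case B cannot be told apart.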

Original language: English (US)
Article number: 4580126
Pages (from-to): 185-193
Number of pages: 9
Journal: IEEE Transactions on Medical Imaging
Volume: 28
Issue number: 2
DOI: 10.1109/TMI.2008.928919
State: Published - Feb 2009

Fingerprint

Task Performance and Analysis
ROC Curve
Decision Theory
Discriminant Analysis
Experiments

Keywords

  • L-class Hotelling trace
  • L-class linear discriminant analysis
  • Receiver operating characteristic (ROC) analysis
  • Three-class classification

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications
  • Radiological and Ultrasound Technology
  • Software

Cite this

@article{da6c691104fe4b3682f5581dde804edc,
title = "The validity of three-class Hotelling trace (3-HT) in describing three-class task performance: Comparison of three-class volume under ROC surface (VUS) and 3-HT",
abstract = "In order to describe multiclass classification performance, several figures of merit (FOM) have been proposed. Among the earliest and most widely known of these is the three-class Hotelling trace (3-HT). The goal of this paper is to present theoretical and empirical data demonstrating the failure of 3-HT as a measure of three-class task performance. To help do this, we contrast it with a newly proposed three-class FOM, the volume under the three-class receiver operating characteristic (ROC) surface (VUS). The VUS is obtained from a decision-theory-based three-class ROC analysis method which has been proved to extend the decision theoretic, linear discriminant analysis (LDA), and psychophysical foundations of binary ROC analysis to a three-class paradigm. We demonstrate empirically that the VUS and 3-HT do not have a monotonic relationship in general when describing three-class task performance. Numerical experiments demonstrated that the VUS provided reasonable results, while the 3-HT failed to distinguish the case where all objects could be perfectly classified from the case where only one pair of the classes could be perfectly classified. We have provided theoretical explanations of this failure of 3-HT. The significance of this work goes beyond merely demonstrating the problems of the 3-HT; it demonstrates that a FOM that is mathematically correct and has a strong theoretical basis can provide results that violate a common-sense understanding of three-class task performance. This fact raises the question of {"}how to evaluate a classification performance evaluation method?{"} We believe the answer to this question lies in the theoretical foundations of binary ROC analysis. We have thus contrasted the two FOMs in terms of three fundamental theories underlying binary ROC analysis: decision theory, binary linear discriminant analysis, and the equivalence of two psychophysical classification procedures. These theoretical investigations demonstrated the importance of extending and unifying all the fundamental theories of binary classification in the development of a three-class FOM; violating one of these fundamental binary classification theories may, as it did for the L-HT, provide predictions of three-class task performance that do not agree with a common-sense understanding of three-class task performance.",
keywords = "L-class Hotelling trace, L-class linear discriminant analysis, Receiver operating characteristic (ROC) analysis, Three-class classification",
author = "Xin He and Eric Frey",
year = "2009",
month = "2",
doi = "10.1109/TMI.2008.928919",
language = "English (US)",
volume = "28",
pages = "185--193",
journal = "IEEE Transactions on Medical Imaging",
issn = "0278-0062",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "2",

}

TY - JOUR

T1 - The validity of three-class Hotelling trace (3-HT) in describing three-class task performance

T2 - Comparison of three-class volume under ROC surface (VUS) and 3-HT

AU - He, Xin

AU - Frey, Eric

PY - 2009/2

Y1 - 2009/2

N2 - In order to describe multiclass classification performance, several figures of merit (FOM) have been proposed. Among the earliest and most widely known of these is the three-class Hotelling trace (3-HT). The goal of this paper is to present theoretical and empirical data demonstrating the failure of 3-HT as a measure of three-class task performance. To help do this, we contrast it with a newly proposed three-class FOM, the volume under the three-class receiver operating characteristic (ROC) surface (VUS). The VUS is obtained from a decision-theory-based three-class ROC analysis method which has been proved to extend the decision theoretic, linear discriminant analysis (LDA), and psychophysical foundations of binary ROC analysis to a three-class paradigm. We demonstrate empirically that the VUS and 3-HT do not have a monotonic relationship in general when describing three-class task performance. Numerical experiments demonstrated that the VUS provided reasonable results, while the 3-HT failed to distinguish the case where all objects could be perfectly classified from the case where only one pair of the classes could be perfectly classified. We have provided theoretical explanations of this failure of 3-HT. The significance of this work goes beyond merely demonstrating the problems of the 3-HT; it demonstrates that a FOM that is mathematically correct and has a strong theoretical basis can provide results that violate a common-sense understanding of three-class task performance. This fact raises the question of "how to evaluate a classification performance evaluation method?" We believe the answer to this question lies in the theoretical foundations of binary ROC analysis. We have thus contrasted the two FOMs in terms of three fundamental theories underlying binary ROC analysis: decision theory, binary linear discriminant analysis, and the equivalence of two psychophysical classification procedures. These theoretical investigations demonstrated the importance of extending and unifying all the fundamental theories of binary classification in the development of a three-class FOM; violating one of these fundamental binary classification theories may, as it did for the L-HT, provide predictions of three-class task performance that do not agree with a common-sense understanding of three-class task performance.

KW - L-class Hotelling trace

KW - L-class linear discriminant analysis

KW - Receiver operating characteristic (ROC) analysis

KW - Three-class classification

UR - http://www.scopus.com/inward/record.url?scp=59449095907&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=59449095907&partnerID=8YFLogxK

U2 - 10.1109/TMI.2008.928919

DO - 10.1109/TMI.2008.928919

M3 - Article

VL - 28

SP - 185

EP - 193

JO - IEEE Transactions on Medical Imaging

JF - IEEE Transactions on Medical Imaging

SN - 0278-0062

IS - 2

M1 - 4580126

ER -