Loss function based ranking in two-stage, hierarchical models

Rongheng Lin; Thomas A. Louis; Susan M. Paddock; Greg Ridgeway

doi:10.1214/06-BA130

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

Performance evaluations of health services providers are burgeoning. Similarly, analyzing spatially related health information, ranking teachers and schools, and identification of differentially expressed genes are increasing in prevalence and importance. Goals include valid and efficient ranking of units for profiling and league tables, identification of excellent and poor performers, the most differentially expressed genes, and determining "exceedances" (how many and which unit-specific true parameters exceed a threshold). These data and inferential goals require a hierarchical, Bayesian model that accounts for nesting relations and identifies both population values and random effects for unit-specific parameters. Furthermore, the Bayesian approach coupled with optimizing a loss function provides a framework for computing non-standard inferences such as ranks and histograms. Estimated ranks that minimize Squared Error Loss (SEL) between the true and estimated ranks have been investigated. The posterior mean ranks minimize SEL and are "general purpose," relevant to a broad spectrum of ranking goals. However, other loss functions and optimizing ranks that are tuned to application-specific goals require identification and evaluation. For example, when the goal is to identify the relatively good (e.g., in the upper 10%) or relatively poor performers, a loss function that penalizes classification errors produces estimates that minimize the error rate. We construct loss functions that address this and other goals, developing a unified framework that facilitates generating candidate estimates, comparing approaches and producing data-analytic performance summaries. We compare performance for a fully parametric, hierarchical model with Gaussian sampling distribution under Gaussian and a mixture of Gaussians prior distributions. We illustrate approaches via analysis of standardized mortality ratio data from the United States Renal Data System.
Results show that SEL-optimal ranks perform well over a broad class of loss functions but can be improved upon when classifying units above or below a percentile cut-point. Importantly, even optimal rank estimates can perform poorly in many real-world settings; therefore, data-analytic performance summaries should always be reported.
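
As an illustration of the abstract's statement that posterior mean ranks minimize SEL, the following is a minimal sketch, not the authors' code. It assumes posterior draws of the unit-specific parameters are already available as an array (for example from an MCMC fit of the hierarchical model). The function names, the percentile convention rank / (K + 1), and the use of ranked exceedance probabilities for classification are assumptions made here for illustration.

```python
import numpy as np


def ranks_per_draw(draws):
    """Rank units within each posterior draw (1 = smallest parameter value).

    draws: array of shape (n_draws, n_units). Ties are assumed absent,
    which holds almost surely for continuous posterior samples.
    """
    # argsort of argsort converts values to 0-based ranks along each row.
    return draws.argsort(axis=1).argsort(axis=1) + 1


def posterior_mean_ranks(draws):
    """Average each unit's rank over the draws.

    The posterior mean of the rank is the minimizer of expected squared
    error between true and estimated ranks.
    """
    return ranks_per_draw(draws).mean(axis=0)


def integer_ranks(scores):
    """Convert real-valued scores to integer ranks 1..K by ranking them."""
    scores = np.asarray(scores)
    return scores.argsort().argsort() + 1


def prob_above_percentile(draws, gamma):
    """Posterior probability that each unit's percentile exceeds gamma.

    Assumed convention (illustrative): percentile = rank / (K + 1).
    """
    n_units = draws.shape[1]
    percentiles = ranks_per_draw(draws) / (n_units + 1)
    return (percentiles > gamma).mean(axis=0)


if __name__ == "__main__":
    # Three hypothetical posterior draws for three units.
    draws = np.array([[0.1, 0.5, 0.3],
                      [0.2, 0.4, 0.6],
                      [0.0, 0.9, 0.5]])
    mean_ranks = posterior_mean_ranks(draws)
    print("posterior mean ranks:", mean_ranks)
    print("integer ranks:", integer_ranks(mean_ranks))
    print("Pr(percentile > 0.7):", prob_above_percentile(draws, 0.7))
```

When the goal is classification above a percentile cut-point rather than overall ranking, one plausible approach under these assumptions is to rank units by `prob_above_percentile` instead of by posterior mean rank. The two orderings can differ when posterior uncertainty varies widely across units.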

Original language	English (US)
Pages (from-to)	915-946
Number of pages	32
Journal	Bayesian Analysis
Volume	1
Issue number	4
DOIs	https://doi.org/10.1214/06-BA130
State	Published - 2006
Externally published	Yes

Keywords

Bayesian models
Decision theory
Operating characteristic
Percentiling

ASJC Scopus subject areas

Statistics and Probability
Applied Mathematics

Cite this

@article{07bfb710f8684c1eaa3ce93b8074c5d3,

title = "Loss function based ranking in two-stage, hierarchical models",

abstract = "Performance evaluations of health services providers are burgeoning. Similarly, analyzing spatially related health information, ranking teachers and schools, and identification of differentially expressed genes are increasing in prevalence and importance. Goals include valid and efficient ranking of units for profiling and league tables, identification of excellent and poor performers, the most differentially expressed genes, and determining {"}exceedances{"} (how many and which unit-specific true parameters exceed a threshold). These data and inferential goals require a hierarchical, Bayesian model that accounts for nesting relations and identifies both population values and random effects for unit-specific parameters. Furthermore, the Bayesian approach coupled with optimizing a loss function provides a framework for computing non-standard inferences such as ranks and histograms. Estimated ranks that minimize Squared Error Loss (SEL) between the true and estimated ranks have been investigated. The posterior mean ranks minimize SEL and are {"}general purpose,{"} relevant to a broad spectrum of ranking goals. However, other loss functions and optimizing ranks that are tuned to application-specific goals require identification and evaluation. For example, when the goal is to identify the relatively good (e.g., in the upper 10%) or relatively poor performers, a loss function that penalizes classification errors produces estimates that minimize the error rate. We construct loss functions that address this and other goals, developing a unified framework that facilitates generating candidate estimates, comparing approaches and producing data-analytic performance summaries. We compare performance for a fully parametric, hierarchical model with Gaussian sampling distribution under Gaussian and a mixture of Gaussians prior distributions. We illustrate approaches via analysis of standardized mortality ratio data from the United States Renal Data System.
Results show that SEL-optimal ranks perform well over a broad class of loss functions but can be improved upon when classifying units above or below a percentile cut-point. Importantly, even optimal rank estimates can perform poorly in many real-world settings; therefore, data-analytic performance summaries should always be reported.",

keywords = "Bayesian models, Decision theory, Operating characteristic, Percentiling",

author = "Rongheng Lin and Louis, {Thomas A.} and Paddock, {Susan M.} and Greg Ridgeway",

year = "2006",

doi = "10.1214/06-BA130",

language = "English (US)",

volume = "1",

pages = "915--946",

journal = "Bayesian Analysis",

issn = "1936-0975",

publisher = "Carnegie Mellon University",

number = "4",

}

TY - JOUR

T1 - Loss function based ranking in two-stage, hierarchical models

AU - Lin, Rongheng

AU - Louis, Thomas A.

AU - Paddock, Susan M.

AU - Ridgeway, Greg

PY - 2006

Y1 - 2006

N2 - Performance evaluations of health services providers are burgeoning. Similarly, analyzing spatially related health information, ranking teachers and schools, and identification of differentially expressed genes are increasing in prevalence and importance. Goals include valid and efficient ranking of units for profiling and league tables, identification of excellent and poor performers, the most differentially expressed genes, and determining "exceedances" (how many and which unit-specific true parameters exceed a threshold). These data and inferential goals require a hierarchical, Bayesian model that accounts for nesting relations and identifies both population values and random effects for unit-specific parameters. Furthermore, the Bayesian approach coupled with optimizing a loss function provides a framework for computing non-standard inferences such as ranks and histograms. Estimated ranks that minimize Squared Error Loss (SEL) between the true and estimated ranks have been investigated. The posterior mean ranks minimize SEL and are "general purpose," relevant to a broad spectrum of ranking goals. However, other loss functions and optimizing ranks that are tuned to application-specific goals require identification and evaluation. For example, when the goal is to identify the relatively good (e.g., in the upper 10%) or relatively poor performers, a loss function that penalizes classification errors produces estimates that minimize the error rate. We construct loss functions that address this and other goals, developing a unified framework that facilitates generating candidate estimates, comparing approaches and producing data-analytic performance summaries. We compare performance for a fully parametric, hierarchical model with Gaussian sampling distribution under Gaussian and a mixture of Gaussians prior distributions. We illustrate approaches via analysis of standardized mortality ratio data from the United States Renal Data System.
Results show that SEL-optimal ranks perform well over a broad class of loss functions but can be improved upon when classifying units above or below a percentile cut-point. Importantly, even optimal rank estimates can perform poorly in many real-world settings; therefore, data-analytic performance summaries should always be reported.

AB - Performance evaluations of health services providers are burgeoning. Similarly, analyzing spatially related health information, ranking teachers and schools, and identification of differentially expressed genes are increasing in prevalence and importance. Goals include valid and efficient ranking of units for profiling and league tables, identification of excellent and poor performers, the most differentially expressed genes, and determining "exceedances" (how many and which unit-specific true parameters exceed a threshold). These data and inferential goals require a hierarchical, Bayesian model that accounts for nesting relations and identifies both population values and random effects for unit-specific parameters. Furthermore, the Bayesian approach coupled with optimizing a loss function provides a framework for computing non-standard inferences such as ranks and histograms. Estimated ranks that minimize Squared Error Loss (SEL) between the true and estimated ranks have been investigated. The posterior mean ranks minimize SEL and are "general purpose," relevant to a broad spectrum of ranking goals. However, other loss functions and optimizing ranks that are tuned to application-specific goals require identification and evaluation. For example, when the goal is to identify the relatively good (e.g., in the upper 10%) or relatively poor performers, a loss function that penalizes classification errors produces estimates that minimize the error rate. We construct loss functions that address this and other goals, developing a unified framework that facilitates generating candidate estimates, comparing approaches and producing data-analytic performance summaries. We compare performance for a fully parametric, hierarchical model with Gaussian sampling distribution under Gaussian and a mixture of Gaussians prior distributions. We illustrate approaches via analysis of standardized mortality ratio data from the United States Renal Data System.
Results show that SEL-optimal ranks perform well over a broad class of loss functions but can be improved upon when classifying units above or below a percentile cut-point. Importantly, even optimal rank estimates can perform poorly in many real-world settings; therefore, data-analytic performance summaries should always be reported.

KW - Bayesian models

KW - Decision theory

KW - Operating characteristic

KW - Percentiling

UR - http://www.scopus.com/inward/record.url?scp=36849036984&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=36849036984&partnerID=8YFLogxK

U2 - 10.1214/06-BA130

DO - 10.1214/06-BA130

M3 - Article

C2 - 20607112

AN - SCOPUS:36849036984

SN - 1936-0975

VL - 1

SP - 915

EP - 946

JO - Bayesian Analysis

JF - Bayesian Analysis

IS - 4

ER -
