### Abstract

The methodological development of this paper is motivated by a common problem in econometrics where we are interested in estimating the difference in the average expenditures between two populations, say with and without a disease, as a function of the covariates. For example, let Y_{1} and Y_{2} be two nonnegative random variables denoting the health expenditures for cases and controls. Smooth Quantile Ratio Estimation (SQUARE) is a novel approach for estimating Δ = E[Y_{1}] - E[Y_{2}] by smoothing across percentiles the log-transformed ratio of the two quantile functions. Dominici et al. (2005) have shown that SQUARE defines a large class of estimators of Δ, is more efficient than common parametric and nonparametric estimators of Δ, and is consistent and asymptotically normal. However, in applications it is often desirable to estimate Δ(x) = E[Y_{1}|x] - E[Y_{2}|x], that is, the difference in means as a function of x. In this paper we extend SQUARE to a regression model and we introduce a two-part regression SQUARE for estimating Δ(x) as a function of x. We use the first part of the model to estimate the probability of incurring any costs and the second part of the model to estimate the mean difference in health expenditures, given that a nonzero cost is observed. In the second part of the model, we apply the basic definition of SQUARE for positive costs to compare expenditures for the cases and controls having 'similar' covariate profiles. We determine strata of cases and controls with 'similar' covariate profiles using propensity score matching. We then apply two-part regression SQUARE to the 1987 National Medicare Expenditure Survey to estimate the difference Δ(x) between persons suffering from smoking-attributable diseases and persons without these diseases as a function of the propensity of getting the disease. Using a simulation study, we compare frequentist properties of two-part regression SQUARE with maximum likelihood estimators for the log-transformed expenditures.
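The two-part idea in the abstract can be illustrated with a minimal Python sketch for a single stratum of cases and controls. It is not the authors' implementation and rests on several assumptions: the log quantile ratio is smoothed with a low-degree polynomial (`np.polyfit`) rather than regression splines, the probability of any cost is estimated by the raw sample proportion, and the function names `square_delta` and `two_part_delta` and the parameters `degree` and `n_grid` are hypothetical. The mean difference for positive costs is computed from the identity Δ = ∫ Q_2(p) [exp(s(p)) - 1] dp, where s(p) = log Q_1(p) - log Q_2(p).

```python
import numpy as np


def square_delta(y1, y2, degree=2, n_grid=200):
    """SQUARE-style sketch of E[Y1] - E[Y2] for strictly positive samples.

    Assumption: a polynomial in p stands in for the spline smoother.
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    if np.any(y1 <= 0) or np.any(y2 <= 0):
        raise ValueError("samples must be strictly positive")

    # Common grid of percentiles at which both quantile functions are evaluated.
    p = (np.arange(n_grid) + 0.5) / n_grid
    q1 = np.quantile(y1, p)
    q2 = np.quantile(y2, p)

    # Smooth the log-transformed ratio of the quantile functions across p.
    log_ratio = np.log(q1) - np.log(q2)
    coef = np.polyfit(p, log_ratio, deg=degree)
    smooth = np.polyval(coef, p)

    # Approximate the integral of Q2(p) * (exp(s(p)) - 1) over (0, 1).
    return float(np.mean(q2 * (np.exp(smooth) - 1.0)))


def two_part_delta(y1, y2, **kwargs):
    """Two-part sketch: P(Y > 0) times the mean given Y > 0, cases minus controls."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    pos1, pos2 = y1[y1 > 0], y2[y2 > 0]

    # Part 1: probability of incurring any cost (sample proportions here).
    pi1 = pos1.size / y1.size
    pi2 = pos2.size / y2.size

    # Part 2: mean difference among positive costs via the SQUARE-style sketch.
    mu2 = pos2.mean()
    delta_pos = square_delta(pos1, pos2, **kwargs)

    return pi1 * (mu2 + delta_pos) - pi2 * mu2
```

In the regression setting described above, a function like `two_part_delta` would be applied separately within each stratum of cases and controls with similar propensity scores, giving one estimate of Δ(x) per stratum.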

Original language | English (US) |
---|---|
Pages (from-to) | 505-519 |
Number of pages | 15 |
Journal | Biostatistics |
Volume | 6 |
Issue number | 4 |
DOIs | https://doi.org/10.1093/biostatistics/kxi031 |
State | Published - Oct 2005 |

### Keywords

- Comparing means
- Health expenditures
- Log-normal
- Propensity scores
- Q-Q plots
- Quantile regression
- Regression splines
- Skewed distributions
- Smoking

### ASJC Scopus subject areas

- Medicine(all)
- Statistics and Probability
- Statistics, Probability and Uncertainty

### Cite this

**Smooth quantile ratio estimation with regression: Estimating medical expenditures for smoking-attributable diseases.** / Dominici, Francesca; Zeger, Scott.

Research output: Contribution to journal › Article

*Biostatistics*, vol. 6, no. 4, pp. 505-519. https://doi.org/10.1093/biostatistics/kxi031

TY - JOUR

T1 - Smooth quantile ratio estimation with regression

T2 - Estimating medical expenditures for smoking-attributable diseases

AU - Dominici, Francesca

AU - Zeger, Scott

PY - 2005/10

Y1 - 2005/10

N2 - The methodological development of this paper is motivated by a common problem in econometrics where we are interested in estimating the difference in the average expenditures between two populations, say with and without a disease, as a function of the covariates. For example, let Y1 and Y2 be two nonnegative random variables denoting the health expenditures for cases and controls. Smooth Quantile Ratio Estimation (SQUARE) is a novel approach for estimating Δ = E[Y1] - E[Y2] by smoothing across percentiles the log-transformed ratio of the two quantile functions. Dominici et al. (2005) have shown that SQUARE defines a large class of estimators of Δ, is more efficient than common parametric and nonparametric estimators of Δ, and is consistent and asymptotically normal. However, in applications it is often desirable to estimate Δ(x) = E[Y1|x] - E[Y2|x], that is, the difference in means as a function of x. In this paper we extend SQUARE to a regression model and we introduce a two-part regression SQUARE for estimating Δ(x) as a function of x. We use the first part of the model to estimate the probability of incurring any costs and the second part of the model to estimate the mean difference in health expenditures, given that a nonzero cost is observed. In the second part of the model, we apply the basic definition of SQUARE for positive costs to compare expenditures for the cases and controls having 'similar' covariate profiles. We determine strata of cases and controls with 'similar' covariate profiles using propensity score matching. We then apply two-part regression SQUARE to the 1987 National Medicare Expenditure Survey to estimate the difference Δ(x) between persons suffering from smoking-attributable diseases and persons without these diseases as a function of the propensity of getting the disease. Using a simulation study, we compare frequentist properties of two-part regression SQUARE with maximum likelihood estimators for the log-transformed expenditures.

KW - Comparing means

KW - Health expenditures

KW - Log-normal

KW - Propensity scores

KW - Q-Q plots

KW - Quantile regression

KW - Regression splines

KW - Skewed distributions

KW - Smoking

UR - http://www.scopus.com/inward/record.url?scp=26644458846&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=26644458846&partnerID=8YFLogxK

U2 - 10.1093/biostatistics/kxi031

DO - 10.1093/biostatistics/kxi031

M3 - Article

C2 - 15872022

AN - SCOPUS:26644458846

VL - 6

SP - 505

EP - 519

JO - Biostatistics

JF - Biostatistics

SN - 1465-4644

IS - 4

ER -