Machine-Learning Algorithms to Code Public Health Spending Accounts

Eoghan S. Brady, Jonathon P. Leider, Beth Resnick, Yira Natalia Alfonso, David M. Bishai

Research output: Contribution to journal › Article

Abstract

OBJECTIVES: Government public health expenditure data sets require time- and labor-intensive manipulation to summarize results that public health policy makers can use. Our objective was to compare the performances of machine-learning algorithms with manual classification of public health expenditures to determine if machines could provide a faster, cheaper alternative to manual classification.

METHODS: We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147 280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms.
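As a rough illustration of the recall and precision measures named in the methods, the following minimal Python sketch scores one class against the manual labels. The function name `precision_recall`, the sample labels, and the choice to score a single class are assumptions made for illustration; the abstract does not state how the authors computed or averaged these rates.

```python
def precision_recall(manual, predicted, positive="public health"):
    """Return (precision, recall) for one class, treating manual labels as truth."""
    pairs = list(zip(manual, predicted))
    # True positives: the algorithm and the manual coder both chose the class.
    tp = sum(m == positive and p == positive for m, p in pairs)
    # False positives: the algorithm chose the class, the manual coder did not.
    fp = sum(m != positive and p == positive for m, p in pairs)
    # False negatives: the manual coder chose the class, the algorithm did not.
    fn = sum(m == positive and p != positive for m, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels for four summary expenditure records.
manual = ["public health", "not public health", "public health", "maybe public health"]
predicted = ["public health", "public health", "not public health", "maybe public health"]

precision, recall = precision_recall(manual, predicted)
print(precision, recall)
```

Here recall answers "of the records the manual coders marked as public health, what share did the algorithm find?", while precision answers "of the records the algorithm marked as public health, what share did the manual coders agree with?".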

RESULTS: Compared with manual classification, the machine-learning random forests algorithm produced 84% recall and 91% precision. With algorithm ensembling, we achieved our target criterion of 90% recall by using a consensus ensemble of ≥6 algorithms while still retaining 93% coverage, leaving only 7% of the summary expenditure records unclassified.
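One way to read the consensus rule in the results is that a record is classified only when at least 6 of the 9 algorithms assign it the same label, and coverage is the share of records that reach that threshold. The Python sketch below follows that reading; the helper names, the toy votes, and the exact agreement rule are assumptions, not the authors' implementation.

```python
from collections import Counter


def consensus_label(votes, min_agree=6):
    """Return the most common label if at least min_agree votes share it, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


def coverage(all_votes, min_agree=6):
    """Fraction of records that receive a consensus label."""
    labelled = [consensus_label(votes, min_agree) for votes in all_votes]
    return sum(label is not None for label in labelled) / len(labelled)


# Hypothetical votes from 9 algorithms on four summary expenditure records.
records = [
    ["public health"] * 7 + ["not public health"] * 2,    # 7 agree: classified
    ["public health"] * 5 + ["not public health"] * 4,    # 5 agree: left unclassified
    ["not public health"] * 9,                            # 9 agree: classified
    ["maybe public health"] * 6 + ["public health"] * 3,  # 6 agree: classified
]

print([consensus_label(votes) for votes in records])
print(coverage(records))
```

Raising `min_agree` trades coverage for agreement: fewer records are classified automatically, and the remainder would need manual review.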

CONCLUSIONS: Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help parse public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence-based public health resource allocation.

Original language: English (US)
Pages (from-to): 350-356
Number of pages: 7
Journal: Public health reports (Washington, D.C. : 1974)
Volume: 132
Issue number: 3
DOI: 10.1177/0033354917700356
State: Published - May 1 2017

Fingerprint

Public Health
Health Expenditures
United States Public Health Service
Machine Learning
Resource Allocation
Health Resources
Censuses
Public Policy
Health Policy
Administrative Personnel
Organizations
Costs and Cost Analysis
Health

Keywords

  • health finance
  • machine learning
  • public health

ASJC Scopus subject areas

  • Public Health, Environmental and Occupational Health

Cite this

Machine-Learning Algorithms to Code Public Health Spending Accounts. / Brady, Eoghan S.; Leider, Jonathon P.; Resnick, Beth; Alfonso, Yira Natalia; Bishai, David M.

In: Public health reports (Washington, D.C. : 1974), Vol. 132, No. 3, 01.05.2017, p. 350-356.

@article{a3878203370044f2abc13f96936f0f0d,
title = "Machine-Learning Algorithms to Code Public Health Spending Accounts",
abstract = "OBJECTIVES: Government public health expenditure data sets require time- and labor-intensive manipulation to summarize results that public health policy makers can use. Our objective was to compare the performances of machine-learning algorithms with manual classification of public health expenditures to determine if machines could provide a faster, cheaper alternative to manual classification. METHODS: We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147 280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms. RESULTS: Compared with manual classification, the machine-learning random forests algorithm produced 84{\%} recall and 91{\%} precision. With algorithm ensembling, we achieved our target criterion of 90{\%} recall by using a consensus ensemble of ≥6 algorithms while still retaining 93{\%} coverage, leaving only 7{\%} of the summary expenditure records unclassified. CONCLUSIONS: Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help parse public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence-based public health resource allocation.",
keywords = "health finance, machine learning, public health",
author = "Brady, {Eoghan S.} and Leider, {Jonathon P.} and Resnick, Beth and Alfonso, {Yira Natalia} and Bishai, {David M.}",
year = "2017",
month = "5",
day = "1",
doi = "10.1177/0033354917700356",
language = "English (US)",
volume = "132",
pages = "350--356",
journal = "Public Health Reports",
issn = "0033-3549",
publisher = "Association of Schools of Public Health",
number = "3",

}

TY - JOUR

T1 - Machine-Learning Algorithms to Code Public Health Spending Accounts

AU - Brady, Eoghan S.

AU - Leider, Jonathon P.

AU - Resnick, Beth

AU - Alfonso, Yira Natalia

AU - Bishai, David M.

PY - 2017/5/1

Y1 - 2017/5/1

N2 - OBJECTIVES: Government public health expenditure data sets require time- and labor-intensive manipulation to summarize results that public health policy makers can use. Our objective was to compare the performances of machine-learning algorithms with manual classification of public health expenditures to determine if machines could provide a faster, cheaper alternative to manual classification. METHODS: We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147 280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms. RESULTS: Compared with manual classification, the machine-learning random forests algorithm produced 84% recall and 91% precision. With algorithm ensembling, we achieved our target criterion of 90% recall by using a consensus ensemble of ≥6 algorithms while still retaining 93% coverage, leaving only 7% of the summary expenditure records unclassified. CONCLUSIONS: Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help parse public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence-based public health resource allocation.

KW - health finance

KW - machine learning

KW - public health

UR - http://www.scopus.com/inward/record.url?scp=85021859548&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85021859548&partnerID=8YFLogxK

U2 - 10.1177/0033354917700356

DO - 10.1177/0033354917700356

M3 - Article

C2 - 28363034

AN - SCOPUS:85021859548

VL - 132

SP - 350

EP - 356

JO - Public Health Reports

JF - Public Health Reports

SN - 0033-3549

IS - 3

ER -