Machine-learning algorithms to code public health spending accounts

Eoghan S. Brady, Jonathon P. Leider, Beth A. Resnick, Y. Natalia Alfonso, David Bishai

Research output: Contribution to journal (Article, peer-reviewed)



Objectives: Government public health expenditure data sets require time- and labor-intensive manipulation before they yield summaries that public health policy makers can use. Our objective was to compare the performance of machine-learning algorithms with that of manual classification of public health expenditures, to determine whether machines could provide a faster, cheaper alternative to manual classification.

Methods: We used machine-learning algorithms to replicate the process of manually classifying state public health expenditures, using the standardized public health spending categories from the Foundational Public Health Services model and a large data set from the US Census Bureau. We obtained a data set of 1.9 million individual expenditure items from 2000 to 2013. We collapsed these data into 147,280 summary expenditure records, and we followed a standardized method of manually classifying each expenditure record as public health, maybe public health, or not public health. We then trained 9 machine-learning algorithms to replicate the manual process. We calculated recall, precision, and coverage rates to measure the performance of individual and ensembled algorithms.

Results: Compared with manual classification, the machine-learning random forests algorithm produced 84% recall and 91% precision. With algorithm ensembling, we achieved our target criterion of 90% recall by using a consensus ensemble of ≥6 algorithms while still retaining 93% coverage, leaving only 7% of the summary expenditure records unclassified.

Conclusions: Machine learning can be a time- and cost-saving tool for estimating public health spending in the United States. It can be used with standardized public health spending categories based on the Foundational Public Health Services model to help separate public health expenditure information from other types of health-related spending, provide data that are more comparable across public health organizations, and evaluate the impact of evidence-based public health resource allocation.
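The consensus-ensembling and scoring steps summarized above can be illustrated with a short Python sketch. It is not the authors' code and makes several assumptions the abstract does not spell out: that a "consensus ensemble of ≥k algorithms" means at least k of the 9 classifiers must vote for the same label before a record is classified; that coverage is the share of records that receive a label; and that recall and precision are computed for the "public health" class over the classified records only. The function names `consensus_label` and `evaluate`, and the toy votes, are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

# Manual coding labels named in the abstract.
PH = "public health"
MAYBE = "maybe public health"
NOT_PH = "not public health"


def consensus_label(votes: Sequence[str], min_agree: int) -> Optional[str]:
    """Return the label at least `min_agree` classifiers voted for, else None.

    Assumption (not from the paper): a record is classified only when at least
    `min_agree` of the individual algorithms agree on the same label.
    """
    if 2 * min_agree <= len(votes):
        # Below a strict majority, two labels could both reach the threshold.
        raise ValueError("min_agree must exceed half the number of votes")
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


def evaluate(manual: Sequence[str], predicted: Sequence[Optional[str]],
             positive: str = PH) -> tuple[float, float, float]:
    """Recall and precision for `positive` on classified records, plus coverage.

    Assumption: records the ensemble leaves unclassified (None) are excluded
    from recall and precision and count only against coverage.
    """
    covered = [(m, p) for m, p in zip(manual, predicted) if p is not None]
    coverage = len(covered) / len(manual)
    tp = sum(1 for m, p in covered if m == positive and p == positive)
    fn = sum(1 for m, p in covered if m == positive and p != positive)
    fp = sum(1 for m, p in covered if m != positive and p == positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision, coverage


# Toy example: 4 summary records, each with votes from 9 hypothetical classifiers.
manual = [PH, NOT_PH, PH, MAYBE]
votes = [
    [PH] * 8 + [NOT_PH],         # 8 of 9 agree -> PH
    [NOT_PH] * 7 + [PH, MAYBE],  # 7 of 9 agree -> NOT_PH
    [PH] * 5 + [NOT_PH] * 4,     # only 5 agree -> unclassified
    [PH] * 6 + [MAYBE] * 3,      # 6 agree -> PH, a false positive
]
predicted = [consensus_label(v, min_agree=6) for v in votes]
recall, precision, coverage = evaluate(manual, predicted)
print(predicted)
print(recall, precision, coverage)  # 1.0 0.5 0.75
```

Raising `min_agree` can only shrink the set of records that reach consensus, so coverage falls as the agreement threshold rises; the abstract's pairing of 90% recall with 93% coverage reflects a choice along this kind of trade-off.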

Original language: English (US)
Pages (from-to): 350-356
Number of pages: 7
Journal: Public Health Reports
Issue number: 3
State: Published, May 1 2017


Keywords

  • Health finance
  • Machine learning
  • Public health

ASJC Scopus subject areas

  • Public Health, Environmental and Occupational Health

