Probabilistic hazard assessment for skin sensitization potency by dose-response modeling using feature elimination instead of quantitative structure-activity relationships

Thomas Luechtefeld; Alexandra Maertens; James M. Mckim; Thomas Hartung; Andre Kleensang; Vanessa Sá-Rocha

doi:10.1002/jat.3172

Bloomberg School of Public Health

Research output: Contribution to journal › Article › peer-review

24 Scopus citations

Abstract

Supervised learning methods promise to improve integrated testing strategies (ITS), but must be adjusted to handle high dimensionality and dose-response data. ITS approaches are currently fueled by the increasing mechanistic understanding of adverse outcome pathways (AOP) and the development of tests reflecting these mechanisms. Simple approaches to combining skin sensitization data sets, such as weight of evidence, fail because of information redundancy and high dimensionality. The problem is amplified further when the potency (dose/response) of hazards is to be estimated. Skin sensitization currently serves as the poster child for AOP and ITS development, as legislative pressures combined with a very good mechanistic understanding of contact dermatitis have led to test development and relatively large, high-quality data sets. We curated such a data set and applied a recursive variable selection algorithm to evaluate the information available through in silico, in chemico and in vitro assays. Chemical similarity alone could not cluster chemicals by potency, and in vitro models consistently ranked high in recursive feature elimination. This allows the number of tests included in an ITS to be reduced. Next, we analyzed the data with a hidden Markov model that takes advantage of an intrinsic inter-relationship among the local lymph node assay classes, i.e. the monotonic relationship between local lymph node assay response and dose. The dose-informed random forest/hidden Markov model was superior to the dose-naive random forest model on all data sets. Although the improvement in balanced accuracy may seem small, this obscures the actual improvement in misclassifications, as the dose-informed hidden Markov model strongly reduced "false-negatives" (i.e. extreme sensitizers classified as non-sensitizers) on all data sets.
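The abstract's closing point, that balanced accuracy can hide a reduction in severe misclassifications, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the five ordered class labels, the toy predictions, the `min_gap` threshold, and the function names are hypothetical and are not taken from the paper's data or code.

```python
from collections import defaultdict

# Hypothetical ordered potency classes, weakest to strongest (assumed labels).
CLASSES = ["non", "weak", "moderate", "strong", "extreme"]


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def severe_false_negatives(y_true, y_pred, min_gap=3):
    """Count predictions that fall at least `min_gap` classes below the truth."""
    rank = {c: i for i, c in enumerate(CLASSES)}
    return sum(1 for t, p in zip(y_true, y_pred) if rank[t] - rank[p] >= min_gap)


# Toy labels: two chemicals per class (invented, not from the paper).
y_true = ["non", "non", "weak", "weak", "moderate", "moderate",
          "strong", "strong", "extreme", "extreme"]

# Toy "dose-naive" predictions: two potent sensitizers are called non-sensitizers.
naive = ["non", "non", "weak", "weak", "moderate", "moderate",
         "strong", "non", "extreme", "non"]

# Toy "dose-informed" predictions: the same two chemicals are still wrong,
# but only by one class.
informed = ["non", "non", "weak", "weak", "moderate", "moderate",
            "strong", "moderate", "extreme", "strong"]

for name, pred in [("naive", naive), ("informed", informed)]:
    print(name, balanced_accuracy(y_true, pred), severe_false_negatives(y_true, pred))
```

In this toy case both prediction sets have the same balanced accuracy, yet only the first contains severe under-predictions, which is why a count of large downward errors is worth reporting alongside balanced accuracy for ordered potency classes.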

Original language	English (US)
Pages (from-to)	1361-1371
Number of pages	11
Journal	Journal of Applied Toxicology
Volume	35
Issue number	11
DOIs	https://doi.org/10.1002/jat.3172
State	Published - Nov 1 2015

Keywords

Feature selection
Hidden Markov model
In vitro
Integrated testing strategy
LLNA
Machine learning
QSAR
Skin sensitization

ASJC Scopus subject areas

Toxicology

Cite this

Probabilistic hazard assessment for skin sensitization potency by dose-response modeling using feature elimination instead of quantitative structure-activity relationships. / Luechtefeld, Thomas; Maertens, Alexandra; Mckim, James M. et al.
In: Journal of Applied Toxicology, Vol. 35, No. 11, 01.11.2015, p. 1361-1371.

@article{7513f833e51849999be8288bf6c259a7,

title = "Probabilistic hazard assessment for skin sensitization potency by dose-response modeling using feature elimination instead of quantitative structure-activity relationships",

abstract = "Supervised learning methods promise to improve integrated testing strategies (ITS), but must be adjusted to handle high dimensionality and dose-response data. ITS approaches are currently fueled by the increasing mechanistic understanding of adverse outcome pathways (AOP) and the development of tests reflecting these mechanisms. Simple approaches to combine skin sensitization data sets, such as weight of evidence, fail due to problems in information redundancy and high dimensionality. The problem is further amplified when potency information (dose/response) of hazards would be estimated. Skin sensitization currently serves as the foster child for AOP and ITS development, as legislative pressures combined with a very good mechanistic understanding of contact dermatitis have led to test development and relatively large high-quality data sets. We curated such a data set and combined a recursive variable selection algorithm to evaluate the information available through in silico, in chemico and in vitro assays. Chemical similarity alone could not cluster chemicals' potency, and in vitro models consistently ranked high in recursive feature elimination. This allows reducing the number of tests included in an ITS. Next, we analyzed with a hidden Markov model that takes advantage of an intrinsic inter-relationship among the local lymph node assay classes, i.e. the monotonous connection between local lymph node assay and dose. The dose-informed random forest/hidden Markov model was superior to the dose-naive random forest model on all data sets. Although balanced accuracy improvement may seem small, this obscures the actual improvement in misclassifications as the dose-informed hidden Markov model strongly reduced {"}false-negatives{"} (i.e. extreme sensitizers as non-sensitizer) on all data sets.",

keywords = "Feature selection, Hidden Markov model, In vitro, Integrated testing strategy, LLNA, Machine learning, QSAR, Skin sensitization",

author = "Thomas Luechtefeld and Alexandra Maertens and Mckim, {James M.} and Thomas Hartung and Andre Kleensang and Vanessa S{\'a}-Rocha",

note = "Publisher Copyright: {\textcopyright} 2015 John Wiley \& Sons, Ltd.",

year = "2015",

month = nov,

day = "1",

doi = "10.1002/jat.3172",

language = "English (US)",

volume = "35",

pages = "1361--1371",

journal = "Journal of Applied Toxicology",

issn = "0260-437X",

publisher = "John Wiley and Sons Ltd",

number = "11",

}

TY - JOUR

T1 - Probabilistic hazard assessment for skin sensitization potency by dose-response modeling using feature elimination instead of quantitative structure-activity relationships

AU - Luechtefeld, Thomas

AU - Maertens, Alexandra

AU - Mckim, James M.

AU - Hartung, Thomas

AU - Kleensang, Andre

AU - Sá-Rocha, Vanessa

PY - 2015/11/1

Y1 - 2015/11/1

N2 - Supervised learning methods promise to improve integrated testing strategies (ITS), but must be adjusted to handle high dimensionality and dose-response data. ITS approaches are currently fueled by the increasing mechanistic understanding of adverse outcome pathways (AOP) and the development of tests reflecting these mechanisms. Simple approaches to combine skin sensitization data sets, such as weight of evidence, fail due to problems in information redundancy and high dimensionality. The problem is further amplified when potency information (dose/response) of hazards would be estimated. Skin sensitization currently serves as the foster child for AOP and ITS development, as legislative pressures combined with a very good mechanistic understanding of contact dermatitis have led to test development and relatively large high-quality data sets. We curated such a data set and combined a recursive variable selection algorithm to evaluate the information available through in silico, in chemico and in vitro assays. Chemical similarity alone could not cluster chemicals' potency, and in vitro models consistently ranked high in recursive feature elimination. This allows reducing the number of tests included in an ITS. Next, we analyzed with a hidden Markov model that takes advantage of an intrinsic inter-relationship among the local lymph node assay classes, i.e. the monotonous connection between local lymph node assay and dose. The dose-informed random forest/hidden Markov model was superior to the dose-naive random forest model on all data sets. Although balanced accuracy improvement may seem small, this obscures the actual improvement in misclassifications as the dose-informed hidden Markov model strongly reduced "false-negatives" (i.e. extreme sensitizers as non-sensitizer) on all data sets.

AB - Supervised learning methods promise to improve integrated testing strategies (ITS), but must be adjusted to handle high dimensionality and dose-response data. ITS approaches are currently fueled by the increasing mechanistic understanding of adverse outcome pathways (AOP) and the development of tests reflecting these mechanisms. Simple approaches to combine skin sensitization data sets, such as weight of evidence, fail due to problems in information redundancy and high dimensionality. The problem is further amplified when potency information (dose/response) of hazards would be estimated. Skin sensitization currently serves as the foster child for AOP and ITS development, as legislative pressures combined with a very good mechanistic understanding of contact dermatitis have led to test development and relatively large high-quality data sets. We curated such a data set and combined a recursive variable selection algorithm to evaluate the information available through in silico, in chemico and in vitro assays. Chemical similarity alone could not cluster chemicals' potency, and in vitro models consistently ranked high in recursive feature elimination. This allows reducing the number of tests included in an ITS. Next, we analyzed with a hidden Markov model that takes advantage of an intrinsic inter-relationship among the local lymph node assay classes, i.e. the monotonous connection between local lymph node assay and dose. The dose-informed random forest/hidden Markov model was superior to the dose-naive random forest model on all data sets. Although balanced accuracy improvement may seem small, this obscures the actual improvement in misclassifications as the dose-informed hidden Markov model strongly reduced "false-negatives" (i.e. extreme sensitizers as non-sensitizer) on all data sets.

KW - Feature selection

KW - Hidden Markov model

KW - In vitro

KW - Integrated testing strategy

KW - LLNA

KW - Machine learning

KW - QSAR

KW - Skin sensitization

UR - http://www.scopus.com/inward/record.url?scp=84942502605&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84942502605&partnerID=8YFLogxK

U2 - 10.1002/jat.3172

DO - 10.1002/jat.3172

M3 - Article

C2 - 26046447

AN - SCOPUS:84942502605

SN - 0260-437X

VL - 35

SP - 1361

EP - 1371

JO - Journal of Applied Toxicology

JF - Journal of Applied Toxicology

IS - 11

ER -
