TY - JOUR
T1 - Machine learning of toxicological big data enables read-across structure activity relationships (RASAR) outperforming animal test reproducibility
AU - Luechtefeld, Thomas
AU - Marsh, Dan
AU - Rowlands, Craig
AU - Hartung, Thomas
N1 - Funding Information:
Thomas Luechtefeld was supported by an NIEHS training grant (T32 ES007141). This work was supported by the EU-ToxRisk project (An Integrated European “Flagship” Program Driving Mechanism-Based Toxicity Testing and Risk Assessment for the 21st Century) funded by the European Commission under the Horizon 2020 program (Grant Agreement No. 681002).
Publisher Copyright:
© The Author(s) 2018.
PY - 2018/9/1
Y1 - 2018/9/1
N2 - Earlier we created a chemical hazard database via natural language processing of dossiers submitted to the European Chemical Agency with approximately 10 000 chemicals. We identified repeat OECD guideline tests to establish reproducibility of acute oral and dermal toxicity, eye and skin irritation, mutagenicity and skin sensitization. Based on 350-700+ chemicals each, the probability that an OECD guideline animal test would output the same result in a repeat test was 78%-96% (sensitivity 50%-87%). An expanded database with more than 866 000 chemical properties/hazards was used as training data and to model health hazards and chemical properties. The constructed models automate and extend the read-across method of chemical classification. The novel models called RASARs (read-across structure activity relationship) use binary fingerprints and Jaccard distance to define chemical similarity. A large chemical similarity adjacency matrix is constructed from this similarity metric and is used to derive feature vectors for supervised learning. We show results on 9 health hazards from 2 kinds of RASARs: "Simple" and "Data Fusion". The "Simple" RASAR seeks to duplicate the traditional read-across method, predicting hazard from chemical analogs with known hazard data. The "Data Fusion" RASAR extends this concept by creating large feature vectors from all available property data rather than only the modeled hazard. Simple RASAR models tested in cross-validation achieve 70%-80% balanced accuracies with constraints on tested compounds. Cross validation of data fusion RASARs show balanced accuracies in the 80%-95% range across 9 health hazards with no constraints on tested compounds.
AB - Earlier we created a chemical hazard database via natural language processing of dossiers submitted to the European Chemical Agency with approximately 10 000 chemicals. We identified repeat OECD guideline tests to establish reproducibility of acute oral and dermal toxicity, eye and skin irritation, mutagenicity and skin sensitization. Based on 350-700+ chemicals each, the probability that an OECD guideline animal test would output the same result in a repeat test was 78%-96% (sensitivity 50%-87%). An expanded database with more than 866 000 chemical properties/hazards was used as training data and to model health hazards and chemical properties. The constructed models automate and extend the read-across method of chemical classification. The novel models called RASARs (read-across structure activity relationship) use binary fingerprints and Jaccard distance to define chemical similarity. A large chemical similarity adjacency matrix is constructed from this similarity metric and is used to derive feature vectors for supervised learning. We show results on 9 health hazards from 2 kinds of RASARs: "Simple" and "Data Fusion". The "Simple" RASAR seeks to duplicate the traditional read-across method, predicting hazard from chemical analogs with known hazard data. The "Data Fusion" RASAR extends this concept by creating large feature vectors from all available property data rather than only the modeled hazard. Simple RASAR models tested in cross-validation achieve 70%-80% balanced accuracies with constraints on tested compounds. Cross validation of data fusion RASARs show balanced accuracies in the 80%-95% range across 9 health hazards with no constraints on tested compounds.
UR - http://www.scopus.com/inward/record.url?scp=85054881735&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054881735&partnerID=8YFLogxK
U2 - 10.1093/toxsci/kfy152
DO - 10.1093/toxsci/kfy152
M3 - Article
C2 - 30007363
AN - SCOPUS:85054881735
VL - 165
SP - 198
EP - 212
JO - Toxicological Sciences
JF - Toxicological Sciences
SN - 1096-6080
IS - 1
ER -