High Throughput Read-Across for Screening a Large Inventory of Related Structures by Balancing Artificial Intelligence/Machine Learning and Human Knowledge.

Chihae Yang, James F Rathman, Aleksandra Mostrag, João Vinícius Ribeiro, Bryan Hobocienski, Tomasz Magdziarz, Sunil Kulkarni, Tara Barton-Maclaren

Published in: Chemical Research in Toxicology (2023)

Read-across is an in silico method applied in chemical risk assessment for data-poor chemicals. The read-across outcomes for repeated-dose toxicity end points include the no-observed-adverse-effect level (NOAEL) and estimated uncertainty for a particular category of effects. We have previously developed a new paradigm for estimating NOAELs based on chemoinformatics analysis and experimental study qualities from selected analogues, not relying on quantitative structure-activity relationships (QSARs) or rule-based SAR systems, which are not well-suited to end points for which the underpinning data are weakly grounded in specific chemical-biological interactions. The central hypothesis of this approach is that similar compounds have similar toxicity profiles and, hence, similar NOAEL values. Analogue quality (AQ) quantifies the suitability of an analogue candidate for reading across to the target by considering similarity from structure, physicochemical, ADME (absorption, distribution, metabolism, excretion), and biological perspectives. Biological similarity is based on experimental data; assay vectors derived from aggregations of ToxCast/Tox21 data are used to derive machine learning (ML) hybrid rules that serve as biological fingerprints to capture target-analogue similarity relevant to specific effects of interest, for example, hormone receptors (ER/AR/THR). Once one or more analogues have been qualified for read-across, a decision theory approach is used to estimate confidence bounds for the NOAEL of the target. The confidence interval is dramatically narrowed when analogues are constrained to biologically related profiles. Although this read-across process works well for a single target with several analogues, it can become unmanageable when, for example, screening multiple targets (e.g., virtual screening library) or handling a parent compound having numerous metabolites. 
To this end, we have established a digitalized framework to enable the assessment of a large number of substances, while still allowing for human decisions for filtering and prioritization. This workflow was developed and validated through a use case of a large set of bisphenols and their metabolites.
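
The screening idea described above (score each analogue candidate against the target, keep those that qualify, then read across a NOAEL) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the AQ formula, the fingerprint definitions, or the decision theory procedure, so everything below is an assumption made for illustration: fingerprints are plain sets of "on" bits, similarity is the Tanimoto coefficient, the hypothetical `analogue_quality` is an equal-weight average of structural and biological similarity, the `min_aq` cutoff is arbitrary, and the point estimate is an AQ-weighted geometric mean with no confidence bounds. Physicochemical and ADME similarity are omitted, and all data values are invented.

```python
from dataclasses import dataclass
from math import exp, log


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass(frozen=True)
class Analogue:
    name: str
    structure_bits: frozenset  # structural fingerprint (illustrative)
    bio_bits: frozenset        # biological fingerprint (illustrative)
    noael: float               # mg/kg bw/day, invented value


def analogue_quality(target_structure, target_bio, analogue,
                     w_structure=0.5, w_bio=0.5):
    """Hypothetical AQ: weighted average of two similarity perspectives.

    The weights and the averaging rule are assumptions, not the published
    definition of analogue quality.
    """
    s = tanimoto(target_structure, analogue.structure_bits)
    b = tanimoto(target_bio, analogue.bio_bits)
    return w_structure * s + w_bio * b


def screen(target_structure, target_bio, candidates, min_aq=0.5):
    """Return (AQ, analogue) pairs with AQ >= min_aq, best first."""
    scored = [(analogue_quality(target_structure, target_bio, c), c)
              for c in candidates]
    qualified = [pair for pair in scored if pair[0] >= min_aq]
    return sorted(qualified, key=lambda pair: pair[0], reverse=True)


def weighted_log_mean_noael(qualified):
    """AQ-weighted geometric mean of analogue NOAELs (a simple stand-in
    for the decision theory estimate, which is not reproduced here)."""
    total = sum(aq for aq, _ in qualified)
    if total <= 0:
        raise ValueError("no qualified analogues to read across from")
    return exp(sum(aq * log(c.noael) for aq, c in qualified) / total)


# Invented target and candidates, for illustration only.
target_structure = frozenset({1, 2, 3, 4})
target_bio = frozenset({10, 11})
candidates = [
    Analogue("A", frozenset({1, 2, 3, 4}), frozenset({10, 11}), 10.0),
    Analogue("B", frozenset({1, 2}), frozenset({10}), 40.0),
    Analogue("C", frozenset({7, 8}), frozenset({12}), 500.0),
]

qualified = screen(target_structure, target_bio, candidates)
estimate = weighted_log_mean_noael(qualified)
```

Looping `screen` over many targets is what makes the approach scale to a large inventory, while leaving `min_aq` and the weights as explicit parameters keeps the filtering and prioritization decisions in human hands.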

Keyphrases