High Throughput Read-Across for Screening a Large Inventory of Related Structures by Balancing Artificial Intelligence/Machine Learning and Human Knowledge.
Chihae Yang, James F Rathman, Aleksandra Mostrag, João Vinícius Ribeiro, Bryan Hobocienski, Tomasz Magdziarz, Sunil Kulkarni, Tara Barton-Maclaren
Published in: Chemical Research in Toxicology (2023)
Read-across is an in silico method applied in chemical risk assessment for data-poor chemicals. For repeated-dose toxicity end points, read-across outcomes include the no-observed-adverse-effect level (NOAEL) and its estimated uncertainty for a particular category of effects. We have previously developed a new paradigm for estimating NOAELs based on chemoinformatics analysis and the quality of experimental studies for selected analogues. It does not rely on quantitative structure-activity relationships (QSARs) or rule-based SAR systems, which are poorly suited to end points whose underlying data are only weakly grounded in specific chemical-biological interactions. The central hypothesis of this approach is that similar compounds have similar toxicity profiles and, hence, similar NOAEL values. Analogue quality (AQ) quantifies how suitable an analogue candidate is for read-across to the target by considering similarity from structural, physicochemical, ADME (absorption, distribution, metabolism, excretion), and biological perspectives. Biological similarity is based on experimental data: assay vectors aggregated from ToxCast/Tox21 data are used to derive machine learning (ML) hybrid rules that serve as biological fingerprints, capturing target-analogue similarity relevant to specific effects of interest, for example, hormone receptors (ER/AR/THR). Once one or more analogues have been qualified for read-across, a decision theory approach is used to estimate confidence bounds for the NOAEL of the target. The confidence interval narrows dramatically when analogues are constrained to biologically related profiles. Although this read-across process works well for a single target with several analogues, it can become unmanageable when, for example, screening multiple targets (e.g., a virtual screening library) or handling a parent compound that has numerous metabolites.
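The abstract describes analogue quality (AQ) as a combination of structural, physicochemical, ADME, and biological similarity, followed by an estimate of NOAEL bounds from the qualified analogues, but it gives no formula for either step. The Python sketch below is only an illustration under stated assumptions: each perspective is a set of features (the `biology` set stands in for hits on the ML-derived hybrid rules), similarity is the Tanimoto coefficient, AQ is a weighted mean with made-up weights (`DEFAULT_WEIGHTS`), and the bounds come from an AQ-weighted mean and standard deviation of log10 NOAELs. That weighted-statistics interval is a simple stand-in, not the authors' decision theory approach, and every name here (`analogue_quality`, `noael_interval`, the perspective keys) is hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: one feature set per similarity perspective
# (e.g. substructure keys, binned physchem descriptors, ADME flags,
# hits on ML-derived biological rules).
Profile = Dict[str, FrozenSet[str]]

# Illustrative weights only; the paper does not publish these values.
DEFAULT_WEIGHTS = {"structure": 0.4, "physchem": 0.2, "adme": 0.2, "biology": 0.2}


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets; 0.0 if both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def analogue_quality(target: Profile, analogue: Profile,
                     weights: Dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Hypothetical AQ: weighted mean of per-perspective similarities, in [0, 1]."""
    total = sum(weights.values())
    return sum(
        w * tanimoto(target.get(k, frozenset()), analogue.get(k, frozenset()))
        for k, w in weights.items()
    ) / total


def noael_interval(scored: List[Tuple[float, float]], aq_min: float = 0.5,
                   z: float = 1.96) -> Tuple[float, float, float]:
    """AQ-weighted estimate and crude bounds for the target NOAEL.

    `scored` holds (AQ, NOAEL in mg/kg/day) pairs. Analogues below `aq_min`
    are dropped; the rest are averaged in log10 space with AQ as weights.
    Returns (lower, estimate, upper) in mg/kg/day.
    """
    kept = [(aq, math.log10(noael)) for aq, noael in scored if aq >= aq_min]
    if not kept:
        raise ValueError("no analogue passes the AQ threshold")
    w_sum = sum(aq for aq, _ in kept)
    mean = sum(aq * x for aq, x in kept) / w_sum
    sd = math.sqrt(sum(aq * (x - mean) ** 2 for aq, x in kept) / w_sum)
    return 10 ** (mean - z * sd), 10 ** mean, 10 ** (mean + z * sd)
```

For example, an analogue that shares two of the target's three structural features and has identical physicochemical, ADME, and biological feature sets would score 0.4·(2/3) + 0.6 ≈ 0.87. Averaging in log10 space is an assumption of this sketch, chosen because NOAELs can span orders of magnitude.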
To this end, we have established a digital framework that enables the assessment of a large number of substances while still allowing human decisions on filtering and prioritization. This workflow was developed and validated on a use case involving a large set of bisphenols and their metabolites.
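The abstract does not describe how the digital framework is implemented. As a hypothetical illustration of the batch step only, the sketch below assumes AQ scores have already been computed for every target-analogue pair (for instance, a parent compound and each of its metabolites against a candidate pool). It applies an automated AQ cut-off, lets an optional reviewer callback veto individual analogues, and orders targets so that those left with too few analogues come first for human attention. The names `triage`, `TargetResult`, `aq_min`, `min_analogues`, and `reviewer`, the default thresholds, and the toy scores are all assumptions, not details from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class TargetResult:
    target_id: str
    qualified: List[Tuple[str, float]]  # (analogue_id, AQ), best first
    needs_review: bool                  # True if too few analogues remain


def triage(aq_table: Dict[str, Dict[str, float]],
           aq_min: float = 0.5,
           min_analogues: int = 2,
           reviewer: Optional[Callable[[str, str, float], bool]] = None,
           ) -> List[TargetResult]:
    """Filter and rank analogues for many targets, then prioritize targets.

    `aq_table` maps target_id -> {analogue_id: AQ}. `reviewer`, if given,
    is called as reviewer(target_id, analogue_id, aq) and returns False to
    reject an analogue that passed the automated cut-off.
    """
    results = []
    for target_id, candidates in aq_table.items():
        # Automated filter: keep analogues at or above the AQ cut-off, best first.
        ranked = sorted(((a, aq) for a, aq in candidates.items() if aq >= aq_min),
                        key=lambda pair: pair[1], reverse=True)
        # Human-in-the-loop step: a reviewer may veto individual analogues.
        if reviewer is not None:
            ranked = [(a, aq) for a, aq in ranked if reviewer(target_id, a, aq)]
        results.append(TargetResult(target_id, ranked, len(ranked) < min_analogues))
    # Flagged targets first; within each group, highest top AQ first.
    results.sort(key=lambda r: (not r.needs_review,
                                -(r.qualified[0][1] if r.qualified else 0.0)))
    return results


# Toy usage with made-up AQ scores.
table = {"parent": {"A": 0.82, "B": 0.64, "C": 0.31},
         "metabolite_1": {"A": 0.45}}
for r in triage(table):
    print(r.target_id, r.needs_review, r.qualified)
# metabolite_1 True []
# parent False [('A', 0.82), ('B', 0.64)]
```

Putting flagged targets first reflects the abstract's point that automation should narrow the work while leaving filtering and prioritization decisions to a person; the specific ordering rule is an arbitrary choice for this sketch.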