ChemFH: an integrated tool for screening frequent false positives in chemical biology and drug discovery.

Shao-Hua Shi, Li Fu, Jia-Cai Yi, Ziyi Yang, Xiaochen Zhang, Youchao Deng, Wenxuan Wang, Chengkun Wu, Wentao Zhao, Ting-Jun Hou, Xiangxiang Zeng, Ai-Ping Lu, Dong-Sheng Cao
Published in: Nucleic Acids Research (2024)
High-throughput screening rapidly tests large numbers of chemical compounds to identify hit compounds for specific biological targets in drug discovery. However, false-positive results disrupt hit compound screening and waste time and resources. To address this, we propose ChemFH, an integrated online platform for rapid virtual evaluation of potential false positives, including colloidal aggregators, spectroscopic interference compounds, firefly luciferase inhibitors, chemically reactive compounds, promiscuous compounds, and other assay interferences. Using a dataset of 823 391 compounds, we constructed high-quality prediction models with multi-task directed message-passing neural network (DMPNN) architectures combined with uncertainty estimation, yielding an average AUC value of 0.91. ChemFH also incorporates 1441 representative alert substructures derived from the collected data and ten commonly used frequent hitter screening rules. ChemFH was validated with an external set of 75 compounds. Its virtual screening capability was then confirmed through application to five virtual screening libraries. ChemFH underwent additional validation on two natural products and FDA-approved drugs, yielding reliable and accurate results. ChemFH is a comprehensive, reliable, and computationally efficient screening pipeline that facilitates the identification of true positive results in assays, contributing to enhanced efficiency and success rates in drug discovery. ChemFH is freely available via https://chemfh.scbdd.com/.
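To make the "average AUC" figure concrete, the following is a minimal sketch of how a per-task ROC AUC can be computed and then averaged across the tasks of a multi-task model. It is not the authors' evaluation code: the function names `roc_auc` and `mean_auc`, the task names, and the toy labels and scores are all hypothetical, and the abstract does not say how ties, missing labels, or data splits were handled.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Equivalent to the normalized Mann-Whitney U statistic; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(
    tasks: Dict[str, Tuple[Sequence[int], Sequence[float]]]
) -> Tuple[float, Dict[str, float]]:
    """Return the unweighted mean AUC over tasks, plus the per-task values."""
    per_task = {name: roc_auc(y, s) for name, (y, s) in tasks.items()}
    return sum(per_task.values()) / len(per_task), per_task


# Hypothetical toy data: task name -> (true labels, predicted scores).
TOY_TASKS = {
    "colloidal_aggregator": ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    "luciferase_inhibitor": ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
}

if __name__ == "__main__":
    average, per_task = mean_auc(TOY_TASKS)
    print(per_task, average)
```

The pairwise loop is quadratic in the number of compounds, which is fine for illustration; on a dataset of the size reported here, a rank-based implementation from an established library would be the practical choice.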
Keyphrases
  • drug discovery
  • high throughput
  • high resolution
  • healthcare
  • machine learning
  • risk assessment
  • cross sectional
  • big data
  • health information
  • network analysis