
Prop3D: A flexible, Python-based platform for machine learning with protein structural properties and biophysical data.

Eli J Draizen, John Readey, Cameron Mura, Philip E Bourne
Published in: BMC Bioinformatics (2024)
Prop3D and its associated Prop3D-20sf dataset can be of broad utility in at least three ways. First, the Prop3D workflow code can be customized and deployed on various cloud-based compute platforms, with scalability achieved largely by saving the results to distributed HDF5 files via HSDS. Second, the linked Prop3D-20sf dataset provides a hand-crafted, already-featurized dataset of protein domains for 20 highly populated CATH families; provision of this pre-computed resource can aid the more efficient development (and reproducible deployment) of ML pipelines. Third, the construction of Prop3D-20sf, in creating both datasets and data splits, explicitly takes into account the problem of 'data leakage' that stems from the evolutionary relationships between proteins.
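To illustrate the data-leakage point, here is a minimal sketch of a group-aware split, in which evolutionarily related domains are kept together in the same partition. It is not the Prop3D-20sf procedure itself: the function name `group_split`, the domain IDs, and the cluster labels are all hypothetical, and the sketch assumes that related domains have already been assigned a shared cluster label (for example, by sequence clustering).

```python
import random
from collections import defaultdict


def group_split(domains, groups, test_fraction=0.2, seed=0):
    """Split items so that no group appears in both train and test.

    `domains` and `groups` are parallel lists; `groups[i]` is the
    (hypothetical) cluster label of `domains[i]`.
    """
    # Collect the members of each group.
    by_group = defaultdict(list)
    for domain, group in zip(domains, groups):
        by_group[group].append(domain)

    # Shuffle the group labels reproducibly, then assign whole groups.
    group_ids = sorted(by_group)
    random.Random(seed).shuffle(group_ids)

    target = test_fraction * len(domains)
    train, test = [], []
    for group in group_ids:
        # Fill the test set group by group until it reaches the target size.
        if len(test) < target:
            test.extend(by_group[group])
        else:
            train.extend(by_group[group])
    return train, test


# Hypothetical example data: eight domains in four clusters of related sequences.
domains = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
clusters = ["c1", "c1", "c2", "c2", "c3", "c3", "c4", "c4"]

train, test = group_split(domains, clusters, test_fraction=0.25, seed=0)
```

Because whole clusters are assigned to one side or the other, a model evaluated on `test` never sees a close relative of a test domain during training. The price is that the realized test fraction only approximates `test_fraction` when clusters vary in size.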
Keyphrases
  • electronic health record
  • machine learning
  • big data
  • palliative care
  • protein protein
  • data analysis
  • genome wide