
Ten simple rules for using public biological data for your research.

Vishal H Oza, Jordan H Whitlock, Elizabeth J Wilk, Angelina Uno-Antonison, Brandon Wilk, Manavalan Gajapathy, Timothy C Howton, Austyn Trull, Lara Ianov, Elizabeth A Worthey, Brittany N Lasseigne
Published in: PLOS Computational Biology (2023)
As more biological data become publicly available, researchers need a guide to downloading and using these data successfully. The 10 simple rules for using public biological data are: (1) use public data purposefully in your research; (2) evaluate data for your use case; (3) check data reuse requirements and embargoes; (4) be aware of ethics for data reuse; (5) plan for data storage and compute requirements; (6) know what you are downloading; (7) download programmatically and verify integrity; (8) properly cite data; (9) make reprocessed data and models Findable, Accessible, Interoperable, and Reusable (FAIR) and share them; and (10) make pipelines and code FAIR and share them. These rules are intended as a guide for researchers who want to make use of available data, and to increase data reuse and reproducibility.
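Rule (7), downloading programmatically and verifying integrity, can be illustrated with a minimal Python sketch. It assumes the data repository publishes an MD5 checksum for each file (many do, but check your source), and the helper names md5sum, verify_checksum and download_and_verify are hypothetical, not part of any repository's API.

```python
import hashlib
import urllib.request
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks so
    large files (e.g. sequencing reads) never need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_md5):
    """Return True if the file's MD5 matches the published checksum
    (case and surrounding whitespace in the published value are ignored)."""
    return md5sum(path) == expected_md5.strip().lower()


def download_and_verify(url, dest, expected_md5):
    """Download url to dest, then check it against expected_md5.

    Hypothetical helper: url and expected_md5 are assumed to come from the
    repository's file listing. Raises ValueError on a checksum mismatch."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, dest)
    if not verify_checksum(dest, expected_md5):
        dest.unlink()  # delete the corrupt or incomplete file
        raise ValueError(f"MD5 mismatch for {dest}; re-download the file")
    return dest
```

Deleting the file on a mismatch keeps a truncated or corrupted download from being silently reused in later analyses, and the same chunked checksum function can be rerun at any time to re-verify files already on disk.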
Keyphrases
  • electronic health record
  • big data
  • healthcare
  • emergency department
  • machine learning
  • data analysis