FILER: a framework for harmonizing and querying large-scale functional genomics knowledge.

Pavel P Kuksa, Yuk Yee Leung, Prabhakaran Gangadharan, Zivadin Katanic, Lauren Kleidermacher, Alexandre Amlie-Wolf, Chien-Yueh Lee, Liming Qu, Emily Greenfest-Allen, Otto Valladares, Li-San Wang
Published in: NAR genomics and bioinformatics (2022)
Querying massive functional genomic and annotation data collections, and linking and summarizing the query results across data sources and data types, are important steps in high-throughput genomic and genetic analytical workflows. However, these steps are made difficult by the heterogeneity and breadth of data sources, experimental assays, biological conditions/tissues/cell types and file formats. FILER (FunctIonaL gEnomics Repository) is a framework for querying large-scale genomics knowledge. It couples a large, curated, integrated catalog of harmonized functional genomic and annotation data with a scalable genomic search and querying interface. FILER uniquely provides: (i) streamlined access to >50 000 harmonized, annotated genomic datasets across >20 integrated data sources, >1100 tissues/cell types and >20 experimental assays; (ii) a scalable genomic querying interface; and (iii) the ability to analyze and annotate users' experimental data. This rich resource spans >17 billion GRCh37/hg19 and GRCh38/hg38 genomic records. Our benchmark querying 7 × 10⁹ hg19 FILER records shows FILER is highly scalable, with a sub-linear 32-fold increase in querying time when increasing the number of queries 1000-fold from 1000 to 1 000 000 intervals. Together, these features facilitate reproducible research and streamline integrating/querying large-scale genomic data within analyses/workflows. FILER can be deployed on cloud or local servers (https://bitbucket.org/wanglab-upenn/FILER) for integration with custom pipelines and is freely available (https://lisanwanglab.org/FILER).
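To put the benchmark figure in perspective, the short sketch below works out what a 32-fold increase in time for a 1000-fold increase in queries would imply. It assumes, purely for illustration, that query time follows a power law in the number of query intervals; the abstract does not state this model, and the function name `scaling_exponent` is hypothetical, not part of FILER.

```python
import math


def scaling_exponent(query_ratio: float, time_ratio: float) -> float:
    """Exponent k in time ~ queries**k implied by two ratios.

    Assumes a power-law relationship, which is an illustrative
    assumption and not a model stated in the abstract.
    """
    return math.log(time_ratio) / math.log(query_ratio)


# Ratios reported in the abstract: 1000-fold more queries, 32-fold more time.
QUERY_RATIO = 1000
TIME_RATIO = 32

k = scaling_exponent(QUERY_RATIO, TIME_RATIO)

# Average time per query shrinks by this factor at the larger batch size.
per_query_speedup = QUERY_RATIO / TIME_RATIO

print(f"implied scaling exponent: {k:.2f}")
print(f"per-query time reduction: {per_query_speedup:.2f}-fold")
```

Under that assumption the exponent comes out near 0.5 (linear scaling would give 1), and the average time per query at 1 000 000 intervals is about 31 times lower than at 1000 intervals.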
Keyphrases
  • electronic health record
  • single cell
  • copy number
  • big data
  • high throughput
  • rna seq
  • healthcare
  • drinking water
  • machine learning
  • stem cells
  • bone marrow
  • single molecule