Login / Signup

Whole-genome bisulfite sequencing data analysis learning module on Google Cloud Platform.

Yujia QinAngela MaggioDale HawkinsLaura BeaudryAllen KimDaniel PanTing GongYuanyuan FuHua YangYouping Deng
Published in: Briefings in bioinformatics (2024)
This study describes the development of a resource module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning' https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module is designed to facilitate interactive learning of whole-genome bisulfite sequencing (WGBS) data analysis utilizing cloud-based tools in Google Cloud Platform, such as Cloud Storage, Vertex AI notebooks and Google Batch. WGBS is a powerful technique that can provide comprehensive insights into DNA methylation patterns at single cytosine resolution, essential for understanding epigenetic regulation across the genome. The designed learning module first provides step-by-step tutorials that guide learners through two main stages of WGBS data analysis, preprocessing and the identification of differentially methylated regions. And then, it provides a streamlined workflow and demonstrates how to effectively use it for large datasets given the power of cloud infrastructure. The integration of these interconnected submodules progressively deepens the user's understanding of the WGBS analysis process along with the use of cloud resources. Through this module, we can enhance the accessibility and adoption of cloud computing in epigenomic research, speeding up the advancements in the related field and beyond. This manuscript describes the development of a resource module that is part of a learning platform named ``NIGMS Sandbox for Cloud-based Learning'' https://github.com/NIGMS/NIGMS-Sandbox. The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [1] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.
Keyphrases
  • data analysis
  • single cell
  • high throughput
  • dna methylation
  • rna seq
  • electronic health record
  • genome wide
  • big data
  • deep learning