Teaching genomics to life science undergraduates using cloud computing platforms with open datasets.

Toryn M Poolman, Andrea Townsend-Nicholson, Amanda Cain
Published in: Biochemistry and molecular biology education : a bimonthly publication of the International Union of Biochemistry and Molecular Biology (2022)
The final year of a biochemistry degree is usually a time to experience research. However, laboratory-based research projects were not possible during COVID-19. Instead, we used open datasets to provide computational research projects in metagenomics to biochemistry undergraduates (80 students with limited computing experience). We aimed to give the students a chance to explore any dataset, rather than use a small number of artificial datasets (~60 published datasets were used). To achieve this, we utilized Google Colaboratory (Colab), a virtual computing environment. Colab was used as a framework to retrieve raw sequencing data (analyzed with QIIME2) and generate visualizations. Setting up the environment requires no prior experience; all students have the same drive structure, and notebooks can be shared (for synchronous sessions). We also used the platform to combine multiple datasets, perform a meta-analysis, and allow the students to analyze large datasets with thousands of subjects and factors. Projects that required increased computational resources were integrated with Google Cloud Compute. In future, all research projects can include some aspects of reanalyzing public data, providing students with data science experience. Colab is also an excellent environment in which to develop data skills in multiple languages (e.g., Perl, Python, Julia).
Keyphrases
  • high school
  • rna seq
  • electronic health record
  • quality improvement
  • single cell
  • big data
  • public health
  • minimally invasive
  • healthcare
  • emergency department
  • mental health
  • data analysis
  • current status
  • deep learning