Teaching computational genomics and bioinformatics on a high-performance computing cluster: a primer.
Arun Sethuraman
Published in: Biology Methods & Protocols (2022)
The burgeoning field of genomics, as applied to personalized medicine, epidemiology, conservation, agriculture, forensics, drug development, and other fields, carries large computational and bioinformatics costs, which often put these methods out of reach of student trainees in university classrooms. However, with the increased availability of resources such as NSF XSEDE, Google Cloud, Amazon AWS, and other high-performance computing (HPC) clouds and clusters for educational purposes, a growing community of academics is teaching the use of HPC resources in genomics and big data analyses. Here, I describe the successful implementation of a semester-long (16-week) upper-division undergraduate/graduate-level course in Computational Genomics and Bioinformatics taught at San Diego State University in Spring 2022. Students were trained in the theory, algorithms, and hands-on applications of genomic data quality control, assembly, annotation, multiple sequence alignment, variant calling, phylogenomic analyses, population genomics, genome-wide association studies, and differential gene expression analyses using RNAseq data, each on their own dedicated 6-CPU NSF XSEDE Jetstream virtual machine. All lesson plans, activities, examinations, tutorials, code, lectures, and notes are publicly available at https://github.com/arunsethuraman/biomi609spring2022.
Keyphrases
- big data
- single cell
- medical education
- machine learning
- gene expression
- quality control
- medical students
- artificial intelligence
- RNA-seq
- healthcare
- genome wide association
- primary care
- DNA methylation
- deep learning
- climate change
- clinical trial
- quality improvement
- randomized controlled trial
- high intensity
- data analysis
- high school
- genome wide
- double blind
- case control