Demystifying dimensionality reduction techniques in the 'omics' era: A practical approach for biological science students.

Leonardo D Garma, Nuno S Osório
Published in: Biochemistry and molecular biology education : a bimonthly publication of the International Union of Biochemistry and Molecular Biology (2023)
Dimensionality reduction techniques are essential in analyzing large 'omics' datasets in biochemistry and molecular biology. Principal component analysis, t-distributed stochastic neighbor embedding, and uniform manifold approximation and projection are commonly used for data visualization. However, these methods can be challenging for students without a strong mathematical background. In this study, intuitive examples were created using COVID-19 data to help students understand the core concepts behind these techniques. In a 4-h practical session, we used these examples to demonstrate dimensionality reduction techniques to 15 postgraduate students from biomedical backgrounds. Using Python and Jupyter notebooks, our goal was to demystify these methods, typically treated as "black boxes", and empower students to generate and interpret their own results. To assess the impact of our approach, we conducted an anonymous survey. The majority of the students agreed that using computers enriched their learning experience (67%) and that Jupyter notebooks were a valuable part of the class (66%). Additionally, 60% of the students reported increased interest in Python, and 40% gained both interest and a better understanding of dimensionality reduction methods. Despite the short duration of the course, 40% of the students reported acquiring research skills necessary in the field. While further analysis of the learning impacts of this approach is needed, we believe that sharing the examples we generated can provide valuable resources for others to use in interactive teaching environments. These examples highlight advantages and limitations of the major dimensionality reduction methods used in modern bioinformatics analysis in an easy-to-understand way.
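As an illustration of the simplest of the three methods named in the abstract, the following is a minimal sketch of principal component analysis in Python. It is not taken from the authors' notebooks: the data are synthetic (random numbers, not COVID-19 data), and the helper name `pca_project` is hypothetical. It assumes NumPy is installed and computes the components through a singular value decomposition of the centered data matrix.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components.

    Hypothetical helper for illustration only; returns the projected
    coordinates and the fraction of variance each component explains.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature (column) at zero
    # SVD of the centered data: the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T  # coordinates in the reduced space
    explained = S**2 / np.sum(S**2)    # variance ratio per component
    return scores, explained[:n_components]


# Synthetic example (an assumption, not real 'omics' data):
# 100 samples with 5 features that are driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 5))
X = latent @ loadings + 0.05 * rng.normal(size=(100, 5))

scores, explained = pca_project(X, n_components=2)
print(scores.shape)     # two coordinates per sample, ready for a scatter plot
print(explained.sum())  # share of total variance kept by the two components
```

Because the synthetic features depend on only two hidden factors, two components should retain nearly all of the variance. PCA is a linear projection; t-SNE and UMAP are nonlinear, and in practice they are usually run through existing libraries rather than written by hand.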
Keyphrases
  • high school
  • coronavirus disease
  • sars cov
  • magnetic resonance imaging
  • single cell
  • magnetic resonance
  • computed tomography
  • single molecule
  • big data
  • health information
  • data analysis
  • contrast enhanced