Login / Signup

Use of DFT Distance Metrics for Classification of SARS-CoV-2 Genomes.

Micah ThorntonMonnie Mcgee
Published in: Journal of computational biology : a journal of computational molecular cell biology (2022)
In this work, we investigate using Fourier coefficients (FCs) for capturing useful information about viral sequences in a computationally efficient and compact manner. Specifically, we extract geographic submission location from SARS-CoV-2 sequence headers submitted to the GISAID Initiative, calculate corresponding FCs, and use the FCs to classify these sequences according to geographic location. We show that the FCs serve as useful numerical summaries for sequences that allow manipulation, identification, and differentiation via classical mathematical and statistical methods that are not readily applicable for character strings. Further, we argue that subsets of the FCs may be usable for the same purposes, which results in a reduction in storage requirements. We conclude by offering extensions of the research and potential future directions for subsequent analyses, such as the use of other series transforms for discreetly indexed signals such as genomes.
Keyphrases