It is well accepted that genes are simultaneously involved in multiple biological processes and that their activity is coordinated over the duration of such processes. Unfortunately, clustering methodologies that group genes for the purpose of novel gene discovery fail to acknowledge the dynamic nature of biological processes and produce static clusters, even when gene expression is assessed across time or developmental stages. By taking advantage of techniques and theories from time-frequency analysis, periodic gene expression profiles are dynamically clustered under the assumption that different spectral frequencies characterize different biological processes. A two-step cluster validation approach is proposed to statistically estimate the optimal number of clusters and to distinguish significant clusters from noise. The resulting clusters reveal coordinated, coexpressed genes. This novel dynamic clustering approach is broadly applicable to sequential data in which the order of the series is of interest.
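To make the idea of frequency-based dynamic clustering concrete, the following is a minimal sketch and not the method of the paper, whose details are not given here. It assumes a simple sliding-window Fourier transform as the time-frequency representation and labels each gene, in each window, by its dominant frequency bin. The function name `dominant_frequency_labels`, the `window` and `step` parameters, and the synthetic profiles are all hypothetical and serve only as illustration.

```python
import numpy as np

def dominant_frequency_labels(profiles, window, step):
    """Label each gene in each sliding window by its dominant frequency bin.

    Illustrative stand-in for a time-frequency clustering step: genes that
    share a label within a window are treated as one cluster for that window.
    """
    profiles = np.asarray(profiles, dtype=float)
    n_time = profiles.shape[1]
    taper = np.hanning(window)  # reduces spectral leakage at window edges
    labels = []
    for start in range(0, n_time - window + 1, step):
        segment = profiles[:, start:start + window]
        # Remove each gene's mean so the zero-frequency bin does not dominate.
        segment = segment - segment.mean(axis=1, keepdims=True)
        power = np.abs(np.fft.rfft(segment * taper, axis=1)) ** 2
        # Skip bin 0 and report the strongest remaining bin.
        labels.append(power[:, 1:].argmax(axis=1) + 1)
    return np.stack(labels, axis=1)  # shape: (n_genes, n_windows)

# Synthetic periodic profiles (assumed data, 64 time points).
t = np.arange(64)
slow = np.sin(2 * np.pi * 4 * t / 32)   # 4 cycles per 32-point window
fast = np.sin(2 * np.pi * 8 * t / 32)   # 8 cycles per 32-point window
gene_a = slow
gene_b = np.where(t < 32, slow, fast)   # switches rhythm halfway through
gene_c = fast

labels = dominant_frequency_labels([gene_a, gene_b, gene_c], window=32, step=32)
print(labels)
```

In this toy setting `gene_b` shares a label with `gene_a` in the first window and with `gene_c` in the second, which is the kind of time-varying membership a static clustering cannot express. Estimating the number of clusters and testing clusters against noise, as the two-step validation describes, is not covered by this sketch.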