A fast algorithm to factorize high-dimensional tensor product matrices used in genetic models.

Marco Lopez-Cruz, Paulino Pérez-Rodríguez, Gustavo de Los Campos
Published in: G3 (Bethesda, Md.) (2024)
Many genetic models (including models for epistatic effects as well as genetic-by-environment interactions) involve covariance structures that are Hadamard products of lower-rank matrices. Implementing these models requires factorizing large Hadamard product matrices. The available factorization algorithms do not scale well to big data, which makes some of these models infeasible with large sample sizes. Here, based on properties of Hadamard products and the related Kronecker products, we propose an algorithm that produces an approximate decomposition and is orders of magnitude faster than the standard eigenvalue decomposition. In this article, we describe the algorithm, show how it can be used to factorize large Hadamard product matrices, present benchmarks, and illustrate the method with an analysis of data from the northern testing locations of the G × E project from the Genomes to Fields Initiative (n ∼ 60,000). We implemented the proposed algorithm in the open-source "tensorEVD" R package.
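The link between Hadamard and Kronecker products mentioned in the abstract can be illustrated with a standard identity: for two n × n matrices, the Hadamard product is the principal submatrix of the Kronecker product at row and column indices i·n + i. The following is a minimal pure-Python sketch of that identity only. It is not the tensorEVD implementation; the matrices `K1` and `K2` and all helper names are hypothetical and chosen for illustration.

```python
# Illustrative sketch (not the tensorEVD algorithm): check that the Hadamard
# product of two n x n matrices is a principal submatrix of their Kronecker
# product. All names and values here are assumptions made for the example.

def hadamard(a, b):
    """Element-wise product of two equally sized matrices."""
    return [[a[i][j] * b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]

def kronecker(a, b):
    """Kronecker product: block (i, j) of the result is a[i][j] * b."""
    p, q = len(b), len(b[0])
    return [[a[i // p][j // q] * b[i % p][j % q]
             for j in range(len(a[0]) * q)]
            for i in range(len(a) * p)]

def principal_submatrix(m, idx):
    """Keep only the rows and columns of m listed in idx."""
    return [[m[i][j] for j in idx] for i in idx]

# Two small symmetric matrices standing in for covariance (kernel) matrices.
K1 = [[2.0, 0.5, 0.1],
      [0.5, 1.0, 0.3],
      [0.1, 0.3, 1.5]]
K2 = [[1.0, 0.2, 0.4],
      [0.2, 3.0, 0.6],
      [0.4, 0.6, 2.0]]

n = len(K1)
idx = [i * n + i for i in range(n)]  # positions 0, n + 1, 2n + 2, ...

H = hadamard(K1, K2)
S = principal_submatrix(kronecker(K1, K2), idx)

print(H == S)
```

The identity matters because the eigenvectors of a Kronecker product are Kronecker products of the eigenvectors of its factors, so the small factor matrices can be decomposed separately instead of decomposing the large product directly. How the article builds its approximate decomposition from this is described in the paper itself and is not reproduced here.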
Keyphrases
  • machine learning
  • big data
  • deep learning
  • artificial intelligence
  • quality improvement
  • genome wide
  • high resolution
  • copy number
  • neural network
  • gene expression