Aggregation of recount3 RNA-seq data improves inference of consensus and tissue-specific gene co-expression networks.
Prashanthi Ravichandran, Princy Parsana, Rebecca Keener, Kaspar D Hansen, Alexis J Battle
Published in: bioRxiv : the preprint server for biology (2024)
This study outlines best practices for gene co-expression network (GCN) inference and evaluation through data aggregation. We recommend estimating and regressing out confounders in each data set before aggregation, and prioritizing studies with large sample sizes for GCN reconstruction. The increased statistical power for inferring context-specific networks enabled the derivation of variant annotations that were enriched for concordant trait heritability, independent of context-agnostic functional genomic annotations. Although held-out log-likelihood increased strictly with data aggregation, the marginal improvements diminished. Future work on alternative methods for estimating confounders, and on integrating orthogonal information from modalities such as Hi-C and ChIP-seq, could further improve GCN inference.
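
The recommended order of operations (remove confounders within each data set, then aggregate, then score on held-out samples) can be illustrated with a minimal NumPy sketch. It is not the authors' pipeline. Several choices are assumptions made for illustration: the top principal components of each study stand in for estimated confounders, a ridge-regularized inverse covariance stands in for whatever network estimator is actually used, and all function names are hypothetical.

```python
import numpy as np


def regress_out_top_pcs(expr, n_pcs):
    """Center a samples-x-genes matrix and remove its top principal components.

    Assumption: the leading PCs act as proxies for confounders. The abstract
    does not specify the estimator used in the study.
    """
    centered = expr - expr.mean(axis=0, keepdims=True)
    if n_pcs == 0:
        return centered
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    pcs = u[:, :n_pcs]  # orthonormal sample-level PC scores
    # Least-squares residuals of every gene regressed on the PC scores.
    return centered - pcs @ (pcs.T @ centered)


def aggregate_covariance(residuals_list):
    """Pool per-study residuals into one sample-size-weighted covariance."""
    total_samples = sum(r.shape[0] for r in residuals_list)
    n_genes = residuals_list[0].shape[1]
    cov = np.zeros((n_genes, n_genes))
    for r in residuals_list:
        cov += r.T @ r  # residual columns are already centered per study
    return cov / total_samples


def ridge_precision(cov, ridge):
    """Stand-in network estimator: inverse of a ridge-regularized covariance."""
    return np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))


def gaussian_heldout_loglik(precision, test_cov):
    """Average per-sample Gaussian log-likelihood, up to an additive constant."""
    _, logdet = np.linalg.slogdet(precision)
    return 0.5 * (logdet - np.trace(test_cov @ precision))


if __name__ == "__main__":
    # Simulated data only; sizes and parameters are arbitrary.
    rng = np.random.default_rng(0)
    studies = [rng.normal(size=(n, 6)) for n in (40, 80)]
    held_out = regress_out_top_pcs(rng.normal(size=(50, 6)), n_pcs=1)
    test_cov = held_out.T @ held_out / held_out.shape[0]

    residuals = [regress_out_top_pcs(s, n_pcs=1) for s in studies]
    for k in range(1, len(residuals) + 1):
        cov = aggregate_covariance(residuals[:k])
        score = gaussian_heldout_loglik(ridge_precision(cov, 0.1), test_cov)
        print(f"studies aggregated: {k}, held-out log-likelihood: {score:.3f}")
```

Confounders are removed per study rather than after pooling, so that study-specific technical variation is not mistaken for shared co-expression signal. Scoring each aggregate against the same held-out covariance lets the gain from every added study be compared directly.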