Identifying complex motifs in massive omics data with a variable-convolutional layer in deep neural network.

Jing-Yi Li, Shen Jin, Xin-Ming Tu, Yang Ding, Ge Gao

Published in: Briefings in Bioinformatics (2022)

Motif identification is among the most common and essential computational tasks in bioinformatics and genomics. Here we propose a novel convolutional layer for deep neural networks, named the variable convolutional (vConv) layer, which identifies motifs in high-throughput omics data effectively by learning kernel length adaptively from the data. Empirical evaluations on DNA-protein binding and DNase footprinting cases demonstrate that vConv-based networks outperform their convolutional counterparts regardless of model complexity. Moreover, vConv can be readily integrated into multi-layer neural networks as an 'in-place replacement' for the canonical convolutional layer. All source code is freely available on GitHub for academic use.
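
As an illustration of what "learning kernel length from data" could look like, the sketch below applies a soft mask to a fixed-size kernel so that its effective length is controlled by two continuous boundary parameters. This is an assumption-laden sketch, not the authors' released code: the function names, the sigmoid-based mask, the `sharpness` parameter and the one-hot input layout are all hypothetical choices made for illustration. In a real network the boundaries would be trained by gradient descent together with the kernel weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_length_mask(max_len, left, right, sharpness=4.0):
    """Soft mask over kernel positions: close to 1 between `left` and
    `right`, close to 0 outside. Hypothetical form, chosen because it is
    differentiable with respect to both boundaries."""
    pos = np.arange(max_len, dtype=float)
    return sigmoid(sharpness * (pos - left)) * sigmoid(sharpness * (right - pos))

def masked_conv1d(x, kernel, left, right):
    """Valid 1D cross-correlation of a one-hot sequence x with shape (L, 4)
    against a kernel of shape (K, 4) whose positions are scaled by the mask."""
    k_len = kernel.shape[0]
    masked = kernel * soft_length_mask(k_len, left, right)[:, None]
    n_out = x.shape[0] - k_len + 1
    return np.array([np.sum(x[i:i + k_len] * masked) for i in range(n_out)])

# Usage: an 8-position kernel whose effective span is roughly positions 2..5.
rng = np.random.default_rng(0)
seq = np.eye(4)[rng.integers(0, 4, size=10)]   # random one-hot DNA, shape (10, 4)
kernel = rng.normal(size=(8, 4))
scores = masked_conv1d(seq, kernel, left=2.0, right=5.0)
```

Because the mask only rescales kernel weights, a layer built this way keeps the input and output shapes of an ordinary convolution, which is consistent with the "in-place replacement" property described above.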

Keyphrases

neural network
single cell
high throughput
electronic health record
big data
working memory
bioinformatics analysis
data analysis
cell free
dna binding
nucleic acid