Inferring novel genes related to oral cancer with a network embedding method and one-class learning algorithms.
Lei Chen, Yu-Hang Zhang, Guohua Huang, Xiaoyong Pan, Tao Huang, Yu-Dong Cai
Published in: Gene Therapy (2019)
Oral cancer (OC) is one of the most common cancers threatening human lives. However, OC pathogenesis has yet to be fully uncovered, and thus designing effective treatments remains difficult. Identifying OC-related genes is an important step toward this goal. In this study, we proposed three computational models for inferring novel OC-related genes. In contrast to previously proposed computational methods, which lacked a learning procedure, each proposed model adopted a one-class learning algorithm, which can provide deeper insight into the features of validated OC-related genes. A network embedding algorithm (i.e., node2vec) was applied to the protein-protein interaction network to produce feature representations of genes. The features of the validated OC-related genes were used to train the one-class algorithm, and the performance of the final inferring model was improved through a feature selection procedure. Candidate genes were then produced by applying the trained inferring model to all other genes. Three tests were performed to screen out the important candidate genes. Accordingly, we obtained three inferred gene sets, no two of which were identical. The inferred genes also differed from previously reported genes, and some of them have been included in the public Oral Cancer Gene Database. Finally, we analyzed several inferred genes to confirm whether they are novel OC-related genes.
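The one-class idea described above (train only on validated OC-related genes, then apply the model to all other genes) can be illustrated with a minimal sketch. The abstract does not name the one-class algorithms used, so this example swaps in a simple centroid-distance rule as a stand-in. The gene names, the 2-D vectors (standing in for node2vec embeddings), and the `quantile` parameter are all hypothetical and chosen only for illustration.

```python
import math

def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def fit_one_class(positives, quantile=0.9):
    """Fit on positive examples only: keep the centroid and a distance radius."""
    center = centroid(positives)
    dists = sorted(math.dist(v, center) for v in positives)
    # Radius is the training distance at the chosen quantile (assumed rule).
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[idx]

def predict(model, vector):
    """Return True if the vector falls inside the learned radius."""
    center, radius = model
    return math.dist(vector, center) <= radius

# Hypothetical embeddings for validated (positive) genes; not real data.
known = {
    "gene_A": [1.0, 1.0],
    "gene_B": [1.2, 0.9],
    "gene_C": [0.9, 1.1],
    "gene_D": [1.1, 1.2],
}
# Hypothetical embeddings for the remaining, unlabeled genes.
others = {
    "gene_X": [1.0, 1.1],
    "gene_Y": [3.0, -1.0],
}

model = fit_one_class(list(known.values()))
inferred = [name for name, vec in others.items() if predict(model, vec)]
print(inferred)
```

In this toy setup, a gene is inferred as a candidate only when its embedding lies close to the validated genes. A real pipeline would use learned embeddings of much higher dimension and an established one-class learner, with feature selection applied before training.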
Keyphrases
- genome wide
- genome wide identification
- machine learning
- bioinformatics analysis
- deep learning
- genome wide analysis
- protein protein
- transcription factor
- dna methylation
- neural network
- magnetic resonance
- endothelial cells
- computed tomography
- magnetic resonance imaging
- body composition
- single cell
- minimally invasive
- high throughput
- resistance training
- childhood cancer