Prediction of potential miRNA-disease associations based on stacked autoencoder.

Chun-Chun Wang, Tian-Hao Li, Li Huang, Xing Chen
Published in: Briefings in Bioinformatics (2022)
In recent years, a growing number of biological experiments and scientific studies have demonstrated that microRNAs (miRNAs) play an important role in the development of complex human diseases. Discovering miRNA-disease associations can therefore contribute to accurate diagnosis and effective treatment. Identifying these associations through computational methods based on biological data has proven to be low-cost and efficient. In this study, we proposed a computational model named Stacked Autoencoder for potential MiRNA-Disease Association prediction (SAEMDA). In SAEMDA, all miRNA-disease samples were used to pretrain a Stacked Autoencoder (SAE) in an unsupervised manner. An output layer with a softmax classifier was then added to the SAE, and the positive samples together with the same number of selected negative samples were used to fine-tune it in a supervised manner. Because the pretraining step uses the feature information of all unlabeled miRNA-disease pairs, SAEMDA is well suited to our dataset, which contains few labeled samples and many unlabeled ones. SAEMDA achieved AUCs of 0.9210 and 0.8343 in global and local leave-one-out cross validation, respectively. In addition, SAEMDA obtained an average AUC of 0.9102 ± 0.0029 (mean ± standard deviation) over 100 runs of 5-fold cross validation. These results were better than those of previous models. Moreover, we carried out three case studies to further demonstrate the predictive accuracy of SAEMDA: 82% (breast neoplasms), 100% (lung neoplasms) and 90% (esophageal neoplasms) of the top 50 predicted miRNAs were verified by databases. Thus, SAEMDA could be a useful and reliable model for predicting potential miRNA-disease associations.
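Two steps in the abstract can be illustrated with a minimal Python sketch: assembling the balanced fine-tuning set (all positives plus an equal number of negatives) and scoring predictions by AUC. This is not the authors' implementation. The abstract does not say how the negative samples are selected, so uniform random sampling from the unlabeled pairs is assumed here as a stand-in. The function names `build_finetune_set` and `auc` and the toy association set are hypothetical.

```python
import random


def build_finetune_set(known_pairs, n_mirnas, n_diseases, seed=0):
    """Return (pairs, labels) for supervised fine-tuning.

    All known miRNA-disease pairs are positives (label 1). An equal number
    of unlabeled pairs is drawn uniformly at random and treated as
    negatives (label 0). Random selection is an assumption of this sketch.
    """
    positives = sorted(known_pairs)
    unlabeled = [
        (m, d)
        for m in range(n_mirnas)
        for d in range(n_diseases)
        if (m, d) not in known_pairs
    ]
    rng = random.Random(seed)
    negatives = rng.sample(unlabeled, len(positives))
    pairs = positives + negatives
    labels = [1] * len(positives) + [0] * len(negatives)
    return pairs, labels


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) score pairs in which
    the positive is ranked higher; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 4 miRNAs, 3 diseases, 3 known associations (made-up indices).
known = {(0, 0), (1, 2), (3, 1)}
pairs, labels = build_finetune_set(known, n_mirnas=4, n_diseases=3, seed=1)
print(len(pairs), sum(labels))  # 6 samples, 3 of them positive

# Three of the four (positive, negative) score pairs are ordered correctly.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

In a full pipeline, each pair would be mapped to a feature vector, the stacked autoencoder would be pretrained on the feature vectors of all pairs, and only the balanced set built above would be used for the supervised fine-tuning step.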
Keyphrases
  • machine learning
  • healthcare
  • high efficiency
  • low cost
  • computed tomography
  • artificial intelligence
  • big data
  • high resolution
  • mass spectrometry
  • replacement therapy
  • data analysis
  • induced pluripotent stem cells