GenoDrawing: An Autoencoder Framework for Image Prediction from SNP Markers.
Federico Jurado-Ruiz, David Rousseau, Juan A Botía, María José Aranzana
Published in: Plant phenomics (Washington, D.C.) (2023)
Advancements in genome sequencing have facilitated whole-genome characterization of numerous plant species, providing an abundance of genotypic data for genomic analysis. Genomic selection and neural networks (NNs), particularly deep learning, have been developed to predict complex traits from dense genotypic data. Autoencoders, NN models that extract features from images in an unsupervised manner, have proven useful for plant phenotyping. This study introduces an autoencoder framework, GenoDrawing, for predicting and retrieving apple images from a low-depth single-nucleotide polymorphism (SNP) array, which is potentially useful for predicting traits that are difficult to define. GenoDrawing demonstrates proficiency in its task using a small dataset of shape-related SNPs. Results indicate that the use of SNPs associated with visual traits has a substantial impact on the generated images, consistent with biological interpretation. While using SNPs significantly associated with the trait is crucial, incorporating additional, unrelated SNPs degrades performance for simple NN architectures that cannot easily identify the most important inputs. The proposed GenoDrawing method is a practical framework for exploring genomic prediction in fruit tree phenotyping, and is particularly beneficial for small to medium-sized breeding companies that want to predict economically important heritable traits. Although GenoDrawing has limitations, it lays the groundwork for future research in image prediction from genomic markers. Future studies should focus on stronger models for image reproduction, better SNP information extraction, and datasets that are balanced across phenotypes, in order to obtain more precise outcomes.
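SNP calls have to be converted to numbers before a neural network can take them as input. The abstract does not say how GenoDrawing encodes its markers, so the following is only a minimal sketch of one common convention, additive dosage coding (the number of copies of a chosen allele, 0, 1, or 2, in a diploid call). The function names, SNP identifiers, alleles, and the use of -1.0 for missing calls are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Optional


def encode_dosage(genotype: Optional[str], alt_allele: str) -> float:
    """Count copies of alt_allele (0, 1, or 2) in a two-letter diploid call.

    Assumption: missing or malformed calls are encoded as -1.0 so that a
    later step can impute or mask them.
    """
    if genotype is None or len(genotype) != 2 or "N" in genotype.upper():
        return -1.0
    alt = alt_allele.upper()
    return float(sum(1 for allele in genotype.upper() if allele == alt))


def encode_sample(
    calls: Dict[str, Optional[str]],
    alt_alleles: Dict[str, str],
    snp_order: List[str],
) -> List[float]:
    """Build a fixed-order numeric input vector for one sample."""
    return [encode_dosage(calls.get(snp), alt_alleles[snp]) for snp in snp_order]


# Hypothetical marker panel and genotype calls for a single accession.
snp_order = ["snp_1", "snp_2", "snp_3"]
alt_alleles = {"snp_1": "G", "snp_2": "T", "snp_3": "C"}
calls = {"snp_1": "AG", "snp_2": "TT", "snp_3": None}

vector = encode_sample(calls, alt_alleles, snp_order)
print(vector)  # [1.0, 2.0, -1.0]
```

Each extra marker in `snp_order` adds one input dimension. This is one way to picture why markers unrelated to the trait can make it harder for a simple network to pick out the informative inputs.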
Keyphrases
- genome wide
- deep learning
- copy number
- dna methylation
- convolutional neural network
- artificial intelligence
- machine learning
- neural network
- high throughput
- big data
- electronic health record
- current status
- type diabetes
- single cell
- oxidative stress
- gene expression
- healthcare
- metabolic syndrome
- data analysis
- insulin resistance
- social media
- antibiotic resistance genes