Using recurrent neural networks to detect supernumerary chromosomes in fungal strains causing blast diseases.
Nikesh Gyawali, Yangfan Hao, Guifang Lin, Jun Huang, Ravi Bika, Lidia Calderon Daza, Huakun Zheng, Giovana Cruppe, Doina Caragea, David Edward Cook, Barbara Valent, Sanzhen Liu
Published in: NAR genomics and bioinformatics (2024)
The genomes of the fungus Magnaporthe oryzae, which causes blast diseases on diverse grass species including major crops, have indispensable core-chromosomes and may contain supernumerary chromosomes, also known as mini-chromosomes. These mini-chromosomes are speculated to provide effector gene mobility, and may transfer between strains. To understand the biology of mini-chromosomes, it is valuable to be able to detect whether an M. oryzae strain possesses a mini-chromosome. Here, we applied recurrent neural network models for classifying DNA sequences as arising from core- or mini-chromosomes. The models were trained with sequences from available core- and mini-chromosome assemblies, and then used to predict the presence of mini-chromosomes in a global collection of M. oryzae isolates using short-read DNA sequences. The model predicted that mini-chromosomes were prevalent in M. oryzae isolates. Interestingly, at least one mini-chromosome was present in all recent wheat isolates, but no mini-chromosomes were found in early isolates collected before 1991, indicating a preferential selection for strains carrying mini-chromosomes in recent years. The model was also used to identify assembled contigs derived from mini-chromosomes. In summary, our study has developed a reliable method for categorizing DNA sequences and showcases an application of recurrent neural networks in predictive genomics.
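To make the classification step concrete, the following is a minimal sketch of how a recurrent network can map a DNA sequence to a probability of mini-chromosome origin. It is an illustration only and not the authors' model: the abstract does not specify the architecture, so a plain Elman-style RNN stands in here, the weights are untrained random placeholders, and the names `one_hot`, `TinyRNN`, and `predict_proba` are hypothetical.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as 4-element vectors; unknown bases (e.g. N) become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


class TinyRNN:
    """Untrained Elman-style RNN with a sigmoid output (illustrative assumption only)."""

    def __init__(self, hidden_size=8, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        self.w_xh = mat(hidden_size, 4)            # input-to-hidden weights
        self.w_hh = mat(hidden_size, hidden_size)  # hidden-to-hidden (recurrent) weights
        self.b_h = [0.0] * hidden_size
        self.w_hy = [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
        self.b_y = 0.0

    def predict_proba(self, seq):
        """Return a score in (0, 1), read here as P(sequence is from a mini-chromosome)."""
        h = [0.0] * self.hidden_size
        for x in one_hot(seq):
            # One recurrent step: the new state depends on the current base and the previous state.
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_xh[j], x))
                    + sum(w * hi for w, hi in zip(self.w_hh[j], h))
                    + self.b_h[j]
                )
                for j in range(self.hidden_size)
            ]
        logit = sum(w * hi for w, hi in zip(self.w_hy, h)) + self.b_y
        return 1.0 / (1.0 + math.exp(-logit))


model = TinyRNN()
score = model.predict_proba("ACGTTGCAACGT")
```

In a real setting the weights would be learned from labeled core- and mini-chromosome sequences, and the per-sequence scores would then need to be combined into a per-isolate call; how that combination is done is not described in the abstract.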