Comparative genomic analysis of thermophilic fungi reveals convergent evolutionary adaptations and gene losses.
Andrei Stecca Steindorff, Maria Victoria Aguilar Pontes, Aaron J Robinson, Bill Andreopoulos, Kurt M LaButti, Alan Kuo, Stephen J Mondo, Robert Riley, Robert Otillar, Sajeet Haridas, Anna Lipzen, Jerry W Jenkins, Jeremy Schmutz, Alicia Clum, Ian D Reid, Marie-Claude Moisan, Gregory Butler, Thi Truc Minh Nguyen, Ken Dewar, Gavin C Conant, Elodie Drula, Bernard Henrissat, Colleen M Hansel, Steven Singer, Miriam I Hutchinson, Ronald P de Vries, Donald O Natvig, Amy J Powell, Adrian Tsang, Igor V Grigoriev
Published in: Communications biology (2024)
Thermophily is a trait scattered across the fungal tree of life, with its highest prevalence within three fungal families (Chaetomiaceae, Thermoascaceae, and Trichocomaceae), as well as some members of the phylum Mucoromycota. We examined 37 thermophilic and thermotolerant species and 42 mesophilic species for this study and identified thermophily as the ancestral state of all three prominent families of thermophilic fungi. Thermophilic fungal genomes were found to encode various thermostable enzymes, including carbohydrate-active enzymes such as endoxylanases, which are useful for many industrial applications. At the same time, the overall gene counts, especially in gene families responsible for microbial defense such as secondary metabolism, are reduced in thermophiles compared to mesophiles. We also found a reduction in the core genome size of thermophiles in both the Chaetomiaceae family and the Eurotiomycetes class. The Gene Ontology terms lost in thermophilic fungi include primary metabolism, transporters, UV response, and O-methyltransferases. Comparative genomics analysis also revealed higher GC content in the third base of codons (GC3) and a lower effective number of codons in fungal thermophiles than in both thermotolerant and mesophilic fungi. Furthermore, using the Support Vector Machine classifier, we identified several Pfam domains capable of discriminating between genomes of thermophiles and mesophiles with 94% accuracy. Using AlphaFold2 to predict protein structures of endoxylanases (GH10), we built a similarity network based on the structures. We found that the number of disulfide bonds appears important for protein structure, and the network clusters based on protein structures correlate with the optimal activity temperature. Thus, comparative genomics offers new insights into the biology, adaptation, and evolutionary history of thermophilic fungi while providing a parts list for bioengineering applications.
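To make the GC3 measure mentioned in the abstract concrete, the following is a minimal sketch of how GC content at the third codon position could be computed for a single in-frame coding sequence. It is an illustration only and not the authors' pipeline: the function name `gc3_content` and the example sequence are hypothetical, and the sketch counts every complete codon (including the stop codon) while skipping ambiguous bases.

```python
def gc3_content(cds: str) -> float:
    """Return the fraction of G or C bases at third codon positions.

    Assumes `cds` is an in-frame coding sequence; trailing bases that do
    not form a complete codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop any incomplete trailing codon
    third_bases = [seq[i + 2] for i in range(0, usable, 3)]
    # Count only unambiguous nucleotides so that N or other IUPAC codes
    # do not distort the ratio.
    called = [base for base in third_bases if base in "ACGT"]
    if not called:
        raise ValueError("no complete codons with unambiguous third bases")
    return sum(base in "GC" for base in called) / len(called)


# Hypothetical toy sequence: codons ATG GCC AAG CTG TAA
example_cds = "ATGGCCAAGCTGTAA"
print(gc3_content(example_cds))
```

For a genome-level comparison such as the one described above, a value like this would typically be averaged over all annotated coding sequences of each genome before thermophiles and mesophiles are compared.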
Keyphrases
- anaerobic digestion
- genome wide
- copy number
- genome wide identification
- single cell
- dna methylation
- high resolution
- microbial community
- amino acid
- wastewater treatment
- binding protein
- risk factors
- gene expression
- transcription factor
- machine learning
- mass spectrometry
- tandem mass spectrometry
- solid phase extraction
- innate immune