The mutation features and geographical distributions of the surface glycoprotein (S gene) in SARS-CoV-2 strains: A comparative analysis of the early and current strains.
Rang Liu, Xinran Lin, Bing Chen, Zhenhui Hou, Qiuju Zhang, Shouren Lin, Lan Geng, Zhongyi Sun, Canhui Cao, Yu Shi, Xi Xia. Published in: Journal of medical virology (2022)
The surface glycoprotein (S protein) of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was used to develop coronavirus disease 2019 (COVID-19) vaccines. However, SARS-CoV-2, and the S protein in particular, has undergone rapid evolution and mutation, the extent of which remains to be determined. Here, we analyzed and compared the early (12 237) and the current (more than 10 million) SARS-CoV-2 strains to identify the mutation features and geographical distribution of the S gene and S protein. In the early strains, most loci had a relatively low mutation frequency, except S: 23403 (4486 strains). In the current strains, both the number of mutant strains and the mutation frequency surged; S: 23403 remained the most frequently mutated locus, having increased approximately 1050-fold. Furthermore, D614 (S: 23403) was one of the most frequent mutations in the S protein of Omicron as of March 2022, and most of the mutant strains still came from the United States and the United Kingdom. Further analysis showed that, in the receptor-binding domain, most loci had a low mutation frequency in the early strains, whereas S: 22995 is now the most prevalent locus, with 3 122 491 of the current strains carrying a mutation there. Overall, we compare the mutation features of the S region between the early and the current SARS-CoV-2 strains, providing insight for further studies of emerging SARS-CoV-2 variants and COVID-19 vaccines.
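To put the quoted counts in proportion, the following minimal Python sketch turns them into shares and an implied strain count. It is only an illustration built from the figures in the abstract, not the authors' analysis. The names `share`, `early_share`, `implied_current_s23403`, and `s22995_share_upper` are hypothetical. Two assumptions are made: that the roughly 1050-fold increase refers to the number of mutant strains (the abstract does not say whether it refers to counts or frequency), and that 10 million can stand in as a lower bound for the current total, so the S: 22995 share computed here is an upper bound.

```python
# Figures quoted in the abstract.
EARLY_TOTAL = 12_237            # early strains analyzed
EARLY_S23403 = 4_486            # early strains mutated at S: 23403
CURRENT_TOTAL_MIN = 10_000_000  # "more than 10 million": a lower bound only
CURRENT_S22995 = 3_122_491      # current strains mutated at S: 22995
FOLD_INCREASE = 1_050           # approximate fold increase reported for S: 23403


def share(mutant_strains: int, total_strains: int) -> float:
    """Return the fraction of strains carrying a mutation at a locus."""
    if total_strains <= 0:
        raise ValueError("total_strains must be positive")
    return mutant_strains / total_strains


# Share of early strains mutated at S: 23403.
early_share = share(EARLY_S23403, EARLY_TOTAL)

# ASSUMPTION: the fold increase applies to the strain count, not the frequency.
implied_current_s23403 = EARLY_S23403 * FOLD_INCREASE

# Upper bound on the S: 22995 share, because the true total exceeds 10 million.
s22995_share_upper = share(CURRENT_S22995, CURRENT_TOTAL_MIN)

print(f"Early S: 23403 share: {early_share:.1%}")
print(f"Implied current S: 23403 strains: {implied_current_s23403:,}")
print(f"Current S: 22995 share (upper bound): {s22995_share_upper:.1%}")
```

Under these assumptions, roughly a third of the early strains carried the S: 23403 mutation, and S: 22995 accounts for at most about a third of the current strains.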