Scientists have discovered that several viruses belonging to the Coronaviridae family can infect a wide range of hosts, including birds, humans, and other mammals. These are positive-sense, single-stranded RNA viruses with genomes ranging in size from 27 to 32 kb. They are divided into four genera, namely alpha, beta, gamma, and delta.
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of the current coronavirus disease 2019 (COVID-19) pandemic, was first identified in the Chinese city of Wuhan in December 2019. Due to its high infectivity and mortality rate, the World Health Organization declared COVID-19 a pandemic on March 11, 2020.
Because viruses undergo genomic mutations, identifying the sites of those mutations is of utmost importance for vaccine development. Various analyses based on phylogenetic trees have been performed to understand the evolutionary relationship of SARS-CoV-2 with other beta coronaviruses. An earlier study constructed a phylogenetic tree and revealed that the genomic sequence of SARS-CoV-2 is 88% identical to BAT-CoV. In another study, scientists isolated around 70 SARS-CoV-2 genomic sequences from COVID-19 patients and studied the spike glycoprotein gene. This study also reported that the BetaCoV-bat-Yunnan-RaTG13-2013 virus is almost identical to SARS-CoV-2.
Although a comparative study is available on the genomic sequences of SARS-CoV, MERS-CoV, and SARS-CoV-2, there is a gap in research regarding the comparison of four types of coronavirus, namely SARS-CoV, MERS-CoV, BAT-CoV, and SARS-CoV-2. A new study comparing the genomic sequences of these four coronaviruses has been published in the Journal of Medical Virology. The study used multiple genetic markers, including single nucleotide polymorphisms (SNPs), whole-genome sequence phylogeny, protein mutations, and microsatellites. These were compared against the reference genomic sequence of SARS-CoV-2, known as the Wuhan strain (Wuhan-Hu-1). All sequences were obtained from NCBI GenBank.

The SARS-CoV, MERS-CoV, and SARS-CoV-2 sequences were obtained from human hosts (Homo sapiens), while the BAT-CoV sequences were obtained from eight different types of bats. The results of this study are described below.
Phylogenetic analysis
For the phylogenetic analysis of the different coronavirus sequences, a maximum likelihood approach with 1000 bootstrap replicates was used. The analysis revealed distinct coronavirus lineages. The whole-genome phylogeny showed that MERS-CoV falls in the outgroup, while the other three were classified as ingroup species. Within the ingroup, two lineages were found: one consisting of SARS-CoV-2 and another consisting of SARS-CoV and BAT-CoV. The branching of the tree indicated that SARS-CoV had separated very early from BAT-CoV. The tree also revealed an independent divergence of SARS-CoV-2 from BAT-CoV. The phylogeny further showed that SARS-CoV-2 is more closely related to BAT-CoV and SARS-CoV than to MERS-CoV. SimPlot software was used to visualize the similarity between the four selected species. It revealed about 98% similarity of BAT-CoV to the reference sequence, i.e., the Wuhan strain of SARS-CoV-2. By comparison, a 92% similarity was obtained between SARS-CoV and the reference sequence, and a 58% similarity between MERS-CoV and the Wuhan strain.
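The similarity percentages above can be illustrated with a minimal Python sketch that computes percent identity between two pre-aligned sequences. This is not the SimPlot method itself, which plots similarity along the genome in sliding windows; the function name `percent_identity` and the toy fragments are assumptions made up for illustration, not data from the study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical bases over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (invented for illustration, not real viral sequence)
reference = "ATGTTTGTTTTTCTTGTT"
query = "ATGTTCGTTT-TCTTGTA"
print(f"{percent_identity(reference, query):.1f}% identity")
```

Applying the same function to successive windows of a whole-genome alignment would give a rough similarity profile of the kind a similarity plot displays.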
Analysis of genetic variants
A variant-based analysis showed that the MERS-CoV genome differed from the Wuhan reference strain by 134.21 sites, the BAT-CoV genome by 136.72 sites, the SARS-CoV genome by 26.64 sites, and the SARS-CoV-2 genome by 0.66 sites. Furthermore, the study revealed that the probability of mutations at missense sites is higher in MERS-CoV and SARS-CoV-2 than in SARS-CoV and BAT-CoV. This is attributed to the small number of missense variations in SARS-CoV and BAT-CoV, which results from selection pressure at missense sites.
The number of mutations in the structural proteins, namely the spike protein (S), the envelope protein (E), the membrane protein (M), and the nucleocapsid protein (N), was calculated. SNPs were filtered from the S, M, E, and N gene regions using a Python script, and these genes showed varying numbers of SNPs. The MultAlin online tool was used to detect similarities between the four coronaviruses selected for the study.
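The study's script is not shown, but filtering SNPs by gene region might look roughly like the sketch below. The gene intervals are the S, E, M, and N coordinates as annotated on the SARS-CoV-2 reference record NC_045512.2 and should be verified before use; the `Snp` type, the `snps_by_gene` function, and the variant calls are hypothetical and invented for illustration.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    pos: int  # 1-based position on the reference genome
    ref: str  # reference base
    alt: str  # alternate base


# Gene intervals (1-based, inclusive) as annotated on NC_045512.2; verify before use
GENE_REGIONS = {
    "S": (21563, 25384),
    "E": (26245, 26472),
    "M": (26523, 27191),
    "N": (28274, 29533),
}


def snps_by_gene(snps, regions=GENE_REGIONS):
    """Group SNPs by the gene interval they fall in; SNPs outside all intervals are dropped."""
    grouped = {gene: [] for gene in regions}
    for snp in snps:
        for gene, (start, end) in regions.items():
            if start <= snp.pos <= end:
                grouped[gene].append(snp)
                break
    return grouped


# Made-up variant calls, for illustration only
calls = [
    Snp(100, "C", "T"),
    Snp(22000, "A", "G"),
    Snp(26300, "G", "A"),
    Snp(28500, "G", "A"),
    Snp(28600, "C", "T"),
]
counts = {gene: len(hits) for gene, hits in snps_by_gene(calls).items()}
print(counts)
```

In practice the calls would come from a variant file produced by aligning each genome to the reference, rather than from a hand-written list.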
Microsatellite analysis
Microsatellite analysis is used to identify repetitive sequences in the genome. These sequences have a significant impact on the emergence of diseases and their evolution. In this study, microsatellite analysis was performed using the IMEX (Imperfect Microsatellite Extractor) and FMSD (Fast Microsatellite Discovery) online tools. No significant presence of microsatellites was found using IMEX. However, FMSD revealed more microsatellites in MERS-CoV than in the other genomes. The SARS-CoV-2 genome showed the highest incidence of compound microsatellites.
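To make the idea of a microsatellite concrete, here is a minimal sketch that finds perfect tandem repeats with a regular expression. It does not reproduce IMEX or FMSD, which also handle imperfect and compound repeats; the function name `find_microsatellites` and the thresholds (motifs of 1 to 6 bases, at least 3 copies) are assumptions chosen for illustration.

```python
import re


def find_microsatellites(seq, max_motif=6, min_repeats=3):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats.

    start is a 0-based index into seq. Thresholds are illustrative defaults.
    """
    seq = seq.upper()
    hits = []
    for motif_len in range(1, max_motif + 1):
        # A motif of motif_len bases followed by at least (min_repeats - 1) more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA" or "ATAT")
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


# Toy sequence invented for illustration, not real viral sequence
toy = "GGCATATATATCCAAAAAGTCAGCAGCAGTT"
for start, motif, copies in find_microsatellites(toy):
    print(start, motif, copies)
```

Compound microsatellites, as reported for SARS-CoV-2, would correspond to two or more such repeats lying next to each other within a short distance, which a post-processing step over the returned list could detect.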
In summary, analysis of the phylogenetic tree showed that SARS-CoV-2 is most closely related to BAT-CoV, and its second closest relative is SARS-CoV. All MERS-CoV strains showed a distant relationship with SARS-CoV-2. In the analysis of genetic variants, more mutations were found in MERS-CoV than in SARS-CoV and BAT-CoV. Together, the phylogenetic analysis, the study of genetic variation, the multiple sequence alignment, and the microsatellite analysis indicated that the bat is the native host of SARS-CoV-2 and that BAT-CoV is closely related to SARS-CoV-2. An intermediate host may have initiated the transmission of COVID-19 from bats to humans. However, more research is required to validate this assumption. The FMSD tool revealed that SARS-CoV is more closely associated with SARS-CoV-2 than with BAT-CoV.
Journal reference:
- Rehman, AH et al. (2021). Comprehensive comparative genomic and microsatellite analysis of coronavirus SARS, MERS, BAT-SARS and COVID-19. Journal of Medical Virology, https://doi.org/10.1002/jmv.26974, https://onlinelibrary.wiley.com/doi/10.1002/jmv.26974