Homologous Modeling Results of All Key Proteins of SARS-CoV-2
Mar 05, 2020
In response to the severe COVID-19 epidemic, a joint team formed by HUAWEI CLOUD EIHealth, Professor Li Yan (School of Basic Medicine, Tongji Medical College, Huazhong University of Science & Technology), Professor Liu Bing (First Affiliated Hospital of Xi'an Jiaotong University), researcher Han Dali (Beijing Institute of Genomics, Chinese Academy of Sciences), and Dr. Ke Zunhui (Sixth Clinical School, Tongji Medical College, Huazhong University of Science & Technology) conducted ultra-large-scale computer-aided drug screening against multiple SARS-CoV-2 target proteins. The following are the methods and results of homologous modeling for all key SARS-CoV-2 proteins:
Protein Homologous Modeling
Protein homologous modeling (also called homology modeling) applies when a protein of unknown structure has a primary structure (amino acid sequence) similar to that of a protein of known structure. The known structure then serves as a template, and a 3D model of the unknown protein is built computationally from its primary structure.
Homologous modeling rests on two assumptions: 1. The structure of a protein is determined by its amino acid sequence, so if the primary structure is known, the secondary and tertiary structures can in principle be derived from it. 2. The tertiary structure of proteins is more conserved during evolution than the primary structure. If two proteins share 50% sequence identity, about 90% of their α-carbon (Cα) atoms deviate in position by no more than 3 Å.
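The second assumption is usually checked by superimposing two structures and comparing corresponding α-carbon positions. The minimal Python sketch below assumes the two Cα coordinate lists are already superimposed in the same frame and paired residue by residue (for example, from a sequence alignment); the function name is illustrative. It computes the fraction of Cα pairs lying within a 3 Å cutoff.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def fraction_within(ca_a: Sequence[Coord], ca_b: Sequence[Coord], cutoff: float = 3.0) -> float:
    """Fraction of paired C-alpha atoms whose positions differ by at most `cutoff` angstroms.

    Assumes both lists are already superimposed and that ca_a[i] and ca_b[i]
    belong to aligned residues; no superposition is performed here.
    """
    if len(ca_a) != len(ca_b) or not ca_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    close = sum(1 for p, q in zip(ca_a, ca_b) if math.dist(p, q) <= cutoff)
    return close / len(ca_a)
```

For four toy pairs that lie 0.5, 1.0, 2.0 and 5.0 Å apart, three fall inside the cutoff and the function returns 0.75.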
Building 3D Structure of SARS-CoV-2 Proteins Using SARS-CoV Protein Structure as Template
SARS-CoV-2 and SARS-CoV are very similar: their amino acid sequence identity reaches 76.47%. In addition, the 3D structures of SARS-CoV proteins have already been resolved experimentally. These known structures can therefore be used as templates to build SARS-CoV-2 protein structures and guide drug development.
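Identity figures such as 76.47% depend on how they are computed, and the source does not state its convention. As an illustration only, the sketch below computes percent identity from two already-aligned (gapped) sequences, dividing identical positions by the number of columns where neither sequence has a gap; that denominator is an assumption, and other tools divide by alignment length or by the shorter sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two gapped sequences taken from a pairwise alignment.

    Convention used here (one of several in use): identical positions divided by
    the number of columns in which neither sequence has a gap ('-').
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)
```

For example, `percent_identity("MKT-LLV", "MKSALLV")` compares six ungapped columns, five of them identical, and returns about 83.3.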
Data and Methods
Primary Structure Extraction of SARS-CoV-2 Proteins
The SARS-CoV-2 genome used is the reference sequence published on NCBI (NC_045512.2). Because genome annotation data were lacking, the primary structures of the functional proteins could not be obtained directly. Instead, the sequence of each SARS-CoV protein was aligned against the SARS-CoV-2 genome, and the best-matching aligned region was taken as the primary structure of the corresponding SARS-CoV-2 protein. In this way, we obtained the primary structures of 20 SARS-CoV-2 proteins: 16 non-structural proteins (NSP1-16) and 4 structural proteins (S, M, N, E).
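The alignment step can be illustrated with a minimal Python sketch; it is not the team's actual pipeline, and the source does not name the alignment tool used. The sketch translates the genome in its three forward reading frames and runs a Smith-Waterman local alignment of a SARS-CoV protein against each frame. Assumptions to note: the function names are hypothetical, a toy scoring scheme (+2 match, -1 mismatch, -2 gap) stands in for a substitution matrix such as BLOSUM62, the -1 ribosomal frameshift in ORF1ab is ignored, and the pure-Python O(mn) loop is only practical for short inputs (a dedicated tool such as tblastn would normally be used).

```python
from typing import Tuple

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int) -> str:
    """Translate one forward reading frame (0, 1 or 2); codons with non-ACGT bases become 'X'."""
    dna = dna.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3))


def best_local_match(query: str, target: str, match: int = 2, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, int, int]:
    """Smith-Waterman local alignment with a linear gap penalty (toy scores, an assumption).

    Returns (score, start, end) so that target[start:end] is the best-matching region.
    Each DP cell carries the target index where its alignment began, so no traceback is needed.
    """
    n = len(target)
    prev_score, prev_start = [0] * (n + 1), list(range(n + 1))
    best = (0, 0, 0)
    for q in query:
        cur_score, cur_start = [0] * (n + 1), list(range(n + 1))
        for j in range(1, n + 1):
            diag = prev_score[j - 1] + (match if q == target[j - 1] else mismatch)
            cur_score[j], cur_start[j] = max(
                (0, j),                                      # empty alignment: restart after target[j-1]
                (diag, prev_start[j - 1]),                   # align q with target[j-1]
                (prev_score[j] + gap, prev_start[j]),        # q aligned to a gap
                (cur_score[j - 1] + gap, cur_start[j - 1]),  # target[j-1] aligned to a gap
                key=lambda cell: cell[0],
            )
            if cur_score[j] > best[0]:
                best = (cur_score[j], cur_start[j], j)
        prev_score, prev_start = cur_score, cur_start
    return best


def locate_protein(protein: str, genome: str) -> Tuple[int, int, int, int]:
    """Best hit of a protein over the three forward frames as (score, frame, nt_start, nt_end).

    genome[nt_start:nt_end] is the nucleotide region encoding the matched residues.
    """
    hits = []
    for frame in range(3):
        score, aa_start, aa_end = best_local_match(protein, translate(genome, frame))
        hits.append((score, frame, frame + 3 * aa_start, frame + 3 * aa_end))
    return max(hits, key=lambda hit: hit[0])
```

For example, `locate_protein("MAWHK", "CCATGGCTTGGCATAAATAG")` finds the toy peptide in frame 2 and returns `(10, 2, 2, 17)`, so nucleotides 2 to 17 encode it.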
SWISS-MODEL is a fully automated protein structure homology-modeling server. It accepts the sequence of the target protein and, optionally, a reference template specified by the user. If no template is specified, the server automatically selects the template with the highest degree of matching.
The following shows the procedure and results of homologous modeling using SWISS-MODEL.
Step 1: Enter the protein sequence.
On the SWISS-MODEL home page (https://swissmodel.expasy.org/), click Start Modeling and upload the protein sequence in the FASTA format. Click Search For Templates to search for the optimal template proteins.
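Step 1 expects the sequence in FASTA format: a header line starting with `>` followed by the sequence lines. A small helper such as the hypothetical `to_fasta` below can prepare the input; the 60-residue line width is a common convention, not a SWISS-MODEL requirement stated in the source.

```python
def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one protein sequence as a FASTA record (header line plus wrapped sequence lines)."""
    seq = "".join(sequence.split()).upper()  # remove spaces and line breaks
    if not seq.isalpha():
        raise ValueError("sequence must contain only one-letter residue codes")
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"
```

The returned string can be written to a `.fasta` file or pasted into the sequence box; the header text is free-form.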
Step 2: Select a suitable template protein.
SWISS-MODEL lists the template proteins matched by primary structure, together with each template's degree of matching and its parameters. The optimal template protein for homologous modeling is selected according to the following criteria:
1. To ensure a reliable model, the sequence identity between the target and template proteins should be over 30%. The template with the highest identity is preferred.
2. The SARS-CoV template protein is preferred for homologous modeling.
3. When identities are similar, templates whose structures were determined by high-precision X-ray crystallography are preferred. If no X-ray structure is available, check the structure resolutions in the PDB and prefer the template with the higher resolution.
4. If the Oligo-State column lists both homo- and hetero-oligomeric templates, select both.
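The four criteria can be read as a sort order. The sketch below is one possible encoding, not SWISS-MODEL behavior, and it rests on several assumptions: the `Template` fields and the placeholder IDs in the test are hypothetical (not SWISS-MODEL's output schema), identities falling in the same 5-point bin count as "similar", criterion 1 is applied before criterion 2, and criterion 4 (Oligo-State) is left to manual selection because it keeps several templates rather than ranking them.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Template:
    """One candidate template (illustrative fields, not SWISS-MODEL's schema)."""
    pdb_id: str
    identity: float              # % sequence identity to the target
    organism: str                # source organism of the template, e.g. "SARS-CoV"
    method: str                  # experimental method, e.g. "X-ray" or "EM"
    resolution: Optional[float]  # in angstroms (lower = higher resolution); None if unknown


def rank_templates(candidates: List[Template], min_identity: float = 30.0,
                   bin_width: float = 5.0) -> List[Template]:
    """Return eligible templates ordered best first.

    Identities in the same `bin_width`-wide bin are treated as "similar" (an assumption).
    """
    eligible = [t for t in candidates if t.identity > min_identity]  # criterion 1 threshold

    def key(t: Template):
        return (
            -math.floor(t.identity / bin_width),  # criterion 1: higher identity bin first
            t.organism != "SARS-CoV",             # criterion 2: SARS-CoV templates first
            t.method.lower() != "x-ray",          # criterion 3: X-ray structures first
            t.resolution if t.resolution is not None else math.inf,  # then better resolution
            -t.identity,                          # finally the exact identity
        )

    return sorted(eligible, key=key)
```

With a 5-point bin, templates at 96%, 97% and 98.5% identity are all "similar", so a SARS-CoV X-ray template outranks a SARS-CoV EM template, which in turn outranks a non-SARS-CoV template; a 25% template is discarded.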
Step 3: Homologous modeling
After selecting the optimal template protein, click Build Models on the page to start homologous modeling automatically. For short proteins (<100 residues), modeling usually takes a few minutes; for long proteins (>1000 residues), it usually takes about 20 minutes. When modeling is complete, the 3D structures of the template and target proteins can be downloaded directly for subsequent analysis.
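Downloaded models are plain PDB-format text, so a first analysis step is often to pull out the Cα atoms. The sketch below (the function name is hypothetical) relies on the fixed-column layout of PDB `ATOM` records. It reads only the first model, ignores alternate-location indicators, and skips `HETATM` records, so calcium ions (also named "CA") are not picked up.

```python
from typing import List, Tuple

Atom = Tuple[str, int, str, Tuple[float, float, float]]


def read_ca_atoms(pdb_text: str) -> List[Atom]:
    """Extract C-alpha atoms as (chain, residue number, residue name, (x, y, z)).

    Fixed ATOM columns (1-based): atom name 13-16, residue name 18-20, chain 22,
    residue number 23-26, x 31-38, y 39-46, z 47-54.
    Parsing stops at the first ENDMDL record, so only the first model is read.
    """
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[21], int(line[22:26]), line[17:20].strip(), xyz))
    return atoms
```

For a downloaded model saved under a hypothetical name such as `model.pdb`, `read_ca_atoms(open("model.pdb").read())` returns one entry per residue of the first model.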
Step 4: Molecular dynamics simulation
Protein structures obtained by homologous modeling can be used in molecular dynamics simulations. Such simulations can be run with tools such as GROMACS and are usually time-consuming. HUAWEI CLOUD EIHealth provides an accelerated version of GROMACS that completes this process six times faster than the standard version.
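For orientation, a typical preparation pipeline for one modeled structure in standard GROMACS is sketched below. This is an illustration under stated assumptions, not EIHealth's accelerated workflow, whose interface the source does not describe. The force field (`amber99sb-ildn`), water model (`tip3p`), box settings, file names, and the `ions.mdp`/`minim.mdp` parameter files are all hypothetical choices, and equilibration and production runs are omitted. By default the helper only prints the commands.

```python
import shlex
import subprocess
from typing import List


def gromacs_prep_commands(pdb: str, forcefield: str = "amber99sb-ildn",
                          water: str = "tip3p") -> List[List[str]]:
    """GROMACS steps from a model PDB to an energy-minimized, solvated, neutralized system."""
    return [
        # Topology; -ignh discards any hydrogens in the model and rebuilds them.
        ["gmx", "pdb2gmx", "-f", pdb, "-o", "processed.gro", "-p", "topol.top",
         "-ff", forcefield, "-water", water, "-ignh"],
        # Center the protein in a rhombic dodecahedron with at least 1.0 nm to the box edge.
        ["gmx", "editconf", "-f", "processed.gro", "-o", "boxed.gro", "-c", "-d", "1.0",
         "-bt", "dodecahedron"],
        # Fill the box with water (spc216.gro serves 3-point water models such as TIP3P).
        ["gmx", "solvate", "-cp", "boxed.gro", "-cs", "spc216.gro", "-o", "solvated.gro",
         "-p", "topol.top"],
        # Replace solvent molecules with Na+/Cl- to neutralize the net charge.
        ["gmx", "grompp", "-f", "ions.mdp", "-c", "solvated.gro", "-p", "topol.top", "-o", "ions.tpr"],
        ["gmx", "genion", "-s", "ions.tpr", "-o", "ionized.gro", "-p", "topol.top",
         "-pname", "NA", "-nname", "CL", "-neutral"],
        # Energy minimization.
        ["gmx", "grompp", "-f", "minim.mdp", "-c", "ionized.gro", "-p", "topol.top", "-o", "em.tpr"],
        ["gmx", "mdrun", "-v", "-deffnm", "em"],
    ]


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False (requires GROMACS on PATH)."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # genion asks interactively which group to replace with ions; answer "SOL" (water).
            answer = "SOL\n" if cmd[1] == "genion" else None
            subprocess.run(cmd, input=answer, text=True, check=True)
```

Keeping the commands as data makes the pipeline easy to inspect before running it, and the same list could be submitted to a cluster scheduler instead of executed locally.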
Modeling Result Statistics
Of the 20 SARS-CoV-2 proteins modeled from their primary structures, 15 are highly homologous to SARS-CoV proteins, with sequence identities above 70%. The 3D conformations of these proteins are similar to those of their template proteins.
One protein, NSP4, has no good homologous protein in SARS-CoV. It was modeled using the corresponding protein of mouse hepatitis virus strain A59 (MHV-A59) as the template, with a sequence identity above 60%.
The homologous modeling results for the remaining four proteins are not satisfactory. NSP2, NSP6, and M have no suitable templates: even the best-matching template has a sequence identity below 30%. In addition, NSP11 is only 11 residues long, which is too short to meet the modeling requirements.
The following table lists the lengths of the target proteins, the selected template proteins, and the parameters used for homologous modeling.
The 3D structures of the template proteins and of the SARS-CoV-2 proteins obtained by homologous modeling are saved in PDB format. For convenient inspection, the Notebook tool of the HUAWEI CLOUD EIHealth platform provides the plug-ins and tools required for visualization, so you can interactively rotate and view the 3D structure of any of these proteins.
The preceding data, algorithms, and tools have been integrated into the HUAWEI CLOUD EIHealth platform. Using the powerful computing capability of the HUAWEI CLOUD Ascend AI clusters, you can complete end-to-end analysis in a time- and labor-efficient manner.