May 08, 2020

HUAWEI CLOUD EIHealth recently used its self-developed AI modeling and knowledge graph technologies to quickly build a new AI-based knowledge graph from multiple records of COVID-19 academic publications, helping researchers carry out antiviral research and drug R&D quickly and effectively.

In early 2020, a new virus called SARS-CoV-2, also known as the novel coronavirus, spread rapidly around the globe. At first, scientists did not clearly understand its pathogenesis or transmission mechanism, which posed great challenges for disease prevention and treatment. As researchers in numerous countries continued to collect data on SARS-CoV-2 from clinical research and laboratory studies, a large number of articles were published in scientific journals within a short time. So far, more than 2,000 articles related to SARS-CoV-2 have been published on several mainstream preprint servers.

To help researchers summarize and query the knowledge contained in this large body of literature on the virus, HUAWEI CLOUD used the ModelArts Pro knowledge graph suite to automatically extract entities and relationships from the scientific publications. The result is the first SARS-CoV-2 research knowledge graph, which contains entity types such as medicines, diseases, viral proteins, and human proteins.


One of the difficulties in this work is identifying domain-specific named entities precisely and extracting the relationships between them. The R&D team used the HUAWEI CLOUD Knowledge Graph Service and the pre-trained language model BERT-MK (BERT-based language model with Medical Knowledge), which was co-developed by Huawei Noah's Ark Lab and HUAWEI CLOUD based on the latest research results in the medical field. In addition, the knowledge graph integrates multiple deep semantic representation and retrieval technologies to achieve better performance. It is worth mentioning that the HUAWEI CLOUD Language and Speech Innovation Lab, one of the knowledge graph contributors, has won first place in multiple scientific literature mining events, including DigSci 2019 and WSDM Cup 2020.



In addition, we provide a visualized query tool that lets users clearly view the knowledge points and associations in the knowledge graph. Each piece of information in the graph can be quickly traced back to its source, so users can identify the relevant articles and the relevant passages within them. For example, by querying the drug lopinavir, we can find that it has certain effects on the Mpro protein of SARS-CoV-2 and on HIV protease. Digging a bit deeper, we can find that colistin and nelfinavir also have specific effects on the Mpro protein of SARS-CoV-2. By viewing a specific relationship, we can directly obtain its source articles. The graph can help biomedical researchers efficiently study viral mechanisms and viral protein interactions, and help drug R&D personnel find drug targets and develop vaccines more effectively.
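The query pattern described above can be illustrated with a minimal Python sketch. It is not the HUAWEI CLOUD implementation or its API. It only assumes that each relationship is stored as a triple together with the article it was extracted from. The class and method names, the relation label "affects", and the article identifiers are all hypothetical placeholders; only the drug and protein names come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str      # entity the relation starts from, e.g. a drug
    relation: str  # hypothetical relation label
    tail: str      # entity the relation points to, e.g. a protein
    source: str    # placeholder identifier of the source article


class TinyGraph:
    """Toy in-memory knowledge graph indexed by head and tail entity."""

    def __init__(self, triples):
        self.by_head = {}
        self.by_tail = {}
        for t in triples:
            self.by_head.setdefault(t.head, []).append(t)
            self.by_tail.setdefault(t.tail, []).append(t)

    def targets_of(self, entity):
        # Entities that the given entity has a relation to.
        return sorted({t.tail for t in self.by_head.get(entity, [])})

    def entities_acting_on(self, target):
        # Entities that have a relation pointing at the given target.
        return sorted({t.head for t in self.by_tail.get(target, [])})

    def sources_for(self, head, tail):
        # Articles from which the head -> tail relation was extracted.
        return sorted(
            {t.source for t in self.by_head.get(head, []) if t.tail == tail}
        )


# Relations taken from the example in the text; sources are placeholders.
graph = TinyGraph([
    Triple("lopinavir", "affects", "SARS-CoV-2 Mpro", "placeholder-article-1"),
    Triple("lopinavir", "affects", "HIV protease", "placeholder-article-2"),
    Triple("colistin", "affects", "SARS-CoV-2 Mpro", "placeholder-article-3"),
    Triple("nelfinavir", "affects", "SARS-CoV-2 Mpro", "placeholder-article-3"),
])

print(graph.targets_of("lopinavir"))
print(graph.entities_acting_on("SARS-CoV-2 Mpro"))
print(graph.sources_for("lopinavir", "SARS-CoV-2 Mpro"))
```

Keeping the source identifier on every triple is what makes it possible to go from a relationship back to the articles that support it. A production graph would store far richer provenance than this sketch does.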

In the midst of the global battle against the coronavirus, HUAWEI CLOUD and Professor Chen Huajun from Zhejiang University have teamed up to release multiple scientific research knowledge graphs on SARS-CoV-2 in OpenKG, including a virus classification graph and an antiviral drug graph. In addition, the HUAWEI CLOUD EIHealth team and researchers have been working together to launch a series of antiviral genome services, medical imaging services, and antiviral drug filtering services, supporting more comprehensive R&D around the world.

