HUAWEI CLOUD, Gold Medal Winner of WSDM Cup
Feb 29, 2020
A team led by HUAWEI CLOUD won the gold medal in the Citation Intent Prediction task of the WSDM Cup, held at the Thirteenth ACM International Conference on Web Search and Data Mining (WSDM) in Houston, USA.
WSDM is one of the most important and influential conferences on search and data mining in the world, and this year marks its thirteenth edition. The WSDM Cup is the competition held alongside the conference. WSDM focuses on search and data mining on the web and social networks, with particular emphasis on practical work such as the design and analysis of search algorithms and the experimental analysis of industry applications, with the goal of improving accuracy.
This year's WSDM Cup had three tracks, and HUAWEI CLOUD took gold in the Citation Intent Prediction task (Report Track). The challenge was to find, in a library of 800,000 papers, the three papers most relevant to a text description of a cited paper.
Academic papers contain the most cutting-edge knowledge in the world. If a computer can understand the information contained in these papers, its capability and scope of understanding can be greatly expanded. In a paper, the author often cites other papers and briefly describes them. If a computer can automatically understand and identify these citations, it can help deepen our understanding of the research context. Furthermore, knowledge graphs, automatic Q&A, and automatic summarization built on this information can enhance scientific research.
HUAWEI CLOUD's solution to the task is "overall recall + re-ranking + ensemble", a strategy designed by the team led by the Language and Speech Innovation Lab of HUAWEI CLOUD. Members of the team are students from South China University of Technology, Huazhong University of Science and Technology, Wuhan University, and Jiangnan University.
First, to achieve a high recall rate and ensure that all relevant papers are returned, lightweight algorithms such as BM25, TF-IDF, and Word2Vec were used to estimate the relevance of the papers and select a set of candidates. Then, more intensive and more precise computation was performed to calculate the similarity between these candidate papers and the description of the citation, and the candidates were re-ranked by their similarity values. Pre-trained deep learning language models such as BERT are well suited to this re-ranking step. Because the papers provided by the contest are from the biomedical field, the team used BioBERT and SciBERT, models pre-trained on biomedical and scientific text, to re-rank the candidate papers. Finally, the three most relevant papers were determined by integrating the results of all models.
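The three stages described above can be illustrated with a minimal Python sketch. It is not the team's implementation: the recall stage uses a plain BM25 scorer, the two re-rankers are simple word-overlap functions standing in for BioBERT and SciBERT, and the ensemble step uses reciprocal rank fusion, which is an assumption because the article does not say how the model results were integrated. All function names, parameters, and the toy data are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    doc_freq = Counter(term for t in tokenized for term in set(t))
    n = len(docs)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def recall(query, docs, top_n):
    """Stage 1: cheap scoring over the whole library, keep top_n candidates."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_n]

# Stand-ins for the BERT-style re-rankers (assumption: any function that
# maps a (query, document) pair to a similarity score could be used here).
def jaccard_scorer(query, doc):
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0

def containment_scorer(query, doc):
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0

def rerank(query, docs, candidates, scorer):
    """Stage 2: re-order only the recalled candidates with a costlier scorer."""
    return sorted(candidates, key=lambda i: scorer(query, docs[i]), reverse=True)

def ensemble(rankings, k=60):
    """Stage 3: merge several rankings with reciprocal rank fusion."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Ties are broken by document index to keep the result deterministic.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))

def top_three(query, docs, recall_size=5):
    candidates = recall(query, docs, recall_size)
    rankings = [rerank(query, docs, candidates, scorer)
                for scorer in (jaccard_scorer, containment_scorer)]
    return ensemble(rankings)[:3]

# Toy library and citation description, invented for illustration only.
DOCS = [
    "deep learning for protein folding prediction",
    "gene expression in yeast cells",
    "protein structure prediction methods",
    "clinical trial of a new vaccine",
    "deep neural networks for image classification",
    "folding kinetics of small protein domains",
]
QUERY = "protein folding prediction with deep learning"

if __name__ == "__main__":
    print(top_three(QUERY, DOCS))
```

The point of the split is cost: the cheap recall stage scans every paper, while the expensive scorers only see the short candidate list, so a slow model can be used for re-ranking without scoring the full library.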
This text matching technology used by HUAWEI CLOUD in the contest can be widely applied to search, chatbots, knowledge graphs, and other relevant fields.
In addition to this award, HUAWEI CLOUD has won other influential competitions in related areas, which is attributed to its full-stack technology in natural language processing. In October 2019, HUAWEI CLOUD won first place in the DigSci Science Data Mining contest (an academic paper search contest), with a precision rate 5% higher than that of the second-place winner. In the final round of the 2019 CCF Big Data & Computing Intelligence Contest, HUAWEI CLOUD was the champion of the entity-level sentiment analysis task in the financial field.
In the real world, HUAWEI CLOUD's language and speech services have been successfully applied to fields that require voice recognition, language understanding, and knowledge management. These fields include but are not limited to government, finance, oil and gas, healthcare, automobile, logistics, insurance, e-commerce, taxation, and media.