Seeking a confidant amid high mountains and flowing water: from "hearing" to "understanding"

The allusion of "high mountains and flowing water" is well known: a true confidant is hard to find, for one must not only hear the melody but also understand it. Fittingly, when the poet Li Bai traveled to Luoyang in the twenty-third year of the Kaiyuan era, he wrote "Hearing a Flute on a Spring Night in Luoyang": "From whose home does the jade flute's sound drift, scattering on the spring breeze through Luoyang City? Tonight, hearing 'Breaking Willow' in the tune, who would not be stirred by longing for home?" The poem expresses exactly this: the poet hears the faint sound of the flute, as though scattered across the city on the spring breeze, and because he understands the tune "Breaking Willow", the traveler's homesickness is awakened.

Voice is a contactless channel for conveying information, which gives it inherent advantages in voice-interaction scenarios such as the home, customer service, vehicles, education, and healthcare. Speech AI tackles the whole chain from "hearing" to "understanding": detecting the faint "flute" amid the noise, recognizing the "Breaking Willow" within the tune, mapping it to the semantics given by knowledge and culture, and finally forming perception.

Soaring ninety thousand miles upward, rooting more than a hundred feet down

On the African savanna grows a plant known as the "king of grasses", the needle grass. For its first six months it is barely an inch tall, seemingly frail, its growth almost invisible. But when the rainy season arrives half a year later, it can shoot up to one or two meters within a few days. Studies show that during that half year the grass has in fact been growing all along, only downward: its roots often exceed 28 meters in length. This striking phenomenon bears out an old saying: read broadly and extract the essence; accumulate richly and apply knowledge judiciously. Building the algorithmic capability of speech services and cultivating customer scenarios in depth likewise demand the resolve to put down deep roots and probe the underlying technical principles. In this issue we continue to share the latest progress of the Huawei Cloud Algorithm Innovation Lab in the speech field, covering key algorithm work on cry detection, keyword spotting, and customized speech recognition.

1. Cry recognition

Current Progress:

1.  The cry recognition algorithm has been deployed in the Puffin AI panoramic camera; product link: https://www.vmall.com/product/10086322059741.html

2.  It has also been launched in the HiLens Skill Market; related link: https://www.huaweicloud.com/product/hilens.html
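For background, cry detection is commonly framed as sliding-window audio-event classification over log-mel features. The snippet below is a minimal, generic feature-extraction sketch; the window length, mel-band count, and the use of librosa are illustrative assumptions, not the algorithm actually shipped in the camera.

```python
import numpy as np
import librosa  # generic audio toolkit used here for illustration only


def log_mel_windows(wav_path, sr=16000, n_mels=64, win_sec=2.0, hop_sec=1.0):
    """Cut a recording into overlapping windows of log-mel features,
    the usual input representation for an audio-event (cry) classifier."""
    y, _ = librosa.load(wav_path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400,
                                         hop_length=160, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)           # [n_mels, frames]
    frames_per_win = int(win_sec * sr / 160)                  # 2 s of 10 ms frames
    frames_per_hop = int(hop_sec * sr / 160)
    windows = [log_mel[:, i:i + frames_per_win]
               for i in range(0, log_mel.shape[1] - frames_per_win + 1,
                              frames_per_hop)]
    return (np.stack(windows) if windows
            else np.empty((0, n_mels, frames_per_win)))

# Each window would then be scored by a small cry / non-cry classifier
# (e.g. a compact CNN); the classifier itself is product-specific.
```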

2. Keyword spotting

Current Progress:

The cloud-side keyword spotting model is about 1.8 MB in size, with a wake-up rate above 95% and a false wake-up rate below once per day. Its performance reaches the industry-leading level, and the service has been launched on ROC assistant.

The device-side model is trained with dynamic_rnn and then exported as static_rnn, which substantially compresses the model: model size under 500 KB, memory usage under 1 MB, CPU usage under 10%, wake-up rate above 95%, and false wake-up rate below once per day. It has been successfully deployed on Hisi3516EV300 / Hisi3518EV200 HiLinux.
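As a rough illustration of this dynamic-to-static conversion, the sketch below assumes TensorFlow 1.x, a single GRU layer of 64 units, 40-dimensional filterbank features, and a 100-frame window; the actual device-side topology, sizes, and training recipe are not described in this post.

```python
import tensorflow as tf  # assumes TensorFlow 1.x

NUM_UNITS, FEAT_DIM, NUM_FRAMES = 64, 40, 100


def training_graph():
    # Training uses dynamic_rnn so variable-length utterances can be fed.
    feats = tf.placeholder(tf.float32, [None, None, FEAT_DIM], name="feats")
    cell = tf.nn.rnn_cell.GRUCell(NUM_UNITS)
    outputs, _ = tf.nn.dynamic_rnn(cell, feats, dtype=tf.float32)
    return feats, outputs


def export_graph():
    # Export unrolls a fixed number of frames with static_rnn, yielding a
    # graph without dynamic control flow that freezes into a much smaller
    # artifact for the embedded target.
    feats = tf.placeholder(tf.float32, [None, NUM_FRAMES, FEAT_DIM], name="feats")
    cell = tf.nn.rnn_cell.GRUCell(NUM_UNITS)
    frames = tf.unstack(feats, num=NUM_FRAMES, axis=1)  # list of [batch, FEAT_DIM]
    outputs, _ = tf.nn.static_rnn(cell, frames, dtype=tf.float32)
    return feats, tf.stack(outputs, axis=1)

# In this setup both graphs create the GRU variables under the same scope
# ("rnn/gru_cell/..."), so a checkpoint trained with the dynamic graph can be
# restored into the static graph before freezing it for deployment.
```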

3. Customized speech recognition (ASR)

Current Progress:


Development of the online decoder engine has been completed, including front-end VAD, audio format processing, customized hot words, core decoding, and other functional modules. The engine currently supports both real-time streaming recognition and short-audio (sentence) recognition interfaces. On 3 public test sets and 8 live-network customer test sets, the current best model outperforms the Huawei Cloud Xunfei engine, and its average recognition gap relative to the Huawei Cloud Jietong engine is within two percentage points.
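To illustrate the role of the front-end VAD module mentioned above, here is a minimal energy-based voice activity detector; the frame sizes and threshold are illustrative assumptions, and the engine's actual VAD is more sophisticated (smoothing, hangover, learned models).

```python
import numpy as np


def energy_vad(samples, sample_rate=16000, frame_ms=25, hop_ms=10,
               threshold_db=-35.0):
    """Mark each frame as speech / non-speech by short-time log energy.

    A deliberately simple stand-in for a front-end VAD: frames whose energy
    exceeds the threshold are treated as speech and passed to the decoder.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    flags = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len].astype(np.float64)
        energy_db = 10.0 * np.log10(np.mean(frame ** 2) + 1e-10)
        flags.append(energy_db > threshold_db)
    return np.array(flags, dtype=bool)


if __name__ == "__main__":
    # Example: one second of low-level noise with a louder tone burst
    # in the middle; only the burst frames should be flagged as speech.
    rng = np.random.default_rng(0)
    audio = 0.01 * rng.standard_normal(16000)
    audio[6000:10000] += 0.3 * np.sin(2 * np.pi * 440 * np.arange(4000) / 16000)
    print(energy_vad(audio).astype(int))
```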