Editor’s note: Voice recognition is taking its place among the world’s major technological revolutions. A Shenzhen startup aims to become a leading machine hearing solution provider within a few years and make a significant contribution to mobile communication.
(BILLY WONG / CHINA DAILY)
While the smartphone has now become an all-powerful and indispensable tool for many people, it seems little progress has been made to improve on one of its very basic functions — phone calls.
Users are often irritated when they cannot hear what the party on the other end of the line is saying, particularly on busy streets amid the incessant honking of vehicles, or in a noisy restaurant or bar.
Traditionally, phone makers have used filters that remove noise by recognizing its sound frequency and pattern, while the most common solution now is to add one or two more microphones to strengthen that noise removal.
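To make the frequency-based filtering described above concrete, here is a minimal sketch of one such approach, a band-pass filter that keeps only a voice band and discards everything else. It is an illustration under stated assumptions, not any phone maker's actual method: the function names, the 300 to 3,400 Hz band (the conventional telephone voice band), and the synthetic test signal are all chosen for this example.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (fine for short illustrative signals)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def band_pass(samples, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Zero every frequency bin outside [low_hz, high_hz] (assumed voice band)."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bin k and its mirror n - k represent the same physical frequency.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    return idft(spectrum)


def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


# Synthetic example: a 1,000 Hz tone stands in for speech, a 60 Hz hum for noise.
SAMPLE_RATE = 8000
N = 400
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N)]
hum = [0.8 * math.sin(2 * math.pi * 60 * t / SAMPLE_RATE) for t in range(N)]
noisy = [a + b for a, b in zip(tone, hum)]

cleaned = band_pass(noisy, SAMPLE_RATE)
residual = [c - s for c, s in zip(cleaned, tone)]

print(f"noisy level:    {rms(noisy):.4f}")
print(f"cleaned level:  {rms(cleaned):.4f}")
print(f"residual level: {rms(residual):.2e}")
```

A fixed filter like this works only when the noise sits outside the voice band. Street noise, restaurant chatter and other people's voices overlap the same frequencies as the speaker, which is why a fixed filter struggles in changing sound environments.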
“The traditional method, however, is not enough to deal with the constantly changing sound environment these days, especially when voice interaction is becoming the new communication channel between humans and machines,” said Miao Jianzhang, founder and chief executive of Elevoc Technology Co.
“The first step in human-machine interaction is to hear clearly, but it’s difficult for a robot to hear the user five or six meters away in a noisy environment based on traditional voice enhancement methods.”
One of the most common applications of machine hearing technology is the use of digital assistants in mobile phones, smart speakers and other IoT gadgets, while the primary factor for a user’s choice is the accuracy rate, an iResearch survey shows.
Meanwhile, about 68 percent of users believe these products need to improve their accuracy rate, a higher share than for timbre, speed of response or new functions.
“Many factors can lead to voice recognition mistakes, including fast speaking, dialects, disturbance from noise or other people’s voices, and different channels of voice source. Therefore, denoising and voice enhancement are significant in increasing voice recognition’s accuracy,” said Sun Shuo, an analyst at consulting firm iResearch.
According to global research company MarketsandMarkets, the value of the overall interactive voice response market is projected to grow from $3.73 billion in 2017 to $5.54 billion by 2023.
However, competition is stiff. Industry players such as Anhui-based iFlytek, a leading global company in voice-recognition technology, as well as internet giants including Tencent and Baidu, have invested huge sums in developing the technology.
Miao — the 31-year-old founder of Elevoc — believes that his company’s advantage lies in the unique front-end signal processing technologies powered by machine learning.
By applying deep learning technologies, Elevoc has developed and deployed a real-time, single-channel noise cancellation algorithm which, it says, is the first of its kind in the world.
The algorithm can separate speech from background noise and enhance the speech without extra hardware, thereby reducing hardware costs and tuning time.
To demonstrate its effect, Miao compared two clips of sound — one original recording and the other processed by Elevoc’s noise cancellation algorithm.
In the first clip, echoes and the sound of fingers knocking on a desk can be heard, but they’re gone in the second clip.
“The technology is like teaching AlphaGo to play the game of Go. We’ve inputted up to 1 million hours of sound samples to train the system so that it can ‘just focus’ on human speech in a diverse environment and enhance it,” said Miao.
Based on its cutting-edge signal processing algorithm, Elevoc has also developed its own voice interaction system.
Miao aims to turn the company into a leading machine hearing solution provider and a setter of industry standards within three years. Established in February 2017, the budding enterprise has acquired several customers in the mobile communication, VoIP (Voice over Internet Protocol) and robot industries.
Elevoc’s speech enhancement solution “Vocplus Telecom” for mobile phones has been applied to the latest product of Smartisan Technology — a domestic phone maker with an annual shipment of 3.4 million units in 2017.
Miao said several other smartphone makers are also testing Elevoc’s technologies. Notably, Xiaomi and US-based chipmaker Qualcomm took part as lead investors in Elevoc’s pre-A-round financing, worth tens of millions of yuan, earlier this year.
The investment will help expand the Shenzhen-based startup’s R&D team and broaden its technology services to meet customer demand from the smartphone, robot, intelligent home appliance, wearable device and vehicle-mounted system industries.