Meet the speech recognition pioneer: Magic Data Technology provides valuable data for machine learning and improves the performance of AI models

The Silicon Review

Voice interaction is one of the fastest-growing markets worldwide. In real-world AI applications, voice interfaces are commonly used to control smart devices, offer human-like services, convert speech into written or visual information, and make communication more efficient. These functions rely on models and algorithms trained with large amounts of speech data suited to the specific scenario. A lack of appropriate data is therefore often the critical bottleneck in the development of voice-driven AI technologies.

In response to this demand for data, Magic Data Technology is committed to advancing the application and development of AI technologies by providing data solutions to AI developers. Founded in 2016 and headquartered in Beijing, China, Magic Data Tech is today one of the leading AI data service providers in the world. Its services cover multiple fields of AI: Automatic Speech Recognition (ASR), Text-To-Speech (TTS), Natural Language Processing (NLP), and Computer Vision (CV). These services are customized to each client's specific data requirements.

Zhang Qingqing is the CEO of Magic Data Technology. She spoke about the company in an exclusive interview with The Silicon Review. Below is an excerpt.

Q. How can the accuracy of speech recognition be improved from the perspective of data?

In machine learning, a model learns automatically to carry out a specific task by being trained with a large amount of labeled data. The same is true of speech recognition and text-to-speech. We have developed a wide range of speech datasets in multiple languages, collected in various scenarios, and these datasets are intended to improve models’ performance. In training, the relevance of the data is a key factor in how a model performs in a specific field or scenario. For instance, in-car infotainment models need to be trained with data collected in driving scenarios. We analyze clients’ needs carefully in order to propose high-quality datasets that match their application scenarios. Magic Data Tech has also formulated a series of tagging rules to meet practical needs, based on our insights into the AI industry and AI application scenarios. This makes our datasets easy for AI models to consume.

Q. While voice recognition technology recognizes most words in the English language, it still struggles with names and slang. What kind of solution do you offer to reduce the negative impact of a limited vocabulary on speech recognition results?

Out-of-vocabulary (OOV) words are an inevitable pain point, as natural languages are constantly evolving. We address this problem by using our intelligent platform to produce and expand pronunciation dictionaries. The platform automatically predicts the pronunciations of new words, using AI models trained on existing entries, and gives annotators reliable cues. In this way, our pronunciation dictionaries can be produced and updated rapidly. In addition, ASR models need a certain amount of speech data to learn new words. Magic Data Tech provides not only large-scale pronunciation dictionaries but also corresponding speech datasets in various languages, built on diversely designed corpora.
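
As a rough illustration of the first step in such a workflow, the sketch below flags out-of-vocabulary words by checking transcript tokens against a pronunciation lexicon. The toy lexicon, the ARPAbet-style phoneme notation, and the function name `find_oov_words` are hypothetical and are not Magic Data Tech's actual tooling. Predicting pronunciations for the flagged words would need a separately trained model, which is not shown here.

```python
import re

# Hypothetical toy lexicon: word -> phoneme sequence (illustrative only).
LEXICON = {
    "play": ["P", "L", "EY"],
    "some": ["S", "AH", "M"],
    "music": ["M", "Y", "UW", "Z", "IH", "K"],
    "by": ["B", "AY"],
}

def find_oov_words(transcript, lexicon):
    """Return words in `transcript` with no entry in `lexicon`, in first-seen order."""
    # Lowercase the text and keep only runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", transcript.lower())
    seen = set()
    oov = []
    for token in tokens:
        if token not in lexicon and token not in seen:
            seen.add(token)
            oov.append(token)
    return oov

# Words returned here would be queued for pronunciation prediction and annotator review.
flagged = find_oov_words("Play some music by Beyonce", LEXICON)
print(flagged)
```

In a setup like this, each flagged word would be passed to annotators together with a predicted pronunciation, and the confirmed entry would then be added to the lexicon.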

Q. How has AI transformed the Data Service Process?

AI is helping to drive significant improvements in data quality in various ways. Magic Data Tech has a complete data processing system with a human-in-the-loop setup. Given the characteristics of big data (it is unstructured, heterogeneous, and needs fast processing), we use intelligent data sorting technology to classify and store big data, intelligent pre-labeling technology to assist the manual structuring process, and intelligent quality inspection technology to improve the efficiency and accuracy of quality control. By supervising the whole process with algorithms and optimizing production efficiency on the supply side, we are actively contributing to the arrival of the ‘4.0 era’ of big data structuring.
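
The human-in-the-loop idea can be sketched as a simple routing rule: machine-generated pre-labels with high confidence are accepted, and the rest are sent to annotators. The `PreLabel` type, the 0.9 threshold, and the sample items below are assumptions made for illustration, not a description of Magic Data Tech's system.

```python
from dataclasses import dataclass

@dataclass
class PreLabel:
    item_id: str
    label: str         # label proposed by a pre-labeling model
    confidence: float  # model confidence in [0, 1]

def route_prelabels(prelabels, threshold=0.9):
    """Split pre-labels into auto-accepted items and items needing manual review."""
    accepted, needs_review = [], []
    for p in prelabels:
        if p.confidence >= threshold:
            accepted.append(p)
        else:
            needs_review.append(p)
    return accepted, needs_review

# Hypothetical batch of pre-labeled utterances.
batch = [
    PreLabel("utt-001", "turn on the radio", 0.97),
    PreLabel("utt-002", "navigate to work", 0.62),
    PreLabel("utt-003", "call home", 0.91),
]
accepted, needs_review = route_prelabels(batch)
print([p.item_id for p in accepted])
print([p.item_id for p in needs_review])
```

The threshold trades annotator workload against the risk of accepting wrong labels. A quality inspection step would typically still sample the accepted items.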

Q. Data centers are expected to continue to play a vital role in the ingestion, computation, storage, and management of information. How do you interpret this?

Data security is one of our major concerns. With the support of our data processing centers, we are able to provide both private deployment and IDC (Internet Data Center) services, which help ensure data security. Moreover, our data processing platform supervises the whole dataset production process, from data collection to quality control, making every step automatic and traceable. The platform also provides data encryption throughout production, from transmission and storage to delivery. In addition, as AI technology develops, localized data processing is becoming an important trend. Thanks to our data processing centers, Magic Data Tech can deliver its services all over the world. These centers allow us to engage speakers and annotators worldwide and support local ingestion and processing of data.

Q. Do you have any new services ready to be launched?

Our products and services keep pace with new trends in both the market and technology. NLP has been a hot topic, especially since the release of BERT. For text processing in the NLP field, we will develop a specialized SaaS system that will help clients build text analysis datasets more efficiently.

Stalwart behind the success of Magic Data Technology

Zhang Qingqing is a speech technology expert and technical director of AI data services. She received her Ph.D. from the Institute of Acoustics, Chinese Academy of Sciences. After postdoctoral studies at LIMSI-CNRS, France, she served as an associate researcher at the Chinese Academy of Sciences.