Article, Developers Blog

Transforming How We Interact with Our Cars: Natural Language Speech Recognition

2019-11-20

The era of searching a navigation system by voice while driving has arrived. How was Hyundai Motor Group's first natural language voice recognition service, fitted to the 8th generation SONATA, developed?

Talk to your car just like you would talk to a friend! The 8th generation SONATA is equipped with natural language processing, an HMG first. We met the engineers behind the development of the natural language processing system and heard about the exciting new possibilities it brings to customers.

Is the new SONATA the first HMG model equipped with natural language processing?

Yes. We began by researching server-based voice recognition technology with natural language processing. Unlike virtual assistants for home speakers, a virtual assistant for automobiles has to account for many factors beyond voice recognition success rate and network speed. We decided early on to fit the new system to the SONATA, our bestselling midsize sedan. The virtual assistant will also be incorporated into the GENESIS G70 and PALISADE.

How much of a difference does voice recognition make compared to touch-screen control?

Suppose you are setting the destination on a navigation system. It takes at least 30 seconds to enter your destination using a touch screen. There are so many steps including finding the search button, touching the screen to bring up the onscreen keyboard, entering the destination letter by letter and so on.

However, all this can take less than 10 seconds if you use voice recognition. Simply tap on voice recognition and say ‘Take me to a Hyundai dealership’ or any other destination. You can also keep your eyes on the road much more easily, which helps keep everyone safe.

How is natural language processed so that users can be provided with the information they ask for?

The microphone installed in the cabin collects sound and transmits it to the voice recognition engine, which digitizes the voice signal and processes it millisecond by millisecond. The processed voice data is put through the voice and language model, which is trained on the voices of many different people, and is translated into text. This process is called ‘dictation’. The dictation result is then analyzed again to identify the speaker’s intention. Once the intention of the voice input is identified, the system searches for the requested information, whether from a map, the news or somewhere else, and displays the results on screen with a voice output. It is quite a complicated process, but it is all done in just a few seconds.
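The stages described above (dictation, intention identification, information lookup) can be sketched in a few lines of Python. This is only an illustrative sketch, not the SONATA's actual implementation: every name here (`dictate`, `identify_intent`, `fetch_result`, `INTENT_KEYWORDS`) is hypothetical, the dictation step is a stand-in that receives words instead of audio, and the intention step uses a simple keyword table where a real system would use trained models.

```python
from dataclasses import dataclass

# Hypothetical keyword table; a production system would use a trained model.
INTENT_KEYWORDS = {
    "navigate": ("take me to", "navigate to", "directions to"),
    "weather": ("weather", "forecast"),
    "news": ("news", "headlines"),
}

# Hypothetical mapping from an intention to the source that is searched.
SOURCES = {"navigate": "map", "weather": "weather service", "news": "news feed"}


@dataclass
class Intent:
    name: str
    query: str


def dictate(audio_frames):
    """Stand-in for dictation: the 'audio' here is already a list of words."""
    return " ".join(audio_frames).lower()


def identify_intent(text):
    """Return the first intention whose key phrase appears in the text."""
    for name, phrases in INTENT_KEYWORDS.items():
        for phrase in phrases:
            if phrase in text:
                # Everything after the key phrase is treated as the query.
                query = text.split(phrase, 1)[1].strip()
                return Intent(name, query)
    return Intent("unknown", text)


def fetch_result(intent):
    """Pick the source to search; None means the request was not understood."""
    source = SOURCES.get(intent.name)
    if source is None:
        return None
    return {"source": source, "query": intent.query}


def handle_utterance(audio_frames):
    """Run the full pipeline: dictation, intention, then lookup."""
    text = dictate(audio_frames)
    intent = identify_intent(text)
    return fetch_result(intent)


if __name__ == "__main__":
    print(handle_utterance(["Take", "me", "to", "Hyundai", "dealership"]))
```

Keeping each stage as a separate function mirrors the description above: the dictation result is plain text, and the intention analysis works only on that text, so either stage could be swapped out without touching the other.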

What kind of technologies are necessary to make natural language processing work?

Gyeong-chul Lee, Senior Researcher

First, extensive know-how is necessary to improve voice recognition. We need a huge amount of natural language data, both spoken and in text form. The voice recognition also needs to be interlinked with popular applications so it can deliver information as requested. A highly functional voice recognition engine is a must.

Voice recognition is incorporated in many apps and devices. However, people do not seem to be using it much. Why is this?

This is because voice recognition technology was first incorporated into smartphone apps. When using a smartphone, you can actually get your result faster if you use your fingers. A lot of smartphone apps also try to offer too many functions and end up doing things that do not match the voice input. However, so-called AI speakers have created an environment where voice input is much more convenient. These personal assistant speakers made voice processing popular, especially among people searching for music.

It is the same for the SONATA. The SONATA’s system was specially designed to provide the information drivers need most while on the road. Voice recognition is superior to a touch screen when you need to keep your eyes on the road and your hands on the steering wheel. Even though our system is still only an early generation version, it already allows drivers to search for places, weather information and news. The climate control system can also be operated with voice recognition. We plan to increase the number of services that can be accessed via voice input.

What was your biggest challenge whilst developing natural language processing for automobiles?

A natural language processing system for automobiles must work while the vehicle is on the move. More specifically, voice recognition must work seamlessly at highway speeds, typically above 100 km/h. The biggest challenge was improving the voice recognition engine so that it maintains a high recognition rate despite cabin noise in a variety of driving environments.

Do different types of vehicle require different types of natural language processing technologies?

Jae-min Jo, Senior Researcher

No, not really. The search function is the same for any vehicle. However, some of the interlinked control functions could differ between vehicles. In addition, because each vehicle model has different noise characteristics, the system has to be adapted to each one.

What would be the next step for the voice recognition system after the SONATA? How would it evolve?

Our goal is to improve overall performance by identifying users’ intentions more accurately and generating more relevant results. We also need to analyze drivers’ usage patterns. Over time, we plan to add new features for improved user convenience and to improve the user interface as well.

How will voice recognition technologies be used in the connected cars of the future?

When fully autonomous vehicles become a reality, drivers will be free to do something other than driving. Voice recognition could be used to control whatever the driver wants to control in the car. I think drivers should be able to search for anything they need using their voice in the connected cars of the future. We are improving the system to reach that level by adding new search functions and interlinking more services.