Recognizing the Advancements in Speech Recognition Technology

By Dominic Hart, NASA [Public domain], via Wikimedia Commons

Speech recognition technology has come a long way. The idea that computers cannot interact intelligently with humans through speech is becoming passé. Talking with virtual entities that reside in computers is no longer limited to science fiction movies. Just look at Siri, Cortana, and the efficient Google Now. You can even add Samsung’s S Voice to the mix. The ability of machines to interpret sounds as words, understand those words, and respond through speech is indeed an impressive achievement.

Factors Affecting Speech Recognition

For this discussion, let’s focus on speech or voice recognition in the English language. Admittedly, the leaps and bounds made in the technology so far mostly apply to English. Recognizing speech in other languages still poses plenty of challenges, and these will probably remain unaddressed for quite some time.

There are many factors involved in achieving good speech recognition technology. The most important among them are as follows:

Software and Hardware Matchup – The matching of computer hardware and software capabilities is arguably the most important factor in good speech recognition tech. It’s not enough to have advanced software. The hardware, the memory and processor in particular, also has to keep up with the requirements of advanced voice recognition software. Thankfully, current technology has already achieved this. Hardware has shrunk in size, yet its computing power has grown despite the miniaturization.

The software used in speech recognition needs to deal with a vast range of word combinations. It’s way beyond what a calculator does. Without the right hardware, advanced software cannot function properly. Conversely, without properly optimized software, speech recognition is unlikely to reach a decent level.

Human Input

Speech recognition software has to deal with nuances in natural language such as the rearrangement of words or the use of idioms and synonyms. That’s why the way people speak has to be studied extensively before software can be configured to understand what humans are saying. This takes a lot of time and, fortunately, a lot of time has already passed since developers of speech recognition software started working on solutions to make computers capable of accurately interpreting and responding to human speech. That’s why we now have the likes of Google Now, Siri, and Cortana.


Obviously, the way speech is introduced to a computer (a smartphone or tablet in particular) has to ensure audio quality. The microphones should be able to capture voices faithfully. As much as possible, background noise should be eliminated so that speech software can properly analyze and respond to voices. Again, we’re fortunate to already have technology capable of clearly receiving vocal sounds while performing noise cancellation.

Kabilan29 at en.wikipedia [CC-BY-3.0], from Wikimedia Commons

Already Better than What You Think

In an article, Will Oremus reckons that “speech recognition has gotten better than you think.” We tend to agree. Not many realize how good voice recognition technology already is. It is going to become even better as companies like Google work on making the technology more attuned to natural human speech, and to language in general.

Just take a look at predictive typing. Speech recognition technology currently employs the same predictive approach in trying to understand human speech. Instead of relying solely on the audio data collected by a microphone, most speech software now matches vocal data against combinations of words commonly used by humans in everyday speech. For instance, the phrase “an ice cream van” could easily be misheard by speech software as “a nice cream van,” but it is not. This is because speech recognition technology now uses a database of phrases, word combinations, idioms, and many other common variations in human speech to more accurately understand what humans are trying to say to machines.
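
To make the “ice cream van” example concrete, here is a minimal sketch in Python of how a phrase database can break a tie between two transcriptions that sound alike. It is only an illustration, not how Siri, Google Now, or any other product actually works: the tiny CORPUS, the function names, and the choice of a bigram (word-pair) model with add-one smoothing are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical stand-in for the "database of phrases" described above.
# A real system would learn from a vastly larger body of text.
CORPUS = [
    "we bought ice cream from an ice cream van",
    "the ice cream van played a nice tune",
    "an ice cream cone is a nice treat",
]


def train_bigrams(sentences):
    """Count single words and adjacent word pairs in the corpus."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # "<s>" marks the sentence start
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def score(phrase, unigrams, bigrams):
    """Log-probability of a phrase under the bigram counts.

    Add-one smoothing keeps unseen word pairs from scoring zero.
    """
    words = ["<s>"] + phrase.split()
    vocab_size = len(unigrams)
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def pick_best(candidates, unigrams, bigrams):
    """Return the candidate transcription that the phrase data favors."""
    return max(candidates, key=lambda c: score(c, unigrams, bigrams))


unigrams, bigrams = train_bigrams(CORPUS)
candidates = ["a nice cream van", "an ice cream van"]
print(pick_best(candidates, unigrams, bigrams))  # prints: an ice cream van
```

Both candidates sound nearly identical, but the word pair “nice cream” never appears in the sample phrases while “an ice” and “ice cream” do, so the second candidate scores higher. Production systems combine a score like this with an acoustic score derived from the audio itself.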

By Idoelm (Own work) [CC-BY-SA-3.0], via Wikimedia Commons

Speech or voice recognition technology is only going to become better in the coming years. If you were already impressed with Siri’s personal assistant skills, you will likely see more improvements in the future. However, not everything will be something you would welcome with open arms. For instance, some companies are now running voice-recognition-enabled ads. This is a good demonstration of technological advancement, but it could become annoying to consumers. Some groups, on the other hand, frown on AI speech technology becoming accessible to children, alleging that it hampers the proper development of the personal and interpersonal skills of young learners.

There will always be those who passionately advocate for the value of human-to-human verbal communication. They will likely assail the “compulsion” to converse with machines. However, it’s important to realize that there’s nothing wrong with speech recognition technology becoming better. The advancement does not replace or reduce the value of human communication or interaction. To criticize people for talking with their smartphones or tablet computers simply reflects a backward mentality. Speech recognition technology is not something people should meet with wariness.