Amherst Bytes
By Devindra Hardawar, Columnist
Would you ever want to talk to your computer? Would you be surprised if I told you that's what I'm doing right now? Speech recognition technology is nothing new to the computing world, but only recently has it reached the point where it could viably replace the keyboard. The software I'm using, Dragon NaturallySpeaking 9, is getting rave reviews from The New York Times and other media outlets. Although the program has been praised for unprecedented accuracy and ease of use, I personally found it difficult to get through this first paragraph. Still, with the reemergence of this technology, I would like to discuss where speech recognition has been and, more importantly, where it is going and how it will affect us.

(At this point I am turning off the NaturallySpeaking software and throwing my microphone across the room.) That first paragraph took me nearly 45 minutes to dictate to Dragon, whereas it would have taken me all of 10 minutes to type. While I'm sure the experience would get dramatically better if I dedicated myself to mastering the program, it's certainly not the best possible introduction to the world of speech recognition. Version 9 may be a considerable upgrade for hardcore NaturallySpeaking users, but it's no better than its predecessors for the average user who just wants to talk instead of type.

NaturallySpeaking 1.0 was released back in 1997, and its creators spent the next few years promoting speech recognition as the logical upgrade from the "archaic" keyboard. Several competitors appeared along the way, including IBM's ViaVoice, but none of them ever really wowed the public. NaturallySpeaking remained the market leader not because it was a perfect speech recognition solution, but because it happened to be better than its competitors. It retains that reputation to this day.

While the implementation of NaturallySpeaking was (and still is) flawed, I can't help but admire the tenacity of its developers. The perfection of speech recognition is undoubtedly the next "big thing" in computing. Beyond dictation, it will let us give voice commands to our computers and other electronic devices. The NaturallySpeaking folks are working right alongside academics to figure out the best way to talk with our computers.

It's not difficult to see how perfectly accurate speech recognition will make our technology more usable. Imagine never having to take your hands off the steering wheel while you tell your car to change radio stations and adjust the temperature. Think of the myriad commands you'd rather shout at your computer than dig through menus within menus to find. Also, consider how much the rise of speech recognition will affect text-to-speech: once computers understand our voices, we will be able to give them voices of their own.

The big question now is how exactly to perfect speech recognition. I don't think the NaturallySpeaking developers will ever stumble upon the solution if they stick with their current approach to speech analysis, which basically requires you to talk like a newscaster so the software can more easily parse your words. That's old-school computer usability: make users adapt to the limitations of the technology instead of the other way around. Rather than our changing the way we speak for the computer, the computer must learn to decipher speech across the endless variety of dialects and accents. And let's not forget that the speech recognition of the future shouldn't require us to walk around wearing microphone headsets all day; it will have to separate our voices from background noise, which will often contain other voices as well.

I suspect that whatever ends up solving the speech recognition dilemma will involve artificial intelligence (A.I.) in some way. That means one of two things will happen: either we'll get perfect speech recognition early (2010-2015) because A.I. advanced sooner than we expected, or we'll get it a bit later, when A.I. is currently expected to hit its stride (2015-2030, according to extrapolations by futurist Ray Kurzweil). Either way, perfect speech recognition is coming at some point, and our children will most likely grow up with it as commonplace as the Internet is for today's teens and 20-somethings.

While Devindra can't yet talk to his computer, you can contact him at dahardawar@amherst.edu to keep him company.

Issue 02, Submitted 2006-09-27 23:05:13