March 14, 2016
Speech Is the Ultimate Invisible Computer Interface
In the next 10 years, more than 50% of computer interactions will happen via voice. The computer, the device, and the legacy interface will disappear; all that will persist is volition, intention, interaction, and results.
In the summer of 1952, Bell Laboratories tested Audrey (Automatic Digit Recognizer), an early voice recognition system that decoded phone number digits spoken over a telephone for automated, operator-assisted calls.
Schematic of Audrey, the early voice recognition system from Bell Laboratories.
In 1962, IBM demonstrated its Shoebox machine at the World's Fair. It could understand 16 words spoken in English and was designed to be a voice-operated calculator.
Demonstration of IBM's Shoebox at the 1962 World's Fair.
Hundreds of advancements followed. Yet for most of its history, speech recognition was mired in speaker-dependent systems that required the user to train the software by reading a long passage or list of words aloud. Even with this training, accuracy was quite poor. There were many reasons for this, chiefly the limits of the software algorithms and the available processing power. Continuous speech recognition, where you simply talk naturally, has only been refined to a useful degree in the last 5 years.
In the last 10 years there has been more advancement than in the previous 50. The arc from 1952 to 2016 has made speech recognition one of the most important technology advancements in computing history.
Speech Requires Less Mechanical Load And Cognitive Load
The most powerful and efficient interface for communication is the human voice. That sounds obvious in this context, and it has had a few million years of evolutionary development behind it. Yet we take speech for granted, because we only recently adopted mechanical systems (typing, clicking, pointing) to interact with computers.
Human speech is a far more refined tool, able to convey densely packed instructions and requests in-situ. The mechanical and cognitive load on the human is far lower when we can utter a phrase like "Alexa, what does my commute look like?" compared to the 30+ cognitive and mechanical steps required even with the best smartphone and apps. The alternative to speech imposes the cognitive and mechanical load of typing, plus the further cognitive load of interpreting what a map is trying to convey. Simply asking a question is far superior.
Speech-based interactions fundamentally have three advantages over current systems: