Speech synthesis, also called text-to-speech, is the simulation of human speech.
Speech is one of the primary means of communication between humans. Text-to-speech is sought after as a way to improve communication and to assist people with certain disabilities, but it has proven challenging to build. The quality of synthesized speech has improved considerably in recent years.
History and Progress of Text-to-Speech Technology
The first known attempt to create simulated speech was made in 1779 by Christian Kratzenstein, a German-born professor whose work was recognized by the Imperial Academy of Sciences in St. Petersburg, Russia. Kratzenstein built acoustic resonators driven by vibrating reeds, similar to those found in musical instruments, to imitate vowel sounds. Other early pioneers include Charles Wheatstone and Alexander Graham Bell. Electrical speech synthesizers were first developed in the 1920s. From the 1930s on, experimentation continued with varying degrees of success. The biggest hurdle has been making simulated speech both understandable and more human-sounding.
Uses of Text-to-Speech Technology
Simulated speech may seem of only passing interest to some, but text-to-speech technology has enriched many lives. Speech synthesis can aid comprehension for people with attention deficit disorder and for those with learning or comprehension disabilities. Others who benefit include people learning English as a second language, people with limited mobility, and people with visual impairments. Businesses also use text-to-speech technology, most often for telephone help desks and customer service lines.
How Does Text-to-Speech Work?
Converting text to speech takes several steps. First, the synthesizer splits the text into individual words and determines where the punctuation marks fall. Next, it must decide how each word should be pronounced. For example, the word "lead" has two pronunciations: it rhymes with "need" when used as a verb and with "red" when it names the metal. The synthesizer must determine which pronunciation is correct from the context. The final step is producing the audio for the words. The voice is generated either from recordings of real human voices or by strictly electronic means.
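The first two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular synthesizer works: the pronunciation table, the list of cue words, the informal phonetic spellings, and the function names tokenize and pronounce are all invented for this example, and real systems rely on much larger lexicons and far more careful analysis of context.

```python
import re

# Hypothetical pronunciation table (an assumption for illustration):
# each homograph maps a rough part-of-speech guess to an informal spelling.
HOMOGRAPHS = {
    "lead": {"verb": "leed", "noun": "led"},
}

# Hypothetical cue words: if one of these comes right before a homograph,
# this sketch guesses that the homograph is being used as a verb.
VERB_CUES = {"to", "will", "can", "must", "should", "i", "we", "you", "they"}

PUNCTUATION = set(".,;:!?")


def tokenize(text):
    """Step 1: split text into word tokens and punctuation tokens, in order.

    Digits and other symbols are ignored in this simplified version.
    """
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?|[.,;:!?]", text)


def pronounce(tokens):
    """Step 2: pair each token with a pronunciation chosen from context."""
    result = []
    previous = ""
    for token in tokens:
        lower = token.lower()
        if lower in HOMOGRAPHS:
            # Very crude context check: look only at the preceding word.
            sense = "verb" if previous in VERB_CUES else "noun"
            result.append((token, HOMOGRAPHS[lower][sense]))
        elif token in PUNCTUATION:
            # Punctuation becomes a pause marker for the audio stage.
            result.append((token, "<pause>"))
        else:
            # Placeholder: a real system would look the word up in a lexicon.
            result.append((token, lower))
        previous = lower
    return result


if __name__ == "__main__":
    sentence = "They will lead. The pipe is lead!"
    for token, sound in pronounce(tokenize(sentence)):
        print(token, "->", sound)
```

In the sample sentence, the first "lead" follows "will" and is treated as the verb, while the second follows "is" and is treated as the metal. The third step, producing audio, is left out because it depends entirely on the voice method used.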
Probably the biggest challenge in text-to-speech technology has been the sound of the voice. Early voice synthesizers sounded robotic, with a flat cadence and no variation in tone. In recent years, great emphasis has been placed on developing more human-like voices, complete with variations in tone and pitch. Many text-to-speech applications now offer a choice of voices, including male and female voices of varying ages and accents.