Speech Recognition

There are many different speech recognition packages on the market. Every one of them will work with AutoCaption:

AM Technologies™ WhyType™
IBM® ViaVoice™
Nuance® Dragon NaturallySpeaking™
Philips® SpeechMagic™
Realize Software Corporation™ Realize Voice™
Sail Technologies™ Media Mining™

Even operating systems are getting into the voice recognition act. Apple Speech Recognition™ makes speech recognition available to Apple OS X® applications. Microsoft® MS Speech™ makes speech recognition available to all Windows® applications -- including AutoCaption. (Which makes us wonder why some caption kits charge $3,000 for something that's free.)

You can create an AutoCaption transcript simply and conveniently, whether you use a specialized application or just take advantage of the free speech recognition built into your operating system.

What to look for in voice recognition

Speed, vocabulary, discrimination and accuracy are the main challenges for any voice recognition system.

Speed is how rapidly a speaker can talk before the system begins to fail. Most people normally speak at around 180 to 230 words per minute -- although our resident teenager seems to zoom along at more than 300 words per minute.
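If you want to know how fast your own speakers talk, a rough check is to divide the word count of a finished transcript by the length of the recording. The sketch below assumes you already have the transcript text and the duration in seconds; the function name is ours and is not part of AutoCaption or any recognition package.

```python
def words_per_minute(transcript: str, duration_seconds: float) -> float:
    """Return the average speaking rate of a transcript in words per minute.

    Assumption: words are whatever is separated by whitespace, which is
    close enough for a rough estimate.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds


if __name__ == "__main__":
    # Six words spoken in two seconds.
    print(words_per_minute("one two three four five six", 2.0))
```

Comparing this number against the rate at which your recognition system starts to stumble tells you whether a speaker needs to slow down.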

Vocabulary is the range of languages, technical words and terms of art that the system will recognize. Many voice recognition systems offer additional "vocabularies" for foreign languages, medical, math and legal speech.

Accuracy is simply how useful the finished transcript is without editing. Each voice recognition package manufacturer makes its own accuracy claims. In our experience, better than 90% accuracy can be obtained if you have an excellent audio system and have diligently trained the system to a single voice.
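One way to put a number on accuracy for your own material is to compare the raw machine transcript with a hand-corrected copy, word by word. The sketch below is an assumption on our part, not how any manufacturer computes its claims: it counts the fewest word substitutions, insertions and deletions needed to turn the machine transcript into the corrected one (a word-level edit distance) and reports one minus that error rate. It ignores punctuation and capitalization differences only to the extent that lowercasing handles them.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return word accuracy (1 - word error rate), clamped at 0.0.

    reference  -- the hand-corrected transcript
    hypothesis -- the raw output of the speech recognizer
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Classic dynamic-programming edit distance, computed over words
    # instead of characters. previous[j] holds the distance between the
    # reference words seen so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # word missing from the hypothesis
                current[j - 1] + 1,      # extra word in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current

    errors = previous[-1]
    return max(0.0, 1.0 - errors / len(ref))


if __name__ == "__main__":
    corrected = "the quick brown fox jumps over the lazy dog today"
    machine = "the quick brown fax jumps over the lazy dog today"
    print(f"{word_accuracy(corrected, machine):.0%}")
```

Run it on a few minutes of your own audio before and after training the system and you have a like-for-like comparison that doesn't depend on anyone's brochure.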

Some limitations

We've yet to see a speech recognition package that will consistently make passable transcripts from speakers it wasn't trained on. To make matters worse, we've yet to see speech recognition that generates usable punctuation unless the speaker articulates the punctuation.

As a rule, speech recognition will do a pretty nice job on your voice if you speak slowly, clearly and consistently. Be aware, however, that most of these systems are designed to "learn" one speaker's speech patterns and aren't very good with multiple speakers, accents, background noise or music.

Transcript "cleanup" will always be necessary.

At a minimum, conflicts between sound-alike words such as "too," "to," "two" and "2," or "so," "sow" and "sew" must be resolved, and punctuation must be added or corrected.
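A cleanup pass can at least point the editor at the words most likely to be wrong. The sketch below is purely illustrative: the word groups contain only the examples mentioned above, the function name is hypothetical, and nothing here picks the correct word for you. It simply lists each sound-alike word it finds, its position in the transcript, and the alternatives a human should consider.

```python
import re

# Hypothetical, deliberately tiny list: only the examples from the text.
# A real cleanup list would be much longer.
SOUND_ALIKE_GROUPS = [
    {"to", "too", "two", "2"},
    {"so", "sow", "sew"},
]


def flag_sound_alikes(transcript: str) -> list[tuple[int, str, list[str]]]:
    """Return (word position, word as written, alternatives) for each
    word in the transcript that belongs to a sound-alike group."""
    lookup = {word: group for group in SOUND_ALIKE_GROUPS for word in group}
    flags = []
    # Treat runs of letters, digits and apostrophes as words.
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9']+", transcript)):
        word = match.group().lower()
        if word in lookup:
            alternatives = sorted(lookup[word] - {word})
            flags.append((position, match.group(), alternatives))
    return flags


if __name__ == "__main__":
    for position, word, alternatives in flag_sound_alikes("I want to sew two shirts"):
        print(f"word {position}: '{word}' could also be {alternatives}")
```

The decision about which spelling is right still belongs to the person doing the cleanup; the list just saves them from reading every line with equal suspicion.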

The punctuation cleanup task is particularly time-consuming in extemporaneous events, like talk shows, reality shows, lectures, sermons and interviews. That's because most people don't naturally speak in complete sentences, and it takes a bit of mucking around to make comprehensible captions.

Experienced voice recognition software users develop efficient ways to vocalize punctuation -- a popping sound for a period, a long "ess" for a comma and so forth. While it sounds comical and takes a while to learn, the technique is quite efficient.

Some AutoCaption users report that voice recognition doesn't really save any time compared to keyboard transcription, but they use it anyway to help protect their captioners from repetitive motion injuries. These folks have employees captioning all day, every day, and don't want to lose good captioners or see their workers' compensation insurance go through the roof.

"In Louisiana, we have a problem with Southern drawl and what we call lazy mouth," reports Capt. John Dunn, who is in charge of the Shreveport police department's voice recognition equipment. "Because of that, the system [non-emergency call routing] often doesn't recognize what [callers] say," he continues, observing that more often than not calls are routed to the wrong place.

According to CNN and the Associated Press (11/17/03), even the interim Chief of Police, Mike Campbell, said, "I can count on one hand when I have been transferred to where I've wanted to go, and I know the system. I can imagine how frustrating it must be for a citizen."
