AST’s Research on Accuracy and Comprehension

Automatic Sync Technologies brings years of experience and expertise in speech processing and multimedia production, offering a solid foundation in speech recognition, audio and voice processing, and software engineering. Work on our core technology began in 1990, resulting in our first-generation product, Lipsync: offline and real-time production software that automatically synchronizes a voice recording with visual media such as animation or text.

In 2003, the Department of Education awarded AST a Small Business Innovation Research (SBIR) grant to investigate innovative ways to use technology for captioning and to build a functioning prototype of an automated closed captioning system. The system would make broadcast and instructional materials more accessible to the deaf community and to others who benefit from closed captioning. This proof-of-concept system was intended to provide a fast, accurate, and inexpensive alternative to traditional captioning, whose high costs and long turnaround times have hampered both compliance with federal regulations and universal access to broadcast and instructional materials.

AST developed a functioning proof-of-concept prototype of an automated, web-based captioning system, which has since evolved into the CaptionSync service, now in its tenth year of serving customers. As part of this research project, AST evaluated speech recognition technology and analyzed the error rates of transcripts produced by speech recognition systems, trained stenographers, and student workers.

Speech recognition products offer an inexpensive way to automate the conversion of speech into text, but the accuracy of their output varies widely. They produce their best results when each file contains a single speaker and the engine has been trained on that speaker's speech. Training typically requires correcting the engine's output for each speaker and, over successive sessions, building up a speaker profile. Output can also be improved by adding terms to the engine's dictionary. When the audio has multiple speakers, poor recording quality, or complicated terminology, or when the speaker has an accent, transcript quality tends to deteriorate. Table 1 summarizes typical error rates and output quality for each type of transcriber.

Transcriber | Typical Error Rate | Result
Trained stenographer | 0.5% to 1% | No problems
Student transcriber | Variable | Expect worse results than a stenographer
Speech recognition, trained | 3% to 5% or more | Varies from acceptable to poor
Speech recognition, untrained | 20% to 40% | Unintelligible

Table 1. Error Rates By Transcriber Type

AST also studied how transcript accuracy affects reader comprehension. Table 2 shows a document with no errors, and Table 3 shows the same document transcribed with 95% accuracy.

Everyone loves a booming market, and most booms happen on the back of technological change. The world’s venture capitalists, having fed on the computing boom of the 1980s, the internet boom of the 1990s and the biotech and nanotech boomlets of the early 2000s, are now looking around for the next one. They think they have found it: energy.

Many past booms have been energy-fed: coal-fired steam power, oil-fired internal-combustion engines, the rise of electricity, even the mass tourism of the jet era. But the past few decades have been quiet on that front. Coal has been cheap. Natural gas has been cheap. The 1970s aside, oil has been cheap. The one real novelty, nuclear power, went spectacularly off the rails. The pressure to innovate has been minimal.

In the space of a couple of years, all that has changed. Oil is no longer cheap; indeed, it has never been more expensive. Moreover, there is growing concern that the supply of oil may soon peak as consumption continues to grow, known supplies run out and new reserves become harder to find.

The idea of growing what you put in the tank of your car, rather than sucking it out of a hole in the ground, no longer looks like economic madness. Nor does the idea of throwing away the tank and plugging your car into an electric socket instead.

Table 2. Document with no errors

Supervalu loves a booming market, and most booms happen on the back of candlestick change. The world’s hagerstown capitalists, having fed on the computing boom of the 1980s, the internet boom of the keicher and the biotech and nanotech boomlets of the early 2000s, are now looking around for the next one. They think they have found it: energy.

Many past booms have been energy-fed: coal-stangle steam power, oil-fired internal-combustion engines, the rise of electricity, even the mass tourism of the jet era. But the past few decades have been quiet on that front. Coal has been cheap. Natural numerically has been cheap. The 1970s aside, oil has been cheap. The one real novelty, nuclear power, went spectacularly off the rails. The pressure to manolis has loibl minimal.

In the space allain a godlove of years, all that has changed. Oil is no longer cheap; indeed, it has never been more expensive. Moreover, there is growing concern that the traveler of oil may soon peak as consumption continues to grow, known fust run out and new reserves become harder to find.

The idea of growing what you put in the moratoria of your car, rather than sucking it out of a hole in the ground, no longer wienke like economic madness. Nor does the idea of throwing away the tank and plugging your car into an electric socket instead.

Table 3. Document transcribed with 95% accuracy
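The tables do not say how accuracy was scored. A common convention in speech recognition, assumed here, is word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference, with accuracy reported as 1 − WER. The sketch below computes WER as a Levenshtein edit distance over words. The `normalize` helper (lowercasing and dropping punctuation) is a hypothetical choice; real scoring tools apply their own normalization rules.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase and split into word tokens, dropping punctuation.

    Assumption: apostrophes and hyphens stay inside words (e.g. "world's",
    "coal-fired"); real scoring tools use their own normalization rules.
    """
    return re.findall(r"[a-z0-9'’\-]+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (Levenshtein DP, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from transcript
                curr[j - 1] + 1,     # insertion: extra word in transcript
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Opening sentences of Table 2 (reference) and Table 3 (transcript).
    reference = ("Everyone loves a booming market, and most booms happen "
                 "on the back of technological change.")
    hypothesis = ("Supervalu loves a booming market, and most booms happen "
                  "on the back of candlestick change.")
    wer = word_error_rate(reference, hypothesis)
    print(f"WER = {wer:.1%}, accuracy = {1 - wer:.1%}")
```

Run on the opening sentences of Tables 2 and 3, the sketch counts two substituted words out of 15 and prints `WER = 13.3%, accuracy = 86.7%`. Errors are rarely spread evenly, so a single sentence can score well below the document's overall accuracy.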

Analysis of comprehension and attention focus indicates that when the error rate exceeds 10%, readers are less able to grasp the main concepts and facts presented. Table 4 shows how comprehension changes at different error rates.

Table 4. Intelligibility vs Error Rate

In an academic environment, accuracy is even more critical because students are assessed on how accurately they retain the material. AST's analysis of the economics of using speech recognition to produce accurate transcripts found that at error rates of 3% or more, the cost of correcting a flawed transcript exceeded the cost of having a trained stenographer produce the transcript.
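The break-even argument can be illustrated with a simple linear cost model. The sketch below assumes, hypothetically, that repairing speech recognition output costs a fixed amount per hour of audio (engine plus one full review pass) plus a fixed amount per erroneous word, and that a stenographer charges a flat rate per hour. `CostModel`, `repaired_asr_cost`, `break_even_error_rate`, and all field names are illustrative; this is not AST's cost method, and no parameter values come from its study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical per-hour cost inputs; none of these come from AST's study."""
    words_per_hour: float      # words spoken in one hour of audio
    asr_cost: float            # cost of running the speech recognition engine
    review_cost: float         # fixed cost of one full editorial review pass
    fix_cost_per_error: float  # marginal cost of correcting one wrong word
    stenographer_cost: float   # cost of a trained stenographer's transcript


def repaired_asr_cost(m: CostModel, error_rate: float) -> float:
    """Cost of an accurate transcript produced by correcting ASR output."""
    expected_errors = error_rate * m.words_per_hour
    return m.asr_cost + m.review_cost + expected_errors * m.fix_cost_per_error


def break_even_error_rate(m: CostModel) -> float:
    """Error rate at which repairing ASR output costs the same as a stenographer.

    Above this rate the stenographer is cheaper under this linear model;
    a result <= 0 means repairing ASR output is never cheaper.
    """
    margin = m.stenographer_cost - m.asr_cost - m.review_cost
    return margin / (m.words_per_hour * m.fix_cost_per_error)
```

Under this model the break-even rate falls as the speaking rate or the per-error correction cost rises. Real rate cards and labor costs would give a threshold for a particular setting; the 3% figure above comes from AST's own analysis, not from this model.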
