Research Studies

AST’s Research on Accuracy and Comprehension

Automatic Sync Technologies brings years of experience and expertise in speech processing and multimedia production, offering a solid foundation in speech recognition, audio and voice processing, and software engineering. Work on our core technology began in 1990 and resulted in our first-generation product, Lipsync: off-line and real-time production software that automatically synchronizes a voice recording with visual media, such as animation or text.

In 2003, AST was awarded a Small Business Innovation Research (SBIR) grant by the Department of Education to investigate innovative ways to leverage technology for captioning and to create a functioning prototype of an automated closed captioning system. The system was intended to facilitate access to broadcast and instructional materials for the deaf community and others who benefit from closed captioning. This proof-of-concept system was to provide a fast, accurate, and inexpensive alternative to traditional captioning, where high costs and long turnaround times have hampered compliance with federal regulations and universal access to broadcast and instructional materials. AST developed a functioning proof-of-concept prototype of an automated, web-based captioning system that has since evolved into the CaptionSync service, now in its tenth year of serving customers.

As part of this research project, AST evaluated the use of speech recognition technology. Analyses of error rates were conducted for speech recognition systems, trained stenographers, and student workers. Speech recognition products offer an inexpensive way to automate the conversion of speech into text, but the accuracy of their results varies widely. They achieve their highest quality output when the files processed contain a single speaker and the system has been trained on that speaker's speech. Training typically requires correcting the output for each speaker and, over successive passes, investing in the creation of a speaker profile. Speech recognition output can also be improved by adding terms to a dictionary. When the audio to be transcribed has multiple speakers, poor recording quality, or complicated terminology, or when the speaker has an accent, the quality of the transcript tends to deteriorate. Table 1 shows typical error rates for various forms of transcription.

Source                    Typical Error Rate    Result
Trained stenographer      0.5% to 1%            No problems
Student transcriber       Variable              Expect to be worse than stenographer
Speech rec: trained       3% to 5+%             Varies from acceptable to poor
Speech rec: untrained     20% to 40%            Unintelligible

Table 1. Error Rates By Transcriber Type

Additional research was performed on comprehension rates and accuracy. Table 2 shows a document with no errors, and Table 3 shows the same document transcribed with 95% accuracy.

Everyone loves a booming market, and most booms happen on the back of technological change. The world’s venture capitalists, having fed on the computing boom of the 1980s, the internet boom of the 1990s and the biotech and nanotech boomlets of the early 2000s, are now looking around for the next one. They think they have found it: energy.

Many past booms have been energy-fed: coal-fired steam power, oil-fired internal-combustion engines, the rise of electricity, even the mass tourism of the jet era. But the past few decades have been quiet on that front. Coal has been cheap. Natural gas has been cheap. The 1970s aside, oil has been cheap. The one real novelty, nuclear power, went spectacularly off the rails. The pressure to innovate has been minimal.

In the space of a couple of years, all that has changed. Oil is no longer cheap; indeed, it has never been more expensive. Moreover, there is growing concern that the supply of oil may soon peak as consumption continues to grow, known supplies run out and new reserves become harder to find.

The idea of growing what you put in the tank of your car, rather than sucking it out of a hole in the ground, no longer looks like economic madness. Nor does the idea of throwing away the tank and plugging your car into an electric socket instead.

Table 2. Document with no errors

Supervalu loves a booming market, and most booms happen on the back of candlestick change. The world’s hagerstown capitalists, having fed on the computing boom of the 1980s, the internet boom of the keicher and the biotech and nanotech boomlets of the early 2000s, are now looking around for the next one. They think they have found it: energy.

Many past booms have been energy-fed: coal-stangle steam power, oil-fired internal-combustion engines, the rise of electricity, even the mass tourism of the jet era. But the past few decades have been quiet on that front. Coal has been cheap. Natural numerically has been cheap. The 1970s aside, oil has been cheap. The one real novelty, nuclear power, went spectacularly off the rails. The pressure to manolis has loibl minimal.

In the space allain a godlove of years, all that has changed. Oil is no longer cheap; indeed, it has never been more expensive. Moreover, there is growing concern that the traveler of oil may soon peak as consumption continues to grow, known fust run out and new reserves become harder to find.

The idea of growing what you put in the moratoria of your car, rather than sucking it out of a hole in the ground, no longer wienke like economic madness. Nor does the idea of throwing away the tank and plugging your car into an electric socket instead.

Table 3. Document transcribed with 95% accuracy
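The source does not say how AST measured accuracy. One common measure is word error rate: the number of word substitutions, insertions, and deletions needed to turn the transcript into the reference, divided by the number of words in the reference. The following is a minimal sketch under that assumption. The function names `tokenize` and `word_error_rate` are illustrative and not part of any AST tool, and the sample strings are the opening words of Tables 2 and 3.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumption: this is the standard WER definition, not necessarily the
    measure AST used in its study.
    """
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from transcript
                curr[j - 1] + 1,     # extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "Everyone loves a booming market"
hypothesis = "Supervalu loves a booming market"
wer = word_error_rate(reference, hypothesis)
print(f"word error rate: {wer:.0%}, accuracy: {1 - wer:.0%}")
```

Passing the full text of Table 2 as `reference` and Table 3 as `hypothesis` would give the error rate of the sample transcript under this definition.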

Analysis of comprehension and attention focus indicates that when the error rate exceeds 10%, readers are less able to comprehend the main concepts and facts presented. Table 4 shows the impact on comprehension at different error rates.

Graph showing large drop in intelligibility for greater than 3% error rate

Table 4. Intelligibility vs Error Rate

In an academic environment, accuracy becomes even more critical because students are assessed on how accurately they retain the material. The economics of using speech recognition systems to deliver accurate results indicated that when error rates were 3% or greater, the cost of repairing a bad transcript outweighed the cost of having a trained stenographer perform the transcription.

Captioning Research Resources

  • And Captions For All. Collins, Robert, San Francisco State University, 2007. Instructional materials were assigned randomly to students: 50% received captioned videos and 50% did not. Students who watched captioned videos were more engaged, were more responsive to questions about the video, and were better able to connect the material to their own lives. Students who received captioned video averaged a 1-point GPA increase over students not exposed to captions.
  • The Closed Captioning Handbook. Robson, Gary, 2004. Augmenting an auditory experience with captions more than doubles retention and comprehension levels.
  • Adult Literacy: Captioned Videotapes and Word Recognition. Rogner, Benjamin Michael, 1992. Adult students who used captioned video presentations progressed significantly better than those using traditional literacy techniques.
  • Dual coding and bilingual memory. Paivio, A., & Lambert, W., 1981. Journal of Verbal Learning & Verbal Behavior, 20, 532-539. Dual Coding Theory postulates that visual and verbal information are processed differently and along distinct channels, with the human mind creating separate representations for the information processed in each channel. Allan Paivio conducted several studies at the University of Western Ontario.
  • Multi-Modal Learning: See It, Hear It, Do It, Master It. Granström, House, & Karlsson, 2002; Clark & Mayer, 2003. Use of two or more senses to avoid sensory overload.
