Three Captioning Methods Used By CC Services

By: Aylin Dunham

You send your video files off to a closed captioning service, and 24 to 48 hours later you have a caption file ready for use with your online video. You generally don’t think about what happened during that time; you may not even look at the captions when they’re finished. It’s tempting to think that all that matters is that you now have a captioned video, so you can check off that theoretical ADA compliance box. But what happens during that captioning turnaround time greatly affects the quality of your captions. The methods used by closed captioning services are important to understand, because your choice can mean the difference between poor-quality, unintelligible captions and high-quality captions that meet the needs of your deaf and hard of hearing viewers.

Below we discuss a few different methods used to create closed captioning. But first, take a look at this infographic, which shows captioning error rates and the likelihood of understanding the captions, depending on the captioning method used.

As shown above, the relationship between closed captioning accuracy and intelligibility is much stronger than you might think; even a small percentage of errors leads to a drastic reduction in intelligibility.
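A little arithmetic makes the compounding effect concrete. The sketch below assumes each word is transcribed independently and uses an illustrative 15-words-per-line caption length; neither assumption comes from the infographic.

```python
# Illustrative sketch: if each word has an independent per-word error rate e,
# the chance that an n-word caption line contains no errors is (1 - e) ** n.
# Even modest per-word error rates compound quickly across a full line.

def error_free_probability(word_error_rate: float, words_per_line: int) -> float:
    """Probability that a caption line of the given length has no errors."""
    return (1 - word_error_rate) ** words_per_line

for wer in (0.01, 0.05, 0.20):
    p = error_free_probability(wer, words_per_line=15)
    print(f"{wer:.0%} word error rate -> {p:.0%} of 15-word lines are error-free")
```

At a 1% word error rate, about 86% of 15-word lines come out clean; at 20%, fewer than 4% do, which is why heavily errored captions read as gibberish rather than as slightly imperfect text.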

Professional Human Transcription

Professional transcribers consistently provide the most accurate captions of all closed captioning methods, with error rates below 1%. Even with heavy accents and overlapping speech, these transcribers are experts at distinguishing and interpreting the most difficult audio. Not only are they trained for efficiency and accuracy, but they also often have expertise in the subjects covered in your video. Professional transcribers who know the jargon and specialized terms of your industry can reach near-perfect accuracy even with challenging content, where an untrained transcriber is likely to misunderstand many terms.

Human transcription initially takes longer than using a speech recognition tool, but because speech recognition has much higher error rates, a human will still need to edit the captions it produces. This can end up taking as much time as, if not more than, using a human transcriber from the beginning. Next we’ll discuss how speech recognition works.

Speech Recognition

Speech recognition has advanced tremendously in the past few years. Some people use it every day to write text messages or to ask their smartphones a question. The problem is that it is still a computer translating speech to text, and there is no perfect algorithm for understanding all speech. Error rates with speech recognition captioning vary from 5% to 40%, depending on variables such as the number of speakers, speaker accent, and whether or not the speech recognition engine is trained to the speaker. If a word, name, or slang term in the video is not in the computer’s dictionary, it will be replaced with a similar-sounding word. Even when the word is in the dictionary, many factors can mislead the computer: heavy accents, overlapping voices, and background noise all contribute to captioning errors. If the video has multiple speakers, speech recognition may not be able to differentiate them, meaning the captions will not specify who is speaking which text.
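The error rates quoted above are typically measured as word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the recognizer’s output into the reference transcript, divided by the reference word count. A minimal sketch, with an invented example of an out-of-vocabulary name being replaced by similar-sounding words:

```python
# Minimal word error rate (WER) sketch using word-level edit distance.
# The example transcripts below are invented for illustration.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# An out-of-vocabulary name ("Khachaturian") and a rare word ("amyloid")
# come back as similar-sounding dictionary words:
ref = "professor khachaturian discussed amyloid plaques"
hyp = "professor catch a turian discussed avoid plaques"
print(f"WER: {word_error_rate(ref, hyp):.0%}")  # → 80%
```

Note how just two misrecognized words in a five-word reference produce an 80% WER, because the name was split into multiple insertions plus a substitution.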

Quality speech recognition technology also relies heavily on learning how someone speaks and which words they are most likely to use in various contexts. For example, when you first buy a new phone, its speech recognition may be quite inaccurate, but over time, as you correct it, the phone learns how you pronounce certain words and names, and it will even add new words to its dictionary, resulting in lower error rates. If your video is captioned by a computer that has not yet learned the subject matter or your speech patterns, the output will be much less accurate and will require a large amount of editing. Companies use speech recognition to try to lower the price of their closed captioning services, but it comes at the cost of quality. This leads us to the next method: using the cheap labor of untrained transcribers, crowdsourcing, and offshore workers.
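One simple form of this "learning" can be sketched as a custom dictionary of recurring misrecognitions that a caption editor builds up over time and applies as a post-processing pass. This is a hypothetical illustration, not how any particular captioning service works, and the example terms are invented:

```python
# Hypothetical sketch: auto-correct a recognizer's recurring misrecognitions
# using a hand-built dictionary of domain terms. All entries are invented.

CORRECTIONS = {
    "my toe chondria": "mitochondria",
    "cat scan": "CAT scan",
}

def apply_corrections(transcript: str) -> str:
    """Replace known misrecognitions with the intended domain terms."""
    fixed = transcript
    for wrong, right in CORRECTIONS.items():
        fixed = fixed.replace(wrong, right)
    return fixed

print(apply_corrections("the my toe chondria appeared on the cat scan"))
# → "the mitochondria appeared on the CAT scan"
```

Each correction an editor adds pays off on every future video with the same vocabulary, which is the same intuition behind training a recognizer to a speaker.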