You send your video files off to a closed captioning service, and 24 to 48 hours later you have a caption file ready for use with your online video. You generally don’t think about what happened during that time; you may not even look at the captions when they’re finished. It’s tempting to assume that all that matters is that you now have a captioned video and can check off that ADA compliance box. But what happens during that captioning turnaround time greatly affects the quality of your captions. Understanding the methods closed captioning services use is important, because your choice can mean the difference between poor-quality, unintelligible captions and high-quality captions that meet the needs of your deaf and hard of hearing viewers.
Below we discuss a few different methods used to create closed captioning. But first, take a look at this infographic, which shows captioning error rates and the likelihood of understanding the captions for each captioning method.
As shown above, the relationship between closed captioning accuracy and intelligibility is much stronger than you might think; even a small percentage of errors leads to a drastic reduction in intelligibility.
Professional Human Transcription
Professional transcribers consistently provide the most accurate captions out of all closed captioning methods, with error rates below 1%. Even with heavy accents and overlapping speech, these transcribers are experts in distinguishing and interpreting the most difficult audio. Not only are they trained in efficiency and accuracy, they also often have expertise in the subjects covered in your video. Professional transcribers with knowledge of specific jargon and specialized terms from your industry can reach near-perfect accuracy even with challenging content, where an untrained transcriber is likely to misunderstand many terms.
Human transcription initially takes longer than using a speech recognition tool, but because speech recognition produces very high error rates, a human will still need to edit the captions it generates. This can end up taking as much time as, if not more than, using a human transcriber from the beginning. Next we’ll discuss how speech recognition works.
Speech Recognition
Speech recognition has advanced tremendously in the past few years; some people use it every day to write text messages or to ask their smartphones a question. The problem is that a computer is still translating the speech to text, and no algorithm understands all speech perfectly. Error rates with speech recognition captioning vary from 5% to 40%, depending on variables such as the number of speakers, speaker accents, and whether the speech recognition engine is trained to the speaker. If a word, name, or slang term in the video is not in the computer’s dictionary, it will be replaced with a similar-sounding word. Even when the word is in the dictionary, many factors can mislead the computer: heavy accents, overlapping voices, and background noise all contribute to captioning errors. If the video has multiple speakers, speech recognition may not be able to differentiate them, so the captions will not specify who is saying what.
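Error-rate figures like the 5% to 40% range above are commonly expressed as word error rate (WER): the number of substituted, deleted, and inserted words divided by the number of words in a correct reference transcript. As a rough illustration only (not the scoring method of any particular captioning vendor), here is a minimal WER calculation:

```python
# Minimal sketch of word error rate (WER), the standard metric behind
# figures like "5% to 40% error".
# WER = (substitutions + deletions + insertions) / words in the reference.
# The example sentences below are made up for illustration.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Levenshtein distance over words, via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                     # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                     # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution plus one insertion in a 7-word sentence ≈ 29% WER,
# even though only a single word was misheard.
ref = "the quarterly earnings call begins at nine"
hyp = "the quart early earnings call begins at nine"
print(f"WER: {word_error_rate(ref, hyp):.0%}")
```

Note how a single misrecognized word ("quarterly" heard as "quart early") already pushes a short sentence far past the few-percent threshold at which comprehension starts to suffer.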
Quality speech recognition technology also relies heavily on learning how someone speaks and which words they are most likely to use in various contexts. For example, when you first buy a new phone, its speech recognition may be highly inaccurate, but over time, as you correct it, the phone learns how you pronounce certain words and names and even adds new words to its dictionary, resulting in lower error rates. If your video is captioned by a computer that has not yet learned your subject matter or speech patterns, the output will be much less accurate and will require extensive editing. Companies use speech recognition to try to lower the price of their closed captioning services, but it comes at the cost of quality. That leads us to the next method: the cheap labor of untrained transcribers, crowdsourcing, and offshore labor.
Untrained Transcribers – Crowdsourcing & Offshore Labor
Crowdsourced captions are created by taking your video, breaking it into small segments, and sending the segments to many different transcribers so that they can complete the captioning for each segment in parallel. Having multiple transcribers means the work goes much faster and can be done more cheaply, but the quality of the captions is at risk with this method. First, companies that use crowdsourcing often hire untrained, anonymous transcribers who could be anywhere, from someone living abroad where wages are lower to a stay-at-home parent looking for a little extra cash. Second, even if the crowdsourced workers who happen to work on your job are relatively skilled transcribers, they will inevitably use different techniques and may spell names or terms differently, leading to inconsistency. This method is fast and cheap, but it comes at the cost of quality and consistency.
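The segment-and-reassemble workflow described above can be sketched roughly as follows. The 60-second segment length and all function names here are assumptions for illustration, not the process of any specific captioning company; the point is that each segment can go to a different transcriber, which is exactly where the inconsistency comes from:

```python
# Hypothetical sketch of a crowdsourced captioning pipeline: cut the video
# timeline into fixed-length segments, farm each one out to a different
# transcriber, then stitch the transcripts back together in order.

def split_into_segments(duration_s: float, segment_s: float = 60.0):
    """Return (start, end) time pairs covering the whole video."""
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        start = end
    return segments

def reassemble(transcribed):
    """Join (start_time, text) pieces back in chronological order."""
    return " ".join(text for _, text in sorted(transcribed))

# A 2.5-minute video yields three segments: (0, 60), (60, 120), (120, 150).
# Each could be transcribed by a different anonymous worker, so a name
# spelled one way in segment 1 may be spelled another way in segment 3.
print(split_into_segments(150.0))
```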
Companies that offer closed captioning services have been trying to cut costs by using cheap labor or computer-based speech recognition to create captions. The problem is that if you want quality captions, these methods end up being inefficient and costly because of the time and effort needed to edit the results to meet your standards. Low-quality captions will always be cheaper, but the ability to comprehend captions drops significantly once the error rate exceeds just a few percentage points. If the quality is too low, your captioning expenditure is wasted, because the people who rely on captions will not be able to follow or fully understand your videos. This is especially critical if you are required by law to provide high-quality captions.
When you submit your next video for captioning, make sure you know and understand the process that the company uses. The quality of your closed captioning depends on it.