AST was founded with the goal of making the process of captioning video faster and more affordable. By bringing down closed captioning costs, we make it easier for companies and organizations to make their video accessible to everyone. We’ve worked with thousands of organizations over the past 10 years, and a few weeks ago we hit another milestone: we delivered our five millionth caption file.
Closed Captioning Cost Savings
Prior to the introduction of CaptionSync by AST, video publishers would often pay more than $10 per minute of video for closed captioning, and would often wait two weeks or more to get back the results of a very manual process. AST has brought down the average cost of closed captioning to around $3 per minute or less in many cases, and it is safe to say that we have saved publishers, educators, government agencies, and other producers of educational video tens of millions of dollars over the last 10 years. We know that our customers are grateful for these cost benefits, and that they appreciate our quick turnaround times (time is money, after all).
However, as video publishers that are new to closed captioning come into the fold and start to add up the cost of captioning the terabytes of video that they plan to produce, it’s easy for them to become overwhelmed by the potential cost. They do the math, multiplying the number of minutes of video by the cost per minute of closed captioning, and suddenly they have a very large new budget line item that they hadn’t anticipated. This inevitably raises the question: could the cost of closed captioning be even lower? In this world of technology and global outsourcing, shouldn’t there be a way to bring down the cost of captioning to almost zero?
And indeed, there is a new crop of closed captioning companies attempting to do just that, with several advertising closed captioning pricing as low as $1.00 per minute. There are a few methods that captioning companies can use to hit these low price points: 1) using speech recognition, 2) using crowd-sourcing, or 3) using very inexpensive offshore labor. Let’s examine each scenario.
Closed Captioning Costs Using Speech Recognition
For many video producers, finding software that could automatically transcribe and caption video in a few minutes would be like finding the holy grail. Speech recognition does hold great promise, and it continues to get better over time. However, even in the best cases, transcription using speech recognition typically yields accuracy in the low 90 percent range. While 90 or 95% accuracy may seem to be “good enough,” the fact is that the intelligibility of captions drops precipitously once error rates exceed 3% (see our research). If you have multiple speakers in the video, non-standard accents, ambient noise, technical content, or anything less than perfect audio quality, accuracy rates quickly drop below 80% with speech recognition.
Software programs that convert audio to text using speech recognition technology are available for costs ranging from a few hundred dollars per user to more than $20,000 for an appliance that can process multiple video or audio files at once. If you have thousands of hours of video to process, this up-front cost may initially seem like a bargain, until you take into account the hidden costs. Your professors, lecturers, and subject matter experts may need to “train” the software to recognize their voices, recording specific content and correcting errors. Even if the system is designed to work without training, correcting the errors is extremely time-consuming and tedious. If transcripts need to be reviewed and corrected by subject matter experts or editors, a few hours spent by these reviewers can quickly erase any cost advantage of using a speech recognition program instead of professional transcription and captioning at $3 per minute.
Crowd-Sourced Captioning Costs
Another option is to crowd-source your closed captioning. Think of this as the Wikipedia model of closed captioning. One person might transcribe and caption a small portion of your video. Another person in another corner of the world works on another minute or two of the video. By the time your one-hour video is finished, several dozen people may have worked on captioning it, and they may have done it for free (like Wikipedia volunteers)! The crowd-sourcing model appeals to the inner libertarian in all of us. Why should professional transcribers have a monopoly on transcription? Doesn’t it make sense to tap into all of that idle brain-power that would otherwise go to waste, watching soap operas or late-night reruns?
Here again, it’s the hidden costs that add up. Just as with a Wikipedia article, a crowd-sourced closed caption file may look pretty good at first glance, but the devil is in the details. Did your crowd-source laborers all spell names and technical terms correctly and consistently? Did all of them take the time to research spellings or acronyms? Do your crowd-source workers always meet their turnaround time deadlines? Did all of them take time to re-listen to portions of the audio where an important phrase was unclear, or might some of them have been distracted by an episode of Ellen playing on TV in the background? Remember, it’s not the crowd-source worker’s reputation on the line if there is a mistake or delay, so they don’t have the same level of motivation or commitment as a professional transcriber.
Before you consider crowd-sourcing as an option, consider the value of your reputation as an educational video producer, and the potential cost associated with errors, inconsistencies, and delays. The quality of your captions should be on par with the quality of your video content. Anything less is doing a disservice to that significant portion of your viewers who choose to, or need to, watch your videos with captions.
Closed Captioning Sweatshops?
Finally, let’s look at one more option: using inexpensive overseas labor for closed captioning. We’re not going to make a protectionist argument, or insist that “captioned in the USA” is the only viable option. However, let’s do the math again and look at the realities of this option. The industry standard for professional transcribers is that it takes on average four to six times the length of an audio recording to create an accurate verbatim transcription, depending on the quality of the audio. In other words, a fifteen-minute video would take a trained professional 60 to 90 minutes, just for transcription. Amateurs or beginners would take longer. Let’s be optimistic and assume that a beginner could transcribe the same video in six to eight times the length of the video. In addition, captioning companies that take advantage of inexpensive labor often have the transcribers manually set the timing of the captions as well (marking the pop-on time stamps and breaks between phrases). This takes additional time, roughly two to four times the length of the video. To add it all up, an untrained captioner could easily spend eight to ten times the length of a video to create a timed caption file. Let’s call it 9X on average, meaning that each minute of video takes nine minutes for this beginner to caption.
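The time estimates above can be sketched as a short calculation. This is only an illustration of the arithmetic in this section: the function name and variables are made up for the example, and the multiples are the ones quoted above. Note that adding the two beginner ranges literally (six to eight times for transcription plus two to four times for timing) gives 8X to 12X, so the 9X working figure sits toward the optimistic end.

```python
def work_minutes(video_minutes, low_multiple, high_multiple):
    """Return the (low, high) working time in minutes for a video of the given length."""
    return video_minutes * low_multiple, video_minutes * high_multiple

video_minutes = 15  # the fifteen-minute example used in this section

# Trained professional, transcription only: 4x to 6x the video length.
professional = work_minutes(video_minutes, 4, 6)

# Beginner, transcription (6x to 8x) plus manual caption timing (2x to 4x).
beginner_low_multiple = 6 + 2
beginner_high_multiple = 8 + 4
beginner = work_minutes(video_minutes, beginner_low_multiple, beginner_high_multiple)

# The working average adopted in this article: 9x the video length.
beginner_average = video_minutes * 9

print(f"Professional transcription: {professional[0]} to {professional[1]} minutes")
print(f"Beginner, transcription plus timing: {beginner[0]} to {beginner[1]} minutes")
print(f"Beginner at the 9X average: {beginner_average} minutes")
```

For the fifteen-minute video, the 9X average works out to 135 minutes of labor for a single timed caption file.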
Now let’s work backward from an advertised price of $1.00 per minute for closed captioning. Despite using overseas labor, these companies have significant overhead here in the U.S.: $8 or $9 goes to Google when you click on one of their ads; they have to pay for the offices of their sales and business development folks in Los Angeles, San Francisco, or New York City; their bankers and venture capitalists take a cut; and so on. Let’s again be optimistic and assume that these companies are giving half of each dollar to the people who did the captioning work. $0.50 for nine minutes of work equates to $3.33 per hour, and that’s assuming the person doing the transcription and captioning is working on captioning video all the time, with no breaks. When you factor it all in, the captioner for a $1-a-minute captioning company is probably making well under $3 per hour. Will $3-per-hour transcribers provide the level of commitment, expertise, and quality that you and your customers deserve?
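The back-of-the-envelope wage calculation above can be written out as a minimal sketch. The function and parameter names are invented for this example; the $1.00 price, the 50% labor share, and the 9X multiple are the assumptions stated in this article. The 80% utilization figure in the second call is purely hypothetical, included only to show how breaks and idle time push the result under $3 per hour.

```python
def effective_hourly_wage(price_per_video_minute, labor_share,
                          work_minutes_per_video_minute, utilization=1.0):
    """Estimate a captioner's hourly pay from the advertised per-minute price.

    utilization is the fraction of each hour actually spent captioning
    (1.0 means no breaks at all).
    """
    pay_per_video_minute = price_per_video_minute * labor_share
    video_minutes_per_hour = 60 / work_minutes_per_video_minute
    return pay_per_video_minute * video_minutes_per_hour * utilization

# Figures from this article: $1.00 per minute, half to labor, 9 minutes of work per video minute.
no_breaks = effective_hourly_wage(1.00, 0.5, 9)

# Hypothetical 80% utilization, to account for breaks and gaps between jobs.
with_breaks = effective_hourly_wage(1.00, 0.5, 9, utilization=0.8)

print(f"No breaks: ${no_breaks:.2f} per hour")
print(f"At 80% utilization (hypothetical): ${with_breaks:.2f} per hour")
```

Changing any one input (a higher labor share, a faster captioner) shifts the result, but under the assumptions used here it stays in the low single digits per hour.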
Why Closed Captioning Quality Matters
Hopefully one point is clear from these examples: if you are creating high-quality, professional video content, your closed captions should be of equally high quality, and you will undoubtedly need to pay more than the bare minimum to get that quality.
In fact, if your organization is covered by ADA requirements, Section 508 regulations, accessibility requirements handled by the U.S. Office for Civil Rights, or similar legislation in many other countries, then the requirement to provide captions at a quality level that is on par with the quality of your video content could be seen as a legal obligation. Here’s why: ADA Title III requires that people with disabilities “may not be denied full and equal enjoyment” of the goods and services provided to others who use those services. Most subsequent legislation and court decisions have supported this tenet of “full and equal enjoyment.” This means that if you provide high-quality educational content but mediocre closed captioning, you’re not treating all of your customers equally, which opens your organization up to potential lawsuits.
The specter of lawsuits may sound harsh, but it really comes down to fairness. If you are creating high-quality video (hiring top-notch subject matter experts and instructional designers, and using high-quality audio and video equipment and software), shouldn’t your closed captions be of equally high quality? And shouldn’t you be willing to pay for that level of quality, for the benefit of those who use the captions? Professional transcribers do not get paid an exorbitant amount for their work, but they do have a commitment to providing your customers with the quality content that they deserve. Anything less would be unfair to your customers.