Closed Captioning Cost and Pricing

AST was founded with the goal of making the process of captioning video faster and more affordable. By bringing down closed captioning costs, we make it easier for companies and organizations to make their video accessible to everyone. We’ve worked with thousands of organizations over the past 10 years, and a few weeks ago we hit another milestone: we delivered our five millionth caption file.

Closed Captioning Cost Savings

Prior to the introduction of CaptionSync by AST, video publishers would often pay more than $10 per minute of video for closed captioning, and would often wait two weeks or more to get back the results of a very manual process. AST has brought down the average cost of closed captioning to around $3 per minute or less in many cases, and it is safe to say that we have saved publishers, educators, government agencies, and other producers of educational video tens of millions of dollars over the last 10 years. We know that our customers are grateful for these cost benefits, and that they appreciate our quick turnaround times (time is money, after all).

However, as video publishers that are new to closed captioning come into the fold and start to add up the cost of captioning the terabytes of video that they plan to produce, it’s easy for them to become overwhelmed by the potential cost. They do the math, multiplying the number of minutes of video by the cost per minute of closed captioning, and suddenly they have a very large new budget line item that they hadn’t anticipated. This inevitably raises the question: could the cost of closed captioning be even lower? In this world of technology and global outsourcing, shouldn’t there be a way to bring down the cost of captioning to almost zero?
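The budget math described above can be sketched in a few lines of Python. This is only an illustration: the function name `captioning_budget` and the 500-hour library are assumptions, while the $10, $3, and $1 per-minute prices are the ones discussed in this article.

```python
def captioning_budget(video_minutes: float, price_per_minute: float) -> float:
    """Return the total captioning cost in dollars."""
    if video_minutes < 0 or price_per_minute < 0:
        raise ValueError("minutes and price must be non-negative")
    return video_minutes * price_per_minute


# Hypothetical library: 500 hours of lecture video.
library_minutes = 500 * 60

# Per-minute prices mentioned in this article.
for price in (10.00, 3.00, 1.00):
    total = captioning_budget(library_minutes, price)
    print(f"${price:.2f}/min -> ${total:,.0f}")
```

For a library of that assumed size, the difference between $10 and $3 per minute is $210,000, which is why the per-minute price gets so much attention.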

And indeed, there is a new crop of closed captioning companies that are attempting to do just that, with several companies advertising closed captioning pricing as low as $1.00 per minute. There are a few methods that captioning companies can use to hit these low cost thresholds: 1) using speech recognition, 2) using crowd-sourcing, or 3) using very inexpensive offshore labor.  Let’s examine each scenario.

Closed Captioning Costs Using Speech Recognition

For many video producers, finding software that could automatically transcribe and caption video in a few minutes would be like finding the holy grail. Speech recognition does hold great promise, and it continues to get better over time. However, even in the best cases, transcription using speech recognition typically yields accuracy in the low 90 percent range. While 90 or 95% accuracy may seem to be “good enough,” the fact is that the intelligibility of captions drops precipitously with error rates higher than 3% (see our research). If you have multiple speakers in the video, non-standard accents, ambient noise, technical content, or anything less than perfect audio quality, accuracy rates quickly drop below 80% with speech recognition.
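To get a feel for what these percentages mean on screen, here is a rough sketch in Python. It assumes a speaking rate of 150 words per minute, which is an illustrative figure and not one taken from our research, and the helper name `errors_per_minute` is hypothetical.

```python
def errors_per_minute(accuracy_percent: float, words_per_minute: float = 150) -> float:
    """Estimate how many caption words are wrong in each minute of video."""
    if not 0 <= accuracy_percent <= 100:
        raise ValueError("accuracy_percent must be between 0 and 100")
    error_rate = (100 - accuracy_percent) / 100
    return words_per_minute * error_rate


# Accuracy levels discussed above, from the 3% error threshold down to 80%.
for accuracy in (97, 95, 90, 80):
    wrong = errors_per_minute(accuracy)
    print(f"{accuracy}% accurate -> about {wrong:.1f} wrong words per minute")
```

Under that assumed speaking rate, 90% accuracy works out to roughly 15 wrong words every minute, and 80% to roughly 30.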

Software programs that convert audio to text using speech recognition technology are available at prices ranging from a few hundred dollars per user to more than $20,000 for an appliance that can process multiple video or audio files at once. If you have thousands of hours of video to process, this up-front cost may initially seem like a bargain, until you take into account the hidden costs. Your professors, lecturers, and subject matter experts may need to “train” the software to recognize their voices, recording specific content and correcting errors. Even if the system is designed to work without training, correcting the errors is extremely time-consuming and tedious. If transcripts need to be reviewed and corrected by subject matter experts or editors, a few hours spent by these reviewers can quickly erase any cost advantages of using a speech recognition program instead of professional transcription and captioning at $3 per minute.
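A back-of-the-envelope comparison makes the hidden cost visible. The sketch below is in Python and rests on several assumptions that are not from this article: a 2,000-hour video library, three minutes of correction per minute of video, and a reviewer paid $60 per hour. The $20,000 appliance price and the $3 per minute professional rate are the figures quoted above; the function name `diy_cost_per_minute` is hypothetical.

```python
def diy_cost_per_minute(software_cost, video_minutes,
                        review_minutes_per_video_minute, reviewer_hourly_rate):
    """Estimate the per-minute cost of captioning with in-house speech recognition."""
    # Spread the up-front software cost over every minute of video.
    amortized_software = software_cost / video_minutes
    # Cost of the reviewer's time spent correcting one minute of video.
    review_labor = review_minutes_per_video_minute * reviewer_hourly_rate / 60
    return amortized_software + review_labor


PROFESSIONAL_RATE = 3.00  # dollars per minute, the figure quoted above

cost = diy_cost_per_minute(
    software_cost=20_000,               # appliance price mentioned above
    video_minutes=2_000 * 60,           # assumed library of 2,000 hours
    review_minutes_per_video_minute=3,  # assumed correction effort
    reviewer_hourly_rate=60,            # assumed reviewer pay, dollars per hour
)
print(f"In-house: ${cost:.2f}/min vs professional: ${PROFESSIONAL_RATE:.2f}/min")
```

With these assumed inputs the software itself adds only about 17 cents per minute; almost all of the in-house cost comes from the reviewer's time.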

Crowd-Sourced Captioning Costs

Another option is to crowd-source your closed captioning. Think of this as the Wikipedia model of closed captioning. One person might transcribe and caption a small portion of your video. Another person in another corner of the world works on another minute or two of the video. By the time your one-hour video is finished, several dozen people may have worked on captioning your video, and they may have done it for free (like Wikipedia volunteers)! The crowd-sourcing model appeals to the inner libertarian in all of us. Why should professional transcribers have a monopoly on transcription? Doesn’t it make sense to tap into all of that idle brain-power that would otherwise go to waste, watching soap operas or late-night reruns?

Here again, it’s the hidden costs that add up. Just as with a Wikipedia article, a crowd-sourced closed caption file may look pretty good at first glance, but the devil is in the details. Did your crowd-source laborers all spell names and technical terms correctly and consistently? Did all of them take the time to research spellings or acronyms?  Do your crowd-source workers always meet their turnaround time deadlines? Did all of them take time to re-listen to portions of the audio where an important phrase was unclear, or might some of them have been distracted by an episode of Ellen playing on TV in the background? Remember, it’s not the crowd-source worker’s reputation on the line if there is a mistake or delay, so they don’t have the same level of motivation or commitment as a professional transcriber.

Before you consider crowd-sourcing as an option, consider the value of your reputation as an educational video producer, and the potential cost associated with errors, inconsistencies, and delays. The quality of your captions should be on par with the quality of your video content. Anything less is doing a disservice to that significant portion of your viewers who choose to, or need to, watch your videos with captions.

Closed Captioning Sweatshops?

Finally, let’s look at one more option: using inexpensive overseas labor for closed captioning. We’re not going to make a protectionist argument, or insist that “captioned in the USA” is the only viable option. However, let’s do the math again and look at the realities of this option. The industry standard for professional transcribers is that it takes on average four to six times the length of an audio recording to create an accurate verbatim transcription, depending on the quality of the audio. In other words, a fifteen-minute video would take a trained professional 60 to 90 minutes, just for transcription. Amateurs or beginners would take longer. Let’s be optimistic and assume that a beginner could transcribe the same video in six to eight times the length of the video. In addition, captioning companies that take advantage of inexpensive labor often have the transcribers manually set the timing of the captions as well (marking the pop-on time stamps and breaks between phrases). This takes additional time, roughly two to four times the length of the video. Adding it all up, an untrained captioner could easily spend eight to twelve times the length of a video to create a timed caption file. Let’s stay optimistic and call it 9X, meaning that each minute of video takes nine minutes for this beginner to caption.
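The multipliers above can be tallied in a few lines. This is a rough sketch in Python; the function name `work_minutes` is hypothetical, and the ranges are the ones quoted in the paragraph. Note that simply summing the beginner's transcription and timing ranges gives 8 to 12 times the video length, so the 9X figure sits near the optimistic end.

```python
def work_minutes(video_minutes, low_multiplier, high_multiplier):
    """Return the (low, high) working time in minutes for a video."""
    return video_minutes * low_multiplier, video_minutes * high_multiplier


# Professional transcription: 4x to 6x the length of the audio.
print(work_minutes(15, 4, 6))  # (60, 90)

# Beginner: 6x to 8x for transcription plus 2x to 4x for manual timing.
beginner_low = 6 + 2
beginner_high = 8 + 4
print(work_minutes(15, beginner_low, beginner_high))  # (120, 180)
```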

Now let’s work backward from an advertised price of $1.00 per minute for closed captioning. Despite using overseas labor, these companies have significant overhead here in the U.S.: $8 or $9 goes to Google when you click on one of their ads, they have to pay for the offices of their sales and business development folks in Los Angeles, San Francisco, or New York City, their bankers and venture capitalists take a cut, etc. Let’s again be optimistic and assume that these companies are giving half of each dollar to the people who did the captioning work. $0.50 for nine minutes of work equates to $3.33 per hour, and that’s assuming the person doing the transcription and captioning is working on captioning video all the time, with no breaks. When you factor it all in, the captioner for a $1-a-minute captioning company is probably making well under $3 per hour. Will $3-per-hour transcribers provide the level of commitment, expertise, and quality that you and your customers deserve?
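The same working-backward calculation, written as a small Python sketch. The function name `implied_hourly_wage` is hypothetical; the inputs are the ones used in the paragraph above: $1.00 per video minute, half of it going to the captioner, and nine minutes of work per minute of video.

```python
def implied_hourly_wage(price_per_video_minute, labor_share, work_minutes_per_video_minute):
    """Hourly pay implied by a per-minute price, assuming nonstop work."""
    pay_per_video_minute = price_per_video_minute * labor_share
    # One hour of work covers 60 / work_minutes_per_video_minute video minutes.
    return pay_per_video_minute * 60 / work_minutes_per_video_minute


wage = implied_hourly_wage(
    price_per_video_minute=1.00,
    labor_share=0.5,
    work_minutes_per_video_minute=9,
)
print(f"${wage:.2f} per hour")  # $3.33 per hour
```

Any time spent on breaks, or waiting for the next file, lowers the effective wage below this figure.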

Why Closed Captioning Quality Matters

Hopefully one point is clear from these examples: if you are creating high-quality, professional video content, your closed captions should be of equally high quality, and you will undoubtedly need to pay more than the bare minimum to get that quality.

In fact, if your organization is covered by ADA requirements, Section 508 regulations, accessibility requirements handled by the U.S. Office for Civil Rights, or similar legislation in many other countries, then the requirement to provide captions at a quality level that is on par with the quality of your video content could be seen as a legal obligation. Here’s why: ADA Title III requires that people with disabilities “may not be denied full and equal enjoyment” of the goods and services provided to others who use those services. Most subsequent legislation and court decisions have supported this tenet of “full and equal enjoyment.” This means that if you provide high-quality educational content but mediocre closed captioning, you’re not treating all of your customers equally, opening your organization up to potential lawsuits.

The specter of lawsuits may sound harsh, but it really comes down to fairness. If you are creating high-quality video (hiring top-notch subject matter experts and instructional designers, and using high-quality audio and video equipment and software), shouldn’t your closed captions be of equally high quality? And shouldn’t you be willing to pay for that level of quality, for the benefit of those who use the captions? Professional transcribers do not get paid an exorbitant amount for their work, but they do have a commitment to providing your customers with the quality content that they deserve. Anything less would be unfair to your customers.



  1. Live stenocaptioners are the answer! They are usually able to achieve 99 percent accuracy, as long as you hire certified captioners. You get what you pay for.

    • Jana, I agree that the adage “you get what you pay for” usually applies in the case of captioning. Live steno-captioners do an amazing job, and they need to be highly skilled to do what they do. Here at AST we only do “offline” or post-production captioning. While some of our transcribers come from a live captioning background or from court reporting, the job is a bit different than live captioning. With live captioning all the prep work is done before the show or lecture (for example, putting together dictionaries of names that might be used), and the captioner must caption very quickly, with little or no time for research or corrections. Our transcribers are trained to pause the video when needed, look up words, and re-listen when necessary. Precision and attention to detail are more important than speed. Punctuation and spelling are critical, otherwise the viewer/listener will not have the optimal experience.

  2. These are scripted shows! How difficult is it to set this up ahead of time? I believe more people use closed captioning than you think. The baby boomers are aging and have listened to loud rock and roll music since they were teens; they are going deaf. Add to that, the popular BBC programs and the fact that even though they are in English, they aren’t easily intelligible just because of different vernacular.

    These companies are making tons of money – they should step to the plate and do it right. Another outsourcing nightmare!

    • Betty, great points. Millions of people need or benefit from captions. People who are deaf or hard of hearing are one segment, but nearly everyone can benefit from captions, especially for content that is challenging to understand due to unfamiliar accents, jargon, slang, acronyms, or technical terms.

      You asked about the use of scripts. Our team does take advantage of scripts and screenplays when they are available. That definitely helps with the process (for example, it helps the transcribers ensure that names, places, and other terms are spelled correctly), but it usually doesn’t replace the need for professional human transcribers and reviewers. With crowd-sourcing options you are much less likely to get this level of attention to detail.

      I agree that it is a shame that many very profitable companies take shortcuts that should not be taken, attempting to save a few dollars while sacrificing the experience of their customers who rely on captions. At the same time, there are many companies and organizations that are stepping up to the plate and doing it right. I saw a presentation today by one of our customers where they said that user experience is their “guiding star,” pointing out that their approach to accessibility is grounded in ethics and morals, not just compliance. Working with customers like that makes me very proud of the work we do here at AST, and I know our whole team feels that way as well.

  3. I have read so many poorly transcribed closed captions on web videos. Many are indecipherable beyond a few words. Now I think I know why… is it the machine or the overseas reporter? Probably both. I agree with the above comments that this is an incredibly necessary service which must be improved upon. Give it back to the people who can do the job correctly.

  4. I was wondering how much it would cost to buy equipment that does closed captioning? What are the names of the equipment? Etc… thank you.

    • Ted, thanks for your question. In addition to equipment, you’ll need training. Captioning is a rewarding and interesting job once you have developed the skills needed to do it efficiently. The National Court Reporters Association is one place to get started learning about the training and equipment needed:
