In addition to our standard transcription, closed captioning, and translation services, AST offers a service called Production Transcripts. This service is a bit different from our transcription and captioning services, so we wanted to take some time to describe when and why you would want to use production transcripts.
What are Production Transcripts?
One way to look at production transcripts is to think of them as something in between standard transcripts and closed captioning. Transcription files contain the entire verbatim dialog of a video or audio recording, but they do not contain any time stamp information. Closed captions, on the other hand, contain very frequent time stamps. The time stamps in a caption file indicate when each caption should appear on screen when the video is played with closed captions turned on, so it’s not uncommon to have closed caption time stamps every few seconds, even mid-sentence. Production transcripts contain some time stamps, but their frequency is a happy medium between ordinary transcripts (no time stamps) and the frequent time stamps of closed caption files.
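To make the difference in time stamp frequency concrete, here is a minimal sketch in Python that thins caption-style cues into a production-style transcript. It is an illustration only, not a description of how AST produces its transcripts: the `Cue` type, the `to_production_transcript` function, and the 30-second `min_gap` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption-style cue (hypothetical structure for this example)."""
    start: float   # seconds from the start of the recording
    speaker: str
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_production_transcript(cues, min_gap=30.0):
    """Merge frequent caption cues into paragraphs with occasional time stamps.

    A new stamped paragraph starts when the speaker changes or when at least
    min_gap seconds have passed since the last stamp (an assumed rule).
    """
    paragraphs = []
    for cue in cues:
        if paragraphs:
            last = paragraphs[-1]
            same_speaker = last["speaker"] == cue.speaker
            recent = cue.start - last["start"] < min_gap
            if same_speaker and recent:
                # Fold this cue into the current paragraph; no new stamp.
                last["text"] += " " + cue.text
                continue
        paragraphs.append(
            {"start": cue.start, "speaker": cue.speaker, "text": cue.text}
        )
    return [
        f"[{format_timestamp(p['start'])}] {p['speaker']}: {p['text']}"
        for p in paragraphs
    ]


cues = [
    Cue(0.0, "Interviewer", "Where were you"),
    Cue(3.0, "Interviewer", "on the night in question?"),
    Cue(6.0, "Witness", "I was at home."),
    Cue(40.0, "Witness", "I did not leave until morning."),
]
for line in to_production_transcript(cues):
    print(line)
```

Four caption cues collapse into three stamped paragraphs here: the two interviewer cues merge under one time stamp, while the witness gets a second stamp only because more than 30 seconds have passed.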
Use Cases for Production Transcripts
When would you use production transcripts? Frequent use cases include interview footage for documentary films, focus group recordings, legal depositions, and daily footage for reality TV or other unscripted television shows. In cases like depositions or focus group recordings, you may not need full caption files if the video or audio will not be used by people who are deaf or hard of hearing, but occasional time stamps are useful for finding the right spot when you need to go back and review parts of long segments. Similarly, for interview footage or dailies, you need an accurate record of who said what, and roughly when they said it, but you do not need full captions until you have the final cut of your show or film.
What about Speech Recognition?
When costs are involved, someone is likely to offer you a cheaper solution. Speech recognition holds great promise, and it is already being used to automate tasks such as interactive telephone systems and medical dictation. However, when it comes to video footage, particularly with free-form speech and more than one speaker, speech recognition is simply not accurate enough. With error rates of 20% or more, transcripts generated by speech recognition systems would certainly not pass muster for legal depositions, and they are also close to useless for the other cases described above. Automatic Sync’s research on accuracy and comprehension shows that error rates of more than 3% have a dramatic impact on comprehension. In other words, at error rates typical of speech recognition systems, a reader of the transcript would have little chance of understanding what was said. Some companies try to use speech recognition as a first pass and then use editors to correct the transcript, but research shows that it is actually less expensive to start with a professional transcriber.
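For readers who want to check a transcript against these figures, the sketch below computes a word error rate in Python. The article does not say how its error rates are measured, so this assumes the common definition: the number of word substitutions, insertions, and deletions needed to turn the transcript into the reference, divided by the number of words in the reference. The function name `word_error_rate` and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count.

    Assumes simple whitespace tokenization and case-insensitive matching;
    real evaluations usually normalize punctuation and numbers as well.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "the witness said that she left at noon on friday"
hypothesis = "the witness said that he left at new on friday"
print(f"{word_error_rate(reference, hypothesis):.0%}")
```

In this made-up ten-word example, two wrong words give a 20% error rate, and both errors ("he" for "she", "new" for "noon") change what the sentence means.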
When you consider the value of your video production team’s time, and the amount of time saved by having a high-quality production transcript available during editing, it’s really no contest; it’s worth making a modest investment early in the production process to get accurate production transcripts.