SpeakUp: A Transcript Markup Language

What is SpeakUp?

A simple text markup language for transcripts of moving pictures and video, including markup for annotation.


When the Folkstreams project needed a way for filmmakers and academic contributors to create and maintain transcripts of the films archived and presented on the Folkstreams website, I decided that a simple text markup language would be the best way to store and edit them.

A transcript markup language defines a set of conventions for formatting plain text (much like wiki text) that is translated into HTML for display. SpeakUp was designed to capture as much of a transcript's content as possible and to preserve its meaning, so that transcripts can later be converted into XML or stored in a database.

SpeakUp is implemented as a module that extends the PEAR Text_Wiki text translation library, so Text_Wiki must be installed to use it.

Although development and documentation of SpeakUp are not yet complete, it is in use on the Folkstreams website.

SpeakUp, including all markup, code and documentation, is open source and released under the GPL. I apologize for the brevity of this document, but the best way to learn SpeakUp is to download the package and experiment with it.

Some Background

Why are transcripts important? As the Folkstreams project developed, project director Tom Davenport and developer Steve Knoblock concluded, over a series of discussions, that transcripts are essential to searching, finding and understanding films online. Two points emerged: transcripts are a rich source of indexable text that helps make media searchable, and, more importantly, they are a rich source of conversation and debate.

Notes are frequently more informative and interesting than the work they annotate. We discovered this was true of film transcripts (see Sadobabies for an example of a conversation in the notes about the nature of folklore). Although there are sophisticated means of capturing the dialog of a moving picture and rendering it as text, transcripts produced this way are inadequate. They lack annotation. They lack the expressive quality of a transcript edited by a knowledgeable person. They are, in a sense, a travesty, like an OCR'ed copy of Dickens left uncorrected.

Some preliminary documentation:

SpeakUp Documentation

SpeakUp in a Nutshell

Developing SpeakUp (OpenDocument, OpenOffice)