WebVTT
May 20, 2023
Web Video Text Tracks, commonly known as WebVTT, is a W3C format for displaying timed text alongside multimedia content played on the web. It grew out of the SubRip (SRT) subtitle format (early drafts were called WebSRT) and is a separate format from the Timed Text Markup Language (TTML), not an extension of it. WebVTT is used to associate captions, subtitles, descriptions, chapters, and metadata with video and audio files, typically through the HTML track element. It is supported by modern web browsers and many media players, which makes it a practical choice for creating accessible and inclusive web content.
WebVTT files are plain text files with a .vtt extension, encoded as UTF-8, and they contain timed text cues that are synchronized with the video or audio content. The cues can be used to display captions, subtitles, and other related information. Each cue consists of a start time, an end time, and the text content that is displayed during that time interval. WebVTT can also carry timed metadata: a track marked as metadata holds cues whose text is data intended for scripts rather than for display. Free-form notes about the file, such as its title, description, and language, can be kept in comment blocks.
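To make the start time, end time, and text structure concrete, here is a minimal Python sketch that writes cues out as WebVTT text. The function names format_timestamp and build_vtt are hypothetical helpers invented for this illustration, and the sketch assumes cues are given as (start seconds, end seconds, text) tuples.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_vtt(cues) -> str:
    """Serialise (start, end, text) tuples into the text of a .vtt file."""
    blocks = ["WEBVTT"]  # the file must start with this header line
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # The header and each cue are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_vtt([(5, 10, "Hello, world!")]))
```

The sketch does not escape special characters or validate that each end time is later than its start time, both of which a production writer would need to handle.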
One of the main purposes of WebVTT is to provide accessibility for people with hearing or visual impairments. Captions let users who are deaf or hard of hearing follow the dialogue and important sounds, and text descriptions of the visual content can be read aloud for users who are blind or have low vision. Subtitles can also provide translations of the content into different languages, making it more accessible to a global audience.
WebVTT is also used for video search engine optimization (SEO). Search engines can use the text content of the captions and subtitles to index the video content and display it in search results. This can help to increase the visibility and reach of the video content.
Basic Syntax
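WebVTT files use a simple syntax that is easy to learn and use. A file begins with the line WEBVTT, followed by a blank line. Each cue then has a timing line, which gives the start and end times separated by --> and may end with optional cue settings, followed by one or more lines of text content. Cues are separated from one another by blank lines.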
Here is an example of a basic WebVTT file:
WEBVTT

00:00:05.000 --> 00:00:10.000
Hello, world!

00:00:12.000 --> 00:00:17.000 align:left size:50%
This is an example of a WebVTT cue.
In this example, the first cue starts at 5 seconds and ends at 10 seconds. The text content of the cue is “Hello, world!”. The second cue starts at 12 seconds and ends at 17 seconds. The text content of the cue is “This is an example of a WebVTT cue.” The align setting left-aligns the text, and the size setting limits the cue box to 50% of the video width; it does not change the font size.
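The structure of a cue can be illustrated with a short Python sketch that reads a WebVTT string back into start times, end times, settings, and text. This is a simplified reader written for this article, not a spec-compliant parser: the names parse_timestamp and parse_cues are hypothetical, and the sketch assumes that blocks are separated by blank lines and that the timing line is the first or second line of a cue block.

```python
import re

# hh:mm:ss.ttt, where the hours part is optional (mm:ss.ttt is also valid)
TIMESTAMP = re.compile(r"(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})")


def parse_timestamp(value: str) -> float:
    """Convert a WebVTT timestamp to seconds."""
    match = TIMESTAMP.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a WebVTT timestamp: {value!r}")
    hours, minutes, seconds, millis = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_cues(vtt_text: str) -> list:
    """Return a list of cue dicts from the text of a .vtt file."""
    blocks = re.split(r"\n{2,}", vtt_text.replace("\r\n", "\n").strip())
    cues = []
    for block in blocks:
        lines = block.split("\n")
        # The timing line is the first line, or the second if the cue has an identifier.
        index = next((i for i, line in enumerate(lines[:2]) if "-->" in line), None)
        if index is None:
            continue  # header or comment block, not a cue
        start, _, rest = lines[index].partition("-->")
        parts = rest.split(None, 1)  # end timestamp, then optional settings
        if not parts:
            continue  # malformed timing line, skipped in this sketch
        cues.append({
            "start": parse_timestamp(start),
            "end": parse_timestamp(parts[0]),
            "settings": parts[1] if len(parts) > 1 else "",
            "text": "\n".join(lines[index + 1:]),
        })
    return cues


SAMPLE = """WEBVTT

00:00:05.000 --> 00:00:10.000
Hello, world!

00:00:12.000 --> 00:00:17.000 align:left size:50%
This is an example of a WebVTT cue.
"""

if __name__ == "__main__":
    for cue in parse_cues(SAMPLE):
        print(cue)
```

Browsers already parse WebVTT natively, so a reader like this is mainly useful for offline tasks such as checking a file or extracting a transcript.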
Cue Settings
Cue settings are optional parameters that can be used to control the appearance and behavior of the text content. The most common cue settings are:
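align: Specifies the alignment of the text within the cue box. Values can be start, center, end, left, or right. Older drafts of the format used middle instead of center.
position: Specifies the horizontal position of the cue box (for horizontal text) as a percentage of the video width. The percentage can be followed by a comma and an alignment keyword, line-left, center, or line-right, that states which part of the cue box is placed at that position, for example position:10%,line-left.
size: Specifies the width of the cue box as a percentage of the video width. It does not control the font size, and it has no keyword values.
vertical: Displays the text vertically instead of horizontally. Values can be rl (lines stack from right to left) or lr (lines stack from left to right).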
Here is an example of cue settings:
WEBVTT

00:00:05.000 --> 00:00:10.000 align:left
This is an example of left-aligned text.

00:00:12.000 --> 00:00:17.000 align:right position:90%
This is an example of right-aligned text set 10% in from the right edge.
In this example, the text of the first cue is left-aligned. The second cue is right-aligned, and position:90% anchors the right edge of its cue box at 90% of the video width, measured from the left. This leaves a 10% gap between the text and the right edge of the video player.
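The settings at the end of a timing line are space-separated name:value pairs, which can be sketched in Python as follows. The function interpret_settings and the result key position_align are hypothetical names chosen for this illustration. The sketch covers only the four settings described above and silently skips anything it does not recognise, which is an assumption about how a lenient reader might behave rather than a statement of what any particular player does.

```python
ALIGN_VALUES = {"start", "center", "end", "left", "right"}
VERTICAL_VALUES = {"rl", "lr"}


def parse_percentage(value: str) -> float:
    """Convert a string such as '50%' to 50.0, rejecting out-of-range input."""
    if not value.endswith("%"):
        raise ValueError(f"missing % sign: {value!r}")
    number = float(value[:-1])
    if not 0 <= number <= 100:
        raise ValueError(f"percentage out of range: {value!r}")
    return number


def interpret_settings(raw: str) -> dict:
    """Turn 'align:left size:50%' into {'align': 'left', 'size': 50.0}."""
    settings = {}
    for token in raw.split():
        name, sep, value = token.partition(":")
        if not (sep and name and value):
            continue  # not a name:value pair
        try:
            if name == "align" and value in ALIGN_VALUES:
                settings["align"] = value
            elif name == "vertical" and value in VERTICAL_VALUES:
                settings["vertical"] = value
            elif name == "size":
                settings["size"] = parse_percentage(value)
            elif name == "position":
                # Optional alignment keyword after a comma, e.g. 10%,line-left
                amount, _, anchor = value.partition(",")
                settings["position"] = parse_percentage(amount)
                if anchor:
                    settings["position_align"] = anchor
        except ValueError:
            continue  # skip settings whose value cannot be read
    return settings


if __name__ == "__main__":
    print(interpret_settings("align:left size:50%"))
    print(interpret_settings("position:10%,line-left vertical:rl"))
```

Keeping the percentage as a number and the alignment keyword as a separate entry makes it easier for calling code to compute where the cue box would sit.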
Metadata
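WebVTT files can also carry information that is never shown to the viewer, through two separate mechanisms. Comment blocks, which begin with NOTE, have no timing and are ignored by parsers; they suit human-readable notes about the file, such as the title, description, and language of the media. Timed metadata uses ordinary cues with start and end times in a track whose kind is metadata; the cue text holds data for scripts to read rather than text to display.
Here is an example of a NOTE block that records information about the file:
WEBVTT

NOTE
title: Example WebVTT File
description: This is an example of a WebVTT file.
language: en-US
In this example, the NOTE block records the title, description, and language of the media. The key: value layout is only a convention for human readers. WebVTT assigns no meaning to these lines, and players do not use them.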
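If a project does adopt a key: value layout inside NOTE blocks like the one in the example, its own tooling can read those lines back. The Python sketch below assumes exactly that layout; read_note_fields is a hypothetical helper, and the field names carry no meaning in the WebVTT format itself. It also assumes the NOTE block is separated from the WEBVTT header by a blank line.

```python
import re


def read_note_fields(vtt_text: str) -> dict:
    """Collect 'key: value' lines found inside NOTE blocks of a .vtt file."""
    fields = {}
    blocks = re.split(r"\n{2,}", vtt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        first = lines[0]
        # A comment block starts with NOTE alone, or NOTE followed by a space or tab.
        if not (first == "NOTE" or first.startswith(("NOTE ", "NOTE\t"))):
            continue
        # Text after NOTE on the same line is also part of the comment.
        body = [first[4:].strip()] + lines[1:]
        for line in body:
            key, sep, value = line.partition(":")
            if sep and key.strip():
                fields[key.strip()] = value.strip()
    return fields


SAMPLE = """WEBVTT

NOTE
title: Example WebVTT File
description: This is an example of a WebVTT file.
language: en-US
"""

if __name__ == "__main__":
    print(read_note_fields(SAMPLE))
```

Because media players ignore NOTE blocks, information that the player needs, such as the track language, still has to be supplied where the track is attached to the media.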