Subtitling track

[Last update: Apr 6, 2023]

Description

In recent years, the task of automatically creating subtitles for audiovisual content in another language has gained a lot of attention, driven by a surge in the number of movies, series and user-generated videos that are streamed and distributed all over the world. The task of automatic subtitling is multi-faceted: starting from the speech, not only must the translation be generated, but it must also be segmented into subtitles compliant with constraints that ensure a high-quality user experience, such as a proper reading speed, synchrony with the voices, and a maximum number of subtitle lines and characters per line. For the first time, this year IWSLT proposes a specific task on automatic subtitling, where participants are asked to generate subtitles in German and/or Spanish for three kinds of audiovisual documents, featuring different levels of complexity, starting from English speech.

The evaluation of subtitling quality is a complex problem in its own right, since both the translation quality and the compliance with subtitling constraints have to be considered at the same time. The recently proposed SubER and Sigma metrics will be used for assessing the quality of automatically generated subtitles, together with standard translation quality metrics; moreover, they will be complemented with some explicit compliance measures, as detailed below. Most audiovisual companies define their own subtitling guidelines, which can differ slightly from each other. Participants are asked to generate subtitles according to some of the tips listed by TED, in particular:

never use more than two lines per subtitle
lines cannot exceed 42 characters, white spaces included
the maximum subtitle reading speed is 21 characters / second
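The three constraints above can be checked per subtitle along the following lines. This is only a minimal sketch, not the official compliance tool: the function name `check_subtitle` and the violation labels are hypothetical, and it assumes that reading speed is computed as the total number of characters in the subtitle (white spaces included, line breaks excluded) divided by its display duration in seconds.

```python
MAX_LINES = 2             # lines per subtitle
MAX_CHARS_PER_LINE = 42   # white spaces included
MAX_CPS = 21.0            # characters per second


def check_subtitle(lines, start_s, end_s):
    """Return the list of constraints violated by a single subtitle.

    lines   -- the text lines of the subtitle
    start_s -- time at which the subtitle appears, in seconds
    end_s   -- time at which the subtitle disappears, in seconds

    Assumption: reading speed counts every character of every line,
    white spaces included; the official scorer may count differently.
    """
    violations = []
    if len(lines) > MAX_LINES:
        violations.append("too_many_lines")
    if any(len(line) > MAX_CHARS_PER_LINE for line in lines):
        violations.append("line_too_long")
    duration = end_s - start_s
    n_chars = sum(len(line) for line in lines)
    # A non-positive duration cannot be read at all, so flag it as too fast.
    if duration <= 0 or n_chars / duration > MAX_CPS:
        violations.append("too_fast")
    return violations
```

For instance, a two-line subtitle of 24 characters shown for 2 seconds has a reading speed of 12 characters / second and passes all three checks.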

It is expected that participants will use only the audio track from the provided videos (dev and test sets), the video track being of low quality and provided primarily as a means to verify time synchronicity and other aspects of displaying subtitles on screen.

Languages

The task involves the processing of audio-video documents for two language pairs:

English→German
English→Spanish

Training and Data Conditions

A constrained setup is proposed as the official training data condition, in which the allowed training data is limited to a medium-sized framework in order to keep the training time and resource requirements manageable. To also allow the participation of teams equipped with high computational power and effective in-house solutions built on additional resources, an unconstrained setup without data restrictions is also proposed.

Constrained training: Under this condition, the allowed training resources are the following ones (note that the list does not include any pre-trained language model):

Data type	src lang	tgt lang	Training corpus (URL)	Version	Comment
speech	en	–	LibriSpeech ASR corpus	v12	includes translations into pt, not to be used
speech	en	–	How2	na
speech	en	–	Mozilla Common Voice	v11.0
speech	en	–	TED LIUM	V2/V3
speech	en	–	Vox Populi	na
speech-to-text-parallel	en	de	MUST-C	v1.2/v2.0/v3.0	a new version of MuST-C en-de has been released, check it out!
speech-to-text-parallel	en	de	MUST-Cinema	v1.0	with subtitle and line breaks
speech-to-text-parallel	en	es	MUST-C	v1.2	same as MUST-Cinema below but without subtitle breaks
speech-to-text-parallel	en	es	MUST-Cinema	v1.0	with subtitle and line breaks
speech-to-text-parallel	en	de	Speech Translation TED corpus	na
speech-to-text-parallel	en	de	CoVoST	v2	only German translation, no English transcription
speech-to-text-parallel	en	de	Europarl-ST	v1.1
speech-to-text-parallel	en	es	Europarl-ST	v1.1
text-parallel	en	de	Europarl	v10
text-parallel	en	es	Europarl	v8
text-parallel	en	de	NewsCommentary	v16
text-parallel	en	es	NewsCommentary	v16
text-parallel	en	de	OpenSubtitles	v2018 apptek	partially re-aligned, filtered, with document meta-information on genre
text-parallel	en	es	OpenSubtitles	v2018 apptek	partially re-aligned, filtered, with document meta-information on genre
text-parallel	en	de	TED2020	v1
text-parallel	en	es	TED2020	v1
text-parallel	en	es	Tatoeba	v2022-03-03
text-parallel	en	de	Tatoeba	v2022-03-03
text-parallel	en	es	ELRC-CORDIS_News	v1
text-parallel	en	de	ELRC-CORDIS_News	v1
text-monolingual	–	de	OpenSubtitles with subtitle breaks	v2018-apptek	superset of parallel data, with subtitle breaks and document meta-info on genre, automatically predicted line breaks
text-monolingual	–	es	OpenSubtitles with subtitle breaks	v2018-apptek	superset of parallel data, with subtitle breaks and document meta-info on genre, automatically predicted line breaks

Unconstrained training: any resource, pre-trained language models included, can be used, with the exception of evaluation sets.

Development and Evaluation Data

Participants are asked to automatically subtitle in German and/or Spanish three kinds of audio-visual documents, where the spoken language is always English, featuring different levels of complexity: (i) TED talks from the MuST-Cinema corpus, (ii) press interviews from the Multimedia Centre of the European Parliament (EUROPARLTV) and (iii) commercial content, in particular Peloton physical training videos and ITV entertainment series.

Audio-visual documents of development and evaluation sets are and will be provided in MP4 format; subtitles of development sets are released in SRT (SubRip File Format) UTF-8 encoded files, the same format required for submissions.
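For readers unfamiliar with SRT, each subtitle is a block made of a sequence number, a timing line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, and one or more text lines, with blocks separated by blank lines. The sketch below reads such a file into Python tuples. The helper names `to_seconds` and `parse_srt` are hypothetical, and the parser assumes well-formed timing lines without trailing positioning information.

```python
import re

# Timestamp such as 00:01:02,345 (a dot is tolerated as decimal separator).
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def to_seconds(stamp):
    """Convert an SRT timestamp to seconds."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000.0


def parse_srt(text):
    """Parse SRT content into a list of (start_s, end_s, lines) tuples."""
    # Drop a UTF-8 byte order mark and normalize Windows line endings.
    text = text.lstrip("\ufeff").replace("\r\n", "\n")
    subtitles = []
    # Blocks are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = block.splitlines()
        if len(rows) < 2 or "-->" not in rows[1]:
            continue  # skip blocks without a timing line
        start, end = rows[1].split("-->")
        subtitles.append((to_seconds(start), to_seconds(end), rows[2:]))
    return subtitles
```

A file can then be loaded with `parse_srt(open(path, encoding="utf-8").read())`, where `path` is whatever SRT file is at hand.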

TED is a new collection of audio recordings from English TED Talks, automatically aligned at the sentence level with their manual transcriptions and translations (into German and Spanish) marked with subtitle breaks.
- As dev set, 17 video recordings and subtitles (in English, German and Spanish) of the TED talks defining the evaluation set of the Offline Speech Translation task at IWSLT 2022 (total duration: about 4 hours) can be downloaded from here.
- The test set consists of video recordings of 14 TED talks, for a total duration of about 80 minutes, and can be downloaded from here.
EUROPARLTV is a repository of video recordings related to the European Parliament activities that includes messages of the members, interviews, press conferences, debates, etc. This dev set corresponds to the EuroParl Interviews test set of the paper “Direct Speech Translation for Automatic Subtitling” (Papi et al., 2023). Additional info on the benchmark is available in the paper.
- As dev set, 12 video recordings and subtitles (in German and Spanish; English transcriptions/subtitles are available only for 5 documents out of 12) for a total duration of about 1 hour can be downloaded from here.
- The test set consists of 10 video recordings, for a total duration of about 1 hour, and can be downloaded from here.
Peloton is a US company that offers fitness training equipment as well as on-line fitness classes which are provided with subtitles in different languages. Peloton is interested in research related to the use of automated subtitling technology in their translation workflows. We would like to thank Peloton for providing IWSLT with samples of their videos for research and evaluation purposes and would like to ask you not to use these videos or subtitles for any commercial purposes or make them publicly available on any other website.
- As a dev set, 9 recordings of fitness training videos (mostly single-speaker - the fitness instructor) and corresponding subtitles (in English, German and Spanish) for a total duration of about 4 hours were released but they are no longer available due to copyright reasons. Note: the English SRT files are not properly segmented according to the usual subtitle and line segmentation guidelines and are provided for informational purposes only. The German and Spanish SRT files are the ones created by professional subtitle translators.
- The test set with 8 videos of similar content (but potentially different speakers) was released but it is no longer available due to copyright reasons.
ITV Studios is part of ITV Plc, which includes the UK’s largest commercial broadcaster. They create and produce a broad range of programming (drama, entertainment, factual) in 13 countries, which they distribute globally, providing high-quality subtitles. We would like to thank ITV Studios for providing IWSLT with samples of their video content for research and evaluation purposes and would like to ask you not to use these videos and/or the accompanying subtitles for any commercial purposes or make them publicly available on any other website.
- As a dev set, 7 episodes of 3 different television series, with an approximate duration of 7 hours in total, can be downloaded from here. Note: some of the English SRT files were created following different subtitling guidelines than the ones used in this evaluation (e.g. they contain subtitles with 3 lines) and are provided for informational purposes only.
- The test set, with 7 episodes from entertainment series (possibly, but not necessarily, the same ones as in the dev set), can be downloaded from here.

When available, English subtitles of development sets are released only for convenience of participants; it is not required to generate them for the final evaluation.

Submission Guidelines

Multiple run submissions are allowed, but participants must explicitly indicate one PRIMARY run. All other run submissions are treated as CONTRASTIVE runs. In the case that none of the runs is marked as PRIMARY, the latest submission (according to the file time-stamp) will be used as the PRIMARY run.
Submissions have to be submitted as a gzipped TAR archive (see format below)
Each run has to be stored in SRT (SubRip File Format) UTF-8 encoded files
For each video of test sets, provide the subtitles in an SRT file whose name includes the file identifier (number) of the video
Scoring will be case-sensitive and will include the punctuation

TAR archive file structure:

< UserID >/< Set >_< VdId >.< Lang >.< UserID >.primary.srt 
  /< Set >_< VdId >.< Lang >.< UserID >.contrastive1.srt  
  /< Set >_< VdId >.< Lang >.< UserID >.contrastive2.srt  
  /...  

where:

< UserID > = user ID of participant; use the short name chosen in the registration form
< Set > = IWSLT23.Subtitling.< Domain >tst
< VdId > = numeric identifier of the video
< Domain > = one of {EPTV, TED, Peloton, ITV}
< Lang > = one of {en-de.de, en-es.es} (ISO 639-1 two-letter codes of languages)

Example:

FBK/IWSLT23.Subtitling.TEDtst_13587.en-de.de.FBK.primary.srt
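The naming scheme and the gzipped TAR packaging can be scripted as in the sketch below. It is only an illustration under stated assumptions: the helpers `run_path` and `build_archive` are hypothetical names, and the SRT content is assumed to be already available as Python strings keyed by their path inside the archive.

```python
import io
import tarfile
import time

DOMAINS = {"EPTV", "TED", "Peloton", "ITV"}
LANGS = {"en-de.de", "en-es.es"}


def run_path(user_id, domain, video_id, lang, run="primary"):
    """Build the archive path of one run file, following the scheme above."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    if lang not in LANGS:
        raise ValueError(f"unknown language pair: {lang}")
    set_name = f"IWSLT23.Subtitling.{domain}tst"
    return f"{user_id}/{set_name}_{video_id}.{lang}.{user_id}.{run}.srt"


def build_archive(archive_path, runs):
    """Write a gzipped TAR archive.

    runs -- mapping from archive member path to SRT content (str)
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for member, srt_text in sorted(runs.items()):
            data = srt_text.encode("utf-8")  # SRT files must be UTF-8 encoded
            info = tarfile.TarInfo(name=member)
            info.size = len(data)
            info.mtime = int(time.time())
            tar.addfile(info, io.BytesIO(data))
```

With these helpers, `run_path("FBK", "TED", 13587, "en-de.de")` reproduces the example path above, and contrastive runs are obtained by passing `run="contrastive1"`, `run="contrastive2"`, and so on.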

Submissions must be sent as an email attachment to these two addresses:

cettolo AT fbk DOT eu
ematusov AT apptek DOT com

The email should also include the following information:

Institute/company:
Contact Person:
Email:
Data condition: Constrained/Unconstrained
Brief abstract about the system:
Do you want to make your submissions freely available for research purposes? (yes/no)

Automatic Evaluation

The evaluation will be carried out from three perspectives, subtitle quality, translation quality and subtitle compliance, through the following automatic measures:

Subtitle quality vs. reference subtitles:
- SubER, primary metric, used also for ranking (paper, code)
- Sigma (paper, code)
Translation quality vs. reference translations:
- BLEU, CHRF (via sacreBLEU version 2.3.1)
- COMET (model: wmt20-comet-da)
  Automatic subtitles will be realigned to the reference subtitles using mwerSegmenter (Matusov et al., 2005) before running sacreBLEU and COMET
Subtitle compliance:
- Rates of
  - subtitles with more than two lines
  - lines longer than 42 characters (white spaces included)
  - subtitles with reading speed higher than 21 characters / second

(paper, code)
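As a rough illustration of how the three compliance rates could be computed over a whole file, consider the sketch below. It is not the official scoring code linked above: the function name `compliance_rates` and the result keys are hypothetical, subtitles are assumed to be given as `(start_s, end_s, lines)` tuples, the line-length rate is assumed to be taken over all lines (the other two over all subtitles), and reading speed is assumed to count all characters, white spaces included.

```python
def compliance_rates(subtitles, max_lines=2, max_chars=42, max_cps=21.0):
    """Return the percentage of non-compliant subtitles or lines.

    subtitles -- list of (start_s, end_s, lines) tuples, times in seconds
    """
    n_subs = len(subtitles)
    n_lines = sum(len(lines) for _, _, lines in subtitles)

    too_many_lines = sum(len(lines) > max_lines for _, _, lines in subtitles)
    too_long = sum(
        len(line) > max_chars for _, _, lines in subtitles for line in lines
    )
    too_fast = 0
    for start, end, lines in subtitles:
        duration = end - start
        n_chars = sum(len(line) for line in lines)
        # Subtitles with a non-positive duration are counted as too fast.
        if duration <= 0 or n_chars / duration > max_cps:
            too_fast += 1

    def pct(count, total):
        return 100.0 * count / total if total else 0.0

    return {
        "subtitles_over_2_lines": pct(too_many_lines, n_subs),
        "lines_over_42_chars": pct(too_long, n_lines),
        "subtitles_over_21_cps": pct(too_fast, n_subs),
    }
```

The result keys are named after the default thresholds; lower values indicate better compliance.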

Organizers

Mauro Cettolo, FBK
Evgeny Matusov, AppTek
Mattia Di Gangi, AppTek
Patrick Wilken, AppTek
Matteo Negri, FBK
Marco Turchi, Zoom Video Communications

Contact

Chairs:

Mauro Cettolo, FBK, Italy
Evgeny Matusov, AppTek, Germany

Discussion: iwslt-evaluation-campaign@googlegroups.com