Subtitling track
DEVELOPMENT SETS 2026 ARE AVAILABLE
[Last update: Jan 16, 2026]
Description
In recent years, the task of automatically creating subtitles for audiovisual content in another language has gained increasing attention, driven by a surge in the number of movies, series, and user-generated videos streamed and distributed all over the world. The task is multi-faceted: not only does the translation have to be generated from speech, which is right on target for IWSLT, but the subtitling constraints also have to be respected (i.e. a proper reading speed, synchrony with the voices, and limits on the number of subtitle lines and characters per line).
As in past editions, in 2026 we propose the Automatic Subtitling track, in which participants are asked to generate subtitles for different kinds of audiovisual documents, starting from English speech. The novelty this year is the set of target languages: in addition to Arabic and German, already offered in the 2025 edition, Chinese and Japanese have been added. Spanish (Europe) is also available again as the target language for one of the three domains (see below).
Training and Data Conditions
Two training data conditions are proposed:
- constrained: the official training data condition, in which the allowed training data is limited to a medium-sized collection of corpora (listed below) in order to keep training time and resource requirements manageable
- unconstrained: a setup without data restrictions (any resource, including pre-trained language models, can be used), which also allows the participation of teams equipped with high computational power and effective in-house solutions built on additional resources
Training Data allowed for Constrained Conditions
| Data type | src lang | tgt lang | Training corpus (URL) | Version | Comment |
|---|---|---|---|---|---|
| speech | en | – | LibriSpeech ASR corpus | v12 | includes translations into pt, not to be used |
| speech | en | – | How2 | na | |
| speech | en | – | Mozilla Common Voice | v24.0 | |
| speech | en | – | Vox Populi | na | |
| speech-to-text-parallel | en | ar, de, ja, zh | CoVoST | v2 | only translations, no English transcription |
| speech-to-text-parallel | en | de, es | Europarl-ST | v1.1 | |
| text-parallel | en | ar | UNPC | v1.0 | |
| text-parallel | en | de | Europarl | v10 | |
| text-parallel | en | es | Europarl | v8 | |
| text-parallel | en | ar, de, es, ja | Tanzil | v1 | |
| text-parallel | en | ar, de, es, ja, zh | NewsCommentary | v18 | |
| text-parallel | en | ar, de, es | GlobalVoices | v2018q4 | |
| text-parallel | en | ar, de, es, ja, zh | OpenSubtitles | v2024 | |
| text-parallel | en | de | OpenSubtitles | v2018 apptek | partially re-aligned, filtered, with document meta-information on genre |
| text-parallel | en | es | OpenSubtitles | v2018 apptek | partially re-aligned, filtered, with document meta-information on genre |
| text-parallel | en | ja | JParaCrawl | na | |
| text-parallel | en | ar, de, es, ja, zh | Tatoeba | v2023-04-12 | |
| text-parallel | en | ar, zh | ELRC_2922 | v1 | |
| text-parallel | en | de, es | ELRC-CORDIS_News | v1 | |
| text-monolingual | – | de | OpenSubtitles with subtitle breaks | v2018-apptek | superset of parallel data, with subtitle breaks and document meta-info on genre, automatically predicted line breaks |
| text-monolingual | – | es | OpenSubtitles with subtitle breaks | v2018-apptek | superset of parallel data, with subtitle breaks and document meta-info on genre, automatically predicted line breaks |
Development and Evaluation Data
Participants are asked to automatically subtitle three kinds of audiovisual documents, where the spoken language is always English:
- ITV entertainment series, to be subtitled in the language(s) of your choice: Chinese, German, Japanese, Spanish (Europe)
- news programs from the Asharq-Bloomberg platform, to be subtitled in the language(s) of your choice: Arabic, Chinese, German, Japanese
- audio recordings from the YODAS YouTube dataset, to be subtitled in the language(s) of your choice: Chinese, German, Japanese
Audiovisual documents of the development and evaluation sets are and will be provided in MP4 format (Asharq-Bloomberg and ITV) and WAV format (YODAS); subtitles of the development sets are released as SRT (SubRip File Format) UTF-8 encoded files, the same format required for submissions.
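For reference, an SRT file is a plain-text sequence of numbered blocks, each consisting of an index, a display time span, and one or two lines of text; a minimal illustrative example (not taken from the actual data):

    1
    00:00:01,000 --> 00:00:03,250
    First line of a subtitle
    second line of the same subtitle

    2
    00:00:04,000 --> 00:00:06,500
    Next subtitle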
- ITV Studios is part of ITV Plc, which includes the UK’s largest commercial broadcaster. They create and produce a broad range of programming (drama, entertainment, factual) in 13 countries, which they distribute globally, providing high-quality subtitles. We would like to thank ITV Studios for providing IWSLT with samples of their video content for research and evaluation purposes, and we ask you not to use these videos and/or the accompanying subtitles for any commercial purposes or to make them publicly available on any other website.
  - As a new dev2026 development set, 3 episodes of a television series, with an approximate duration of 2.5 hours in total, can be downloaded from here.
    NOTES: English SRT files are provided for informational purposes only. The dev sets from previous years can also be used for system development and training.
  - The test2026 set will be released according to the schedule.
- Asharq Business with Bloomberg is part of SRMG, the largest integrated media group in the MENA (Middle East and North Africa) region. An exclusive content agreement with ‘Bloomberg Media’ powers this distinguished business news multi-platform, drawing on Bloomberg’s comprehensive coverage from more than 2,700 journalists and analysts globally. Asharq Business with Bloomberg is a leading source for Arabic economic news, rich in context and content and unparalleled market data, delivered through a TV channel and across digital and social media platforms. Professional human reference translations into Chinese, Japanese, Arabic, and German have been created by AppTek.
  - As a dev2026 set, 2 recordings of about 2.5 hours each, including actual Asharq-Bloomberg news content, can be downloaded from here. The archive contains a README file with important information, the audio files, reference subtitles, and YAML files that specify the audio segments for which subtitles must be created (the rest of the video file can be ignored); a sketch of how such segment files could be read is given after this list. The dev set was already used in the 2025 evaluation, but Chinese and Japanese have now been added as additional target languages.
  - The test2026 set will be released according to the schedule.
- YODAS (YouTube-Oriented Dataset for Audio and Speech) is “a large-scale, multilingual dataset comprising currently over 500k hours of speech data in more than 100 languages, sourced from both labeled and unlabeled YouTube speech datasets.” Refer to this paper for more details.
  IMPORTANT NOTE: the “en003” partition of the YODAS dataset is used for selecting dev/test data and is therefore not permitted for training (e.g. for an auxiliary ASR task). This partition had also been used to select a speech recognition benchmarking test set by the creators of the Loquacious dataset and is thus a natural held-out choice. Professional human reference translations into Chinese, Japanese, and German have been created by AppTek.
  - The dev2026 development set, consisting of 6 files, can be downloaded from here.
  - The test2026 set will be released according to the schedule.
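For the Asharq domain above, the YAML segment files could be loaded, for instance, as in the following sketch; note that the field names (“offset”, “duration”, “wav”) and the file name are assumptions modeled on earlier IWSLT releases, so the README in the archive should be consulted for the actual schema:

    # Illustrative sketch only: field names and file name are assumptions;
    # the README in the dev2026 archive documents the actual schema.
    import yaml  # PyYAML

    with open("asharq_dev2026_segments.yaml") as f:  # hypothetical file name
        segments = yaml.safe_load(f)

    for seg in segments:
        start = seg["offset"]          # segment start time (seconds)
        end = start + seg["duration"]  # segment end time (seconds)
        print(f"{seg['wav']}: subtitle the speech between {start:.2f}s and {end:.2f}s")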
Submission
- Multiple run submissions are allowed, but participants must explicitly indicate one PRIMARY run; all other run submissions are treated as CONTRASTIVE runs. In the case that none of the runs is marked as PRIMARY, the latest submission (according to the file timestamp) will be used as the PRIMARY run
- Submissions have to be sent as a gzipped TAR archive (see format below)
- Submissions must include the test26 set as well as the progressive test sets: test23, test24 and test25 for the ITV domain, and test25 for the Asharq domain
- Submission files have to be stored as SRT (SubRip File Format) UTF-8 encoded files
- For each element of test sets, provide the subtitles in an SRT file whose name includes the file identifier (number) of the video
TAR archive file structure:
<UserID>/IWSLT26.Subtitling.<Domain>_<Set>_<VdId>.<Lang>.<UserID>.primary.srt
/IWSLT26.Subtitling.<Domain>_<Set>_<VdId>.<Lang>.<UserID>.contrastive1.srt
/IWSLT26.Subtitling.<Domain>_<Set>_<VdId>.<Lang>.<UserID>.contrastive2.srt
/...
where:
<UserID> = user ID of participant; use the short name chosen in the registration form
<Domain> = one of {ITV, Asharq}
<Set> = one of {tst26, tst25, tst24, tst23} if <Domain>=ITV
= one of {tst26, tst25} if <Domain>=Asharq
= tst26 if <Domain>=YODAS
<VdId> = numeric identifier of the video
<Lang> = one of {en-de.de,en-es.es,en-ja.ja,en-zh.zh} if <Domain>=ITV
= one of {en-ar.ar,en-de.de,en-ja.ja,en-zh.zh} if <Domain>=Asharq
= one of {en-de.de,en-ja.ja,en-zh.zh} if <Domain>=YODAS
(ISO 639-1 two-letter codes of languages)
Example:
FBK/IWSLT26.Subtitling.ITV_tst25_22.en-de.de.FBK.primary.srt
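For convenience, a gzipped TAR archive with the required layout can be built, for example, with Python’s tarfile module; the local directory layout and archive name below are placeholders, only the internal <UserID>/… structure is prescribed:

    # Sketch: pack all SRT files under a top-level <UserID>/ directory into a
    # gzipped TAR archive. Local paths and the archive name are placeholders.
    import tarfile
    from pathlib import Path

    user_id = "FBK"                         # short name chosen at registration
    srt_dir = Path("submission") / user_id  # local folder holding the SRT files

    with tarfile.open(f"IWSLT26.Subtitling.{user_id}.tgz", "w:gz") as tar:
        for srt_file in sorted(srt_dir.glob("*.srt")):
            # store each file as <UserID>/<file name> inside the archive
            tar.add(srt_file, arcname=f"{user_id}/{srt_file.name}")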
Submissions must be sent as an email attachment to these two addresses:
- cettolo AT fbk DOT eu
- ematusov AT apptek DOT com
The email should also include the following information:
- Institute/Company:
- Contact Person:
- Email:
- Track: Automatic Subtitling
- Data condition: Constrained/Unconstrained
- Brief abstract about the system:
- Do you agree that the IWSLT organizers may release your submitted system output data under an Apache 2.0 or similar license (depending on licensing details of test sets), to encourage future research? We may exclude your submissions from human evaluation without this consent. (yes/no)
Evaluation
DISCLAIMER: participants are expected to use only the audio track of the provided videos (dev and test sets); the video track is provided primarily as a means to verify time synchronicity and other aspects of displaying subtitles on the screen.
The evaluation of subtitling quality is a complex problem on its own since both the translation quality and the compliance with subtitling constraints have to be considered at the same time. We adopt the following metrics, where limits of acceptability for the conformity metrics (CPS, CPL, LPB) are set following the TED and Netflix (for Japanese and Chinese) guidelines:
- SubER: the primary metric of the task, for measuring the overall quality of automatically generated subtitles
- BLEU and BLEURT: for measuring the translation quality. The BLEU scores will be computed using the following tokenization methods: “13a” for German and Spanish, “ja-mecab” for Japanese, “zh” for Chinese. Automatic subtitles will be realigned to the reference subtitles using mweralign (Post and Hoang, 2025), which implements a variant of the AS-WER algorithm (Matusov et al., 2005), before running these metrics
- CPS: the percentage of subtitles not exceeding:
  - 21 characters per second for German and Spanish
  - 4 characters per second for Japanese (half-width characters counted as 0.5)
  - 9 characters per second for Chinese
- CPL: the percentage of subtitles not exceeding:
  - 42 characters per line for German and Spanish
  - 13 characters per line for Japanese (half-width characters counted as 0.5)
  - 16 characters per line for Chinese
- LPB: the percentage of subtitles not exceeding 2 lines per subtitle

CPS, CPL and LPB will be computed with the subtitle compliance script (Papi et al., 2023); a simplified illustration of these checks is given below.
Scoring will be case-sensitive and will include punctuation.
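As an illustration of the BLEU setup described above, the per-language tokenizers can be selected in sacrebleu as follows; the hypothesis/reference lists are placeholders, and the official scoring additionally applies the mweralign realignment step first:

    # Sketch: corpus-level BLEU with the language-specific tokenizers named
    # above ("13a" for German/Spanish, "ja-mecab" for Japanese, "zh" for
    # Chinese). hyps/refs are placeholder lists of aligned segments.
    import sacrebleu

    hyps = ["example system translation"]     # placeholder
    refs = ["example reference translation"]  # placeholder

    bleu = sacrebleu.corpus_bleu(hyps, [refs], tokenize="13a")
    print(f"BLEU = {bleu.score:.1f}")

Similarly, the conformity checks behind CPS, CPL and LPB can be approximated as in the sketch below; this is a simplification, not the official compliance script, and counting half-width characters as 0.5 is approximated here via Unicode east-asian-width classes:

    # Simplified sketch of the CPS / CPL / LPB checks (not the official script).
    import unicodedata

    def char_count(text: str, half_width_half: bool = False) -> float:
        """Character count; optionally count half-width characters as 0.5."""
        if not half_width_half:
            return float(len(text))
        # "Na" (narrow) and "H" (halfwidth) are treated as half-width here
        return sum(0.5 if unicodedata.east_asian_width(c) in ("Na", "H") else 1.0
                   for c in text)

    def check_subtitle(lines, start, end, max_cps, max_cpl,
                       half_width_half=False, max_lines=2):
        """Return (cps_ok, cpl_ok, lpb_ok) for one subtitle block."""
        total_chars = sum(char_count(line, half_width_half) for line in lines)
        cps_ok = total_chars / max(end - start, 1e-6) <= max_cps
        cpl_ok = all(char_count(line, half_width_half) <= max_cpl for line in lines)
        lpb_ok = len(lines) <= max_lines
        return cps_ok, cpl_ok, lpb_ok

    # Example: a two-line German subtitle shown from second 10.0 to 12.5
    print(check_subtitle(["Erste Zeile", "zweite Zeile"], 10.0, 12.5,
                         max_cps=21, max_cpl=42))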
Organizers
- Mauro Cettolo, FBK
- Evgeny Matusov, AppTek
- Matteo Negri, FBK
- Marco Turchi, Zoom Video Communications
- Patrick Wilken, AppTek
Contact
Chair(s):
- Mauro Cettolo cettolo@fbk.eu, FBK, Italy
- Evgeny Matusov ematusov@apptek.com, AppTek, Germany
Discussion: iwslt-evaluation-campaign@googlegroups.com