[Last update: Nov 11, 2024]

Description

In recent years, the task of automatically creating subtitles for audiovisual content in another language has gained increasing attention, driven by a surge in the number of movies, series, and user-generated videos that are streamed and distributed all over the world. The task is multi-faceted: the translation has to be generated from speech, which is right on target for IWSLT, and the subtitling constraints also have to be respected (i.e. a proper reading speed, synchrony with the voices, and a maximum number of subtitle lines and characters per line).

As in 2024, we propose two sub-tracks:

  • automatic subtitling: participants are asked to generate subtitles in Arabic and/or German of different kinds of audiovisual documents, featuring different levels of complexity, starting from English speech
  • subtitle compression: participants are asked to rephrase subtitles that are non-compliant with the reading speed constraint (at most 21 characters / second) to make them compliant

and two language directions:

  • 🆕 English → Arabic
  • English → German
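As an illustration of the reading speed constraint mentioned above (at most 21 characters / second), the following minimal Python sketch computes the characters-per-second rate of a single subtitle from its SRT timestamps. The function names are hypothetical, and the counting rule (spaces included, line breaks excluded) is an assumption; the official scoring tool may count characters differently.

```python
MAX_CPS = 21.0  # reading speed limit stated in the track description


def srt_time_to_seconds(stamp):
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    hours, minutes, rest = stamp.strip().split(":")
    seconds, millis = rest.split(",")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def chars_per_second(text, start, end):
    """Characters per second of display time.

    Assumption: spaces are counted, line breaks are not.
    """
    duration = srt_time_to_seconds(end) - srt_time_to_seconds(start)
    if duration <= 0:
        raise ValueError("subtitle must have a positive duration")
    n_chars = len(text.replace("\n", ""))
    return n_chars / duration


def is_compliant(text, start, end, max_cps=MAX_CPS):
    """True if the subtitle can be read at or below the allowed speed."""
    return chars_per_second(text, start, end) <= max_cps
```

For example, `is_compliant("Hello world", "00:00:01,000", "00:00:02,000")` evaluates an 11-character subtitle shown for one second.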

Training and Data Conditions

🔵 AUTOMATIC SUBTITLING 🔵

Data will be released in January.

🔵 SUBTITLE COMPRESSION 🔵

Data will be released in January.

Submission

🔵 AUTOMATIC SUBTITLING 🔵

  • Multiple run submissions are allowed, but participants must explicitly indicate one PRIMARY run; all other run submissions are treated as CONTRASTIVE runs. In the case that none of the runs is marked as PRIMARY, the latest submission (according to the file timestamp) will be used as the PRIMARY run
  • Submissions have to be sent as a gzipped TAR archive (see format below)
  • Submissions must include both test sets (tst24 and tst25)
  • Submission files have to be stored as SRT (SubRip File Format) UTF-8 encoded files
  • For each element of test sets, provide the subtitles in an SRT file whose name includes the file identifier (number) of the video

TAR archive file structure:

< UserID >/IWSLT25.Subtitling.< Domain >_< Set >_< VdId >.< Lang >.< UserID >.primary.srt 
  /IWSLT25.Subtitling.< Domain >_< Set >_< VdId >.< Lang >.< UserID >.contrastive1.srt  
  /IWSLT25.Subtitling.< Domain >_< Set >_< VdId >.< Lang >.< UserID >.contrastive2.srt  
  /...  

where:

< UserID > = user ID of participant; use the short name chosen in the registration form
< Set > = one of {tst25, tst24}
< VdId > = numeric identifier of the video
< Domain > = one of {Peloton, ITV}
< Lang > = one of {en-ar.ar, en-de.de} (ISO 639-1 two-letter codes of languages)

Example:

FBK/IWSLT25.Subtitling.ITV_tst24_15.en-de.de.FBK.primary.srt
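The naming scheme and the gzipped TAR packaging described above can be sketched in Python as follows. This is only an illustration: the helper names `submission_name` and `pack_submission` are hypothetical, the `IWSLT25` prefix follows the example file name, and the allowed values are copied from the field definitions above.

```python
import re
import tarfile

ALLOWED_DOMAINS = {"Peloton", "ITV"}
ALLOWED_SETS = {"tst25", "tst24"}
ALLOWED_LANGS = {"en-ar.ar", "en-de.de"}


def submission_name(user_id, domain, test_set, video_id, lang, run="primary"):
    """Build the archive member name for one SRT file."""
    if domain not in ALLOWED_DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    if test_set not in ALLOWED_SETS:
        raise ValueError(f"unknown test set: {test_set}")
    if lang not in ALLOWED_LANGS:
        raise ValueError(f"unknown language pair: {lang}")
    # Runs are either 'primary' or 'contrastive' followed by a number.
    if not re.fullmatch(r"primary|contrastive\d+", run):
        raise ValueError(f"unknown run label: {run}")
    return (
        f"{user_id}/IWSLT25.Subtitling.{domain}_{test_set}_{video_id}"
        f".{lang}.{user_id}.{run}.srt"
    )


def pack_submission(files, archive_path):
    """Write a gzipped TAR archive.

    `files` maps archive member names to local SRT file paths.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for member_name, local_path in files.items():
            tar.add(local_path, arcname=member_name)
```

Calling `submission_name("FBK", "ITV", "tst24", 15, "en-de.de")` reproduces the example file name given above.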

Submissions must be sent as an email attachment to these two addresses:

  • cettolo AT fbk DOT eu
  • ematusov AT apptek DOT com

The email should also include the following information:

  • Institute/Company:
  • Contact Person:
  • Email:
  • Track: Automatic Subtitling
  • Data condition: Constrained/Unconstrained
  • Brief abstract about the system:
  • Do you want to make your submissions freely available for research purposes? (yes/no)

🔵 SUBTITLE COMPRESSION 🔵

  • Submission files have to be stored as SRT (SubRip File Format) UTF-8 encoded files, like the original input.
    • ONLY CHANGES TO THE TEXTUAL CONTENT ARE ADMITTED; THE TIMESTAMPS OF THE SUBTITLES CANNOT BE CHANGED!
  • For each element of test sets, provide the subtitles in an SRT file whose name includes the file identifier (number) of the video.
  • Submissions have to be sent as a gzipped TAR archive.
  • Multiple run submissions are allowed, but participants must explicitly indicate one PRIMARY run; all other run submissions are considered CONTRASTIVE runs. In the case that none of the runs is marked as PRIMARY, the latest submission (according to the file timestamp) will be used as the PRIMARY run.
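Since only the textual content may change, it can be useful to verify before submitting that a compressed file keeps every timing line of its input. The following Python sketch shows one possible check; the function names are hypothetical, and the sketch assumes standard SRT timing lines of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`.

```python
import re

# Standard SRT timing line, e.g. "00:00:01,000 --> 00:00:03,200"
TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def timing_lines(srt_text):
    """Return all timing lines of an SRT document, in order."""
    return [
        line.strip()
        for line in srt_text.splitlines()
        if TIMING_LINE.match(line.strip())
    ]


def timestamps_unchanged(original_srt, compressed_srt):
    """True if both documents contain the same timing lines in the same order."""
    return timing_lines(original_srt) == timing_lines(compressed_srt)
```

Both arguments are the full contents of the SRT files, read as UTF-8 text.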

TAR archive file structure:

< UserID >/IWSLT25.SubtitleCompression.< VdId >.< Lang >.< UserID >.primary.srt 
  /IWSLT25.SubtitleCompression.< VdId >.< Lang >.< UserID >.contrastive1.srt  
  /IWSLT25.SubtitleCompression.< VdId >.< Lang >.< UserID >.contrastive2.srt  
  /...  

where:

< UserID > = user ID of the participant; use the short name chosen in the registration form
< VdId > = numeric identifier of the video
< Lang > = one of {en-ar.ar, en-de.de} (ISO 639-1 two-letter codes of languages)

Example:

FBK/IWSLT25.SubtitleCompression.15.en-de.de.FBK.primary.srt
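Before sending the archive, the member names can be checked against this scheme, together with the requirement of one PRIMARY run. The Python sketch below is a hypothetical helper, not an official validator; it assumes that the primary run is indicated per video and language pair, and that contrastive runs are numbered.

```python
import re
from collections import Counter

# Pattern derived from the file structure above; the user ID must appear twice.
NAME_PATTERN = re.compile(
    r"^(?P<user>[^/]+)/IWSLT25\.SubtitleCompression\."
    r"(?P<video>\d+)\.(?P<lang>en-ar\.ar|en-de\.de)\."
    r"(?P=user)\.(?P<run>primary|contrastive\d+)\.srt$"
)


def parse_member(name):
    """Return the fields of a member name, or None if it does not match."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None


def videos_without_single_primary(names):
    """Return (video, lang) pairs that do not have exactly one primary run."""
    primaries = Counter()
    seen = set()
    for name in names:
        fields = parse_member(name)
        if fields is None:
            raise ValueError(f"unexpected file name: {name}")
        key = (fields["video"], fields["lang"])
        seen.add(key)
        if fields["run"] == "primary":
            primaries[key] += 1
    return sorted(key for key in seen if primaries[key] != 1)
```

An empty result means every video and language pair in the list has exactly one primary file.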

Submissions must be sent as an email attachment to these two addresses:

  • cettolo AT fbk DOT eu
  • ematusov AT apptek DOT com

The email should also include the following information:

  • Institute/Company:
  • Contact Person:
  • Email:
  • Track: Subtitle Compression
  • Brief abstract about the system/method:
  • Do you want to make your submissions freely available for research purposes? (yes/no)

Evaluation

DISCLAIMER: Participants are expected to use only the audio track of the provided videos (dev and test sets); the video track is of low quality and is provided primarily as a means to verify time synchronicity and other aspects of displaying subtitles on the screen.

🔵 AUTOMATIC SUBTITLING 🔵

The evaluation of subtitling quality is a complex problem on its own since both the translation quality and the compliance with subtitling constraints have to be considered at the same time. We adopt the following metrics, where limits of acceptability for the conformity metrics (LPB, CPS, CPL) are set following the TED guidelines:

Scoring will be case-sensitive and will include the punctuation.

🔵 SUBTITLE COMPRESSION 🔵

For the subtitle compression sub-track, the following metrics will be used:

BLEURT and CPS will both be considered primary scores, since the texts should be compressed without significantly impacting their quality.

Scoring will be case-sensitive and will include the punctuation.

Organizers

  • Mauro Cettolo, FBK
  • Evgeny Matusov, AppTek
  • Matteo Negri, FBK
  • Marco Turchi, Zoom Video Communications
  • Patrick Wilken, AppTek

Contact

Chair(s):

  • Mauro Cettolo, FBK, Italy
  • Evgeny Matusov, AppTek, Germany

Discussion: iwslt-evaluation-campaign@googlegroups.com