Constrained training data

| Data type | src lang | tgt lang | Training corpus (URL) | Version | Comment |
|---|---|---|---|---|---|
| speech | en | -- | LibriSpeech | v12 | |
| speech | en | -- | How2 | | |
| speech | en | -- | Mozilla Common Voice | v11.0 | |
| speech | en | -- | TED LIUM | V2/V3 | |
| speech | en | -- | Vox Populi | | |
| speech-to-text-parallel | en | all | MuST-C | v1.2/v2.0/v3.0 | (10) ar, zh, nl, fr, de, ja, fa, pt, ru, tr |
| speech-to-text-parallel | en | all | CoVoST | v2 | (10) ar, zh, nl, fr, de, ja, fa, pt, ru, tr |
| speech-to-text-parallel | en | all | Europarl-ST | v1.1 | (4) fr, de, pt, tr |
| text-parallel | en | all | Europarl | v10 | (2) fr, de |
| text-parallel | en | all | Europarl | v7 | (4) nl, fr, de, pt |
| text-parallel | en | all | NewsCommentary | v16 | (8) ar, zh, nl, fr, de, ja, pt, ru |
| text-parallel | en | all | OpenSubtitles | v2018 | (10) ar, zh, nl, fr, de, ja, fa, pt, ru, tr |
| text-parallel | en | de | TED2020 | v1 | (1) de |
| text-parallel | en | all | Tatoeba | v2022-03-03 | (10) ar, zh, nl, fr, de, ja, fa, pt, ru, tr |
| text-parallel | en | de | ELRC-CORDIS_News | v1 | (1) de |