Differences

This shows you the differences between two versions of the page.

video_speech_translation [2020/02/14 22:00]
fhuang [Data]
video_speech_translation [2020/03/17 16:43]
fhuang [Video Speech Translation]
Line 4: Line 4:
   - [[video_speech_translation#​Data|Data]]  ​   - [[video_speech_translation#​Data|Data]]  ​
   - [[video_speech_translation#​Baselines|Baselines]]  ​   - [[video_speech_translation#​Baselines|Baselines]]  ​
 +  - [[video_speech_translation#​Submission Guidelines|Submission Guidelines]]  ​
   - [[video_speech_translation#​Evaluation|Evaluation]]  ​   - [[video_speech_translation#​Evaluation|Evaluation]]  ​
   - [[video_speech_translation#​Contacts|Contact Info]] ​   - [[video_speech_translation#​Contacts|Contact Info]] ​
Line 29: Line 30:
   - Chinese:   - Chinese:
      * [[http://​www.openslr.org/​33/​|Aishell]]:​ 170 hours with 400 speakers ​      * [[http://​www.openslr.org/​33/​|Aishell]]:​ 170 hours with 400 speakers ​
-     * [[https://​voice.mozilla.org/​zh-CN/​datasets|Common Voice (Chinese-China)]]: ​11 hours with 288 speakers ​+     * [[https://​voice.mozilla.org/​zh-CN/​datasets|Common Voice (Chinese-China)]]: ​26 hours  
             ​             ​
 == MT == == MT ==
Line 36: Line 37:
  
   - Chinese-English:​   - Chinese-English:​
-      * [[http://​data.statmt.org/​news-commentary/​v14/|News Commentary ​v14]]+      * [[http://​data.statmt.org/​news-commentary/​v15/|News Commentary ​v15]]
       * [[http://​data.statmt.org/​wikititles/​v1/​wikititles-v1.zh-en.tsv.gz|Wiki Title v1]]       * [[http://​data.statmt.org/​wikititles/​v1/​wikititles-v1.zh-en.tsv.gz|Wiki Title v1]]
       * [[https://​cms.unov.org/​UNCorpus/​|UN Parallel Corpus V1.0 (register and download)]]       * [[https://​cms.unov.org/​UNCorpus/​|UN Parallel Corpus V1.0 (register and download)]]
Line 43: Line 44:
       * [[https://​s3.amazonaws.com/​web-language-models/​paracrawl/​release1/​paracrawl-release1.en-ru.zipporah0-dedup-clean.tgz|ParaCrawl v3]]       * [[https://​s3.amazonaws.com/​web-language-models/​paracrawl/​release1/​paracrawl-release1.en-ru.zipporah0-dedup-clean.tgz|ParaCrawl v3]]
       * [[http://​www.statmt.org/​wmt13/​training-parallel-commoncrawl.tgz|Common Crawl]]       * [[http://​www.statmt.org/​wmt13/​training-parallel-commoncrawl.tgz|Common Crawl]]
-      * [[http://​data.statmt.org/​news-commentary/​v14/|News Commentary ​v14]]+      * [[http://​data.statmt.org/​news-commentary/​v15/|News Commentary ​v15]]
       * [[https://​translate.yandex.ru/​corpus?​lang=en|Yandex Corpus (register and download)]]       * [[https://​translate.yandex.ru/​corpus?​lang=en|Yandex Corpus (register and download)]]
-      * [[http://​data.statmt.org/​wikititles/​v1/​wikititles-v1.zh-en.tsv.gz|Wiki Title v1]]+      * [[http://​data.statmt.org/​wikititles/​v1/​wikititles-v1.ru-en.tsv.gz|Wiki Title v1]]
       * [[https://​cms.unov.org/​UNCorpus/​|UN Parallel Corpus V1.0 (register and download)]]       * [[https://​cms.unov.org/​UNCorpus/​|UN Parallel Corpus V1.0 (register and download)]]
   ​   ​
Line 62: Line 63:
 The **unseen test** will be released when the evaluation is due. The **unseen test** will be released when the evaluation is due.
  
-Chinese-English dev set: +**Chinese-English** 
 + 
 +dev set: 
       * [[https://​github.com/​nguyenbh/​iwslt2020_video_translation|v0.0.2,​ Feb 12, 2020]]       * [[https://​github.com/​nguyenbh/​iwslt2020_video_translation|v0.0.2,​ Feb 12, 2020]]
 +      * [[https://​github.com/​nguyenbh/​iwslt2020_video_translation|v0.0.3,​ Feb 27, 2020]]
 +      * [[https://​github.com/​nguyenbh/​iwslt2020_video_translation|v0.0.4,​ Feb 28, 2020]]
 +
 +- test set: 
 +      * [[https://​github.com/​nguyenbh/​iwslt2020_video_translation|v0.0.5,​ March 17, 2020]]
 +
 +**English-Russian**
 +
 +- dev set: 
 +      * [[https://​github.com/​nguyenbh/​iwslt2020_video_translation|v0.0.4,​ Feb 28, 2020]]
 +
 +- test set: 
 +      * [[https://​github.com/​nguyenbh/​iwslt2020_video_translation|v0.0.5,​ March 17, 2020]]
 +
 +
  
  
Line 70: Line 88:
  
 - Constrained track - Constrained track
-    * ASR: IWSLT organizers provide an English engine, and we can provide a Kaldi-based Chinese system +    * ASR: IWSLT organizers provide an English engine, and we can provide a Kaldi-based Chinese system.
-    MTWe provide transformer-based systems for Chinese-(English,​ Russian, Japanese) and English-(Russian,​ Vietnamese)+        [[http://kaldi-asr.org/​models/​m2|CVTE Mandarin Model V2]] 
 - Unconstrained track: Participants are encouraged to use any resources to build video translation systems. We will provide ASR and MT outputs from online systems as a baseline. - Unconstrained track: Participants are encouraged to use any resources to build video translation systems. We will provide ASR and MT outputs from online systems as a baseline.