Differences

This shows you the differences between two versions of the page.

non_native_speech_translation [2020/01/21 16:12]
obojar
non_native_speech_translation [2020/03/17 21:47] (current)
obojar releasing test set
Line 53: Line 53:
 == Development Set == == Development Set ==
  
-A small development set will be released by the end of January 2020.+  * Devset v1: [[http://ufallab.ms.mff.cuni.cz/~bojar/iwslt2020/iwslt2020-nonnat-minidevset-v1.tar.gz|iwslt2020-nonnat-minidevset-v1.tar.gz]] (137 MB)
 +  * Devset v2: [[http://ufallab.ms.mff.cuni.cz/~bojar/iwslt2020/iwslt2020-nonnat-minidevset-v2.tar.gz|iwslt2020-nonnat-minidevset-v2.tar.gz]] (149 MB, supersedes devset v1)
  
-The development ​set will illustrate ​the intended domains ​but it may be too small for reliable measurements.+The dev set illustrates **file formats**, including **expected output formats**. 
 + 
 +Dev set v1 contained only a few sample files. Dev set v2 includes new files better illustrating ​the domain of the test set, but the reference translations are still not available ​for all files. 
 + 
 + 
 +== Test Set == 
 + 
 +  * Testset: [[http://ufallab.ms.mff.cuni.cz/~bojar/iwslt2020/iwslt2020-nonnat-testset.tar.gz|iwslt2020-nonnat-testset.tar.gz]] (270 MB)
 + 
 +Please process all the files in the test set and produce output in the formats illustrated in the dev set.
  
 == File Format of ASR Candidates == == File Format of ASR Candidates ==
Line 64: Line 74:
  
 <​code>​ <​code>​
-600 500 Good +60 50 Good 
-800 650 Good mor +80 65 Good mor 
-1130 1020 Good morning +113 102 Good morning 
-1300 1190 Good morning how +130 119 Good morning how 
-1480 1400 Good morning. How are +148 140 Good morning. How are 
-2010 1950 Good morning. How are you? +201 195 Good morning. How are you? 
-2010 1020 Good morning. +201 102 Good morning. 
-2200 1020 2180 How are you? I +220 102 218 How are you? I 
-2200 1020 1950 How are you? +220 102 195 How are you? 
-2450 1950 2390 I am+245 195 239 I am
 ... ...
 </​code>​ </​code>​
Line 80: Line 90:
 For SLT-style submissions (end-to-end speech recognition and translation),​ this file is not required. Please provide it if you can, because it will allow for a more fine-grained evaluation. For SLT-style submissions (end-to-end speech recognition and translation),​ this file is not required. Please provide it if you can, because it will allow for a more fine-grained evaluation.
  
-There are three numbers (time stamps) in each line: **display time**, **start time** and **end time**. All times are measured in milliseconds ​from the start of the sound file.+There are three numbers (time stamps) in each line: **display time**, **start time** and **end time**. All times are measured in centiseconds ​from the start of the sound file.
  
 Display time shows the time when the given line/​sentence was recognized, produced by the ASR system. If your system is not "​on-line"​ in any sense, you can report 0 on all lines. The start and end time indicate the span in which the respective words were uttered in the recording. If your system does not provide timestamps, again report zeros. Display time shows the time when the given line/​sentence was recognized, produced by the ASR system. If your system is not "​on-line"​ in any sense, you can report 0 on all lines. The start and end time indicate the span in which the respective words were uttered in the recording. If your system does not provide timestamps, again report zeros.
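
The line layout described above can be read with a few lines of Python. This is only a minimal sketch, not an official tool of the task: the names ''CandidateLine'', ''parse_candidate_line'' and ''centiseconds_to_seconds'' are hypothetical, and the sketch assumes that the time stamps are the leading space-separated integers of a line (at most three) and that everything after them is the text. It deliberately does not decide which stamp is which when a line carries fewer than three numbers, and it would misread a text that itself begins with a number.

```python
from typing import List, NamedTuple


class CandidateLine(NamedTuple):
    """One line of a candidate file (hypothetical helper type)."""
    timestamps: List[int]  # leading integers, in centiseconds
    text: str              # everything after the time stamps


def parse_candidate_line(line: str, max_stamps: int = 3) -> CandidateLine:
    """Split a line into its leading integer time stamps and the text.

    Assumption: the time stamps are the first space-separated tokens
    that consist only of decimal digits, and there are at most
    `max_stamps` of them.
    """
    tokens = line.strip().split(" ")
    stamps: List[int] = []
    while tokens and len(stamps) < max_stamps and tokens[0].isdecimal():
        stamps.append(int(tokens.pop(0)))
    return CandidateLine(stamps, " ".join(tokens))


def centiseconds_to_seconds(cs: int) -> float:
    """Convert a time stamp in centiseconds to seconds."""
    return cs / 100.0


if __name__ == "__main__":
    parsed = parse_candidate_line("220 102 218 How are you? I")
    print(parsed.timestamps, repr(parsed.text))
    print([centiseconds_to_seconds(t) for t in parsed.timestamps])
```

A system that reports zeros for all time stamps, as permitted above, is handled the same way, since ''0'' is parsed like any other integer.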
Line 97: Line 107:
  
 <​code>​ <​code>​
-600 500 Gut +60 50 Gut 
-800 650 Guten Morgen! +80 65 Guten Morgen! 
-1130 1020 Guten Morgen! +113 102 Guten Morgen! 
-1300 1190 Guten wie morgen +130 119 Guten wie morgen 
-1480 1400 Guten Morgen! Wie geht es? +148 140 Guten Morgen! Wie geht es? 
-2010 1950 Guten Morgen! Wie geht es dir? +201 195 Guten Morgen! Wie geht es dir? 
-2010 1020 Guten Morgen! +201 102 Guten Morgen! 
-2200 1020 2180 Wie geht es dir? Ich +220 102 218 Wie geht es dir? Ich 
-2200 1020 1950 Wie geht es dir? +220 102 195 Wie geht es dir? 
-2450 1950 2390 Ich bin+245 195 239 Ich bin
 ... ...
 </​code>​ </​code>​
Line 125: Line 135:
 Chair: Ondrej Bojar (Charles University, Czech Republic)\\ Chair: Ondrej Bojar (Charles University, Czech Republic)\\
 Ebrahim Ansari (Charles University, Czech Republic)\\ Ebrahim Ansari (Charles University, Czech Republic)\\
 +Sebastian Stüker (KIT, Germany)
  
 Discussion: <​iwslt-evaluation-campaign@googlegroups.com>​ Discussion: <​iwslt-evaluation-campaign@googlegroups.com>​
 +
 +The non-native speech translation task is receiving support from the EU project [[http://elitr.eu/|ELITR]] (H2020-ICT-2018-2-825460).