The ISCA SIGSLT lecture series is held periodically on Zoom, with recordings linked here after each talk for those who cannot join live. Zoom links are posted to the SIGSLT Google group.
Isochrony for Automatic Dubbing
Prashant Mathur, Senior Applied Scientist in Amazon AI.
Thurs 5 May 2022, 23:00 UTC (8am Fri JST / 1am Fri CET / 7pm Thurs EST / 4pm Thurs PST)
Watch on YouTube
In this talk, I will focus on the problem of isochrony in automatic dubbing for media localization, where the goal is to keep translated dialogue synchronized with the original video when the speakers are on-screen. This problem requires that we first identify the location of pauses in the translation and then ensure that it follows the same speech-pause arrangement as the source audio. I will first introduce our approaches to aligning the pauses from source speech to target translation (prosodic alignment), then review recent work on isometric machine translation and follow-up work on isochrony-aware MT. I will end the talk with a brief overview of the Isometric SLT shared task we are organizing as part of IWSLT.
Prashant is a Senior Applied Scientist in Amazon AI. His research focuses on improving the state of the art for machine translation and domain adaptation, with special attention to the problem of automatic dubbing. Prior to Amazon, he was a research scientist at eBay working on language generation with structured data and machine translation for eBay’s catalog content.
Towards Augmented Speech Translation: Joint Speech Translation and Named Entity Recognition
Marco Gaido, Ph.D. student at Fondazione Bruno Kessler (FBK), Italy
Wed 6 April 2022, 16:00 UTC (1am JST / 6pm CET / 12pm EST / 9am PST)
Watch on YouTube
Translation is a complex task involving different levels of understanding of the content being handled. The process involves grasping the semantic meaning of the source, which is conveyed through the mentioned (named) entities, the specific terminology indicating peculiar elements, and the relationships between them. However, these aspects remain implicit in the output of automatic translation systems exclusively designed to generate fluent and adequate text. Going beyond these standard output quality objectives, we envisage “augmented translation” as the task of jointly generating the output together with explicit meaning-related enrichments suitable for downstream use. In this talk, we will discuss the application of this idea to the domain of live interpreting, in which handling named entities represents a daunting task. In this framework, we will present our work on systems that jointly translate speech and recognize named entities, discussing the main challenges, possible approaches, data requirements, experimental results, and future research directions.
Simultaneous Speech-to-Speech Translation with Transformer-based Incremental ASR, MT, and TTS
Katsuhito Sudoh, Ph.D., Associate Professor at the Nara Institute of Science and Technology (NAIST), Japan
Tue 1 March 2022, 8:00 UTC (5pm JST / 9am CET / 12am PST)
Watch on YouTube
I’ll talk about our English-to-Japanese simultaneous speech-to-speech translation (S2ST) system. It has three Transformer-based incremental processing modules for S2ST: automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS). We also evaluated its system-level latency in addition to the module-level latency and accuracy.
(This work was originally presented at Oriental COCOSDA 2021.)