The ISCA SIGSLT lecture series is held periodically on Zoom, with recordings posted on YouTube for those who cannot join live.
Zoom links are distributed through the SIGSLT Google group.

State of the art and current challenges of MT for sign to spoken languages

Dr. Dimitar Shterionov, assistant professor in the Department of Cognitive Science and Artificial Intelligence at Tilburg University
Dr. Mirella De Sisto, postdoctoral researcher in the Department of Cognitive Science and Artificial Intelligence at Tilburg University

Date:
Wed 14 Sept 2022, 15:00 UTC (12am Thurs JST / 5pm CEST / 11am EDT / 8am PDT)

Recording:
Watch on YouTube

Abstract:
Machine translation (MT) is a core technology for reducing language barriers. Since its beginnings in the 1950s, the field has advanced through many breakthroughs, reaching quality levels comparable to those of humans. Yet despite the significant advances of MT for spoken languages over the last couple of decades, MT for sign languages (SLs) is still in its infancy. Automatically translating between SLs, or between an SL and a spoken language, is a complex problem that requires multi-disciplinary research. Sign language machine translation (SLMT) lags behind MT for spoken languages for a number of reasons, linked to challenges of both a technical and a linguistic nature.

In this presentation, we discuss current approaches to SLMT and dive into some of the challenges the field is facing. We focus on those challenges related to the linguistic properties of sign languages and to the format of the available data. We discuss possible approaches to some of them and, in doing so, introduce our work in progress.

Bio:
Dr. Dimitar Shterionov is an assistant professor in the Department of Cognitive Science and Artificial Intelligence at Tilburg University. He is the scientific coordinator of the SignON project. His current research focuses on low-resource MT and, in particular, on MT for sign languages. He is an experienced researcher in MT and NLP and an expert in a broad range of MT subtopics, such as quality estimation, automatic post-editing, evaluation of MT quality and usability, and speech-to-text and text-to-speech translation.

Dr. Mirella De Sisto is a postdoctoral researcher in the Department of Cognitive Science and Artificial Intelligence at Tilburg University. As a member of the SignON project, her current research focuses on sign language MT and on supporting its development with a linguistically informed approach. Her expertise lies at the interface between linguistics, NLP, and MT.

Towards Streaming Speech Translation

Javier Iranzo-Sánchez, PhD student at the Universitat Politècnica de València.

Date:
Tues 14 June 2022, 08:00 UTC (5pm JST / 10am CEST / 4am EDT / 1am PDT)

Recording:
Watch on YouTube

Abstract:
This seminar introduces Streaming Speech Translation, the task of translating an unbounded audio stream under real-time constraints. The different components of the Speech Translation pipeline (Automatic Speech Recognition, Segmentation, Simultaneous Translation, and Evaluation) will be explained in detail, paying special attention to some often-overlooked details in which streaming differs from the standard Speech Translation task.
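As a rough illustration of how such a cascade fits together, here is a minimal Python sketch of a streaming pipeline; the function names and the toy ASR, segmenter, and MT stubs are ours, not taken from the talk, and merely stand in for incremental models.

```python
from typing import Iterator, List

def incremental_asr(audio_chunks: Iterator[bytes]) -> Iterator[str]:
    """Stub ASR: emit one placeholder word per incoming audio chunk."""
    for i, _chunk in enumerate(audio_chunks):
        yield f"word{i}"

def segmenter(words: Iterator[str], max_len: int = 4) -> Iterator[List[str]]:
    """Stub segmenter: cut the unbounded word stream into MT-sized units.
    A real segmenter would decide boundaries with a model, not a length cap."""
    buffer: List[str] = []
    for word in words:
        buffer.append(word)
        if len(buffer) >= max_len:
            yield buffer
            buffer = []
    if buffer:
        yield buffer

def simultaneous_mt(segment: List[str]) -> str:
    """Stub MT: a real system would translate incrementally (e.g. wait-k)."""
    return " ".join(word.upper() for word in segment)

def pipeline(audio_chunks: Iterator[bytes]) -> Iterator[str]:
    """Chain the stages so translations appear while audio is still arriving."""
    for segment in segmenter(incremental_asr(audio_chunks)):
        yield simultaneous_mt(segment)

if __name__ == "__main__":
    fake_stream = (b"\x00" * 320 for _ in range(10))  # stand-in audio chunks
    for translation in pipeline(fake_stream):
        print(translation)
```

The point the stubs try to convey is that every stage consumes and produces a stream, so translation can begin long before the input audio ends.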

Bio:
Javier Iranzo-Sánchez is a PhD student at the Universitat Politècnica de València, under the supervision of Professor Jorge Civera and Professor Alfons Juan. His research topic is Simultaneous Speech Translation, and he has published at top ASR and MT conferences. Currently, he is working as a Research Scientist at AppTek while finishing his thesis.

Isochrony for Automatic Dubbing

Prashant Mathur, Senior Applied Scientist in Amazon AI.

Date:
Thurs 5 May 2022, 23:00 UTC (8am Fri JST / 1am Fri CEST / 7pm Thurs EDT / 4pm Thurs PDT)

Recording:
Watch on YouTube

Abstract:
In this talk, I will focus on the problem of isochrony in automatic dubbing for media localization, where the goal is to keep the translated dialogues in sync with the original video when the speakers are on screen. This requires first identifying the location of pauses in the translation and then ensuring that the speech-pause arrangement matches that of the source audio. I will first introduce our approaches to aligning pauses from the source speech to the target translation (prosodic alignment), then review a recent work on isometric machine translation and a follow-up work on isochrony-aware MT. I will end the talk with a brief overview of the Isometric SLT shared task we are organizing as part of IWSLT.
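To make the pause-alignment idea concrete, here is a toy sketch that places pause markers in a translation so that their relative positions mirror the pauses in the source audio; this is our simplification for illustration, not the prosodic-alignment method from the talk.

```python
def align_pauses(target_words, src_pause_fractions):
    """Insert <pause> markers at the target word boundaries whose relative
    position best matches each source pause's relative time position."""
    n = len(target_words)
    # The boundary after word i sits at relative position i / n.
    boundaries = {round(frac * n) for frac in src_pause_fractions}
    out = []
    for i, word in enumerate(target_words, start=1):
        out.append(word)
        if i in boundaries and i < n:  # never pause after the last word
            out.append("<pause>")
    return " ".join(out)

# Source audio pauses at 30% and 70% of its duration:
print(align_pauses("das ist ein sehr kurzes Beispiel".split(), [0.3, 0.7]))
# -> das ist <pause> ein sehr <pause> kurzes Beispiel
```

A real system would additionally have to stretch or compress the speech between pauses so that each segment's duration matches the source, which is where isochrony-aware MT and TTS come in.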

Bio:
Prashant is a Senior Applied Scientist in Amazon AI. His research focuses on improving the state of the art in machine translation and domain adaptation, with special attention to the problem of automatic dubbing. Prior to Amazon, he was a research scientist at eBay, working on language generation from structured data and on machine translation for eBay’s catalog content.

Towards Augmented Speech Translation: Joint Speech Translation and Named Entity Recognition

Marco Gaido, Ph.D. student at Fondazione Bruno Kessler (FBK), Italy

Date:
Wed 6 April 2022, 16:00 UTC (1am Thurs JST / 6pm CEST / 12pm EDT / 9am PDT)

Recording:
Watch on YouTube

Abstract:
Translation is a complex task involving different levels of understanding of the content being handled. The process involves grasping the semantic meaning of the source, which is conveyed through the (named) entities it mentions, the specific terminology denoting particular elements, and the relationships between them. However, these aspects remain implicit in the output of automatic translation systems designed exclusively to generate fluent and adequate text. Going beyond these standard output-quality objectives, we envisage “augmented translation” as the task of jointly generating the output together with explicit meaning-related enrichments suitable for downstream use. In this talk, we will discuss the application of this idea to the domain of live interpreting, in which handling named entities is a daunting task. In this framework, we will present our work on systems that jointly translate speech and recognize named entities, discussing the main challenges, possible approaches, data requirements, experimental results, and future research directions.
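One way to picture such “augmented” output is a translation carrying inline entity tags that downstream tools can parse. The sketch below uses a hypothetical bracket tag format of our own devising, not the scheme from the talk, to show how a consumer might recover the plain translation plus the entity annotations.

```python
import re

# Hypothetical inline tag format, e.g. "[PER]Ada Lovelace[/PER] spoke."
TAG = re.compile(r"\[(PER|ORG|LOC)\](.+?)\[/\1\]")

def parse_augmented(output: str):
    """Strip inline tags, returning plain text plus (span, label, offset)."""
    entities, plain_parts = [], []
    cursor = 0  # position in the tagged output
    pos = 0     # position in the plain (untagged) text
    for m in TAG.finditer(output):
        plain_parts.append(output[cursor:m.start()])
        pos += m.start() - cursor
        span = m.group(2)
        entities.append((span, m.group(1), pos))
        plain_parts.append(span)
        pos += len(span)
        cursor = m.end()
    plain_parts.append(output[cursor:])
    return "".join(plain_parts), entities

text, ents = parse_augmented("[PER]Ada Lovelace[/PER] spoke at [ORG]FBK[/ORG].")
print(text)  # Ada Lovelace spoke at FBK.
print(ents)  # [('Ada Lovelace', 'PER', 0), ('FBK', 'ORG', 22)]
```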

Simultaneous Speech-to-Speech Translation with Transformer-based Incremental ASR, MT, and TTS

Katsuhito Sudoh, Ph.D., Associate Professor at the Nara Institute of Science and Technology (NAIST), Japan

Date:
Tue 1 March 2022, 08:00 UTC (5pm JST / 9am CET / 12am PST)

Recording:
Watch on YouTube

Abstract:
I’ll talk about our English-to-Japanese simultaneous speech-to-speech translation (S2ST) system. It consists of three Transformer-based incremental processing modules: automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS). We also evaluated its system-level latency in addition to the module-level latency and accuracy.
(This work was originally presented at Oriental COCOSDA 2021).
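As a toy illustration of the difference between module-level and system-level latency in such a cascade, the sketch below times three stand-in modules; the module behavior, sleep times, and chunking are invented for illustration and are not the system from the talk.

```python
import time

def timed(fn, arg, totals, name):
    """Run one module step and accumulate its processing time."""
    t0 = time.perf_counter()
    out = fn(arg)
    totals[name] = totals.get(name, 0.0) + (time.perf_counter() - t0)
    return out

# Stand-in modules; real ones would be incremental Transformer models.
def asr(chunk): time.sleep(0.01); return "partial hypothesis"
def mt(text):   time.sleep(0.02); return "partial translation"
def tts(text):  time.sleep(0.03); return b"synthesized audio"

totals = {}
t_start = time.perf_counter()
for chunk in [b"\x00" * 320] * 5:  # stand-in audio chunks
    audio_out = timed(tts, timed(mt, timed(asr, chunk, totals, "asr"),
                                 totals, "mt"), totals, "tts")
system_latency = time.perf_counter() - t_start

print("module-level:", totals)          # time spent inside each module
print("system-level:", system_latency)  # first input to final audio output
```

In a real-time setting, system-level latency also includes waiting for input audio and queuing between modules, which per-module measurements alone would miss.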