Researchers at the Barcelona Supercomputing Center – Centro Nacional de Supercomputación (BSC-CNS) and the Universitat Politècnica de Catalunya (UPC) have developed a tool for research into automatic sign language translation that uses artificial intelligence to break down some of the communication barriers commonly faced by deaf and hard-of-hearing people.
Despite advances in voice recognition technologies such as Alexa and Siri, sign languages are still not supported by these applications, which are increasingly present in everyday life in many households. This creates a barrier for people who rely on sign language as their preferred mode of communication, limiting their ability to interact with technology and to access digital services designed only for spoken languages.
The development of this new open source software is an important step towards making communication accessible and barrier-free for all people. To this end, BSC and UPC researchers have combined computer vision, natural language processing and machine learning techniques to advance research in automatic sign language translation, a complex problem due in part to the variability and large number of sign languages in the world.
The system, still in an experimental phase, uses a machine learning architecture called the Transformer, which also underlies artificial intelligence tools such as ChatGPT, to convert entire sign language sentences in video format into spoken language in text format. It is currently focused on American Sign Language (ASL) but could be adapted to any other language as long as the necessary data is available, i.e., a corpus of parallel data in which each sign language sentence (in video format) has a corresponding translation into spoken language (in text format).
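For readers curious about what such a model looks like in code, the sketch below shows a minimal video-to-text Transformer in PyTorch. It is an illustration only, not the authors' released implementation: the feature dimension, vocabulary size and other hyperparameters are arbitrary, positional encodings are omitted for brevity, and it assumes each video has already been reduced to a sequence of per-frame feature vectors.

```python
# Minimal sketch of a video-to-text Transformer (hypothetical, not the authors' exact model).
# Assumes each sign language video is already represented as a sequence of per-frame
# feature vectors, and the spoken-language sentence as a sequence of token IDs.
import torch
import torch.nn as nn

class SignToTextTransformer(nn.Module):
    def __init__(self, feat_dim=274, vocab_size=8000, d_model=256,
                 nhead=4, num_layers=3):
        super().__init__()
        self.video_proj = nn.Linear(feat_dim, d_model)       # project frame features
        self.token_emb = nn.Embedding(vocab_size, d_model)    # embed target tokens
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)             # predict next token

    def forward(self, video_feats, target_tokens):
        # video_feats: (batch, frames, feat_dim); target_tokens: (batch, tokens)
        src = self.video_proj(video_feats)
        tgt = self.token_emb(target_tokens)
        # Causal mask so each output position only attends to earlier tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(hidden)                               # (batch, tokens, vocab) logits

# Toy forward pass with random data, just to show the shapes involved.
model = SignToTextTransformer()
video = torch.randn(2, 120, 274)           # 2 clips, 120 frames of features each
tokens = torch.randint(0, 8000, (2, 15))   # 2 target sentences, 15 tokens each
print(model(video, tokens).shape)          # torch.Size([2, 15, 8000])
```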
"The new tool developed is an extension of a previous publication also by BSC and the UPC called How2Sign, where the data needed to train the models (more than 80 hours of videos where American Sign Language interpreters translate video tutorials such as cooking recipes or DIY tricks) were published. With this data already available, the team has developed a new open source software capable of learning the mapping between video and text," says Laia Tarrés, researcher at BSC and UPC, who presented the publication of the new model to coincide with the celebration of Global Accessibility Awareness Day.
A step toward a real application
The researchers say this new work is a step in the right direction, but they also stress that there is still much room for improvement. These first results do not yet allow for a concrete application to serve users. The aim is to keep improving the tool until it can power a real application, promoting the creation of accessible technologies for deaf and hard-of-hearing people.
The project has already been presented at the Fundación Telefónica space in Madrid as part of the exhibition "Code and algorithms. Sense in a calculated world," which, with a prominent BSC presence, brings together different projects related to artificial intelligence. It will also soon be on display at the Centre de Cultura Contemporània de Barcelona (CCCB) as part of a major exhibition, also on artificial intelligence, that will open in October.
"This open tool for automatic sign language translation is a valuable contribution to the scientific community focused on accessibility, and its publication represents a significant step towards the creation of more inclusive and accessible technology for all," concludes Tarrés.
The paper is published on the arXiv preprint server.
More information: Laia Tarrés et al, Sign Language Translation from Instructional Videos, arXiv (2023). DOI: 10.48550/arXiv.2304.06371
Journal information: arXiv
Provided by Barcelona Supercomputing Center