Inclusive speech processing

Machine perception for atypical speech patterns

Improving speech recognition and voice conversion for atypical speech

Many communication tools used by speakers with non-standard speech patterns, such as those with dysarthria, have redundancy built into them. For example, augmentative and alternative communication (AAC) software often presents the same information in several redundant forms (see the sketch after this list):

  • An icon that represents the object of interest
  • An orthographic representation: the words that describe the object of interest
  • An audio rendition of those words
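
To make that redundancy concrete, here is a minimal sketch of how such an entry might be represented in software. The class and field names are our own illustration, not the data model of any particular AAC product:

```python
from dataclasses import dataclass

# Hypothetical sketch of a single AAC vocabulary entry, pairing the three
# redundant representations listed above. Names are illustrative only.
@dataclass
class AACEntry:
    icon_path: str   # icon that represents the object of interest
    text: str        # orthographic representation (the describing words)
    audio_path: str  # pre-recorded audio rendition of those words

water = AACEntry(
    icon_path="icons/water.png",
    text="I would like some water",
    audio_path="audio/water.wav",
)
```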

AAC icons are often not very expressive, and it can be easier for the user to simply speak in order to communicate. However, state-of-the-art speech recognizers often break down when presented with atypical speech.

Our goal is to build robust speech recognizers that can accurately transcribe atypical speech, as well as voice conversion systems that render the user's voice sample as a clear audio message.
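
As a rough illustration of the plumbing we have in mind, the sketch below transcribes an utterance with an off-the-shelf recognizer and re-renders the transcript as clear audio with an off-the-shelf text-to-speech model. The checkpoints and file paths are illustrative assumptions, not the systems used in this project, and a generic recognizer like this one is exactly the kind that tends to struggle with atypical speech:

```python
# Minimal recognize-then-resynthesize sketch. Checkpoints and file paths
# are illustrative; this is not the project's actual pipeline.
import soundfile as sf
from transformers import pipeline

# Step 1: transcribe the user's (possibly dysarthric) utterance.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
transcript = asr("user_utterance.wav")["text"]

# Step 2: re-render the transcript as a clear audio message with an
# off-the-shelf text-to-speech model.
tts = pipeline("text-to-speech", model="suno/bark-small")
speech = tts(transcript)

# speech["audio"] is a numpy array; squeeze in case it has a channel axis.
sf.write("clear_message.wav", speech["audio"].squeeze(),
         speech["sampling_rate"])
```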

Foundation models for speech recognition and voice conversion

Toward this end, we are focusing on the latest deep-learning-based speech recognition and voice conversion models. We seek to leverage representations learned through semi-supervised learning, transferring that knowledge to the recognition and conversion of atypical speech.
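
As one concrete (and hypothetical) example of what leveraging learned representations can look like, the sketch below extracts frame-level features from a pretrained wav2vec 2.0 model; the checkpoint name is an assumption for illustration, not necessarily the foundation model we use:

```python
import torch
import torchaudio
from transformers import Wav2Vec2Processor, Wav2Vec2Model

# Illustrative checkpoint; not necessarily the foundation model we use.
name = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name)

# Load an utterance and resample to the 16 kHz rate the model expects.
waveform, sr = torchaudio.load("user_utterance.wav")
waveform = torchaudio.functional.resample(waveform, sr, 16_000)

inputs = processor(waveform.squeeze().numpy(), sampling_rate=16_000,
                   return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state  # shape: (1, frames, 768)

# These frame-level representations can serve as input to a recognizer or
# voice conversion head trained on (comparatively scarce) atypical speech.
```

Pretraining of this kind is attractive here precisely because labeled atypical speech is scarce: the heavy lifting of learning general speech representations is done on large unlabeled corpora, and only a small downstream component needs atypical-speech data.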

Our work on this project is funded by Google Research through the Research Scholar Program, the Office of the Provost at Northeastern University, and by Happy Prime Inc with support from Innovate BC’s Innovator Skills Initiative Grant.