Voice Collector | Aanchan K. Mohan

The need

Today’s speech recognition and voice conversion models are data hungry, yet there are very few publicly available datasets for people with atypical speech patterns. Alongside the lack of data, we also noticed a lack of recording tools with the simple user interfaces that this user population typically needs. So we decided to build a tool that meets both of these requirements.

Inspired by projects such as Mozilla Common Voice and Uncommon Voice, our group saw the need for a recording tool with a very simple user interface that does not overwhelm the target user population. Our design is also inspired by a little-discussed tool called ChitChat. Our prompts have been carefully designed by a speech and language pathologist for appropriate phonetic coverage, and they also include numbers, phrases from daily activities and descriptions of images. These assets and prompts are meant to be highly configurable through a text file.
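To make the idea of text-file configuration concrete, here is a minimal sketch of how such a prompt file could be read. The `category | prompt` line format, the category names, the sample prompts and the `parse_prompts` helper are all assumptions made for illustration; they are not the project's actual file format or code.

```python
from collections import defaultdict

# Hypothetical prompt file contents: one "category | prompt" pair per line.
# The format and the example prompts are illustrative assumptions.
SAMPLE_PROMPTS = """\
# category | prompt shown to the speaker
numbers | Please count from one to ten.
daily | I would like a glass of water.
image | Describe what you see in the picture.
"""


def parse_prompts(text: str) -> dict[str, list[str]]:
    """Group prompts by category, skipping blank lines and # comments."""
    prompts = defaultdict(list)
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        category, sep, prompt = line.partition("|")
        if not sep or not category.strip() or not prompt.strip():
            raise ValueError(f"line {number}: expected 'category | prompt'")
        prompts[category.strip()].append(prompt.strip())
    return dict(prompts)


if __name__ == "__main__":
    for category, items in parse_prompts(SAMPLE_PROMPTS).items():
        print(category, items)
```

A plain line-based format like this keeps the file editable by non-programmers, which seems in keeping with the goal of letting clinicians adjust the prompts.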

The tech

The tool has a Python backend built with Flask and a front end written with MaterialUI and ReactJS. Each recording session is given a unique identifier, and each audio file has metadata attached to it. The metadata for each recording is meant to be stored in a relational database, while the audio file itself is stored in a blob store in WAV format at a 16 kHz sampling rate. The entire recording tool is meant to be deployable using Docker. Additional features include audio validation, such as format checks and silence and volume checks. The project is meant to be released as open source under the Apache 2.0 license.
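The format, silence and volume checks mentioned above could look roughly like the following sketch, which uses only the Python standard library. The function names, the threshold values and the assumption of 16-bit PCM samples are illustrative choices, not the project's actual implementation; only the 16 kHz WAV target comes from the description above.

```python
import array
import io
import math
import sys
import wave

EXPECTED_RATE = 16_000   # 16 kHz, as described above
SILENCE_RMS = 100.0      # assumed silence threshold for 16-bit samples
CLIPPING_PEAK = 32_000   # assumed "too loud" threshold near the 16-bit limit


def validate_wav(data: bytes) -> list[str]:
    """Return a list of problems found; an empty list means the clip passed."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            rate = wav.getframerate()
            width = wav.getsampwidth()
            frames = wav.readframes(wav.getnframes())
    except (wave.Error, EOFError):
        return ["not a readable WAV file"]

    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if width != 2:
        problems.append("samples are not 16-bit PCM")
        return problems  # the level checks below assume 16-bit samples

    samples = array.array("h")
    samples.frombytes(frames[: len(frames) - len(frames) % 2])
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        problems.append("recording is empty")
        return problems

    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    if rms < SILENCE_RMS:
        problems.append("recording appears to be silent")
    if peak >= CLIPPING_PEAK:
        problems.append("recording is too loud and may be clipped")
    return problems


def tone_wav(rate: int = EXPECTED_RATE, amplitude: int = 8000,
             seconds: float = 0.5) -> bytes:
    """Build an in-memory mono 440 Hz test tone (demo helper only)."""
    count = int(rate * seconds)
    samples = array.array(
        "h",
        (int(amplitude * math.sin(2 * math.pi * 440 * i / rate))
         for i in range(count)),
    )
    if sys.byteorder == "big":
        samples.byteswap()
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(samples.tobytes())
    return buffer.getvalue()


if __name__ == "__main__":
    print(validate_wav(tone_wav()))             # a normal tone
    print(validate_wav(tone_wav(amplitude=0)))  # a silent clip
    print(validate_wav(tone_wav(rate=8000)))    # wrong sampling rate
```

Returning a list of human-readable problems, rather than raising on the first failure, would let a front end show the speaker every issue with a take at once. In a Flask backend a check like this would typically run on the uploaded bytes before the file is written to the blob store.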

On GitHub

Find us here on GitHub

A screenshot

Screenshot of the VoiceCollector platform