A very minimal thing about scale in Automatic Speech Recognition


On January 27, 2023 at 13:30, Sean Robertson will present "A very minimal thing about scale in Automatic Speech Recognition."

The Zoom link for the meeting is here. The password will be distributed through the mailing list.

Abstract

Psychoacoustic experimentation strongly suggests that humans process speech's time-frequency geometry non-uniformly: to cover the same perceptual distance, intervals in time or frequency must be larger the further away they start from the origin. This relationship is well known in the case of frequency versus pitch, and steps are taken to standardize (linearize) distances in frequency space before features are fed into Automatic Speech Recognition (ASR) systems. However, little attention has been paid to the associated non-uniformity in time. In this talk, I will introduce the mathematical formalism of "scale" and connect it to the aforementioned geometries over time and frequency. I will discuss geometric learning in ASR in the uniform case and how to adapt it to the scale case. I will introduce a brand-new neural network layer and pit it against the uniform layer on the TIMIT ASR task. I will close with our results and future directions.
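
The abstract does not name the specific frequency warping, but in standard ASR front-ends this standardization is usually done with the mel scale. As a rough illustrative sketch of the non-uniformity described above (the formula and code below are an assumption drawn from common practice, not material from the talk), the following Python snippet takes equally spaced steps on the mel axis and shows that they map back to Hz intervals that widen toward higher frequencies:

import math

def hz_to_mel(f_hz):
    # O'Shaughnessy mel formula, widely used in ASR feature extraction
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of the mapping above
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Ten equally spaced points on the mel axis between 0 Hz and 8 kHz ...
mels = [hz_to_mel(8000.0) * i / 9 for i in range(10)]
hz = [mel_to_hz(m) for m in mels]

# ... correspond to Hz intervals that grow the further they sit from the origin.
widths = [hz[i + 1] - hz[i] for i in range(9)]
print([round(w, 1) for w in widths])

Equal perceptual steps therefore cover ever-larger spans of raw frequency; the talk's premise is that an analogous non-uniformity in time has received far less attention.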