Amazon scientists' work from Interspeech 2022

Learn more about the work Amazon researchers presented at Interspeech 2022—the world's largest and most comprehensive conference on the science and technology of spoken-language processing.

Amazon’s 40-plus papers at Interspeech 2022

Amazon researchers had more than 40 papers accepted, on topics ranging from automatic speech recognition and text-to-speech to acoustic watermarking and automatic dubbing.

The training behavior of the algorithm proposed in "Sub-8-bit quantization aware training for 8-bit neural network accelerator with on-device speech recognition", in which weights are optimized to lower quantization loss.

The growth of interdisciplinary research

Senior applied scientist Penny Karanasou was an area and session chair for Interspeech 2022. Across her career, she has worked on speech recognition, language understanding, and text-to-speech. Find out why cross-pollination of speech-related fields intrigues her and how the conference program reflected that.

Alexa speech science developments

Illustration of the arbitrator and Transformer backbone of each block. The lightweight arbitrator toggles whether to evaluate subcomponents during the forward pass.

Alexa AI senior principal scientist Andreas Stolcke highlighted some speech-related papers, focusing on end-to-end models and fairness. He also wrote about the techniques Amazon scientists are using, like toggling neural blocks on and off, adding multiple CNN front ends to RNN-T models, and adversarial reweighting.

Alexa’s spoken-language-understanding research

Alexa AI senior principal scientist Gokhan Tur selected papers that covered a wide range of topics in spoken-language understanding, such as learning from noisy data, using phonetic embeddings to improve entity resolution, and quantization-aware training.

The architecture of the weighted-sum model.

Alexa’s text-to-speech research

A new approach to building expressive text-to-speech voices can make do with only an hour of expressive speech from the target speaker.

Senior applied scientist Antonio Bonafonte wrote about work being done on transference (of prosody, accent, and speaker identity) in text-to-speech, and the new ways scientists have used tools like normalizing flows and variational autoencoders.

