Monday, September 9, 2024


Pre-trained Model Representations and their Robustness against Noise for Speech Emotion Analysis



Pre-trained model representations have demonstrated state-of-the-art performance in speech recognition, natural language processing, and other applications. Models such as Bidirectional Encoder Representations from Transformers (BERT) and Hidden-unit BERT (HuBERT) have enabled the generation of lexical and acoustic representations that benefit speech recognition applications. We investigated the use of pre-trained model representations for estimating dimensional emotions, such as activation, valence, and dominance, from speech. We observed that while valence may rely heavily on lexical…
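
As a rough illustration of this kind of pipeline (a minimal sketch assuming torchaudio's pre-trained HuBERT bundle; the mean-pooling and the linear regression head are placeholders, not the paper's architecture), acoustic representations can be extracted and mapped to the three emotion dimensions like this:

```python
# Minimal sketch: extract acoustic representations with a pre-trained
# HuBERT model and regress activation, valence, and dominance.
# The pooling and linear head below are illustrative assumptions.
import torch
import torchaudio

bundle = torchaudio.pipelines.HUBERT_BASE        # pre-trained speech model
hubert = bundle.get_model().eval()

# Stand-in for a real 3-second utterance at the model's sample rate.
waveform = torch.randn(1, int(3 * bundle.sample_rate))

with torch.no_grad():
    layers, _ = hubert.extract_features(waveform)  # one tensor per transformer layer
    acoustic = layers[-1].mean(dim=1)              # mean-pool over time -> (1, 768)

# Hypothetical regression head for the three dimensional-emotion targets.
emotion_head = torch.nn.Linear(acoustic.shape[-1], 3)
activation, valence, dominance = emotion_head(acoustic).squeeze(0)
```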




Positional Description for Numerical Normalization

We present a Positional Description Scheme (PDS) tailored for digit sequences, integrating placeholder value information for each digit. Given the structural limitations of subword tokenization algorithms, language models encounter critical Text Normalization (TN) challenges when handling numerical tasks...
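
As a toy illustration of the underlying idea (the tag format below is an assumption, not the paper's exact scheme), a positional description can annotate every digit with its place value so that a subword tokenizer cannot split a number arbitrarily:

```python
# Illustrative positional description for digit strings: tag each digit
# with its place-value exponent. The "@" format is hypothetical.
def positional_description(number: str) -> str:
    digits = number.lstrip("-")
    tagged = [f"{d}@{len(digits) - i - 1}"   # digit @ power of ten
              for i, d in enumerate(digits)]
    sign = "-" if number.startswith("-") else ""
    return sign + " ".join(tagged)

print(positional_description("1905"))  # -> "1@3 9@2 0@1 5@0"
```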

AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition

Audio-visual speech contains synchronized audio and visual information that provides cross-modal supervision to learn representations for both automatic speech recognition (ASR) and visual speech recognition (VSR). We introduce continuous pseudo-labeling for audio-visual speech recognition (AV-CPL), a semi-supervised method...
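
A schematic sketch of a continuous pseudo-labeling loop follows, with a toy linear model, CTC loss, and synthetic features standing in for the real audio-visual system; every name and shape here is illustrative, not the paper's implementation:

```python
# Schematic continuous pseudo-labeling (CPL): the model being trained
# also generates pseudo-transcripts for unlabeled clips on the fly.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(40, 29)                  # stand-in for an AV-ASR model
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
ctc = torch.nn.CTCLoss(blank=0, zero_infinity=True)

def greedy_decode(logits):
    """Toy decoder: collapse repeats and drop blanks to get a pseudo-label."""
    ids = logits.argmax(-1).unique_consecutive()
    return ids[ids != 0]

for step in range(100):
    feats = torch.randn(50, 1, 40)               # synthetic features (T, N, F)
    if step % 2 == 0:                            # supervised step: ground-truth labels
        target = torch.randint(1, 29, (1, 10))
    else:                                        # CPL step: label with the model itself
        with torch.no_grad():
            target = greedy_decode(model(feats).squeeze(1)).unsqueeze(0)
        if target.numel() == 0:
            continue                             # skip empty pseudo-labels
    log_probs = model(feats).log_softmax(-1)
    loss = ctc(log_probs, target,
               torch.tensor([50]), torch.tensor([target.shape[1]]))
    opt.zero_grad(); loss.backward(); opt.step()
```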

Novel-View Acoustic Synthesis From 3D Reconstructed Rooms

We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis. Given audio recordings from 2-4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we...
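
For orientation only, the naive baseline below illustrates the task's inputs and outputs (microphone positions and recordings in, audio at a novel listener pose out) via inverse-distance weighting; the paper's method instead exploits the reconstructed 3D geometry and materials, which this sketch ignores entirely:

```python
# Hypothetical naive baseline (NOT the paper's method): synthesize audio
# at a novel listener position by inverse-distance weighting of the
# reference microphone recordings.
import numpy as np

def naive_novel_view_audio(mic_positions, mic_audio, listener_pos):
    """mic_positions: (M, 3); mic_audio: (M, T); listener_pos: (3,)."""
    d = np.linalg.norm(mic_positions - listener_pos, axis=1)  # (M,) distances
    w = 1.0 / np.maximum(d, 1e-3)                             # closer mics weigh more
    w /= w.sum()
    return (w[:, None] * mic_audio).sum(axis=0)               # (T,) mixed signal

mics = np.array([[0.0, 0, 0], [4, 0, 0], [0, 4, 0]])
audio = np.random.randn(3, 16000)                             # 1 s at 16 kHz
out = naive_novel_view_audio(mics, audio, np.array([1.0, 1, 0]))
```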