Monday, September 9, 2024

Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations

This work investigates pre-trained audio representations for few-shot Sound Event Detection. We specifically address few-shot detection of novel acoustic sequences, or sound events with semantically meaningful temporal structure, without assuming access to non-target audio. We develop procedures for pre-training suitable representations, along with methods that transfer them to our few-shot learning scenario. Our experiments evaluate the general-purpose utility of our pre-trained representations on AudioSet, and the utility of the proposed few-shot methods via tasks constructed from…
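The few-shot setup described above can be illustrated with a toy prototype-matching detector. This is a generic sketch, not the paper's method: `embed` stands in for a pretrained audio encoder (here just a fixed random projection), and detection is cosine similarity to a prototype averaged from the few labeled examples, with no non-target audio required.

```python
import numpy as np

def embed(clip: np.ndarray) -> np.ndarray:
    """Stand-in for a pretrained audio encoder: a fixed random projection
    of the raw samples. A real system would use e.g. an AudioSet-pretrained
    network; only the interface matters for this sketch."""
    rng = np.random.default_rng(0)               # fixed seed: same projection every call
    proj = rng.standard_normal((clip.shape[-1], 8))
    v = clip @ proj
    return v / (np.linalg.norm(v) + 1e-9)        # unit-normalize the embedding

def detect_few_shot(support_clips, query_frames, threshold=0.5):
    """Prototype matching without non-target audio: average the embeddings
    of the few labeled examples of the novel event into one prototype, then
    flag query frames whose embedding is cosine-close to it."""
    prototype = np.mean([embed(c) for c in support_clips], axis=0)
    prototype /= np.linalg.norm(prototype) + 1e-9
    scores = [float(embed(f) @ prototype) for f in query_frames]  # cosine similarity
    return [s > threshold for s in scores], scores
```

Because the embeddings are unit-normalized, a query frame identical to the support examples scores near 1, while unrelated frames fall toward 0; the threshold trades off precision against recall.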

Positional Description for Numerical Normalization

We present a Positional Description Scheme (PDS) tailored for digit sequences, integrating placeholder value information for each digit. Given the structural limitations of subword tokenization algorithms, language models encounter critical Text Normalization (TN) challenges when handling numerical tasks…
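The core idea of attaching positional information to each digit can be sketched as follows. This is a minimal illustration of the concept only; the tag names and the `digit_place` format here are invented for the example and are not the paper's actual PDS encoding.

```python
def positional_description(number: str) -> str:
    """Minimal illustration of tagging each digit with its place value,
    so a subword tokenizer sees explicit positional information.
    Tag names and format are invented for this sketch, not the paper's PDS."""
    places = ["ones", "tens", "hundreds", "thousands",
              "ten-thousands", "hundred-thousands", "millions"]
    assert number.isdigit() and len(number) <= len(places)
    # Walk digits from least to most significant, pairing each with its place.
    tagged = [f"{d}_{places[i]}" for i, d in enumerate(reversed(number))]
    return " ".join(reversed(tagged))

# positional_description("1204") -> "1_thousands 2_hundreds 0_tens 4_ones"
```

The point is that after such an expansion, a subword tokenizer can no longer split a number at arbitrary boundaries and lose each digit's magnitude.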

AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition

Audio-visual speech contains synchronized audio and visual information that provides cross-modal supervision to learn representations for both automatic speech recognition (ASR) and visual speech recognition (VSR). We introduce continuous pseudo-labeling for audio-visual speech recognition (AV-CPL), a semi-supervised method...
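Pseudo-labeling in the semi-supervised sense can be sketched generically. The toy loop below is not AV-CPL itself (which operates on audio-visual speech with neural models): here a nearest-centroid classifier is refit each round on the labeled data plus its own current pseudo-labels for the unlabeled pool, which is the "continuous" refresh idea in miniature.

```python
import numpy as np

def continuous_pseudo_labeling(x_lab, y_lab, x_unlab, rounds=3):
    """Generic continuous pseudo-labeling sketch (not AV-CPL itself):
    each round, refit a nearest-centroid classifier on labeled data plus
    the current pseudo-labels, then re-label the unlabeled pool."""
    classes = np.unique(y_lab)
    y_pseudo = None
    for _ in range(rounds):
        xs, ys = x_lab, y_lab
        if y_pseudo is not None:                 # fold pseudo-labels back into training
            xs = np.vstack([x_lab, x_unlab])
            ys = np.concatenate([y_lab, y_pseudo])
        centroids = np.stack([xs[ys == c].mean(axis=0) for c in classes])
        # Distance from every unlabeled point to every class centroid.
        d = np.linalg.norm(x_unlab[:, None] - centroids[None], axis=-1)
        y_pseudo = classes[d.argmin(axis=1)]     # refresh pseudo-labels
    return y_pseudo
```

In the real method the "classifier" is a speech recognizer trained continuously rather than refit from scratch, but the feedback loop — model labels unlabeled data, those labels feed further training — is the same shape.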

Novel-View Acoustic Synthesis From 3D Reconstructed Rooms

We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis. Given audio recordings from 2-4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we...
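For a sense of what synthesizing audio at a new viewpoint from geometry means in the very simplest case, here is a toy free-field renderer. It applies only inverse-distance attenuation and propagation delay for a single known source; materials, reverberation, and unknown multiple sources — all part of the paper's setting — are ignored.

```python
import numpy as np

def render_at_listener(source_audio, src_pos, lis_pos, sr=16000, c=343.0):
    """Toy free-field renderer (far simpler than the paper's method):
    place a known source signal at a new listener position using only
    1/r attenuation and propagation delay derived from 3D geometry."""
    r = float(np.linalg.norm(np.asarray(lis_pos) - np.asarray(src_pos)))
    delay = int(round(r / c * sr))   # propagation delay in samples
    gain = 1.0 / max(r, 1e-3)        # inverse-distance attenuation
    out = np.zeros(len(source_audio) + delay)
    out[delay:] = gain * source_audio
    return out
```

At 343 m/s and 16 kHz, a listener 3.43 m away hears the signal 160 samples later and attenuated by 1/3.43 — the kind of geometric cue a learned method must capture alongside room acoustics.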