Monday, June 24, 2024

Artificial Intelligence news

Synthesia’s hyperrealistic deepfakes will...

Startup Synthesia’s AI-generated avatars are getting an update to make them even...

How underwater drones could...

A potential future conflict between Taiwan and China would be shaped by...

How generative AI could...

First, a confession. I only got into playing video games a little...

I tested out a...

This story first appeared in China Report, MIT Technology Review’s newsletter about...
HomeMachine LearningEfficient Multimodal Neural...

Efficient Multimodal Neural Networks for Trigger-less Voice Assistants



The adoption of multimodal interactions by Voice Assistants (VAs) is growing rapidly to enhance human-computer interactions. Smartwatches have now incorporated trigger-less methods of invoking VAs, such as Raise To Speak (RTS), where the user raises their watch and speaks to VAs without an explicit trigger. Current state-of-the-art RTS systems rely on heuristics and engineered Finite State Machines to fuse gesture and audio data for multimodal decision-making. However, these methods have limitations, including limited adaptability, scalability, and induced human biases. In this work, we…



Article Source link and Credit

Continue reading

Conformer-Based Speech Recognition on Extreme Edge-Computing Devices

This paper was accepted at the Industry Track at NAACL 2024. With increasingly more powerful compute capabilities and resources in today’s devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect...

AGRaME: Any Granularity Ranking with Multi-Vector Embeddings

Ranking is a fundamental and popular problem in search. However, existing ranking algorithms usually restrict the granularity of ranking to full passages or require a specific dense index for each desired level of granularity. Such lack of flexibility...

Time Sensitive Knowledge Editing through Efficient Finetuning

Large Language Models (LLMs) have demonstrated impressive capability in different tasks and are bringing transformative changes to many domains. However, keeping the knowledge in LLMs up-to-date remains a challenge once pretraining is complete. It is thus essential to...