Friday, March 21, 2025

Artificial Intelligence news

Compact Neural TTS Voices for Accessibility

Contemporary text-to-speech (TTS) solutions for accessibility applications typically fall into two categories: (i) on-device statistical parametric speech synthesis (SPSS) or unit selection (USEL), and (ii) cloud-based neural TTS. SPSS and USEL offer low latency and a small disk footprint at the expense of naturalness and audio quality. Cloud-based neural TTS systems provide significantly better audio quality and naturalness but fall behind in latency and responsiveness, rendering them impractical for real-world applications. More recently, neural TTS models were made deployable to…

M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference

Residual transformations enhance the representational depth and expressive power of large language models (LLMs). However, applying static residual transformations across all tokens in auto-regressive generation leads to a suboptimal trade-off between inference efficiency and generation fidelity. Existing methods,...

Does Spatial Cognition Emerge in Frontier Models?

Not yet. We present SPACE, a benchmark that systematically evaluates spatial cognition in frontier models. Our benchmark builds on decades of research in cognitive science. It evaluates large-scale mapping abilities that are brought to bear when an organism...

SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions

In this work, we present and evaluate SELMA, a Speech-Enabled Language Model for virtual Assistant interactions that integrates audio and text as inputs to a Large Language Model (LLM). SELMA is designed to handle three primary and two...