Tuesday, December 10, 2024


Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?



This paper was accepted at the Ninth Conference on Machine Translation (WMT24) at EMNLP 2024.
The prosody of a spoken utterance, including features like stress, intonation and rhythm, can significantly affect the underlying semantics, and as a consequence can also affect its textual translation. Nevertheless, prosody is rarely studied within the context of speech-to-text translation (S2TT) systems. In particular, end-to-end (E2E) systems have been proposed as well-suited for prosody-aware translation because they have direct access to the speech signal when making translation decisions, but…




Memory-Retaining Finetuning via Distillation

This paper was accepted at the Fine-Tuning in Modern Machine Learning: Principles and Scalability (FITML) Workshop at NeurIPS 2024. Large language models (LLMs) pretrained on large corpora of internet text possess much of the world's knowledge. Following pretraining, one...

Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling

Diffusion models have emerged as a powerful tool for generating high-quality images from textual descriptions. Despite their successes, these models often exhibit limited diversity in the sampled images, particularly when sampling with a high classifier-free guidance weight. To...

Towards Time-Series Reasoning with LLMs

Multi-modal large language models (MLLMs) have enabled numerous advances in understanding and reasoning in domains like vision, but we have not yet seen this broad success for time-series. Although prior works on time-series MLLMs have shown promising performance...