Saturday, January 25, 2025


What Algorithms can Transformers Learn? A Study in Length Generalization



This paper was accepted at the MATH workshop at NeurIPS 2023.
Large language models exhibit surprising emergent generalization properties, yet they also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of whether and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers’ abilities in the specific setting of length generalization on algorithmic tasks. We propose a unifying framework for understanding when and how Transformers can exhibit strong length generalization on a given task. Specifically, we…
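
To make "length generalization" concrete, here is a minimal sketch of how such an evaluation split might be built for the parity task named above: the training examples are short sequences and the test examples are strictly longer ones. This is an illustration only, not the paper's experimental setup. The function names (`parity`, `make_split`), the length cutoffs (20 and 40), and the example counts are all assumptions chosen for the example.

```python
import random


def parity(bits):
    """Return 1 if the number of ones in bits is odd, else 0."""
    return sum(bits) % 2


def make_split(min_len, max_len, n_examples, seed):
    """Sample (bits, label) pairs with lengths drawn uniformly from [min_len, max_len]."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n_examples):
        length = rng.randint(min_len, max_len)
        bits = [rng.randint(0, 1) for _ in range(length)]
        examples.append((bits, parity(bits)))
    return examples


# Hypothetical cutoffs: train on lengths 1..20, evaluate on strictly longer inputs.
train_set = make_split(1, 20, n_examples=1000, seed=0)
test_set = make_split(21, 40, n_examples=200, seed=1)
```

A model that has learned the underlying algorithm, rather than a shortcut tied to the training lengths, would be expected to keep its accuracy on `test_set` even though no training sequence is that long.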




Continue reading

Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition

This paper presents an efficient decoding approach for end-to-end automatic speech recognition (E2E-ASR) with large language models (LLMs). Although shallow fusion is the most common approach to incorporate language models into E2E-ASR decoding, we face two practical problems...

Interpreting CLIP: Insights on the Robustness to ImageNet Distribution Shifts

What distinguishes robust models from non-robust ones? While for ImageNet distribution shifts it has been shown that such differences in robustness can be traced back predominantly to differences in training data, so far it is not known what...

Controlling Language and Diffusion Models by Transporting Activations

The increasing capabilities of large generative models and their ever more widespread deployment have raised concerns about their reliability, safety, and potential misuse. To address these issues, recent works have proposed to control model generation by steering model...