Monday, December 9, 2024


Computational Bottlenecks of Training Small-Scale Large Language Models



This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) workshop at NeurIPS 2024.
While large language models (LLMs) dominate the AI landscape, small-scale large language models (SLMs) are gaining attention due to cost and efficiency demands from consumers. However, there is limited research on the training behavior and computational requirements of SLMs. In this study, we explore the computational bottlenecks of training SLMs (up to 2B parameters) by examining the effects of various hyperparameters and configurations, including GPU type, batch size…




Continue reading

Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling

Diffusion models have emerged as a powerful tool for generating high-quality images from textual descriptions. Despite their successes, these models often exhibit limited diversity in the sampled images, particularly when sampling with a high classifier-free guidance weight. To...

Towards Time-Series Reasoning with LLMs

Multi-modal large language models (MLLMs) have enabled numerous advances in understanding and reasoning in domains like vision, but we have not yet seen this broad success for time-series. Although prior works on time-series MLLMs have shown promising performance...

Private and Personalized Frequency Estimation in a Federated Setting

Motivated by the problem of next word prediction on user devices, we introduce and study the problem of personalized frequency histogram estimation in a federated setting. In this problem, over some domain, each user observes a number...