Tuesday, December 10, 2024

Artificial Intelligence news

How to use Sora,...

MIT Technology Review’s How To series helps you get things done. Today, OpenAI released its...

The US Department of...

The US Department of Defense has invested $2.4 million over two years...

OpenAI’s new defense contract...

At the start of 2024, OpenAI’s rules for how armed forces might...

Google DeepMind’s new AI...

Google DeepMind has unveiled an AI model that’s better at predicting the...

Multimodal Autoregressive Pre-Training of Large Vision Encoders



A dominant paradigm in large multimodal models is to pair a large language decoder with a vision encoder. While it is well-known how to pre-train and tune language decoders for multimodal tasks, it is less clear how the vision encoder should be pre-trained. A de facto standard is to pre-train the vision encoder with a discriminative objective, such as contrastive loss. This causes a mismatch between pre-training and the generative autoregressive downstream task. At the same time, following their success in the language domain, autoregressive image models have been shown…
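To make the mismatch concrete, here is a minimal PyTorch sketch, our illustration rather than the paper's implementation, of pre-training a vision encoder through a causal decoder with a generative autoregressive objective instead of a contrastive one. All module names, dimensions, and the two loss heads are illustrative assumptions.

# Illustrative sketch: the vision encoder is optimized end to end through a
# causal decoder that predicts the next image patch and the next caption token.
import torch
import torch.nn as nn

DIM, PATCH_DIM, VOCAB = 512, 768, 1000

class VisionEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(PATCH_DIM, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, patches):                    # (B, N, PATCH_DIM)
        return self.blocks(self.proj(patches))     # (B, N, DIM)

class CausalDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.patch_head = nn.Linear(DIM, PATCH_DIM)  # regress the next patch
        self.token_head = nn.Linear(DIM, VOCAB)      # predict the next text token

    def forward(self, seq):                          # (B, L, DIM)
        L = seq.size(1)
        mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.blocks(seq, mask=mask)              # causal attention
        return self.patch_head(h), self.token_head(h)

encoder, decoder = VisionEncoder(), CausalDecoder()
text_embed = nn.Embedding(VOCAB, DIM)

patches = torch.randn(2, 16, PATCH_DIM)              # dummy image patches
tokens = torch.randint(0, VOCAB, (2, 8))              # dummy caption tokens

vis = encoder(patches)                                # gradients flow back into the encoder
seq = torch.cat([vis, text_embed(tokens)], dim=1)     # image prefix, then text
patch_pred, token_pred = decoder(seq)

n_img = patches.size(1)
loss_img = nn.functional.mse_loss(patch_pred[:, : n_img - 1], patches[:, 1:])
loss_txt = nn.functional.cross_entropy(
    token_pred[:, n_img:-1].reshape(-1, VOCAB), tokens[:, 1:].reshape(-1))
(loss_img + loss_txt).backward()

Because the supervision signal is next-element prediction rather than instance discrimination, the encoder's pre-training objective matches the generative autoregressive use it is put to downstream.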





Memory-Retaining Finetuning via Distillation

This paper was accepted at the Fine-Tuning in Modern Machine Learning: Principles and Scalability (FITML) Workshop at NeurIPS 2024. Large language models (LLMs) pretrained on large corpora of internet text possess much of the world's knowledge. Following pretraining, one...
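One generic way to realize the idea in the title, which may differ from the paper's actual method, is to regularize finetuning with a distillation term toward a frozen copy of the pretrained model so that task learning does not overwrite pretrained knowledge. A minimal PyTorch sketch, with all names and hyperparameters assumed for illustration:

import torch
import torch.nn.functional as F

def finetune_step(student, frozen_teacher, batch, optimizer, lam=0.5, T=2.0):
    """One step of task loss plus distillation toward the pretrained teacher."""
    # `student` and `frozen_teacher` are assumed callables returning (B, L, V) logits.
    logits = student(batch["input_ids"])
    task_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), batch["labels"].reshape(-1))

    with torch.no_grad():
        teacher_logits = frozen_teacher(batch["input_ids"])

    # KL divergence between teacher and student distributions, with temperature T.
    kd_loss = F.kl_div(
        F.log_softmax(logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean") * (T * T)

    loss = task_loss + lam * kd_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

The weight lam trades off new-task accuracy against how closely the finetuned model stays to the pretrained distribution.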

Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling

Diffusion models have emerged as a powerful tool for generating high-quality images from textual descriptions. Despite their successes, these models often exhibit limited diversity in the sampled images, particularly when sampling with a high classifier-free guidance weight. To...
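For reference, the classifier-free guidance weight the summary mentions enters at sampling time, when the conditional and unconditional noise predictions are combined. A minimal sketch of that step, where eps_model and its call signature are assumed for illustration:

import torch

def cfg_noise_estimate(eps_model, x_t, t, cond, guidance_weight=7.5):
    """Classifier-free guidance: blend conditional and unconditional predictions.

    A larger guidance_weight tracks the text condition more tightly, which is
    the regime where sample diversity tends to collapse.
    """
    eps_cond = eps_model(x_t, t, cond)      # conditioned on the text embedding
    eps_uncond = eps_model(x_t, t, None)    # condition dropped
    return eps_uncond + guidance_weight * (eps_cond - eps_uncond)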

Towards Time-Series Reasoning with LLMs

Multi-modal large language models (MLLMs) have enabled numerous advances in understanding and reasoning in domains like vision, but we have not yet seen this broad success for time-series. Although prior works on time-series MLLMs have shown promising performance...