Saturday, October 12, 2024

Artificial Intelligence news

Google DeepMind wins joint...

In a second Nobel win for AI, the Royal Swedish Academy of...

Adobe wants to make...

Adobe has announced a new tool to help creators watermark their artwork...

Geoffrey Hinton just won...

Geoffrey Hinton, a computer scientist whose pioneering work on deep learning in...

Machine Learning

Vision-Based Hand Gesture Customization from a Single Demonstration

Hand gesture recognition is becoming a more prevalent mode of human-computer interaction, especially as cameras proliferate across everyday devices. Despite continued progress in this field, gesture customization is often underexplored. Customization is crucial since it enables users to define and demonstrate gestures that are more natural, memorable, and accessible. However, customization requires efficient usage of user-provided data. We introduce a method that enables users to easily design bespoke gestures with a monocular camera from one demonstration. We employ transformers and…

GSM-Symbolic: Understanding the...

Recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics. The GSM8K benchmark is widely...

Contrastive Localized Language-Image...

Contrastive Language-Image Pre-training (CLIP) has been a celebrated method for training vision encoders to generate image/text representations facilitating various applications. Recently, CLIP has...

When is Multicalibration...

Calibration is a well-studied property of predictors which guarantees meaningful uncertainty estimates. Multicalibration is a related notion -- originating in algorithmic fairness --...

On the Limited...

Reinforcement Learning from Human Feedback (RLHF) is an effective approach for aligning language models to human preferences. Central to RLHF is learning a...

Depth Pro: Sharp...

We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and...

Improving How Machine...

Machine Translation (MT) enables people to connect with others and engage with content across language barriers. Grammatical gender presents a difficult challenge for...

UI-JEPA: Towards Active...

Generating user intent from a sequence of user interface (UI) actions is a core challenge in comprehensive UI understanding. Recent advancements in multimodal...

Ferret-UI: Grounded Mobile...

Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet these general-domain MLLMs often fall short in their ability to comprehend...

Retrieval-Augmented Correction of...

In recent years, end-to-end automatic speech recognition (ASR) systems have proven themselves remarkably accurate and performant, but these systems still have a significant...