Google Showcases Gemini 1.5 AI's Multimodal Abilities at I/O 2024, How It Uses

Technology
Jul 12 2024 10:00 AM
P C Thomas

At I/O 2024, Google showcased the multimodal capabilities of its Gemini 1.5 AI model. The model can process photos, videos, audio, and text as input and generate responses based on them. Google's AI team is now using this technology to train robots to navigate their environments.

Google DeepMind recently published a research paper and released several video clips demonstrating how robots can be trained to understand multimodal instructions, including natural language and images, and carry out useful navigation tasks. The research focuses on a category of navigation tasks called Multimodal Instruction Navigation with Demonstration Tours (MINT), in which the robot is introduced to its environment through a previously recorded video demonstration. Recent advances in Vision Language Models (VLMs) have shown significant potential for achieving these goals.

Training Robots with Gemini 1.5 AI Model
In a thread shared on X (formerly Twitter), Google highlighted that the limited context length of many AI models hampers their ability to recall environments. The Gemini 1.5 Pro model, with its 1-million-token context length, overcomes this limitation and makes it possible to train robots for navigation.
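
To see why context length matters here, consider a rough budget check for a recorded tour. The sketch below is illustrative only: the 1-million-token window is the figure cited above, while the sampling rate, the tokens-per-frame cost, the instruction overhead, and the function and class names are all assumptions made for the example, not values from Google's research.

```python
from dataclasses import dataclass

# Context length cited in the article for Gemini 1.5 Pro.
CONTEXT_WINDOW_TOKENS = 1_000_000


@dataclass
class TourBudget:
    frames: int   # number of frames sampled from the tour video
    tokens: int   # estimated total tokens (frames plus instruction)
    fits: bool    # True if the estimate fits in the context window


def tour_budget(duration_s: float,
                frames_per_second: float = 1.0,   # assumed sampling rate
                tokens_per_frame: int = 258,      # assumed per-frame cost
                instruction_tokens: int = 500     # assumed prompt overhead
                ) -> TourBudget:
    """Estimate whether a demonstration tour fits in the context window."""
    frames = int(duration_s * frames_per_second)
    tokens = frames * tokens_per_frame + instruction_tokens
    return TourBudget(frames, tokens, tokens <= CONTEXT_WINDOW_TOKENS)


if __name__ == "__main__":
    for minutes in (10, 60, 120):
        budget = tour_budget(minutes * 60)
        print(f"{minutes:>3} min tour: {budget.tokens:,} tokens, fits={budget.fits}")
```

Under these assumed numbers, a tour of up to about an hour fits in a single prompt, while a model with a much smaller window would only be able to hold a short clip of the environment at a time.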

Using human instructions, video tours, and common-sense reasoning, the robots successfully navigated the spaces they were shown. Trainers first guided the robots through specific areas in real-world settings, pointing out key locations to remember. The robots were then asked to lead the trainers to these locations, demonstrating their ability to understand and follow multimodal instructions.
