Google I/O 2024 unveils major advancements in generative AI with Gemini

SAN FRANCISCO, CALIFORNIA — This year’s Google I/O, the company’s annual developer conference, officially kicked off with Google announcing its headlong push into the future of generative AI. At the conference, Google CEO Sundar Pichai unveiled a wave of advancements powered by the company’s AI technology, Gemini.

A core theme of the event was generative AI, with Google showcasing significant updates to its Gemini family of models and introducing new tools for creators. Here are the highlights of the upgrades:

Faster and more capable Gemini models

Google unveiled Gemini 1.5 Flash, its fastest Gemini model yet, excelling at tasks like summarization, chat applications, and data extraction. The improved Gemini 1.5 Pro also boasts an enhanced ability to follow complex instructions and control response styles.
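
For developers curious about what this looks like in practice, here is a minimal sketch (not from the keynote) of calling Gemini 1.5 Flash for a summarization task through the google-generativeai Python SDK; the API-key placeholder and prompt are illustrative only.

```python
# Minimal sketch: summarization with Gemini 1.5 Flash via the
# google-generativeai SDK (pip install google-generativeai).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; supply your own key
model = genai.GenerativeModel("gemini-1.5-flash")

article = "Google I/O 2024 introduced new Gemini models, Veo, and Imagen 3..."
response = model.generate_content(f"Summarize in one sentence: {article}")
print(response.text)
```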

Gemini Nano expands, and Gemma gets an upgrade

Gemini Nano, designed for on-device tasks, now supports image inputs, starting with Pixel phones. Google also announced Gemma 2, the next generation of its open models for responsible AI development, and PaliGemma, a vision-language model inspired by PaLI-3.
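
As an illustration of what working with the open Gemma family looks like, here is a minimal sketch that loads a first-generation Gemma checkpoint with the Hugging Face transformers library; the checkpoint name and prompt are examples, and Gemma 2 weights follow the same workflow once published.

```python
# Sketch only: loading an open Gemma checkpoint with Hugging Face transformers.
# "google/gemma-2b" is a first-generation example model id (gated; requires
# accepting Google's license on the Hub).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Write a one-line tagline for a coffee brand."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```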

High-fidelity video and image generation

Google is introducing Veo, a groundbreaking video generation model capable of producing high-quality, cinematic-style videos exceeding a minute in length. Also joining the field is Imagen 3, the company’s most advanced text-to-image model to date.

Music AI Sandbox

In collaboration with YouTube, Google introduced a suite of music AI tools designed to empower creators, including the ability to generate original instrumental sections.

Greater Gemini integration across Google products

With Gemini having established itself in the months since its release, Google announced plans to integrate it more deeply into its core products, including:

Enhanced Android UX

New features leverage on-device AI to enhance user experience. For instance, Circle to Search allows students to use their phones or tablets for step-by-step tutoring on math and physics problems. Additionally, Gemini integration enables features like drag-and-dropping generated images into messages and “Ask this video” for information retrieval from YouTube videos.

Better search functionality

Search will soon leverage a custom-built Gemini model to answer entirely new types of questions. Users will be able to interact with AI Overviews, adjust the level of detail displayed, and explore AI-organized results pages with categorized content.

Improved Google Photos organization

Ask Photos, a new feature powered by Gemini, allows users to search their photo libraries more naturally, such as by requesting photos from specific locations or based on thematic details. Ask Photos can also curate photo highlights and suggest captions for social media sharing.

A more synergistic Google Workspace

Gemini for Workspace features are getting an upgrade, including access to the 1.5 Pro model within the side panel of Gmail, Docs, Drive, Slides, and Sheets. This enables users to ask a wider range of questions and receive more insightful responses directly within these applications.

These announcements mark a significant step forward in Google’s AI strategy, placing generative AI at the forefront of user experiences across its products.

Learn more about the new AI features by catching the replay of I/O 2024 and reading the highlights.
