
Google's AI Mode Can Now Answer Questions About Images

Google AI Can Now See: Ask Questions About Any Image!

Google is revolutionizing search: its AI Mode now answers questions about images, thanks to Gemini 2.0 and Google Lens.

In a significant advancement for search technology, Google has recently expanded its AI Mode to answer questions about images. This new multimodal feature represents a major step forward in how users interact with visual content online, allowing for more intuitive and comprehensive search experiences. By combining the power of Google's custom Gemini 2.0 model with Google Lens technology, the update transforms how users extract information from the visual world around them.


Understanding Google AI Mode

Google AI Mode is an experimental search feature that Google introduced in March 2025 as an extension of its AI Overviews. While traditional Google Search returns a list of links, AI Mode provides conversational, AI-powered responses that directly answer user queries. Initially launched exclusively for Google One AI Premium subscribers through Google Labs, AI Mode was designed to handle complex, multi-part questions and follow-ups that help users dig deeper into a topic.

What sets AI Mode apart from traditional search is its ability to understand nuanced questions and provide comprehensive responses that might have previously required multiple searches. The system leverages Google's vast information resources, including the Knowledge Graph, real-time sources, and shopping data for billions of products, all built directly into the Search experience.

As of April 2025, Google has begun expanding access to AI Mode beyond just paying subscribers, making it available to "millions more" Labs users in the United States. This expansion signals Google's confidence in the technology and its commitment to evolving search technology beyond the traditional "10 blue links" paradigm.


The New Multimodal Capabilities

The latest update to AI Mode introduces multimodal search capabilities, enabling users to upload photos or take pictures with their camera and ask questions about images. This visual search functionality is available starting today and can be accessed in the Google app on both Android and iOS devices.

With this update, users will notice a new button in the AI Mode search bar that lets them snap a photo or upload an image. Once an image is provided, users can ask questions about it and receive comprehensive responses with links to explore further. This creates a seamless experience in which visual content becomes an integral part of the search process.

The multimodal functionality builds upon Google's years of work on visual search but takes it significantly further. While Google Lens has long been able to identify objects in images, the integration with AI Mode creates a more powerful and contextual understanding of visual content.


Technical Aspects and How It Works

At the core of this new feature is a custom version of Gemini 2.0, Google's large language model specifically optimized for search applications. This model has been enhanced with multimodal capabilities, allowing it to "see" and interpret images alongside text queries.

The system employs what Google calls a "query fan-out technique" that issues multiple related queries about both the image as a whole and specific objects within it. This approach allows AI Mode to access more breadth and depth of information than a traditional search.
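
To make the fan-out idea concrete, here is a minimal sketch, assuming a stubbed `search` function and hypothetical sub-queries; it illustrates the general pattern of querying the whole scene and each detected object in parallel, not Google's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def search(query: str) -> list[str]:
    """Stub for a web search call (assumption); a real system would hit a search backend."""
    return [f"top result for: {query}"]

def fan_out(scene_description: str, detected_objects: list[str]) -> dict[str, list[str]]:
    """Issue one query about the scene as a whole plus one per detected object,
    run them in parallel, then collect the results for a model to synthesize."""
    queries = [f"what is shown in {scene_description}"]
    queries += [f"details about {obj}" for obj in detected_objects]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(search, queries))
    return dict(zip(queries, results))

if __name__ == "__main__":
    # Hypothetical example: a photo of a bookshelf with three recognized titles.
    gathered = fan_out(
        "a bookshelf holding several paperback novels",
        ["novel A", "novel B", "novel C"],
    )
    for query, snippets in gathered.items():
        print(query, "->", snippets)
```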

When a user uploads an image, Google Lens first identifies specific objects within the picture. The system then understands the entire scene, including the context of how objects relate to one another and their unique materials, colors, shapes, and arrangements. This comprehensive contextual understanding enables AI Mode to provide responses that are "incredibly nuanced and contextually relevant," according to Robby Stein, VP of Product for Google Search.
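
The object-to-scene step can be pictured as folding per-object detections into a textual context for the model. The sketch below uses made-up data structures (label, color, bounding box); it is not Lens's real output format, only an illustration of how attributes and spatial relationships could be assembled before answering.

```python
from dataclasses import dataclass

@dataclass
class DetectedObject:
    """One recognized object; the fields are illustrative, not Lens's actual schema."""
    label: str
    color: str
    box: tuple[float, float, float, float]  # normalized (x_min, y_min, x_max, y_max)

def scene_context(objects: list[DetectedObject]) -> str:
    """Turn object-level detections into a scene description that a multimodal
    model could condition on together with the user's question."""
    lines = []
    for obj in objects:
        x_min, _, x_max, _ = obj.box
        position = "left" if x_max < 0.5 else "right" if x_min > 0.5 else "center"
        lines.append(f"- a {obj.color} {obj.label} near the {position} of the frame")
    return "The image contains:\n" + "\n".join(lines)

if __name__ == "__main__":
    shelf = [
        DetectedObject("hardcover book", "red", (0.05, 0.20, 0.30, 0.80)),
        DetectedObject("paperback book", "blue", (0.60, 0.20, 0.85, 0.80)),
    ]
    print(scene_context(shelf))
```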

The integration between Gemini's multimodal capabilities and Google's information systems creates a powerful combination that can understand complex visual scenes and provide detailed information about them.


Practical Applications and Examples

Google has showcased several practical applications for this new feature. In one example, a user takes a photo of their bookshelf and asks, "If I enjoyed these, what are some similar books that are highly rated?" AI Mode identifies each book in the image and then provides personalized recommendations based on those titles, with links to learn more about or purchase the suggested books.

Users can also ask follow-up questions to refine their search, such as "I'm looking for a quick read. Which one of these recommendations is the shortest?" This conversational approach to visual search creates a more natural and intuitive way to find information.
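
AI Mode itself has no public API, but the ask-then-refine pattern can be approximated with Google's publicly available google-generativeai Python SDK. The model name, image path, and prompts below are assumptions for illustration; this is a sketch of the multimodal chat pattern, not AI Mode's implementation.

```python
# Approximating the ask-then-refine flow with the google-generativeai SDK.
# AI Mode has no public API; this only illustrates the multimodal chat pattern.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")            # assumption: reader supplies a key
model = genai.GenerativeModel("gemini-2.0-flash")  # assumption: model name/availability
chat = model.start_chat()

bookshelf = Image.open("bookshelf.jpg")            # hypothetical local photo

# First question about the uploaded image.
first = chat.send_message(
    [bookshelf, "If I enjoyed these, what are some similar books that are highly rated?"]
)
print(first.text)

# Follow-up that refines the earlier answer; the image stays in the chat history.
follow_up = chat.send_message(
    "I'm looking for a quick read. Which one of these recommendations is the shortest?"
)
print(follow_up.text)
```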

Other potential applications include identifying plants or animals, getting information about landmarks or products, understanding complex diagrams or charts, and analyzing visual content in educational contexts. The ability to ask specific questions about images opens up numerous possibilities for how users can interact with the world around them.


User Benefits and Future Implications

The addition of image capabilities to AI Mode offers several significant benefits for users. First, it streamlines the search process by allowing users to directly ask questions about what they see rather than trying to describe it in text. This can save time and reduce frustration when dealing with complex visual information.

Second, it enhances the depth and relevance of search results by providing contextual information about images that might be difficult to capture in a traditional text search. The system's ability to understand relationships between objects in an image creates a more comprehensive search experience.

Third, the expansion of access to more users beyond Google One AI Premium subscribers means that this powerful search technology will be available to a broader audience. Google has indicated that it plans to continue refining the user experience and expanding functionality in AI Mode.

Looking ahead, this development represents a significant step in Google's strategy to remain the primary gateway to information online in an increasingly AI-driven world. Early usage data from AI Mode shows that queries are roughly twice as long as those in traditional web search, suggesting that users are engaging more deeply with this new search paradigm.


Conclusion

Google's addition of image capabilities to AI Mode marks a significant evolution in search technology. By combining advanced AI models with visual recognition capabilities, Google has created a more intuitive and comprehensive way for users to interact with visual content online.

This development reflects the ongoing transformation of search from a text-based, link-returning system to a more conversational, multimodal experience that can understand and respond to the full complexity of human queries. As AI Mode continues to evolve and reach more users, it may fundamentally change how we think about and interact with search engines.

For content creators, marketers, and businesses, this shift underscores the growing importance of optimizing not just for traditional text search but for AI-powered, multimodal search experiences. As one search expert noted, soon we may all be looking for ways to get traffic from AI Mode, not just from Google Search and AI Overviews.

With its ability to understand and answer questions about images, Google's AI Mode represents a significant step forward in making the visual world more searchable and accessible to everyone.
