Google has expanded the Gemini API's File Search tool to support multimodal data — images, audio, and text — alongside custom metadata filtering and page-level citations. The update is designed to make retrieval-augmented generation (RAG) systems more flexible and accurate for developers building search into their applications.
What changed
Previously, File Search was limited to text-based queries. The new version processes images and text together natively, using the Gemini Embedding 2 model. This means you can search an archive of visual assets using natural language descriptions of tone, style, or content — without relying on filenames or keywords. Audio inputs are also supported, though the documentation does not specify which audio formats or length limits apply.
Custom metadata is another addition. You can attach key-value labels to files (for example, department: Legal or status: Final) and then filter search results by those labels at query time. This reduces noise from irrelevant documents and can speed up retrieval in large archives.
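To make the filtering idea concrete, here is a minimal local sketch in plain Python. It does not call the Gemini API; the Doc class and filter_by_metadata function are hypothetical names used only to illustrate how key-value labels narrow the candidate set before any ranking happens.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for an indexed file and its labels."""
    name: str
    metadata: dict[str, str] = field(default_factory=dict)


def filter_by_metadata(docs: list[Doc], required: dict[str, str]) -> list[Doc]:
    """Keep only documents whose metadata matches every required key-value pair."""
    return [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in required.items())
    ]


docs = [
    Doc("contract.pdf", {"department": "Legal", "status": "Final"}),
    Doc("draft_contract.pdf", {"department": "Legal", "status": "Draft"}),
    Doc("banner.png", {"department": "Marketing", "status": "Final"}),
]

# Only files labeled both Legal and Final survive the filter.
matches = filter_by_metadata(docs, {"department": "Legal", "status": "Final"})
print([d.name for d in matches])  # ['contract.pdf']
```

The same sketch also shows the limitation noted later in this article: a file with missing or inconsistent labels simply never matches.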
Page citations are now included in responses. When the model pulls an answer from a PDF, it returns the exact page number, allowing users to verify the source directly. This improves transparency and is useful for fact-checking in regulated or document-heavy workflows.
How it works
File Search handles the infrastructure for indexing and retrieval. Developers upload files and then query them via the Gemini API. The tool implements retrieval-augmented generation (RAG): at query time it retrieves the most relevant indexed content and passes it to the model as context for the answer. The underlying embedding model is Gemini Embedding 2, which represents both text and image data natively.
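Embedding-based retrieval in general can be illustrated with a toy example. The sketch below ranks files by cosine similarity between a query vector and stored vectors. The three-dimensional vectors are made up for illustration and are not output from Gemini Embedding 2, and the article does not say which similarity measure File Search uses internally, so cosine similarity here is an assumption.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings: in a shared text-image space, a photo and a text note
# about similar subject matter would end up close together.
index = {
    "sunset_photo.png": [0.9, 0.1, 0.0],
    "quarterly_report.pdf": [0.0, 0.2, 0.9],
    "beach_video_notes.txt": [0.7, 0.6, 0.1],
}


def search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Return the names of the top_k files most similar to the query vector."""
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
    return ranked[:top_k]


query = [1.0, 0.2, 0.0]  # stands in for an embedded natural-language query
results = search(query, index)
print(results)  # ['sunset_photo.png', 'beach_video_notes.txt']
```

In a managed service such as File Search, the embedding and ranking steps run on the provider's side; the sketch only shows why a description of content can match an image without relying on its filename.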
Google provides code snippets in its developer guide and API documentation. The basic workflow is:
- Upload files (text, images, or audio) to the File Search tool.
- Optionally attach custom metadata as key-value pairs.
- Query using natural language, images, or audio.
- Receive results with page citations where applicable.
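The steps above might look roughly like the following sketch with the google-genai Python SDK. Treat every detail as an assumption to check against Google's current documentation: the method and field names (file_search_stores, upload_to_file_search_store, custom_metadata, metadata_filter) reflect the SDK as I understand it and may differ in the updated release, the model name and store name are placeholders, and the filter string syntax is not described in this article. The helper to_custom_metadata is a hypothetical convenience function, not part of the SDK.

```python
import time


def to_custom_metadata(labels: dict[str, str]) -> list[dict[str, str]]:
    """Convert {'department': 'Legal'} into a list of key/string_value entries.

    The entry shape is an assumption about what the SDK expects.
    """
    return [{"key": key, "string_value": value} for key, value in labels.items()]


def upload_and_query(path: str, labels: dict[str, str], question: str,
                     metadata_filter: str, model: str = "gemini-2.5-flash") -> str:
    """Upload one file with labels, then ask a question restricted by a filter."""
    # Imported lazily so the helper above works without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client()  # expects an API key in the environment

    # Step 1: create a store and upload the file with its metadata (step 2).
    store = client.file_search_stores.create(config={"display_name": "example-store"})
    operation = client.file_search_stores.upload_to_file_search_store(
        file=path,
        file_search_store_name=store.name,
        config={"custom_metadata": to_custom_metadata(labels)},
    )
    while not operation.done:  # indexing is asynchronous
        time.sleep(5)
        operation = client.operations.get(operation)

    # Step 3: query in natural language, restricted by the metadata filter.
    response = client.models.generate_content(
        model=model,
        contents=question,
        config=types.GenerateContentConfig(
            tools=[types.Tool(file_search=types.FileSearch(
                file_search_store_names=[store.name],
                metadata_filter=metadata_filter,
            ))]
        ),
    )
    # Step 4: the answer text; citation details would live in the response's
    # grounding metadata, whose exact fields are not covered here.
    return response.text
```

A call could then look like upload_and_query("contract.pdf", {"department": "Legal"}, "What is the termination clause?", 'department="Legal"'), with the file path, question, and filter string all being illustrative values.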
Tradeoffs
File Search is a managed service — you don't need to set up your own vector database or embedding pipeline. That reduces operational overhead but also means you're tied to Google's infrastructure and pricing. The tool is part of the Gemini API, so costs depend on usage volume and the specific model tier.
Custom metadata filtering is a significant improvement for production RAG systems, but it requires upfront labeling effort. If your data lacks consistent metadata, the feature adds little value.
Page citations are described only for PDFs. For other file types (images, audio, plain text), it is not stated how citations work or whether they are returned at all.
When to use it
This update is relevant for any application that needs to search across mixed media — creative agencies looking for visual assets, legal teams scanning document archives, or customer support tools that need to retrieve information from PDFs and images alike. The multimodal search and metadata filtering make it suitable for larger, more complex datasets where simple keyword search falls short.
Bottom line
Google's Gemini API File Search now offers a practical, managed RAG solution with multimodal input, metadata filtering, and source citations. It removes the need to build custom embedding and retrieval infrastructure, at the cost of vendor lock-in and potential usage fees. For teams already using the Gemini API, it's a straightforward upgrade.