TL;DR
- Google’s Gemini app now supports audio uploads, with limits set for free and paid users.
- New multimodal upgrades let users upload multiple file types, including ZIP files, per prompt.
- NotebookLM enhancements with 80+ language support position Gemini for enterprise research and productivity.
- Google’s AI-first strategy seeks to challenge Microsoft and OpenAI in both consumer and enterprise markets.
Google has unveiled a significant update to its Gemini-powered applications, adding long-requested audio upload support and expanded productivity features.
The move strengthens Google’s strategy of positioning Gemini as a powerful multimodal AI platform capable of handling text, images, files, and now, audio.
The update reflects Google’s broader push to stand out in the hypercompetitive AI race, where OpenAI dominates consumer usage and Microsoft leads in enterprise adoption.
Audio uploads meet user demand
One of the most notable additions is the ability to upload audio files directly into the Gemini app. Free users can upload up to 10 minutes of audio, capped at five prompts per day, while paid subscribers can upload up to three hours of audio.
Josh Woodward, a senior director of product management at Google, noted on Monday that audio compatibility was the single most requested feature from Gemini users. This tiered rollout not only satisfies long-standing demand but also signals a deliberate effort to drive premium subscriptions through enhanced functionality.
The feature adds new use cases for students, journalists, and professionals who may want to analyze lectures, interviews, or meetings within the Gemini ecosystem.
Expanding multimodal capabilities
Beyond audio, Gemini now supports up to 10 files per prompt across a wide range of formats, including ZIP files. This expansion underscores Google’s determination to build a truly multimodal AI that can seamlessly process complex, multi-layered inputs.
The updates complement other recent upgrades across Gemini-powered services such as NotebookLM, which now generates structured reports in over 80 languages. For businesses, this positions NotebookLM as a serious contender for document analysis, research synthesis, and multilingual productivity workflows—directly competing with Microsoft’s Copilot suite.
Competing in enterprise AI race
Although Google Cloud trails Microsoft Azure and Amazon Web Services in market share, the unit is growing rapidly, reporting $13.6 billion in revenue for Q2 2025.
The integration of Gemini’s new capabilities into Google’s broader product lineup, including Search, Gmail, and Android, highlights the company’s “AI-first” philosophy.
The freemium model of Gemini reflects a calculated strategy: attract casual users with limited free features while encouraging professionals and enterprises to upgrade for greater flexibility. For organizations, the combination of advanced AI tools and Google’s existing productivity platforms could make Gemini an attractive alternative to Microsoft’s entrenched ecosystem.
A glimpse of AI’s future role
Gemini’s upgrade comes as Google’s leadership emphasizes AI’s role not just as an assistant but as a collaborative research partner. Jeff Dean, Google’s chief scientist, recently outlined how AI could soon autonomously explore scientific ideas, conduct experiments, and deliver human-readable results.
In this context, Gemini’s audio and multimodal capabilities represent more than incremental improvements. They are steps toward a vision where AI augments human creativity and accelerates discovery.
Whether through analyzing recorded discussions, processing multilingual documents, or drafting reports, Gemini is inching closer to becoming an indispensable tool for knowledge work.