All Google Gemini models below are accessible through the Hicap API using the standard OpenAI API spec: point your OpenAI SDK at https://api.hicap.ai/v1 and use any model ID listed here. Pricing is sourced from hicap.ai/models.json.
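Because the endpoint follows the OpenAI spec, a chat completion is simply an HTTP POST to /v1/chat/completions. The sketch below builds such a request with Python's standard library; the environment variable name `HICAP_API_KEY` is an assumption (use whatever key mechanism your account provides), and actually sending the request requires a valid key.

```python
import json
import os
import urllib.request

API_BASE = "https://api.hicap.ai/v1"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-spec chat completion request for the Hicap endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # HICAP_API_KEY is an assumed variable name; substitute whatever
            # key mechanism your account actually uses.
            "Authorization": f"Bearer {os.environ.get('HICAP_API_KEY', '')}",
        },
        method="POST",
    )

req = build_chat_request("gemini-2.5-flash", "Summarize: ...")
# urllib.request.urlopen(req) would send it; omitted here since it needs a real key.
```

The official OpenAI SDKs work the same way: pass `base_url="https://api.hicap.ai/v1"` to the client constructor and any model ID from this page as `model`.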

gemini-3.1-pro

Gemini 3.1 Pro is Google’s latest flagship reasoning model with a 1M token context window, advanced analytical capabilities, and support for long-context pricing tiers. It excels at complex multi-step reasoning and large-scale document analysis.
Best for:
Advanced reasoning, complex multi-step analysis, large-scale document processing, codebase understanding, and enterprise workflows requiring the highest Google Gemini capability.
Input: Text, Image, Audio, Video, PDF
Output: Text
This model offers:
  • 1,000,000-token context window
  • Long-context pricing above 200,000 tokens
Pricing: $2.00 / $12.00 / $0.20 (input / output / cache) per 1M tokens. Long-context: $4.00 / $18.00 / $0.40 per 1M tokens. Pricing sourced from hicap.ai/models.json.
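A quick way to sanity-check long-context costs, assuming the long-context rates apply to the entire request once the prompt exceeds 200,000 tokens — the usual Gemini convention, but worth confirming against hicap.ai before relying on it:

```python
# gemini-3.1-pro rates, in USD per 1M tokens (from the pricing line above).
BASE = {"input": 2.00, "output": 12.00}
LONG = {"input": 4.00, "output": 18.00}
LONG_CONTEXT_THRESHOLD = 200_000  # prompt tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD. Assumes the long-context rates apply to
    ALL tokens once the prompt exceeds the threshold (the usual Gemini
    convention; confirm the exact rule with hicap.ai)."""
    rates = LONG if input_tokens > LONG_CONTEXT_THRESHOLD else BASE
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# A 300k-token prompt with a 10k-token reply:
# 300,000 x $4/1M + 10,000 x $18/1M = $1.20 + $0.18 = $1.38
print(round(estimate_cost(300_000, 10_000), 2))
```

Cache pricing is omitted from the sketch; cached-input tokens would be billed at the separate cache rate.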

gemini-3.1-pro-preview

Gemini 3.1 Pro Preview is the preview channel of Gemini 3.1 Pro with identical capabilities, providing early access to the latest improvements.
Best for:
Same use cases as gemini-3.1-pro — advanced reasoning, document analysis, and enterprise workflows. Use the preview for early access to latest refinements.
Input: Text, Image, Audio, Video, PDF
Output: Text
This model offers:
  • 1,000,000-token context window
  • Long-context pricing above 200,000 tokens
Pricing: $2.00 / $12.00 / $0.20 (input / output / cache) per 1M tokens. Long-context: $4.00 / $18.00 / $0.40 per 1M tokens. Pricing sourced from hicap.ai/models.json.

gemini-3.1-flash-image-preview

Gemini 3.1 Flash Image Preview is a fast, cost-efficient model specialized for text-to-image generation and visual content creation. Built on the Flash architecture for low-latency responses.
Best for:
Fast text-to-image generation, iterative image editing, visual asset creation, and design prototyping where speed and cost efficiency are priorities.
Input: Text, Image
Output: Image
Pricing: $0.50 / $3.00 (input / output) per 1M tokens. Pricing sourced from hicap.ai/models.json.

gemini-3-pro-image-preview

Gemini 3 Pro Image Preview is a high-capability model for image generation and multimodal visual workflows. It delivers higher quality image outputs compared to Flash variants.
Best for:
High-quality image generation, creative visual content, marketing asset creation, and premium visual workflows where output quality matters more than speed.
Input: Text, Image
Output: Image
This model offers:
  • 1,000,000-token context window
Pricing: $2.00 / $12.00 (input / output) per 1M tokens. Pricing sourced from hicap.ai/models.json.

gemini-3-pro-preview

Gemini 3 Pro Preview is a next-generation, high-capability model focused on advanced reasoning, deep multimodal understanding, and long-context performance. As a preview release, it showcases cutting-edge improvements in analytical depth, instruction comprehension, and complex problem solving across text and multimodal inputs.
Best for:
Advanced reasoning, complex multimodal workflows, research-oriented tasks, and early adoption scenarios where maximum capability and depth are prioritized over cost or latency.
Input: Text, Image, Audio, Video, PDF
Output: Text
This model offers:
  • 1,000,000-token context window
  • 64,000 max output tokens
Information sourced from the Gemini website.

gemini-3-flash-preview

Gemini 3 Flash is a fast and cost-efficient multimodal model designed for responsive interactions and scalable workloads. It balances solid reasoning and multimodal understanding with low latency, making it well suited for high-throughput applications.
Best for:
Low-latency, high-throughput chat, summarization, and multimodal extraction in production environments where speed and cost efficiency are critical.
Input: Text, Image
Output: Text
Information sourced from the Gemini website.

gemini-2.5-pro

An extended version of Gemini 2.5 Pro with ultra-long context support (beyond 200k tokens, reaching into the millions depending on setup), designed to handle large documents, full repositories, or extensive datasets in a single session.
Best for:
Long-context processing: legal contracts spanning thousands of pages, large codebases, academic research, reviewing historical chat/log data. Ideal for copilots that need to “remember” or reason over very large corpora.
Input: Text, Image, Audio, Video, Code
Output: Text
Information sourced from the Gemini website.

gemini-2.5-flash

Gemini 2.5 Flash is a fast and cost-effective model that balances performance with a wide range of capabilities. It is the first Flash model to feature thinking capabilities, which lets you see the model’s thinking process as it generates a response.
Best for:
Low latency/high throughput for chat, summarization, and multimodal extraction at a lower cost.
Input: Text, Image, Audio, Video, Code
Output: Text, Image, Audio, Video, Code
This model offers:
  • 1,048,576-token context window
  • 65,536 max output tokens
  • January 1, 2025 knowledge cutoff
Information sourced from the Gemini website.

gemini-2.5-flash-preview

Gemini 2.5 Flash Preview provides early access to the latest Gemini 2.5 Flash improvements. Same fast, cost-effective architecture with thinking capabilities and multimodal support.
Best for:
Same use cases as gemini-2.5-flash — low latency chat, summarization, and multimodal extraction. Use the preview for early access to the latest refinements.
Input: Text, Image, Audio, Video, Code
Output: Text, Image, Audio, Video, Code
This model offers:
  • 1,048,576-token context window
Pricing: $0.15 / $0.60 / $0.0375 (input / output / cache) per 1M tokens. Pricing sourced from hicap.ai/models.json.
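The cache rate of $0.0375 per 1M tokens is a quarter of the fresh-input rate, so caching a large shared prefix (a long system prompt or document) cuts input costs substantially. A rough estimate, noting that any cache-write or storage fees are not listed here and are not included:

```python
# gemini-2.5-flash-preview input rates, USD per 1M tokens (from the line above).
INPUT_RATE = 0.15
CACHED_RATE = 0.0375

def input_cost(total_tokens: int, cached_tokens: int) -> float:
    """Input cost in USD when `cached_tokens` of the prompt hit the cache.
    Ignores any cache-write or storage fees, which are not listed on this page."""
    fresh = total_tokens - cached_tokens
    return (fresh * INPUT_RATE + cached_tokens * CACHED_RATE) / 1_000_000

# A 500k-token prompt where 400k tokens are served from cache:
# 100,000 x $0.15/1M + 400,000 x $0.0375/1M = $0.015 + $0.015 = $0.03,
# versus $0.075 with no caching.
```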

gemini-2.5-flash-lite

A lightweight, optimized version of Gemini Flash, designed for speed and low cost. It trades off some reasoning depth and output quality for efficiency, maintaining multimodal capabilities (text + image input) while focusing on ultra-fast responses.
Best for:
High-throughput, latency-sensitive tasks: chatbots with large user volumes, quick autocomplete, real-time customer support, fast retrieval-augmented generation (RAG), and mobile/embedded use cases where cost and speed matter more than depth.
Input: Text, Image, Audio
Output: Text, Audio
This model offers:
  • 1,000,000-token context window
  • 64,000 max output tokens
  • January 1, 2025 knowledge cutoff
Information sourced from the Gemini website.

gemini-2.0-flash

A Google model that performs like a Pro model at Flash speed, with multimodal inputs and outputs, native tool use, and strong support for agentic workflows.
Best for:
High-volume workflows: customer interaction, summarization, data extraction, and fast-response agents. Good when you need speed with decent reasoning.
Input: Text, Image, Audio, Video, Code
Output: Text, Image, Audio
This model offers:
  • 1,000,000-token context window
  • 8,000 max output tokens
  • August 1, 2024 knowledge cutoff
Information sourced from the Gemini website.

gemini-2.0-flash-lite

The ultra-light, lowest-latency Gemini variant, optimized for efficiency rather than depth while maintaining multimodal support.
Best for:
Mobile or embedded copilots, lightweight RAG, chatbots, autocomplete, and fast contextual lookups where cost and response time are more critical than complex reasoning.
Input: Text, Image, Audio
Output: Text
This model offers:
  • 1,000,000-token context window
  • 8,000 max output tokens
  • February 5, 2025 knowledge cutoff
Information sourced from the Gemini website.