Point your API client at https://api.hicap.ai/v1 and use any model ID listed here. Pricing is sourced from hicap.ai/models.json.
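A minimal sketch of a request against that base URL, assuming the endpoint follows the common OpenAI-compatible `/chat/completions` convention with Bearer-token auth; the API key placeholder and the exact request shape are assumptions, not confirmed by this page:

```python
import json
import urllib.request

API_BASE = "https://api.hicap.ai/v1"  # base URL from this page
API_KEY = "YOUR_API_KEY"              # hypothetical placeholder

def build_chat_request(model, prompt):
    """Build an OpenAI-style chat-completions request (assumed convention)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("gemini-3.1-pro", "Summarize this contract clause.")
# resp = urllib.request.urlopen(req)  # uncomment to actually send the request
```

Any model ID on this page can be dropped into the `model` field.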
gemini-3.1-pro
Gemini 3.1 Pro is Google’s latest flagship reasoning model with a 1M-token context window, advanced analytical capabilities, and support for long-context pricing tiers. It excels at complex multi-step reasoning and large-scale document analysis.
Best for:
Advanced reasoning, complex multi-step analysis, large-scale document processing, codebase understanding, and enterprise workflows requiring the highest Google Gemini capability.
| Input | Output |
|---|---|
| Text, Image, Audio, Video, PDF | Text |
This model offers:
- 1,000,000 context window
- Long-context pricing above 200,000 tokens
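Long-context pricing means input tokens past the 200,000-token threshold are billed at a different rate. A minimal sketch of that split, with placeholder per-1M-token rates; the real rates for gemini-3.1-pro come from hicap.ai/models.json:

```python
def tiered_input_cost(prompt_tokens, base_rate, long_rate, threshold=200_000):
    """Price input tokens in two tiers split at the long-context threshold.

    base_rate and long_rate are per-1M-token prices; the values used below
    are placeholders, not the actual gemini-3.1-pro rates.
    """
    base_tokens = min(prompt_tokens, threshold)          # tokens billed at the base rate
    long_tokens = max(prompt_tokens - threshold, 0)      # tokens billed at the long-context rate
    return (base_tokens * base_rate + long_tokens * long_rate) / 1_000_000

# A 250k-token prompt: 200k at the base rate, 50k at the long-context rate.
cost = tiered_input_cost(250_000, base_rate=1.25, long_rate=2.50)
```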
gemini-3.1-pro-preview
Gemini 3.1 Pro Preview is the preview channel of Gemini 3.1 Pro with identical capabilities, providing early access to the latest improvements.
Best for:
Same use cases as gemini-3.1-pro — advanced reasoning, document analysis, and enterprise workflows. Use the preview for early access to latest refinements.
| Input | Output |
|---|---|
| Text, Image, Audio, Video, PDF | Text |
This model offers:
- 1,000,000 context window
- Long-context pricing above 200,000 tokens
gemini-3.1-flash-image-preview
Gemini 3.1 Flash Image Preview is a fast, cost-efficient model specialized for text-to-image generation and visual content creation. Built on the Flash architecture for low-latency responses.
Best for:
Fast text-to-image generation, iterative image editing, visual asset creation, and design prototyping where speed and cost efficiency are priorities.
| Input | Output |
|---|---|
| Text, Image | Image |
Pricing: 3.00 per 1M tokens (input / output). Information provided from hicap.ai/models.json.
gemini-3-pro-image-preview
Gemini 3 Pro Image Preview is a high-capability model for image generation and multimodal visual workflows. It delivers higher-quality image outputs than the Flash variants.
Best for:
High-quality image generation, creative visual content, marketing asset creation, and premium visual workflows where output quality matters more than speed.
| Input | Output |
|---|---|
| Text, Image | Image |
This model offers:
- 1,000,000 context window
gemini-3-pro-preview
Gemini 3 Pro Preview is a next-generation, high-capability model focused on advanced reasoning, deep multimodal understanding, and long-context performance. As a preview release, it showcases cutting-edge improvements in analytical depth, instruction comprehension, and complex problem solving across text and multimodal inputs.
Best for:
Advanced reasoning, complex multimodal workflows, research-oriented tasks, and early adoption scenarios where maximum capability and depth are prioritized over cost or latency.
| Input | Output |
|---|---|
| Text, Image, Audio, Video, PDF | Text |
This model offers:
- 1,000,000 context window
- 64,000 max output tokens
gemini-3-flash-preview
Gemini 3 Flash is a fast and cost-efficient multimodal model designed for responsive interactions and scalable workloads. It balances solid reasoning and multimodal understanding with low latency, making it well suited for high-throughput applications.
Best for:
Low-latency, high-throughput chat, summarization, and multimodal extraction in production environments where speed and cost efficiency are critical.
| Input | Output |
|---|---|
| Text, Image | Text |
Information sourced from the Gemini website.
gemini-2.5-pro
Extended version of Gemini 2.5 Pro with ultra-long context support (beyond 200k tokens, reaching into the millions depending on setup). Designed to handle large documents, full repositories, or extensive datasets in a single session.
Best for:
Long-context processing: legal contracts spanning thousands of pages, large codebases, academic research, reviewing historical chat/log data. Ideal for copilots that need to “remember” or reason over very large corpora.
| Input | Output |
|---|---|
| Text, Image, Audio, Video, Code | Text |
Information sourced from the Gemini website.
gemini-2.5-flash
Gemini 2.5 Flash is a fast and cost-effective model that balances performance with a wide range of capabilities. It is the first Flash model to feature thinking capabilities, which let you see the model’s thinking process as it generates a response.
Best for:
Low latency/high throughput for chat, summarization, and multimodal extraction at a lower cost.
| Input | Output |
|---|---|
| Text, Image, Audio, Video, Code | Text, Image, Audio, Video, Code |
This model offers:
- 1,048,576 context window
- 65,536 max output tokens
- January 1, 2025 knowledge cutoff
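Limits like these can be checked client-side before sending a request. A minimal sketch, assuming prompt and output tokens share the context window (a common convention; confirm against the provider's docs):

```python
CONTEXT_WINDOW = 1_048_576  # gemini-2.5-flash context window, per the list above
MAX_OUTPUT = 65_536         # gemini-2.5-flash max output tokens

def fits_budget(prompt_tokens, requested_output):
    """Return True if a request fits the model's output and context limits.

    Assumes prompt and output tokens both count against the context window,
    which is the usual convention but an assumption here.
    """
    if requested_output > MAX_OUTPUT:
        return False
    return prompt_tokens + requested_output <= CONTEXT_WINDOW
```

A 1,000,000-token prompt leaves only 48,576 tokens of output headroom, so requesting the full 65,536 would not fit.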
gemini-2.5-flash-preview
Gemini 2.5 Flash Preview provides early access to the latest Gemini 2.5 Flash improvements. Same fast, cost-effective architecture with thinking capabilities and multimodal support.
Best for:
Same use cases as gemini-2.5-flash — low latency chat, summarization, and multimodal extraction. Use the preview for early access to the latest refinements.
| Input | Output |
|---|---|
| Text, Image, Audio, Video, Code | Text, Image, Audio, Video, Code |
This model offers:
- 1,048,576 context window
gemini-2.5-flash-lite
Lightweight, optimized version of Gemini Flash, designed for speed and low cost. Trades off some reasoning depth and output quality for efficiency. Maintains multimodal capabilities (text + image input) but focuses on ultra-fast responses.
Best for:
High-throughput, latency-sensitive tasks: chatbots with large user volumes, quick autocomplete, real-time customer support, fast retrieval-augmented generation (RAG), and mobile/embedded use cases where cost and speed matter more than depth.
| Input | Output |
|---|---|
| Text, Image, Audio | Text, Audio |
This model offers:
- 1,000,000 context window
- 64,000 max output tokens
- January 1, 2025 knowledge cutoff
gemini-2.0-flash
A Google model that delivers Pro-level performance at Flash-model speed. Multimodal inputs and outputs plus native tool use make it well suited for agentic workflows.
Best for:
High-volume workflows: customer interaction, summarization, data extraction, and fast-response agents. Good when you need speed with decent reasoning.
| Input | Output |
|---|---|
| Text, Image, Audio, Video, Code | Text, Image, Audio |
This model offers:
- 1,000,000 context window
- 8,000 max output tokens
- August 1, 2024 knowledge cutoff
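Since this model advertises native tool use, a request might attach tool definitions. A minimal sketch in the OpenAI function-calling format, which OpenAI-compatible gateways commonly accept; whether this endpoint forwards tools this way, and the `get_weather` tool itself, are assumptions:

```python
# Hypothetical tool definition in the OpenAI function-calling format;
# the endpoint's actual tool-use wire format is not confirmed by this page.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_body = {
    "model": "gemini-2.0-flash",
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [weather_tool],
}
```

If the model decides to call the tool, the response would carry a tool call for your code to execute and feed back as a follow-up message.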
gemini-2.0-flash-lite
Ultra-light, lowest-latency Gemini variant. Optimized for efficiency rather than depth, while maintaining multimodal support.
Best for:
Mobile or embedded copilots, lightweight RAG, chatbots, autocomplete, and fast contextual lookups where cost and response time are more critical than complex reasoning.
| Input | Output |
|---|---|
| Text, Image, Audio | Text |
This model offers:
- 1,000,000 context window
- 8,000 max output tokens
- February 5, 2025 knowledge cutoff