Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.hicap.ai/llms.txt

Use this file to discover all available pages before exploring further.

All Google Gemini models below are accessible through the Hicap API using the standard OpenAI API spec. Point your OpenAI SDK at https://api.hicap.ai/v1 and use any model ID listed here. For pricing, see the Model Catalog.

gemini-3.1-pro

Gemini 3.1 Pro is Google’s latest flagship reasoning model with a 1M token context window, advanced analytical capabilities, and support for long-context tiers. It excels at complex multi-step reasoning and large-scale document analysis.
Best for:
Advanced reasoning, complex multi-step analysis, large-scale document processing, codebase understanding, and enterprise workflows requiring the highest Google Gemini capability.
InputOutput
Text, Image, Audio, Video, PDFText
This model offers
  • * 1,000,000 context window
  • * Long-context tier above 200,000 tokens

gemini-3.1-pro-preview

Gemini 3.1 Pro Preview is the preview channel of Gemini 3.1 Pro with identical capabilities, providing early access to the latest improvements.
Best for:
Same use cases as gemini-3.1-pro — advanced reasoning, document analysis, and enterprise workflows. Use the preview for early access to latest refinements.
InputOutput
Text, Image, Audio, Video, PDFText
This model offers
  • * 1,000,000 context window
  • * Long-context tier above 200,000 tokens

gemini-3.1-flash-image-preview

Gemini 3.1 Flash Image Preview is a fast, cost-efficient model specialized for text-to-image generation and visual content creation. Built on the Flash architecture for low-latency responses.
Best for:
Fast text-to-image generation, iterative image editing, visual asset creation, and design prototyping where speed and cost efficiency are priorities.
InputOutput
Text, ImageImage

gemini-3-pro-image-preview

Gemini 3 Pro Image Preview is a high-capability model for image generation and multimodal visual workflows. It delivers higher quality image outputs compared to Flash variants.
Best for:
High-quality image generation, creative visual content, marketing asset creation, and premium visual workflows where output quality matters more than speed.
InputOutput
Text, ImageImage
This model offers
  • * 1,000,000 context window

gemini-3-pro-preview

Gemini 3 Pro Preview is a next-generation, high-capability model focused on advanced reasoning, deep multimodal understanding, and long-context performance. As a preview release, it showcases cutting-edge improvements in analytical depth, instruction comprehension, and complex problem solving across text and multimodal inputs.
Best for:
Advanced reasoning, complex multimodal workflows, research-oriented tasks, and early adoption scenarios where maximum capability and depth are prioritized over cost or latency.
InputOutput
Text, Image, Audio, Video, PDFText
This model offers
  • * 1,000,000 context window
  • * 64,000 max output tokens

gemini-3-flash-preview

Gemini 3 Flash is a fast and cost-efficient multimodal model designed for responsive interactions and scalable workloads. It balances solid reasoning and multimodal understanding with low latency, making it well suited for high-throughput applications.
Best for:
Low-latency, high-throughput chat, summarization, and multimodal extraction in production environments where speed and cost efficiency are critical.
InputOutput
Text, ImageText

gemini-2.5-pro

Extended version of Gemini 2.5 Pro with ultra-long context support (beyond 200k tokens, reaching into the millions depending on setup). Designed to handle large documents, full repositories, or extensive datasets in a single session.
Best for:
Long-context processing: legal contracts spanning thousands of pages, large codebases, academic research, reviewing historical chat/log data. Ideal for copilots that need to “remember” or reason over very large corpora.
InputOutput
Text, Image, Audio, Video, CodeText

gemini-2.5-flash

Gemini 2.5 Flash is a fast and cost-effective model that balances performance with a wide range of capabilities. It is the first Flash model to feature thinking capabilities, which lets you see the model’s thinking process as it generates a response.
Best for:
Low latency/high throughput for chat, summarization, and multimodal extraction at a lower cost.
InputOutput
Text, Image, Audio, Video, CodeText, Image, Audio, Video, Code
This model offers
  • * 1,048,576 context window
  • * 65,536 max output tokens
  • * January 1, 2025 knowledge cutoff

gemini-2.5-flash-preview

Gemini 2.5 Flash Preview provides early access to the latest Gemini 2.5 Flash improvements. Same fast, cost-effective architecture with thinking capabilities and multimodal support.
Best for:
Same use cases as gemini-2.5-flash — low latency chat, summarization, and multimodal extraction. Use the preview for early access to the latest refinements.
InputOutput
Text, Image, Audio, Video, CodeText, Image, Audio, Video, Code
This model offers
  • * 1,048,576 context window

gemini-2.5-flash-lite

Lightweight, optimized version of Gemini Flash, designed for speed and low cost. Trades off some reasoning depth and output quality for efficiency. Maintains multimodal capabilities (text + image input) but focuses on ultra-fast responses.
Best for:
High-throughput, latency-sensitive tasks: chatbots with large user volumes, quick autocomplete, real-time customer support, fast retrieval-augmented generation (RAG), and mobile/embedded use cases where cost and speed matter more than depth.
InputOutput
Text, Image, AudioText, Audio
This model offers
  • * 1,000,000 context window
  • * 64,000 max output tokens
  • * January 1, 2025 knowledge cutoff

gemini-2.0-flash

A Google model that performs like a Pro model with the speed of a Flash model. Multi-modal inputs and outputs, native tool use, great for agentic workflows.
Best for:
High-volume workflows: customer interaction, summarization, data extraction, and fast-response agents. Good when you need speed with decent reasoning.
InputOutput
Text, Image, Audio, Video, CodeText, Image, Audio
This model offers
  • * 1,000,000 context window
  • * 8,000 max output tokens
  • * August 1, 2024 knowledge cutoff

gemini-2.0-flash-lite

Ultra-light, lowest-latency Gemini variant. Optimized for efficiency rather than depth, while maintaining multimodal support.
Best for:
Mobile or embedded copilots, lightweight RAG, chatbots, autocomplete, and fast contextual lookups where cost and response time are more critical than complex reasoning.
InputOutput
Text, Image, AudioText
This model offers
  • * 1,000,000 context window
  • * 8,000 max output tokens
  • * February 5, 2025 knowledge cutoff