RouteLLM API Reference

RouteLLM provides an OpenAI-compatible API endpoint that intelligently routes your requests to the most appropriate underlying model based on cost, speed, and performance requirements.

Overview​

RouteLLM is a smart routing layer that automatically selects the best model for your request, balancing performance, cost, and speed. Instead of manually choosing between different models, you can use the route-llm model identifier and let the system make the optimal choice for you.

Key Features​

  • Intelligent Routing: Automatically selects the best model based on request complexity
  • Cost Optimization: Routes to cost-effective models when appropriate
  • Performance Tuning: Uses high-performance models for complex tasks
  • Multimodal Support: Supports text and image inputs for compatible models
  • Image Generation: Generate high-quality images from text prompts using state-of-the-art models
  • Streaming Support: Real-time response streaming available

Getting Started​

How It Works​

  1. Sign Up: Sign up as a ChatLLM subscriber to access the RouteLLM API
  2. Access the API: Click on the RouteLLM API icon in the lower left corner of the ChatLLM interface to access API documentation and details
  3. Get Your API Key: Obtain your API key from the RouteLLM API page
  4. Start Using: Invoke the API for any LLM and use it in your applications

Why Choose RouteLLM API?​

RouteLLM API comes with your ChatLLM subscription, providing several key benefits:

  • Unified Platform: Use all LLMs (both open-weight and proprietary) in the ChatLLM Teams UX and via API, all in one place
  • Easy Management: Centralized way to manage all your favorite AI model consumption
  • Flexible Access: Access models through both the user interface and programmatic API
  • Cost-Effective: Competitive pricing with best available rates for open-source models
  • Transparent Pricing: No markup on proprietary LLMs - you pay provider prices

Pricing​

Credit System​

The ChatLLM subscription includes 20,000 credits to get you started. Each API call consumes credits proportional to the cost of the LLM call. RouteLLM is available for unlimited use by ChatLLM subscribers: credits are still tracked for accounting purposes, but you can continue using RouteLLM even after hitting your monthly credit limit.

Pricing Details​

Proprietary LLMs​

Proprietary LLMs (e.g., OpenAI, Anthropic, Google Gemini, etc.) are priced at the rates advertised by the provider. We DO NOT charge you more than the provider does. Prices are updated automatically whenever the provider updates their pricing.

Open-Weight LLMs​

Open-weight LLMs are priced to match the best rates available anywhere in the world.

Note: All open-weight LLMs are hosted on servers based in the United States.

View Current Pricing​

Pricing for each LLM is published in our RouteLLM API documentation. You can also use the listRouteLLMModels endpoint to programmatically retrieve the most up-to-date list of available models and their current pricing.
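
For example, with the OpenAI Python client, a minimal sketch (assuming the model list is exposed at the standard OpenAI-compatible /models path):

from openai import OpenAI

client = OpenAI(
    base_url="<your base url>",
    api_key="<your_api_key>",
)

# List the available models; pricing metadata, if returned, appears on each entry
for model in client.models.list():
    print(model.id)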

Base URLs​

The base URL depends on your organization type:

  • Self-Serve Organizations: https://routellm.abacus.ai/v1
  • Enterprise Platform: https://<workspace>.abacus.ai/v1

Replace <workspace> with your specific workspace identifier for enterprise deployments. To confirm the correct base URL for your account, see the RouteLLM API page.

Authentication​

All API requests require authentication using an API key. Include your API key in the request header:

Authorization: Bearer <your_api_key>

You can obtain your API key from the Abacus.AI platform.
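
When using the OpenAI Python client, the header is set for you from the api_key argument; a minimal sketch with placeholder values:

from openai import OpenAI

# The client sends "Authorization: Bearer <your_api_key>" on every request
client = OpenAI(
    base_url="<your base url>",
    api_key="<your_api_key>",
)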

Supported Models​

The RouteLLM API supports a wide range of models for both text generation and image generation. You can specify a model explicitly or use route-llm to let the system decide.

Routing Model​

  • route-llm: Intelligently routes to one of Claude 4.5 Sonnet, GPT-5.2, or Gemini 3 Flash based on the complexity of the request. This is the recommended option for most use cases.

Text Generation Models​

You can also directly target specific text generation models. The following models are currently supported:

OpenAI Models​

  • gpt-5.2, gpt-5.1, gpt-5.1-chat-latest
  • gpt-5, gpt-5-mini, gpt-5-nano
  • gpt-4o, gpt-4o-mini
  • o4-mini, o3, o3-pro

Anthropic Models​

  • claude-4-5-sonnet, claude-4-5-haiku, claude-4-5-opus
  • claude-3-opus

Google Models​

  • gemini-3-pro, gemini-3-flash
  • gemini-2.5-flash, gemini-2.5-pro

xAI Models​

  • grok-4-1-fast, grok-4, grok-code-fast-1

Meta Models​

  • llama-4-Maverick-17B
  • llama-3.1-405B, llama-3.1-70B

DeepSeek Models​

  • deepseek-v3.2, deepseek-v3.1-Terminus
  • deepseek-R1

Qwen Models​

  • qwen-3-Max, qwen3-coder-480b-a35b-instruct
  • qwen-3-32B, qwq-32B

Note: This list is subject to change as new models are added. Use the listRouteLLMModels endpoint to get the most up-to-date list of available models and their pricing.

Image Generation Models​

For image generation, the following models are supported:

  • flux-2-pro: FLUX-2 PRO - High-quality, photorealistic image generation
  • flux-kontext: FLUX Kontext - Advanced image generation
  • dall-e: OpenAI DALL-E - High-quality creative image generation
  • ideogram: Ideogram - Excellent for text rendering in images
  • recraft: Recraft - Design and illustration focused
  • imagen: Google Imagen - Image generation
  • nano-banana-pro: Nano Banana Pro - High-quality image generation
  • seedream: Seedream 4.5 - Image generation model

Request Parameters​

1. Required Parameters​

messages (array, required)​

A list of messages comprising the conversation so far. Each message must be an object with the following structure:

  • role (string, required): The role of the message sender. Must be one of:

    • user: Messages from the user/end-user
    • assistant: Previous responses from the AI assistant
    • system: System-level instructions that guide the assistant's behavior
  • content (string or array, required): The content of the message. Can be:

    • A string for text-only messages
    • An array for multimodal content (text and images)

2. Optional Parameters​

model (string, optional)​

The ID of the model to use. Can be either a text generation model or an image generation model, depending on the modalities parameter. If omitted, defaults to route-llm.

Text Generation Models: route-llm, gpt-5.1, claude-4-5-sonnet, gemini-2.5-pro, etc.

Image Generation Models: flux-2-pro, flux-kontext, dall-e, ideogram, recraft, imagen, nano-banana-pro, seedream

Examples: route-llm, gpt-5.1, flux-2-pro, seedream

max_tokens (integer, optional)​

The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context window.

Default: Model-dependent

temperature (number, optional)​

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Default: 1.0

Recommended values:

  • 0.0-0.3: For factual, deterministic responses
  • 0.7-1.0: For creative, varied responses
  • 1.0-2.0: For highly creative, diverse outputs
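
As an illustration, the same client can issue a focused request and a creative one by varying only temperature (the prompts and values below are illustrative):

from openai import OpenAI

client = OpenAI(base_url="<your base url>", api_key="<your_api_key>")

# Low temperature: focused, repeatable output
factual = client.chat.completions.create(
    model="route-llm",
    messages=[{"role": "user", "content": "List the planets of the solar system."}],
    temperature=0.2,
)

# High temperature: varied, creative output
creative = client.chat.completions.create(
    model="route-llm",
    messages=[{"role": "user", "content": "Invent three names for a coffee shop on Mars."}],
    temperature=1.2,
)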

top_p (number, optional)​

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

Default: 1.0

Range: 0.0 to 1.0

stream (boolean, optional)​

If set to true, partial message deltas will be sent as data-only server-sent events as they become available. The stream is terminated by a data: [DONE] message.

Default: false

stop (string or array, optional)​

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

Example: stop": ["Human:", "AI:"]

presence_penalty (number, optional)​

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

Default: 0.0

frequency_penalty (number, optional)​

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

Default: 0.0
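
A sketch combining both penalties to reduce repetition (the values are illustrative starting points):

from openai import OpenAI

client = OpenAI(base_url="<your base url>", api_key="<your_api_key>")

response = client.chat.completions.create(
    model="route-llm",
    messages=[{"role": "user", "content": "Write a product description for a reusable water bottle."}],
    presence_penalty=0.6,   # encourage new topics
    frequency_penalty=0.4,  # discourage verbatim repetition
)

print(response.choices[0].message.content)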

response_format (object, optional)​

An object specifying the format that the model must output. Currently, only JSON mode is supported:

"response_format": {
"type": "json_object"
}

When JSON mode is enabled, the model is constrained to only generate strings that parse into valid JSON objects.

Important: When using response_format, you must also instruct the model to produce JSON via a system or user message.

Response Format​

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1677858242,
  "model": "route-llm",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The meaning of life is..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  }
}

Response Fields​

  • id: A unique identifier for the chat completion
  • object: The object type, always chat.completion (or chat.completion.chunk for streaming)
  • created: The Unix timestamp of when the completion was created
  • model: The model used for the completion (may differ from the requested model if using route-llm)
  • choices: A list of completion choices
    • index: The index of the choice
    • message: The message object (non-streaming) or delta (streaming)
    • finish_reason: The reason the completion finished (stop, length, content_filter, or null for streaming)
  • usage: Token usage statistics (not present in streaming responses until the final chunk)
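
These fields map directly onto the client's response object; for example, you can check which model route-llm actually selected and how many tokens were used:

from openai import OpenAI

client = OpenAI(base_url="<your base url>", api_key="<your_api_key>")

response = client.chat.completions.create(
    model="route-llm",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.model)                     # the model route-llm actually selected
print(response.choices[0].finish_reason)  # e.g. "stop" or "length"
print(response.usage.total_tokens)        # prompt_tokens + completion_tokens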

Multimodal capabilities​

The API supports multimodal inputs and outputs for models with vision capabilities.

Image Analysis​

Images can be provided as input in two ways: as a remote URL or as a base64-encoded data URI (see the notes below). For example, using a URL:

{
  "model": "route-llm",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe the image"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.jpg"
          }
        }
      ]
    }
  ]
}

Image Support Notes:

  • Supported formats: PNG, JPEG, WebP, and GIF
  • Images are automatically resized and processed by the API
  • Multiple images can be included in a single message
  • Base64 images should use the data URI format: data:image/<format>;base64,<base64_string>
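
For example, a sketch that sends a local PNG as a base64 data URI (the file path is a placeholder):

import base64

from openai import OpenAI

client = OpenAI(base_url="<your base url>", api_key="<your_api_key>")

# Encode a local image as data:image/<format>;base64,<base64_string>
with open("photo.png", "rb") as f:
    data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="route-llm",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image"},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }
    ],
)

print(response.choices[0].message.content)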

Image Generation​

The RouteLLM API supports image generation from text prompts using state-of-the-art image generation models. Image generation uses the unified chat completions endpoint with the modalities and image_config parameters.

Note: In addition to dedicated image generation models (e.g., flux-2-pro, seedream, ideogram), Gemini and OpenAI models also support image generation when used with the modalities: ["image"] parameter.

Request Syntax​

Image generation uses the same unified schema as text generation, with additional parameters for image generation:

{
  "model": "string (required)",
  "messages": [
    {
      "role": "user",
      "content": "string (required)"
    }
  ],
  "modalities": ["image"],
  "image_config": {
    "num_images": "integer (optional)",
    "aspect_ratio": "string (optional)",
    ...
  }
}

Request Parameters​

model (string, required)​

The ID of the model to use for image generation. Can be any supported image generation model or a Gemini/OpenAI model that supports image generation.

Supported Models:

  • Dedicated Image Generation Models: flux-2-pro, flux-kontext, seedream, ideogram, recraft, imagen, nano-banana-pro, dall-e
  • Gemini Models (support image generation): gemini-2.5-pro, gemini-2.5-flash, gemini-3-pro, gemini-3-flash
  • OpenAI Models (support image generation): gpt-5.1, gpt-5.2, gpt-5, gpt-4o, etc.

Examples: flux-2-pro, seedream, gemini-2.5-pro, gpt-5.1

messages (array, required)​

A list of messages comprising the conversation. The user's message should contain the prompt for image generation.

Example:

{
  "role": "user",
  "content": "A beautiful sunset over mountains"
}

modalities (array, optional)​

Specifies what type of content to generate.

Valid values:

  • ["image"]: Generate images
  • ["text"]: Generate text (default if not specified)

Default: ["text"] (if not specified)

Note: You can generate either images or text in a single request, not both simultaneously.

image_config (object, optional)​

Configuration object for image generation. Required when modalities includes image.

Important:

  • num_images is supported by all image generation models
  • aspect_ratio is supported by all image generation models
  • image_size is only supported by OpenAI & Gemini models
  • resolution is only supported by OpenAI & Gemini models
  • quality is only supported by OpenAI & Gemini models

Image Config properties​

num_images (integer, optional)

The number of images to generate. Supported by all image generation models.

Range: 1-4

Default: 1

Example: 3

aspect_ratio (string, optional)

The aspect ratio of the generated images. Supported by all image generation models.

Valid values:

  • 1:1 - Square
  • 2:3 - Portrait orientation
  • 3:2 - Landscape orientation
  • 3:4 - Portrait orientation
  • 4:3 - Landscape orientation
  • 9:16 - Portrait widescreen
  • 16:9 - Widescreen landscape

Default: Model-dependent (typically 1:1)

Example: 2:3

Model-Specific Configurations​

Different image generation models have unique strengths and support different parameters. All of the models below support the aspect ratios 1:1, 2:3, 3:2, 16:9, and 9:16.

  • FLUX-2 PRO (flux-2-pro): Photorealistic images, high-quality portraits, detailed scenes. Parameters: num_images, aspect_ratio
  • FLUX Kontext (flux-kontext): Context-aware image generation, complex scenes. Parameters: num_images, aspect_ratio
  • DALL-E (dall-e): Creative and artistic images, safe content generation. Parameters: num_images, aspect_ratio, quality, resolution, image_size
  • Ideogram (ideogram): Text rendering in images, typography, logos. Parameters: num_images, aspect_ratio
  • Recraft (recraft): Design and illustration work, vector-style images. Parameters: num_images, aspect_ratio
  • Google Imagen (imagen): General-purpose image generation. Parameters: num_images, aspect_ratio
  • Nano Banana Pro (nano-banana-pro): High-quality artistic images. Parameters: num_images, aspect_ratio
  • Seedream (seedream): General image generation. Parameters: num_images, aspect_ratio
  • Gemini Models (gemini-2.5-pro, gemini-3-pro, etc.): General-purpose image generation with advanced configuration. Parameters: num_images, aspect_ratio, quality, resolution, image_size
  • OpenAI Models (gpt-5.1, gpt-5.2, etc.): High-quality image generation with advanced configuration. Parameters: num_images, aspect_ratio, quality, resolution, image_size

Code Examples​

1. Basic Image Generation​

from openai import OpenAI

client = OpenAI(
    base_url="<your base url>",
    api_key="<your_api_key>",
)

# Basic image generation
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": "A beautiful sunset over mountains"
        }
    ],
    modalities=["image"],
    image_config={
        "num_images": 1
    }
)

# Extract image URLs from response
for content_item in response.choices[0].message.content:
    if content_item.type == "image_url":
        print(f"Generated image: {content_item.image_url.url}")

2. Multiple Images​

from openai import OpenAI

client = OpenAI(
    base_url="<your base url>",
    api_key="<your_api_key>",
)

# Generate multiple images
response = client.chat.completions.create(
    model="flux-2-pro",
    messages=[
        {
            "role": "user",
            "content": "A futuristic cityscape at night with neon lights and flying cars"
        }
    ],
    modalities=["image"],
    image_config={
        "num_images": 3,
        "aspect_ratio": "1:1"
    }
)

# Extract all image URLs
image_urls = [
    item.image_url.url
    for item in response.choices[0].message.content
    if item.type == "image_url"
]
for idx, url in enumerate(image_urls, 1):
    print(f"Image {idx}: {url}")

3. Portrait Orientation​

from openai import OpenAI

client = OpenAI(
    base_url="<your base url>",
    api_key="<your_api_key>",
)

# Generate portrait-oriented image
response = client.chat.completions.create(
    model="flux-2-pro",
    messages=[
        {
            "role": "user",
            "content": "A full-body portrait of a fashion model in elegant evening wear"
        }
    ],
    modalities=["image"],
    image_config={
        "num_images": 1,
        "aspect_ratio": "2:3"
    }
)

for content_item in response.choices[0].message.content:
    if content_item.type == "image_url":
        print(f"Portrait image: {content_item.image_url.url}")

4. OpenAI Model with Quality​

from openai import OpenAI

client = OpenAI(
    base_url="<your base url>",
    api_key="<your_api_key>",
)

# Generate high-quality image with OpenAI model
response = client.chat.completions.create(
    model="gpt-5.1",
    messages=[
        {
            "role": "user",
            "content": "A whimsical illustration of a magical forest with glowing mushrooms"
        }
    ],
    modalities=["image"],
    image_config={
        "num_images": 1,
        "aspect_ratio": "1:1",
        "quality": "high"
    }
)

for content_item in response.choices[0].message.content:
    if content_item.type == "image_url":
        print(f"Image URL: {content_item.image_url.url}")

5. Gemini Model with Image Size and Resolution​

from openai import OpenAI

client = OpenAI(
    base_url="<your base url>",
    api_key="<your_api_key>",
)

# Generate image using Gemini model with advanced parameters
response = client.chat.completions.create(
    model="gemini-2.5-pro",
    messages=[
        {
            "role": "user",
            "content": "A professional headshot of a business executive"
        }
    ],
    modalities=["image"],
    image_config={
        "num_images": 1,
        "aspect_ratio": "2:3",
        "image_size": "1024x1536",
        "resolution": "2K"
    }
)

for content_item in response.choices[0].message.content:
    if content_item.type == "image_url":
        print(f"Image URL: {content_item.image_url.url}")

Note: The image_config parameters resolution, image_size, and quality are only supported for Gemini and OpenAI models. num_images and aspect_ratio are supported by all image generation models.

Response Schema​

Image generation responses follow the same unified chat completion response format. When modalities includes image, the response will contain image data in addition to any text content.

Success Response (Image Only)​

{
  "created": 1677858242,
  "model": "gemini-2.5-pro",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "",
        "images": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/generated-image-1.png"
            }
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/generated-image-2.png"
            }
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "compute_points_used": 150
  }
}
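
Note that this schema returns generated images under message.images, while the code examples above read them from message.content; a defensive sketch that checks both (assuming the client surfaces extra response fields as attributes):

message = response.choices[0].message

# Depending on the model, image parts may arrive under message.images
# (as in the schema above) or inside message.content; check both.
parts = getattr(message, "images", None) or message.content or []
for part in parts:
    url = part["image_url"]["url"] if isinstance(part, dict) else part.image_url.url
    print(url)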

Error Handling​

The API uses standard HTTP status codes to indicate success or failure:

  • 200 OK: Request succeeded
  • 400 Bad Request: Invalid request (missing parameters, invalid format, etc.)
  • 401 Unauthorized: Missing or invalid API key
  • 429 Too Many Requests: Rate limit exceeded
  • 500 Internal Server Error: Server error

Error Response Format​

{
  "error": {
    "message": "The 'messages' parameter is missing, empty, or not a list.",
    "type": "ValidationError",
    "code": "invalid_request_error"
  }
}

Common error scenarios:

  • Missing required messages parameter
  • Empty messages array
  • Missing role or content in message objects
  • Invalid role value (must be "user", "assistant", or "system")
  • Invalid model name
  • Rate limit exceeded
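
A sketch of basic retry handling using the OpenAI Python client's exception types (the backoff schedule is illustrative):

import time

from openai import OpenAI, APIStatusError, RateLimitError

client = OpenAI(base_url="<your base url>", api_key="<your_api_key>")

for attempt in range(3):
    try:
        response = client.chat.completions.create(
            model="route-llm",
            messages=[{"role": "user", "content": "Hello!"}],
        )
        print(response.choices[0].message.content)
        break
    except RateLimitError:
        time.sleep(2 ** attempt)  # 429: back off, then retry
    except APIStatusError as e:
        print(f"Request failed with status {e.status_code}")  # other 4xx/5xx
        break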

Code Examples​

Basic Request​

from openai import OpenAI

client = OpenAI(
    base_url="<your base url>",
    api_key="<your_api_key>",
)

response = client.chat.completions.create(
    model="route-llm",
    messages=[
        {"role": "user", "content": "What is the meaning of life?"}
    ]
)

print(response.choices[0].message.content)

Streaming Request​

from openai import OpenAI

client = OpenAI(
    base_url="<your base url>",
    api_key="<your_api_key>",
)

stream = client.chat.completions.create(
    model="route-llm",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

Conversation with History​

from openai import OpenAI

client = OpenAI(
    base_url="<your base url>",
    api_key="<your_api_key>",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "My name is Alice."},
    {"role": "assistant", "content": "Nice to meet you, Alice! How can I help you today?"},
    {"role": "user", "content": "What's my name?"}
]

response = client.chat.completions.create(
    model="route-llm",
    messages=messages,
    temperature=0.7,
    max_tokens=150
)

print(response.choices[0].message.content)

JSON Mode​

from openai import OpenAI
import json

client = OpenAI(
    base_url="<your base url>",
    api_key="<your_api_key>",
)

response = client.chat.completions.create(
    model="route-llm",
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant that outputs JSON."
        },
        {
            "role": "user",
            "content": "Return a JSON object with keys 'name', 'age', and 'city'."
        }
    ],
    response_format={"type": "json_object"},
    temperature=0.7
)

content = response.choices[0].message.content
data = json.loads(content)
print(data)

With Optional Parameters​

from openai import OpenAI

client = OpenAI(
    base_url="<your base url>",
    api_key="<your_api_key>",
)

response = client.chat.completions.create(
    model="route-llm",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about programming."}
    ],
    max_tokens=100,
    temperature=0.8,
    top_p=0.9
)

print(response.choices[0].message.content)

Best Practices​

  1. Use route-llm for most cases: Let the system choose the optimal model automatically
  2. Include conversation history: Provide full message history for better context
  3. Set appropriate max_tokens: Prevent unnecessarily long responses
  4. Use streaming for long responses: Improve user experience with real-time output
  5. Handle errors gracefully: Implement retry logic for transient errors