Creates a model response for the given chat conversation using the same interface as the
OpenAI Create chat completion
API. Supports non-streaming JSON and Server-Sent Events when stream=true.
Omit tools for a simple text completion. Send optional tools for function calling (client vs server tools —
see overview above). Prefer Responses API for new integrations.
Request – Follows OpenAI Chat Completions: model, messages, optional tools, tool_choice, stream,
temperature, max_tokens, etc.
Tools – Send tools as function definitions (name, description, parameters as JSON Schema). Client-tool
results appear as choices[0].message.tool_calls (or delta.tool_calls when streaming). Server tools registered
on the gateway run without returning tool_calls to the client.
Streaming – text/event-stream with data: lines; chunks include choices[0].delta with content and/or
tool_calls. Stream ends with data: [DONE].
Capabilities – Tool calling and prompt caching are available based on provider and model support
(e.g. OpenAI gpt-4o, Anthropic Claude). Providers without tool support ignore tools and return text only.
Multi-provider — The same endpoint routes to OpenAI, Anthropic, Google Gemini, xAI, and other providers.
Use model in provider/model format (e.g. openai/gpt-4o-mini). The request schema
documents the union of OpenAI Chat Completions parameters; supported fields and value ranges vary by provider,
including tool calling and prompt caching where the upstream API supports them. See BOUNDARY LIMITS on each
property. Parameters a provider does not support may be ignored, stripped, or rejected. Check the provider docs
for your model prefix: