Create chat completion

post

https://api.macpaw.com/ai/v1/chat/completions

Create a chat completion using the same interface as the OpenAI Chat Completions API.
Supports both SSE (Server-Sent Events) streaming and non-streaming modes.
Request and response formats follow the OpenAI Create chat completion API.

Tools (function calling): Optional tools and tool_choice support hybrid client and server tools.
Client-tool calls return tool_calls for the client to execute; server tools registered on the gateway
run automatically without returning tool_calls. Prefer the Responses API for new integrations.

Recent Requests

Time	Status	User Agent
Retrieving recent requests…

Loading…

Body Params

Creates a model response for the given chat conversation using the same interface as the
OpenAI Create chat completion
API. Supports non-streaming JSON and Server-Sent Events when stream=true.

Omit tools for a simple text completion. Send optional tools for function calling (client vs server tools —
see overview above). Prefer Responses API for new integrations.

Request – Follows OpenAI Chat Completions: model, messages, optional tools, tool_choice, stream,
temperature, max_tokens, etc.

Tools – Send tools as function definitions (name, description, parameters as JSON Schema). Client-tool
results appear as choices[0].message.tool_calls (or delta.tool_calls when streaming). Server tools registered
on the gateway run without returning tool_calls to the client.

Streaming – text/event-stream with data: lines; chunks include choices[0].delta with content and/or
tool_calls. Stream ends with data: [DONE].

Capabilities – Tool calling and prompt caching are available based on provider and model support
(e.g. OpenAI gpt-4o, Anthropic Claude). Providers without tool support ignore tools and return text only.

Multi-provider — The same endpoint routes to OpenAI, Anthropic, Google Gemini, xAI, and other providers.
Use model in provider/model format (e.g. openai/gpt-4o-mini). The request schema
documents the union of OpenAI Chat Completions parameters; supported fields and value ranges vary by provider,
including tool calling and prompt caching where the upstream API supports them. See BOUNDARY LIMITS on each
property. Parameters a provider does not support may be ignored, stripped, or rejected. Check the provider docs
for your model prefix:

model

string

required

ID of the model to use (e.g., 'openai/gpt-4o', 'anthropic/claude-3-5-sonnet', 'google/gemini-2.5-pro').

messages

array of objects

required

Conversation history. The Gateway will automatically map 'system' roles to specific provider formats (like Anthropic's root system prompt or Gemini's system_instruction).

messages*

temperature

number

0 to 2

Defaults to 1

Sampling temperature. 0.0 is strict and deterministic, 1.0+ is creative. BOUNDARY LIMITS: - OpenAI, Google, xAI: 0.0 to 2.0. - Anthropic (Claude): 0.0 to 1.0.

top_p

number

0 to 1

Nucleus sampling probability. Range is 0.0 to 1.0 for all providers. It is generally recommended to alter either temperature or top_p, but not both.

max_tokens

integer

≥ 1

Maximum number of tokens to generate in the response.

presence_penalty

number

-2 to 2

Defaults to 0

Penalizes new tokens based on whether they appear in the text so far, encouraging the model to talk about new topics. BOUNDARY LIMITS: Range is -2.0 to 2.0 for OpenAI, Google, and xAI. Anthropic DOES NOT support this.

frequency_penalty

number

-2 to 2

Defaults to 0

Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. BOUNDARY LIMITS: Range is -2.0 to 2.0 for OpenAI, Google, and xAI. Anthropic DOES NOT support this. The Gateway must strip this field from the payload if the target is Anthropic.

integer

1 to 128

Defaults to 1

How many chat completion choices to generate for each input message. BOUNDARY LIMITS: - OpenAI/xAI: up to 128. - Google: up to 8. - Anthropic: DOES NOT support generating multiple choices natively.

stream

boolean

Defaults to false

If true, the response is streamed back.

stop

Custom sequences where the API will stop generating further tokens. BOUNDARY LIMITS: OpenAI, Anthropic, xAI allow a maximum of 4 items. Google allows up to 5.

tools

array of objects

Optional list of tools the model may call.

tools

tool_choice

enum

How the model should use tools. auto (default) – model may call tools or not; none – no tools; required – must call at least one tool; or force a specific function by name.

Headers

string

enum

Defaults to application/json

Generated from available response content types

Allowed:

Responses

200Chat completion created successfully. Non-streaming: JSON with choices[0].message (content and/or tool_calls). Streaming: text/event-stream with data: chunks.

400Bad request

401Invalid or missing token

402Insufficient credits

422Validation error

429Rate limit exceeded

500Internal server error