Chat

POST /v1/chat/completions

Use the chat completions endpoint for text and vision models. Streaming is supported via Server-Sent Events (SSE).

{
  "model": "gemini-3-flash",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
| Field           | Type             | Required | Description                              |
|-----------------|------------------|----------|------------------------------------------|
| model           | string           | yes      | Model ID                                 |
| messages        | array            | yes      | Chat messages                            |
| temperature     | number           | no       | Sampling temperature                     |
| top_p           | number           | no       | Nucleus sampling                         |
| max_tokens      | integer          | no       | Maximum output tokens                    |
| stream          | boolean          | no       | Enable SSE streaming                     |
| stop            | string or array  | no       | Stop sequences                           |
| tools           | array            | no       | Function-calling tools                   |
| tool_choice     | string or object | no       | Tool selection strategy                  |
| response_format | object           | no       | JSON mode with `{"type": "json_object"}` |
| seed            | integer          | no       | Reproducibility seed, when supported     |
| think           | boolean          | no       | Enable reasoning mode, when supported    |
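As a sketch of how the optional fields combine, the payload below requests JSON-mode output with a stop sequence and a seed. The field values are illustrative, not recommendations, and whether `seed` and `response_format` are honored depends on the model.

```python
import json

# Illustrative request body for POST /v1/chat/completions,
# exercising several optional fields from the table above.
payload = {
    "model": "gemini-3-flash",
    "messages": [
        {"role": "system", "content": "Reply with a JSON object."},
        {"role": "user", "content": "List three primary colors."},
    ],
    "temperature": 0.2,
    "max_tokens": 256,
    "stop": ["\n\n"],
    "seed": 42,
    "response_format": {"type": "json_object"},
}

body = json.dumps(payload)  # serialized body, ready to POST
```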
{
  "model": "gemini-3-flash",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
      ]
    }
  ]
}
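For local images, the `image_url` entry can carry a base64 data URL instead of an HTTP URL, a common convention for OpenAI-compatible APIs; confirm your deployment accepts it. A minimal sketch (the helper name is illustrative):

```python
import base64

def image_message(image_bytes: bytes, prompt: str, mime: str = "image/jpeg") -> dict:
    """Build a multimodal user message embedding the image as a data URL."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

# Fake JPEG header bytes stand in for real image data here.
msg = image_message(b"\xff\xd8\xff", "Describe this image.")
```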
# Streaming with an OpenAI-compatible Python SDK client.
from openai import OpenAI

# Placeholder values: point base_url and api_key at your deployment.
client = OpenAI(base_url="https://your-endpoint/v1", api_key="YOUR_API_KEY")

stream = client.chat.completions.create(
    model="gemini-3-flash",
    messages=[{"role": "user", "content": "Write a short poem"}],
    stream=True,
)
for chunk in stream:
    text = chunk.choices[0].delta.content
    if text:
        print(text, end="", flush=True)