Chat Completions API

Generate responses to text prompts using the standard chat completion format.

Endpoint

POST https://api.oxlo.ai/v1/chat/completions

Parameters

Name	Type	Required	Description
model	string	Yes	ID of the model to use (e.g., `mistral-7b`, `llama-3-8b`).
messages	array	Yes	A list of messages comprising the conversation so far. Each message is an object with a `role` (such as `system` or `user`) and a `content` string.
max_tokens	integer	No	Maximum number of tokens to generate. Defaults to 256.
temperature	float	No	Sampling temperature between 0 and 2. Defaults to 0.7.
stream	boolean	No	Whether to stream back partial progress. Defaults to `false`.

Note: Advanced parameters such as `top_p`, `frequency_penalty`, and `presence_penalty` are not yet supported; they are coming soon.
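
The parameters above can be assembled into a request body along the following lines. This is a minimal Python sketch, not an official client: `build_chat_payload` is a hypothetical helper name, and the client-side checks (a non-empty `messages` list, a `temperature` between 0 and 2) are assumptions about what is worth validating before sending. Optional fields are left out when unset so that the server-side defaults apply.

```python
import json


def build_chat_payload(model, messages, max_tokens=None, temperature=None, stream=None):
    """Build a request body, including only the optional fields that were set.

    Hypothetical helper: the field names come from the parameter table,
    the validation rules are assumptions made for this sketch.
    """
    if not messages:
        raise ValueError("messages must contain at least one message")

    payload = {"model": model, "messages": messages}

    if max_tokens is not None:
        payload["max_tokens"] = int(max_tokens)
    if temperature is not None:
        # The table documents a sampling temperature between 0 and 2.
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature
    if stream is not None:
        payload["stream"] = bool(stream)

    return payload


body = build_chat_payload(
    "mistral-7b",
    [{"role": "user", "content": "Hello!"}],
    max_tokens=128,
    temperature=0.2,
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of the POST request.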

Example Request

bash

curl https://api.oxlo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "mistral-7b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'

Example Response

json

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "mistral-7b",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello there, how may I assist you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
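
A response of this shape can be read as follows. This Python sketch parses the example response shown above and pulls out the assistant's reply, the finish reason, and the token usage. `extract_reply` is a hypothetical helper name, and reading only the first entry of `choices` is an assumption based on the single-choice example.

```python
import json

# The example response from this page, used here as sample input.
SAMPLE_RESPONSE = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "mistral-7b",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello there, how may I assist you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
"""


def extract_reply(response):
    """Return (content, finish_reason) from the first choice (hypothetical helper)."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


data = json.loads(SAMPLE_RESPONSE)
text, reason = extract_reply(data)
usage = data["usage"]

print(text)
print(reason, usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
```

In the example, `total_tokens` is the sum of `prompt_tokens` and `completion_tokens`.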

Error Handling

Code	Description
401	Unauthorized. Invalid or missing API key.
403	Forbidden. Access denied (e.g., plan limit reached, or model requires upgrade).
429	Too Many Requests. Rate limit exceeded.
502	Bad Gateway. Worker unreachable or returned an invalid response.
503	Service Unavailable. All workers busy or queue full.
504	Gateway Timeout. Model took too long to generate a response.
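
One way a client might act on these codes is sketched below in Python, using only the standard library. The split between retryable and non-retryable codes is an assumption of this sketch, not something the API specifies: 429, 502, 503, and 504 are treated as transient and retried with exponential backoff, while 401 and 403 are raised immediately because retrying cannot fix a bad key or a plan limit. The names `is_retryable`, `backoff_delay`, and `post_chat`, the delay values, and the 60-second timeout are all hypothetical.

```python
import json
import time
import urllib.error
import urllib.request

API_URL = "https://api.oxlo.ai/v1/chat/completions"

# Assumption: these codes describe temporary conditions worth retrying.
RETRYABLE_STATUS = {429, 502, 503, 504}


def is_retryable(status):
    """Return True for status codes this sketch treats as transient."""
    return status in RETRYABLE_STATUS


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff in seconds: base * 2**attempt, limited to cap."""
    return min(cap, base * (2 ** attempt))


def post_chat(payload, api_key, max_attempts=4, sleep=time.sleep):
    """POST the payload, retrying transient errors up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    data = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }

    for attempt in range(max_attempts):
        request = urllib.request.Request(API_URL, data=data, headers=headers, method="POST")
        try:
            with urllib.request.urlopen(request, timeout=60) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            is_last_attempt = attempt == max_attempts - 1
            # 401/403 and any other non-transient code are re-raised at once.
            if not is_retryable(err.code) or is_last_attempt:
                raise
            sleep(backoff_delay(attempt))
```

A call would look like `post_chat({"model": "mistral-7b", "messages": [{"role": "user", "content": "Hello!"}]}, "YOUR_API_KEY")`. The `sleep` parameter exists so the waiting behavior can be replaced in tests.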