Oxlo.ai

Chat Completions API

Generate responses to text prompts using the standard chat completion format.


Endpoint

POST https://api.oxlo.ai/v1/chat/completions

Parameters

Name         Type     Required  Description
model        string   Yes       ID of the model to use (e.g., mistral-7b, llama-3-8b).
messages     array    Yes       A list of messages comprising the conversation so far.
max_tokens   integer  No        Maximum number of tokens to generate. Defaults to 256.
temperature  float    No        Sampling temperature between 0 and 2. Defaults to 0.7.
stream       boolean  No        Whether to stream back partial progress. Defaults to false.

Note: Advanced parameters like top_p, frequency_penalty, and presence_penalty are coming soon.
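
The parameters above can be assembled and checked on the client before sending. The following is a minimal sketch, not part of any official SDK: `build_payload` is a hypothetical helper, and the checks for a non-empty `messages` list and a positive `max_tokens` are assumptions rather than documented rules. The defaults and the temperature range come from the table.

```python
from typing import Any

# Defaults copied from the parameter table.
DEFAULT_MAX_TOKENS = 256
DEFAULT_TEMPERATURE = 0.7


def build_payload(
    model: str,
    messages: list[dict[str, str]],
    max_tokens: int = DEFAULT_MAX_TOKENS,
    temperature: float = DEFAULT_TEMPERATURE,
    stream: bool = False,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a request body and check it client-side."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        # Assumption: an empty conversation is not a useful request.
        raise ValueError("messages must contain at least one message")
    if max_tokens < 1:
        # Assumption: the API expects a positive token budget.
        raise ValueError("max_tokens must be a positive integer")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": stream,
    }


payload = build_payload(
    "mistral-7b",
    [{"role": "user", "content": "Hello!"}],
)
```

Sending the defaults explicitly is optional; omitting them from the body should have the same effect according to the table.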

Example Request

bash
curl https://api.oxlo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "mistral-7b",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
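
For reference, the same request might be prepared in Python using only the standard library. This is a sketch under the assumption that the endpoint and headers behave exactly as in the curl example; `build_request` is a hypothetical helper name, and the actual network call is left commented out.

```python
import json
import urllib.request

API_URL = "https://api.oxlo.ai/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request mirroring the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_request(
    "YOUR_API_KEY",
    {
        "model": "mistral-7b",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello!"},
        ],
    },
)

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request, timeout=60) as response:
#     body = json.load(response)
```

Replace YOUR_API_KEY with a real key before uncommenting the call. The 60-second timeout is an arbitrary illustrative value.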

Example Response

json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "mistral-7b",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello there, how may I assist you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
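
A client usually needs only a few fields from this body. The sketch below reads the first choice and the token total; `Completion` and `parse_completion` are hypothetical names, and the code assumes the response always has the shape shown in the example.

```python
import json
from dataclasses import dataclass


@dataclass
class Completion:
    content: str
    finish_reason: str
    total_tokens: int


def parse_completion(raw: str) -> Completion:
    """Pull the first choice and the token total out of a response body."""
    body = json.loads(raw)
    first = body["choices"][0]
    return Completion(
        content=first["message"]["content"],
        finish_reason=first["finish_reason"],
        total_tokens=body["usage"]["total_tokens"],
    )


# The example response, trimmed to the fields this sketch reads.
sample = """
{
  "choices": [{
    "index": 0,
    "message": {"role": "assistant",
                "content": "Hello there, how may I assist you today?"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21}
}
"""

result = parse_completion(sample)
print(result.content)       # Hello there, how may I assist you today?
print(result.total_tokens)  # 21
```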

Error Handling

Code  Description
401   Unauthorized. Invalid or missing API key.
403   Forbidden. Access denied (e.g., plan limit reached, or model requires upgrade).
429   Too Many Requests. Rate limit exceeded.
502   Bad Gateway. Worker unreachable or returned an invalid response.
503   Service Unavailable. All workers busy or queue full.
504   Gateway Timeout. Model took too long to generate a response.
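
One way a client might react to these codes is to retry the ones that look temporary and fail fast on the rest. The sketch below treats 429, 502, 503, and 504 as retryable; that is an assumption based on their descriptions, not a documented policy. `call_with_retries` and its `send` argument (a callable returning a status code and body) are hypothetical, and the exponential backoff schedule is illustrative.

```python
import time
from typing import Callable

# Assumption: these statuses from the table describe temporary conditions.
RETRYABLE = {429, 502, 503, 504}


def call_with_retries(
    send: Callable[[], tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE:
            return status, body  # success, or an error retrying will not fix
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    return status, body  # last retryable response after giving up


# Usage with a fake transport and a recorded (not real) sleep.
responses = iter([(503, "busy"), (429, "slow down"), (200, "ok")])
delays: list[float] = []
final = call_with_retries(lambda: next(responses), sleep=delays.append)
print(final)   # (200, 'ok')
print(delays)  # [1.0, 2.0]
```

A 401 or 403 is returned on the first attempt, since a bad key or a plan limit will not be fixed by repeating the same request.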