Oxlo.ai

Vision Models

Send images alongside text prompts for multimodal understanding using the OpenAI-compatible Chat Completions API.

OpenAI Compatible: Vision works with the standard openai Python library. Just set base_url to https://api.oxlo.ai/v1.

Supported Models

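| Model              | API ID           | Tier    | Vision Support        |
|--------------------|------------------|---------|-----------------------|
| MiniMistral 14B    | ministral-14b    | Pro     | Images + Text         |
| Llama 4 Maverick   | llama-4-maverick | Pro     | Images + Text         |
| Kimi K2.5          | kimi-k2.5        | Premium | Images + Text         |
| Kimi K2.6          | kimi-k2.6        | Premium | Video, Images + Text  |
| Kimi K2.5 Thinking | kimi-k2-thinking | Premium | Images + Text         |
| Gemma 3 4B         | gemma-3-4b       | Free    | Images + Text         |
| Gemma 3 27B        | gemma-3-27b      | Premium | Images + Text         |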

How It Works

Vision models accept a messages array in which each message's content is either a plain string (for text-only messages) or an array of content blocks that mix text, images, and, on supported models such as Kimi K2.6, video. Image and video blocks accept either a public URL or a base64-encoded data URI.
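To make the two content shapes concrete, here is a minimal sketch that builds one message of each kind without calling the API. The variable names and the example.com URL are placeholders chosen for illustration, not part of the Oxlo.ai API.

```python
# Text-only message: "content" is a plain string.
text_only = {
    "role": "user",
    "content": "Summarize the history of photography in two sentences.",
}

# Multimodal message: "content" is a list of typed content blocks.
multimodal = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is in this picture?"},
        {
            "type": "image_url",
            # Placeholder URL; a base64 data URI is also accepted here.
            "image_url": {"url": "https://example.com/photo.jpg"},
        },
    ],
}

# Both shapes can appear in the same messages array.
messages = [text_only, multimodal]
```

Either message can be passed in the messages argument of client.chat.completions.create, as in the examples that follow.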

Quick Example

Send an image with a text prompt:

import openai

client = openai.OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key="<YOUR_API_KEY>"
)

response = client.chat.completions.create(
    model="ministral-14b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What do you see in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://example.com/photo.jpg"
                    }
                }
            ]
        }
    ],
    max_tokens=512
)

print(response.choices[0].message.content)

Base64 Image (Local File)

Send a local image file encoded as base64:

import openai
import base64

client = openai.OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key="<YOUR_API_KEY>"
)

# Encode a local image
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="ministral-14b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{image_b64}"
                    }
                }
            ]
        }
    ],
    max_tokens=1024
)

print(response.choices[0].message.content)

Content Block Format

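| Block Type | Fields                                              | Description                                                   |
|------------|-----------------------------------------------------|---------------------------------------------------------------|
| text       | {"type": "text", "text": "..."}                     | A text prompt or question                                     |
| image_url  | {"type": "image_url", "image_url": {"url": "..."}}  | A public URL or base64 data URI (data:image/jpeg;base64,...)  |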
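The block formats above can be wrapped in small helpers so that callers do not hand-build dictionaries. The following is a sketch only: text_block and image_block are hypothetical names, not functions provided by Oxlo.ai or the openai library, and the MIME type is guessed from the file extension with Python's standard mimetypes module.

```python
import base64
import mimetypes
from pathlib import Path


def text_block(text: str) -> dict:
    """Build a text content block (hypothetical helper)."""
    return {"type": "text", "text": text}


def image_block(source: str) -> dict:
    """Build an image_url block from a URL, a data URI, or a local file path.

    Hypothetical helper: remote URLs and data URIs are passed through
    unchanged; anything else is treated as a local path and base64-encoded.
    """
    if source.startswith(("http://", "https://", "data:")):
        url = source
    else:
        path = Path(source)
        # Guess the MIME type from the extension, e.g. ".png" -> "image/png".
        mime, _ = mimetypes.guess_type(path.name)
        if mime is None or not mime.startswith("image/"):
            raise ValueError(f"Cannot determine an image type for {source!r}")
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        url = f"data:{mime};base64,{encoded}"
    return {"type": "image_url", "image_url": {"url": url}}
```

A message's content array can then be written as [text_block("Describe this image."), image_block("photo.jpg")].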

Tips

  • Use data:image/jpeg;base64,... or data:image/png;base64,... for local images
  • For video-capable models (e.g., Kimi K2.6), send video in a video_url block, either as a public URL or as base64-encoded mp4/webm data
  • Use public URLs directly for remote images
  • Multiple images can be sent in a single message by adding more image_url blocks
  • Non-vision models will ignore image blocks and respond to the text content only
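
As an illustration of the multi-image tip, the sketch below puts several image_url blocks after a single text block in one user message. The function names multi_image_message and ask_about_images are hypothetical, and the client argument is assumed to be an openai.OpenAI instance configured with base_url="https://api.oxlo.ai/v1" as in the earlier examples.

```python
def multi_image_message(prompt: str, image_urls: list[str]) -> dict:
    """Build one user message with a text block followed by N image blocks."""
    content = [{"type": "text", "text": prompt}]
    content += [
        {"type": "image_url", "image_url": {"url": url}} for url in image_urls
    ]
    return {"role": "user", "content": content}


def ask_about_images(client, model: str, prompt: str, image_urls: list[str]) -> str:
    """Send the multi-image message and return the model's text reply.

    `client` is assumed to be an OpenAI-compatible client object.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[multi_image_message(prompt, image_urls)],
        max_tokens=512,
    )
    return response.choices[0].message.content
```

For example, ask_about_images(client, "ministral-14b", "What differs between these two photos?", ["https://example.com/a.jpg", "https://example.com/b.jpg"]) sends both images in a single request. The URLs are placeholders.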