---
title: OpenAI compatibility
---
Ollama provides compatibility with parts of the [OpenAI API](https://platform.openai.com/docs/api-reference) to help connect existing applications to Ollama.
## Usage
### Simple `v1/chat/completions` example
<CodeGroup dropdown>
```python basic.py
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='gpt-oss:20b',
)
print(chat_completion.choices[0].message.content)
```
```javascript basic.js
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama", // required but ignored
});

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say this is a test" }],
  model: "gpt-oss:20b",
});
console.log(chatCompletion.choices[0].message.content);
```
```shell basic.sh
curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss:20b",
    "messages": [{ "role": "user", "content": "Say this is a test" }]
  }'
```
</CodeGroup>
### Simple `v1/responses` example
<CodeGroup dropdown>
```python responses.py
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

responses_result = client.responses.create(
    model='qwen3:8b',
    input='Write a short poem about the color blue',
)
print(responses_result.output_text)
```
```javascript responses.js
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama", // required but ignored
});

const responsesResult = await openai.responses.create({
  model: "qwen3:8b",
  input: "Write a short poem about the color blue",
});
console.log(responsesResult.output_text);
```
```shell responses.sh
curl -X POST http://localhost:11434/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:8b",
    "input": "Write a short poem about the color blue"
  }'
```
</CodeGroup>
### `v1/chat/completions` with vision example
<CodeGroup dropdown>
```python vision.py
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

response = client.chat.completions.create(
    model='qwen3-vl:8b',
    messages=[
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': "What's in this image?"},
                {
                    'type': 'image_url',
                    'image_url': '',
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```
```javascript vision.js
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama", // required but ignored
});

const response = await openai.chat.completions.create({
  model: "qwen3-vl:8b",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          image_url:
            "",
        },
      ],
    },
  ],
});
console.log(response.choices[0].message.content);
```
```shell vision.sh
curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-vl:8b",
    "messages": [{ "role": "user", "content": [{"type": "text", "text": "What is this an image of?"}, {"type": "image_url", "image_url": ""}]}]
  }'
```
</CodeGroup>
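
The `/v1/chat/completions` field list marks base64-encoded images as supported and image URLs as unsupported, so a local image generally has to be inlined into the request. The following is a minimal sketch that assumes the image is passed as a `data:` URL in `image_url`. The helper name `to_data_url` is hypothetical and not part of Ollama or the OpenAI SDK.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    # Assumption: the server accepts "data:<mime>;base64,<payload>" strings.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string could then be used as the `image_url` value, for example after reading the bytes of a local file opened in binary mode.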
## Endpoints
### `/v1/chat/completions`
#### Supported features
- [x] Chat completions
- [x] Streaming
- [x] JSON mode
- [x] Reproducible outputs
- [x] Vision
- [x] Tools
- [ ] Logprobs
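
As an illustration of the streaming feature, the sketch below joins the text deltas of a streamed chat completion into one string. It assumes the chunk shape used by the OpenAI Python SDK (`chunk.choices[0].delta.content`). The helper name `collect_stream` is hypothetical.

```python
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the text deltas from a streamed chat completion."""
    parts = []
    for chunk in chunks:
        # Assumption: a usage-only chunk (when include_usage is set)
        # may arrive with an empty choices list.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks carry only a role or a finish reason
            parts.append(delta)
    return "".join(parts)
```

With the client from the usage examples, this would be called on the result of `client.chat.completions.create(..., stream=True)`.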
#### Supported request fields
- [x] `model`
- [x] `messages`
  - [x] Text `content`
  - [x] Image `content`
    - [x] Base64 encoded image
    - [ ] Image URL
  - [x] Array of `content` parts
- [x] `frequency_penalty`
- [x] `presence_penalty`
- [x] `response_format`
- [x] `seed`
- [x] `stop`
- [x] `stream`
- [x] `stream_options`
  - [x] `include_usage`
- [x] `temperature`
- [x] `top_p`
- [x] `max_tokens`
- [x] `tools`
- [ ] `tool_choice`
- [ ] `logit_bias`
- [ ] `user`
- [ ] `n`
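
Code written against the OpenAI API may send fields that are unchecked in the list above. This page does not say whether such fields are ignored or rejected, so one defensive option is to drop them before sending. A minimal sketch follows. The names `UNSUPPORTED_CHAT_FIELDS` and `strip_unsupported` are hypothetical.

```python
# Fields marked unsupported for /v1/chat/completions in the list above.
UNSUPPORTED_CHAT_FIELDS = {"tool_choice", "logit_bias", "user", "n"}


def strip_unsupported(payload: dict) -> dict:
    """Return a copy of a chat request without the unsupported fields."""
    return {
        key: value
        for key, value in payload.items()
        if key not in UNSUPPORTED_CHAT_FIELDS
    }
```

The original dictionary is left unchanged, so the same payload can still be sent to another OpenAI-compatible server.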
### `/v1/completions`
#### Supported features
- [x] Completions
- [x] Streaming
- [x] JSON mode
- [x] Reproducible outputs
- [ ] Logprobs
#### Supported request fields
- [x] `model`
- [x] `prompt`
- [x] `frequency_penalty`
- [x] `presence_penalty`
- [x] `seed`
- [x] `stop`
- [x] `stream`
- [x] `stream_options`
  - [x] `include_usage`
- [x] `temperature`
- [x] `top_p`
- [x] `max_tokens`
- [x] `suffix`
- [ ] `best_of`
- [ ] `echo`
- [ ] `logit_bias`
- [ ] `user`
- [ ] `n`
#### Notes
- `prompt` currently only accepts a string
### `/v1/models`
#### Notes
- `created` corresponds to when the model was last modified
- `owned_by` corresponds to the ollama username, defaulting to `"library"`
### `/v1/models/{model}`
#### Notes
- `created` corresponds to when the model was last modified
- `owned_by` corresponds to the ollama username, defaulting to `"library"`
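
Because `created` reflects when the model was last modified, it can be turned into a readable timestamp. The sketch below assumes `created` is a Unix timestamp in seconds, as in the OpenAI API. The helper name `last_modified` is hypothetical.

```python
from datetime import datetime, timezone


def last_modified(model: dict) -> datetime:
    """Interpret a model object's `created` field as a UTC datetime."""
    # Assumption: `created` is seconds since the Unix epoch.
    return datetime.fromtimestamp(model["created"], tz=timezone.utc)
```

The function takes one model object, such as an entry from the `/v1/models` listing parsed as JSON.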
### `/v1/embeddings`
#### Supported request fields
- [x] `model`
- [x] `input`
  - [x] string
  - [x] array of strings
  - [ ] array of tokens
  - [ ] array of token arrays
- [x] `encoding_format`
- [x] `dimensions`
- [ ] `user`
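
Since `input` accepts a string or an array of strings but not token arrays, a client can check the value before sending it. This is a minimal sketch of such a check. The helper name `normalize_embedding_input` is hypothetical.

```python
from typing import List, Union


def normalize_embedding_input(value: Union[str, List[str]]) -> List[str]:
    """Return `input` as a list of strings, rejecting token arrays."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return value
    # Token arrays (lists of ints) are unchecked in the list above.
    raise TypeError("input must be a string or a list of strings")
```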
### `/v1/responses`
Ollama supports the [OpenAI Responses API](https://platform.openai.com/docs/api-reference/responses). Only the non-stateful flavor is supported (i.e., there is no `previous_response_id` or `conversation` support).
#### Supported features
- [x] Streaming
- [x] Tools (function calling)
- [x] Reasoning summaries (for thinking models)
- [ ] Stateful requests
#### Supported request fields
- [x] `model`
- [x] `input`
- [x] `instructions`
- [x] `tools`
- [x] `stream`
- [x] `temperature`
- [x] `top_p`
- [x] `max_output_tokens`
- [ ] `previous_response_id` (stateful v1/responses not supported)
- [ ] `conversation` (stateful v1/responses not supported)
- [ ] `truncation`
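
Because only stateless requests are supported, a multi-turn conversation has to resend its history on every call instead of relying on `previous_response_id`. The sketch below keeps that history on the client. It assumes `input` accepts a list of `role`/`content` messages, as in the OpenAI Responses API. The helper names `next_input` and `record_reply` are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def next_input(history: List[Message], user_text: str) -> List[Message]:
    """Return the full message list to send as `input` for the next turn."""
    return history + [{"role": "user", "content": user_text}]


def record_reply(turn_input: List[Message], reply_text: str) -> List[Message]:
    """Append the assistant's reply so it is replayed on the following turn."""
    return turn_input + [{"role": "assistant", "content": reply_text}]
```

Each turn would pass the result of `next_input` as `input`, then store the response's `output_text` with `record_reply`.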
## Models
Before using a model, pull it locally with `ollama pull`:
```shell
ollama pull llama3.2
```
### Default model names
For tooling that relies on default OpenAI model names such as `gpt-3.5-turbo`, use `ollama cp` to copy an existing model name to a temporary name:
```shell
ollama cp llama3.2 gpt-3.5-turbo
```
Afterwards, this new model name can be specified in the `model` field:
```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
### Setting the context size
The OpenAI API has no way to set a model's context size. To change the context size, create a `Modelfile` that looks like this:
```
FROM <some model>
PARAMETER num_ctx <context size>
```
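
If the `Modelfile` needs to be generated from a script, its two lines can be rendered from the base model name and the desired context size. This is a minimal sketch. The helper name `render_modelfile` and the example values in the usage note are hypothetical.

```python
def render_modelfile(base_model: str, num_ctx: int) -> str:
    """Render a Modelfile that overrides the context size of `base_model`."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be a positive integer")
    # Mirrors the two-line Modelfile shown above.
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"
```

For example, `render_modelfile("llama3.2", 8192)` yields text that can be written to a file named `Modelfile`.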
Use the `ollama create mymodel` command to create a new model with the updated context size. Call the API with the updated model name:
```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mymodel",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```