---
title: OpenAI compatibility
---

Ollama provides compatibility with parts of the [OpenAI API](https://platform.openai.com/docs/api-reference) to help connect existing applications to Ollama.

## Usage

### Simple `v1/chat/completions` example

```python basic.py
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='gpt-oss:20b',
)
print(chat_completion.choices[0].message.content)
```

```javascript basic.js
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama", // required but ignored
});

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say this is a test" }],
  model: "gpt-oss:20b",
});

console.log(chatCompletion.choices[0].message.content);
```

```shell basic.sh
curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss:20b",
    "messages": [{ "role": "user", "content": "Say this is a test" }]
  }'
```

### Simple `v1/responses` example

```python responses.py
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

responses_result = client.responses.create(
    model='qwen3:8b',
    input='Write a short poem about the color blue',
)
print(responses_result.output_text)
```

```javascript responses.js
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama", // required but ignored
});

const responsesResult = await openai.responses.create({
  model: "qwen3:8b",
  input: "Write a short poem about the color blue",
});

console.log(responsesResult.output_text);
```

```shell responses.sh
curl -X POST http://localhost:11434/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3:8b",
    "input": "Write a short poem about the color blue"
  }'
```

### `v1/chat/completions` with vision example

```python vision.py
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

response = client.chat.completions.create(
    model='qwen3-vl:8b',
    messages=[
        {
            'role': 'user',
            'content': [
                {'type': 'text', 'text': "What's in this image?"},
                {
                    'type': 'image_url',
                    # base64-encoded image as a data URL; remote image URLs are not supported
                    'image_url': '',
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)
```

```javascript vision.js
import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama", // required but ignored
});

const response = await openai.chat.completions.create({
  model: "qwen3-vl:8b",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What's in this image?" },
        {
          type: "image_url",
          // base64-encoded image as a data URL; remote image URLs are not supported
          image_url: "",
        },
      ],
    },
  ],
});

console.log(response.choices[0].message.content);
```

```shell vision.sh
curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-vl:8b",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is this an image of?"},
        {"type": "image_url", "image_url": ""}
      ]
    }]
  }'
```

## Endpoints

### `/v1/chat/completions`

#### Supported features

- [x] Chat completions
- [x] Streaming
- [x] JSON mode
- [x] Reproducible outputs
- [x] Vision
- [x] Tools
- [ ] Logprobs

#### Supported request fields

- [x] `model`
- [x] `messages`
  - [x] Text `content`
  - [x] Image `content`
    - [x] Base64 encoded image
    - [ ] Image URL
  - [x] Array of `content` parts
- [x] `frequency_penalty`
- [x] `presence_penalty`
- [x] `response_format`
- [x] `seed`
- [x] `stop`
- [x] `stream`
- [x] `stream_options`
  - [x] `include_usage`
- [x] `temperature`
- [x] `top_p`
- [x] `max_tokens`
- [x] `tools`
- [ ] `tool_choice`
- [ ] `logit_bias`
- [ ] `user`
- [ ] `n`

### `/v1/completions`

#### Supported features

- [x] Completions
- [x] Streaming
- [x] JSON mode
- [x] Reproducible outputs
- [ ] Logprobs

#### Supported request fields

- [x] `model`
- [x] `prompt`
- [x] `frequency_penalty`
- [x] `presence_penalty`
- [x] `seed`
- [x] `stop`
- [x] `stream`
- [x] `stream_options`
  - [x] `include_usage`
- [x] `temperature`
- [x] `top_p`
- [x] `max_tokens`
- [x] `suffix`
- [ ] `best_of`
- [ ] `echo`
- [ ] `logit_bias`
- [ ] `user`
- [ ] `n`

#### Notes

- `prompt` currently only accepts a string

### `/v1/models`

#### Notes

- `created` corresponds to when the model was last modified
- `owned_by` corresponds to the Ollama username, defaulting to `"library"`

### `/v1/models/{model}`

#### Notes

- `created` corresponds to when the model was last modified
- `owned_by` corresponds to the Ollama username, defaulting to `"library"`

### `/v1/embeddings`

#### Supported request fields

- [x] `model`
- [x] `input`
  - [x] string
  - [x] array of strings
  - [ ] array of tokens
  - [ ] array of token arrays
- [x] `encoding_format`
- [x] `dimensions`
- [ ] `user`

### `/v1/responses`

Ollama supports the [OpenAI Responses API](https://platform.openai.com/docs/api-reference/responses). Only the non-stateful flavor is supported (i.e., there is no `previous_response_id` or `conversation` support).
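Without server-side state, a multi-turn exchange has to resend the earlier turns with every request. The sketch below keeps the history on the client and passes it as the `input` list. It assumes that `input` accepts a list of `{"role": ..., "content": ...}` message items, as in the OpenAI Responses API; the helper names `build_input` and `ask` are hypothetical, not part of Ollama or the `openai` package.

```python
from typing import Any


def build_input(history: list[dict[str, str]], user_text: str) -> list[dict[str, str]]:
    """Return the full `input` list for one stateless Responses request.

    `history` holds earlier turns as {"role": ..., "content": ...} dicts.
    The new user message is appended to a copy; `history` is not mutated.
    """
    return [*history, {"role": "user", "content": user_text}]


def ask(client: Any, model: str, history: list[dict[str, str]], user_text: str) -> str:
    """Send one turn and record both sides of it in `history` (hypothetical helper).

    `client` is assumed to be an `openai.OpenAI` instance created with
    base_url='http://localhost:11434/v1/'. It is passed in rather than built
    here, so importing this module makes no network calls.
    """
    messages = build_input(history, user_text)
    # Every call carries the whole conversation; no previous_response_id is used.
    result = client.responses.create(model=model, input=messages)
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": result.output_text})
    return result.output_text
```

Calling `ask(client, 'qwen3:8b', history, ...)` repeatedly with the same `history` list grows the conversation turn by turn. The cost of this design is that the full history is re-sent each time, so long conversations eventually run into the model's context size.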
#### Supported features

- [x] Streaming
- [x] Tools (function calling)
- [x] Reasoning summaries (for thinking models)
- [ ] Stateful requests

#### Supported request fields

- [x] `model`
- [x] `input`
- [x] `instructions`
- [x] `tools`
- [x] `stream`
- [x] `temperature`
- [x] `top_p`
- [x] `max_output_tokens`
- [ ] `previous_response_id` (stateful `v1/responses` not supported)
- [ ] `conversation` (stateful `v1/responses` not supported)
- [ ] `truncation`

## Models

Before using a model, pull it locally with `ollama pull`:

```shell
ollama pull llama3.2
```

### Default model names

For tooling that relies on default OpenAI model names such as `gpt-3.5-turbo`, use `ollama cp` to copy an existing model name to a temporary name:

```shell
ollama cp llama3.2 gpt-3.5-turbo
```

Afterwards, this new model name can be specified in the `model` field:

```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```

### Setting the context size

The OpenAI API does not have a way of setting the context size for a model. If you need to change the context size, create a `Modelfile` that looks like this:

```
FROM <some model>
PARAMETER num_ctx <context size>
```

Use the `ollama create mymodel` command to create a new model with the updated context size. Call the API with the updated model name:

```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mymodel",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```
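Because a `Modelfile` is plain text, it can also be generated from a script. The following is a minimal sketch: the helper names `render_modelfile` and `write_modelfile` and the example values `llama3.2`, `8192`, and `mymodel` are illustrative assumptions. The helper only returns the `ollama create <name> -f <path>` command rather than running it.

```python
from pathlib import Path


def render_modelfile(base_model: str, num_ctx: int) -> str:
    """Return Modelfile text that derives a model with a different context size."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be a positive integer")
    return f"FROM {base_model}\nPARAMETER num_ctx {num_ctx}\n"


def write_modelfile(directory: Path, base_model: str, num_ctx: int, new_name: str) -> list[str]:
    """Write `Modelfile` into `directory` and return the `ollama create` argv.

    Nothing is executed here; the returned list can be passed to
    subprocess.run(argv, check=True) on a machine with Ollama installed.
    """
    path = Path(directory) / "Modelfile"
    path.write_text(render_modelfile(base_model, num_ctx))
    return ["ollama", "create", new_name, "-f", str(path)]
```

After the command succeeds, the new name (here `mymodel`) goes in the `model` field of any OpenAI-compatible request, exactly like the curl example above.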