Using Ollama with Python: A Simple Guide

Lomanu4 · 12 Май 2025

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

Once you’ve installed Ollama and experimented with running models from the command line, the next logical step is to integrate these powerful AI capabilities into your Python applications. This guide will show you how to use Ollama with Python.

Setting Up

First, make sure Ollama is installed and running on your system.

You can check this other article

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

if you are no familiar with Ollama yet.

Required Ollama Models

Before running the Python examples in this guide, make sure you have the necessary models pulled. You can pull them using the Ollama CLI:

Pull the models used in these examples

ollama pull llama3.2:1b

You only need to pull these models once. Check which models you already have with:

ollama list
Creating a Virtual Environment

It’s a good practice to use a virtual environment for your Python projects. This keeps your dependencies isolated and makes your project more portable:

# Create a virtual environment

python -m venv ollama-env

# Activate the virtual environment

# On Windows:

ollama-env\Scripts\activate

# On macOS/Linux:
source ollama-env/bin/activate
Installing Dependencies

Install the Ollama Python library:

pip install ollama
Creating a requirements.txt

For better project management, create a requirements.txt file:

pip freeze > requirements.txt

To install from this file in the future:

pip install -r requirements.txt
Basic Usage

Let’s start with a simple example using the Llama 3.2 1B model.

Create a file named generate.py with this content:

from ollama import generate
# Regular response
response = generate('llama3.2:1b', 'Why is the sky blue?')
print(response['response'])

This will output the model’s explanation of why the sky is blue as a complete response.

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

Streaming Responses

For a more interactive experience, you can get the response as it’s being generated.

Create a file named generate-stream.py with this content:

from ollama import generate
# Streaming response
print("Streaming response:")
for chunk in generate('llama3.2:1b', 'Why is the sky blue?', stream=True):
print(chunk['response'], end='', flush=True)
print() # New line at the end

This displays the response incrementally as it’s generated, creating a more interactive experience.

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

Why for chunk in generate is used?

When you use the streaming functionality with Ollama, the response isn’t returned all at once. Instead, it’s broken into small pieces (chunks) that arrive one at a time as they’re generated by the model.

The generate() function with stream=True returns an iterator in Python. This iterator yields new chunks of text as they become available from the model. The for loop processes these chunks one by one as they arrive:

Each chunk contains a small piece of the response in chunk['response']
The end='' parameter prevents adding newlines between chunks
The flush=True ensures text displays immediately

This creates the effect of watching the AI “think” in real-time, similar to watching someone type.

Using System Prompts

The system prompt allows you to set context and instructions for the model before the conversation starts. It’s a powerful way to define the model’s behavior.

Create a file named chat-system-role.py with this content:

from ollama import chat

# Define a system prompt
system_prompt = "You speaks and sounds like a pirate with short sentences."
# Chat with a system prompt
response = chat('llama3.2:1b',
messages=[
{'role': 'system', 'content': system_prompt},
{'role': 'user', 'content': 'Tell me about your boat.'}
])
print(response.message.content)

The system prompt stays active throughout the conversation, influencing how the model responds to all user inputs.

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

Conversational Context

Maintain a conversation with context using streaming for a more interactive experience.

Create a file named chat-history-stream.py with this content:

from ollama import chat

# Initialize an empty message history
messages = []
while True:
user_input = input('Chat with history: ')
if user_input.lower() == 'exit':
break
# Get streaming response while maintaining conversation history
response_content = ""
for chunk in chat(
'llama3.2:1b',
messages=messages + [
{'role': 'system', 'content': 'You are a helpful assistant. You only give a short sentence by answer.'},
{'role': 'user', 'content': user_input},
],
stream=True
):
if chunk.message:
response_chunk = chunk.message.content
print(response_chunk, end='', flush=True)
response_content += response_chunk
# Add the exchange to the conversation history
messages += [
{'role': 'user', 'content': user_input},
{'role': 'assistant', 'content': response_content},
]
print('\n') # Add space after response

Here you can see how this example looks like.

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

Conclusion

The Ollama Python library makes it easy to integrate powerful language models into your Python applications. Whether you’re building a simple script or a complex application, the library’s straightforward API allows you to focus on creating value rather than managing the underlying AI infrastructure.

As you become more comfortable with the basics, explore more advanced features and consider how you can use these capabilities to solve real-world problems in your projects.

Resources

In this GitHub repository, you'll find working code examples:

Пожалуйста Авторизируйтесь или Зарегистрируйтесь для просмотра скрытого текста.

Using Ollama with Python: A Simple Guide

Lomanu4