Building Your First Local AI Agent: Ollama + smolagents Setup Guide
A complete setup for running AI agents locally using Ollama and Hugging Face's smolagents library. This creates an agent that thinks in Python code to solve problems.
📂 Complete code and setup files available on GitHub: Ollama + smolagents Setup Guide
What This Does
Instead of just chatting with an AI, this setup creates an agent that:
- Receives your request
- Writes Python code to solve it
- Executes that code
- Returns the result
Perfect for calculations, data analysis, and multi-step problem solving - all running locally on your machine!
Code-Thinking Agent vs Regular AI Chat
Regular AI Chat (like ChatGPT or a plain ollama run qwen2:7b):
- Just generates text responses
- Tells you the answer
- Limited to what it can "remember" or calculate in its head
Code-Thinking Agent (our smolagents setup):
- Writes actual Python code to solve problems
- Executes that code step-by-step
- Can handle complex math, logic, data processing
- Shows you its "work" (the code it wrote)
- Much more reliable for calculations and multi-step reasoning
Think of it as the difference between asking someone a math question versus hiring a programmer to solve it!
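For example, ask both for 15% of 2847: a chat model predicts the digits and can easily slip, while the agent generates and runs real Python. Below is a hypothetical illustration of the kind of snippet the agent might produce (the exact code varies from run to run; final_answer is the built-in call the agent uses to return its result):

result = 0.15 * 2847   # 427.05, computed exactly by the Python interpreter
final_answer(f"15% of 2847 is {result}")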
Architecture & Connection Flow
flowchart LR
    A[Your Python Script] --> B[smolagents] --> C[LiteLLM] --> D[Ollama<br/>:11434] --> E[qwen2:7b model]
    style A fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style B fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
    style C fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    style D fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    style E fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
Why Port 11434 Matters:
Your Python script needs to know exactly where to find the Ollama server, just like needing someone's phone number to call them. The line api_base="http://127.0.0.1:11434"
in your code tells it:
127.0.0.1
= "localhost" (your own computer):11434
= the specific port Ollama listens on
If Ollama runs on a different port or you use the wrong port number, your Python script can't connect!
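If you want to confirm the address before wiring up the agent, you can query Ollama's /api/tags endpoint (the same one used for troubleshooting later in this guide). Here is a minimal sketch using only the Python standard library; the filename check_ollama.py is just a suggestion:

# check_ollama.py - verify Ollama is reachable on port 11434 and that qwen2:7b is installed
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"

try:
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=3) as resp:
        models = [m["name"] for m in json.load(resp).get("models", [])]
    print("Ollama is up. Installed models:", models)
    if not any(name.startswith("qwen2:7b") for name in models):
        print("qwen2:7b is missing - run: ollama pull qwen2:7b")
except OSError:
    print("Could not reach Ollama on port 11434 - start it with: ollama serve")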
Prerequisites
- macOS/Linux/Windows with Python 3.10+ (required by smolagents)
- ~5GB free disk space for the AI model
- Important: You need BOTH system-level AND Python-level components:
  - System-level: the Ollama application (installed like any other app)
  - Python-level: the smolagents package (installed via pip)
Think of it like needing both Microsoft Word (the app) AND a document to work with - two different things!
Setup Instructions
1. Install Ollama
macOS:
# Download from https://ollama.ai or use brew:
brew install ollama
Linux:
curl -fsSL https://ollama.ai/install.sh | sh
2. Download AI Model
# Pull the 7B parameter Qwen2 model (4.4GB)
ollama pull qwen2:7b
# Verify it downloaded
ollama list
3. Python Environment Setup
# Create virtual environment
python3 -m venv little-llm-model-venv
# Activate it
source little-llm-model-venv/bin/activate # macOS/Linux
# OR
little-llm-model-venv\Scripts\activate # Windows
# Install dependencies (quoted so zsh doesn't expand the brackets)
pip install 'smolagents[litellm]'
4. Create Main Script
Create main.py:
from smolagents import LiteLLMModel, CodeAgent

# Point LiteLLM at the local Ollama server
model = LiteLLMModel(
    model_id="ollama_chat/qwen2:7b",    # model name exactly as shown by ollama list
    api_base="http://127.0.0.1:11434",  # where ollama serve is listening
    num_ctx=8192,                       # context window size in tokens
)

# CodeAgent writes and executes Python code to answer each request
agent = CodeAgent(tools=[], model=model)

response = agent.run("Tell me a joke")
print(response)
Running the Setup
1. Start Ollama Server
# In one terminal, start the server
ollama serve
2. Verify Ollama Is Running
curl -s http://localhost:11434/api/tags || echo "Ollama not running - start with 'ollama serve'"
3. Run Your Agent
# In another terminal, activate your environment
source little-llm-model-venv/bin/activate
# Run your script
python3 main.py
Example Output
╭─────────────────────────────────────────────────────── New run ───────────────────────────────────────────────────────╮
│ Tell me a joke │
╰─ LiteLLMModel - ollama_chat/qwen2:7b ─────────────────────────────────────────────────────────────────────────────────╯
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
─ Executing parsed code: ──────────────────────────────────────────────────────────────────────────────────────────────
joke = "Why don't scientists trust atoms? Because they make up everything."
final_answer(joke)
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Out - Final answer: Why don't scientists trust atoms? Because they make up everything.
[Step 1: Duration 3.02 seconds| Input tokens: 1,990 | Output tokens: 50]
Why don't scientists trust atoms? Because they make up everything.
Troubleshooting
"Import smolagents could not be resolved" in VS Code
Problem: VS Code doesn't know about your virtual environment
Solution:
- Press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows/Linux)
- Type "Python: Select Interpreter"
- Choose ./little-llm-model-venv/bin/python
Ollama Connection Issues
Problem: Python script can't connect to Ollama
Check if Ollama is running:
sudo lsof -i :11434
# Should show ollama process listening on port 11434
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ollama 65356 naren 3u IPv4 xyz 0t0 TCP localhost:11434 (LISTEN)
Solution: Make sure ollama serve is running in a separate terminal.
Model Not Found Error
Problem: Agent can't find the AI model
Check available models:
ollama list
# Should show qwen2:7b in the list
NAME ID SIZE MODIFIED
qwen2:7b dd314f039b9d 4.4 GB 2 hours ago
Solution: Re-download if missing:
ollama pull qwen2:7b
"No module named smolagents"
Problem: Package not installed or wrong Python environment
Solution:
# Make sure virtual environment is activated
source little-llm-model-venv/bin/activate
# Reinstall if needed
pip install 'smolagents[litellm]'
Port 11434 Already in Use
Problem: Something else is using Ollama's port
Find what's using the port:
sudo lsof -i :11434
Solution: Kill the existing process or restart your computer
Advanced Usage
Interactive Chat
from smolagents import LiteLLMModel, CodeAgent

model = LiteLLMModel(
    model_id="ollama_chat/qwen2:7b",
    api_base="http://127.0.0.1:11434",
    num_ctx=8192,
)

agent = CodeAgent(tools=[], model=model)

# Simple REPL loop: type 'quit' to exit
while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break
    response = agent.run(user_input)
    print(f"Agent: {response}")
Try Complex Tasks
# Math calculations
agent.run("Calculate 15% of 2847 and tell me if it's greater than 400")
# Data analysis
agent.run("Create a list of prime numbers between 1 and 50")
# Logic problems
agent.run("If I have 3 apples and buy 2 more, then eat 1, how many do I have?")
Direct Ollama Chat (without agents)
# Simple chat interface
ollama run qwen2:7b
File Structure
your-project/
├── little-llm-model-venv/ # Virtual environment
├── main.py # Your agent script
├── interactive_chat.py # Interactive chat version
├── requirements.txt # Dependencies
├── .gitignore # Git ignore rules
└── README.md # Complete documentation
🔗 GitHub Repository: Local AI Agent Setup
What's Happening Under the Hood
- Ollama serves the AI model locally on port 11434
- LiteLLM provides a unified interface to talk to Ollama
- smolagents creates an agent that writes Python code to solve problems
- CodeAgent executes the generated code and returns results
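If you're curious what that chain boils down to on the wire, you can call Ollama's /api/chat endpoint directly. A minimal sketch using only the standard library; the single-message payload below is a simplification, since LiteLLM actually sends the agent's full system prompt and conversation history:

# Roughly what LiteLLM does under the hood: POST a chat request to Ollama
import json
import urllib.request

payload = {
    "model": "qwen2:7b",
    "messages": [{"role": "user", "content": "Tell me a joke"}],
    "stream": False,  # return one complete JSON response instead of a stream
}
req = urllib.request.Request(
    "http://127.0.0.1:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=120) as resp:
    reply = json.load(resp)
print(reply["message"]["content"])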
Benefits
- ✅ 100% Local - No internet required after setup
- ✅ Privacy - Your data never leaves your machine
- ✅ No API Costs - Free to run unlimited queries
- ✅ Code-First - Agent thinks in executable Python code
- ✅ Powerful - Can handle complex multi-step problems
Get the Code
All the code, configuration files, and documentation from this tutorial are available in the GitHub repository:
🔗 GitHub Repository: Local AI Agent Setup
The repo includes:
- Complete main.py and interactive_chat.py scripts
- Ready-to-use requirements.txt
- Comprehensive .gitignore
- This complete guide as README.md
Simply clone and follow the setup instructions!
Pro Tip: Keep ollama serve running in the background when working with this setup. You can add it to your system startup if you use this regularly!