Building Your First Local AI Agent: Ollama + smolagents Setup Guide
A complete setup for running AI agents locally using Ollama and Hugging Face's smolagents library. This creates an agent that thinks in Python code to solve problems.
📂 Complete code and setup files available on GitHub: Ollama + smolagents Setup Guide
What This Does
Instead of just chatting with an AI, this setup creates an agent that:
- Receives your request
- Writes Python code to solve it
- Executes that code
- Returns the result
Perfect for calculations, data analysis, and multi-step problem solving - all running locally on your machine!
Code-Thinking Agent vs Regular AI Chat
Regular AI Chat (like ChatGPT or a plain ollama run qwen2:7b):
- Just generates text responses
- Tells you the answer
- Limited to what it can "remember" or calculate in its head
Code-Thinking Agent (our smolagents setup):
- Writes actual Python code to solve problems
- Executes that code step-by-step
- Can handle complex math, logic, data processing
- Shows you its "work" (the code it wrote)
- Much more reliable for calculations and multi-step reasoning
Think of it as the difference between asking someone a math question versus hiring a programmer to solve it!
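For example, ask both for 15% of 2847: a chat model predicts the digits and can easily slip, while the agent generates and runs real Python. Below is a hypothetical illustration of the kind of snippet the agent might produce (the exact code varies from run to run; final_answer is the built-in call the agent uses to return its result):

result = 0.15 * 2847   # 427.05, computed exactly by the Python interpreter
final_answer(f"15% of 2847 is {result}")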
Architecture & Connection Flow
flowchart LR
    A[Your Python Script] --> B[smolagents] --> C[LiteLLM] --> D[Ollama<br/>:11434] --> E[qwen2:7b model]
    style A fill:#e3f2fd,stroke:#1565c0,stroke-width:2px
    style B fill:#fff3e0,stroke:#ef6c00,stroke-width:2px
    style C fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
    style D fill:#fce4ec,stroke:#c2185b,stroke-width:2px
    style E fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
Why Port 11434 Matters:
Your Python script needs to know exactly where to find the Ollama server, just like needing someone's phone number to call them. The line api_base="http://127.0.0.1:11434"
in your code tells it:
127.0.0.1
= "localhost" (your own computer):11434
= the specific port Ollama listens on
If Ollama runs on a different port or you use the wrong port number, your Python script can't connect!
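If you want to confirm the address before wiring up the agent, you can query Ollama's /api/tags endpoint (the same one used for troubleshooting later in this guide). Here is a minimal sketch using only the Python standard library; the filename check_ollama.py is just a suggestion:

# check_ollama.py - verify Ollama is reachable on port 11434 and that qwen2:7b is installed
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"

try:
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=3) as resp:
        models = [m["name"] for m in json.load(resp).get("models", [])]
    print("Ollama is up. Installed models:", models)
    if not any(name.startswith("qwen2:7b") for name in models):
        print("qwen2:7b is missing - run: ollama pull qwen2:7b")
except OSError:
    print("Could not reach Ollama on port 11434 - start it with: ollama serve")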
Prerequisites
- macOS/Linux/Windows with Python 3.10+ (required by smolagents)
- ~5GB free disk space for the AI model
- Important: You need BOTH system-level AND Python-level components:
  - System-level: the Ollama application (installed like any other app)
  - Python-level: the smolagents package (installed via pip)
Think of it like needing both Microsoft Word (the app) AND a document to work with - two different things!
Setup Instructions
1. Install Ollama
macOS:
# Download from https://ollama.ai or use brew:
brew install ollama
Linux:
curl -fsSL https://ollama.ai/install.sh | sh
2. Download AI Model
# Pull the 7B parameter Qwen2 model (4.4GB)
ollama pull qwen2:7b
# Verify it downloaded
ollama list
3. Python Environment Setup
# Create virtual environment
python3 -m venv little-llm-model-venv
# Activate it
source little-llm-model-venv/bin/activate # macOS/Linux
# OR
little-llm-model-venv\Scripts\activate # Windows
# Install dependencies (quoted so zsh doesn't expand the brackets)
pip install 'smolagents[litellm]'
4. Create Main Script
Create main.py:
from smolagents import LiteLLMModel, CodeAgent

# Point LiteLLM at the local Ollama server
model = LiteLLMModel(
    model_id="ollama_chat/qwen2:7b",    # model name exactly as shown by ollama list
    api_base="http://127.0.0.1:11434",  # where ollama serve is listening
    num_ctx=8192,                       # context window size in tokens
)

# CodeAgent writes and executes Python code to answer each request
agent = CodeAgent(tools=[], model=model)

response = agent.run("Tell me a joke")
print(response)
Running the Setup
1. Start Ollama Server
# In one terminal, start the server
ollama serve
2. Verify Ollama Is Running
curl -s http://localhost:11434/api/tags || echo "Ollama not running - start with 'ollama serve'"
3. Run Your Agent
# In another terminal, activate your environment
source little-llm-model-venv/bin/activate
# Run your script
python3 main.py
Example Output
╭─────────────────────────────────────────────────────── New run ───────────────────────────────────────────────────────╮
│ Tell me a joke │
╰─ LiteLLMModel - ollama_chat/qwen2:7b ─────────────────────────────────────────────────────────────────────────────────╯
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
─ Executing parsed code: ──────────────────────────────────────────────────────────────────────────────────────────────
joke = "Why don't scientists trust atoms? Because they make up everything."
final_answer(joke)
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Out - Final answer: Why don't scientists trust atoms? Because they make up everything.
[Step 1: Duration 3.02 seconds| Input tokens: 1,990 | Output tokens: 50]
Why don't scientists trust atoms? Because they make up everything.
Troubleshooting
"Import smolagents could not be resolved" in VS Code
Problem: VS Code doesn't know about your virtual environment
Solution:
- Press Cmd+Shift+P (Mac) or Ctrl+Shift+P (Windows/Linux)
- Type "Python: Select Interpreter"
- Choose ./little-llm-model-venv/bin/python
Ollama Connection Issues
Problem: Python script can't connect to Ollama
Check if Ollama is running:
sudo lsof -i :11434
# Should show ollama process listening on port 11434
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ollama 65356 naren 3u IPv4 xyz 0t0 TCP localhost:11434 (LISTEN)
Solution: Make sure ollama serve is running in a separate terminal.
Model Not Found Error
Problem: Agent can't find the AI model
Check available models:
ollama list
# Should show qwen2:7b in the list
NAME ID SIZE MODIFIED
qwen2:7b dd314f039b9d 4.4 GB 2 hours ago
Solution: Re-download if missing:
ollama pull qwen2:7b
"No module named smolagents"
Problem: Package not installed or wrong Python environment
Solution:
# Make sure virtual environment is activated
source little-llm-model-venv/bin/activate
# Reinstall if needed
pip install 'smolagents[litellm]'
Port 11434 Already in Use
Problem: Something else is using Ollama's port
Find what's using the port:
sudo lsof -i :11434
Solution: Kill the existing process or restart your computer
Advanced Usage
Interactive Chat
from smolagents import LiteLLMModel, CodeAgent

model = LiteLLMModel(
    model_id="ollama_chat/qwen2:7b",
    api_base="http://127.0.0.1:11434",
    num_ctx=8192,
)

agent = CodeAgent(tools=[], model=model)

# Simple REPL loop: type 'quit' to exit
while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break
    response = agent.run(user_input)
    print(f"Agent: {response}")
Try Complex Tasks
# Math calculations
agent.run("Calculate 15% of 2847 and tell me if it's greater than 400")
# Data analysis
agent.run("Create a list of prime numbers between 1 and 50")
# Logic problems
agent.run("If I have 3 apples and buy 2 more, then eat 1, how many do I have?")
Direct Ollama Chat (without agents)
# Simple chat interface
ollama run qwen2:7b
File Structure
your-project/
├── little-llm-model-venv/ # Virtual environment
├── main.py # Your agent script
├── interactive_chat.py # Interactive chat version
├── requirements.txt # Dependencies
├── .gitignore # Git ignore rules
└── README.md # Complete documentation
🔗 GitHub Repository: Local AI Agent Setup
What's Happening Under the Hood
- Ollama serves the AI model locally on port 11434
- LiteLLM provides a unified interface to talk to Ollama
- smolagents creates an agent that writes Python code to solve problems
- CodeAgent executes the generated code and returns results
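If you're curious what that chain boils down to on the wire, you can call Ollama's /api/chat endpoint directly. A minimal sketch using only the standard library; the single-message payload below is a simplification, since LiteLLM actually sends the agent's full system prompt and conversation history:

# Roughly what LiteLLM does under the hood: POST a chat request to Ollama
import json
import urllib.request

payload = {
    "model": "qwen2:7b",
    "messages": [{"role": "user", "content": "Tell me a joke"}],
    "stream": False,  # return one complete JSON response instead of a stream
}
req = urllib.request.Request(
    "http://127.0.0.1:11434/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=120) as resp:
    reply = json.load(resp)
print(reply["message"]["content"])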
Benefits
- ✅ 100% Local - No internet required after setup
- ✅ Privacy - Your data never leaves your machine
- ✅ No API Costs - Free to run unlimited queries
- ✅ Code-First - Agent thinks in executable Python code
- ✅ Powerful - Can handle complex multi-step problems
Get the Code
All the code, configuration files, and documentation from this tutorial are available in the GitHub repository:
🔗 GitHub Repository: Local AI Agent Setup
The repo includes:
- Complete main.py and interactive_chat.py scripts
- Ready-to-use requirements.txt
- Comprehensive .gitignore
- This complete guide as README.md
Simply clone and follow the setup instructions!
Pro Tip: Keep ollama serve running in the background when working with this setup. You can add it to your system startup if you use this regularly!