
Ollama Adapter

The Ollama adapter provides integration with Ollama’s API, supporting any model served by your local Ollama instance.

Features

  • Logging: Comprehensive logging with request context and performance metrics
  • Health Checks: Built-in health check to verify API connectivity
  • Token Tracking: Automatic token usage tracking for cost monitoring

Basic Usage

from parsec.models.adapters import OllamaAdapter

adapter = OllamaAdapter(
    model="llama3",
    base_url="http://localhost:11434"
)

# Generate a response
result = await adapter.generate("What is the capital of France?")
print(result.output)       # "Paris"
print(result.tokens_used)  # e.g., 25
print(result.latency_ms)   # e.g., 342.5

Structured Output with Schema

The Ollama adapter uses native JSON mode when a schema is provided:

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"}
    },
    "required": ["name", "age"]
}

result = await adapter.generate(
    "Extract: John Doe is 30 years old, john@example.com",
    schema=schema,
    temperature=0.7
)
print(result.output)
# '{"name": "John Doe", "age": 30, "email": "john@example.com"}'
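
Because result.output is a JSON string, a natural next step is to parse it and validate it against the same schema. A minimal sketch, reusing the schema variable above and assuming the third-party jsonschema package (not required by the adapter itself):

import json

from jsonschema import ValidationError, validate  # pip install jsonschema

# Parse the JSON string returned by the adapter
data = json.loads(result.output)

# Validate against the same schema passed to generate()
try:
    validate(instance=data, schema=schema)
except ValidationError as e:
    print(f"Model output did not match the schema: {e.message}")
else:
    print(data["name"], data["age"])  # John Doe 30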

Streaming

Streaming support is still under development.
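
Until streaming lands in the adapter, one workaround is to call Ollama’s /api/generate endpoint directly: with "stream": true it emits one JSON object per line, each carrying a response fragment and a done flag. A minimal sketch using the third-party httpx client (not part of parsec):

import asyncio
import json

import httpx  # pip install httpx

async def stream_tokens(prompt: str, model: str = "llama3") -> None:
    """Print response fragments as Ollama streams them."""
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST",
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": True},
        ) as response:
            async for line in response.aiter_lines():
                if not line:
                    continue
                chunk = json.loads(line)
                if chunk.get("done"):
                    break
                print(chunk["response"], end="", flush=True)

asyncio.run(stream_tokens("Why is the sky blue?"))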

Configuration Options

Parameter          Type   Default   Description
model              str    Required  Model name (e.g., “llama3”)
temperature        float  0.7       Sampling temperature (0.0 to 2.0)
max_output_tokens  int    None      Maximum tokens to generate
schema             dict   None      JSON schema for structured output
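
Putting these together: a minimal sketch assuming, as in the examples above, that model and base_url are constructor arguments while per-request options such as temperature and max_output_tokens are passed to generate():

adapter = OllamaAdapter(
    model="llama3",
    base_url="http://localhost:11434"
)

# Per-request options override the defaults from the table above
result = await adapter.generate(
    "Summarize the plot of Hamlet in one sentence.",
    temperature=0.2,       # lower temperature for more deterministic output
    max_output_tokens=64   # cap the response length
)
print(result.output)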

Logging

The adapter includes comprehensive logging:

import logging

# Use DEBUG rather than INFO to also see the per-request token counts
logging.basicConfig(level=logging.DEBUG)

# Logs will show:
# INFO - Generating response from Ollama model llama3
# DEBUG - Success: 25 tokens

Health Check

Verify API connectivity:

is_healthy = await adapter.health_check()
if is_healthy:
    print("Ollama API is accessible")
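
A common pattern is to gate application startup on a healthy connection, retrying while the Ollama server comes up. A minimal sketch using only the health_check() call above; the attempt count and delay are arbitrary choices:

import asyncio

async def wait_until_healthy(adapter, attempts: int = 5, delay: float = 2.0) -> None:
    """Poll health_check() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if await adapter.health_check():
            return
        print(f"Ollama not reachable (attempt {attempt}/{attempts}), retrying...")
        await asyncio.sleep(delay)
    raise RuntimeError("Ollama API is not reachable")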

Supported Models

  • llama3 - general-purpose text tasks

Since any model served by your Ollama instance is supported (see above), other models pulled with ollama pull work the same way: pass their name as model.

Error Handling

try:
    result = await adapter.generate("Hello")
except Exception as e:
    # Logs automatically include full stack trace
    print(f"Generation failed: {e}")
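
For transient failures (e.g., the server restarting), a simple retry with exponential backoff often suffices. A minimal sketch built only on generate(); the broad except is deliberate because the adapter’s specific exception classes are not documented here:

import asyncio

async def generate_with_retry(adapter, prompt: str, attempts: int = 3):
    """Retry generate() with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return await adapter.generate(prompt)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries; propagate the last error
            await asyncio.sleep(2 ** attempt)  # back off: 1s, 2s, ...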

Important Notes

Token Counting

The adapter reports token usage when available:

  • prompt_token_count - Input tokens
  • candidates_token_count - Output tokens
  • Total reported in tokens_used
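
A minimal pattern for cost monitoring, assuming only the tokens_used field shown above (which may be None when the API omits counts):

total_tokens = 0

for prompt in ["First prompt", "Second prompt"]:
    result = await adapter.generate(prompt)
    total_tokens += result.tokens_used or 0
    print(f"{prompt!r}: {result.tokens_used} tokens")

print(f"Total tokens this session: {total_tokens}")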