- 1. AI Agents vs Chatbots: Understanding the Difference
- 2. The Agent Architecture: LLM + Tools + Memory + Orchestration
- 3. Platforms Compared: OpenClaw, LangChain, AutoGPT, CrewAI
- 4. Step-by-Step: Building a Simple Agent
- 5. Adding Tools and Capabilities
- 6. Memory and Context Management
- 7. Deployment Options
- 8. Real Example: How We Built Axis
- 9. Common Pitfalls and How to Avoid Them
- 10. Cost Considerations and Scaling
In January 2025, we started building what would become Axis, the AI agent that now manages significant portions of As Above Technologies' operations. It handles customer inquiries, monitors our systems, researches markets, drafts content, and even contributes to its own codebase. It wasn't magic, and it wasn't easy. But it was more achievable than you might think.
This guide is what I wish existed when we started. It covers everything from foundational concepts to production deployment, with code examples and real lessons from building an agent that actually runs a business. By the end, you'll have a clear roadmap for building your own AI agent, whether it's a weekend project or a production system.
We'll be honest about what works, what doesn't, and where the hype exceeds reality. Building agents is now accessible to developers of all experience levels, but it requires understanding the right patterns and avoiding common traps.
This guide is for developers, technical founders, and ambitious entrepreneurs who want to build AI agents. You don't need ML expertise, but basic programming familiarity (Python preferred) will help you get the most from the code examples. Non-developers can still benefit from the architecture and platform sections to make informed decisions.
1. AI Agents vs Chatbots: Understanding the Difference
Before we build anything, we need to be precise about what we're building. The industry uses "AI agent" to describe everything from a slightly enhanced chatbot to science fiction AGI. Here's the taxonomy that actually matters:
The Spectrum of AI Systems
Think of AI systems on a spectrum of autonomy and capability:
| System Type | Autonomy | Tools | Memory | Example |
|---|---|---|---|---|
| Basic Chatbot | None | None | Single conversation | Rule-based support bots |
| LLM Interface | None | None | Single conversation | Basic ChatGPT wrapper |
| Enhanced LLM | Minimal | Built-in only | Session-based | ChatGPT with web browsing |
| AI Assistant | Low | Limited set | Persistent | Custom GPTs, Claude Projects |
| AI Agent | Medium-High | Extensible | Long-term | Axis, OpenClaw agents |
| Autonomous Agent | High | Self-extending | Evolving | AutoGPT, experimental systems |
The Three Defining Characteristics
What separates a true AI agent from a fancy chatbot? Three core capabilities:
1. Tool Use (Action in the World)
A chatbot tells you how to do something. An agent does the thing. This is the most fundamental difference. When you ask an agent to "check if my server is up," it doesn't explain ping commands; it pings the server and tells you the result.
Tools are the bridge between language and action. They can be:
- Information retrieval: Web search, database queries, API calls
- System interaction: File operations, command execution, browser automation
- Communication: Sending emails, posting to channels, notifications
- Creation: Code generation, image creation, document synthesis
- Integration: CRM updates, calendar management, third-party services
Axis has access to over 30 tools: web search, calendar management, email, file operations, browser automation, code execution, database queries, and our business-specific integrations (inventory systems, customer databases). Each tool extends what Axis can do in the world.
2. Memory (Continuity Across Time)
A chatbot starts fresh each conversation. An agent remembers. This isn't just about technical context windows; it's about building a persistent understanding that compounds over time.
Effective agent memory includes:
- Working memory: Current task context, active goals, in-progress work
- Short-term memory: Recent interactions, conversation history
- Long-term memory: User preferences, learned procedures, domain knowledge
- Episodic memory: Specific past events, decisions made, outcomes observed
Memory is what allows an agent to say "last time we tried that approach, it didn't work because X" or "you mentioned preferring brief emails" without being reminded each session.
3. Goal-Directed Behavior (Pursuing Objectives)
A chatbot waits for input. An agent pursues goals. This is the autonomy dimension: the ability to take a high-level objective and break it down into sub-tasks, execute them in sequence, handle obstacles, and persist until the goal is achieved (or determined impossible).
Compare:
User: "How do I find out what our competitors are charging?"
Chatbot: "You can visit their websites, check industry reports, or use
price monitoring tools like Prisync..."
User: "Find out what our competitors are charging."
Agent: "I'll research that now. [Uses web search to find competitor sites,
navigates to pricing pages, extracts pricing data, synthesizes findings] Here's what I
found: Competitor A charges $49-199/mo, Competitor B is $79/mo flat, Competitor C uses
usage-based pricing starting at $0.01/call. Want me to compile this into a comparison table?"
Why the Distinction Matters
The distinction isn't academic; it determines what you build and how. Agent architecture is fundamentally different from chatbot architecture:
- You need a tool system, not just a prompt template
- You need memory management, not just conversation history
- You need task planning, not just response generation
- You need error handling and recovery, not just retry logic
- You need safety guardrails, not just content filtering
If you're building a chatbot thinking you're building an agent, you'll hit walls. If you're building an agent with chatbot architecture, you'll create something fragile and frustrating. Let's build it right from the start.
2. The Agent Architecture: LLM + Tools + Memory + Orchestration
An AI agent is a system, not just a model. The model (GPT-4, Claude, etc.) is the brain, but brains need bodies, senses, and support systems. Here's the architecture that makes agents work:
┌───────────────────────────────────────────────────────────────────┐
│                           ORCHESTRATOR                            │
│         (Receives input, plans tasks, routes to components)       │
└───────────────────────────────────────────────────────────────────┘
                                │
           ┌────────────────────┼────────────────────┐
           │                    │                    │
           ▼                    ▼                    ▼
 ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
 │                 │  │                 │  │                 │
 │    LLM CORE     │  │   TOOL SYSTEM   │  │  MEMORY STORE   │
 │                 │  │                 │  │                 │
 │  • Reasoning    │  │  • Web Search   │  │  • Working      │
 │  • Generation   │  │  • File I/O     │  │  • Short-term   │
 │  • Planning     │  │  • APIs         │  │  • Long-term    │
 │  • Synthesis    │  │  • Browser      │  │  • Retrieval    │
 │                 │  │  • Custom       │  │                 │
 └─────────────────┘  └─────────────────┘  └─────────────────┘
           │                    │                    │
           └────────────────────┼────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────────┐
│                         INTERFACE LAYER                           │
│               (Chat, API, Webhooks, Scheduled Tasks)              │
└───────────────────────────────────────────────────────────────────┘
Component 1: The LLM Core
The language model is the reasoning engine. It interprets intent, plans approaches, generates outputs, and decides when to use tools. Choosing the right model matters:
| Model | Best For | Context | Cost (per 1M tokens) |
|---|---|---|---|
| GPT-4o | General tasks, speed | 128K | $2.50 in / $10 out |
| GPT-4 Turbo | Complex reasoning | 128K | $10 in / $30 out |
| Claude 3.5 Sonnet | Balanced capability/cost | 200K | $3 in / $15 out |
| Claude Opus 4 | Complex reasoning, agentic tasks | 200K | $15 in / $75 out |
| Llama 3 70B | Self-hosted, privacy | 8K-128K | Hardware cost |
| Mixtral 8x22B | Cost-effective self-hosted | 64K | Hardware cost |
Axis uses a tiered approach: Claude Opus 4 for complex reasoning and planning, Claude 3.5 Sonnet for most day-to-day tasks, and GPT-4o for specific use cases where OpenAI excels. The orchestrator routes tasks to the appropriate model based on complexity and requirements.
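A tiered routing policy like this can be sketched as a simple lookup table. The tier names and model ID strings below are illustrative assumptions, not Axis's actual configuration:

```python
# Illustrative sketch of tiered model routing: map a task type to a model ID.
# Tier names and model strings are example values, not a real config.
ROUTING_TABLE = {
    "planning": "claude-opus-4",      # complex reasoning and planning
    "default": "claude-3-5-sonnet",   # day-to-day tasks
    "fast": "gpt-4o",                 # speed-sensitive tasks
}

def route_model(task_type: str) -> str:
    """Pick a model for a task, falling back to the default tier."""
    return ROUTING_TABLE.get(task_type, ROUTING_TABLE["default"])
```

In practice, the routing decision can also be made by a cheap classifier model, but a static table is a reasonable starting point.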
Component 2: The Tool System
Tools are functions the agent can call to interact with the world. A well-designed tool system has:
Tool Definition
Each tool needs clear specification that the LLM can understand:
# Example tool definition for a web search tool
web_search_tool = {
"name": "web_search",
"description": "Search the web for current information. Use for questions about recent events, facts you're unsure about, or when user asks to 'look up' something.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query. Be specific and include relevant context."
},
"num_results": {
"type": "integer",
"description": "Number of results to return (1-10)",
"default": 5
}
},
"required": ["query"]
}
}
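To actually hand this definition to a model, it goes in the API's tools parameter. As a sketch: OpenAI's Chat Completions API expects the definition wrapped in a `function` envelope, while Anthropic's API takes a flat format with `input_schema` in place of `parameters` (check your provider's current docs for the exact shape):

```python
# Sketch: wrapping a tool definition for OpenAI's Chat Completions `tools`
# parameter. The schema itself is abbreviated from the example above.
web_search_tool = {
    "name": "web_search",
    "description": "Search the web for current information.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

# OpenAI-style envelope; Anthropic would instead use the flat dict with
# an `input_schema` key.
openai_tools = [{"type": "function", "function": web_search_tool}]
# client.chat.completions.create(model=..., messages=..., tools=openai_tools)
```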
Tool Execution
The tool executor takes the LLM's tool call and performs the actual action:
async def execute_tool(tool_name: str, parameters: dict) -> dict:
"""Execute a tool and return results."""
if tool_name == "web_search":
results = await brave_search(
query=parameters["query"],
count=parameters.get("num_results", 5)
)
return {
"success": True,
"results": results
}
elif tool_name == "read_file":
content = await read_file(parameters["path"])
return {
"success": True,
"content": content
}
elif tool_name == "send_email":
# Always confirm before external actions
if not parameters.get("confirmed"):
return {
"success": False,
"needs_confirmation": True,
"message": f"Send email to {parameters['to']}?"
}
await send_email(**parameters)
return {"success": True}
else:
return {
"success": False,
"error": f"Unknown tool: {tool_name}"
}
Tool Categories
Organize tools by risk level and type:
- Read-only tools (safe to run freely):
  - Web search
  - File reading
  - Database queries
  - Calendar viewing
  - System status checks
- Write tools (require logging, maybe confirmation):
  - File creation/editing
  - Database writes
  - Calendar event creation
  - Note taking
- External action tools (always require confirmation):
  - Sending emails
  - Posting to social media
  - Making purchases
  - Modifying production systems
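One way to encode these tiers is a registry that tags each tool with a risk level and gates external actions behind confirmation. This is a minimal sketch with hypothetical tool names, not a production policy engine:

```python
# Sketch of a risk-tiered tool policy. Read-only tools run freely, write
# tools get logged, and external-action tools require explicit confirmation.
RISK_LEVELS = {
    "web_search": "read",
    "read_file": "read",
    "write_file": "write",
    "send_email": "external",
}

def check_tool_call(tool_name: str, confirmed: bool = False) -> dict:
    """Decide whether a tool call may proceed under the tier policy."""
    level = RISK_LEVELS.get(tool_name)
    if level is None:
        return {"allowed": False, "reason": f"Unknown tool: {tool_name}"}
    if level == "external" and not confirmed:
        return {"allowed": False, "needs_confirmation": True}
    # Anything that mutates state should be logged for auditing
    return {"allowed": True, "log": level != "read"}
```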
Component 3: Memory System
Memory is what makes an agent feel intelligent over time. There are multiple layers:
Working Memory (Context Window)
The immediate context the LLM sees: current conversation, task state, relevant retrieved information. This is limited by the model's context window.
Short-Term Memory (Session Storage)
Information persisted across turns in a session but not necessarily across sessions. Typically implemented as in-memory storage or short-TTL cache.
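A minimal short-TTL cache for session storage, using only the standard library, might look like this (the default TTL is an arbitrary choice):

```python
import time

class SessionCache:
    """Short-term memory: entries expire after ttl seconds."""

    def __init__(self, ttl: float = 900.0):
        self.ttl = ttl
        self._store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires = entry
        if time.monotonic() > expires:
            del self._store[key]  # lazily evict expired entries
            return default
        return value
```

For multi-process deployments you'd typically reach for Redis with a per-key TTL instead, but the semantics are the same.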
Long-Term Memory (Persistent Storage)
Knowledge that persists across sessions. This requires:
- Storage: Database, file system, or vector store
- Retrieval: How to find relevant memories (search, embeddings)
- Management: What to remember, what to forget, when to consolidate
class AgentMemory:
def __init__(self, storage_path: str):
self.storage_path = storage_path
self.working_memory = {} # Current session state
self.load_long_term_memory()
def load_long_term_memory(self):
"""Load persistent memory from disk."""
self.user_preferences = self._load_json("preferences.json")
self.episodic_memory = self._load_json("episodes.json")
self.learned_procedures = self._load_json("procedures.json")
def remember(self, key: str, value: any, memory_type: str = "working"):
"""Store a memory."""
if memory_type == "working":
self.working_memory[key] = value
elif memory_type == "long_term":
self._persist_to_file(key, value)
def recall(self, query: str, memory_types: list = None) -> list:
"""Retrieve relevant memories."""
results = []
# Search working memory
if "working" in (memory_types or ["working"]):
for key, value in self.working_memory.items():
if self._is_relevant(query, key, value):
results.append({"source": "working", "key": key, "value": value})
# Search long-term memory with embeddings
if "long_term" in (memory_types or []):
relevant = self._semantic_search(query, self.episodic_memory)
results.extend(relevant)
return results
def consolidate(self):
"""Move important working memories to long-term storage."""
# Run periodically to persist important learnings
for key, value in self.working_memory.items():
if self._should_persist(key, value):
self.remember(key, value, memory_type="long_term")
Component 4: The Orchestrator
The orchestrator is the "executive function": it coordinates everything. Key responsibilities:
- Input processing: Receive and parse user requests
- Task planning: Break complex goals into steps
- Context assembly: Gather relevant memories and context
- LLM routing: Choose which model for which task
- Tool coordination: Manage tool calls and results
- Response synthesis: Compile final output
- Error handling: Recover from failures gracefully
class AgentOrchestrator:
def __init__(self, config: AgentConfig):
self.llm = LLMClient(config.model)
self.tools = ToolRegistry(config.tools)
self.memory = AgentMemory(config.memory_path)
async def handle_request(self, user_input: str) -> str:
"""Main entry point for agent requests."""
# 1. Retrieve relevant context
context = self.memory.recall(user_input)
# 2. Build prompt with context and available tools
messages = self._build_messages(user_input, context)
# 3. Get LLM response (may include tool calls)
response = await self.llm.chat(
messages=messages,
tools=self.tools.get_definitions()
)
# 4. Process tool calls if any
while response.tool_calls:
tool_results = await self._execute_tools(response.tool_calls)
messages.append(response.message)
messages.append(self._format_tool_results(tool_results))
response = await self.llm.chat(messages=messages, tools=self.tools.get_definitions())
# 5. Update memory with interaction
self.memory.remember("last_interaction", {
"input": user_input,
"output": response.content,
"timestamp": datetime.now()
})
return response.content
async def _execute_tools(self, tool_calls: list) -> list:
"""Execute multiple tool calls, potentially in parallel."""
results = []
for call in tool_calls:
result = await self.tools.execute(call.name, call.parameters)
results.append({
"tool": call.name,
"call_id": call.id,
"result": result
})
return results
The ReAct Pattern
Most successful agents use some variant of the ReAct (Reasoning + Acting) pattern:
- Observe: Receive input, gather context
- Think: Reason about what to do (often explicit "thinking" in the prompt)
- Act: Execute a tool or generate output
- Observe: See the result of the action
- Repeat: Continue until task is complete
When Axis receives "Check if our website is down and fix it if possible," the loop looks like:
Think: Need to check website status first.
Act: [calls http_check tool for asabove.tech]
Observe: 503 error, server not responding
Think: Site is down. Should check server status.
Act: [calls server_status tool]
Observe: High CPU, memory exhausted
Think: Need to restart service, but should confirm first.
Act: [requests confirmation to restart]
...and so on
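The think/act/observe loop above reduces to a simple control structure. Here's a schematic version with the model and tool calls stubbed out as plain functions (in a real agent, `think` is an LLM call and `act` dispatches to the tool system):

```python
# Schematic ReAct loop. `think` decides the next step from the observation
# history; `act` executes a tool and returns its result.
def run_react(goal: str, think, act, max_steps: int = 10) -> str:
    observations = [f"Goal: {goal}"]
    for _ in range(max_steps):
        decision = think(observations)  # reason about what to do next
        if decision["type"] == "finish":
            return decision["answer"]
        result = act(decision["tool"], decision["input"])  # take the action
        observations.append(f"{decision['tool']} -> {result}")  # observe
    return "Stopped: step limit reached"
```

The `max_steps` cap is important in practice: without it, an agent that keeps choosing tools can loop indefinitely and burn tokens.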
3. Platforms Compared: OpenClaw, LangChain, AutoGPT, CrewAI
You don't have to build everything from scratch. Several platforms provide agent infrastructure. Here's an honest comparison based on our experience:
OpenClaw is what we built Axis on. It's a personal AI assistant framework designed for developers who want a capable agent without drowning in infrastructure complexity.
Key Features
- Built-in tool system with common integrations (web, files, browser, calendar, email)
- Flexible memory with workspace files (SOUL.md, MEMORY.md patterns)
- Multi-model support (switch between Claude, GPT, local models)
- Multiple interfaces: CLI, web chat, API, Discord, Telegram
- Subagent spawning for parallel tasks
- Heartbeat system for proactive behavior
Best For
Developers building personal or business assistants. Those who want a working agent fast but need customization. Teams that value the "workspace as configuration" pattern.
Limitations
- Newer platform, smaller community than LangChain
- Documentation still maturing
- Less suited for massive multi-agent systems
LangChain is the most popular framework for building LLM applications. It provides extensive abstractions for chains, agents, memory, and integrations.
Key Features
- Massive ecosystem of integrations and tools
- LangGraph for complex agent workflows with state machines
- LangSmith for observability and debugging
- Extensive documentation and tutorials
- Large community, many examples
- Support for virtually any LLM provider
Best For
Teams building complex, custom agent systems. Production applications needing observability. Projects requiring specific integrations from the ecosystem.
Limitations
- Abstraction layers can add complexity
- Frequent breaking changes in early versions (stabilizing now)
- Can be overwhelming: many ways to do the same thing
- Debugging can be tricky due to abstraction depth
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain.tools import Tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
# Define tools
tools = [
Tool(
name="web_search",
description="Search the web for information",
func=lambda q: web_search(q)
),
Tool(
name="calculator",
description="Perform mathematical calculations",
func=lambda expr: eval(expr) # Don't do this in production!
)
]
# Create agent
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant."),
MessagesPlaceholder(variable_name="chat_history"),
("human", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
])
agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# Run
result = executor.invoke({"input": "What's the population of Tokyo?"})
AutoGPT pioneered the "give an AI a goal and let it figure out how" paradigm. It's more experimental than production-ready, but influential.
Key Features
- Fully autonomous operation (set goal, watch it work)
- Self-prompting loop with planning and reflection
- Built-in web browsing, code execution, file management
- Memory with Pinecone or local vector stores
- Plugin system for extensions
Best For
Experimentation and learning. Research into autonomous systems. Tasks where high autonomy is acceptable and cost isn't a primary concern.
Limitations
- High token consumption (loops can be expensive)
- Can get stuck in loops or pursue tangents
- Limited controllability once started
- Not recommended for production business processes
CrewAI specializes in multi-agent systems where different "crew members" collaborate on tasks. Each agent has a role, goal, and backstory.
Key Features
- Role-based agent design (researcher, writer, reviewer, etc.)
- Hierarchical and collaborative process types
- Task delegation between agents
- Clean, intuitive API
- Good for content pipelines and research workflows
Best For
Workflows that naturally decompose into roles. Content creation pipelines. Research and analysis tasks. Teams exploring multi-agent patterns.
Limitations
- Multi-agent overhead isn't always necessary
- Can be slower than single-agent approaches
- Coordination between agents can be unpredictable
- Less flexible for highly custom requirements
from crewai import Agent, Task, Crew, Process
# Define agents
researcher = Agent(
role='Research Analyst',
goal='Find and analyze market information',
backstory='Expert at finding and synthesizing information',
tools=[web_search_tool]
)
writer = Agent(
role='Content Writer',
goal='Create compelling content from research',
backstory='Experienced writer who turns data into narratives',
tools=[write_file_tool]
)
# Define tasks
research_task = Task(
description='Research the AI agent market landscape in 2026',
agent=researcher,
expected_output='Detailed market analysis with key players and trends'
)
writing_task = Task(
description='Write a blog post based on the research',
agent=writer,
context=[research_task],
expected_output='1500-word blog post in markdown format'
)
# Create crew and execute
crew = Crew(
agents=[researcher, writer],
tasks=[research_task, writing_task],
process=Process.sequential
)
result = crew.kickoff()
Platform Selection Guide
| If You Want... | Choose | Why |
|---|---|---|
| Personal assistant, fast setup | OpenClaw | Batteries included, workspace-centric |
| Maximum flexibility, enterprise | LangChain + LangGraph | Most mature ecosystem, best tooling |
| Full autonomy experiments | AutoGPT | Designed for autonomous operation |
| Multi-agent collaboration | CrewAI | Purpose-built for agent teams |
| Learning/understanding agents | Build from scratch | Nothing teaches like building |
Start with OpenClaw or raw API calls to understand the fundamentals. Graduate to LangChain when you need specific integrations or complex workflows. Consider CrewAI for content pipelines. Use AutoGPT for experiments, not production.
4. Step-by-Step: Building a Simple Agent
Let's build an agent from scratch. We'll create a research assistant that can search the web, read files, and maintain conversation memory. This will teach you the fundamentals before using any framework.
Step 1: Project Setup
Create a project directory with two files. requirements.txt lists the dependencies:
anthropic>=0.18.0
aiohttp>=3.9.0
python-dotenv>=1.0.0
And .env holds your API keys:
ANTHROPIC_API_KEY=your_api_key_here
BRAVE_API_KEY=your_brave_search_key  # Optional, for web search
Step 2: Define Your Tools
import aiohttp
import os
from typing import Any
from pathlib import Path
# Tool definitions for Claude
TOOLS = [
{
"name": "web_search",
"description": "Search the web for current information. Returns titles, URLs, and snippets.",
"input_schema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query"
}
},
"required": ["query"]
}
},
{
"name": "read_file",
"description": "Read the contents of a file from the workspace.",
"input_schema": {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Path to the file"
}
},
"required": ["path"]
}
},
{
"name": "write_file",
"description": "Write content to a file. Creates directories if needed.",
"input_schema": {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Path to write to"
},
"content": {
"type": "string",
"description": "Content to write"
}
},
"required": ["path", "content"]
}
},
{
"name": "remember",
"description": "Save important information to long-term memory.",
"input_schema": {
"type": "object",
"properties": {
"key": {
"type": "string",
"description": "Memory key/label"
},
"value": {
"type": "string",
"description": "Information to remember"
}
},
"required": ["key", "value"]
}
}
]
class ToolExecutor:
def __init__(self, workspace: str = "./workspace"):
self.workspace = Path(workspace)
self.workspace.mkdir(exist_ok=True)
self.memory = {}
self._load_memory()
def _load_memory(self):
"""Load persistent memory from file."""
memory_file = self.workspace / "memory.json"
if memory_file.exists():
import json
self.memory = json.loads(memory_file.read_text())
def _save_memory(self):
"""Persist memory to file."""
import json
memory_file = self.workspace / "memory.json"
memory_file.write_text(json.dumps(self.memory, indent=2))
async def execute(self, tool_name: str, tool_input: dict) -> Any:
"""Execute a tool and return results."""
if tool_name == "web_search":
return await self._web_search(tool_input["query"])
elif tool_name == "read_file":
return self._read_file(tool_input["path"])
elif tool_name == "write_file":
return self._write_file(tool_input["path"], tool_input["content"])
elif tool_name == "remember":
return self._remember(tool_input["key"], tool_input["value"])
else:
return {"error": f"Unknown tool: {tool_name}"}
async def _web_search(self, query: str) -> dict:
"""Perform web search using Brave API."""
api_key = os.getenv("BRAVE_API_KEY")
if not api_key:
return {"error": "BRAVE_API_KEY not configured"}
async with aiohttp.ClientSession() as session:
async with session.get(
"https://api.search.brave.com/res/v1/web/search",
headers={"X-Subscription-Token": api_key},
params={"q": query, "count": 5}
) as resp:
if resp.status != 200:
return {"error": f"Search failed: {resp.status}"}
data = await resp.json()
results = []
for item in data.get("web", {}).get("results", []):
results.append({
"title": item.get("title"),
"url": item.get("url"),
"snippet": item.get("description")
})
return {"results": results}
    def _read_file(self, path: str) -> dict:
        """Read a file from workspace."""
        file_path = self.workspace / path
        # Validate the path before touching the filesystem
        if not str(file_path.resolve()).startswith(str(self.workspace.resolve())):
            return {"error": "Access denied: path outside workspace"}
        if not file_path.exists():
            return {"error": f"File not found: {path}"}
        return {"content": file_path.read_text()}
def _write_file(self, path: str, content: str) -> dict:
"""Write content to a file."""
file_path = self.workspace / path
if not str(file_path.resolve()).startswith(str(self.workspace.resolve())):
return {"error": "Access denied: path outside workspace"}
file_path.parent.mkdir(parents=True, exist_ok=True)
file_path.write_text(content)
return {"success": True, "path": str(file_path)}
def _remember(self, key: str, value: str) -> dict:
"""Store information in persistent memory."""
self.memory[key] = value
self._save_memory()
return {"success": True, "remembered": key}
Step 3: Build the Agent Core
import anthropic
import asyncio
from datetime import datetime
from typing import Optional

from dotenv import load_dotenv
from tools import TOOLS, ToolExecutor

load_dotenv()  # Load API keys from .env
class SimpleAgent:
def __init__(self, model: str = "claude-sonnet-4-20250514"):
self.client = anthropic.Anthropic()
self.model = model
self.tool_executor = ToolExecutor()
self.conversation_history = []
# System prompt defines agent behavior
self.system_prompt = """You are a helpful AI research assistant. You can:
- Search the web for current information
- Read and write files in your workspace
- Remember important information across conversations
When given a task:
1. Think about what information or actions you need
2. Use your tools to gather information or take actions
3. Synthesize findings into a clear response
Be thorough but concise. If you're unsure about something, say so.
If a task requires multiple steps, work through them systematically.
Current date: {date}
Memories: {memories}
"""
def _build_system_prompt(self) -> str:
"""Build system prompt with current context."""
memories_str = "\n".join(
f"- {k}: {v}" for k, v in self.tool_executor.memory.items()
) if self.tool_executor.memory else "None stored yet."
return self.system_prompt.format(
date=datetime.now().strftime("%Y-%m-%d"),
memories=memories_str
)
async def chat(self, user_message: str) -> str:
"""Process a user message and return response."""
# Add user message to history
self.conversation_history.append({
"role": "user",
"content": user_message
})
# Keep history manageable (last 20 turns)
if len(self.conversation_history) > 40:
self.conversation_history = self.conversation_history[-40:]
# Call Claude with tools
response = self.client.messages.create(
model=self.model,
max_tokens=4096,
system=self._build_system_prompt(),
tools=TOOLS,
messages=self.conversation_history
)
# Process response - may need multiple turns for tool use
while response.stop_reason == "tool_use":
# Extract tool calls from response
assistant_content = response.content
tool_results = []
for block in assistant_content:
if block.type == "tool_use":
print(f"  → Using tool: {block.name}")
result = await self.tool_executor.execute(
block.name,
block.input
)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": str(result)
})
# Add assistant message and tool results to history
self.conversation_history.append({
"role": "assistant",
"content": assistant_content
})
self.conversation_history.append({
"role": "user",
"content": tool_results
})
# Continue conversation with tool results
response = self.client.messages.create(
model=self.model,
max_tokens=4096,
system=self._build_system_prompt(),
tools=TOOLS,
messages=self.conversation_history
)
# Extract final text response
final_response = ""
for block in response.content:
if hasattr(block, "text"):
final_response += block.text
# Add final response to history
self.conversation_history.append({
"role": "assistant",
"content": response.content
})
return final_response
async def main():
"""Interactive chat loop."""
agent = SimpleAgent()
print("Agent ready. Type 'quit' to exit.\n")
while True:
try:
user_input = input("You: ").strip()
if user_input.lower() in ['quit', 'exit', 'q']:
break
if not user_input:
continue
print("Agent: ", end="", flush=True)
response = await agent.chat(user_input)
print(response)
print()
except KeyboardInterrupt:
break
print("\nGoodbye!")
if __name__ == "__main__":
asyncio.run(main())
Step 4: Test Your Agent
$ python agent.py
Agent ready. Type 'quit' to exit.
You: What's the current state of AI agent technology? Search for recent news.
→ Using tool: web_search
Agent: Based on my search, here's the current state of AI agent technology in early 2026:
**Major Developments:**
1. **Claude's Computer Use** - Anthropic's ability to control computers directly has matured...
2. **OpenAI Operator** - GPT-4 can now browse the web and complete multi-step tasks...
3. **Open Source Progress** - Llama 3 and Mixtral are enabling self-hosted agent systems...
Would you like me to save this summary to a file?
You: Yes, save it to research/ai-agents-2026.md
→ Using tool: write_file
Agent: Done! I've saved the summary to research/ai-agents-2026.md.
You: Remember that I'm interested in AI agents for business automation
→ Using tool: remember
Agent: Noted! I'll remember your interest in AI agents for business automation.
This will help me tailor future research and suggestions to your focus area.
You: quit
Goodbye!
What You've Built
Congratulations! You now have a working AI agent with:
- ✅ Tool use (web search, file I/O)
- ✅ Conversation memory (within session)
- ✅ Persistent memory (across sessions)
- ✅ Multi-turn tool calling (can use multiple tools per request)
- ✅ Safety constraints (file access limited to workspace)
This is a foundation you can build on. In the next sections, we'll add more sophisticated tools and memory systems.
The complete code for this tutorial is available at github.com/asabove-tech/simple-agent-tutorial. Star it to bookmark for later!
5. Adding Tools and Capabilities
Tools are what transform an LLM from a text generator into a capable agent. Let's explore how to add more sophisticated capabilities.
Tool Design Principles
1. Clear, Specific Descriptions
The LLM decides when to use tools based on descriptions. Be explicit about:
- What the tool does
- When to use it (and when not to)
- What parameters mean
- What the output looks like
Bad (vague):
"name": "search", "description": "Searches for stuff"
Good (specific):
"name": "web_search", "description": "Search the web using Brave Search API.
Use for questions about current events, facts you're uncertain about, or when
the user explicitly asks to look something up. Returns titles, URLs, and snippets
for the top results. Not suitable for accessing specific websites; use browser
tools for that."
2. Appropriate Granularity
Tools should be atomic enough to be composable, but not so granular that simple tasks require many calls:
- Too granular: separate tools for "open_file", "read_line", "close_file"
- Too coarse: one tool that "researches a topic and writes a report"
- Right level: "read_file" that handles opening, reading, and returns content
3. Meaningful Error Messages
When tools fail, return errors the LLM can act on:
# Bad: unhelpful error
return {"error": "Failed"}
# Good: actionable error
return {
"error": "File not found",
"details": f"No file at path '{path}'",
"suggestion": "Check if the path is correct or use list_files to see available files"
}
Common Tool Categories
Information Retrieval Tools
# Web search with multiple providers
async def web_search(query: str, provider: str = "brave") -> dict:
"""Multi-provider web search."""
if provider == "brave":
return await brave_search(query)
elif provider == "serper":
return await serper_search(query)
elif provider == "tavily":
return await tavily_search(query) # Good for AI-optimized results
# URL content fetching
async def fetch_url(url: str, extract_mode: str = "markdown") -> dict:
"""Fetch and extract content from a URL."""
async with aiohttp.ClientSession() as session:
async with session.get(url) as resp:
html = await resp.text()
if extract_mode == "markdown":
# Convert HTML to readable markdown
content = html_to_markdown(html)
elif extract_mode == "text":
content = extract_text(html)
else:
content = html
return {
"url": url,
"content": content[:50000], # Limit size
"truncated": len(content) > 50000
}
# Database queries (read-only!)
def query_database(sql: str, database: str = "analytics") -> dict:
"""Execute read-only SQL query."""
# Validate query is a single SELECT (a prefix check alone misses "SELECT 1; DROP ...")
stmt = sql.strip().rstrip(";")
if not stmt.upper().startswith("SELECT") or ";" in stmt:
return {"error": "Only single SELECT queries allowed"}
conn = get_connection(database)
try:
results = conn.execute(sql).fetchall()
return {"columns": [d[0] for d in conn.description], "rows": results}
except Exception as e:
return {"error": str(e)}
Communication Tools
# Email with confirmation
async def send_email(
to: str,
subject: str,
body: str,
confirmed: bool = False
) -> dict:
"""Send email. Requires confirmation for safety."""
if not confirmed:
return {
"needs_confirmation": True,
"preview": {
"to": to,
"subject": subject,
"body_preview": body[:200] + "..." if len(body) > 200 else body
},
"message": "Please confirm you want to send this email."
}
# Actually send
result = await email_client.send(to=to, subject=subject, body=body)
return {"success": True, "message_id": result.id}
# Slack/Discord messaging
async def send_message(
channel: str,
message: str,
platform: str = "slack"
) -> dict:
"""Send message to team chat."""
if platform == "slack":
return await slack_client.post(channel=channel, text=message)
elif platform == "discord":
return await discord_client.send(channel_id=channel, content=message)
# Calendar integration
async def create_calendar_event(
title: str,
start_time: str,
end_time: str,
attendees: list = None,
confirmed: bool = False
) -> dict:
"""Create calendar event. Requires confirmation."""
if not confirmed:
return {
"needs_confirmation": True,
"preview": {"title": title, "start": start_time, "end": end_time},
"message": "Please confirm you want to create this event."
}
event = await calendar_client.create_event(
title=title,
start=start_time,
end=end_time,
attendees=attendees or []
)
return {"success": True, "event_id": event.id, "link": event.html_link}
Code Execution Tools
import subprocess
import tempfile
from pathlib import Path
async def execute_python(code: str, timeout: int = 30) -> dict:
"""Execute Python code in sandboxed environment."""
# Create temp file
with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
f.write(code)
temp_path = f.name
try:
# Run with timeout and restricted permissions
result = subprocess.run(
['python', temp_path],
capture_output=True,
text=True,
timeout=timeout,
cwd=tempfile.gettempdir(), # Isolated directory
)
return {
"stdout": result.stdout,
"stderr": result.stderr,
"return_code": result.returncode
}
except subprocess.TimeoutExpired:
return {"error": f"Execution timed out after {timeout}s"}
finally:
Path(temp_path).unlink() # Clean up
async def execute_shell(command: str, confirmed: bool = False) -> dict:
"""Execute shell command. DANGEROUS - requires confirmation."""
# Blacklist dangerous commands
dangerous = ['rm -rf', 'sudo', 'mkfs', 'dd if=', '> /dev']
if any(d in command for d in dangerous):
return {"error": "Command blocked for safety"}
if not confirmed:
return {
"needs_confirmation": True,
"command": command,
"message": "Please confirm you want to run this shell command."
}
result = subprocess.run(
command,
shell=True,
capture_output=True,
text=True,
timeout=60
)
return {
"stdout": result.stdout,
"stderr": result.stderr,
"return_code": result.returncode
}
Browser Automation Tools
import time  # used by screenshot() below
from playwright.async_api import async_playwright
class BrowserTool:
def __init__(self):
self.browser = None
self.page = None
async def initialize(self):
"""Start browser instance."""
playwright = await async_playwright().start()
self.browser = await playwright.chromium.launch(headless=True)
self.page = await self.browser.new_page()
async def navigate(self, url: str) -> dict:
"""Navigate to URL and return page content."""
await self.page.goto(url, wait_until="networkidle")
# Extract readable content
content = await self.page.evaluate("""
() => {
const article = document.querySelector('article') || document.body;
return article.innerText;
}
""")
return {
"url": url,
"title": await self.page.title(),
"content": content[:30000]
}
async def screenshot(self, path: str = None) -> dict:
"""Take screenshot of current page."""
if not path:
path = f"screenshot_{int(time.time())}.png"
await self.page.screenshot(path=path, full_page=True)
return {"path": path}
async def click(self, selector: str) -> dict:
"""Click element by selector."""
try:
await self.page.click(selector, timeout=5000)
return {"success": True}
except Exception as e:
return {"error": str(e)}
async def fill(self, selector: str, text: str) -> dict:
"""Fill input field."""
try:
await self.page.fill(selector, text)
return {"success": True}
except Exception as e:
return {"error": str(e)}
async def get_page_structure(self) -> dict:
"""Get accessible page structure for agent reasoning."""
structure = await self.page.evaluate("""
() => {
function getAccessibleTree(el, depth = 0) {
if (depth > 3) return null;
const nodes = [];
for (const child of el.children) {
const role = child.getAttribute('role') || child.tagName.toLowerCase();
const text = child.innerText?.slice(0, 50);
if (['a', 'button', 'input', 'select', 'textarea'].includes(role) ||
child.getAttribute('role')) {
nodes.push({
role,
text,
selector: child.id ? '#' + child.id : null
});
}
const childNodes = getAccessibleTree(child, depth + 1);
if (childNodes) nodes.push(...childNodes);
}
return nodes;
}
return getAccessibleTree(document.body);
}
""")
return {"interactive_elements": structure}
Tool Safety Patterns
Safety is critical. Here are patterns we use with Axis:
1. Confirmation for Dangerous Actions
def requires_confirmation(tool_name: str, params: dict) -> bool:
"""Determine if action needs human approval."""
# Always confirm external communications
if tool_name in ['send_email', 'post_tweet', 'send_slack']:
return True
# Always confirm financial actions
if tool_name in ['make_purchase', 'transfer_funds']:
return True
# Confirm destructive file operations
if tool_name == 'delete_file':
return True
# Confirm shell commands
if tool_name == 'execute_shell':
return True
return False
2. Allowlists Over Blocklists
# Bad: trying to block dangerous things
BLOCKED_COMMANDS = ['rm', 'sudo', 'wget', ...] # Will miss something
# Good: explicitly allow safe things
ALLOWED_COMMANDS = ['ls', 'cat', 'grep', 'find', 'wc', 'head', 'tail']
def is_safe_command(command: str) -> bool:
parts = command.split()
return bool(parts) and parts[0] in ALLOWED_COMMANDS
3. Rate Limiting
from collections import defaultdict
from time import time
class RateLimiter:
def __init__(self):
self.calls = defaultdict(list)
def check(self, tool: str, limit: int, window: int) -> bool:
"""Check if tool call is within rate limit."""
now = time()
# Remove old calls outside window
self.calls[tool] = [t for t in self.calls[tool] if now - t < window]
if len(self.calls[tool]) >= limit:
return False
self.calls[tool].append(now)
return True
rate_limiter = RateLimiter()
# Usage
async def execute_tool(name: str, params: dict):
# Limit expensive operations
if name == "web_search" and not rate_limiter.check("web_search", 10, 60):
return {"error": "Rate limit exceeded. Max 10 searches per minute."}
...
4. Sandboxing
# File operations restricted to workspace
class SandboxedFileSystem:
def __init__(self, root: Path):
self.root = root.resolve()
def validate_path(self, path: str) -> Path:
"""Ensure path is within sandbox."""
resolved = (self.root / path).resolve()
# startswith() would wrongly allow e.g. /data/workspace2 when root is /data/workspace
if not resolved.is_relative_to(self.root):  # Python 3.9+
raise PermissionError(f"Access denied: {path} is outside workspace")
return resolved
def read(self, path: str) -> str:
safe_path = self.validate_path(path)
return safe_path.read_text()
def write(self, path: str, content: str):
safe_path = self.validate_path(path)
safe_path.parent.mkdir(parents=True, exist_ok=True)
safe_path.write_text(content)
Every tool you add is a potential attack vector. Assume the LLM might be tricked into misusing tools (prompt injection). Defense in depth: validate inputs, confirm dangerous actions, sandbox execution, log everything.
6. Memory and Context Management
Memory is what transforms a stateless text generator into something that feels intelligent over time. But it's also one of the hardest problems in agent design.
The Memory Challenge
Context windows are limited. Even Claude's 200K tokens fill up quickly with:
- System prompts and tool definitions (~2-5K tokens)
- Conversation history (grows with each turn)
- Retrieved documents and search results
- Tool call inputs and outputs
You need strategies for:
- Deciding what goes into the context window
- Storing information that doesn't fit
- Retrieving relevant information when needed
- Forgetting information that's no longer useful
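To make those trade-offs concrete, here is a minimal budgeting sketch; the percentage splits and the 4-characters-per-token estimate are illustrative assumptions, not measured values:

```python
def plan_context_budget(window: int = 200_000, reserve_output: int = 4_000) -> dict:
    """Split an assumed context window into rough per-section token budgets."""
    usable = window - reserve_output  # leave headroom for the model's reply
    return {
        "system_and_tools": int(usable * 0.05),    # prompts + tool definitions
        "retrieved_memories": int(usable * 0.15),  # RAG hits, memory files
        "tool_outputs": int(usable * 0.30),        # search results, file reads
        "conversation": int(usable * 0.50),        # recent message history
    }

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English text."""
    return len(text) // 4
```

With a 200K window this reserves about 98K tokens for conversation, which is why summarization and retrieval still matter.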
Memory Architecture
+-----------------------------------------------------------------+
|                         CONTEXT WINDOW                          |
|       System Prompt + Recent History + Retrieved Memories       |
|                         (Token Limited)                         |
+-----------------------------------------------------------------+
                                ^
                                | Retrieval
          +---------------------+---------------------+
          |                     |                     |
+-----------------+   +-----------------+   +-----------------+
|  WORKING MEMORY |   |  CONVERSATION   |   |   LONG-TERM     |
|                 |   |     STORE       |   |    MEMORY       |
|  Current task   |   |                 |   |                 |
|  Active goals   |   |  Full history   |   |  User prefs     |
|  Temp variables |   |  Summaries      |   |  Learned facts  |
|                 |   |                 |   |  Procedures     |
|  (In-memory)    |   |  (Database)     |   |  (Vector store) |
+-----------------+   +-----------------+   +-----------------+
Strategy 1: Conversation Summarization
Instead of keeping full conversation history, periodically summarize:
class ConversationMemory:
def __init__(self, llm_client, max_messages: int = 20):
self.llm = llm_client
self.max_messages = max_messages
self.messages = []
self.summaries = []
def add_message(self, role: str, content: str):
self.messages.append({"role": role, "content": content})
# Summarize when history gets too long
if len(self.messages) > self.max_messages:
self._summarize_oldest()
def _summarize_oldest(self):
"""Summarize oldest messages and remove them."""
# Take first half of messages
to_summarize = self.messages[:self.max_messages // 2]
summary = self.llm.chat([{
"role": "user",
"content": f"""Summarize this conversation concisely, preserving key facts,
decisions, and context needed for continuation:
{self._format_messages(to_summarize)}"""
}])
self.summaries.append({
"timestamp": datetime.now(),
"summary": summary,
"message_count": len(to_summarize)
})
# Remove summarized messages
self.messages = self.messages[self.max_messages // 2:]
def get_context(self) -> list:
"""Get context for LLM, including summaries."""
context = []
# Include summaries of older conversations
if self.summaries:
summary_text = "\n\n".join(
f"[Earlier: {s['summary']}]" for s in self.summaries[-3:] # Last 3 summaries
)
context.append({
"role": "user",
"content": f"Previous conversation context:\n{summary_text}"
})
# Include recent messages
context.extend(self.messages)
return context
Strategy 2: Semantic Retrieval (RAG)
Store memories as embeddings and retrieve relevant ones:
import numpy as np
from typing import List, Tuple
class SemanticMemory:
def __init__(self, embedding_model):
self.embedding_model = embedding_model
self.memories = [] # List of (text, embedding, metadata)
def store(self, text: str, metadata: dict = None):
"""Store a memory with its embedding."""
embedding = self.embedding_model.embed(text)
self.memories.append({
"text": text,
"embedding": embedding,
"metadata": metadata or {},
"timestamp": datetime.now()
})
def retrieve(self, query: str, top_k: int = 5) -> List[dict]:
"""Retrieve most relevant memories."""
query_embedding = self.embedding_model.embed(query)
# Calculate similarities
scored = []
for memory in self.memories:
similarity = self._cosine_similarity(
query_embedding,
memory["embedding"]
)
scored.append((similarity, memory))
# Return top-k
scored.sort(reverse=True, key=lambda x: x[0])
return [
{"score": score, **memory}
for score, memory in scored[:top_k]
]
def _cosine_similarity(self, a: np.ndarray, b: np.ndarray) -> float:
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Usage with OpenAI embeddings
from openai import OpenAI
class OpenAIEmbedding:
def __init__(self):
self.client = OpenAI()
def embed(self, text: str) -> np.ndarray:
response = self.client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return np.array(response.data[0].embedding)
# Initialize
memory = SemanticMemory(OpenAIEmbedding())
# Store memories
memory.store("User prefers concise responses", {"type": "preference"})
memory.store("User's company is in the cannabis industry", {"type": "fact"})
memory.store("Last project was building an inventory system", {"type": "history"})
# Retrieve relevant memories
relevant = memory.retrieve("What kind of business does the user have?")
# Returns the cannabis industry memory
Strategy 3: Structured Memory Files
OpenClaw uses a file-based approach that's simple but effective:
# Long-Term Memory
## User Preferences
- Prefers concise, actionable responses
- Usually works US Pacific time zone
- Technical background (can show code)
## Important Facts
- Company: As Above Technologies
- Industry: AI/Software
- Key product: OpenClaw agent platform
## Learned Procedures
- For email drafts: always ask about tone before drafting
- For research: provide sources and confidence levels
- For code: include comments and explain reasoning
## Recent Decisions
- 2026-01-15: Decided to use Claude Opus 4 for complex reasoning tasks
- 2026-01-20: Established daily standup format for project updates
## Ongoing Projects
- Building customer onboarding automation
- Writing technical documentation for API
The agent loads this file into context at session start and updates it when learning new information:
class FileBasedMemory:
def __init__(self, workspace: Path):
self.workspace = workspace  # used by load_context() for the memory/ directory
self.memory_file = workspace / "MEMORY.md"
self.daily_file = workspace / f"memory/{datetime.now().strftime('%Y-%m-%d')}.md"
def load_context(self) -> str:
"""Load memory for session start."""
context = ""
# Long-term memory
if self.memory_file.exists():
context += f"## Long-term Memory\n{self.memory_file.read_text()}\n\n"
# Recent daily notes
memory_dir = self.workspace / "memory"
if memory_dir.exists():
recent_files = sorted(memory_dir.glob("*.md"))[-3:] # Last 3 days
for f in recent_files:
context += f"## Notes from {f.stem}\n{f.read_text()}\n\n"
return context
def append_daily(self, note: str):
"""Add to today's memory file."""
self.daily_file.parent.mkdir(exist_ok=True)
with open(self.daily_file, "a") as f:
f.write(f"\n- {datetime.now().strftime('%H:%M')}: {note}")
def update_long_term(self, section: str, content: str):
"""Update a section of long-term memory."""
if not self.memory_file.exists():
self.memory_file.write_text(f"# Long-Term Memory\n\n## {section}\n{content}")
return
current = self.memory_file.read_text()
# Find and update section, or append
# (Implementation details omitted for brevity)
Strategy 4: Hierarchical Memory
Different information has different lifespans and access patterns:
class HierarchicalMemory:
"""
Tier 1: Always in context (user prefs, core facts)
Tier 2: Retrieved on relevance (episodic memory)
Tier 3: Retrieved on explicit request (archived knowledge)
"""
def __init__(self):
self.tier1_always = {} # Small, critical info
self.tier2_indexed = [] # Semantic search
self.tier3_archived = {} # Keyword lookup
def build_context(self, query: str) -> str:
"""Build memory context for a query."""
context_parts = []
# Always include tier 1
if self.tier1_always:
context_parts.append("Core context:")
for key, value in self.tier1_always.items():
context_parts.append(f"- {key}: {value}")
# Retrieve relevant tier 2
relevant = self._semantic_search(query, self.tier2_indexed, top_k=5)
if relevant:
context_parts.append("\nRelevant memories:")
for mem in relevant:
context_parts.append(f"- {mem['text']}")
# Tier 3 only if explicitly referenced
# (agent can request via tool)
return "\n".join(context_parts)
def promote_memory(self, memory_id: str, to_tier: int):
"""Move memory between tiers based on access patterns."""
# Frequently accessed tier 2 -> tier 1
# Rarely accessed tier 2 -> tier 3
pass
def consolidate(self):
"""Periodically consolidate and reorganize memories."""
# Merge similar memories
# Archive old unused memories
# Update tier 1 based on importance
pass
The Axis Memory System
Here's how memory actually works in Axis:
- SOUL.md: Permanent personality, values, behavioral guidelines
- USER.md: Information about the human(s) Axis works with
- MEMORY.md: Curated long-term memory (manually and auto-updated)
- memory/YYYY-MM-DD.md: Daily logs of significant events
- TOOLS.md: Environment-specific information (API keys location, server names)
- Conversation history: Last 20-30 turns, summarized when longer
The key insight: treat memory like a well-organized filing system, not a database. The agent can read and update these files, creating a form of self-modifying memory that persists across sessions and is human-readable for debugging.
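A hedged sketch of that filing-system approach: assemble session-start context by concatenating whichever of the files listed above exist (the loader itself is illustrative, not Axis's actual code):

```python
from pathlib import Path

def build_session_context(workspace: Path) -> str:
    """Concatenate workspace memory files into a session-start context block."""
    sections = []
    for name in ["SOUL.md", "USER.md", "MEMORY.md", "TOOLS.md"]:
        f = workspace / name
        if f.exists():
            sections.append(f"## {name}\n{f.read_text()}")
    # The three most recent daily logs, oldest first
    memory_dir = workspace / "memory"
    if memory_dir.exists():
        for f in sorted(memory_dir.glob("*.md"))[-3:]:
            sections.append(f"## Daily log {f.stem}\n{f.read_text()}")
    return "\n\n".join(sections)
```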
7. Deployment Options
You've built an agent. Now where does it run? The deployment choice affects reliability, cost, latency, and what's possible.
Option 1: Local Development Machine
Simplest · Not for production
Run the agent on your laptop/desktop. Good for development and personal use.
- Pros: Zero infrastructure cost, easy debugging, fast iteration
- Cons: Only works when computer is on, no remote access, can't scale
- Good for: Development, personal assistant, testing
# Simple local deployment
python agent.py
# Or with auto-reload for development
watchmedo auto-restart --patterns="*.py" -- python agent.py
Option 2: Cloud VM (Always-On)
Moderate complexity · Reliable
Run on a cloud server that's always available. The most common production choice.
| Provider | Smallest Useful Instance | Monthly Cost |
|---|---|---|
| DigitalOcean | 2GB RAM, 1 vCPU | ~$12 |
| AWS EC2 | t3.small (2GB) | ~$15 |
| Google Cloud | e2-small (2GB) | ~$13 |
| Hetzner | CX11 (2GB) | ~$4 |
# Setup on fresh Ubuntu VM
# Install dependencies
sudo apt update && sudo apt install -y python3.11 python3.11-venv
# Create project directory
mkdir -p ~/agent && cd ~/agent
# Setup virtual environment
python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Create systemd service for auto-restart
sudo tee /etc/systemd/system/agent.service << EOF
[Unit]
Description=AI Agent Service
After=network.target
[Service]
Type=simple
User=$USER
WorkingDirectory=$HOME/agent
Environment="PATH=$HOME/agent/venv/bin"
ExecStart=$HOME/agent/venv/bin/python agent.py
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
# Enable and start
sudo systemctl enable agent
sudo systemctl start agent
# Check logs
sudo journalctl -u agent -f
Option 3: Container Deployment
Moderate complexity · Scalable
Package your agent as a Docker container for portability and scaling.
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Create workspace directory
RUN mkdir -p /app/workspace
# Run agent
CMD ["python", "agent.py"]
version: '3.8'
services:
agent:
build: .
restart: always
environment:
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
- BRAVE_API_KEY=${BRAVE_API_KEY}
volumes:
- ./workspace:/app/workspace # Persist workspace
- ./logs:/app/logs
ports:
- "8080:8080" # If exposing HTTP API
Option 4: Serverless / Functions
Pay-per-use · Complex state
Run agent logic in serverless functions. Good for event-driven agents with external state management.
- Pros: Scale to zero, pay only for usage, no server management
- Cons: Cold starts, execution time limits, state must be external
- Good for: Webhook handlers, scheduled tasks, low-volume agents
import json
from agent import SimpleAgent
import boto3
# Use DynamoDB for state
dynamodb = boto3.resource('dynamodb')
state_table = dynamodb.Table('agent-state')
def handler(event, context):
"""AWS Lambda handler for agent requests."""
# Parse input
body = json.loads(event.get('body', '{}'))
user_id = body.get('user_id')
message = body.get('message')
# Load state
state = state_table.get_item(Key={'user_id': user_id}).get('Item', {})
# Initialize agent with state
agent = SimpleAgent()
agent.conversation_history = state.get('history', [])
# Process message
response = agent.chat(message) # Note: need to make this sync for Lambda
# Save state
state_table.put_item(Item={
'user_id': user_id,
'history': agent.conversation_history[-20:] # Keep last 20
})
return {
'statusCode': 200,
'body': json.dumps({'response': response})
}
Option 5: Platform-as-a-Service
Easiest ops · Higher cost
Use a platform designed for agent deployment. Trade flexibility for convenience.
| Platform | Focus | Starting Cost |
|---|---|---|
| Railway | General Python apps | $5/mo + usage |
| Render | Web services | $7/mo |
| Modal | AI/ML workloads | Pay-per-use |
| LangServe | LangChain agents | Varies |
How Axis Is Deployed
- Agent server: handles the main agent loop, tool execution, and memory
- Storage: memory files on disk, analytics in a database
- Interfaces: multiple ways to interact based on context
- Monitoring: track costs, errors, and usage patterns
- Backups: memory is critical; never lose it
8. Real Example: How We Built Axis
Theory is nice, but nothing teaches like real experience. Here's the actual story of building Axis, the AI agent that runs significant portions of As Above Technologies.
The Beginning: January 2025
We didn't set out to build an agent framework. We needed a capable AI assistant for our own operations: managing multiple business units, handling customer inquiries, monitoring systems, creating content. The existing tools (ChatGPT, basic automation) weren't cutting it.
Initial requirements:
- Persistent memory across sessions (critical for business context)
- Tool use: web search, file operations, system commands
- Multiple interfaces: chat, Discord, scheduled tasks
- Safety guardrails: never send external communications without confirmation
- Cost efficiency: can't spend $500/day on API calls
Phase 1: Proof of Concept (Weeks 1-2)
Started with a minimal implementation: Claude API + file system tools + basic conversation memory. No framework, just ~500 lines of Python.
class Axis:
def __init__(self):
self.client = anthropic.Anthropic()
self.history = []
self.workspace = Path("./workspace")
def chat(self, message):
# Load context files
context = self._load_context_files()
# Call Claude with basic tools
response = self.client.messages.create(
model="claude-3-opus",
system=f"You are Axis. Context:\n{context}",
messages=self.history + [{"role": "user", "content": message}],
tools=[read_file_tool, write_file_tool, search_tool]
)
# Handle tool calls...
return self._process_response(response)
What we learned:
- Simple works. Don't over-engineer before you understand the problem.
- The workspace file pattern (SOUL.md, MEMORY.md) emerged naturally.
- Claude was better than GPT-4 for our use case (longer context, better instruction following).
Phase 2: Tool Explosion (Weeks 3-6)
Once the basics worked, we kept adding tools. Every "I wish Axis could..." became a new tool.
Tools added:
- Web search (Brave API)
- URL fetching and content extraction
- Email reading and drafting (Gmail API)
- Calendar management (Google Calendar)
- Browser automation (Playwright)
- Code execution (sandboxed Python)
- Database queries (read-only)
- Slack/Discord messaging
- Image generation (DALL-E)
- Business-specific integrations (inventory, CRM)
What we learned:
- Tool descriptions matter enormously. Spent as much time on descriptions as implementation.
- Confirmation flows are essential. Axis almost sent emails it shouldn't have several times.
- Tool sprawl is real. Started organizing tools into categories, only loading relevant ones.
Week 4: Axis drafted an email to a customer and, due to a bug in the confirmation flow, actually sent it. The email was fine: polite, accurate, helpful. But we hadn't reviewed it. The customer was happy; we were terrified. Added multiple confirmation layers after that.
Phase 3: Memory Evolution (Weeks 6-10)
Early Axis had goldfish memory. Each day felt like meeting a new assistant. We iterated on memory extensively:
- Version 1: Full conversation history in context. Problem: context window fills up, expensive, irrelevant old conversations.
- Version 2: Automatic summarization. Problem: lost important details in summaries.
- Version 3: Hybrid file + conversation system. Solution: MEMORY.md for curated important facts, conversation history for recent context.
- Version 4: Daily notes + long-term memory. Current: daily logs capture everything, MEMORY.md is curated highlights.
The breakthrough was realizing memory should be editable by both human and agent. When Axis learns something important, it can write to MEMORY.md. When it gets something wrong, we can correct it directly.
Phase 4: Multi-Model Strategy (Weeks 10-14)
Using Claude Opus for everything was expensive ($75/M output tokens). We implemented model routing:
def select_model(task_type: str, complexity: str) -> str:
"""Route to appropriate model based on task."""
if task_type == "simple_question":
return "claude-3-5-sonnet-20241022" # Fast, cheap
elif task_type == "complex_reasoning":
return "claude-opus-4-20250514" # Best quality
elif task_type == "code_generation":
return "claude-sonnet-4-20250514" # Good at code
elif task_type == "image_analysis":
return "gpt-4o" # Strong vision
else:
return "claude-3-5-sonnet-20241022" # Default: balanced
Result: 60% cost reduction while maintaining quality for complex tasks.
Phase 5: Proactive Behavior (Weeks 14-20)
Axis was helpful when asked, but we wanted proactive assistance. Enter the heartbeat system:
- Every 30 minutes, Axis receives a "heartbeat" prompt
- Checks HEARTBEAT.md for scheduled tasks
- Can initiate actions: check email, review calendar, monitor systems
- Notifies humans of important items
# Heartbeat Checklist
## Every heartbeat
- Check for urgent emails (spam filter: ignore marketing)
- Verify production systems are healthy
## Morning (first heartbeat after 8am)
- Summarize calendar for the day
- Check overnight messages
## Every 4 hours
- Review project status in GitHub
- Check analytics dashboards
## Flags
- lastEmailCheck: 2026-01-28T14:30:00
- lastSystemCheck: 2026-01-28T14:45:00
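A minimal sketch of that heartbeat loop, assuming an `agent.chat` method like the one built earlier; the interval matches the 30-minute cadence above, while `notify_human` and the prompt wording are hypothetical:

```python
import time
from pathlib import Path

HEARTBEAT_INTERVAL = 30 * 60  # seconds; the 30-minute cadence described above

def heartbeat_prompt(workspace: Path) -> str:
    """Build the periodic prompt from HEARTBEAT.md, if it exists."""
    checklist = workspace / "HEARTBEAT.md"
    tasks = checklist.read_text() if checklist.exists() else "(no checklist found)"
    return (
        "This is a scheduled heartbeat, not a user message.\n"
        "Review the checklist below, perform any due tasks, and reply "
        "HEARTBEAT_OK if nothing needs human attention.\n\n" + tasks
    )

def notify_human(message: str):
    """Hypothetical alerting hook (e.g. a Discord or Slack ping)."""
    print(f"[ALERT] {message}")

def run_heartbeat_loop(agent, workspace: Path):
    """Wake the agent on a fixed interval and surface anything notable."""
    while True:
        reply = agent.chat(heartbeat_prompt(workspace))
        if "HEARTBEAT_OK" not in reply:
            notify_human(reply)
        time.sleep(HEARTBEAT_INTERVAL)
```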
Current State: Axis in 2026
After a year of development and daily use, here's where Axis stands:
| Metric | Value |
|---|---|
| Daily interactions | 50-100 messages |
| Tool calls per day | 200-400 |
| Active tools | 32 |
| Memory files | ~400 daily logs, 1 MEMORY.md (~15KB) |
| Uptime | 99.7% (excluding planned maintenance) |
| Monthly API cost | $150-300 (varies with usage) |
| Estimated time saved | 60-80 hours/month |
What Axis does regularly:
- Answers customer inquiries (draft → human approval → send)
- Researches markets and competitors
- Monitors systems and alerts on issues
- Drafts content (articles, emails, documentation)
- Manages calendar and meeting prep
- Updates project documentation
- Queries business databases for reports
- Helps debug code and systems
The compound effect of persistent memory. Axis now knows our business deeply: customer names, product history, past decisions, learned preferences. It's not starting from zero each interaction. This accumulated context is worth more than any individual capability.
9. Common Pitfalls and How to Avoid Them
We made many mistakes building Axis. Here's what to watch for:
Pitfall 1: Over-Engineering Before Understanding
The trap: Building elaborate infrastructure before proving the basic concept works. Spending weeks on a perfect memory system before having a useful agent.
How to avoid:
- Start with the simplest possible implementation
- Add complexity only when you hit real limitations
- Get something working in days, not weeks
- Iterate based on actual usage, not anticipated needs
Pitfall 2: Tool Definition Neglect
The trap: Writing clear code but vague tool descriptions. The LLM doesn't see your code, only the descriptions.
How to avoid:
- Treat tool descriptions as user documentation
- Include when to use AND when not to use
- Specify expected inputs and outputs
- Test with edge cases: does the LLM choose the right tool?
# For each tool, answer:
# 1. What does it do? (one sentence)
# 2. When should the agent use it? (specific scenarios)
# 3. When should the agent NOT use it? (common mistakes)
# 4. What parameters does it need? (with examples)
# 5. What does it return? (success and error cases)
Pitfall 3: Insufficient Safety Guardrails
The trap: Trusting the LLM to be careful. It's not malicious, but it can be confidently wrong.
How to avoid:
- Default to requiring confirmation for external actions
- Implement rate limiting on expensive/risky operations
- Sandbox file and code execution
- Log everything for debugging and auditing
- Test with adversarial prompts (prompt injection attempts)
Pitfall 4: Context Window Stuffing
The trap: Putting everything possible into context, assuming more information is always better.
How to avoid:
- Be selective about what goes in context
- Use retrieval for relevant information rather than including everything
- Monitor token usage and optimize
- Test how the agent performs with minimal vs. maximal context
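One hedged way to enforce that selectivity is to measure an approximate token count before each request and drop the oldest messages first (the 4-characters-per-token heuristic and the budget number are assumptions):

```python
def approx_tokens(messages: list[dict]) -> int:
    """Rough estimate: about 4 characters per token."""
    return sum(len(m["content"]) for m in messages) // 4

def trim_to_budget(messages: list[dict], budget: int = 8_000) -> list[dict]:
    """Drop oldest messages until the estimate fits the budget.

    Keeps the first message, assumed to carry system-level context."""
    kept = list(messages)
    while len(kept) > 2 and approx_tokens(kept) > budget:
        kept.pop(1)  # drop the oldest message after the first
    return kept
```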
Pitfall 5: Ignoring Cost Management
The trap: Not monitoring API costs until you get a surprise bill.
How to avoid:
- Set up cost alerts from day one
- Track tokens per request and per day
- Use cheaper models for simple tasks
- Implement caching for repeated queries
- Set hard limits on daily spend
class CostTracker:
PRICES = {  # $ per 1,000 tokens
"claude-opus-4-20250514": {"input": 0.015, "output": 0.075},
"claude-sonnet-4-20250514": {"input": 0.003, "output": 0.015},
"claude-3-5-sonnet-20241022": {"input": 0.003, "output": 0.015},
"gpt-4o": {"input": 0.0025, "output": 0.01},
}
def __init__(self, daily_limit: float = 10.0):
self.daily_limit = daily_limit
self.daily_spend = 0.0
self.last_reset = datetime.now().date()
def track(self, model: str, input_tokens: int, output_tokens: int):
# Reset daily counter
if datetime.now().date() != self.last_reset:
self.daily_spend = 0.0
self.last_reset = datetime.now().date()
# Calculate cost
prices = self.PRICES.get(model, {"input": 0.01, "output": 0.03})
cost = (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1000
self.daily_spend += cost
# Alert if approaching limit
if self.daily_spend > self.daily_limit * 0.8:
self._send_alert(f"Approaching daily limit: ${self.daily_spend:.2f}/${self.daily_limit}")
# Block if over limit
if self.daily_spend > self.daily_limit:
raise CostLimitExceeded(f"Daily limit of ${self.daily_limit} exceeded")
return cost
Pitfall 6: Hallucination Blind Trust
The trap: Assuming the agent's output is accurate, especially for facts.
How to avoid:
- For factual queries, always require tool use (web search) over training data
- Ask the agent to cite sources when possible
- Implement verification for critical information
- Human review for anything published or sent externally
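As a sketch of that verification step, one can reject a "factual" answer that cites no URLs and re-prompt with an explicit instruction to search (the regex and the retry wording are assumptions, not part of Axis):

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

def has_sources(answer: str) -> bool:
    """True if the answer cites at least one URL."""
    return bool(URL_PATTERN.search(answer))

def verify_factual_answer(agent, question: str, max_retries: int = 1) -> str:
    """Re-ask with an explicit search instruction if no sources appear."""
    answer = agent.chat(question)
    retries = 0
    while not has_sources(answer) and retries < max_retries:
        answer = agent.chat(
            f"Use web_search before answering, and cite source URLs: {question}"
        )
        retries += 1
    return answer
```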
Pitfall 7: Poor Error Handling
The trap: Agent crashes or gets stuck when tools fail or return unexpected results.
How to avoid:
- Every tool should have clear error responses
- Implement retry logic with backoff for transient failures
- Give the agent guidance on what to do when tools fail
- Set maximum tool call attempts per request
```python
import asyncio


async def execute_with_recovery(tool_name: str, params: dict, max_retries: int = 3):
    """Execute a tool (via your own execute_tool coroutine) with retry and fallback."""
    last_error = None
    for attempt in range(max_retries):
        try:
            result = await execute_tool(tool_name, params)
            if result.get("success") or "error" not in result:
                return result
            last_error = result.get("error")
        except Exception as e:
            last_error = str(e)
        await asyncio.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s

    # Return an informative error the LLM can reason about
    return {
        "error": last_error,
        "suggestion": f"Tool '{tool_name}' failed after {max_retries} attempts. "
                      "Consider an alternative approach.",
        "attempted_params": params,
    }
```
Most pitfalls come from treating the LLM as either too smart or too dumb. It's neither. It's a powerful but imperfect tool that needs guardrails, clear instructions, and human oversight for high-stakes operations.
10. Cost Considerations and Scaling
AI agents aren't free. Understanding costs helps you build sustainably and scale wisely.
Understanding API Costs
API pricing is per token (roughly 4 characters = 1 token). Input tokens (what you send) are cheaper than output tokens (what you receive).
| Model | Input ($/1M) | Output ($/1M) | Typical Request Cost |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | $0.01-0.05 |
| Claude 3.5 Sonnet | $3.00 | $15.00 | $0.02-0.08 |
| Claude Sonnet 4 | $3.00 | $15.00 | $0.02-0.08 |
| Claude Opus 4 | $15.00 | $75.00 | $0.10-0.50 |
| GPT-4 Turbo | $10.00 | $30.00 | $0.05-0.20 |
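The "typical request cost" column follows directly from the per-million-token prices. A quick sketch of the arithmetic (model keys here are shorthand, not official API identifiers):

```python
# Per-million-token prices from the table above, USD: (input, output)
PRICES_PER_M = {
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-opus-4": (15.00, 75.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in USD from per-million-token prices."""
    inp, out = PRICES_PER_M[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000


# A typical agent turn: ~3,000 input tokens, ~500 output tokens on Sonnet
# → 3000 * $3.00/1M + 500 * $15.00/1M ≈ $0.0165
cost = request_cost("claude-sonnet-4", 3_000, 500)
```

Note that an agent loop multiplies this: every tool call round-trip resends the growing context as input tokens.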
Cost Breakdown for Typical Agent
Here's what Axis typically costs per month:
- Model usage: ~70% Sonnet, ~25% Opus, ~5% GPT-4o for vision
- Web search: Brave Search API for web queries
- Embeddings: OpenAI embeddings for semantic search
- Hosting: VPS hosting for 24/7 operation
- Total: varies with usage intensity
Cost Optimization Strategies
1. Model Routing
Use expensive models only when needed:
```python
import re


def classify_complexity(message: str) -> str:
    """Quick heuristic for task complexity, used to route requests."""
    # Simple patterns → cheap model
    simple_patterns = [
        r'^(what|when|where|who) is',  # Basic questions
        r'^(hi|hello|hey)',            # Greetings
        r'^(thanks|thank you)',        # Acknowledgments
    ]
    for pattern in simple_patterns:
        if re.match(pattern, message.lower()):
            return "simple"

    # Complex indicators → expensive model
    complex_indicators = [
        "analyze", "compare", "explain why", "strategy",
        "write a", "draft", "create", "plan",
    ]
    if any(ind in message.lower() for ind in complex_indicators):
        return "complex"

    return "medium"  # Default
```
2. Caching
Cache responses for repeated queries:
```python
import hashlib
import time


class ResponseCache:
    def __init__(self, ttl_seconds: int = 3600):
        self.cache = {}
        self.ttl = ttl_seconds

    def get_key(self, messages: list, tools: list) -> str:
        """Generate a cache key from the request."""
        content = str(messages) + str(tools)
        return hashlib.sha256(content.encode()).hexdigest()

    def get(self, key: str) -> dict | None:
        """Return the cached response if still within TTL."""
        if key in self.cache:
            entry = self.cache[key]
            if time.time() - entry["timestamp"] < self.ttl:
                return entry["response"]
            del self.cache[key]  # Expired: evict
        return None

    def set(self, key: str, response: dict):
        """Cache a response with the current timestamp."""
        self.cache[key] = {
            "response": response,
            "timestamp": time.time(),
        }
```
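One subtlety with `str()`-based keys: two logically identical requests can hash differently if dict key order or tool order varies. A sketch of a more robust key using canonical JSON (an alternative to, not part of, the class above):

```python
import hashlib
import json


def cache_key(messages: list[dict], tools: list[str]) -> str:
    """Deterministic cache key: canonical JSON avoids the dict-ordering
    and tool-ordering misses that naive str() keys can suffer."""
    payload = json.dumps(
        {"messages": messages, "tools": sorted(tools)},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Higher hit rates compound: every cache hit is a request you never pay for.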
3. Context Pruning
Minimize tokens in context:
```python
import re


def optimize_context(context: str, max_tokens: int = 10000) -> str:
    """Reduce context size while preserving information."""
    # Remove excessive whitespace
    context = re.sub(r'\n\s*\n', '\n\n', context)
    context = re.sub(r' +', ' ', context)

    # Truncate if still too long (estimate_tokens is your own counter,
    # e.g. len(text) // 4 as a rough approximation)
    tokens = estimate_tokens(context)
    if tokens > max_tokens:
        # Keep the beginning and end, summarize the middle
        lines = context.split('\n')
        keep_start = max(1, len(lines) // 4)  # guard against zero-length slices
        keep_end = max(1, len(lines) // 4)
        middle = lines[keep_start:-keep_end]
        context = '\n'.join(
            lines[:keep_start]
            + [f"[... {len(middle)} lines summarized ...]"]
            + lines[-keep_end:]
        )
    return context
```
4. Prompt Optimization
Shorter prompts = lower costs:
- Remove redundant instructions
- Use concise tool descriptions
- Avoid repeating information in system prompt and messages
- Load tool definitions dynamically (only include relevant tools)
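The last point, dynamic tool loading, deserves a sketch since it often yields the biggest savings: tool schemas can run hundreds of tokens each. The keyword map and tool names below are illustrative assumptions, not a real registry:

```python
# Hypothetical keyword → tool routing: only matched tool definitions
# are sent with the request, shrinking the prompt
TOOL_KEYWORDS = {
    "web_search": ["search", "look up", "latest", "news"],
    "calculator": ["calculate", "sum", "percent"],
    "calendar": ["schedule", "meeting", "remind"],
}


def relevant_tools(message: str, always: tuple[str, ...] = ()) -> list[str]:
    """Select only the tools whose trigger keywords appear in the message."""
    text = message.lower()
    picked = {tool for tool, keywords in TOOL_KEYWORDS.items()
              if any(kw in text for kw in keywords)}
    picked.update(always)
    return sorted(picked)
```

The `always` parameter covers tools the agent should never lose (memory access, say), since keyword matching will inevitably miss some phrasings.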
Scaling Considerations
As usage grows, consider these patterns:
| Scale | Requests/Day | Architecture | Estimated Cost |
|---|---|---|---|
| Personal | 10-50 | Single instance | $30-100/mo |
| Small Team | 100-500 | Single instance + queue | $150-400/mo |
| Business | 1,000-5,000 | Load balanced + workers | $500-2,000/mo |
| Enterprise | 10,000+ | Distributed + local models | $2,000+/mo |
When to Consider Local Models
Running models locally (Llama 3, Mixtral) makes sense when:
- High volume: >$500/month in API costs
- Privacy critical: Data can't leave your infrastructure
- Latency sensitive: Need <100ms response times
- Specific fine-tuning: Need a model trained on your data
Hardware requirements for local deployment:
| Model | VRAM Required | Approximate Hardware Cost |
|---|---|---|
| Llama 3 8B | 16GB | $400-800 (RTX 4080) |
| Llama 3 70B | 40-80GB | $2,000-8,000 (A100/multi-GPU) |
| Mixtral 8x7B | 24-48GB | $1,000-3,000 |
Many production systems use a hybrid: local models for high-volume, simple tasks (classification, embedding, basic Q&A), cloud APIs for complex reasoning. This balances cost and capability.
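A hybrid setup usually reduces to a small routing policy. The task types and model names below are illustrative assumptions, one possible shape of such a policy:

```python
from dataclasses import dataclass


@dataclass
class Route:
    backend: str  # "local" or "cloud"
    model: str


# Hypothetical policy: cheap, high-volume task types stay on local models;
# open-ended reasoning goes to a cloud API
LOCAL_TASKS = {"classify", "embed", "faq"}


def route(task_type: str) -> Route:
    """Pick a backend and model for a task type."""
    if task_type in LOCAL_TASKS:
        return Route("local", "llama-3-8b")
    return Route("cloud", "claude-sonnet-4-20250514")
```

The win is that the expensive path only sees the traffic that actually needs it, while the local path absorbs the volume.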
Conclusion: Your Path Forward
You now have a comprehensive understanding of how to build AI agents, from the fundamental architecture to production deployment. But knowledge without action is just entertainment. Here's your concrete path forward:
1. Get the simple agent from Section 4 running. Add one custom tool. Experience the fundamentals firsthand.
2. Implement file-based memory. Add 2-3 tools relevant to your use case. Start tracking costs.
3. Put it on a server. Make it accessible. Use it for real tasks. Note what's missing.
4. Fix the pain points. Add the tools you actually need. Improve memory based on what matters. Share what you've built.
Building Axis changed how we work at As Above Technologies. The compound effect of persistent memory and capable tools creates something genuinely useful: an assistant that knows your business and can actually help.
The technology is accessible. The patterns are proven. The only question is whether you'll build or watch others build. We hope you choose to build.
If you build something, we'd love to see it. Share your agent projects, ask questions, and join the community of builders creating the next generation of AI tools.
Ready to explore more technical guides?
Explore Techne