In January 2025, we started building what would become Axis: the AI agent that now manages significant portions of As Above Technologies' operations. It handles customer inquiries, monitors our systems, researches markets, drafts content, and even contributes to its own codebase. It wasn't magic, and it wasn't easy. But it was more achievable than you might think.

This guide is what I wish existed when we started. It covers everything from foundational concepts to production deployment, with code examples and real lessons from building an agent that actually runs a business. By the end, you'll have a clear roadmap for building your own AI agent, whether it's a weekend project or a production system.

We'll be honest about what works, what doesn't, and where the hype exceeds reality. Building agents is now accessible to developers of all experience levels, but it requires understanding the right patterns and avoiding common traps.

💡 Who This Guide Is For

Developers, technical founders, and ambitious entrepreneurs who want to build AI agents. You don't need ML expertise, but basic programming familiarity (Python preferred) will help you get the most from the code examples. Non-developers can still benefit from the architecture and platform sections to make informed decisions.

1. AI Agents vs Chatbots: Understanding the Difference

Before we build anything, we need to be precise about what we're building. The industry uses "AI agent" to describe everything from a slightly enhanced chatbot to science fiction AGI. Here's the taxonomy that actually matters:

The Spectrum of AI Systems

Think of AI systems on a spectrum of autonomy and capability:

| System Type | Autonomy | Tools | Memory | Example |
|---|---|---|---|---|
| Basic Chatbot | None | None | Single conversation | Rule-based support bots |
| LLM Interface | None | None | Single conversation | Basic ChatGPT wrapper |
| Enhanced LLM | Minimal | Built-in only | Session-based | ChatGPT with web browsing |
| AI Assistant | Low | Limited set | Persistent | Custom GPTs, Claude Projects |
| AI Agent | Medium-High | Extensible | Long-term | Axis, OpenClaw agents |
| Autonomous Agent | High | Self-extending | Evolving | AutoGPT, experimental systems |

The Three Defining Characteristics

What separates a true AI agent from a fancy chatbot? Three core capabilities:

1. Tool Use (Action in the World)

A chatbot tells you how to do something. An agent does the thing. This is the most fundamental difference. When you ask an agent to "check if my server is up," it doesn't explain ping commands; it pings the server and tells you the result.

Tools are the bridge between language and action. They range from simple API calls to file operations, browser automation, and code execution.

💡 Tool Use at As Above

Axis has access to over 30 tools: web search, calendar management, email, file operations, browser automation, code execution, database queries, and our business-specific integrations (inventory systems, customer databases). Each tool extends what Axis can do in the world.

2. Memory (Continuity Across Time)

A chatbot starts fresh each conversation. An agent remembers. This isn't just about technical context windows; it's about building a persistent understanding that compounds over time.

Effective agent memory includes:

  • User preferences and facts about the people it works with
  • Episodic records of past interactions and their outcomes
  • Learned procedures that can be reused

Memory is what allows an agent to say "last time we tried that approach, it didn't work because X" or "you mentioned preferring brief emails" without being reminded each session.

3. Goal-Directed Behavior (Pursuing Objectives)

A chatbot waits for input. An agent pursues goals. This is the autonomy dimension: the ability to take a high-level objective and break it down into sub-tasks, execute them in sequence, handle obstacles, and persist until the goal is achieved (or determined impossible).

Compare:

❌ Chatbot interaction

User: "How do I find out what our competitors are charging?"
Chatbot: "You can visit their websites, check industry reports, or use price monitoring tools like Prisync..."

✅ Agent interaction

User: "Find out what our competitors are charging."
Agent: "I'll research that now. [Uses web search to find competitor sites, navigates to pricing pages, extracts pricing data, synthesizes findings] Here's what I found: Competitor A charges $49-199/mo, Competitor B is $79/mo flat, Competitor C uses usage-based pricing starting at $0.01/call. Want me to compile this into a comparison table?"

Why the Distinction Matters

The distinction isn't academic; it determines what you build and how. Agent architecture is fundamentally different from chatbot architecture: it needs a tool-call loop, persistent memory, and an orchestrator rather than a single request-response path.

If you're building a chatbot thinking you're building an agent, you'll hit walls. If you're building an agent with chatbot architecture, you'll create something fragile and frustrating. Let's build it right from the start.

2. The Agent Architecture: LLM + Tools + Memory + Orchestration

An AI agent is a system, not just a model. The model (GPT-4, Claude, etc.) is the brain, but brains need bodies, senses, and support systems. Here's the architecture that makes agents work:

┌────────────────────────────────────────────────────────────┐
│                        ORCHESTRATOR                        │
│    (Receives input, plans tasks, routes to components)     │
└────────────────────────────────────────────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         │                    │                    │
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│                 │  │                 │  │                 │
│    LLM CORE     │  │  TOOL SYSTEM    │  │  MEMORY STORE   │
│                 │  │                 │  │                 │
│  • Reasoning    │  │  • Web Search   │  │  • Working      │
│  • Generation   │  │  • File I/O     │  │  • Short-term   │
│  • Planning     │  │  • APIs         │  │  • Long-term    │
│  • Synthesis    │  │  • Browser      │  │  • Retrieval    │
│                 │  │  • Custom       │  │                 │
└─────────────────┘  └─────────────────┘  └─────────────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                      INTERFACE LAYER                       │
│         (Chat, API, Webhooks, Scheduled Tasks)             │
└────────────────────────────────────────────────────────────┘

Component 1: The LLM Core

The language model is the reasoning engine. It interprets intent, plans approaches, generates outputs, and decides when to use tools. Choosing the right model matters:

| Model | Best For | Context | Cost (per 1M tokens) |
|---|---|---|---|
| GPT-4o | General tasks, speed | 128K | $2.50 in / $10 out |
| GPT-4 Turbo | Complex reasoning | 128K | $10 in / $30 out |
| Claude 3.5 Sonnet | Balanced capability/cost | 200K | $3 in / $15 out |
| Claude Opus 4 | Complex reasoning, agentic tasks | 200K | $15 in / $75 out |
| Llama 3 70B | Self-hosted, privacy | 8K-128K | Hardware cost |
| Mixtral 8x22B | Cost-effective self-hosted | 64K | Hardware cost |

💡 Model Selection at As Above

Axis uses a tiered approach: Claude Opus 4 for complex reasoning and planning, Claude 3.5 Sonnet for most day-to-day tasks, and GPT-4o for specific use cases where OpenAI excels. The orchestrator routes tasks to the appropriate model based on complexity and requirements.
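
That routing can be as simple as a lookup table keyed by task type. A minimal sketch; the tier labels and model identifiers here are illustrative assumptions, not exact API model IDs:

```python
# Illustrative tiered routing; task types and model names are
# assumptions for this sketch, not canonical identifiers.
ROUTING_TABLE = {
    "planning": "claude-opus-4",     # complex reasoning and planning
    "default": "claude-3-5-sonnet",  # day-to-day tasks
    "vision": "gpt-4o",              # cases where another provider excels
}

def choose_model(task_type: str) -> str:
    """Route a task to a model tier, falling back to the default."""
    return ROUTING_TABLE.get(task_type, ROUTING_TABLE["default"])
```

Keeping the table in one place makes it cheap to re-tier tasks as pricing and model quality shift.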

Component 2: The Tool System

Tools are functions the agent can call to interact with the world. A well-designed tool system has three parts: clear definitions the LLM can understand, reliable execution, and risk-aware categorization.

Tool Definition

Each tool needs clear specification that the LLM can understand:

tool_definition.py
# Example tool definition for a web search tool
web_search_tool = {
    "name": "web_search",
    "description": "Search the web for current information. Use for questions about recent events, facts you're unsure about, or when user asks to 'look up' something.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query. Be specific and include relevant context."
            },
            "num_results": {
                "type": "integer",
                "description": "Number of results to return (1-10)",
                "default": 5
            }
        },
        "required": ["query"]
    }
}

Tool Execution

The tool executor takes the LLM's tool call and performs the actual action:

tool_executor.py
async def execute_tool(tool_name: str, parameters: dict) -> dict:
    """Execute a tool and return results."""
    
    if tool_name == "web_search":
        results = await brave_search(
            query=parameters["query"],
            count=parameters.get("num_results", 5)
        )
        return {
            "success": True,
            "results": results
        }
    
    elif tool_name == "read_file":
        content = await read_file(parameters["path"])
        return {
            "success": True,
            "content": content
        }
    
    elif tool_name == "send_email":
        # Always confirm before external actions
        if not parameters.get("confirmed"):
            return {
                "success": False,
                "needs_confirmation": True,
                "message": f"Send email to {parameters['to']}?"
            }
        await send_email(**parameters)
        return {"success": True}
    
    else:
        return {
            "success": False,
            "error": f"Unknown tool: {tool_name}"
        }

Tool Categories

Organize tools by risk level and type: read-only tools (search, file reads) can run freely, workspace-modifying tools should be logged, and external actions (like sending email) should require confirmation.
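
One minimal way to encode this is a tier map consulted before every execution; the tool names and tier labels below are assumptions for illustration:

```python
# Risk-tier sketch: tier assignments are illustrative assumptions.
# Unknown tools default to the most restrictive tier.
RISK_TIERS = {
    "web_search": "read_only",
    "read_file": "read_only",
    "write_file": "workspace_write",
    "send_email": "external_action",
}

def requires_confirmation(tool_name: str) -> bool:
    """External actions should never fire without a human in the loop."""
    return RISK_TIERS.get(tool_name, "external_action") == "external_action"
```

Defaulting unknown tools to the highest-risk tier means a newly registered tool is safe-by-default until someone deliberately downgrades it.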

Component 3: Memory System

Memory is what makes an agent feel intelligent over time. There are multiple layers:

Working Memory (Context Window)

The immediate context the LLM sees: current conversation, task state, relevant retrieved information. This is limited by the model's context window.
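
Because the window is finite, working memory usually needs a trimming policy before each call. A rough sketch; the four-characters-per-token estimate is a crude assumption, and real systems would use the model's tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token for English."""
    return max(1, len(text) // 4)

def trim_to_budget(messages: list[dict], budget_tokens: int) -> list[dict]:
    """Keep the most recent messages that fit within the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # newest first
        cost = estimate_tokens(msg["content"])
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order
```

Walking the history newest-first guarantees the most recent turns survive, which is usually what the next LLM call needs most.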

Short-Term Memory (Session Storage)

Information persisted across turns in a session but not necessarily across sessions. Typically implemented as in-memory storage or short-TTL cache.
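
A short-TTL cache along these lines is one minimal implementation (a sketch; the default `ttl` value is an arbitrary assumption):

```python
import time

class SessionCache:
    """In-memory key-value store whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float = 1800.0):
        self.ttl = ttl
        self._store: dict = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires = entry
        if time.monotonic() > expires:
            del self._store[key]  # lazily evict expired entries
            return default
        return value
```

Lazy eviction on read keeps the sketch simple; a production cache would also sweep expired keys periodically.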

Long-Term Memory (Persistent Storage)

Knowledge that persists across sessions. This requires three operations: storing memories, retrieving them by relevance, and periodically consolidating what matters.

memory_system.py
from typing import Any

class AgentMemory:
    def __init__(self, storage_path: str):
        self.storage_path = storage_path
        self.working_memory = {}  # Current session state
        self.load_long_term_memory()
    
    def load_long_term_memory(self):
        """Load persistent memory from disk."""
        self.user_preferences = self._load_json("preferences.json")
        self.episodic_memory = self._load_json("episodes.json")
        self.learned_procedures = self._load_json("procedures.json")
    
    def remember(self, key: str, value: Any, memory_type: str = "working"):
        """Store a memory."""
        if memory_type == "working":
            self.working_memory[key] = value
        elif memory_type == "long_term":
            self._persist_to_file(key, value)
    
    def recall(self, query: str, memory_types: list = None) -> list:
        """Retrieve relevant memories."""
        results = []
        
        # Search working memory
        if "working" in (memory_types or ["working"]):
            for key, value in self.working_memory.items():
                if self._is_relevant(query, key, value):
                    results.append({"source": "working", "key": key, "value": value})
        
        # Search long-term memory with embeddings
        if "long_term" in (memory_types or []):
            relevant = self._semantic_search(query, self.episodic_memory)
            results.extend(relevant)
        
        return results
    
    def consolidate(self):
        """Move important working memories to long-term storage."""
        # Run periodically to persist important learnings
        for key, value in self.working_memory.items():
            if self._should_persist(key, value):
                self.remember(key, value, memory_type="long_term")

Component 4: The Orchestrator

The orchestrator is the "executive function": it coordinates everything. Its key responsibilities are retrieving context, building prompts, running the tool-call loop, and updating memory.

orchestrator.py
class AgentOrchestrator:
    def __init__(self, config: AgentConfig):
        self.llm = LLMClient(config.model)
        self.tools = ToolRegistry(config.tools)
        self.memory = AgentMemory(config.memory_path)
    
    async def handle_request(self, user_input: str) -> str:
        """Main entry point for agent requests."""
        
        # 1. Retrieve relevant context
        context = self.memory.recall(user_input)
        
        # 2. Build prompt with context and available tools
        messages = self._build_messages(user_input, context)
        
        # 3. Get LLM response (may include tool calls)
        response = await self.llm.chat(
            messages=messages,
            tools=self.tools.get_definitions()
        )
        
        # 4. Process tool calls if any
        while response.tool_calls:
            tool_results = await self._execute_tools(response.tool_calls)
            messages.append(response.message)
            messages.append(self._format_tool_results(tool_results))
            response = await self.llm.chat(messages=messages, tools=self.tools.get_definitions())
        
        # 5. Update memory with interaction
        self.memory.remember("last_interaction", {
            "input": user_input,
            "output": response.content,
            "timestamp": datetime.now()
        })
        
        return response.content
    
    async def _execute_tools(self, tool_calls: list) -> list:
        """Execute multiple tool calls concurrently (requires `import asyncio`)."""
        async def run_one(call):
            result = await self.tools.execute(call.name, call.parameters)
            return {
                "tool": call.name,
                "call_id": call.id,
                "result": result
            }
        
        # Independent tool calls can run in parallel
        return await asyncio.gather(*(run_one(call) for call in tool_calls))

The ReAct Pattern

Most successful agents use some variant of the ReAct (Reasoning + Acting) pattern:

  1. Observe: Receive input, gather context
  2. Think: Reason about what to do (often explicit "thinking" in the prompt)
  3. Act: Execute a tool or generate output
  4. Observe: See the result of the action
  5. Repeat: Continue until task is complete
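
The steps above reduce to a small control loop. This skeleton stubs the think and act steps with scripted stand-ins so the flow is visible; a real agent would call the LLM in `think`, and names like `http_check` are illustrative only:

```python
def react_loop(goal, think, act, max_steps=10):
    """Generic ReAct skeleton: think -> act -> observe until done."""
    observation = goal
    for _ in range(max_steps):
        decision = think(observation)       # LLM call in a real agent
        if decision["type"] == "finish":
            return decision["answer"]
        # Act with a tool, then feed the result back as the next observation
        observation = act(decision["tool"], decision["args"])
    return None  # step budget exhausted without finishing

# Scripted stand-ins for demonstration only:
def scripted_think(obs):
    if obs == "check the server":
        return {"type": "tool", "tool": "http_check", "args": "asabove.tech"}
    return {"type": "finish", "answer": f"Result: {obs}"}

def scripted_act(tool, args):
    return "503 Service Unavailable"  # pretend observation from the tool
```

The `max_steps` cap matters: without it, an agent that never reaches a "finish" decision loops (and bills) forever.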

💡 ReAct in Practice

When Axis receives "Check if our website is down and fix it if possible," the loop looks like:

Think: Need to check website status first.
Act: [calls http_check tool for asabove.tech]
Observe: 503 error, server not responding
Think: Site is down. Should check server status.
Act: [calls server_status tool]
Observe: High CPU, memory exhausted
Think: Need to restart service, but should confirm first.
Act: [requests confirmation to restart]
...and so on

3. Platforms Compared: OpenClaw, LangChain, AutoGPT, CrewAI

You don't have to build everything from scratch. Several platforms provide agent infrastructure. Here's an honest comparison based on our experience:

OpenClaw
Personal AI Assistant Framework

OpenClaw is what we built Axis on. It's a personal AI assistant framework designed for developers who want a capable agent without drowning in infrastructure complexity.

Learning Curve: Low-Medium
Customization: High
Production Ready: Yes
Cost: Open source + API costs

Key Features

  • Built-in tool system with common integrations (web, files, browser, calendar, email)
  • Flexible memory with workspace files (SOUL.md, MEMORY.md patterns)
  • Multi-model support (switch between Claude, GPT, local models)
  • Multiple interfaces: CLI, web chat, API, Discord, Telegram
  • Subagent spawning for parallel tasks
  • Heartbeat system for proactive behavior

Best For

Developers building personal or business assistants. Those who want a working agent fast but need customization. Teams that value the "workspace as configuration" pattern.

Limitations

  • Newer platform, smaller community than LangChain
  • Documentation still maturing
  • Less suited for massive multi-agent systems
LangChain
AI Application Framework

LangChain is the most popular framework for building LLM applications. It provides extensive abstractions for chains, agents, memory, and integrations.

Learning Curve: Medium-High
Customization: Very High
Production Ready: Yes (with work)
Cost: Open source + API costs

Key Features

  • Massive ecosystem of integrations and tools
  • LangGraph for complex agent workflows with state machines
  • LangSmith for observability and debugging
  • Extensive documentation and tutorials
  • Large community, many examples
  • Support for virtually any LLM provider

Best For

Teams building complex, custom agent systems. Production applications needing observability. Projects requiring specific integrations from the ecosystem.

Limitations

  • Abstraction layers can add complexity
  • Frequent breaking changes in early versions (stabilizing now)
  • Can be overwhelmingβ€”many ways to do the same thing
  • Debugging can be tricky due to abstraction depth
langchain_example.py
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain.tools import Tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Define tools
tools = [
    Tool(
        name="web_search",
        description="Search the web for information",
        func=lambda q: web_search(q)
    ),
    Tool(
        name="calculator",
        description="Perform mathematical calculations",
        func=lambda expr: eval(expr)  # Don't do this in production!
    )
]

# Create agent
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Run
result = executor.invoke({"input": "What's the population of Tokyo?"})

AutoGPT
Autonomous Agent Experiment

AutoGPT pioneered the "give an AI a goal and let it figure out how" paradigm. It's more experimental than production-ready, but influential.

Learning Curve: Medium
Autonomy: Very High
Production Ready: Experimental
Cost: Can be expensive (many LLM calls)

Key Features

  • Fully autonomous operation (set goal, watch it work)
  • Self-prompting loop with planning and reflection
  • Built-in web browsing, code execution, file management
  • Memory with pinecone/local vector stores
  • Plugin system for extensions

Best For

Experimentation and learning. Research into autonomous systems. Tasks where high autonomy is acceptable and cost isn't a primary concern.

Limitations

  • High token consumption (loops can be expensive)
  • Can get stuck in loops or pursue tangents
  • Limited controllability once started
  • Not recommended for production business processes
CrewAI
Multi-Agent Orchestration

CrewAI specializes in multi-agent systems where different "crew members" collaborate on tasks. Each agent has a role, goal, and backstory.

Learning Curve: Low-Medium
Multi-Agent: Excellent
Production Ready: Yes (for appropriate use cases)
Cost: Scales with agents/tasks

Key Features

  • Role-based agent design (researcher, writer, reviewer, etc.)
  • Hierarchical and collaborative process types
  • Task delegation between agents
  • Clean, intuitive API
  • Good for content pipelines and research workflows

Best For

Workflows that naturally decompose into roles. Content creation pipelines. Research and analysis tasks. Teams exploring multi-agent patterns.

Limitations

  • Multi-agent overhead isn't always necessary
  • Can be slower than single-agent approaches
  • Coordination between agents can be unpredictable
  • Less flexible for highly custom requirements
crewai_example.py
from crewai import Agent, Task, Crew, Process

# Define agents
researcher = Agent(
    role='Research Analyst',
    goal='Find and analyze market information',
    backstory='Expert at finding and synthesizing information',
    tools=[web_search_tool]
)

writer = Agent(
    role='Content Writer',
    goal='Create compelling content from research',
    backstory='Experienced writer who turns data into narratives',
    tools=[write_file_tool]
)

# Define tasks
research_task = Task(
    description='Research the AI agent market landscape in 2026',
    agent=researcher,
    expected_output='Detailed market analysis with key players and trends'
)

writing_task = Task(
    description='Write a blog post based on the research',
    agent=writer,
    context=[research_task],
    expected_output='1500-word blog post in markdown format'
)

# Create crew and execute
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential
)

result = crew.kickoff()

Platform Selection Guide

| If You Want... | Choose | Why |
|---|---|---|
| Personal assistant, fast setup | OpenClaw | Batteries included, workspace-centric |
| Maximum flexibility, enterprise | LangChain + LangGraph | Most mature ecosystem, best tooling |
| Full autonomy experiments | AutoGPT | Designed for autonomous operation |
| Multi-agent collaboration | CrewAI | Purpose-built for agent teams |
| Learning/understanding agents | Build from scratch | Nothing teaches like building |

💡 Our Recommendation

Start with OpenClaw or raw API calls to understand the fundamentals. Graduate to LangChain when you need specific integrations or complex workflows. Consider CrewAI for content pipelines. Use AutoGPT for experiments, not production.

4. Step-by-Step: Building a Simple Agent

Let's build an agent from scratch. We'll create a research assistant that can search the web, read files, and maintain conversation memory. This will teach you the fundamentals before using any framework.

Step 1: Project Setup

requirements.txt
anthropic>=0.18.0
aiohttp>=3.9.0
python-dotenv>=1.0.0

.env
ANTHROPIC_API_KEY=your_api_key_here
BRAVE_API_KEY=your_brave_search_key  # Optional, for web search

Step 2: Define Your Tools

tools.py
import aiohttp
import os
from typing import Any
from pathlib import Path

# Tool definitions for Claude
TOOLS = [
    {
        "name": "web_search",
        "description": "Search the web for current information. Returns titles, URLs, and snippets.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "read_file",
        "description": "Read the contents of a file from the workspace.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Path to the file"
                }
            },
            "required": ["path"]
        }
    },
    {
        "name": "write_file",
        "description": "Write content to a file. Creates directories if needed.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Path to write to"
                },
                "content": {
                    "type": "string",
                    "description": "Content to write"
                }
            },
            "required": ["path", "content"]
        }
    },
    {
        "name": "remember",
        "description": "Save important information to long-term memory.",
        "input_schema": {
            "type": "object",
            "properties": {
                "key": {
                    "type": "string",
                    "description": "Memory key/label"
                },
                "value": {
                    "type": "string",
                    "description": "Information to remember"
                }
            },
            "required": ["key", "value"]
        }
    }
]

class ToolExecutor:
    def __init__(self, workspace: str = "./workspace"):
        self.workspace = Path(workspace)
        self.workspace.mkdir(exist_ok=True)
        self.memory = {}
        self._load_memory()
    
    def _load_memory(self):
        """Load persistent memory from file."""
        memory_file = self.workspace / "memory.json"
        if memory_file.exists():
            import json
            self.memory = json.loads(memory_file.read_text())
    
    def _save_memory(self):
        """Persist memory to file."""
        import json
        memory_file = self.workspace / "memory.json"
        memory_file.write_text(json.dumps(self.memory, indent=2))
    
    async def execute(self, tool_name: str, tool_input: dict) -> Any:
        """Execute a tool and return results."""
        
        if tool_name == "web_search":
            return await self._web_search(tool_input["query"])
        
        elif tool_name == "read_file":
            return self._read_file(tool_input["path"])
        
        elif tool_name == "write_file":
            return self._write_file(tool_input["path"], tool_input["content"])
        
        elif tool_name == "remember":
            return self._remember(tool_input["key"], tool_input["value"])
        
        else:
            return {"error": f"Unknown tool: {tool_name}"}
    
    async def _web_search(self, query: str) -> dict:
        """Perform web search using Brave API."""
        api_key = os.getenv("BRAVE_API_KEY")
        if not api_key:
            return {"error": "BRAVE_API_KEY not configured"}
        
        async with aiohttp.ClientSession() as session:
            async with session.get(
                "https://api.search.brave.com/res/v1/web/search",
                headers={"X-Subscription-Token": api_key},
                params={"q": query, "count": 5}
            ) as resp:
                if resp.status != 200:
                    return {"error": f"Search failed: {resp.status}"}
                data = await resp.json()
                
                results = []
                for item in data.get("web", {}).get("results", []):
                    results.append({
                        "title": item.get("title"),
                        "url": item.get("url"),
                        "snippet": item.get("description")
                    })
                return {"results": results}
    
    def _read_file(self, path: str) -> dict:
        """Read a file from workspace."""
        file_path = self.workspace / path
        # Check containment before existence so traversal attempts learn nothing
        if not str(file_path.resolve()).startswith(str(self.workspace.resolve())):
            return {"error": "Access denied: path outside workspace"}
        if not file_path.exists():
            return {"error": f"File not found: {path}"}
        return {"content": file_path.read_text()}
    
    def _write_file(self, path: str, content: str) -> dict:
        """Write content to a file."""
        file_path = self.workspace / path
        if not str(file_path.resolve()).startswith(str(self.workspace.resolve())):
            return {"error": "Access denied: path outside workspace"}
        file_path.parent.mkdir(parents=True, exist_ok=True)
        file_path.write_text(content)
        return {"success": True, "path": str(file_path)}
    
    def _remember(self, key: str, value: str) -> dict:
        """Store information in persistent memory."""
        self.memory[key] = value
        self._save_memory()
        return {"success": True, "remembered": key}
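
The access checks in `_read_file` and `_write_file` guard against path traversal. The containment test can be factored out and exercised on its own; this `is_within` helper is a sketch of the same idea using `pathlib` (Python 3.9+ for `is_relative_to`):

```python
from pathlib import Path

def is_within(workspace: Path, candidate: str) -> bool:
    """True if `candidate` resolves to a location inside `workspace`."""
    root = workspace.resolve()
    # Joining with an absolute candidate replaces the base entirely,
    # which is exactly why the resolved-path check is necessary.
    target = (workspace / candidate).resolve()
    return target.is_relative_to(root)
```

Resolving both paths first defeats both `../` traversal and absolute-path injection, the two ways a model-supplied path can escape the workspace.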

Step 3: Build the Agent Core

agent.py
import anthropic
import asyncio
from dotenv import load_dotenv
from tools import TOOLS, ToolExecutor
from datetime import datetime

load_dotenv()  # pick up ANTHROPIC_API_KEY / BRAVE_API_KEY from .env

class SimpleAgent:
    def __init__(self, model: str = "claude-sonnet-4-20250514"):
        self.client = anthropic.Anthropic()
        self.model = model
        self.tool_executor = ToolExecutor()
        self.conversation_history = []
        
        # System prompt defines agent behavior
        self.system_prompt = """You are a helpful AI research assistant. You can:
- Search the web for current information
- Read and write files in your workspace
- Remember important information across conversations

When given a task:
1. Think about what information or actions you need
2. Use your tools to gather information or take actions
3. Synthesize findings into a clear response

Be thorough but concise. If you're unsure about something, say so.
If a task requires multiple steps, work through them systematically.

Current date: {date}
Memories: {memories}
"""
    
    def _build_system_prompt(self) -> str:
        """Build system prompt with current context."""
        memories_str = "\n".join(
            f"- {k}: {v}" for k, v in self.tool_executor.memory.items()
        ) if self.tool_executor.memory else "None stored yet."
        
        return self.system_prompt.format(
            date=datetime.now().strftime("%Y-%m-%d"),
            memories=memories_str
        )
    
    async def chat(self, user_message: str) -> str:
        """Process a user message and return response."""
        
        # Add user message to history
        self.conversation_history.append({
            "role": "user",
            "content": user_message
        })
        
        # Keep history manageable (last 20 turns)
        if len(self.conversation_history) > 40:
            self.conversation_history = self.conversation_history[-40:]
        
        # Call Claude with tools
        response = self.client.messages.create(
            model=self.model,
            max_tokens=4096,
            system=self._build_system_prompt(),
            tools=TOOLS,
            messages=self.conversation_history
        )
        
        # Process response - may need multiple turns for tool use
        while response.stop_reason == "tool_use":
            # Extract tool calls from response
            assistant_content = response.content
            tool_results = []
            
            for block in assistant_content:
                if block.type == "tool_use":
                    print(f"  β†’ Using tool: {block.name}")
                    result = await self.tool_executor.execute(
                        block.name, 
                        block.input
                    )
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result)
                    })
            
            # Add assistant message and tool results to history
            self.conversation_history.append({
                "role": "assistant",
                "content": assistant_content
            })
            self.conversation_history.append({
                "role": "user",
                "content": tool_results
            })
            
            # Continue conversation with tool results
            response = self.client.messages.create(
                model=self.model,
                max_tokens=4096,
                system=self._build_system_prompt(),
                tools=TOOLS,
                messages=self.conversation_history
            )
        
        # Extract final text response
        final_response = ""
        for block in response.content:
            if hasattr(block, "text"):
                final_response += block.text
        
        # Add final response to history
        self.conversation_history.append({
            "role": "assistant",
            "content": response.content
        })
        
        return final_response

async def main():
    """Interactive chat loop."""
    agent = SimpleAgent()
    print("Agent ready. Type 'quit' to exit.\n")
    
    while True:
        try:
            user_input = input("You: ").strip()
            if user_input.lower() in ['quit', 'exit', 'q']:
                break
            if not user_input:
                continue
            
            print("Agent: ", end="", flush=True)
            response = await agent.chat(user_input)
            print(response)
            print()
            
        except KeyboardInterrupt:
            break
    
    print("\nGoodbye!")

if __name__ == "__main__":
    asyncio.run(main())

Step 4: Test Your Agent

terminal
$ python agent.py
Agent ready. Type 'quit' to exit.

You: What's the current state of AI agent technology? Search for recent news.
  → Using tool: web_search
Agent: Based on my search, here's the current state of AI agent technology in early 2026:

**Major Developments:**
1. **Claude's Computer Use** - Anthropic's ability to control computers directly has matured...
2. **OpenAI Operator** - GPT-4 can now browse the web and complete multi-step tasks...
3. **Open Source Progress** - Llama 3 and Mixtral are enabling self-hosted agent systems...

Would you like me to save this summary to a file?

You: Yes, save it to research/ai-agents-2026.md
  → Using tool: write_file
Agent: Done! I've saved the summary to research/ai-agents-2026.md. 

You: Remember that I'm interested in AI agents for business automation
  → Using tool: remember
Agent: Noted! I'll remember your interest in AI agents for business automation. 
This will help me tailor future research and suggestions to your focus area.

You: quit
Goodbye!

What You've Built

Congratulations! You now have a working AI agent with:

- Tool use: web search, file writing, and a simple "remember" memory tool
- A multi-turn loop that chains tool calls until the task is complete
- Conversation history trimmed to a manageable window
- An interactive command-line chat interface

This is a foundation you can build on. In the next sections, we'll add more sophisticated tools and memory systems.

✅ Code Available

The complete code for this tutorial is available at github.com/asabove-tech/simple-agent-tutorial. Star it to bookmark for later!

5. Adding Tools and Capabilities

Tools are what transform an LLM from a text generator into a capable agent. Let's explore how to add more sophisticated capabilities.

Tool Design Principles

1. Clear, Specific Descriptions

The LLM decides when to use tools based on descriptions. Be explicit about:

- What the tool does and how it does it
- When the model should use it (and when it shouldn't)
- What the tool returns
- Its limitations, and what to reach for instead

❌ Bad tool description

"name": "search", "description": "Searches for stuff"

✅ Good tool description

"name": "web_search", "description": "Search the web using Brave Search API. Use for questions about current events, facts you're uncertain about, or when the user explicitly asks to look something up. Returns titles, URLs, and snippets for the top results. Not suitable for accessing specific websites—use browser tools for that."
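
For the Anthropic API, that description sits inside a full tool definition alongside a JSON schema for the parameters, so the model knows exactly which arguments exist and which are required. A sketch of the complete entry (the `count` parameter is illustrative):

```python
# Full tool definition as passed in the API's `tools` parameter.
WEB_SEARCH_TOOL = {
    "name": "web_search",
    "description": (
        "Search the web using Brave Search API. Use for questions about "
        "current events, facts you're uncertain about, or when the user "
        "explicitly asks to look something up. Returns titles, URLs, and "
        "snippets for the top results. Not suitable for accessing specific "
        "websites - use browser tools for that."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query. Be specific; include names or dates when relevant.",
            },
            "count": {
                "type": "integer",
                "description": "Number of results to return (1-10).",
            },
        },
        "required": ["query"],
    },
}
```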

2. Appropriate Granularity

Tools should be atomic enough to be composable, but not so granular that simple tasks require many calls.
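
For example (all tool names here are hypothetical):

```python
# Too granular: reading one config file costs three model round-trips.
too_granular = ["open_file", "read_next_line", "close_file"]

# Too coarse: one catch-all tool whose behavior depends on a "mode"
# argument is hard for the model to call correctly and hard to validate.
too_coarse = ["filesystem_op"]  # mode="read"|"write"|"list"|"delete"|...

# Workable grain: each tool is one complete action, and actions compose.
good_grain = ["read_file", "write_file", "list_files", "search_files"]
```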

3. Meaningful Error Messages

When tools fail, return errors the LLM can act on:

# Bad: unhelpful error
return {"error": "Failed"}

# Good: actionable error
return {
    "error": "File not found",
    "details": f"No file at path '{path}'",
    "suggestion": "Check if the path is correct or use list_files to see available files"
}

Common Tool Categories

Information Retrieval Tools

retrieval_tools.py
import aiohttp

# Web search with multiple providers
async def web_search(query: str, provider: str = "brave") -> dict:
    """Multi-provider web search."""
    if provider == "brave":
        return await brave_search(query)
    elif provider == "serper":
        return await serper_search(query)
    elif provider == "tavily":
        return await tavily_search(query)  # Good for AI-optimized results
    return {"error": f"Unknown provider: {provider}"}

# URL content fetching
async def fetch_url(url: str, extract_mode: str = "markdown") -> dict:
    """Fetch and extract content from a URL."""
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            html = await resp.text()
    
    if extract_mode == "markdown":
        # Convert HTML to readable markdown
        content = html_to_markdown(html)
    elif extract_mode == "text":
        content = extract_text(html)
    else:
        content = html
    
    return {
        "url": url,
        "content": content[:50000],  # Limit size
        "truncated": len(content) > 50000
    }

# Database queries (read-only!)
def query_database(sql: str, database: str = "analytics") -> dict:
    """Execute read-only SQL query."""
    # Validate query is SELECT only
    if not sql.strip().upper().startswith("SELECT"):
        return {"error": "Only SELECT queries allowed"}
    
    conn = get_connection(database)
    try:
        results = conn.execute(sql).fetchall()
        return {"columns": [d[0] for d in conn.description], "rows": results}
    except Exception as e:
        return {"error": str(e)}

Communication Tools

communication_tools.py
# Email with confirmation
async def send_email(
    to: str, 
    subject: str, 
    body: str, 
    confirmed: bool = False
) -> dict:
    """Send email. Requires confirmation for safety."""
    if not confirmed:
        return {
            "needs_confirmation": True,
            "preview": {
                "to": to,
                "subject": subject,
                "body_preview": body[:200] + "..." if len(body) > 200 else body
            },
            "message": "Please confirm you want to send this email."
        }
    
    # Actually send
    result = await email_client.send(to=to, subject=subject, body=body)
    return {"success": True, "message_id": result.id}

# Slack/Discord messaging
async def send_message(
    channel: str, 
    message: str, 
    platform: str = "slack"
) -> dict:
    """Send message to team chat."""
    if platform == "slack":
        return await slack_client.post(channel=channel, text=message)
    elif platform == "discord":
        return await discord_client.send(channel_id=channel, content=message)
    return {"error": f"Unknown platform: {platform}"}

# Calendar integration
async def create_calendar_event(
    title: str,
    start_time: str,
    end_time: str,
    attendees: list = None,
    confirmed: bool = False
) -> dict:
    """Create calendar event. Requires confirmation."""
    if not confirmed:
        return {
            "needs_confirmation": True,
            "preview": {"title": title, "start": start_time, "end": end_time},
            "message": "Please confirm you want to create this event."
        }
    
    event = await calendar_client.create_event(
        title=title,
        start=start_time,
        end=end_time,
        attendees=attendees or []
    )
    return {"success": True, "event_id": event.id, "link": event.html_link}

Code Execution Tools

code_tools.py
import subprocess
import tempfile
from pathlib import Path

async def execute_python(code: str, timeout: int = 30) -> dict:
    """Execute Python code in sandboxed environment."""
    # Create temp file
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write(code)
        temp_path = f.name
    
    try:
        # Run with timeout and restricted permissions
        result = subprocess.run(
            ['python', temp_path],
            capture_output=True,
            text=True,
            timeout=timeout,
            cwd=tempfile.gettempdir(),  # Isolated directory
        )
        
        return {
            "stdout": result.stdout,
            "stderr": result.stderr,
            "return_code": result.returncode
        }
    except subprocess.TimeoutExpired:
        return {"error": f"Execution timed out after {timeout}s"}
    finally:
        Path(temp_path).unlink()  # Clean up

async def execute_shell(command: str, confirmed: bool = False) -> dict:
    """Execute shell command. DANGEROUS - requires confirmation."""
    # Blacklist dangerous commands
    dangerous = ['rm -rf', 'sudo', 'mkfs', 'dd if=', '> /dev']
    if any(d in command for d in dangerous):
        return {"error": "Command blocked for safety"}
    
    if not confirmed:
        return {
            "needs_confirmation": True,
            "command": command,
            "message": "Please confirm you want to run this shell command."
        }
    
    result = subprocess.run(
        command,
        shell=True,
        capture_output=True,
        text=True,
        timeout=60
    )
    
    return {
        "stdout": result.stdout,
        "stderr": result.stderr,
        "return_code": result.returncode
    }

Browser Automation Tools

browser_tools.py
import time

from playwright.async_api import async_playwright

class BrowserTool:
    def __init__(self):
        self.browser = None
        self.page = None
    
    async def initialize(self):
        """Start browser instance."""
        playwright = await async_playwright().start()
        self.browser = await playwright.chromium.launch(headless=True)
        self.page = await self.browser.new_page()
    
    async def navigate(self, url: str) -> dict:
        """Navigate to URL and return page content."""
        await self.page.goto(url, wait_until="networkidle")
        
        # Extract readable content
        content = await self.page.evaluate("""
            () => {
                const article = document.querySelector('article') || document.body;
                return article.innerText;
            }
        """)
        
        return {
            "url": url,
            "title": await self.page.title(),
            "content": content[:30000]
        }
    
    async def screenshot(self, path: str = None) -> dict:
        """Take screenshot of current page."""
        if not path:
            path = f"screenshot_{int(time.time())}.png"
        await self.page.screenshot(path=path, full_page=True)
        return {"path": path}
    
    async def click(self, selector: str) -> dict:
        """Click element by selector."""
        try:
            await self.page.click(selector, timeout=5000)
            return {"success": True}
        except Exception as e:
            return {"error": str(e)}
    
    async def fill(self, selector: str, text: str) -> dict:
        """Fill input field."""
        try:
            await self.page.fill(selector, text)
            return {"success": True}
        except Exception as e:
            return {"error": str(e)}
    
    async def get_page_structure(self) -> dict:
        """Get accessible page structure for agent reasoning."""
        structure = await self.page.evaluate("""
            () => {
                function getAccessibleTree(el, depth = 0) {
                    if (depth > 3) return null;
                    const nodes = [];
                    for (const child of el.children) {
                        const role = child.getAttribute('role') || child.tagName.toLowerCase();
                        const text = child.innerText?.slice(0, 50);
                        if (['a', 'button', 'input', 'select', 'textarea'].includes(role) || 
                            child.getAttribute('role')) {
                            nodes.push({
                                role,
                                text,
                                selector: child.id ? '#' + child.id : null
                            });
                        }
                        const childNodes = getAccessibleTree(child, depth + 1);
                        if (childNodes) nodes.push(...childNodes);
                    }
                    return nodes;
                }
                return getAccessibleTree(document.body);
            }
        """)
        return {"interactive_elements": structure}

Tool Safety Patterns

Safety is critical. Here are patterns we use with Axis:

1. Confirmation for Dangerous Actions

def requires_confirmation(tool_name: str, params: dict) -> bool:
    """Determine if action needs human approval."""
    # Always confirm external communications
    if tool_name in ['send_email', 'post_tweet', 'send_slack']:
        return True
    
    # Always confirm financial actions
    if tool_name in ['make_purchase', 'transfer_funds']:
        return True
    
    # Confirm destructive file operations
    if tool_name == 'delete_file':
        return True
    
    # Confirm shell commands
    if tool_name == 'execute_shell':
        return True
    
    return False

2. Allowlists Over Blocklists

# Bad: trying to block dangerous things
BLOCKED_COMMANDS = ['rm', 'sudo', 'wget', ...]  # Will miss something

# Good: explicitly allow safe things
ALLOWED_COMMANDS = ['ls', 'cat', 'grep', 'find', 'wc', 'head', 'tail']

def is_safe_command(command: str) -> bool:
    parts = command.split()
    return bool(parts) and parts[0] in ALLOWED_COMMANDS

3. Rate Limiting

from collections import defaultdict
from time import time

class RateLimiter:
    def __init__(self):
        self.calls = defaultdict(list)
    
    def check(self, tool: str, limit: int, window: int) -> bool:
        """Check if tool call is within rate limit."""
        now = time()
        # Remove old calls outside window
        self.calls[tool] = [t for t in self.calls[tool] if now - t < window]
        
        if len(self.calls[tool]) >= limit:
            return False
        
        self.calls[tool].append(now)
        return True

rate_limiter = RateLimiter()

# Usage
async def execute_tool(name: str, params: dict):
    # Limit expensive operations
    if name == "web_search" and not rate_limiter.check("web_search", 10, 60):
        return {"error": "Rate limit exceeded. Max 10 searches per minute."}
    ...

4. Sandboxing

# File operations restricted to workspace
class SandboxedFileSystem:
    def __init__(self, root: Path):
        self.root = root.resolve()
    
    def validate_path(self, path: str) -> Path:
        """Ensure path is within sandbox."""
        resolved = (self.root / path).resolve()
        # Compare path components, not string prefixes:
        # "/workspace2" would pass a startswith("/workspace") check.
        if not resolved.is_relative_to(self.root):
            raise PermissionError(f"Access denied: {path} is outside workspace")
        return resolved
    
    def read(self, path: str) -> str:
        safe_path = self.validate_path(path)
        return safe_path.read_text()
    
    def write(self, path: str, content: str):
        safe_path = self.validate_path(path)
        safe_path.parent.mkdir(parents=True, exist_ok=True)
        safe_path.write_text(content)
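
The same containment check as a standalone function, with a quick demonstration that traversal attempts are rejected (paths are illustrative):

```python
from pathlib import Path

def validate_path(root: Path, path: str) -> Path:
    """Resolve `path` inside `root`, refusing anything that escapes it."""
    root = root.resolve()
    resolved = (root / path).resolve()
    if not resolved.is_relative_to(root):
        raise PermissionError(f"Access denied: {path} is outside workspace")
    return resolved

# Demonstration
workspace = Path("/tmp/agent-workspace")
print(validate_path(workspace, "notes/todo.md"))   # inside the sandbox: OK
try:
    validate_path(workspace, "../../etc/passwd")   # escapes the sandbox
except PermissionError as e:
    print("blocked:", e)
```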

⚠️ Security is Non-Negotiable

Every tool you add is a potential attack vector. Assume the LLM might be tricked into misusing tools (prompt injection). Defense in depth: validate inputs, confirm dangerous actions, sandbox execution, log everything.

6. Memory and Context Management

Memory is what transforms a stateless text generator into something that feels intelligent over time. But it's also one of the hardest problems in agent design.

The Memory Challenge

Context windows are limited. Even Claude's 200K tokens fill up quickly with:

- The system prompt and tool definitions
- Conversation history
- Retrieved memories and documents
- Tool results (search output, file contents, query rows)

You need strategies for:

  1. Deciding what goes into the context window
  2. Storing information that doesn't fit
  3. Retrieving relevant information when needed
  4. Forgetting information that's no longer useful
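
For the first of these, an explicit token budget helps. A rough sketch using the common ~4-characters-per-token heuristic (the function, the 70/30 split, and the budget numbers are all illustrative, not exact tokenizer math):

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return len(text) // 4

def fit_to_budget(system_prompt: str, memories: list[str], history: list[str],
                  budget: int = 100_000) -> dict:
    """Greedily assemble context: system prompt first, then as much
    recent history and retrieved memory as the budget allows."""
    used = estimate_tokens(system_prompt)
    kept_history, kept_memories = [], []

    # Most recent history first: walk backwards, then restore order
    for msg in reversed(history):
        cost = estimate_tokens(msg)
        if used + cost > budget * 0.7:  # reserve ~30% for memories
            break
        kept_history.insert(0, msg)
        used += cost

    for mem in memories:  # assumed pre-sorted by relevance
        cost = estimate_tokens(mem)
        if used + cost > budget:
            break
        kept_memories.append(mem)
        used += cost

    return {"history": kept_history, "memories": kept_memories, "tokens": used}
```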

Memory Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     CONTEXT WINDOW                               │
│   System Prompt + Recent History + Retrieved Memories            │
│                      (Token Limited)                             │
└──────────────────────────────────────────────────────────────────┘
                              ▲
                              │ Retrieval
         ┌────────────────────┼────────────────────┐
         │                    │                    │
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│  WORKING MEMORY │  │  CONVERSATION   │  │   LONG-TERM     │
│                 │  │     STORE       │  │    MEMORY       │
│ Current task    │  │                 │  │                 │
│ Active goals    │  │ Full history    │  │ User prefs      │
│ Temp variables  │  │ Summaries       │  │ Learned facts   │
│                 │  │                 │  │ Procedures      │
│ (In-memory)     │  │ (Database)      │  │ (Vector store)  │
└─────────────────┘  └─────────────────┘  └─────────────────┘

Strategy 1: Conversation Summarization

Instead of keeping full conversation history, periodically summarize:

summarization.py
from datetime import datetime

class ConversationMemory:
    def __init__(self, llm_client, max_messages: int = 20):
        self.llm = llm_client
        self.max_messages = max_messages
        self.messages = []
        self.summaries = []
    
    def add_message(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        
        # Summarize when history gets too long
        if len(self.messages) > self.max_messages:
            self._summarize_oldest()
    
    def _summarize_oldest(self):
        """Summarize oldest messages and remove them."""
        # Take first half of messages
        to_summarize = self.messages[:self.max_messages // 2]
        
        summary = self.llm.chat([{
            "role": "user",
            "content": f"""Summarize this conversation concisely, preserving key facts, 
            decisions, and context needed for continuation:
            
            {self._format_messages(to_summarize)}"""
        }])
        
        self.summaries.append({
            "timestamp": datetime.now(),
            "summary": summary,
            "message_count": len(to_summarize)
        })
        
        # Remove summarized messages
        self.messages = self.messages[self.max_messages // 2:]
    
    def get_context(self) -> list:
        """Get context for LLM, including summaries."""
        context = []
        
        # Include summaries of older conversations
        if self.summaries:
            summary_text = "\n\n".join(
                f"[Earlier: {s['summary']}]" for s in self.summaries[-3:]  # Last 3 summaries
            )
            context.append({
                "role": "user",
                "content": f"Previous conversation context:\n{summary_text}"
            })
        
        # Include recent messages
        context.extend(self.messages)
        
        return context

Strategy 2: Semantic Retrieval (RAG)

Store memories as embeddings and retrieve relevant ones:

semantic_memory.py
import numpy as np
from datetime import datetime
from typing import List

class SemanticMemory:
    def __init__(self, embedding_model):
        self.embedding_model = embedding_model
        self.memories = []  # List of (text, embedding, metadata)
    
    def store(self, text: str, metadata: dict = None):
        """Store a memory with its embedding."""
        embedding = self.embedding_model.embed(text)
        self.memories.append({
            "text": text,
            "embedding": embedding,
            "metadata": metadata or {},
            "timestamp": datetime.now()
        })
    
    def retrieve(self, query: str, top_k: int = 5) -> List[dict]:
        """Retrieve most relevant memories."""
        query_embedding = self.embedding_model.embed(query)
        
        # Calculate similarities
        scored = []
        for memory in self.memories:
            similarity = self._cosine_similarity(
                query_embedding, 
                memory["embedding"]
            )
            scored.append((similarity, memory))
        
        # Return top-k
        scored.sort(reverse=True, key=lambda x: x[0])
        return [
            {"score": score, **memory} 
            for score, memory in scored[:top_k]
        ]
    
    def _cosine_similarity(self, a: np.ndarray, b: np.ndarray) -> float:
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Usage with OpenAI embeddings
from openai import OpenAI

class OpenAIEmbedding:
    def __init__(self):
        self.client = OpenAI()
    
    def embed(self, text: str) -> np.ndarray:
        response = self.client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return np.array(response.data[0].embedding)

# Initialize
memory = SemanticMemory(OpenAIEmbedding())

# Store memories
memory.store("User prefers concise responses", {"type": "preference"})
memory.store("User's company is in the cannabis industry", {"type": "fact"})
memory.store("Last project was building an inventory system", {"type": "history"})

# Retrieve relevant memories
relevant = memory.retrieve("What kind of business does the user have?")
# Returns the cannabis industry memory

Strategy 3: Structured Memory Files

OpenClaw uses a file-based approach that's simple but effective:

workspace/MEMORY.md
# Long-Term Memory

## User Preferences
- Prefers concise, actionable responses
- Usually works US Pacific time zone
- Technical background (can show code)

## Important Facts
- Company: As Above Technologies
- Industry: AI/Software
- Key product: OpenClaw agent platform

## Learned Procedures
- For email drafts: always ask about tone before drafting
- For research: provide sources and confidence levels
- For code: include comments and explain reasoning

## Recent Decisions
- 2026-01-15: Decided to use Claude Opus 4 for complex reasoning tasks
- 2026-01-20: Established daily standup format for project updates

## Ongoing Projects
- Building customer onboarding automation
- Writing technical documentation for API

The agent loads this file into context at session start and updates it when learning new information:

memory_management.py
class FileBasedMemory:
    def __init__(self, workspace: Path):
        self.workspace = workspace
        self.memory_file = workspace / "MEMORY.md"
        self.daily_file = workspace / f"memory/{datetime.now().strftime('%Y-%m-%d')}.md"
    
    def load_context(self) -> str:
        """Load memory for session start."""
        context = ""
        
        # Long-term memory
        if self.memory_file.exists():
            context += f"## Long-term Memory\n{self.memory_file.read_text()}\n\n"
        
        # Recent daily notes
        memory_dir = self.workspace / "memory"
        if memory_dir.exists():
            recent_files = sorted(memory_dir.glob("*.md"))[-3:]  # Last 3 days
            for f in recent_files:
                context += f"## Notes from {f.stem}\n{f.read_text()}\n\n"
        
        return context
    
    def append_daily(self, note: str):
        """Add to today's memory file."""
        self.daily_file.parent.mkdir(parents=True, exist_ok=True)
        with open(self.daily_file, "a") as f:
            f.write(f"\n- {datetime.now().strftime('%H:%M')}: {note}")
    
    def update_long_term(self, section: str, content: str):
        """Update a section of long-term memory."""
        if not self.memory_file.exists():
            self.memory_file.write_text(f"# Long-Term Memory\n\n## {section}\n{content}")
            return
        
        current = self.memory_file.read_text()
        # Find and update section, or append
        # (Implementation details omitted for brevity)
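
The omitted section update can be handled with a small amount of regex work. One possible implementation (a sketch, not the original code; it assumes `content` contains no regex escape sequences):

```python
import re

def update_markdown_section(document: str, section: str, content: str) -> str:
    """Replace the body of `## {section}` in a markdown document,
    or append the section if it doesn't exist yet."""
    pattern = re.compile(
        rf"(^## {re.escape(section)}\n)(.*?)(?=^## |\Z)",
        re.MULTILINE | re.DOTALL,
    )
    if pattern.search(document):
        # \g<1> keeps the heading line; the old body is replaced
        return pattern.sub(rf"\g<1>{content}\n\n", document)
    return document.rstrip() + f"\n\n## {section}\n{content}\n"
```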

Strategy 4: Hierarchical Memory

Different information has different lifespans and access patterns:

hierarchical_memory.py
class HierarchicalMemory:
    """
    Tier 1: Always in context (user prefs, core facts)
    Tier 2: Retrieved on relevance (episodic memory)
    Tier 3: Retrieved on explicit request (archived knowledge)
    """
    
    def __init__(self):
        self.tier1_always = {}      # Small, critical info
        self.tier2_indexed = []     # Semantic search
        self.tier3_archived = {}    # Keyword lookup
    
    def build_context(self, query: str) -> str:
        """Build memory context for a query."""
        context_parts = []
        
        # Always include tier 1
        if self.tier1_always:
            context_parts.append("Core context:")
            for key, value in self.tier1_always.items():
                context_parts.append(f"- {key}: {value}")
        
        # Retrieve relevant tier 2
        relevant = self._semantic_search(query, self.tier2_indexed, top_k=5)
        if relevant:
            context_parts.append("\nRelevant memories:")
            for mem in relevant:
                context_parts.append(f"- {mem['text']}")
        
        # Tier 3 only if explicitly referenced
        # (agent can request via tool)
        
        return "\n".join(context_parts)
    
    def promote_memory(self, memory_id: str, to_tier: int):
        """Move memory between tiers based on access patterns."""
        # Frequently accessed tier 2 -> tier 1
        # Rarely accessed tier 2 -> tier 3
        pass
    
    def consolidate(self):
        """Periodically consolidate and reorganize memories."""
        # Merge similar memories
        # Archive old unused memories
        # Update tier 1 based on importance
        pass

The Axis Memory System

Here's how memory actually works in Axis:

💡 Axis Memory Architecture
  • SOUL.md: Permanent personality, values, behavioral guidelines
  • USER.md: Information about the human(s) Axis works with
  • MEMORY.md: Curated long-term memory (manually and auto-updated)
  • memory/YYYY-MM-DD.md: Daily logs of significant events
  • TOOLS.md: Environment-specific information (API keys location, server names)
  • Conversation history: Last 20-30 turns, summarized when longer

The key insight: treat memory like a well-organized filing system, not a database. The agent can read and update these files, creating a form of self-modifying memory that persists across sessions and is human-readable for debugging.
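
Assembling that filing system into a session-start context can be sketched as a few file reads (the file names follow the Axis layout above; the function itself is illustrative):

```python
from datetime import datetime
from pathlib import Path

def build_session_context(workspace: Path, days_of_notes: int = 3) -> str:
    """Assemble the agent's starting context from its memory files."""
    parts = []
    for name in ("SOUL.md", "USER.md", "MEMORY.md", "TOOLS.md"):
        f = workspace / name
        if f.exists():
            parts.append(f"## {name}\n{f.read_text()}")

    # Most recent daily logs, oldest first so the narrative reads forward
    mem_dir = workspace / "memory"
    notes = sorted(mem_dir.glob("*.md"))[-days_of_notes:] if mem_dir.exists() else []
    for f in notes:
        parts.append(f"## Daily log {f.stem}\n{f.read_text()}")

    parts.append(f"Session start: {datetime.now():%Y-%m-%d %H:%M}")
    return "\n\n".join(parts)
```

Because everything is plain text on disk, you can inspect or correct the agent's "mind" with a text editor.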

7. Deployment Options

You've built an agent. Now where does it run? The deployment choice affects reliability, cost, latency, and what's possible.

Option 1: Local Development Machine

Simplest · Not for Production

Run the agent on your laptop/desktop. Good for development and personal use.

local_run.sh
# Simple local deployment
python agent.py

# Or with auto-reload for development
watchmedo auto-restart --patterns="*.py" -- python agent.py

Option 2: Cloud VM (Always-On)

Moderate Complexity · Reliable

Run on a cloud server that's always available. The most common production choice.

Provider Smallest Useful Instance Monthly Cost
DigitalOcean 2GB RAM, 1 vCPU ~$12
AWS EC2 t3.small (2GB) ~$15
Google Cloud e2-small (2GB) ~$13
Hetzner CX11 (2GB) ~$4
deploy_vm.sh
# Setup on fresh Ubuntu VM

# Install dependencies
sudo apt update && sudo apt install -y python3.11 python3.11-venv

# Create project directory
mkdir -p ~/agent && cd ~/agent

# Setup virtual environment
python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Create systemd service for auto-restart
sudo tee /etc/systemd/system/agent.service << EOF
[Unit]
Description=AI Agent Service
After=network.target

[Service]
Type=simple
User=$USER
WorkingDirectory=$HOME/agent
Environment="PATH=$HOME/agent/venv/bin"
ExecStart=$HOME/agent/venv/bin/python agent.py
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

# Enable and start
sudo systemctl enable agent
sudo systemctl start agent

# Check logs
sudo journalctl -u agent -f

Option 3: Container Deployment

Moderate Complexity · Scalable

Package your agent as a Docker container for portability and scaling.

Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Create workspace directory
RUN mkdir -p /app/workspace

# Run agent
CMD ["python", "agent.py"]
docker-compose.yml
version: '3.8'

services:
  agent:
    build: .
    restart: always
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - BRAVE_API_KEY=${BRAVE_API_KEY}
    volumes:
      - ./workspace:/app/workspace  # Persist workspace
      - ./logs:/app/logs
    ports:
      - "8080:8080"  # If exposing HTTP API

Option 4: Serverless / Functions

Pay-per-use · Complex State

Run agent logic in serverless functions. Good for event-driven agents with external state management.

lambda_handler.py
import json
from agent import SimpleAgent
import boto3

# Use DynamoDB for state
dynamodb = boto3.resource('dynamodb')
state_table = dynamodb.Table('agent-state')

def handler(event, context):
    """AWS Lambda handler for agent requests."""
    
    # Parse input
    body = json.loads(event.get('body', '{}'))
    user_id = body.get('user_id')
    message = body.get('message')
    
    # Load state
    state = state_table.get_item(Key={'user_id': user_id}).get('Item', {})
    
    # Initialize agent with state
    agent = SimpleAgent()
    agent.conversation_history = state.get('history', [])
    
    # Process message
    response = agent.chat(message)  # Note: need to make this sync for Lambda
    
    # Save state
    state_table.put_item(Item={
        'user_id': user_id,
        'history': agent.conversation_history[-20:]  # Keep last 20
    })
    
    return {
        'statusCode': 200,
        'body': json.dumps({'response': response})
    }
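
The synchronous bridge flagged in the comment above can be a one-liner with `asyncio.run`, which gives each Lambda invocation a fresh event loop (a sketch; `run_sync` is a name introduced here, not part of the SDK):

```python
import asyncio

def run_sync(coro):
    """Bridge an async agent call into a synchronous Lambda handler.
    asyncio.run creates and tears down an event loop per call,
    which fits Lambda's one-invocation-at-a-time execution model."""
    return asyncio.run(coro)

# In the handler, instead of `agent.chat(message)`:
#     response = run_sync(agent.chat(message))
```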

Option 5: Platform-as-a-Service

Easiest Ops · Higher Cost

Use a platform designed for agent deployment. Trade flexibility for convenience.

Platform Focus Starting Cost
Railway General Python apps $5/mo + usage
Render Web services $7/mo
Modal AI/ML workloads Pay-per-use
LangServe LangChain agents Varies

How Axis Is Deployed

🚀 Axis Production Architecture
  • Runtime: OpenClaw on a dedicated VM (8GB RAM, 4 vCPU); handles the main agent loop, tool execution, and memory
  • Storage: Local SSD for workspace files plus PostgreSQL for structured data; memory files on disk, analytics in the DB
  • Interfaces: Web chat (primary), Discord (team), CLI (admin), API (integrations); multiple ways to interact based on context
  • Monitoring: Custom dashboard plus Slack alerts; tracks costs, errors, and usage patterns
  • Backup: Daily workspace snapshots to S3; memory is critical, never lose it

8. Real Example: How We Built Axis

Theory is nice, but nothing teaches like real experience. Here's the actual story of building Axis, the AI agent that runs significant portions of As Above Technologies.

The Beginning: January 2025

We didn't set out to build an agent framework. We needed a capable AI assistant for our own operationsβ€”managing multiple business units, handling customer inquiries, monitoring systems, creating content. The existing tools (ChatGPT, basic automation) weren't cutting it.

Initial requirements:

- Handle customer inquiries across multiple business units
- Monitor systems and flag issues
- Research markets and draft content
- Remember context across sessions

Phase 1: Proof of Concept (Weeks 1-2)

Started with a minimal implementation: Claude API + file system tools + basic conversation memory. No framework, just ~500 lines of Python.

First version (simplified)
import anthropic
from pathlib import Path

class Axis:
    def __init__(self):
        self.client = anthropic.Anthropic()
        self.history = []
        self.workspace = Path("./workspace")

    def chat(self, message):
        # Load context files from the workspace
        context = self._load_context_files()

        # Call Claude with basic tools (tool schemas defined elsewhere)
        response = self.client.messages.create(
            model="claude-3-opus-20240229",
            max_tokens=4096,
            system=f"You are Axis. Context:\n{context}",
            messages=self.history + [{"role": "user", "content": message}],
            tools=[read_file_tool, write_file_tool, search_tool]
        )

        # Handle tool calls...
        return self._process_response(response)

What we learned:

Phase 2: Tool Explosion (Weeks 3-6)

Once the basics worked, we kept adding tools. Every "I wish Axis could..." became a new tool.

Tools added:

What we learned:

⚠️ The Email Incident

Week 4: Axis drafted an email to a customer and, due to a bug in confirmation flow, actually sent it. The email was fineβ€”polite, accurate, helpful. But we hadn't reviewed it. The customer was happy; we were terrified. Added multiple confirmation layers after that.
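The confirmation layers boil down to a simple pattern: side-effecting tools don't execute directly; they queue a pending action that a human approves first. A minimal sketch of the idea; the action names and `ConfirmationGate` API are illustrative, not Axis's actual code:

```python
# Hypothetical confirmation gate: irreversible actions are queued, not executed.
ACTIONS_REQUIRING_APPROVAL = {"send_email", "delete_file", "post_message"}

class PendingAction:
    def __init__(self, name: str, params: dict):
        self.name = name
        self.params = params
        self.approved = False

class ConfirmationGate:
    def __init__(self):
        self.queue = []

    def request(self, name: str, params: dict) -> dict:
        """Queue an action if it needs approval; otherwise allow it immediately."""
        if name not in ACTIONS_REQUIRING_APPROVAL:
            return {"status": "allowed"}
        self.queue.append(PendingAction(name, params))
        return {"status": "pending_approval", "action": name}

    def approve(self, index: int) -> PendingAction:
        """A human reviews the queued action and explicitly signs off."""
        self.queue[index].approved = True
        return self.queue[index]
```

The key design choice is that the gate sits between the LLM's tool call and the tool's execution, so a bug in the model's judgment can't reach the outside world on its own.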

Phase 3: Memory Evolution (Weeks 6-10)

Early Axis had goldfish memory. Each day felt like meeting a new assistant. We iterated on memory extensively:

  1. Version 1: Full conversation history in context
    Problem: Context window fills up, expensive, irrelevant old conversations
  2. Version 2: Automatic summarization
    Problem: Lost important details in summaries
  3. Version 3: Hybrid file + conversation system
    Solution: MEMORY.md for curated important facts, conversation history for recent context
  4. Version 4: Daily notes + long-term memory
    Current: Daily logs capture everything, MEMORY.md is curated highlights

The breakthrough was realizing memory should be editable by both human and agent. When Axis learns something important, it can write to MEMORY.md. When it gets something wrong, we can correct it directly.
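The resulting hybrid scheme is simple to sketch. Everything below is an illustrative reconstruction, not Axis's actual implementation; only the MEMORY.md and daily-log file layout comes from the description above:

```python
from datetime import date
from pathlib import Path

class HybridMemory:
    """Sketch of the hybrid scheme: curated MEMORY.md plus append-only daily logs."""

    def __init__(self, workspace: Path):
        self.workspace = workspace
        self.memory_file = workspace / "MEMORY.md"
        workspace.mkdir(exist_ok=True)

    def _daily_log(self) -> Path:
        return self.workspace / f"{date.today().isoformat()}.md"

    def log(self, entry: str):
        """Daily notes capture everything, uncurated."""
        with self._daily_log().open("a") as f:
            f.write(entry + "\n")

    def remember(self, fact: str):
        """Both agent and human can append curated facts to MEMORY.md."""
        with self.memory_file.open("a") as f:
            f.write(f"- {fact}\n")

    def load_context(self) -> str:
        """Assemble curated memory plus today's log for the prompt."""
        parts = []
        if self.memory_file.exists():
            parts.append(self.memory_file.read_text())
        if self._daily_log().exists():
            parts.append(self._daily_log().read_text())
        return "\n".join(parts)
```

Because both files are plain text on disk, a human can edit MEMORY.md with any editor, and the agent writes to it through an ordinary file tool: no special memory API required.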

Phase 4: Multi-Model Strategy (Weeks 10-14)

Using Claude Opus for everything was expensive ($75/M output tokens). We implemented model routing:

model_routing.py
def select_model(task_type: str, complexity: str) -> str:
    """Route to appropriate model based on task."""
    
    if task_type == "simple_question":
        return "claude-3-5-sonnet-20241022"  # Fast, cheap
    
    elif task_type == "complex_reasoning":
        return "claude-opus-4-20250514"  # Best quality
    
    elif task_type == "code_generation":
        return "claude-sonnet-4-20250514"  # Good at code
    
    elif task_type == "image_analysis":
        return "gpt-4o"  # Strong vision
    
    else:
        return "claude-3-5-sonnet-20241022"  # Default: balanced

Result: 60% cost reduction while maintaining quality for complex tasks.

Phase 5: Proactive Behavior (Weeks 14-20)

Axis was helpful when asked, but we wanted proactive assistance. Enter the heartbeat system:

HEARTBEAT.md
# Heartbeat Checklist

## Every heartbeat
- Check for urgent emails (spam filter: ignore marketing)
- Verify production systems are healthy

## Morning (first heartbeat after 8am)
- Summarize calendar for the day
- Check overnight messages

## Every 4 hours
- Review project status in GitHub
- Check analytics dashboards

## Flags
- lastEmailCheck: 2026-01-28T14:30:00
- lastSystemCheck: 2026-01-28T14:45:00

Current State: Axis in 2026

After a year of development and daily use, here's where Axis stands:

| Metric | Value |
|----------------------|----------------------------------------|
| Daily interactions | 50-100 messages |
| Tool calls per day | 200-400 |
| Active tools | 32 |
| Memory files | ~400 daily logs, 1 MEMORY.md (~15KB) |
| Uptime | 99.7% (excluding planned maintenance) |
| Monthly API cost | $150-300 (varies with usage) |
| Estimated time saved | 60-80 hours/month |

What Axis does regularly:

βœ… The Biggest Win

The compound effect of persistent memory. Axis now knows our business deeplyβ€”customer names, product history, past decisions, learned preferences. It's not starting from zero each interaction. This accumulated context is worth more than any individual capability.

9. Common Pitfalls and How to Avoid Them

We made many mistakes building Axis. Here's what to watch for:

Pitfall 1: Over-Engineering Before Understanding

The trap: Building elaborate infrastructure before proving the basic concept works. Spending weeks on a perfect memory system before having a useful agent.

How to avoid:

Pitfall 2: Tool Definition Neglect

The trap: Writing clear code but vague tool descriptions. The LLM doesn't see your codeβ€”only the descriptions.

How to avoid:

Tool description checklist
# For each tool, answer:
# 1. What does it do? (one sentence)
# 2. When should the agent use it? (specific scenarios)
# 3. When should the agent NOT use it? (common mistakes)
# 4. What parameters does it need? (with examples)
# 5. What does it return? (success and error cases)
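To see the checklist in action, compare a vague description with one that answers those five questions. The tool itself is hypothetical, shown in an Anthropic-style schema:

```python
# Vague: the LLM has to guess when and how to use it.
vague_tool = {
    "name": "search",
    "description": "Searches for things.",
}

# Specific: answers the checklist questions above.
clear_tool = {
    "name": "search_web",
    "description": (
        "Search the public web for current information. "
        "Use when the answer depends on recent events or facts "
        "outside your training data. Do NOT use for questions about "
        "the user's own files (use read_file instead). "
        "Returns up to 10 results as {title, url, snippet}; "
        "returns an empty list if nothing matched."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query, e.g. 'Anthropic API pricing 2025'",
            }
        },
        "required": ["query"],
    },
}
```

Notice that the clear version spends most of its words on when to use the tool and when not to; that is the part the model cannot infer from your code.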

Pitfall 3: Insufficient Safety Guardrails

The trap: Trusting the LLM to be careful. It's not malicious, but it can be confidently wrong.

How to avoid:
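One guardrail pattern that works well: validate every tool call against an explicit policy before execution, rather than trusting the model's judgment. A sketch with hypothetical rules; adapt the allowlist and limits to your own tools:

```python
# Hypothetical policy check run before every tool call.
ALLOWED_PATHS = ("/workspace",)
MAX_FILE_WRITES_PER_TURN = 5

class GuardrailViolation(Exception):
    pass

def check_tool_call(name: str, params: dict, writes_this_turn: int = 0) -> bool:
    """Raise GuardrailViolation rather than trusting the model to be careful."""
    if name == "write_file":
        path = params.get("path", "")
        if not any(path.startswith(p) for p in ALLOWED_PATHS):
            raise GuardrailViolation(f"write outside workspace: {path}")
        if writes_this_turn >= MAX_FILE_WRITES_PER_TURN:
            raise GuardrailViolation("too many writes in one turn")
    if name == "run_shell" and "rm -rf" in params.get("command", ""):
        raise GuardrailViolation("destructive shell command blocked")
    return True
```

Deterministic checks like these are not a substitute for human review of high-stakes actions, but they turn "confidently wrong" into a loud error instead of a silent disaster.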

Pitfall 4: Context Window Stuffing

The trap: Putting everything possible into context, assuming more information is always better.

How to avoid:

Pitfall 5: Ignoring Cost Management

The trap: Not monitoring API costs until you get a surprise bill.

How to avoid:

cost_tracking.py
from datetime import datetime

class CostLimitExceeded(Exception):
    """Raised when the daily spend cap is hit."""

class CostTracker:
    # Prices in dollars per 1K tokens
    PRICES = {
        "claude-opus-4-20250514": {"input": 0.015, "output": 0.075},
        "claude-sonnet-4-20250514": {"input": 0.003, "output": 0.015},
        "claude-3-5-sonnet-20241022": {"input": 0.003, "output": 0.015},
        "gpt-4o": {"input": 0.0025, "output": 0.01},
    }

    def __init__(self, daily_limit: float = 10.0):
        self.daily_limit = daily_limit
        self.daily_spend = 0.0
        self.last_reset = datetime.now().date()

    def track(self, model: str, input_tokens: int, output_tokens: int) -> float:
        # Reset the counter at the start of each new day
        if datetime.now().date() != self.last_reset:
            self.daily_spend = 0.0
            self.last_reset = datetime.now().date()

        # Cost = tokens / 1000 * price per 1K (unknown models get a conservative default)
        prices = self.PRICES.get(model, {"input": 0.01, "output": 0.03})
        cost = (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1000
        self.daily_spend += cost

        # Alert at 80% of the limit
        if self.daily_spend > self.daily_limit * 0.8:
            self._send_alert(f"Approaching daily limit: ${self.daily_spend:.2f}/${self.daily_limit}")

        # Hard stop over the limit
        if self.daily_spend > self.daily_limit:
            raise CostLimitExceeded(f"Daily limit of ${self.daily_limit} exceeded")

        return cost

    def _send_alert(self, message: str):
        # Stub: wire this up to Slack/email in production
        print(f"ALERT: {message}")

Pitfall 6: Hallucination Blind Trust

The trap: Assuming the agent's output is accurate, especially for facts.

How to avoid:
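One cheap mitigation is to mechanically cross-check the agent's claims against the source material it was given. The sketch below only catches invented numbers, which are often the most dangerous hallucinations; it is a heuristic, not a guarantee:

```python
import re

def claims_are_grounded(answer: str, source_text: str) -> bool:
    """Cheap sanity check: every number the agent states should appear
    somewhere in the source material it was given. Catches invented
    figures, not subtler hallucinations."""
    numbers = re.findall(r"\d+(?:\.\d+)?", answer)
    return all(n in source_text for n in numbers)
```

When the check fails, a reasonable policy is to ask the agent to regenerate with an explicit instruction to quote its source, or to flag the answer for human review.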

Pitfall 7: Poor Error Handling

The trap: Agent crashes or gets stuck when tools fail or return unexpected results.

How to avoid:

Error handling pattern
import asyncio

async def execute_with_recovery(tool_name: str, params: dict, max_retries: int = 3):
    """Execute a tool with retry and fallback.
    Assumes an async execute_tool(name, params) helper defined elsewhere."""
    last_error = None

    for attempt in range(max_retries):
        try:
            result = await execute_tool(tool_name, params)
            if result.get("success") or "error" not in result:
                return result
            last_error = result.get("error")
        except Exception as e:
            last_error = str(e)
            await asyncio.sleep(2 ** attempt)  # Exponential backoff

    # Return an informative error the LLM can act on
    return {
        "error": last_error,
        "suggestion": f"Tool '{tool_name}' failed after {max_retries} attempts. Consider an alternative approach.",
        "attempted_params": params
    }

πŸ’‘ The Meta-Lesson

Most pitfalls come from treating the LLM as either too smart or too dumb. It's neither. It's a powerful but imperfect tool that needs guardrails, clear instructions, and human oversight for high-stakes operations.

10. Cost Considerations and Scaling

AI agents aren't free. Understanding costs helps you build sustainably and scale wisely.

Understanding API Costs

API pricing is per token (roughly 4 characters = 1 token). Input tokens (what you send) are cheaper than output tokens (what you receive).

| Model | Input ($/1M) | Output ($/1M) | Typical Request Cost |
|-------------------|--------------|---------------|----------------------|
| GPT-4o | $2.50 | $10.00 | $0.01-0.05 |
| Claude 3.5 Sonnet | $3.00 | $15.00 | $0.02-0.08 |
| Claude Sonnet 4 | $3.00 | $15.00 | $0.02-0.08 |
| Claude Opus 4 | $15.00 | $75.00 | $0.10-0.50 |
| GPT-4 Turbo | $10.00 | $30.00 | $0.05-0.20 |
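To make the table concrete: a typical agent turn might send ~3,000 input tokens (system prompt, history, tool schemas) and receive ~500 output tokens. The arithmetic, with prices expressed per million tokens:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Cost in dollars, given per-million-token prices."""
    return input_tokens * input_per_m / 1e6 + output_tokens * output_per_m / 1e6

# Claude Sonnet at $3/M input, $15/M output:
# 3,000 in + 500 out = $0.009 + $0.0075 = $0.0165 per request
```

At 200 such requests a day, that is roughly $3.30/day, or about $100/month, which is why the routing and caching strategies below matter.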

Cost Breakdown for Typical Agent

Here's what Axis typically costs per month:

πŸ’° Monthly Cost Breakdown (Axis)
- LLM API: $120-250/month (~70% Sonnet, ~25% Opus, ~5% GPT-4o for vision)
- Search API: $15-30/month (Brave Search API for web queries)
- Embeddings: $5-10/month (OpenAI embeddings for semantic search)
- Compute: $30/month (VPS hosting for 24/7 operation)
- Total: $170-320/month (varies with usage intensity)

Cost Optimization Strategies

1. Model Routing

Use expensive models only when needed:

# Simple classification to route requests
import re

def classify_complexity(message: str) -> str:
    """Quick heuristic for task complexity."""

    # Simple patterns β†’ cheap model
    simple_patterns = [
        r'^(what|when|where|who) is',   # Basic questions
        r'^(hi|hello|hey)',             # Greetings
        r'^(thanks|thank you)',         # Acknowledgments
    ]

    for pattern in simple_patterns:
        if re.match(pattern, message.lower()):
            return "simple"

    # Complex indicators β†’ expensive model
    complex_indicators = [
        "analyze", "compare", "explain why", "strategy",
        "write a", "draft", "create", "plan"
    ]

    if any(ind in message.lower() for ind in complex_indicators):
        return "complex"

    return "medium"  # Default

2. Caching

Cache responses for repeated queries:

import hashlib
import time

class ResponseCache:
    def __init__(self, ttl_seconds: int = 3600):
        self.cache = {}
        self.ttl = ttl_seconds

    def get_key(self, messages: list, tools: list) -> str:
        """Generate a deterministic cache key from the request."""
        content = str(messages) + str(tools)
        return hashlib.sha256(content.encode()).hexdigest()

    def get(self, key: str) -> dict | None:
        """Return the cached response if it hasn't expired."""
        if key in self.cache:
            entry = self.cache[key]
            if time.time() - entry["timestamp"] < self.ttl:
                return entry["response"]
            del self.cache[key]  # Expired: evict
        return None

    def set(self, key: str, response: dict):
        """Cache a response with the current timestamp."""
        self.cache[key] = {
            "response": response,
            "timestamp": time.time()
        }

3. Context Pruning

Minimize tokens in context:

import re

def optimize_context(context: str, max_tokens: int = 10000) -> str:
    """Reduce context size while preserving information.
    Assumes an estimate_tokens() helper defined elsewhere."""

    # Remove excessive whitespace
    context = re.sub(r'\n\s*\n', '\n\n', context)
    context = re.sub(r'  +', ' ', context)

    # Truncate if still too long
    if estimate_tokens(context) > max_tokens:
        # Keep the beginning and end, summarize the middle
        lines = context.split('\n')
        keep_start = max(1, len(lines) // 4)
        keep_end = max(1, len(lines) // 4)
        middle = lines[keep_start:-keep_end]

        context = '\n'.join(
            lines[:keep_start] +
            [f"[... {len(middle)} lines summarized ...]"] +
            lines[-keep_end:]
        )

    return context

4. Prompt Optimization

Shorter prompts = lower costs:
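A before/after illustration: both prompts below ask for the same behavior, but the concise one is a fraction of the tokens, and that saving applies to every single request. The 4-characters-per-token heuristic comes from the pricing discussion above:

```python
# The same instruction, verbose vs. trimmed. The verbose version costs
# several times as much on every request, for no quality gain.
verbose = (
    "You are an extremely helpful and knowledgeable assistant. When the "
    "user asks you a question, you should always try your very best to "
    "provide an accurate, complete, and well-structured answer. Please "
    "make sure to be polite and professional at all times, and remember "
    "to cite your sources whenever that is possible and appropriate."
)
concise = (
    "Answer accurately and completely. Be professional. Cite sources "
    "when possible."
)

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token."""
    return len(text) // 4

savings = 1 - estimate_tokens(concise) / estimate_tokens(verbose)
```

Because the system prompt is resent with every request, trimming it pays off continuously; it is usually the single easiest cost win.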

Scaling Considerations

As usage grows, consider these patterns:

| Scale | Requests/Day | Architecture | Estimated Cost |
|------------|---------------|-----------------------------|-----------------|
| Personal | 10-50 | Single instance | $30-100/mo |
| Small Team | 100-500 | Single instance + queue | $150-400/mo |
| Business | 1,000-5,000 | Load balanced + workers | $500-2,000/mo |
| Enterprise | 10,000+ | Distributed + local models | $2,000+/mo |

When to Consider Local Models

Running models locally (Llama 3, Mixtral) makes sense when:

Hardware requirements for local deployment:

| Model | VRAM Required | Approximate Hardware Cost |
|---------------|----------------|--------------------------------|
| Llama 3 8B | 16GB | $400-800 (RTX 4080) |
| Llama 3 70B | 40-80GB | $2,000-8,000 (A100/multi-GPU) |
| Mixtral 8x7B | 24-48GB | $1,000-3,000 |

πŸ’‘ The Hybrid Approach

Many production systems use a hybrid: local models for high-volume, simple tasks (classification, embedding, basic Q&A), cloud APIs for complex reasoning. This balances cost and capability.


Conclusion: Your Path Forward

You now have a comprehensive understanding of how to build AI agentsβ€”from the fundamental architecture to production deployment. But knowledge without action is just entertainment. Here's your concrete path forward:

🎯 Your 30-Day Action Plan
- Week 1: Build the basics. Get the simple agent from Section 4 running, add one custom tool, and experience the fundamentals firsthand.
- Week 2: Add memory and tools. Implement file-based memory, add 2-3 tools relevant to your use case, and start tracking costs.
- Week 3: Deploy and use daily. Put it on a server, make it accessible, use it for real tasks, and note what's missing.
- Week 4: Iterate based on reality. Fix the pain points, add the tools you actually need, improve memory based on what matters, and share what you've built.

Building Axis changed how we work at As Above Technologies. The compound effect of persistent memory and capable tools creates something genuinely usefulβ€”an assistant that knows your business and can actually help.

The technology is accessible. The patterns are proven. The only question is whether you'll build or watch others build. We hope you choose to build.

If you build something, we'd love to see it. Share your agent projects, ask questions, and join the community of builders creating the next generation of AI tools.
