- 1. AI Agents vs Chatbots: Understanding the Difference
- 2. The Agent Architecture: LLM + Tools + Memory + Orchestration
- 3. Platforms Compared: OpenClaw, LangChain, AutoGPT, CrewAI
- 4. Step-by-Step: Building a Simple Agent
- 5. Adding Tools and Capabilities
- 6. Memory and Context Management
- 7. Deployment Options
- 8. Real Example: How We Built Axis
- 9. Common Pitfalls and How to Avoid Them
- 10. Cost Considerations and Scaling
In January 2025, we started building what would become Axis, the AI agent that now manages significant portions of As Above Technologies' operations. It handles customer inquiries, monitors our systems, researches markets, drafts content, and even contributes to its own codebase. It wasn't magic, and it wasn't easy. But it was more achievable than you might think.
This guide is what I wish existed when we started. It covers everything from foundational concepts to production deployment, with code examples and real lessons from building an agent that actually runs a business. By the end, you'll have a clear roadmap for building your own AI agent, whether it's a weekend project or a production system.
We'll be honest about what works, what doesn't, and where the hype exceeds reality. Building agents is now accessible to developers of all experience levels, but it requires understanding the right patterns and avoiding common traps.
This guide is for developers, technical founders, and ambitious entrepreneurs who want to build AI agents. You don't need ML expertise, but basic programming familiarity (Python preferred) will help you get the most from the code examples. Non-developers can still benefit from the architecture and platform sections to make informed decisions.
1. AI Agents vs Chatbots: Understanding the Difference
Before we build anything, we need to be precise about what we're building. The industry uses "AI agent" to describe everything from a slightly enhanced chatbot to science fiction AGI. Here's the taxonomy that actually matters:
The Spectrum of AI Systems
Think of AI systems on a spectrum of autonomy and capability:
| System Type | Autonomy | Tools | Memory | Example |
|---|---|---|---|---|
| Basic Chatbot | None | None | Single conversation | Rule-based support bots |
| LLM Interface | None | None | Single conversation | Basic ChatGPT wrapper |
| Enhanced LLM | Minimal | Built-in only | Session-based | ChatGPT with web browsing |
| AI Assistant | Low | Limited set | Persistent | Custom GPTs, Claude Projects |
| AI Agent | Medium-High | Extensible | Long-term | Axis, OpenClaw agents |
| Autonomous Agent | High | Self-extending | Evolving | AutoGPT, experimental systems |
The Three Defining Characteristics
What separates a true AI agent from a fancy chatbot? Three core capabilities:
1. Tool Use (Action in the World)
A chatbot tells you how to do something. An agent does the thing. This is the most fundamental difference. When you ask an agent to "check if my server is up," it doesn't explain ping commands; it pings the server and tells you the result.
Tools are the bridge between language and action. They can be:
- Information retrieval: Web search, database queries, API calls
- System interaction: File operations, command execution, browser automation
- Communication: Sending emails, posting to channels, notifications
- Creation: Code generation, image creation, document synthesis
- Integration: CRM updates, calendar management, third-party services
Axis has access to over 30 tools: web search, calendar management, email, file operations, browser automation, code execution, database queries, and our business-specific integrations (inventory systems, customer databases). Each tool extends what Axis can do in the world.
2. Memory (Continuity Across Time)
A chatbot starts fresh each conversation. An agent remembers. This isn't just about technical context windows; it's about building a persistent understanding that compounds over time.
Effective agent memory includes:
- Working memory: Current task context, active goals, in-progress work
- Short-term memory: Recent interactions, conversation history
- Long-term memory: User preferences, learned procedures, domain knowledge
- Episodic memory: Specific past events, decisions made, outcomes observed
Memory is what allows an agent to say "last time we tried that approach, it didn't work because X" or "you mentioned preferring brief emails" without being reminded each session.
3. Goal-Directed Behavior (Pursuing Objectives)
A chatbot waits for input. An agent pursues goals. This is the autonomy dimension: the ability to take a high-level objective and break it down into sub-tasks, execute them in sequence, handle obstacles, and persist until the goal is achieved (or determined impossible).
Compare:
User: "How do I find out what our competitors are charging?"
Chatbot: "You can visit their websites, check industry reports, or use
price monitoring tools like Prisync..."
User: "Find out what our competitors are charging."
Agent: "I'll research that now. [Uses web search to find competitor sites,
navigates to pricing pages, extracts pricing data, synthesizes findings] Here's what I
found: Competitor A charges $49-199/mo, Competitor B is $79/mo flat, Competitor C uses
usage-based pricing starting at $0.01/call. Want me to compile this into a comparison table?"
Why the Distinction Matters
The distinction isn't academic; it determines what you build and how. Agent architecture is fundamentally different from chatbot architecture:
- You need a tool system, not just a prompt template
- You need memory management, not just conversation history
- You need task planning, not just response generation
- You need error handling and recovery, not just retry logic
- You need safety guardrails, not just content filtering
If you're building a chatbot thinking you're building an agent, you'll hit walls. If you're building an agent with chatbot architecture, you'll create something fragile and frustrating. Let's build it right from the start.
2. The Agent Architecture: LLM + Tools + Memory + Orchestration
An AI agent is a system, not just a model. The model (GPT-4, Claude, etc.) is the brain, but brains need bodies, senses, and support systems. Here's the architecture that makes agents work:
┌───────────────────────────────────────────────────────────────────┐
│                           ORCHESTRATOR                            │
│         (Receives input, plans tasks, routes to components)       │
└───────────────────────────────────────────────────────────────────┘
                                │
           ┌────────────────────┼────────────────────┐
           │                    │                    │
           ▼                    ▼                    ▼
 ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
 │                 │  │                 │  │                 │
 │    LLM CORE     │  │   TOOL SYSTEM   │  │  MEMORY STORE   │
 │                 │  │                 │  │                 │
 │  • Reasoning    │  │  • Web Search   │  │  • Working      │
 │  • Generation   │  │  • File I/O     │  │  • Short-term   │
 │  • Planning     │  │  • APIs         │  │  • Long-term    │
 │  • Synthesis    │  │  • Browser      │  │  • Retrieval    │
 │                 │  │  • Custom       │  │                 │
 └─────────────────┘  └─────────────────┘  └─────────────────┘
           │                    │                    │
           └────────────────────┼────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────────┐
│                         INTERFACE LAYER                           │
│               (Chat, API, Webhooks, Scheduled Tasks)              │
└───────────────────────────────────────────────────────────────────┘
Component 1: The LLM Core
The language model is the reasoning engine. It interprets intent, plans approaches, generates outputs, and decides when to use tools. Choosing the right model matters:
| Model | Best For | Context | Cost (per 1M tokens) |
|---|---|---|---|
| GPT-4o | General tasks, speed | 128K | $2.50 in / $10 out |
| GPT-4 Turbo | Complex reasoning | 128K | $10 in / $30 out |
| Claude 3.5 Sonnet | Balanced capability/cost | 200K | $3 in / $15 out |
| Claude Opus 4 | Complex reasoning, agentic tasks | 200K | $15 in / $75 out |
| Llama 3 70B | Self-hosted, privacy | 8K-128K | Hardware cost |
| Mixtral 8x22B | Cost-effective self-hosted | 64K | Hardware cost |
Axis uses a tiered approach: Claude Opus 4 for complex reasoning and planning, Claude 3.5 Sonnet for most day-to-day tasks, and GPT-4o for specific use cases where OpenAI excels. The orchestrator routes tasks to the appropriate model based on complexity and requirements.
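A tiered routing policy like this can be sketched as a simple lookup table. The tier names and model ID strings below are illustrative assumptions, not Axis's actual configuration:

```python
# Illustrative sketch of tiered model routing: map a task type to a model ID.
# Tier names and model strings are example values, not a real config.
ROUTING_TABLE = {
    "planning": "claude-opus-4",      # complex reasoning and planning
    "default": "claude-3-5-sonnet",   # day-to-day tasks
    "fast": "gpt-4o",                 # speed-sensitive tasks
}

def route_model(task_type: str) -> str:
    """Pick a model for a task, falling back to the default tier."""
    return ROUTING_TABLE.get(task_type, ROUTING_TABLE["default"])
```

In practice, the routing decision can also be made by a cheap classifier model, but a static table is a reasonable starting point.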
Component 2: The Tool System
Tools are functions the agent can call to interact with the world. A well-designed tool system has:
Tool Definition
Each tool needs clear specification that the LLM can understand:
# Example tool definition for a web search tool
web_search_tool = {
"name": "web_search",
"description": "Search the web for current information. Use for questions about recent events, facts you're unsure about, or when user asks to 'look up' something.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query. Be specific and include relevant context."
},
"num_results": {
"type": "integer",
"description": "Number of results to return (1-10)",
"default": 5
}
},
"required": ["query"]
}
}
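To actually hand this definition to a model, it goes in the API's tools parameter. As a sketch: OpenAI's Chat Completions API expects the definition wrapped in a `function` envelope, while Anthropic's API takes a flat format with `input_schema` in place of `parameters` (check your provider's current docs for the exact shape):

```python
# Sketch: wrapping a tool definition for OpenAI's Chat Completions `tools`
# parameter. The schema itself is abbreviated from the example above.
web_search_tool = {
    "name": "web_search",
    "description": "Search the web for current information.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

# OpenAI-style envelope; Anthropic would instead use the flat dict with
# an `input_schema` key.
openai_tools = [{"type": "function", "function": web_search_tool}]
# client.chat.completions.create(model=..., messages=..., tools=openai_tools)
```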
Tool Execution
The tool executor takes the LLM's tool call and performs the actual action:
async def execute_tool(tool_name: str, parameters: dict) -> dict:
"""Execute a tool and return results."""
if tool_name == "web_search":
results = await brave_search(
query=parameters["query"],
count=parameters.get("num_results", 5)
)
return {
"success": True,
"results": results
}
elif tool_name == "read_file":
content = await read_file(parameters["path"])
return {
"success": True,
"content": content
}
elif tool_name == "send_email":
# Always confirm before external actions
if not parameters.get("confirmed"):
return {
"success": False,
"needs_confirmation": True,
"message": f"Send email to {parameters['to']}?"
}
await send_email(**parameters)
return {"success": True}
else:
return {
"success": False,
"error": f"Unknown tool: {tool_name}"
}
Tool Categories
Organize tools by risk level and type:
- Read-only tools (safe to run freely):
  - Web search
  - File reading
  - Database queries
  - Calendar viewing
  - System status checks
- Write tools (require logging, maybe confirmation):
  - File creation/editing
  - Database writes
  - Calendar event creation
  - Note taking
- External action tools (always require confirmation):
  - Sending emails
  - Posting to social media
  - Making purchases
  - Modifying production systems
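One way to encode these tiers is a registry that tags each tool with a risk level and gates external actions behind confirmation. This is a minimal sketch with hypothetical tool names, not a production policy engine:

```python
# Sketch of a risk-tiered tool policy. Read-only tools run freely, write
# tools get logged, and external-action tools require explicit confirmation.
RISK_LEVELS = {
    "web_search": "read",
    "read_file": "read",
    "write_file": "write",
    "send_email": "external",
}

def check_tool_call(tool_name: str, confirmed: bool = False) -> dict:
    """Decide whether a tool call may proceed under the tier policy."""
    level = RISK_LEVELS.get(tool_name)
    if level is None:
        return {"allowed": False, "reason": f"Unknown tool: {tool_name}"}
    if level == "external" and not confirmed:
        return {"allowed": False, "needs_confirmation": True}
    # Anything that mutates state should be logged for auditing
    return {"allowed": True, "log": level != "read"}
```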
Component 3: Memory System
Memory is what makes an agent feel intelligent over time. There are multiple layers:
Working Memory (Context Window)
The immediate context the LLM sees: current conversation, task state, relevant retrieved information. This is limited by the model's context window.
Short-Term Memory (Session Storage)
Information persisted across turns in a session but not necessarily across sessions. Typically implemented as in-memory storage or short-TTL cache.
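A minimal short-TTL cache for session storage, using only the standard library, might look like this (the default TTL is an arbitrary choice):

```python
import time

class SessionCache:
    """Short-term memory: entries expire after ttl seconds."""

    def __init__(self, ttl: float = 900.0):
        self.ttl = ttl
        self._store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires = entry
        if time.monotonic() > expires:
            del self._store[key]  # lazily evict expired entries
            return default
        return value
```

For multi-process deployments you'd typically reach for Redis with a per-key TTL instead, but the semantics are the same.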
Long-Term Memory (Persistent Storage)
Knowledge that persists across sessions. This requires:
- Storage: Database, file system, or vector store
- Retrieval: How to find relevant memories (search, embeddings)
- Management: What to remember, what to forget, when to consolidate
class AgentMemory:
def __init__(self, storage_path: str):
self.storage_path = storage_path
self.working_memory = {} # Current session state
self.load_long_term_memory()
def load_long_term_memory(self):
"""Load persistent memory from disk."""
self.user_preferences = self._load_json("preferences.json")
self.episodic_memory = self._load_json("episodes.json")
self.learned_procedures = self._load_json("procedures.json")
def remember(self, key: str, value: any, memory_type: str = "working"):
"""Store a memory."""
if memory_type == "working":
self.working_memory[key] = value
elif memory_type == "long_term":
self._persist_to_file(key, value)
def recall(self, query: str, memory_types: list = None) -> list:
"""Retrieve relevant memories."""
results = []
# Search working memory
if "working" in (memory_types or ["working"]):
for key, value in self.working_memory.items():
if self._is_relevant(query, key, value):
results.append({"source": "working", "key": key, "value": value})
# Search long-term memory with embeddings
if "long_term" in (memory_types or []):
relevant = self._semantic_search(query, self.episodic_memory)
results.extend(relevant)
return results
def consolidate(self):
"""Move important working memories to long-term storage."""
# Run periodically to persist important learnings
for key, value in self.working_memory.items():
if self._should_persist(key, value):
self.remember(key, value, memory_type="long_term")
Component 4: The Orchestrator
The orchestrator is the "executive function": it coordinates everything. Key responsibilities:
- Input processing: Receive and parse user requests
- Task planning: Break complex goals into steps
- Context assembly: Gather relevant memories and context
- LLM routing: Choose which model for which task
- Tool coordination: Manage tool calls and results
- Response synthesis: Compile final output
- Error handling: Recover from failures gracefully
class AgentOrchestrator:
def __init__(self, config: AgentConfig):
self.llm = LLMClient(config.model)
self.tools = ToolRegistry(config.tools)
self.memory = AgentMemory(config.memory_path)
async def handle_request(self, user_input: str) -> str:
"""Main entry point for agent requests."""
# 1. Retrieve relevant context
context = self.memory.recall(user_input)
# 2. Build prompt with context and available tools
messages = self._build_messages(user_input, context)
# 3. Get LLM response (may include tool calls)
response = await self.llm.chat(
messages=messages,
tools=self.tools.get_definitions()
)
# 4. Process tool calls if any
while response.tool_calls:
tool_results = await self._execute_tools(response.tool_calls)
messages.append(response.message)
messages.append(self._format_tool_results(tool_results))
response = await self.llm.chat(messages=messages, tools=self.tools.get_definitions())
# 5. Update memory with interaction
self.memory.remember("last_interaction", {
"input": user_input,
"output": response.content,
"timestamp": datetime.now()
})
return response.content
async def _execute_tools(self, tool_calls: list) -> list:
"""Execute multiple tool calls, potentially in parallel."""
results = []
for call in tool_calls:
result = await self.tools.execute(call.name, call.parameters)
results.append({
"tool": call.name,
"call_id": call.id,
"result": result
})
return results
The ReAct Pattern
Most successful agents use some variant of the ReAct (Reasoning + Acting) pattern:
- Observe: Receive input, gather context
- Think: Reason about what to do (often explicit "thinking" in the prompt)
- Act: Execute a tool or generate output
- Observe: See the result of the action
- Repeat: Continue until task is complete
When Axis receives "Check if our website is down and fix it if possible," the loop looks like:
Think: Need to check website status first.
Act: [calls http_check tool for asabove.tech]
Observe: 503 error, server not responding
Think: Site is down. Should check server status.
Act: [calls server_status tool]
Observe: High CPU, memory exhausted
Think: Need to restart service, but should confirm first.
Act: [requests confirmation to restart]
...and so on
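The think/act/observe loop above reduces to a simple control structure. Here's a schematic version with the model and tool calls stubbed out as plain functions (in a real agent, `think` is an LLM call and `act` dispatches to the tool system):

```python
# Schematic ReAct loop. `think` decides the next step from the observation
# history; `act` executes a tool and returns its result.
def run_react(goal: str, think, act, max_steps: int = 10) -> str:
    observations = [f"Goal: {goal}"]
    for _ in range(max_steps):
        decision = think(observations)  # reason about what to do next
        if decision["type"] == "finish":
            return decision["answer"]
        result = act(decision["tool"], decision["input"])  # take the action
        observations.append(f"{decision['tool']} -> {result}")  # observe
    return "Stopped: step limit reached"
```

The `max_steps` cap is important in practice: without it, an agent that keeps choosing tools can loop indefinitely and burn tokens.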
3. Platforms Compared: OpenClaw, LangChain, AutoGPT, CrewAI
You don't have to build everything from scratch. Several platforms provide agent infrastructure. Here's an honest comparison based on our experience:
OpenClaw is what we built Axis on. It's a personal AI assistant framework designed for developers who want a capable agent without drowning in infrastructure complexity.
Key Features
- Built-in tool system with common integrations (web, files, browser, calendar, email)
- Flexible memory with workspace files (SOUL.md, MEMORY.md patterns)
- Multi-model support (switch between Claude, GPT, local models)
- Multiple interfaces: CLI, web chat, API, Discord, Telegram
- Subagent spawning for parallel tasks
- Heartbeat system for proactive behavior
Best For
Developers building personal or business assistants. Those who want a working agent fast but need customization. Teams that value the "workspace as configuration" pattern.
Limitations
- Newer platform, smaller community than LangChain
- Documentation still maturing
- Less suited for massive multi-agent systems
LangChain is the most popular framework for building LLM applications. It provides extensive abstractions for chains, agents, memory, and integrations.
Key Features
- Massive ecosystem of integrations and tools
- LangGraph for complex agent workflows with state machines
- LangSmith for observability and debugging
- Extensive documentation and tutorials
- Large community, many examples
- Support for virtually any LLM provider
Best For
Teams building complex, custom agent systems. Production applications needing observability. Projects requiring specific integrations from the ecosystem.
Limitations
- Abstraction layers can add complexity
- Frequent breaking changes in early versions (stabilizing now)
- Can be overwhelming: many ways to do the same thing
- Debugging can be tricky due to abstraction depth
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain.tools import Tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
# Define tools
tools = [
Tool(
name="web_search",
description="Search the web for information",
func=lambda q: web_search(q)
),
Tool(
name="calculator",
description="Perform mathematical calculations",
func=lambda expr: eval(expr) # Don't do this in production!
)
]
# Create agent
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant."),
MessagesPlaceholder(variable_name="chat_history"),
("human", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
])
agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
# Run
result = executor.invoke({"input": "What's the population of Tokyo?"})
AutoGPT pioneered the "give an AI a goal and let it figure out how" paradigm. It's more experimental than production-ready, but influential.
Key Features
- Fully autonomous operation (set goal, watch it work)
- Self-prompting loop with planning and reflection
- Built-in web browsing, code execution, file management
- Memory with Pinecone or local vector stores
- Plugin system for extensions
Best For
Experimentation and learning. Research into autonomous systems. Tasks where high autonomy is acceptable and cost isn't a primary concern.
Limitations
- High token consumption (loops can be expensive)
- Can get stuck in loops or pursue tangents
- Limited controllability once started
- Not recommended for production business processes
CrewAI specializes in multi-agent systems where different "crew members" collaborate on tasks. Each agent has a role, goal, and backstory.
Key Features
- Role-based agent design (researcher, writer, reviewer, etc.)
- Hierarchical and collaborative process types
- Task delegation between agents
- Clean, intuitive API
- Good for content pipelines and research workflows
Best For
Workflows that naturally decompose into roles. Content creation pipelines. Research and analysis tasks. Teams exploring multi-agent patterns.
Limitations
- Multi-agent overhead isn't always necessary
- Can be slower than single-agent approaches
- Coordination between agents can be unpredictable
- Less flexible for highly custom requirements
from crewai import Agent, Task, Crew, Process
# Define agents
researcher = Agent(
role='Research Analyst',
goal='Find and analyze market information',
backstory='Expert at finding and synthesizing information',
tools=[web_search_tool]
)
writer = Agent(
role='Content Writer',
goal='Create compelling content from research',
backstory='Experienced writer who turns data into narratives',
tools=[write_file_tool]
)
# Define tasks
research_task = Task(
description='Research the AI agent market landscape in 2026',
agent=researcher,
expected_output='Detailed market analysis with key players and trends'
)
writing_task = Task(
description='Write a blog post based on the research',
agent=writer,
context=[research_task],
expected_output='1500-word blog post in markdown format'
)
# Create crew and execute
crew = Crew(
agents=[researcher, writer],
tasks=[research_task, writing_task],
process=Process.sequential
)
result = crew.kickoff()
Platform Selection Guide
| If You Want... | Choose | Why |
|---|---|---|
| Personal assistant, fast setup | OpenClaw | Batteries included, workspace-centric |
| Maximum flexibility, enterprise | LangChain + LangGraph | Most mature ecosystem, best tooling |
| Full autonomy experiments | AutoGPT | Designed for autonomous operation |
| Multi-agent collaboration | CrewAI | Purpose-built for agent teams |
| Learning/understanding agents | Build from scratch | Nothing teaches like building |
Start with OpenClaw or raw API calls to understand the fundamentals. Graduate to LangChain when you need specific integrations or complex workflows. Consider CrewAI for content pipelines. Use AutoGPT for experiments, not production.
4. Step-by-Step: Building a Simple Agent
Let's build an agent from scratch. We'll create a research assistant that can search the web, read files, and maintain conversation memory. This will teach you the fundamentals before using any framework.
Step 1: Project Setup
Create a project directory with two files. requirements.txt lists the dependencies:
anthropic>=0.18.0
aiohttp>=3.9.0
python-dotenv>=1.0.0
And .env holds your API keys:
ANTHROPIC_API_KEY=your_api_key_here
BRAVE_API_KEY=your_brave_search_key  # Optional, for web search
Step 2: Define Your Tools
import aiohttp
import os
from typing import Any
from pathlib import Path
# Tool definitions for Claude
TOOLS = [
{
"name": "web_search",
"description": "Search the web for current information. Returns titles, URLs, and snippets.",
"input_schema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query"
}
},
"required": ["query"]
}
},
{
"name": "read_file",
"description": "Read the contents of a file from the workspace.",
"input_schema": {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Path to the file"
}
},
"required": ["path"]
}
},
{
"name": "write_file",
"description": "Write content to a file. Creates directories if needed.",
"input_schema": {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Path to write to"
},
"content": {
"type": "string",
"description": "Content to write"
}
},
"required": ["path", "content"]
}
},
{
"name": "remember",
"description": "Save important information to long-term memory.",
"input_schema": {
"type": "object",
"properties": {
"key": {
"type": "string",
"description": "Memory key/label"
},
"value": {
"type": "string",
"description": "Information to remember"
}
},
"required": ["key", "value"]
}
}
]
class ToolExecutor:
def __init__(self, workspace: str = "./workspace"):
self.workspace = Path(workspace)
self.workspace.mkdir(exist_ok=True)
self.memory = {}
self._load_memory()
def _load_memory(self):
"""Load persistent memory from file."""
memory_file = self.workspace / "memory.json"
if memory_file.exists():
import json
self.memory = json.loads(memory_file.read_text())
def _save_memory(self):
"""Persist memory to file."""
import json
memory_file = self.workspace / "memory.json"
memory_file.write_text(json.dumps(self.memory, indent=2))
async def execute(self, tool_name: str, tool_input: dict) -> Any:
"""Execute a tool and return results."""
if tool_name == "web_search":
return await self._web_search(tool_input["query"])
elif tool_name == "read_file":
return self._read_file(tool_input["path"])
elif tool_name == "write_file":
return self._write_file(tool_input["path"], tool_input["content"])
elif tool_name == "remember":
return self._remember(tool_input["key"], tool_input["value"])
else:
return {"error": f"Unknown tool: {tool_name}"}
async def _web_search(self, query: str) -> dict:
"""Perform web search using Brave API."""
api_key = os.getenv("BRAVE_API_KEY")
if not api_key:
return {"error": "BRAVE_API_KEY not configured"}
async with aiohttp.ClientSession() as session:
async with session.get(
"https://api.search.brave.com/res/v1/web/search",
headers={"X-Subscription-Token": api_key},
params={"q": query, "count": 5}
) as resp:
if resp.status != 200:
return {"error": f"Search failed: {resp.status}"}
data = await resp.json()
results = []
for item in data.get("web", {}).get("results", []):
results.append({
"title": item.get("title"),
"url": item.get("url"),
"snippet": item.get("description")
})
return {"results": results}
    def _read_file(self, path: str) -> dict:
        """Read a file from workspace."""
        file_path = self.workspace / path
        # Validate the path before touching the filesystem
        if not str(file_path.resolve()).startswith(str(self.workspace.resolve())):
            return {"error": "Access denied: path outside workspace"}
        if not file_path.exists():
            return {"error": f"File not found: {path}"}
        return {"content": file_path.read_text()}
def _write_file(self, path: str, content: str) -> dict:
"""Write content to a file."""
file_path = self.workspace / path
if not str(file_path.resolve()).startswith(str(self.workspace.resolve())):
return {"error": "Access denied: path outside workspace"}
file_path.parent.mkdir(parents=True, exist_ok=True)
file_path.write_text(content)
return {"success": True, "path": str(file_path)}
def _remember(self, key: str, value: str) -> dict:
"""Store information in persistent memory."""
self.memory[key] = value
self._save_memory()
return {"success": True, "remembered": key}
Step 3: Build the Agent Core
import anthropic
import asyncio
from datetime import datetime
from typing import Optional

from dotenv import load_dotenv
from tools import TOOLS, ToolExecutor

load_dotenv()  # Load API keys from .env
class SimpleAgent:
def __init__(self, model: str = "claude-sonnet-4-20250514"):
self.client = anthropic.Anthropic()
self.model = model
self.tool_executor = ToolExecutor()
self.conversation_history = []
# System prompt defines agent behavior
self.system_prompt = """You are a helpful AI research assistant. You can:
- Search the web for current information
- Read and write files in your workspace
- Remember important information across conversations
When given a task:
1. Think about what information or actions you need
2. Use your tools to gather information or take actions
3. Synthesize findings into a clear response
Be thorough but concise. If you're unsure about something, say so.
If a task requires multiple steps, work through them systematically.
Current date: {date}
Memories: {memories}
"""
def _build_system_prompt(self) -> str:
"""Build system prompt with current context."""
memories_str = "\n".join(
f"- {k}: {v}" for k, v in self.tool_executor.memory.items()
) if self.tool_executor.memory else "None stored yet."
return self.system_prompt.format(
date=datetime.now().strftime("%Y-%m-%d"),
memories=memories_str
)
async def chat(self, user_message: str) -> str:
"""Process a user message and return response."""
# Add user message to history
self.conversation_history.append({
"role": "user",
"content": user_message
})
# Keep history manageable (last 20 turns)
if len(self.conversation_history) > 40:
self.conversation_history = self.conversation_history[-40:]
# Call Claude with tools
response = self.client.messages.create(
model=self.model,
max_tokens=4096,
system=self._build_system_prompt(),
tools=TOOLS,
messages=self.conversation_history
)
# Process response - may need multiple turns for tool use
while response.stop_reason == "tool_use":
# Extract tool calls from response
assistant_content = response.content
tool_results = []
for block in assistant_content:
if block.type == "tool_use":
print(f"  → Using tool: {block.name}")
result = await self.tool_executor.execute(
block.name,
block.input
)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": str(result)
})
# Add assistant message and tool results to history
self.conversation_history.append({
"role": "assistant",
"content": assistant_content
})
self.conversation_history.append({
"role": "user",
"content": tool_results
})
# Continue conversation with tool results
response = self.client.messages.create(
model=self.model,
max_tokens=4096,
system=self._build_system_prompt(),
tools=TOOLS,
messages=self.conversation_history
)
# Extract final text response
final_response = ""
for block in response.content:
if hasattr(block, "text"):
final_response += block.text
# Add final response to history
self.conversation_history.append({
"role": "assistant",
"content": response.content
})
return final_response
async def main():
"""Interactive chat loop."""
agent = SimpleAgent()
print("Agent ready. Type 'quit' to exit.\n")
while True:
try:
user_input = input("You: ").strip()
if user_input.lower() in ['quit', 'exit', 'q']:
break
if not user_input:
continue
print("Agent: ", end="", flush=True)
response = await agent.chat(user_input)
print(response)
print()
except KeyboardInterrupt:
break
print("\nGoodbye!")
if __name__ == "__main__":
asyncio.run(main())
Step 4: Test Your Agent
$ python agent.py
Agent ready. Type 'quit' to exit.
You: What's the current state of AI agent technology? Search for recent news.
→ Using tool: web_search
Agent: Based on my search, here's the current state of AI agent technology in early 2026:
**Major Developments:**
1. **Claude's Computer Use** - Anthropic's ability to control computers directly has matured...
2. **OpenAI Operator** - GPT-4 can now browse the web and complete multi-step tasks...
3. **Open Source Progress** - Llama 3 and Mixtral are enabling self-hosted agent systems...
Would you like me to save this summary to a file?
You: Yes, save it to research/ai-agents-2026.md
→ Using tool: write_file
Agent: Done! I've saved the summary to research/ai-agents-2026.md.
You: Remember that I'm interested in AI agents for business automation
→ Using tool: remember
Agent: Noted! I'll remember your interest in AI agents for business automation.
This will help me tailor future research and suggestions to your focus area.
You: quit
Goodbye!
What You've Built
Congratulations! You now have a working AI agent with:
- ✅ Tool use (web search, file I/O)
- ✅ Conversation memory (within session)
- ✅ Persistent memory (across sessions)
- ✅ Multi-turn tool calling (can use multiple tools per request)
- ✅ Safety constraints (file access limited to workspace)
This is a foundation you can build on. In the next sections, we'll add more sophisticated tools and memory systems.
The complete code for this tutorial is available at github.com/asabove-tech/simple-agent-tutorial. Star it to bookmark for later!
5. Adding Tools and Capabilities
Tools are what transform an LLM from a text generator into a capable agent. Let's explore how to add more sophisticated capabilities.
Tool Design Principles
1. Clear, Specific Descriptions
The LLM decides when to use tools based on descriptions. Be explicit about:
- What the tool does
- When to use it (and when not to)
- What parameters mean
- What the output looks like
Bad (vague):
"name": "search", "description": "Searches for stuff"
Good (specific):
"name": "web_search", "description": "Search the web using Brave Search API.
Use for questions about current events, facts you're uncertain about, or when
the user explicitly asks to look something up. Returns titles, URLs, and snippets
for the top results. Not suitable for accessing specific websites; use browser
tools for that."
2. Appropriate Granularity
Tools should be atomic enough to be composable, but not so granular that simple tasks require many calls:
- Too granular: separate tools for "open_file", "read_line", "close_file"
- Too coarse: one tool that "researches a topic and writes a report"
- Right level: "read_file" that handles opening, reading, and returns content
3. Meaningful Error Messages
When tools fail, return errors the LLM can act on:
# Bad: unhelpful error
return {"error": "Failed"}
# Good: actionable error
return {
"error": "File not found",
"details": f"No file at path '{path}'",
"suggestion": "Check if the path is correct or use list_files to see available files"
}
Common Tool Categories
Information Retrieval Tools
# Web search with multiple providers
async def web_search(query: str, provider: str = "brave") -> dict:
"""Multi-provider web search."""
if provider == "brave":
return await brave_search(query)
elif provider == "serper":
return await serper_search(query)
elif provider == "tavily":
return await tavily_search(query) # Good for AI-optimized results
# URL content fetching
async def fetch_url(url: str, extract_mode: str = "markdown") -> dict:
"""Fetch and extract content from a URL."""
async with aiohttp.ClientSession() as session:
async with session.get(url) as resp:
html = await resp.text()
if extract_mode == "markdown":
# Convert HTML to readable markdown
content = html_to_markdown(html)
elif extract_mode == "text":
content = extract_text(html)
else:
content = html
return {
"url": url,
"content": content[:50000], # Limit size
"truncated": len(content) > 50000
}
# Database queries (read-only!)
def query_database(sql: str, database: str = "analytics") -> dict:
"""Execute read-only SQL query."""
# Validate query is a single SELECT (a prefix check alone misses "SELECT 1; DROP ...")
stmt = sql.strip().rstrip(";")
if not stmt.upper().startswith("SELECT") or ";" in stmt:
return {"error": "Only single SELECT queries allowed"}
conn = get_connection(database)
try:
results = conn.execute(sql).fetchall()
return {"columns": [d[0] for d in conn.description], "rows": results}
except Exception as e:
return {"error": str(e)}
Communication Tools
# Email with confirmation
async def send_email(
to: str,
subject: str,
body: str,
confirmed: bool = False
) -> dict:
"""Send email. Requires confirmation for safety."""
if not confirmed:
return {
"needs_confirmation": True,
"preview": {
"to": to,
"subject": subject,
"body_preview": body[:200] + "..." if len(body) > 200 else body
},
"message": "Please confirm you want to send this email."
}
# Actually send
result = await email_client.send(to=to, subject=subject, body=body)
return {"success": True, "message_id": result.id}
# Slack/Discord messaging
async def send_message(
channel: str,
message: str,
platform: str = "slack"
) -> dict:
"""Send message to team chat."""
if platform == "slack":
return await slack_client.post(channel=channel, text=message)
elif platform == "discord":
return await discord_client.send(channel_id=channel, content=message)
# Calendar integration
async def create_calendar_event(
title: str,
start_time: str,
end_time: str,
attendees: list = None,
confirmed: bool = False
) -> dict:
"""Create calendar event. Requires confirmation."""
if not confirmed:
return {
"needs_confirmation": True,
"preview": {"title": title, "start": start_time, "end": end_time},
"message": "Please confirm you want to create this event."
}
event = await calendar_client.create_event(
title=title,
start=start_time,
end=end_time,
attendees=attendees or []
)
return {"success": True, "event_id": event.id, "link": event.html_link}
Code Execution Tools
import subprocess
import tempfile
from pathlib import Path
async def execute_python(code: str, timeout: int = 30) -> dict:
"""Execute Python code in sandboxed environment."""
# Create temp file
with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
f.write(code)
temp_path = f.name
try:
# Run with timeout and restricted permissions
result = subprocess.run(
['python', temp_path],
capture_output=True,
text=True,
timeout=timeout,
cwd=tempfile.gettempdir(), # Isolated directory
)
return {
"stdout": result.stdout,
"stderr": result.stderr,
"return_code": result.returncode
}
except subprocess.TimeoutExpired:
return {"error": f"Execution timed out after {timeout}s"}
finally:
Path(temp_path).unlink() # Clean up
async def execute_shell(command: str, confirmed: bool = False) -> dict:
"""Execute shell command. DANGEROUS - requires confirmation."""
# Blacklist dangerous commands
dangerous = ['rm -rf', 'sudo', 'mkfs', 'dd if=', '> /dev']
if any(d in command for d in dangerous):
return {"error": "Command blocked for safety"}
if not confirmed:
return {
"needs_confirmation": True,
"command": command,
"message": "Please confirm you want to run this shell command."
}
result = subprocess.run(
command,
shell=True,
capture_output=True,
text=True,
timeout=60
)
return {
"stdout": result.stdout,
"stderr": result.stderr,
"return_code": result.returncode
}
Browser Automation Tools
import time  # used by screenshot() below
from playwright.async_api import async_playwright
class BrowserTool:
def __init__(self):
self.browser = None
self.page = None
async def initialize(self):
"""Start browser instance."""
playwright = await async_playwright().start()
self.browser = await playwright.chromium.launch(headless=True)
self.page = await self.browser.new_page()
async def navigate(self, url: str) -> dict:
"""Navigate to URL and return page content."""
await self.page.goto(url, wait_until="networkidle")
# Extract readable content
content = await self.page.evaluate("""
() => {
const article = document.querySelector('article') || document.body;
return article.innerText;
}
""")
return {
"url": url,
"title": await self.page.title(),
"content": content[:30000]
}
async def screenshot(self, path: str = None) -> dict:
"""Take screenshot of current page."""
if not path:
path = f"screenshot_{int(time.time())}.png"
await self.page.screenshot(path=path, full_page=True)
return {"path": path}
async def click(self, selector: str) -> dict:
"""Click element by selector."""
try:
await self.page.click(selector, timeout=5000)
return {"success": True}
except Exception as e:
return {"error": str(e)}
async def fill(self, selector: str, text: str) -> dict:
"""Fill input field."""
try:
await self.page.fill(selector, text)
return {"success": True}
except Exception as e:
return {"error": str(e)}
async def get_page_structure(self) -> dict:
"""Get accessible page structure for agent reasoning."""
structure = await self.page.evaluate("""
() => {
function getAccessibleTree(el, depth = 0) {
if (depth > 3) return null;
const nodes = [];
for (const child of el.children) {
const role = child.getAttribute('role') || child.tagName.toLowerCase();
const text = child.innerText?.slice(0, 50);
if (['a', 'button', 'input', 'select', 'textarea'].includes(role) ||
child.getAttribute('role')) {
nodes.push({
role,
text,
selector: child.id ? '#' + child.id : null
});
}
const childNodes = getAccessibleTree(child, depth + 1);
if (childNodes) nodes.push(...childNodes);
}
return nodes;
}
return getAccessibleTree(document.body);
}
""")
return {"interactive_elements": structure}
Tool Safety Patterns
Safety is critical. Here are patterns we use with Axis:
1. Confirmation for Dangerous Actions
def requires_confirmation(tool_name: str, params: dict) -> bool:
"""Determine if action needs human approval."""
# Always confirm external communications
if tool_name in ['send_email', 'post_tweet', 'send_slack']:
return True
# Always confirm financial actions
if tool_name in ['make_purchase', 'transfer_funds']:
return True
# Confirm destructive file operations
if tool_name == 'delete_file':
return True
# Confirm shell commands
if tool_name == 'execute_shell':
return True
return False
2. Allowlists Over Blocklists
# Bad: trying to block dangerous things
BLOCKED_COMMANDS = ['rm', 'sudo', 'wget', ...] # Will miss something
# Good: explicitly allow safe things
ALLOWED_COMMANDS = ['ls', 'cat', 'grep', 'find', 'wc', 'head', 'tail']
def is_safe_command(command: str) -> bool:
parts = command.split()
return bool(parts) and parts[0] in ALLOWED_COMMANDS
3. Rate Limiting
from collections import defaultdict
from time import time
class RateLimiter:
def __init__(self):
self.calls = defaultdict(list)
def check(self, tool: str, limit: int, window: int) -> bool:
"""Check if tool call is within rate limit."""
now = time()
# Remove old calls outside window
self.calls[tool] = [t for t in self.calls[tool] if now - t < window]
if len(self.calls[tool]) >= limit:
return False
self.calls[tool].append(now)
return True
rate_limiter = RateLimiter()
# Usage
async def execute_tool(name: str, params: dict):
# Limit expensive operations
if name == "web_search" and not rate_limiter.check("web_search", 10, 60):
return {"error": "Rate limit exceeded. Max 10 searches per minute."}
...
4. Sandboxing
# File operations restricted to workspace
class SandboxedFileSystem:
def __init__(self, root: Path):
self.root = root.resolve()
def validate_path(self, path: str) -> Path:
"""Ensure path is within sandbox."""
resolved = (self.root / path).resolve()
# startswith() would wrongly allow e.g. /data/workspace2 when root is /data/workspace
if not resolved.is_relative_to(self.root):  # Python 3.9+
raise PermissionError(f"Access denied: {path} is outside workspace")
return resolved
def read(self, path: str) -> str:
safe_path = self.validate_path(path)
return safe_path.read_text()
def write(self, path: str, content: str):
safe_path = self.validate_path(path)
safe_path.parent.mkdir(parents=True, exist_ok=True)
safe_path.write_text(content)
Every tool you add is a potential attack vector. Assume the LLM might be tricked into misusing tools (prompt injection). Defense in depth: validate inputs, confirm dangerous actions, sandbox execution, log everything.
6. Memory and Context Management
Memory is what transforms a stateless text generator into something that feels intelligent over time. But it's also one of the hardest problems in agent design.
The Memory Challenge
Context windows are limited. Even Claude's 200K tokens fill up quickly with:
- System prompts and tool definitions (~2-5K tokens)
- Conversation history (grows with each turn)
- Retrieved documents and search results
- Tool call inputs and outputs
You need strategies for:
- Deciding what goes into the context window
- Storing information that doesn't fit
- Retrieving relevant information when needed
- Forgetting information that's no longer useful
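To make those trade-offs concrete, here is a minimal budgeting sketch; the percentage splits and the 4-characters-per-token estimate are illustrative assumptions, not measured values:

```python
def plan_context_budget(window: int = 200_000, reserve_output: int = 4_000) -> dict:
    """Split an assumed context window into rough per-section token budgets."""
    usable = window - reserve_output  # leave headroom for the model's reply
    return {
        "system_and_tools": int(usable * 0.05),    # prompts + tool definitions
        "retrieved_memories": int(usable * 0.15),  # RAG hits, memory files
        "tool_outputs": int(usable * 0.30),        # search results, file reads
        "conversation": int(usable * 0.50),        # recent message history
    }

def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English text."""
    return len(text) // 4
```

With a 200K window this reserves about 98K tokens for conversation, which is why summarization and retrieval still matter.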
Memory Architecture
+-----------------------------------------------------------------+
|                         CONTEXT WINDOW                          |
|       System Prompt + Recent History + Retrieved Memories       |
|                         (Token Limited)                         |
+-----------------------------------------------------------------+
                                ^
                                | Retrieval
          +---------------------+---------------------+
          |                     |                     |
+-----------------+   +-----------------+   +-----------------+
|  WORKING MEMORY |   |  CONVERSATION   |   |   LONG-TERM     |
|                 |   |     STORE       |   |    MEMORY       |
|  Current task   |   |                 |   |                 |
|  Active goals   |   |  Full history   |   |  User prefs     |
|  Temp variables |   |  Summaries      |   |  Learned facts  |
|                 |   |                 |   |  Procedures     |
|  (In-memory)    |   |  (Database)     |   |  (Vector store) |
+-----------------+   +-----------------+   +-----------------+
Strategy 1: Conversation Summarization
Instead of keeping full conversation history, periodically summarize:
class ConversationMemory:
def __init__(self, llm_client, max_messages: int = 20):
self.llm = llm_client
self.max_messages = max_messages
self.messages = []
self.summaries = []
def add_message(self, role: str, content: str):
self.messages.append({"role": role, "content": content})
# Summarize when history gets too long
if len(self.messages) > self.max_messages:
self._summarize_oldest()
def _summarize_oldest(self):
"""Summarize oldest messages and remove them."""
# Take first half of messages
to_summarize = self.messages[:self.max_messages // 2]
summary = self.llm.chat([{
"role": "user",
"content": f"""Summarize this conversation concisely, preserving key facts,
decisions, and context needed for continuation:
{self._format_messages(to_summarize)}"""
}])
self.summaries.append({
"timestamp": datetime.now(),
"summary": summary,
"message_count": len(to_summarize)
})
# Remove summarized messages
self.messages = self.messages[self.max_messages // 2:]
def get_context(self) -> list:
"""Get context for LLM, including summaries."""
context = []
# Include summaries of older conversations
if self.summaries:
summary_text = "\n\n".join(
f"[Earlier: {s['summary']}]" for s in self.summaries[-3:] # Last 3 summaries
)
context.append({
"role": "user",
"content": f"Previous conversation context:\n{summary_text}"
})
# Include recent messages
context.extend(self.messages)
return context
Strategy 2: Semantic Retrieval (RAG)
Store memories as embeddings and retrieve relevant ones:
import numpy as np
from typing import List, Tuple
class SemanticMemory:
def __init__(self, embedding_model):
self.embedding_model = embedding_model
self.memories = [] # List of (text, embedding, metadata)
def store(self, text: str, metadata: dict = None):
"""Store a memory with its embedding."""
embedding = self.embedding_model.embed(text)
self.memories.append({
"text": text,
"embedding": embedding,
"metadata": metadata or {},
"timestamp": datetime.now()
})
def retrieve(self, query: str, top_k: int = 5) -> List[dict]:
"""Retrieve most relevant memories."""
query_embedding = self.embedding_model.embed(query)
# Calculate similarities
scored = []
for memory in self.memories:
similarity = self._cosine_similarity(
query_embedding,
memory["embedding"]
)
scored.append((similarity, memory))
# Return top-k
scored.sort(reverse=True, key=lambda x: x[0])
return [
{"score": score, **memory}
for score, memory in scored[:top_k]
]
def _cosine_similarity(self, a: np.ndarray, b: np.ndarray) -> float:
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Usage with OpenAI embeddings
from openai import OpenAI
class OpenAIEmbedding:
def __init__(self):
self.client = OpenAI()
def embed(self, text: str) -> np.ndarray:
response = self.client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return np.array(response.data[0].embedding)
# Initialize
memory = SemanticMemory(OpenAIEmbedding())
# Store memories
memory.store("User prefers concise responses", {"type": "preference"})
memory.store("User's company is in the cannabis industry", {"type": "fact"})
memory.store("Last project was building an inventory system", {"type": "history"})
# Retrieve relevant memories
relevant = memory.retrieve("What kind of business does the user have?")
# Returns the cannabis industry memory
Strategy 3: Structured Memory Files
OpenClaw uses a file-based approach that's simple but effective:
# Long-Term Memory
## User Preferences
- Prefers concise, actionable responses
- Usually works US Pacific time zone
- Technical background (can show code)
## Important Facts
- Company: As Above Technologies
- Industry: AI/Software
- Key product: OpenClaw agent platform
## Learned Procedures
- For email drafts: always ask about tone before drafting
- For research: provide sources and confidence levels
- For code: include comments and explain reasoning
## Recent Decisions
- 2026-01-15: Decided to use Claude Opus 4 for complex reasoning tasks
- 2026-01-20: Established daily standup format for project updates
## Ongoing Projects
- Building customer onboarding automation
- Writing technical documentation for API
The agent loads this file into context at session start and updates it when learning new information:
class FileBasedMemory:
def __init__(self, workspace: Path):
self.workspace = workspace  # used by load_context() for the memory/ directory
self.memory_file = workspace / "MEMORY.md"
self.daily_file = workspace / f"memory/{datetime.now().strftime('%Y-%m-%d')}.md"
def load_context(self) -> str:
"""Load memory for session start."""
context = ""
# Long-term memory
if self.memory_file.exists():
context += f"## Long-term Memory\n{self.memory_file.read_text()}\n\n"
# Recent daily notes
memory_dir = self.workspace / "memory"
if memory_dir.exists():
recent_files = sorted(memory_dir.glob("*.md"))[-3:] # Last 3 days
for f in recent_files:
context += f"## Notes from {f.stem}\n{f.read_text()}\n\n"
return context
def append_daily(self, note: str):
"""Add to today's memory file."""
self.daily_file.parent.mkdir(exist_ok=True)
with open(self.daily_file, "a") as f:
f.write(f"\n- {datetime.now().strftime('%H:%M')}: {note}")
def update_long_term(self, section: str, content: str):
"""Update a section of long-term memory."""
if not self.memory_file.exists():
self.memory_file.write_text(f"# Long-Term Memory\n\n## {section}\n{content}")
return
current = self.memory_file.read_text()
# Find and update section, or append
# (Implementation details omitted for brevity)
Strategy 4: Hierarchical Memory
Different information has different lifespans and access patterns:
class HierarchicalMemory:
"""
Tier 1: Always in context (user prefs, core facts)
Tier 2: Retrieved on relevance (episodic memory)
Tier 3: Retrieved on explicit request (archived knowledge)
"""
def __init__(self):
self.tier1_always = {} # Small, critical info
self.tier2_indexed = [] # Semantic search
self.tier3_archived = {} # Keyword lookup
def build_context(self, query: str) -> str:
"""Build memory context for a query."""
context_parts = []
# Always include tier 1
if self.tier1_always:
context_parts.append("Core context:")
for key, value in self.tier1_always.items():
context_parts.append(f"- {key}: {value}")
# Retrieve relevant tier 2
relevant = self._semantic_search(query, self.tier2_indexed, top_k=5)
if relevant:
context_parts.append("\nRelevant memories:")
for mem in relevant:
context_parts.append(f"- {mem['text']}")
# Tier 3 only if explicitly referenced
# (agent can request via tool)
return "\n".join(context_parts)
def promote_memory(self, memory_id: str, to_tier: int):
"""Move memory between tiers based on access patterns."""
# Frequently accessed tier 2 -> tier 1
# Rarely accessed tier 2 -> tier 3
pass
def consolidate(self):
"""Periodically consolidate and reorganize memories."""
# Merge similar memories
# Archive old unused memories
# Update tier 1 based on importance
pass
The Axis Memory System
Here's how memory actually works in Axis:
- SOUL.md: Permanent personality, values, behavioral guidelines
- USER.md: Information about the human(s) Axis works with
- MEMORY.md: Curated long-term memory (manually and auto-updated)
- memory/YYYY-MM-DD.md: Daily logs of significant events
- TOOLS.md: Environment-specific information (API keys location, server names)
- Conversation history: Last 20-30 turns, summarized when longer
The key insight: treat memory like a well-organized filing system, not a database. The agent can read and update these files, creating a form of self-modifying memory that persists across sessions and is human-readable for debugging.
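A hedged sketch of that filing-system approach: assemble session-start context by concatenating whichever of the files listed above exist (the loader itself is illustrative, not Axis's actual code):

```python
from pathlib import Path

def build_session_context(workspace: Path) -> str:
    """Concatenate workspace memory files into a session-start context block."""
    sections = []
    for name in ["SOUL.md", "USER.md", "MEMORY.md", "TOOLS.md"]:
        f = workspace / name
        if f.exists():
            sections.append(f"## {name}\n{f.read_text()}")
    # The three most recent daily logs, oldest first
    memory_dir = workspace / "memory"
    if memory_dir.exists():
        for f in sorted(memory_dir.glob("*.md"))[-3:]:
            sections.append(f"## Daily log {f.stem}\n{f.read_text()}")
    return "\n\n".join(sections)
```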
7. Deployment Options
You've built an agent. Now where does it run? The deployment choice affects reliability, cost, latency, and what's possible.
Option 1: Local Development Machine
Simplest · Not for production
Run the agent on your laptop/desktop. Good for development and personal use.
- Pros: Zero infrastructure cost, easy debugging, fast iteration
- Cons: Only works when computer is on, no remote access, can't scale
- Good for: Development, personal assistant, testing
# Simple local deployment
python agent.py
# Or with auto-reload for development
watchmedo auto-restart --patterns="*.py" -- python agent.py
Option 2: Cloud VM (Always-On)
Moderate complexity · Reliable
Run on a cloud server that's always available. The most common production choice.
| Provider | Smallest Useful Instance | Monthly Cost |
|---|---|---|
| DigitalOcean | 2GB RAM, 1 vCPU | ~$12 |
| AWS EC2 | t3.small (2GB) | ~$15 |
| Google Cloud | e2-small (2GB) | ~$13 |
| Hetzner | CX11 (2GB) | ~$4 |
# Setup on fresh Ubuntu VM
# Install dependencies
sudo apt update && sudo apt install -y python3.11 python3.11-venv
# Create project directory
mkdir -p ~/agent && cd ~/agent
# Setup virtual environment
python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Create systemd service for auto-restart
sudo tee /etc/systemd/system/agent.service << EOF
[Unit]
Description=AI Agent Service
After=network.target
[Service]
Type=simple
User=$USER
WorkingDirectory=$HOME/agent
Environment="PATH=$HOME/agent/venv/bin"
ExecStart=$HOME/agent/venv/bin/python agent.py
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
EOF
# Enable and start
sudo systemctl enable agent
sudo systemctl start agent
# Check logs
sudo journalctl -u agent -f
Option 3: Container Deployment
Moderate complexity · Scalable
Package your agent as a Docker container for portability and scaling.
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Create workspace directory
RUN mkdir -p /app/workspace
# Run agent
CMD ["python", "agent.py"]
version: '3.8'
services:
agent:
build: .
restart: always
environment:
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
- BRAVE_API_KEY=${BRAVE_API_KEY}
volumes:
- ./workspace:/app/workspace # Persist workspace
- ./logs:/app/logs
ports:
- "8080:8080" # If exposing HTTP API
Option 4: Serverless / Functions
Pay-per-use · Complex state
Run agent logic in serverless functions. Good for event-driven agents with external state management.
- Pros: Scale to zero, pay only for usage, no server management
- Cons: Cold starts, execution time limits, state must be external
- Good for: Webhook handlers, scheduled tasks, low-volume agents
import json
from agent import SimpleAgent
import boto3
# Use DynamoDB for state
dynamodb = boto3.resource('dynamodb')
state_table = dynamodb.Table('agent-state')
def handler(event, context):
"""AWS Lambda handler for agent requests."""
# Parse input
body = json.loads(event.get('body', '{}'))
user_id = body.get('user_id')
message = body.get('message')
# Load state
state = state_table.get_item(Key={'user_id': user_id}).get('Item', {})
# Initialize agent with state
agent = SimpleAgent()
agent.conversation_history = state.get('history', [])
# Process message
response = agent.chat(message) # Note: need to make this sync for Lambda
# Save state
state_table.put_item(Item={
'user_id': user_id,
'history': agent.conversation_history[-20:] # Keep last 20
})
return {
'statusCode': 200,
'body': json.dumps({'response': response})
}
Option 5: Platform-as-a-Service
Easiest ops · Higher cost
Use a platform designed for agent deployment. Trade flexibility for convenience.
| Platform | Focus | Starting Cost |
|---|---|---|
| Railway | General Python apps | $5/mo + usage |
| Render | Web services | $7/mo |
| Modal | AI/ML workloads | Pay-per-use |
| LangServe | LangChain agents | Varies |
How Axis Is Deployed
- Agent server: handles the main agent loop, tool execution, and memory
- Storage: memory files on disk, analytics in a database
- Interfaces: multiple ways to interact based on context
- Monitoring: track costs, errors, and usage patterns
- Backups: memory is critical; never lose it
8. Real Example: How We Built Axis
Theory is nice, but nothing teaches like real experience. Here's the actual story of building Axis, the AI agent that runs significant portions of As Above Technologies.
The Beginning: January 2025
We didn't set out to build an agent framework. We needed a capable AI assistant for our own operations: managing multiple business units, handling customer inquiries, monitoring systems, creating content. The existing tools (ChatGPT, basic automation) weren't cutting it.
Initial requirements:
- Persistent memory across sessions (critical for business context)
- Tool use: web search, file operations, system commands
- Multiple interfaces: chat, Discord, scheduled tasks
- Safety guardrails: never send external communications without confirmation
- Cost efficiency: can't spend $500/day on API calls
Phase 1: Proof of Concept (Weeks 1-2)
Started with a minimal implementation: Claude API + file system tools + basic conversation memory. No framework, just ~500 lines of Python.
class Axis:
def __init__(self):
self.client = anthropic.Anthropic()
self.history = []
self.workspace = Path("./workspace")
def chat(self, message):
# Load context files
context = self._load_context_files()
# Call Claude with basic tools
response = self.client.messages.create(
model="claude-3-opus",
system=f"You are Axis. Context:\n{context}",
messages=self.history + [{"role": "user", "content": message}],
tools=[read_file_tool, write_file_tool, search_tool]
)
# Handle tool calls...
return self._process_response(response)
What we learned:
- Simple works. Don't over-engineer before you understand the problem.
- The workspace file pattern (SOUL.md, MEMORY.md) emerged naturally.
- Claude was better than GPT-4 for our use case (longer context, better instruction following).
Phase 2: Tool Explosion (Weeks 3-6)
Once the basics worked, we kept adding tools. Every "I wish Axis could..." became a new tool.
Tools added:
- Web search (Brave API)
- URL fetching and content extraction
- Email reading and drafting (Gmail API)
- Calendar management (Google Calendar)
- Browser automation (Playwright)
- Code execution (sandboxed Python)
- Database queries (read-only)
- Slack/Discord messaging
- Image generation (DALL-E)
- Business-specific integrations (inventory, CRM)
What we learned:
- Tool descriptions matter enormously. Spent as much time on descriptions as implementation.
- Confirmation flows are essential. Axis almost sent emails it shouldn't have several times.
- Tool sprawl is real. Started organizing tools into categories, only loading relevant ones.
Week 4: Axis drafted an email to a customer and, due to a bug in the confirmation flow, actually sent it. The email was fine: polite, accurate, helpful. But we hadn't reviewed it. The customer was happy; we were terrified. Added multiple confirmation layers after that.
Phase 3: Memory Evolution (Weeks 6-10)
Early Axis had goldfish memory. Each day felt like meeting a new assistant. We iterated on memory extensively:
- Version 1: Full conversation history in context. Problem: context window fills up, expensive, irrelevant old conversations.
- Version 2: Automatic summarization. Problem: lost important details in summaries.
- Version 3: Hybrid file + conversation system. Solution: MEMORY.md for curated important facts, conversation history for recent context.
- Version 4: Daily notes + long-term memory. Current: daily logs capture everything, MEMORY.md is curated highlights.
The breakthrough was realizing memory should be editable by both human and agent. When Axis learns something important, it can write to MEMORY.md. When it gets something wrong, we can correct it directly.
Phase 4: Multi-Model Strategy (Weeks 10-14)
Using Claude Opus for everything was expensive ($75/M output tokens). We implemented model routing:
def select_model(task_type: str, complexity: str) -> str:
"""Route to appropriate model based on task."""
if task_type == "simple_question":
return "claude-3-5-sonnet-20241022" # Fast, cheap
elif task_type == "complex_reasoning":
return "claude-opus-4-20250514" # Best quality
elif task_type == "code_generation":
return "claude-sonnet-4-20250514" # Good at code
elif task_type == "image_analysis":
return "gpt-4o" # Strong vision
else:
return "claude-3-5-sonnet-20241022" # Default: balanced
Result: 60% cost reduction while maintaining quality for complex tasks.
Phase 5: Proactive Behavior (Weeks 14-20)
Axis was helpful when asked, but we wanted proactive assistance. Enter the heartbeat system:
- Every 30 minutes, Axis receives a "heartbeat" prompt
- Checks HEARTBEAT.md for scheduled tasks
- Can initiate actions: check email, review calendar, monitor systems
- Notifies humans of important items
# Heartbeat Checklist
## Every heartbeat
- Check for urgent emails (spam filter: ignore marketing)
- Verify production systems are healthy
## Morning (first heartbeat after 8am)
- Summarize calendar for the day
- Check overnight messages
## Every 4 hours
- Review project status in GitHub
- Check analytics dashboards
## Flags
- lastEmailCheck: 2026-01-28T14:30:00
- lastSystemCheck: 2026-01-28T14:45:00
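A minimal sketch of that heartbeat loop, assuming an `agent.chat` method like the one built earlier; the interval matches the 30-minute cadence above, while `notify_human` and the prompt wording are hypothetical:

```python
import time
from pathlib import Path

HEARTBEAT_INTERVAL = 30 * 60  # seconds; the 30-minute cadence described above

def heartbeat_prompt(workspace: Path) -> str:
    """Build the periodic prompt from HEARTBEAT.md, if it exists."""
    checklist = workspace / "HEARTBEAT.md"
    tasks = checklist.read_text() if checklist.exists() else "(no checklist found)"
    return (
        "This is a scheduled heartbeat, not a user message.\n"
        "Review the checklist below, perform any due tasks, and reply "
        "HEARTBEAT_OK if nothing needs human attention.\n\n" + tasks
    )

def notify_human(message: str):
    """Hypothetical alerting hook (e.g. a Discord or Slack ping)."""
    print(f"[ALERT] {message}")

def run_heartbeat_loop(agent, workspace: Path):
    """Wake the agent on a fixed interval and surface anything notable."""
    while True:
        reply = agent.chat(heartbeat_prompt(workspace))
        if "HEARTBEAT_OK" not in reply:
            notify_human(reply)
        time.sleep(HEARTBEAT_INTERVAL)
```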
Current State: Axis in 2026
After a year of development and daily use, here's where Axis stands:
| Metric | Value |
|---|---|
| Daily interactions | 50-100 messages |
| Tool calls per day | 200-400 |
| Active tools | 32 |
| Memory files | ~400 daily logs, 1 MEMORY.md (~15KB) |
| Uptime | 99.7% (excluding planned maintenance) |
| Monthly API cost | $150-300 (varies with usage) |
| Estimated time saved | 60-80 hours/month |
What Axis does regularly:
- Answers customer inquiries (draft → human approval → send)
- Researches markets and competitors
- Monitors systems and alerts on issues
- Drafts content (articles, emails, documentation)
- Manages calendar and meeting prep
- Updates project documentation
- Queries business databases for reports
- Helps debug code and systems
The compound effect of persistent memory. Axis now knows our business deeply: customer names, product history, past decisions, learned preferences. It's not starting from zero each interaction. This accumulated context is worth more than any individual capability.
9. Common Pitfalls and How to Avoid Them
We made many mistakes building Axis. Here's what to watch for:
Pitfall 1: Over-Engineering Before Understanding
The trap: Building elaborate infrastructure before proving the basic concept works. Spending weeks on a perfect memory system before having a useful agent.
How to avoid:
- Start with the simplest possible implementation
- Add complexity only when you hit real limitations
- Get something working in days, not weeks
- Iterate based on actual usage, not anticipated needs
Pitfall 2: Tool Definition Neglect
The trap: Writing clear code but vague tool descriptions. The LLM doesn't see your code, only the descriptions.
How to avoid:
- Treat tool descriptions as user documentation
- Include when to use AND when not to use
- Specify expected inputs and outputs
- Test with edge cases: does the LLM choose the right tool?
# For each tool, answer:
# 1. What does it do? (one sentence)
# 2. When should the agent use it? (specific scenarios)
# 3. When should the agent NOT use it? (common mistakes)
# 4. What parameters does it need? (with examples)
# 5. What does it return? (success and error cases)
Pitfall 3: Insufficient Safety Guardrails
The trap: Trusting the LLM to be careful. It's not malicious, but it can be confidently wrong.
How to avoid:
- Default to requiring confirmation for external actions
- Implement rate limiting on expensive/risky operations
- Sandbox file and code execution
- Log everything for debugging and auditing
- Test with adversarial prompts (prompt injection attempts)
Pitfall 4: Context Window Stuffing
The trap: Putting everything possible into context, assuming more information is always better.
How to avoid:
- Be selective about what goes in context
- Use retrieval for relevant information rather than including everything
- Monitor token usage and optimize
- Test how the agent performs with minimal vs. maximal context
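One hedged way to enforce that selectivity is to measure an approximate token count before each request and drop the oldest messages first (the 4-characters-per-token heuristic and the budget number are assumptions):

```python
def approx_tokens(messages: list[dict]) -> int:
    """Rough estimate: about 4 characters per token."""
    return sum(len(m["content"]) for m in messages) // 4

def trim_to_budget(messages: list[dict], budget: int = 8_000) -> list[dict]:
    """Drop oldest messages until the estimate fits the budget.

    Keeps the first message, assumed to carry system-level context."""
    kept = list(messages)
    while len(kept) > 2 and approx_tokens(kept) > budget:
        kept.pop(1)  # drop the oldest message after the first
    return kept
```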
Pitfall 5: Ignoring Cost Management
The trap: Not monitoring API costs until you get a surprise bill.
How to avoid:
- Set up cost alerts from day one
- Track tokens per request and per day
- Use cheaper models for simple tasks
- Implement caching for repeated queries
- Set hard limits on daily spend
class CostTracker:
PRICES = {  # $ per 1,000 tokens
"claude-opus-4-20250514": {"input": 0.015, "output": 0.075},
"claude-sonnet-4-20250514": {"input": 0.003, "output": 0.015},
"claude-3-5-sonnet-20241022": {"input": 0.003, "output": 0.015},
"gpt-4o": {"input": 0.0025, "output": 0.01},
}
def __init__(self, daily_limit: float = 10.0):
self.daily_limit = daily_limit
self.daily_spend = 0.0
self.last_reset = datetime.now().date()
def track(self, model: str, input_tokens: int, output_tokens: int):
# Reset daily counter
if datetime.now().date() != self.last_reset:
self.daily_spend = 0.0
self.last_reset = datetime.now().date()
# Calculate cost
prices = self.PRICES.get(model, {"input": 0.01, "output": 0.03})
cost = (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1000
self.daily_spend += cost
# Alert if approaching limit
if self.daily_spend > self.daily_limit * 0.8:
self._send_alert(f"Approaching daily limit: ${self.daily_spend:.2f}/${self.daily_limit}")
# Block if over limit
if self.daily_spend > self.daily_limit:
raise CostLimitExceeded(f"Daily limit of ${self.daily_limit} exceeded")
return cost
Pitfall 6: Hallucination Blind Trust
The trap: Assuming the agent's output is accurate, especially for facts.
How to avoid:
- For factual queries, always require tool use (web search) over training data
- Ask the agent to cite sources when possible
- Implement verification for critical information
- Human review for anything published or sent externally
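As a sketch of that verification step, one can reject a "factual" answer that cites no URLs and re-prompt with an explicit instruction to search (the regex and the retry wording are assumptions, not part of Axis):

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

def has_sources(answer: str) -> bool:
    """True if the answer cites at least one URL."""
    return bool(URL_PATTERN.search(answer))

def verify_factual_answer(agent, question: str, max_retries: int = 1) -> str:
    """Re-ask with an explicit search instruction if no sources appear."""
    answer = agent.chat(question)
    retries = 0
    while not has_sources(answer) and retries < max_retries:
        answer = agent.chat(
            f"Use web_search before answering, and cite source URLs: {question}"
        )
        retries += 1
    return answer
```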
Pitfall 7: Poor Error Handling
The trap: Agent crashes or gets stuck when tools fail or return unexpected results.
How to avoid:
- Every tool should have clear error responses
- Implement retry logic with backoff for transient failures
- Give the agent guidance on what to do when tools fail
- Set maximum tool call attempts per request
```python
import asyncio


async def execute_with_recovery(tool_name: str, params: dict, max_retries: int = 3):
    """Execute a tool (via your own execute_tool coroutine) with retry and fallback."""
    last_error = None
    for attempt in range(max_retries):
        try:
            result = await execute_tool(tool_name, params)
            if result.get("success") or "error" not in result:
                return result
            last_error = result.get("error")
        except Exception as e:
            last_error = str(e)
        await asyncio.sleep(2 ** attempt)  # Exponential backoff: 1s, 2s, 4s

    # Return an informative error the LLM can reason about
    return {
        "error": last_error,
        "suggestion": f"Tool '{tool_name}' failed after {max_retries} attempts. "
                      "Consider an alternative approach.",
        "attempted_params": params,
    }
```
Most pitfalls come from treating the LLM as either too smart or too dumb. It's neither. It's a powerful but imperfect tool that needs guardrails, clear instructions, and human oversight for high-stakes operations.
10. Cost Considerations and Scaling
AI agents aren't free. Understanding costs helps you build sustainably and scale wisely.
Understanding API Costs
API pricing is per token (roughly 4 characters = 1 token). Input tokens (what you send) are cheaper than output tokens (what you receive).
| Model | Input ($/1M) | Output ($/1M) | Typical Request Cost |
|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | $0.01-0.05 |
| Claude 3.5 Sonnet | $3.00 | $15.00 | $0.02-0.08 |
| Claude Sonnet 4 | $3.00 | $15.00 | $0.02-0.08 |
| Claude Opus 4 | $15.00 | $75.00 | $0.10-0.50 |
| GPT-4 Turbo | $10.00 | $30.00 | $0.05-0.20 |
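The "typical request cost" column follows directly from the per-million-token prices. A quick sketch of the arithmetic (model keys here are shorthand, not official API identifiers):

```python
# Per-million-token prices from the table above, USD: (input, output)
PRICES_PER_M = {
    "gpt-4o": (2.50, 10.00),
    "claude-sonnet-4": (3.00, 15.00),
    "claude-opus-4": (15.00, 75.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost in USD from per-million-token prices."""
    inp, out = PRICES_PER_M[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000


# A typical agent turn: ~3,000 input tokens, ~500 output tokens on Sonnet
# → 3000 * $3.00/1M + 500 * $15.00/1M ≈ $0.0165
cost = request_cost("claude-sonnet-4", 3_000, 500)
```

Note that an agent loop multiplies this: every tool call round-trip resends the growing context as input tokens.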
Cost Breakdown for Typical Agent
Here's what Axis typically costs per month:
- Model usage: ~70% Sonnet, ~25% Opus, ~5% GPT-4o for vision
- Web search: Brave Search API for web queries
- Embeddings: OpenAI embeddings for semantic search
- Hosting: VPS hosting for 24/7 operation
- Total: varies with usage intensity
Cost Optimization Strategies
1. Model Routing
Use expensive models only when needed:
```python
import re


def classify_complexity(message: str) -> str:
    """Quick heuristic for task complexity, used to route requests."""
    # Simple patterns → cheap model
    simple_patterns = [
        r'^(what|when|where|who) is',  # Basic questions
        r'^(hi|hello|hey)',            # Greetings
        r'^(thanks|thank you)',        # Acknowledgments
    ]
    for pattern in simple_patterns:
        if re.match(pattern, message.lower()):
            return "simple"

    # Complex indicators → expensive model
    complex_indicators = [
        "analyze", "compare", "explain why", "strategy",
        "write a", "draft", "create", "plan",
    ]
    if any(ind in message.lower() for ind in complex_indicators):
        return "complex"

    return "medium"  # Default
```
2. Caching
Cache responses for repeated queries:
```python
import hashlib
import time


class ResponseCache:
    def __init__(self, ttl_seconds: int = 3600):
        self.cache = {}
        self.ttl = ttl_seconds

    def get_key(self, messages: list, tools: list) -> str:
        """Generate a cache key from the request."""
        content = str(messages) + str(tools)
        return hashlib.sha256(content.encode()).hexdigest()

    def get(self, key: str) -> dict | None:
        """Return the cached response if still within TTL."""
        if key in self.cache:
            entry = self.cache[key]
            if time.time() - entry["timestamp"] < self.ttl:
                return entry["response"]
            del self.cache[key]  # Expired: evict
        return None

    def set(self, key: str, response: dict):
        """Cache a response with the current timestamp."""
        self.cache[key] = {
            "response": response,
            "timestamp": time.time(),
        }
```
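One subtlety with `str()`-based keys: two logically identical requests can hash differently if dict key order or tool order varies. A sketch of a more robust key using canonical JSON (an alternative to, not part of, the class above):

```python
import hashlib
import json


def cache_key(messages: list[dict], tools: list[str]) -> str:
    """Deterministic cache key: canonical JSON avoids the dict-ordering
    and tool-ordering misses that naive str() keys can suffer."""
    payload = json.dumps(
        {"messages": messages, "tools": sorted(tools)},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Higher hit rates compound: every cache hit is a request you never pay for.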
3. Context Pruning
Minimize tokens in context:
```python
import re


def optimize_context(context: str, max_tokens: int = 10000) -> str:
    """Reduce context size while preserving information."""
    # Remove excessive whitespace
    context = re.sub(r'\n\s*\n', '\n\n', context)
    context = re.sub(r' +', ' ', context)

    # Truncate if still too long (estimate_tokens is your own counter,
    # e.g. len(text) // 4 as a rough approximation)
    tokens = estimate_tokens(context)
    if tokens > max_tokens:
        # Keep the beginning and end, summarize the middle
        lines = context.split('\n')
        keep_start = max(1, len(lines) // 4)  # guard against zero-length slices
        keep_end = max(1, len(lines) // 4)
        middle = lines[keep_start:-keep_end]
        context = '\n'.join(
            lines[:keep_start]
            + [f"[... {len(middle)} lines summarized ...]"]
            + lines[-keep_end:]
        )
    return context
```
4. Prompt Optimization
Shorter prompts = lower costs:
- Remove redundant instructions
- Use concise tool descriptions
- Avoid repeating information in system prompt and messages
- Load tool definitions dynamically (only include relevant tools)
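The last point, dynamic tool loading, deserves a sketch since it often yields the biggest savings: tool schemas can run hundreds of tokens each. The keyword map and tool names below are illustrative assumptions, not a real registry:

```python
# Hypothetical keyword → tool routing: only matched tool definitions
# are sent with the request, shrinking the prompt
TOOL_KEYWORDS = {
    "web_search": ["search", "look up", "latest", "news"],
    "calculator": ["calculate", "sum", "percent"],
    "calendar": ["schedule", "meeting", "remind"],
}


def relevant_tools(message: str, always: tuple[str, ...] = ()) -> list[str]:
    """Select only the tools whose trigger keywords appear in the message."""
    text = message.lower()
    picked = {tool for tool, keywords in TOOL_KEYWORDS.items()
              if any(kw in text for kw in keywords)}
    picked.update(always)
    return sorted(picked)
```

The `always` parameter covers tools the agent should never lose (memory access, say), since keyword matching will inevitably miss some phrasings.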
Scaling Considerations
As usage grows, consider these patterns:
| Scale | Requests/Day | Architecture | Estimated Cost |
|---|---|---|---|
| Personal | 10-50 | Single instance | $30-100/mo |
| Small Team | 100-500 | Single instance + queue | $150-400/mo |
| Business | 1,000-5,000 | Load balanced + workers | $500-2,000/mo |
| Enterprise | 10,000+ | Distributed + local models | $2,000+/mo |
When to Consider Local Models
Running models locally (Llama 3, Mixtral) makes sense when:
- High volume: >$500/month in API costs
- Privacy critical: Data can't leave your infrastructure
- Latency sensitive: Need <100ms response times
- Specific fine-tuning: Need a model trained on your data
Hardware requirements for local deployment:
| Model | VRAM Required | Approximate Hardware Cost |
|---|---|---|
| Llama 3 8B | 16GB | $400-800 (RTX 4080) |
| Llama 3 70B | 40-80GB | $2,000-8,000 (A100/multi-GPU) |
| Mixtral 8x7B | 24-48GB | $1,000-3,000 |
Many production systems use a hybrid: local models for high-volume, simple tasks (classification, embedding, basic Q&A), cloud APIs for complex reasoning. This balances cost and capability.
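A hybrid setup usually reduces to a small routing policy. The task types and model names below are illustrative assumptions, one possible shape of such a policy:

```python
from dataclasses import dataclass


@dataclass
class Route:
    backend: str  # "local" or "cloud"
    model: str


# Hypothetical policy: cheap, high-volume task types stay on local models;
# open-ended reasoning goes to a cloud API
LOCAL_TASKS = {"classify", "embed", "faq"}


def route(task_type: str) -> Route:
    """Pick a backend and model for a task type."""
    if task_type in LOCAL_TASKS:
        return Route("local", "llama-3-8b")
    return Route("cloud", "claude-sonnet-4-20250514")
```

The win is that the expensive path only sees the traffic that actually needs it, while the local path absorbs the volume.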
Conclusion: Your Path Forward
You now have a comprehensive understanding of how to build AI agents, from the fundamental architecture to production deployment. But knowledge without action is just entertainment. Here's your concrete path forward:
1. Get the simple agent from Section 4 running. Add one custom tool. Experience the fundamentals firsthand.
2. Implement file-based memory. Add 2-3 tools relevant to your use case. Start tracking costs.
3. Put it on a server. Make it accessible. Use it for real tasks. Note what's missing.
4. Fix the pain points. Add the tools you actually need. Improve memory based on what matters. Share what you've built.
Building Axis changed how we work at As Above Technologies. The compound effect of persistent memory and capable tools creates something genuinely useful: an assistant that knows your business and can actually help.
The technology is accessible. The patterns are proven. The only question is whether you'll build or watch others build. We hope you choose to build.
If you build something, we'd love to see it. Share your agent projects, ask questions, and join the community of builders creating the next generation of AI tools.
Ready to explore more technical guides?
Explore Techne