- 1. What Is Edge Computing and Why It Matters
- 2. Cloudflare Workers vs Traditional Hosting
- 3. Setting Up Wrangler CLI
- 4. Your First Worker: Hello World to Production
- 5. Cloudflare Pages for Static Sites
- 6. Email Workers and Routing
- 7. KV Storage and Durable Objects
- 8. Workers AI: Running Models at the Edge
- 9. Environment Variables and Secrets
- 10. Custom Domains and DNS
- 11. Real Examples: Our Deployments
- 12. Pricing and Free Tier Limits
- 13. When NOT to Use Workers
Every millisecond of latency costs you. In AI applications, where users expect instant responses and real-time interactions, the difference between a server in Virginia and one in your user's city can mean the difference between a seamless experience and an abandoned session.
This isn't theoretical. We run asabove.tech on Cloudflare Pages. Our voice agents process audio through Workers. Email routing happens at the edge. API endpoints respond in under 50ms globally. The infrastructure that would have required a team of DevOps engineers five years ago now deploys with a single command.
This guide is the practical walkthrough I wish I'd had, covering everything from your first "hello world" Worker to production deployments handling real traffic. We'll use actual code from our infrastructure, not sanitized examples. By the end, you'll understand not just how to use Workers, but when they're the right choice, and when they're not.
1. What Is Edge Computing and Why It Matters
Traditional cloud computing centralizes processing in a few massive data centers. Your server might be in us-east-1 (Virginia), and a user in Tokyo experiences 150ms+ of latency just from the physics of light traveling through fiber optic cables. That's before your application even starts processing.
Edge computing flips this model. Instead of users coming to your server, your code runs on servers distributed across the globe, as close to users as possible. Cloudflare operates 300+ data centers in 100+ countries. When a user in Tokyo makes a request, it's handled by a server in Tokyo. When someone in São Paulo requests the same endpoint, a server in São Paulo handles it.
Why This Matters for AI Applications
AI applications have unique latency requirements that make edge computing particularly valuable:
- Voice interfaces: Users expect responses within 200-300ms for natural conversation. Every millisecond of network latency compounds. Voice-to-text, AI processing, text-to-speech: each step adds latency. Starting with a 150ms network penalty makes responsive voice agents nearly impossible.
- Real-time features: AI-powered autocomplete, live translation, and streaming responses all require sub-100ms round trips to feel instantaneous.
- Cost optimization: Edge processing can filter, validate, and transform requests before they hit expensive AI APIs. Rejecting invalid requests at the edge saves API costs and reduces load on origin servers.
- Resilience: Edge networks absorb DDoS attacks, handle traffic spikes, and keep serving cached content even if your origin is down.
Our voice agent worker handles initial WebSocket connections at the edge. The connection setup (authentication, session creation, initial state) happens in the nearest data center. Only the actual AI inference calls go to centralized GPU servers. The result: voice interactions feel instantaneous even for users on the other side of the world.
Edge Computing Trade-offs
Edge isn't magic. Understanding the trade-offs helps you architect correctly:
| Advantage | Trade-off |
|---|---|
| Low latency globally | Limited CPU time per request (varies by plan) |
| Automatic scaling | No persistent connections to databases (must use HTTP/REST) |
| Zero cold starts | Execution environment is V8 isolates, not full containers |
| Built-in DDoS protection | Storage options are eventually consistent (for global distribution) |
| Simple deployment | Not all npm packages work (no Node.js APIs like fs, net) |
The key insight: edge is for request handling, not heavy computation. Use edge workers to route, validate, transform, cache, and respond quickly. Use traditional servers (or serverless functions with longer timeouts) for heavy lifting like AI inference.
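That split can be sketched as a Worker that rejects malformed work at the edge and forwards only legitimate requests upstream. This is a minimal sketch, not code from the article: the `ORIGIN_URL` binding and the `{ prompt }` payload shape are assumptions for illustration.

```typescript
interface Env {
  ORIGIN_URL: string; // hypothetical binding pointing at the heavy inference service
}

// Cheap validation at the edge, so invalid requests never reach the expensive origin
function isValidPayload(payload: unknown): payload is { prompt: string } {
  if (typeof payload !== "object" || payload === null) return false;
  const p = payload as { prompt?: unknown };
  return typeof p.prompt === "string" && p.prompt.length <= 4000;
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("Method not allowed", { status: 405 });
    }
    let payload: unknown;
    try {
      payload = await request.json();
    } catch {
      return new Response("Invalid JSON", { status: 400 });
    }
    if (!isValidPayload(payload)) {
      return new Response("Invalid payload", { status: 400 });
    }
    // Only validated requests incur the round trip to centralized compute
    return fetch(env.ORIGIN_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    });
  }
};

export default worker;
```

Everything above the final `fetch` runs in the data center nearest the user; only the last line pays the long-haul latency.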
2. Cloudflare Workers vs Traditional Hosting
Before diving into code, let's compare Workers to alternatives you might consider. Each has its place; the goal is choosing the right tool.
Cloudflare Workers
V8 isolates running in 300+ locations. Zero cold starts, millisecond spin-up. Best for request handling, API routing, and lightweight processing.
Best For
- API gateways and routing
- Authentication and authorization
- Request/response transformation
- Static site enhancement (A/B testing, personalization)
- Webhook handlers
- Simple AI inference with Workers AI
Traditional Serverless (Lambda, Cloud Run, etc.)
Container-based serverless with full runtime support. Higher limits but cold starts and regional deployment.
Best For
- Long-running processes
- Heavy computation
- Complex dependencies (native modules)
- Direct database connections
- Jobs requiring more than 128MB memory
Virtual Machines and Containers
EC2, DigitalOcean Droplets, or containers on Kubernetes. Full control, persistent connections, but you manage scaling and availability.
Best For
- Persistent WebSocket connections
- Stateful applications
- GPU-based AI inference
- Database servers
- Applications requiring specific OS features
Decision Framework
Use this mental model when choosing:
- Is it lightweight request handling (routing, auth, transformation) that finishes in milliseconds? Yes: Workers are ideal. Fast, cheap, scales automatically.
- Does it need long CPU time, lots of memory, or native dependencies? Yes: use traditional serverless or containers.
- Does it need coordinated state? Yes: use Durable Objects (edge) or containers (heavy).
- Does it need GPU inference? Yes: run on dedicated GPU servers, call from an edge worker.
The most effective architectures combine edge and centralized compute. Workers handle the request layer (authentication, routing, caching, response formatting) while heavier operations happen on specialized servers. Your Worker becomes a smart orchestrator, not just a dumb proxy.
3. Setting Up Wrangler CLI
Wrangler is Cloudflare's CLI for managing Workers. It handles local development, deployment, secrets, and configuration. Let's get it running.
Installation
# Install globally with npm
npm install -g wrangler
# Or use npx (no global install needed)
npx wrangler --version
# Authenticate with your Cloudflare account
wrangler login
The wrangler login command opens a browser window for OAuth authentication.
After approving, Wrangler stores credentials locally and you're ready to deploy.
Project Structure
A typical Workers project looks like this:
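The layout below is the common default (the exact file names depend on how you scaffold the project):

```
my-worker/
├── wrangler.toml      # Worker configuration
├── package.json
├── tsconfig.json
└── src/
    └── index.ts       # Entry point exporting the fetch handler
```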
Configuration: wrangler.toml
The wrangler.toml file is the heart of your Worker configuration:
# Basic configuration
name = "my-worker"
main = "src/index.ts"
compatibility_date = "2024-01-01"
# Account info (optional - defaults to authenticated account)
account_id = "your-account-id"
# Environment-specific settings
[env.production]
name = "my-worker-production"
route = "api.example.com/*"
[env.staging]
name = "my-worker-staging"
route = "staging-api.example.com/*"
# KV namespace bindings
[[kv_namespaces]]
binding = "CACHE"
id = "abc123..."
# Environment variables (non-secret)
[vars]
ENVIRONMENT = "development"
API_VERSION = "v1"
# Durable Objects
[[durable_objects.bindings]]
name = "SESSIONS"
class_name = "SessionManager"
# Workers AI
[ai]
binding = "AI"
Essential Commands
| Command | Purpose |
|---|---|
| `wrangler dev` | Start local development server with hot reload |
| `wrangler deploy` | Deploy to production |
| `wrangler deploy --env staging` | Deploy to specific environment |
| `wrangler tail` | Live stream logs from production |
| `wrangler secret put API_KEY` | Add encrypted secret |
| `wrangler kv:namespace create CACHE` | Create KV namespace |
| `wrangler pages deploy ./dist` | Deploy static site to Pages |
wrangler dev runs a local server that closely mimics the production
environment, including access to KV, Durable Objects, and other bindings. It's
fast, supports hot reload, and catches most issues before deployment. Use
wrangler dev --remote to test against actual production bindings
(useful for debugging data issues).
4. Your First Worker: Hello World to Production
Let's build a Worker from scratch, understanding each part along the way. We'll start simple and add complexity until we have something production-ready.
The Simplest Worker
export default {
  async fetch(request: Request): Promise<Response> {
    return new Response("Hello, World!");
  }
};
That's it. A Worker is an object with a fetch handler that receives a Request and returns a Response. The interface mirrors the web standard Fetch API: if you know how to use fetch() in the browser, you understand the basics.
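Because the handler is just an async function from Request to Response, you can exercise it directly, with no server or deployment. A sketch, assuming Node 18+ where `Request` and `Response` are global:

```typescript
const worker = {
  async fetch(request: Request): Promise<Response> {
    return new Response("Hello, World!");
  }
};

async function main() {
  // Invoke the handler directly, as a unit test would
  const res = await worker.fetch(new Request("https://example.com/"));
  console.log(res.status, await res.text()); // 200 Hello, World!
}
main();
```

This is also why Workers are pleasant to test: the platform contract is just the Fetch API.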
Adding Request Handling
Let's make it actually do something useful:
export default {
  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);

    // Route based on path
    switch (url.pathname) {
      case "/":
        return new Response("Welcome to the API");

      case "/health":
        return Response.json({ status: "healthy", timestamp: Date.now() });

      case "/echo": {
        if (request.method !== "POST") {
          return new Response("Method not allowed", { status: 405 });
        }
        // Braces give the `const` its own block scope inside the switch
        const body = await request.json();
        return Response.json({ received: body });
      }

      default:
        return new Response("Not found", { status: 404 });
    }
  }
};
Adding Type Safety and Environment
Production Workers need type-safe access to bindings (KV, secrets, etc.):
// Define your environment bindings
interface Env {
  // Secrets (set via wrangler secret put)
  API_KEY: string;
  // KV namespaces
  CACHE: KVNamespace;
  // Environment variables (from wrangler.toml [vars])
  ENVIRONMENT: string;
  // Workers AI binding
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // Now you have type-safe access to all bindings
    const apiKey = env.API_KEY;
    const cached = await env.CACHE.get("some-key");
    return new Response(`Environment: ${env.ENVIRONMENT}`);
  }
};
A Production-Ready API Worker
Here's a more complete example with error handling, CORS, logging, and proper structure:
interface Env {
  API_KEY: string;
  CACHE: KVNamespace;
  ALLOWED_ORIGINS: string;
}

// CORS headers for browser requests
function corsHeaders(origin: string | null, allowedOrigins: string): Record<string, string> {
  const allowed = allowedOrigins.split(",");
  const allowOrigin = origin && allowed.includes(origin) ? origin : allowed[0];
  return {
    "Access-Control-Allow-Origin": allowOrigin,
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, Authorization",
  };
}

// Error response helper
function errorResponse(message: string, status: number, headers: HeadersInit = {}): Response {
  return Response.json(
    { error: message, status },
    { status, headers }
  );
}

// Main handler
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const url = new URL(request.url);
    const origin = request.headers.get("Origin");
    const cors = corsHeaders(origin, env.ALLOWED_ORIGINS);

    // Handle CORS preflight
    if (request.method === "OPTIONS") {
      return new Response(null, { status: 204, headers: cors });
    }

    try {
      // Route to handlers
      let response: Response;
      if (url.pathname === "/api/data") {
        response = await handleData(request, env);
      } else if (url.pathname.startsWith("/api/cache")) {
        response = await handleCache(request, env, url);
      } else {
        response = errorResponse("Not found", 404);
      }

      // Add CORS headers to all responses
      const newHeaders = new Headers(response.headers);
      Object.entries(cors).forEach(([key, value]) => newHeaders.set(key, value));
      return new Response(response.body, {
        status: response.status,
        headers: newHeaders
      });
    } catch (error) {
      console.error("Worker error:", error);
      return errorResponse("Internal server error", 500, cors);
    }
  }
};

// Handler functions
async function handleData(request: Request, env: Env): Promise<Response> {
  // Validate API key
  const authHeader = request.headers.get("Authorization");
  if (authHeader !== `Bearer ${env.API_KEY}`) {
    return errorResponse("Unauthorized", 401);
  }
  // Process request
  return Response.json({
    message: "Authenticated successfully",
    timestamp: new Date().toISOString()
  });
}

async function handleCache(request: Request, env: Env, url: URL): Promise<Response> {
  const key = url.pathname.replace("/api/cache/", "");

  if (request.method === "GET") {
    const value = await env.CACHE.get(key);
    if (!value) {
      return errorResponse("Not found", 404);
    }
    return Response.json({ key, value: JSON.parse(value) });
  }

  if (request.method === "POST") {
    const body = await request.json();
    await env.CACHE.put(key, JSON.stringify(body), { expirationTtl: 3600 });
    return Response.json({ success: true, key });
  }

  return errorResponse("Method not allowed", 405);
}
Deploy to Production
# Set your secret
wrangler secret put API_KEY
# Enter: your-secret-api-key
# Deploy
wrangler deploy
# Watch logs
wrangler tail
That wrangler deploy command pushed your code to 300+ data centers
worldwide. Users in Tokyo, London, São Paulo: everyone gets sub-50ms response times.
No load balancers to configure, no auto-scaling rules to tune, no multi-region
replication to manage. It just works.
5. Cloudflare Pages for Static Sites
While Workers handle dynamic requests, Cloudflare Pages is optimized for static sites and frontend applications. It's what we use for asabove.tech, and it's remarkably simple for what it delivers.
Static site built with vanilla HTML/CSS/JS, deployed via Git integration. Automatic deployments on push, preview URLs for branches, global CDN distribution. Total monthly cost: $0 (free tier).
Pages vs Workers: When to Use Which
| Use Case | Cloudflare Pages | Cloudflare Workers |
|---|---|---|
| Static websites | Best choice | Overkill |
| SPAs (React, Vue) | Best choice | Can work, but why? |
| API endpoints | Pages Functions | Best choice |
| Full-stack apps | Pages + Functions | Workers + KV/D1 |
| Real-time features | Not supported | Best choice |
Setting Up Pages
Option 1: Git Integration (Recommended)
- Push your site to GitHub or GitLab
- Go to Cloudflare Dashboard → Pages → Create a project
- Connect your repository
- Configure build settings (if using a framework)
- Deploy
Every push to your main branch triggers a deployment. Pull requests get preview URLs. It's the simplest CI/CD setup possible.
Option 2: Direct Upload with Wrangler
# Deploy a directory directly
wrangler pages deploy ./dist
# Create a new project
wrangler pages project create my-site
# Deploy to specific project
wrangler pages deploy ./dist --project-name my-site
Build Configuration for Frameworks
| Framework | Build Command | Output Directory |
|---|---|---|
| React (CRA) | `npm run build` | `build` |
| Next.js (static) | `next build && next export` | `out` |
| Vue | `npm run build` | `dist` |
| Astro | `npm run build` | `dist` |
| SvelteKit (static) | `npm run build` | `build` |
| Plain HTML | None | `/` (root) |
Pages Functions: Adding Server-Side Logic
Pages supports "Functions": Workers that run alongside your static site. Create a functions directory, and files become API endpoints:
// e.g. functions/api/hello.js (hypothetical path; the file path maps to /api/hello)
export async function onRequest(context) {
  return new Response(JSON.stringify({
    message: "Hello from Pages Functions!",
    timestamp: new Date().toISOString()
  }), {
    headers: { "Content-Type": "application/json" }
  });
}
// functions/api/users/[id].js (bracket segments capture path parameters)
// Dynamic route: /api/users/123 -> params.id = "123"
export async function onRequest(context) {
  const { params } = context;
  return new Response(JSON.stringify({
    userId: params.id,
    // In reality, fetch from database
    name: `User ${params.id}`
  }), {
    headers: { "Content-Type": "application/json" }
  });
}
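Functions also support middleware: a file named `functions/_middleware.js` runs before the Functions beneath it and can wrap their responses via `context.next()`. A sketch that appends a header to every response (the header choice is illustrative, and the context type is a minimal slice of what Pages actually provides):

```typescript
// functions/_middleware.ts: wraps every Function response in this directory
type MiddlewareContext = { next: () => Promise<Response> };

export async function onRequest(context: MiddlewareContext): Promise<Response> {
  // Let the matched Function or static asset respond first
  const response = await context.next();
  // Re-wrap the response so its headers are mutable, then append ours
  const wrapped = new Response(response.body, response);
  wrapped.headers.set("X-Frame-Options", "DENY");
  return wrapped;
}
```

Middleware composes per directory, so shared concerns like auth checks or logging live in one file instead of every endpoint.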
Custom Domain Setup
- In Pages project settings, go to "Custom domains"
- Add your domain (e.g., asabove.tech)
- If using Cloudflare DNS: automatic configuration
- If using external DNS: add the provided CNAME record
Cloudflare automatically provisions and renews SSL certificates for custom domains. No configuration needed, no certificate management, no renewal reminders. It's included in the free tier.
6. Email Workers and Routing
One of Cloudflare's hidden gems: Email Routing with Workers. You can receive emails at your domain and process them with code, whether that's parsing, forwarding, triggering workflows, or storing data.
All emails to our domain route through a Worker. Auto-replies, spam filtering, intelligent forwarding based on content, webhook triggers for support tickets. Zero external email service dependencies.
Setting Up Email Routing
- Enable Email Routing: In Cloudflare Dashboard, go to your domain → Email → Email Routing → Enable
- Add DNS records: Cloudflare will prompt you to add MX records (automatic if using Cloudflare DNS)
- Create catch-all rule: Route all addresses to a Worker
Basic Email Worker
name = "email-handler"
main = "src/index.ts"
compatibility_date = "2024-01-01"
# Email handling is bound to this Worker via Email Routing rules in the
# dashboard (Email -> Email Routing); no extra wrangler.toml section is required.
// The email handler receives a ForwardableEmailMessage (typed globally by
// @cloudflare/workers-types); no runtime import is needed to receive mail.
interface Env {
  FORWARD_TO: string;
  KV_EMAILS: KVNamespace;
}

export default {
  async email(message: ForwardableEmailMessage, env: Env, ctx: ExecutionContext) {
    // Parse email details
    const from = message.from;
    const to = message.to;
    const subject = message.headers.get("subject") || "(no subject)";
    console.log(`Email received: ${from} -> ${to}: ${subject}`);

    // Get email body
    const rawEmail = await new Response(message.raw).text();

    // Store in KV for logging
    await env.KV_EMAILS.put(
      `email:${Date.now()}`,
      JSON.stringify({ from, to, subject, receivedAt: new Date().toISOString() }),
      { expirationTtl: 86400 * 30 } // 30 days
    );

    // Forward to destination
    await message.forward(env.FORWARD_TO);
  }
};
Advanced Email Handling
Here's a more sophisticated example with routing logic, auto-replies, and webhook triggers:
interface Env {
  // Forwarding destinations
  SUPPORT_EMAIL: string;
  SALES_EMAIL: string;
  DEFAULT_EMAIL: string;
  // Webhook for notifications
  WEBHOOK_URL: string;
  // Storage
  KV_EMAILS: KVNamespace;
}

// Route emails based on recipient address
function getDestination(toAddress: string, env: Env): string {
  const localPart = toAddress.split("@")[0].toLowerCase();
  const routes: Record<string, string> = {
    "support": env.SUPPORT_EMAIL,
    "help": env.SUPPORT_EMAIL,
    "sales": env.SALES_EMAIL,
    "contact": env.SALES_EMAIL,
    "info": env.DEFAULT_EMAIL,
  };
  return routes[localPart] || env.DEFAULT_EMAIL;
}

// Check if email looks like spam (subject-based; `from` is available for sender rules)
function isLikelySpam(from: string, subject: string): boolean {
  const spamIndicators = [
    /\b(viagra|cialis|lottery|winner|prince|inheritance)\b/i,
    /^.{0,5}$/, // Very short subject
    /URGENT.*REPLY/i,
  ];
  return spamIndicators.some(pattern => pattern.test(subject));
}

// Send webhook notification
async function notifyWebhook(data: object, webhookUrl: string): Promise<void> {
  await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(data)
  });
}

export default {
  async email(message: ForwardableEmailMessage, env: Env, ctx: ExecutionContext) {
    const from = message.from;
    const to = message.to;
    const subject = message.headers.get("subject") || "(no subject)";
    console.log(`Processing: ${from} -> ${to}: ${subject}`);

    // Spam check
    if (isLikelySpam(from, subject)) {
      console.log("Rejected as spam");
      message.setReject("Message rejected");
      return;
    }

    // Determine destination
    const destination = getDestination(to, env);

    // Log to KV
    const emailId = `email:${Date.now()}:${Math.random().toString(36).slice(2)}`;
    await env.KV_EMAILS.put(emailId, JSON.stringify({
      from,
      to,
      subject,
      destination,
      receivedAt: new Date().toISOString()
    }));

    // Send webhook notification (non-blocking)
    ctx.waitUntil(
      notifyWebhook({
        type: "email_received",
        from,
        to,
        subject,
        destination,
        id: emailId
      }, env.WEBHOOK_URL)
    );

    // Forward the email
    await message.forward(destination);
    console.log(`Forwarded to ${destination}`);
  }
};
Email Routing Patterns
- Support inbox: auto-create tickets, assign priority based on content
- Sales and contact addresses: log leads, auto-respond with availability
- Transactional mail: process receipts, shipping notifications, alerts
- Catch-all monitoring: discover typos, track attempted deliveries
Limitations
- Can receive and forward emails, but not send arbitrary outbound emails
- Size limit: 25MB per email
- Rate limit: 100,000 emails/day on free tier
- For sending emails, pair with an outbound service (SendGrid, SES, etc.)
7. KV Storage and Durable Objects
Edge workers are stateless by default; each request is independent. For persistence, Cloudflare provides two primary options: KV for simple key-value storage and Durable Objects for stateful, coordinated workloads.
Workers KV: Global Key-Value Storage
KV is eventually consistent, globally distributed key-value storage. Think of it as a global Redis with ~60-second eventual consistency.
| Characteristic | Value |
|---|---|
| Max key size | 512 bytes |
| Max value size | 25MB |
| Reads | Eventually consistent (~60s), cached at edge |
| Writes | Propagate globally within 60 seconds |
| Free tier | 100,000 reads/day, 1,000 writes/day |
Setting Up KV
# Create a namespace
wrangler kv:namespace create CACHE
# Create a preview namespace (for wrangler dev)
wrangler kv:namespace create CACHE --preview
# List namespaces
wrangler kv:namespace list
[[kv_namespaces]]
binding = "CACHE"
id = "abc123..." # From create command output
preview_id = "def456..." # From preview create output
Using KV in Your Worker
interface Env {
  CACHE: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    // Simple get
    const value = await env.CACHE.get("my-key");

    // Get with type
    const jsonValue = await env.CACHE.get("my-json", "json");

    // Get with metadata
    const { value: val, metadata } = await env.CACHE.getWithMetadata("my-key");

    // Put with expiration
    await env.CACHE.put("temp-key", "temp-value", {
      expirationTtl: 3600 // 1 hour in seconds
    });

    // Put with metadata
    await env.CACHE.put("user:123", JSON.stringify({ name: "Alice" }), {
      metadata: { createdAt: Date.now(), version: 1 }
    });

    // Delete
    await env.CACHE.delete("old-key");

    // List keys (with prefix)
    const keys = await env.CACHE.list({ prefix: "user:", limit: 100 });
    return Response.json({ keys: keys.keys });
  }
};
KV Best Practices
- Use for read-heavy workloads: KV is optimized for reads. Writes propagate globally but aren't instant.
- Cache expensive computations: Store API responses, computed results, configuration that changes rarely.
-
Set TTLs: Use
expirationTtlfor data that should expire. Prevents stale data buildup. -
Use prefixes for organization:
user:123,cache:api:response. Makes listing and bulk operations easier.
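These practices combine into the standard cache-aside pattern: check KV, compute on a miss, store with a TTL. A minimal sketch, where `KVLike` is just the slice of the `KVNamespace` interface this helper touches:

```typescript
// Minimal slice of the KVNamespace interface used below
interface KVLike {
  get(key: string, type: "json"): Promise<unknown | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}

// Cache-aside: serve from KV when present, otherwise compute and store with a TTL
async function cached<T>(
  kv: KVLike,
  key: string,
  ttlSeconds: number,
  compute: () => Promise<T>
): Promise<T> {
  const hit = await kv.get(key, "json");
  if (hit !== null) return hit as T; // cache hit: no recomputation
  const value = await compute();
  await kv.put(key, JSON.stringify(value), { expirationTtl: ttlSeconds });
  return value;
}
```

In a Worker you would call it as `cached(env.CACHE, "config:v1", 3600, fetchConfig)`; because KV is eventually consistent, two edge locations may briefly compute the same value, which is acceptable for caching.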
Durable Objects: Stateful Edge Computing
Durable Objects are the answer to "but what if I need real state?" They provide:
- Single-threaded execution (no race conditions)
- Persistent storage local to the object
- WebSocket support (for real-time applications)
- Consistent state across requests
Each Durable Object has a unique ID and lives in one location. All requests to that object route to that location, ensuring consistency.
When to Use Durable Objects
| Use Case | Why Durable Objects |
|---|---|
| Real-time collaboration | Single source of truth for document state |
| Game servers | Consistent game state, no race conditions |
| Rate limiting | Accurate counters per user/IP |
| Session management | WebSocket connections + persistent state |
| Distributed locks | Coordination without external databases |
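The rate-limiting row deserves a sketch: because each Durable Object is single-threaded, a read-increment-write on a counter inside it cannot race. Below is a fixed-window limiter as a plain class that would live inside a Durable Object's fetch handler; the window size, limit, `StorageLike` interface, and injectable clock are all illustrative assumptions, not API from the article.

```typescript
// Minimal slice of DurableObjectState.storage used here
interface StorageLike {
  get(key: string): Promise<number | undefined>;
  put(key: string, value: number): Promise<void>;
}

// Fixed-window limiter: at most `limit` requests per minute per object instance
export class RateLimiter {
  constructor(
    private storage: StorageLike,
    private limit = 60,
    private now: () => number = Date.now // injectable clock for testing
  ) {}

  async check(): Promise<{ allowed: boolean; count: number }> {
    // One counter key per minute; inside a Durable Object this sequence is race-free
    const windowKey = `count:${Math.floor(this.now() / 60_000)}`;
    const count = ((await this.storage.get(windowKey)) ?? 0) + 1;
    await this.storage.put(windowKey, count);
    return { allowed: count <= this.limit, count };
  }
}
```

Routing every check for a given user or IP to the same object (via `idFromName`) is what makes the counter accurate without an external database.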
Durable Object Example: Session Manager
// src/session.ts
export class SessionManager {
  state: DurableObjectState;
  sessions: Map<string, WebSocket> = new Map();

  constructor(state: DurableObjectState) {
    this.state = state;
  }

  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);

    if (url.pathname === "/websocket") {
      return this.handleWebSocket(request);
    }

    if (url.pathname === "/broadcast") {
      const message = await request.text();
      this.broadcast(message);
      return new Response("Broadcasted");
    }

    return new Response("Not found", { status: 404 });
  }

  async handleWebSocket(request: Request): Promise<Response> {
    const upgradeHeader = request.headers.get("Upgrade");
    if (upgradeHeader !== "websocket") {
      return new Response("Expected WebSocket", { status: 400 });
    }

    const [client, server] = Object.values(new WebSocketPair());
    const sessionId = crypto.randomUUID();

    server.accept();
    this.sessions.set(sessionId, server);

    server.addEventListener("message", (event) => {
      // Handle incoming messages
      console.log(`Message from ${sessionId}: ${event.data}`);
      // Echo back or process
      this.broadcast(`[${sessionId}]: ${event.data}`);
    });

    server.addEventListener("close", () => {
      this.sessions.delete(sessionId);
      console.log(`Session ${sessionId} closed`);
    });

    // Store session count persistently
    await this.state.storage.put("sessionCount", this.sessions.size);

    return new Response(null, {
      status: 101,
      webSocket: client
    });
  }

  broadcast(message: string) {
    for (const [id, socket] of this.sessions) {
      try {
        socket.send(message);
      } catch (e) {
        this.sessions.delete(id);
      }
    }
  }
}
// src/index.ts
export { SessionManager } from "./session";

interface Env {
  SESSIONS: DurableObjectNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    // Get room ID from path or query
    const roomId = url.searchParams.get("room") || "default";

    // Get the Durable Object for this room
    const id = env.SESSIONS.idFromName(roomId);
    const stub = env.SESSIONS.get(id);

    // Forward request to the Durable Object
    return stub.fetch(request);
  }
};
name = "my-app"
main = "src/index.ts"
[[durable_objects.bindings]]
name = "SESSIONS"
class_name = "SessionManager"
[[migrations]]
tag = "v1"
new_classes = ["SessionManager"]
KV: Read-heavy, eventually consistent, simple key-value patterns.
Use for caching, configuration, session storage where slight staleness is OK.
Durable Objects: Strong consistency, real-time coordination, WebSockets.
Use when you need "exactly once" semantics or real-time features.
8. Workers AI: Running Models at the Edge
Workers AI brings machine learning inference to the edge. Instead of calling external APIs with added latency, run models directly in Cloudflare's network: text generation, embeddings, image classification, speech recognition, and more.
WebSocket-based voice agent using Workers AI for speech-to-text and embeddings. Audio processing at the edge means sub-200ms response times globally. Heavy inference calls route to our GPU servers.
Available Models
Cloudflare hosts various model categories:
| Category | Models | Use Cases |
|---|---|---|
| Text Generation | Llama 3, Mistral, Gemma | Chatbots, content generation, summarization |
| Embeddings | BGE, E5 | Semantic search, RAG, similarity |
| Image | Stable Diffusion, ResNet | Generation, classification, description |
| Speech | Whisper | Transcription, voice interfaces |
| Translation | M2M-100 | Multi-language translation |
Setting Up Workers AI
name = "ai-worker"
main = "src/index.ts"
compatibility_date = "2024-01-01"
[ai]
binding = "AI"
Text Generation
interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = await request.json();

    const response = await env.AI.run("@cf/meta/llama-3-8b-instruct", {
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: prompt }
      ],
      max_tokens: 500
    });

    return Response.json(response);
  }
};
Streaming Responses
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = await request.json();

    // Stream the response
    const stream = await env.AI.run("@cf/meta/llama-3-8b-instruct", {
      messages: [
        { role: "user", content: prompt }
      ],
      stream: true
    });

    return new Response(stream, {
      headers: { "Content-Type": "text/event-stream" }
    });
  }
};
Embeddings for Semantic Search
interface Env {
  AI: Ai;
  VECTOR_INDEX: VectorizeIndex; // Cloudflare Vectorize
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { query } = await request.json();

    // Generate embedding for the query
    const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: query
    });

    // Search vector database
    const results = await env.VECTOR_INDEX.query(embedding.data[0], {
      topK: 5,
      returnMetadata: true
    });

    return Response.json({ results: results.matches });
  }
};
Speech-to-Text with Whisper
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Get audio data from request
    const audioData = await request.arrayBuffer();

    // Transcribe
    const result = await env.AI.run("@cf/openai/whisper", {
      audio: [...new Uint8Array(audioData)]
    });

    return Response.json({
      text: result.text,
      language: result.language
    });
  }
};
AI Gateway: Logging and Caching
Cloudflare AI Gateway sits in front of AI providers (OpenAI, Anthropic, Workers AI) and provides:
- Request logging: Track all AI calls, costs, and latency
- Caching: Cache identical prompts to reduce costs
- Rate limiting: Protect against runaway costs
- Fallback: Route to backup providers on failure
// Using AI Gateway with external providers
const response = await fetch(
  "https://gateway.ai.cloudflare.com/v1/YOUR_ACCOUNT/YOUR_GATEWAY/openai/chat/completions",
  {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${env.OPENAI_KEY}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      model: "gpt-4",
      messages: [{ role: "user", content: "Hello!" }]
    })
  }
);
Workers AI Limitations
- Smaller models than hosted services (8B params vs 70B+)
- No fine-tuning (use pre-trained models only)
- Longer inference times for complex models
- Best for: embeddings, small generation tasks, preprocessing
- Not ideal for: complex reasoning, long-form generation
9. Environment Variables and Secrets
Workers need configuration: API keys, feature flags, environment-specific settings. Cloudflare provides two mechanisms: environment variables (in code) and secrets (encrypted, never in code).
Environment Variables
Non-sensitive configuration goes in wrangler.toml:
[vars]
ENVIRONMENT = "development"
API_VERSION = "v2"
MAX_RESULTS = "100"
FEATURE_NEW_UI = "true"
[env.production.vars]
ENVIRONMENT = "production"
MAX_RESULTS = "50"
[env.staging.vars]
ENVIRONMENT = "staging"
Secrets
Sensitive values (API keys, passwords, tokens) should never be in your codebase:
# Add a secret
wrangler secret put API_KEY
# Prompts for value (not echoed)
# Add secret to specific environment
wrangler secret put API_KEY --env production
# List secrets (shows names, not values)
wrangler secret list
# Delete a secret
wrangler secret delete API_KEY
Using in Code
interface Env {
  // Environment variables (from wrangler.toml [vars])
  ENVIRONMENT: string;
  API_VERSION: string;
  MAX_RESULTS: string; // Note: always strings!
  FEATURE_NEW_UI: string;
  // Secrets (from wrangler secret put)
  API_KEY: string;
  DATABASE_URL: string;
  JWT_SECRET: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Access like any other property
    const isProduction = env.ENVIRONMENT === "production";
    const maxResults = parseInt(env.MAX_RESULTS, 10);
    const showNewUI = env.FEATURE_NEW_UI === "true";

    // Use secrets for auth
    const response = await fetch("https://api.external.com/data", {
      headers: { "Authorization": `Bearer ${env.API_KEY}` }
    });

    return new Response(`Environment: ${env.ENVIRONMENT}`);
  }
};
Environment-Specific Deployments
name = "my-api"
main = "src/index.ts"
# Default (development)
[vars]
ENVIRONMENT = "development"
LOG_LEVEL = "debug"
# Production environment
[env.production]
name = "my-api-production"
routes = [{ pattern = "api.example.com/*", zone_name = "example.com" }]
[env.production.vars]
ENVIRONMENT = "production"
LOG_LEVEL = "error"
# Staging environment
[env.staging]
name = "my-api-staging"
routes = [{ pattern = "staging.example.com/*", zone_name = "example.com" }]
[env.staging.vars]
ENVIRONMENT = "staging"
LOG_LEVEL = "info"
# Deploy to development (default)
wrangler deploy
# Deploy to staging
wrangler deploy --env staging
# Deploy to production
wrangler deploy --env production
# Secrets are environment-specific
wrangler secret put API_KEY --env production
wrangler secret put API_KEY --env staging
For local development, create a .dev.vars file (add to .gitignore!):
API_KEY=dev-key-for-testing
DATABASE_URL=postgres://localhost:5432/dev
Wrangler automatically loads this file during wrangler dev.
10. Custom Domains and DNS
Connecting Workers and Pages to custom domains requires DNS configuration. Cloudflare makes this trivial if you're using their DNS, and slightly more involved with external DNS.
Option 1: Cloudflare-Managed DNS (Recommended)
If your domain's nameservers point to Cloudflare:
- Add your domain to Cloudflare (free plan works)
- Update nameservers at your registrar
- Wait for propagation (minutes to hours)
- Workers and Pages can now route to your domain automatically
Adding Routes to Workers
# Simple route
route = "api.example.com/*"
# Multiple routes
routes = [
{ pattern = "api.example.com/*", zone_name = "example.com" },
{ pattern = "api.example.org/*", zone_name = "example.org" }
]
# Custom domain (automatic SSL)
[[routes]]
pattern = "api.example.com/*"
custom_domain = true
Via Dashboard
- Go to Workers & Pages → Your Worker → Triggers
- Add route or custom domain
- Cloudflare creates necessary DNS records automatically
Option 2: Custom Domain on Pages
- In Pages project → Custom Domains
- Add domain (e.g., asabove.tech)
- Cloudflare handles SSL certificate provisioning
- If using Cloudflare DNS: automatic
- If external DNS: add the CNAME record shown
Option 3: External DNS
If you can't move DNS to Cloudflare, you can still use Workers:
# For Workers
Type: CNAME
Name: api
Value: your-worker.your-subdomain.workers.dev
# For Pages
Type: CNAME
Name: www
Value: your-project.pages.dev
With external DNS, you lose some Cloudflare benefits: no automatic SSL on apex domains (only subdomains), no full DDoS protection, no analytics on the DNS level. For production, Cloudflare DNS is strongly recommended.
SSL/TLS Configuration
Cloudflare automatically provisions SSL certificates for custom domains. Configure encryption mode in your domain settings:
| Mode | Description | Use When |
|---|---|---|
| Off | No encryption | Never (don't use) |
| Flexible | HTTPS to Cloudflare, HTTP to origin | Origin doesn't support HTTPS |
| Full | HTTPS end-to-end (any cert) | Self-signed cert at origin |
| Full (Strict) | HTTPS end-to-end (valid cert) | Production (recommended) |
For Workers and Pages with no external origin, "Full (Strict)" works automatically; there's no origin to worry about.
11. Real Examples: Our Deployments
Theory is nice, but let's look at how we actually use Workers in production. These are real deployments running right now.
asabove.tech: Static Site on Pages
What it does:
- Static content platform with articles, guides, and documentation
- Zero JavaScript frameworks; vanilla HTML/CSS for performance
- Automatic deployments from Git
- Preview URLs for every branch
Configuration:
- Build command: none (static files)
- Output directory: public/
- Custom domain via Cloudflare DNS
- Monthly cost: $0 (free tier)
Why Pages: Pure static content with no server-side logic needed. Git integration means content updates deploy automatically on push.
Voice Agent Backend: Workers + Durable Objects
What it does:
- WebSocket endpoint for real-time voice interactions
- Audio preprocessing and validation at the edge
- Session management with Durable Objects
- Routes transcription requests to Whisper (Workers AI)
- Forwards complex inference to GPU backend
Architecture: WebSocket client → edge Worker (audio preprocessing, Durable Object session) → Whisper via Workers AI for transcription, with complex inference forwarded to the GPU backend.
Why Workers + Durable Objects: WebSocket connections need persistent state. Durable Objects provide single-threaded, consistent session management. Edge preprocessing reduces latency for the voice-critical path.
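The "preprocessing and validation at the edge" step can be illustrated as a pure function. This is a sketch under assumed rules (the 1 MB cap and WAV-header check are invented for illustration; the real pipeline's limits differ):

```typescript
// Sketch: reject obviously bad audio chunks at the edge before they cost
// anything on the GPU backend. The size limit and WAV check are assumptions.
const MAX_CHUNK_BYTES = 1024 * 1024; // assumed 1 MB cap per chunk

function validateAudioChunk(chunk: Uint8Array): { ok: boolean; reason?: string } {
  if (chunk.byteLength === 0) {
    return { ok: false, reason: "empty chunk" };
  }
  if (chunk.byteLength > MAX_CHUNK_BYTES) {
    return { ok: false, reason: "chunk exceeds size limit" };
  }
  // WAV files begin with the ASCII bytes "RIFF".
  const riff = [0x52, 0x49, 0x46, 0x46];
  const isWav = riff.every((byte, i) => chunk[i] === byte);
  if (!isWav) {
    return { ok: false, reason: "not a RIFF/WAV header" };
  }
  return { ok: true };
}
```

Because the check is pure and cheap, it runs in microseconds inside the Worker; only chunks that pass are forwarded downstream.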
Email Routing: Email Workers
What it does:
- Catch-all email routing for the domain
- Intelligent forwarding based on address and content
- Spam filtering with basic heuristics
- Webhook notifications for specific patterns
- Logging to KV for analytics
Routing Logic:
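A handler along these lines could look like the following. This is a simplified sketch: the addresses and rules are invented placeholders, not our real config, though `message.forward()` and `message.setReject()` are the actual Email Workers APIs.

```typescript
// Sketch of an Email Worker with address-based routing. Destinations and
// rules here are placeholder assumptions for illustration.
function resolveDestination(to: string): string | null {
  const local = to.split("@")[0].toLowerCase();
  const routes: Record<string, string> = {
    support: "team-support@example.com",
    billing: "finance@example.com",
  };
  if (local in routes) return routes[local];
  if (local.startsWith("noreply")) return null; // drop automated mail
  return "catchall@example.com";                // everything else
}

export default {
  async email(message: any, env: unknown, ctx: unknown): Promise<void> {
    const dest = resolveDestination(message.to);
    if (dest === null) {
      message.setReject("Address does not accept mail");
      return;
    }
    await message.forward(dest);
  },
};
```

Because the routing table is plain code, rules based on sender, subject, or content are just more branches in `resolveDestination`.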
Why Email Workers: No external email service needed for receiving. Programmable routing beats static forwarding rules. Webhooks enable integration with ticketing systems, CRMs, etc.
API Gateway: Workers
What it does:
- Authentication and API key validation
- Rate limiting per user/IP
- Request transformation and validation
- Response caching in KV
- Routing to various backend services
- CORS handling
Pattern:
export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // 1. CORS preflight
    if (request.method === "OPTIONS") {
      return handleCors(request, env);
    }
    // 2. Authentication
    const auth = await validateAuth(request, env);
    if (!auth.valid) {
      return errorResponse("Unauthorized", 401);
    }
    // 3. Rate limiting
    const rateLimitOk = await checkRateLimit(auth.userId, env);
    if (!rateLimitOk) {
      return errorResponse("Rate limit exceeded", 429);
    }
    // 4. Check cache
    const cached = await getFromCache(request, env);
    if (cached) return cached;
    // 5. Route to backend
    const response = await routeToBackend(request, env);
    // 6. Cache response (waitUntil lets this finish after the response is sent)
    ctx.waitUntil(cacheResponse(request, response.clone(), env));
    return response;
  }
};
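The `checkRateLimit` helper referenced in the pattern could be sketched as a fixed-window counter backed by KV. This is an assumption about the implementation, not our actual code: the limit, window, and the idea of passing the KV namespace directly are illustrative, and KV's eventual consistency makes the count approximate, which is usually acceptable for coarse limits.

```typescript
// Sketch: fixed-window rate limiting on top of KV. Limits are assumptions.
// The read-modify-write below is racy under concurrency and KV is
// eventually consistent, so treat this as a coarse limit, not an exact one.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}

const LIMIT = 100;      // assumed requests per window
const WINDOW_SECS = 60; // window length in seconds

async function checkRateLimit(userId: string, kv: KVLike): Promise<boolean> {
  // Bucket requests into the current one-minute window.
  const window = Math.floor(Date.now() / 1000 / WINDOW_SECS);
  const key = `rl:${userId}:${window}`;
  const count = parseInt((await kv.get(key)) ?? "0", 10);
  if (count >= LIMIT) return false;
  // Expire the key after two windows so stale counters clean themselves up.
  await kv.put(key, String(count + 1), { expirationTtl: WINDOW_SECS * 2 });
  return true;
}
```

For strict per-user limits, a Durable Object (single-threaded, strongly consistent) is the better fit; KV works when occasionally letting a few extra requests through is fine.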
Why Workers for API Gateway: Authentication and rate limiting at the edge protects backend servers. Caching reduces origin load. CORS handling is consistent across all endpoints. All with sub-50ms overhead globally.
Deployment Workflow
Our deployment process for Workers:
- Local development: wrangler dev with .dev.vars for secrets. Hot reload, local KV, test against production bindings with --remote.
- Staging: wrangler deploy --env staging. Separate Worker name, staging secrets, staging routes.
- Verify functionality before production push.
- Production: wrangler deploy --env production. Atomic deployment, instant rollback if needed.
- Monitoring: wrangler tail --env production. Live logs, error tracking, performance monitoring in dashboard.
12. Pricing and Free Tier Limits
Cloudflare's free tier is genuinely usable for production, not a trial that expires or degrades. Understanding the limits helps you plan.
Workers Pricing
| Resource | Free Tier | Paid ($5/mo base) |
|---|---|---|
| Requests | 100,000/day | 10M included, $0.30/M after |
| CPU Time | 10ms/request | 30s/request (50ms included) |
| KV Reads | 100,000/day | 10M included, $0.50/M after |
| KV Writes | 1,000/day | 1M included, $5/M after |
| KV Storage | 1GB | 1GB included, $0.50/GB/mo after |
| Durable Objects | Not available | Requests: $0.15/M, Storage: $0.20/GB |
Pages Pricing
| Resource | Free Tier | Pro ($20/mo) |
|---|---|---|
| Bandwidth | Unlimited | Unlimited |
| Builds | 500/month | 5,000/month |
| Concurrent builds | 1 | 5 |
| Sites | Unlimited | Unlimited |
| Functions requests | 100,000/day | 10M/month included |
Workers AI Pricing
| Model Category | Free Tier | Pricing |
|---|---|---|
| Text Generation (small) | 10,000 neurons/day | $0.011/1K neurons |
| Text Generation (large) | 10,000 neurons/day | $0.022/1K neurons |
| Embeddings | Included | $0.00002/1K tokens |
| Speech-to-Text | Included | $0.003/minute |
| Image Generation | Included | $0.02/image |
Our asabove.tech deployment (Pages + Workers + KV + Email Routing):
Monthly traffic: ~50,000 page views, ~10,000 API calls, ~500 emails
Monthly cost: $0 (well within free tier)
The free tier is substantial enough for many production workloads. You only pay when you scale significantly.
When Costs Grow
Scenarios where you'll exceed free tier:
- 100,000+ requests/day: Time for paid Workers ($5/mo base)
- Heavy KV writes: 1,000 writes/day is low for write-heavy apps
- Durable Objects: Any usage requires paid plan
- Workers AI at scale: Free tier is for experimentation; production needs paid
- Long-running tasks: 10ms CPU limit is tight; paid gives 30 seconds
Even at scale, Workers is often cheaper than alternatives. Compare to Lambda ($0.20/M requests + compute time) or always-on servers ($5-50/mo minimum).
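As a rough worked example using the table's numbers (prices change, so treat this as illustrative arithmetic, not a quote):

```typescript
// Sketch: Workers paid-plan request cost at a given monthly volume, using
// the pricing table above: $5 base, 10M requests included, $0.30/M after.
// Ignores CPU-time and KV charges, which bill separately.
function workersRequestCost(millionsOfRequests: number): number {
  const base = 5;          // monthly base fee in dollars
  const includedM = 10;    // millions of requests included
  const overageRate = 0.3; // dollars per additional million
  const overage = Math.max(0, millionsOfRequests - includedM) * overageRate;
  return base + overage;
}
// e.g. 50M requests/month: $5 + 40 * $0.30 = $17
```

At 50M requests a month, that's roughly $17, versus about $10 in per-request fees alone on Lambda at $0.20/M before any compute charges.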
13. When NOT to Use Workers
Workers are powerful, but they're not the right choice for everything. Knowing when not to use them saves you from painful refactors.
Don't Use Workers For:
Long-Running Batch Jobs
The problem: Workers have a 30-second CPU time limit (paid plan). Long-running data processing, report generation, or batch jobs will timeout.
Use instead: AWS Lambda (15min), Google Cloud Functions (9min), or dedicated servers for batch processing. Workers can trigger these jobs and handle the callback.
GPU-Intensive AI Inference
The problem: Workers AI offers smaller models. Complex reasoning, long-form generation, or fine-tuned models need more power.
Use instead: Dedicated GPU servers (Replicate, Modal, RunPod), or major provider APIs (OpenAI, Anthropic). Use Workers as a gateway for auth, caching, and response handling.
Traditional Database Connections
The problem: Workers can't maintain persistent TCP connections to databases. Every request would open a new connection, which is catastrophic for connection limits.
Use instead: HTTP-based databases (PlanetScale, Neon's serverless driver, Cloudflare D1, Supabase). Or use Workers as an API layer in front of a traditional backend that manages database connections.
Large-Scale WebSocket Broadcasting
The problem: While Durable Objects support WebSockets, broadcasting to millions of connections isn't what they're designed for.
Use instead: Cloudflare Pub/Sub, dedicated WebSocket services (Pusher, Ably), or self-hosted solutions (Socket.io on servers). Workers can handle connection setup and message validation at the edge.
Native Module Dependencies
The problem: Workers run V8 isolates, not full Node.js. Native npm modules (compiled C/C++) won't work. No fs, net, or child_process.
Use instead: Full serverless (Lambda, Cloud Functions) or containers. Many popular packages have web-compatible versions; check before assuming it won't work.
Always-On Background Processes
The problem: Even Durable Objects are designed for request-response patterns, not always-on background processes.
Use instead: Actual servers (EC2, DigitalOcean, fly.io). Workers can coordinate with these servers, but can't replace them for persistent background work.
The Hybrid Architecture
The most effective architectures combine Workers with traditional backends:
Workers excel at the request layer: fast, global, cheap at scale. Traditional infrastructure handles what Workers can't: stateful processing, heavy computation, persistent connections. The edge layer becomes a smart, distributed API gateway.
Think of Workers as your application's "front desk." They handle everyone who walks in the door: checking credentials, answering common questions, routing complex issues to specialists. The front desk doesn't do surgery; that's what specialists (your backend servers) are for. But a good front desk makes everything run smoother.
Summary: Workers Are Right When...
| β Use Workers | β Use Something Else |
|---|---|
| Request/response handling | Long-running batch jobs |
| API gateways and routing | GPU-intensive AI inference |
| Authentication/authorization | Traditional database connections |
| Simple data transformations | Native module dependencies |
| Caching and CDN logic | Large-scale WebSocket broadcasting |
| Email processing | Always-on background processes |
| Edge AI (embeddings, small models) | Complex reasoning tasks |
| Real-time features (with Durable Objects) | Heavy stateful computation |
Conclusion: Start Simple, Scale Globally
Cloudflare Workers represent a fundamental shift in how we build web applications. The ability to run code in 300+ locations worldwide, with zero cold starts and minimal configuration, removes barriers that previously required entire DevOps teams to overcome.
For AI applications specifically, edge computing isn't optional; it's essential. Voice interfaces demand sub-200ms latency. Real-time features need instant responses. API gateways must protect expensive AI backends from abuse. Workers provide the infrastructure to make all of this possible without managing servers.
Our deployments at asabove.tech demonstrate the practical reality: static sites on Pages, voice agents processing audio through Workers, email routing without external services, API endpoints responding in milliseconds globally. The combined monthly cost? Zero, on the free tier.
- Run wrangler login and deploy a hello world Worker: 15 minutes to your first global deployment.
- Learn the patterns that matter for your use case.
- Connect to your existing infrastructure as the edge layer.
- Let the edge layer grow with your application.
The edge is no longer a luxury for companies with massive infrastructure budgets. It's accessible to anyone who can write JavaScript. The tools are mature, the documentation is solid, and the free tier is generous enough for real production use.
Every millisecond of latency you eliminate is a user experience improved. Every request handled at the edge is a backend server protected. Every global deployment is infrastructure you didn't have to manage.
Deploy to the edge. Your users are there.