Antonello Fratepietro

Agentic AI Systems: Multi-Agent Architectures

2025-11-15T00:00:00+01:00

Single AI agents struggle with complex tasks. They exceed context limits, conflate responsibilities, and become brittle monoliths. Multi-agent systems decompose complexity: specialized agents handle distinct concerns, coordinate through messages, and compose into robust systems.

I built a multi-agent code analysis system: one agent parsed code structure, another reasoned about architecture, a third suggested refactorings. Each was smaller, testable, and replaceable. The coordinator orchestrated their interaction. The result was more maintainable than a single “do everything” agent.

Multi-agent systems aren’t new—distributed AI has decades of research. But LLMs make them practical: agents can understand natural language instructions, reason about tasks, and collaborate without rigid protocols.

Why Multiple Agents?

Separation of concerns - Parsing, reasoning, and execution are distinct skills. Separate agents, separate prompts, separate tests.

Context management - LLMs have finite context. Multiple focused agents stay within limits.

Specialization - Train/tune agents for specific domains (legal analysis, code review, data extraction).

Fault isolation - If the code execution agent fails, the reasoning agent continues.

Testability - Test each agent independently with unit tests.

Scalability - Scale expensive agents (GPT-4) separately from cheap ones (Claude Haiku).

Read AutoGen and LangGraph for framework approaches.
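Fault isolation is easiest to see in code. The following is a minimal sketch, assuming each agent is an async callable; `run_isolated`, `reasoning_agent`, and `execution_agent` are hypothetical names used only for illustration. One failing agent is reported as an error while the others still return their results.

```python
import asyncio
from typing import Awaitable, Callable, Dict

AgentFn = Callable[[str], Awaitable[str]]


async def run_isolated(agents: Dict[str, AgentFn], task: str) -> Dict[str, dict]:
    """Run every agent on the task; a failure in one does not cancel the rest."""
    names = list(agents)
    outcomes = await asyncio.gather(
        *(agents[name](task) for name in names),
        return_exceptions=True,  # exceptions come back as values instead of propagating
    )
    report = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            report[name] = {"ok": False, "error": repr(outcome)}
        else:
            report[name] = {"ok": True, "result": outcome}
    return report


# Hypothetical stand-in agents, for illustration only.
async def reasoning_agent(task: str) -> str:
    return f"plan for: {task}"


async def execution_agent(task: str) -> str:
    raise RuntimeError("sandbox unavailable")


if __name__ == "__main__":
    agents = {"reasoning": reasoning_agent, "execution": execution_agent}
    print(asyncio.run(run_isolated(agents, "refactor module")))
```

The caller decides what to do with a failed entry: retry it, fall back to another agent, or synthesize an answer from the partial results.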

Agent Roles and Patterns

1. Coordinator (Orchestrator)

Decomposes high-level goals into subtasks, assigns them to specialists, and aggregates the results.

import json
from typing import Dict, List

# AsyncAnthropic is the awaitable counterpart of the sync Anthropic client
from anthropic import Anthropic, AsyncAnthropic

class Coordinator:
    """Orchestrate multi-agent workflow."""

    def __init__(self, specialists: Dict[str, 'Agent']):
        # Async client so API calls do not block the event loop
        self.client = AsyncAnthropic()
        self.specialists = specialists
        self.history = []

    async def process(self, task: str) -> str:
        """Break down task and coordinate execution."""

        # Decompose task
        subtasks = await self.decompose(task)

        results = {}
        for subtask in subtasks:
            agent_type = subtask['agent']
            agent = self.specialists[agent_type]

            # Execute subtask
            result = await agent.execute(subtask['task'])
            results[subtask['id']] = result

        # Synthesize results
        final_answer = await self.synthesize(task, results)
        return final_answer

    async def decompose(self, task: str) -> List[Dict]:
        """Decompose task into subtasks."""
        response = await self.client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=2048,
            system="""You are a task coordinator. Break down complex tasks into subtasks.

Output JSON array:
[
  {"id": "1", "agent": "search", "task": "Find relevant documentation"},
  {"id": "2", "agent": "code", "task": "Analyze code structure"}
]""",
            messages=[{
                "role": "user",
                "content": f"Break down this task:\n\n{task}"
            }]
        )

        # Assumes the model returned a bare JSON array; json.loads raises otherwise
        return json.loads(response.content[0].text)

    async def synthesize(self, task: str, results: Dict) -> str:
        """Combine results into final answer."""
        context = "\n\n".join([
            f"Subtask {k}:\n{v}" for k, v in results.items()
        ])

        response = await self.client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=4096,
            system="You are a synthesizer. Combine subtask results into a coherent answer.",
            messages=[{
                "role": "user",
                "content": f"""Original task: {task}

Subtask results:
{context}

Provide a comprehensive answer to the original task."""
            }]
        )

        return response.content[0].text
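Model output is not guaranteed to be a bare JSON array, or to name an agent that actually exists, so parsing the plan deserves a validation step. A minimal sketch, assuming the three fields shown in the system prompt; `parse_subtasks` and its error messages are hypothetical helpers, not part of the Anthropic SDK.

```python
import json
from typing import Dict, List, Set

REQUIRED_FIELDS = ("id", "agent", "task")


def parse_subtasks(raw: str, known_agents: Set[str]) -> List[Dict[str, str]]:
    """Parse the coordinator's JSON plan and reject malformed entries."""
    # Tolerate prose around the array by slicing from the first '[' to the last ']'.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model output")
    items = json.loads(raw[start:end + 1])  # JSONDecodeError is a ValueError

    subtasks = []
    for item in items:
        if not isinstance(item, dict):
            raise ValueError(f"subtask is not an object: {item!r}")
        missing = set(REQUIRED_FIELDS) - item.keys()
        if missing:
            raise ValueError(f"subtask missing fields: {sorted(missing)}")
        if item["agent"] not in known_agents:
            raise ValueError(f"unknown agent: {item['agent']!r}")
        # Normalize values to strings so ids can be used as dict keys consistently.
        subtasks.append({field: str(item[field]) for field in REQUIRED_FIELDS})
    return subtasks
```

Passing `set(self.specialists)` as `known_agents` would turn a later `KeyError` during execution into an early, descriptive failure at planning time.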

2. Specialist Agents

Domain-specific agents with focused expertise:

from anthropic import AsyncAnthropic

class SearchAgent:
    """Specialist for web/documentation search."""

    def __init__(self, search_api):
        self.client = AsyncAnthropic()
        self.search_api = search_api

    async def execute(self, task: str) -> str:
        """Execute search task."""
        # Extract search query
        query = await self.extract_query(task)

        # Perform search
        results = await self.search_api.search(query)

        # Synthesize results
        return self.synthesize_results(results)

    async def extract_query(self, task: str) -> str:
        """Extract search query from task description."""
        response = await self.client.messages.create(
            model="claude-3-haiku-20240307",  # Cheap model for extraction
            max_tokens=256,
            system="Extract the search query from the task. Return only the query text.",
            messages=[{"role": "user", "content": task}]
        )
        return response.content[0].text.strip()

    def synthesize_results(self, results) -> str:
        """Format search results as a numbered list (simple placeholder)."""
        return "\n".join(f"{i}. {r}" for i, r in enumerate(results, start=1))


class CodeAgent:
    """Specialist for code analysis."""

    def __init__(self):
        self.client = AsyncAnthropic()

    async def execute(self, task: str) -> str:
        """Execute code analysis task."""
        response = await self.client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=4096,
            system="""You are a code analysis expert. Analyze code for:
- Structure and architecture
- Potential bugs
- Performance issues
- Security vulnerabilities

Provide clear, actionable feedback.""",
            messages=[{"role": "user", "content": task}]
        )
        return response.content[0].text


class ExecutionAgent:
    """Specialist for executing code/commands safely."""

    def __init__(self, sandbox):
        self.client = AsyncAnthropic()
        self.sandbox = sandbox

    async def execute(self, task: str) -> str:
        """Execute code in sandbox."""
        # Parse code from task
        code = await self.extract_code(task)

        # Execute in sandbox
        result = await self.sandbox.run(code, timeout=30)

        return f"Execution result:\n{result.stdout}\n\nErrors:\n{result.stderr}"

    async def extract_code(self, task: str) -> str:
        """Ask the model to return only the code contained in the task (placeholder)."""
        response = await self.client.messages.create(
            model="claude-3-haiku-20240307",
            max_tokens=2048,
            system="Extract the code to run from the task. Return only the code, with no commentary.",
            messages=[{"role": "user", "content": task}]
        )
        return response.content[0].text.strip()
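The `sandbox` object passed to `ExecutionAgent` is left undefined above. As a rough sketch of the interface it needs (an async `run(code, timeout)` returning an object with `stdout` and `stderr`), here is a hypothetical `SubprocessSandbox` built on the standard library. It gives process isolation and a timeout only. It does not restrict filesystem or network access, so it is not a security boundary on its own.

```python
import asyncio
import sys
from dataclasses import dataclass


@dataclass
class RunResult:
    stdout: str
    stderr: str
    returncode: int


class SubprocessSandbox:
    """Run Python source in a child interpreter with a wall-clock timeout.

    Illustrative only: no filesystem, network, or memory restrictions are applied.
    """

    async def run(self, code: str, timeout: float = 30) -> RunResult:
        proc = await asyncio.create_subprocess_exec(
            sys.executable, "-I", "-c", code,  # -I: isolated mode, ignores env vars and user site
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        try:
            out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout)
        except asyncio.TimeoutError:
            proc.kill()        # stop the runaway child
            await proc.wait()  # reap it so no zombie process is left behind
            return RunResult("", f"timed out after {timeout}s", -1)
        return RunResult(out.decode(), err.decode(), proc.returncode)
```

For untrusted code, a container or a dedicated sandboxing service is the safer choice; the class above only shows the shape of the contract.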

3. Message Bus Pattern

For loose coupling and extensibility:

import asyncio
import time
from typing import Callable, Dict, List

class MessageBus:
    """Pub/sub message bus for agent communication."""
    
    def __init__(self):
        self.subscribers: Dict[str, List[Callable]] = {}
    
    def subscribe(self, topic: str, handler: Callable):
        """Subscribe to topic."""
        if topic not in self.subscribers:
            self.subscribers[topic] = []
        self.subscribers[topic].append(handler)
    
    async def publish(self, topic: str, message: Dict):
        """Publish message to topic."""
        if topic not in self.subscribers:
            return
        
        # Add metadata
        message['topic'] = topic
        message['timestamp'] = time.time()
        
        # Notify all subscribers
        tasks = [
            asyncio.create_task(handler(message))
            for handler in self.subscribers[topic]
        ]
        
        await asyncio.gather(*tasks, return_exceptions=True)


# Usage (search_api and code_agent are assumed to be constructed elsewhere)
bus = MessageBus()

# Subscribe agents
async def search_handler(message):
    query = message['query']
    results = await search_api.search(query)
    await bus.publish('search_results', {'results': results})

async def code_handler(message):
    results = message['results']
    # CodeAgent exposes execute(), which takes a task string
    analysis = await code_agent.execute(f"Analyze these results:\n{results}")
    await bus.publish('analysis_complete', {'analysis': analysis})

bus.subscribe('search_request', search_handler)
bus.subscribe('search_results', code_handler)

async def main():
    # Trigger workflow (await is only valid inside a coroutine)
    await bus.publish('search_request', {'query': 'Flask security best practices'})

asyncio.run(main())

Production Architecture

import asyncio
import logging
import time
from dataclasses import dataclass
from enum import Enum
from typing import Dict

logger = logging.getLogger(__name__)

class AgentStatus(Enum):
    IDLE = "idle"
    WORKING = "working"
    FAILED = "failed"

@dataclass
class AgentMetrics:
    """Track agent performance."""
    total_tasks: int = 0
    successful_tasks: int = 0
    failed_tasks: int = 0
    total_latency: float = 0.0
    total_cost: float = 0.0

class ProductionAgent:
    """Production-ready agent with monitoring."""
    
    def __init__(self, name: str, client):
        self.name = name
        self.client = client
        self.status = AgentStatus.IDLE
        self.metrics = AgentMetrics()
    
    async def execute(self, task: str) -> str:
        """Execute with monitoring and error handling."""
        self.status = AgentStatus.WORKING
        start_time = time.time()
        
        try:
            # Execute with retries
            result = await self._execute_with_retry(task, max_retries=3)
            
            # Update metrics
            self.metrics.successful_tasks += 1
            self.metrics.total_latency += time.time() - start_time
            
            self.status = AgentStatus.IDLE
            return result
            
        except Exception as e:
            # Handle failure
            self.metrics.failed_tasks += 1
            self.status = AgentStatus.FAILED
            
            # Log error
            logger.error(f"Agent {self.name} failed: {e}")
            
            # Raise for coordinator to handle
            raise
        
        finally:
            self.metrics.total_tasks += 1
    
    async def _execute_with_retry(self, task: str, max_retries: int) -> str:
        """Execute with exponential backoff.

        Expects an async client such as anthropic.AsyncAnthropic.
        """
        for attempt in range(max_retries):
            try:
                response = await self.client.messages.create(
                    model="claude-3-5-sonnet-20241022",
                    max_tokens=2048,
                    messages=[{"role": "user", "content": task}]
                )

                # Track cost
                self.metrics.total_cost += self._calculate_cost(response)

                return response.content[0].text

            except Exception:
                if attempt == max_retries - 1:
                    raise

                # Exponential backoff: 1s, 2s, 4s, ...
                await asyncio.sleep(2 ** attempt)
    
    def _calculate_cost(self, response) -> float:
        """Calculate API cost."""
        input_tokens = response.usage.input_tokens
        output_tokens = response.usage.output_tokens
        
        # Claude Sonnet pricing (example)
        input_cost = input_tokens * 0.003 / 1000
        output_cost = output_tokens * 0.015 / 1000
        
        return input_cost + output_cost
    
    def get_metrics(self) -> Dict:
        """Export metrics."""
        return {
            'agent': self.name,
            'status': self.status.value,
            'total_tasks': self.metrics.total_tasks,
            'success_rate': self.metrics.successful_tasks / max(self.metrics.total_tasks, 1),
            'avg_latency': self.metrics.total_latency / max(self.metrics.successful_tasks, 1),
            'total_cost': self.metrics.total_cost
        }
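The retry loop above sleeps `2 ** attempt` seconds between attempts. When many agents fail at the same moment, identical delays make them all retry together, so a cap and some randomness are common additions. A small sketch of the delay schedule; `backoff_delays` is a hypothetical helper and the default values are assumptions, not tuned numbers.

```python
import random
from typing import List, Optional


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Delays slept between attempts: base * 2**attempt, capped, optionally jittered."""
    delays = []
    for attempt in range(max_retries - 1):  # no sleep after the final attempt
        delay = min(cap, base * 2 ** attempt)
        if rng is not None:
            delay = rng.uniform(0, delay)  # "full jitter": pick a random point up to the cap
        delays.append(delay)
    return delays


if __name__ == "__main__":
    print(backoff_delays(3))                        # schedule used by the retry loop above
    print(backoff_delays(5, rng=random.Random(0)))  # jittered variant
```

With `max_retries=3` the schedule is two sleeps of 1 and 2 seconds, which matches the loop in `_execute_with_retry`.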

Observability and Monitoring

import structlog
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Set up tracing
trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer(__name__)

# Add OTLP exporter
otlp_exporter = OTLPSpanExporter(endpoint="http://localhost:4317")
span_processor = BatchSpanProcessor(otlp_exporter)
trace.get_tracer_provider().add_span_processor(span_processor)

# Structured logging
logger = structlog.get_logger()

class ObservableCoordinator(Coordinator):
    """Coordinator with full observability."""
    
    async def process(self, task: str) -> str:
        """Process with tracing and logging."""
        with tracer.start_as_current_span("coordinator.process") as span:
            span.set_attribute("task", task)
            
            logger.info("processing_task", task=task)
            
            try:
                # Decompose
                with tracer.start_as_current_span("coordinator.decompose"):
                    subtasks = await self.decompose(task)
                    span.set_attribute("subtask_count", len(subtasks))
                    logger.info("decomposed_task", subtasks=len(subtasks))
                
                # Execute
                results = {}
                for subtask in subtasks:
                    with tracer.start_as_current_span(f"agent.{subtask['agent']}"):
                        result = await self.specialists[subtask['agent']].execute(subtask['task'])
                        results[subtask['id']] = result
                
                # Synthesize
                with tracer.start_as_current_span("coordinator.synthesize"):
                    answer = await self.synthesize(task, results)
                
                logger.info("task_completed", task=task)
                return answer
                
            except Exception as e:
                logger.error("task_failed", task=task, error=str(e))
                span.record_exception(e)
                span.set_status(trace.Status(trace.StatusCode.ERROR))
                raise

Testing Multi-Agent Systems

import pytest
from unittest.mock import Mock, AsyncMock

@pytest.mark.asyncio
async def test_coordinator_decomposition():
    """Test task decomposition."""
    coordinator = Coordinator({})
    coordinator.client = Mock()
    coordinator.client.messages.create = AsyncMock(return_value=Mock(
        content=[Mock(text='[{"id": "1", "agent": "search", "task": "Search docs"}]')]
    ))
    
    subtasks = await coordinator.decompose("Find Flask security info")
    
    assert len(subtasks) == 1
    assert subtasks[0]['agent'] == 'search'

@pytest.mark.asyncio
async def test_agent_execution():
    """Test agent execution with mock API."""
    agent = SearchAgent(Mock())
    agent.client = Mock()
    agent.client.messages.create = AsyncMock(return_value=Mock(
        content=[Mock(text='Flask security')]
    ))
    agent.search_api.search = AsyncMock(return_value=['result1', 'result2'])
    
    result = await agent.execute("Search for Flask security")
    
    assert result is not None
    agent.search_api.search.assert_called_once()

@pytest.mark.asyncio
async def test_message_bus():
    """Test pub/sub message bus."""
    bus = MessageBus()
    received = []
    
    async def handler(message):
        received.append(message)
    
    bus.subscribe('test', handler)
    await bus.publish('test', {'data': 'test'})
    
    # publish() awaits every handler, so no extra wait is needed
    assert len(received) == 1
    assert received[0]['data'] == 'test'

Best Practices

Design for failure - Agents will fail. Implement retries, circuit breakers, fallbacks.
Keep agents focused - One agent, one responsibility. Don’t build god agents.
Use structured outputs - JSON schemas, Pydantic models. Makes coordination reliable.
Monitor everything - Latency, cost, success rate, per agent.
Test independently - Unit test each agent with mocked dependencies.
Version agents - Deploy different agent versions independently.
Implement timeouts - Agents can hang. Set aggressive timeouts.
Cache expensive operations - Search results, embeddings, analysis.
Cost management - Track per-agent costs. Use cheaper models where possible.
Security boundaries - Agents may have different trust levels. Enforce permissions.
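Of the practices above, circuit breakers are the one this article has not shown in code. A minimal sketch follows, assuming a simple consecutive-failure policy; `CircuitBreaker` and its parameters are hypothetical, and production libraries usually add more states and metrics.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Open after max_failures consecutive failures; allow a trial call after reset_after seconds."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so tests do not need to sleep
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def allow(self) -> bool:
        """Return True if a call may proceed."""
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # (re)start the cool-down
```

An agent wrapper would call `allow()` before each request, skip or fall back when it returns False, and report the outcome with `record_success()` or `record_failure()`.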

Conclusion

Multi-agent systems transform complex AI tasks into manageable, composable components. By decomposing responsibilities, you gain testability, fault isolation, and scalability—at the cost of coordination complexity.

The patterns are well-established: coordinators orchestrate, specialists execute, message buses decouple. The tooling is maturing: LangGraph, AutoGen, CrewAI provide frameworks. The economics work: scaling cheap and expensive agents independently optimizes cost.

Start simple: coordinator + 2-3 specialists. Add observability early. Measure everything. Iterate based on bottlenecks.

Multi-agent systems aren’t always the answer—sometimes a well-prompted single agent suffices. But for complex, multi-step tasks requiring different expertise, they’re the right architecture.

Further Resources:

AutoGen Framework - Microsoft’s multi-agent framework
LangGraph - LangChain’s graph-based agents
CrewAI - Role-based multi-agent system
Multi-Agent Systems Book - Academic foundation
OpenTelemetry - Observability standard
Anthropic Agent Patterns - Claude agent guidance
Agent Protocol - Standardized agent communication

Agentic AI systems from November 2025, covering multi-agent architectures and production patterns.

WebSocket vs SSE vs Long Polling: Choosing the Right Protocol

2025-10-15T00:00:00+02:00

Real-time communication on the web comes in three flavors: WebSocket, Server-Sent Events (SSE), and Long Polling. Each has distinct trade-offs affecting latency, resource usage, and complexity.

I’ve built systems using all three. For a collaborative code editor, WebSocket was essential—bidirectional, low-latency updates. For a live dashboard showing server metrics, SSE was perfect—simple, unidirectional stream. For a notification system supporting old browsers, Long Polling grudgingly worked.

The right choice depends on your requirements: bidirectionality, browser support, firewall friendliness, and operational complexity.

WebSocket: Full Duplex Communication

How it works: The client sends an HTTP Upgrade request. Once the server replies with 101 Switching Protocols, the same TCP connection stays open and either side can send messages at any time.

Client                    Server
  |                         |
  |--- HTTP Upgrade ------->|
  |<-- 101 Switching -------|
  |                         |
  |<-----> Binary/Text <--->|  (Bidirectional messaging)
  |                         |
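One detail of the handshake is worth seeing in code: in the 101 response the server proves it understood the upgrade by hashing the client's `Sec-WebSocket-Key` together with a fixed GUID from RFC 6455 and returning the result as `Sec-WebSocket-Accept`. A short sketch; the function name `accept_key` is my own.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key taken from the RFC 6455 handshake example.
    print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

Libraries such as `ws` do this for you; the point is only that the handshake is plain HTTP plus one hash.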

When to Use WebSocket

Real-time collaboration - Google Docs, Figma, VS Code Live Share
Chat applications - Slack, Discord, Telegram web
Multiplayer games - Real-time position updates, game state
Live trading platforms - Stock prices, order book updates
IoT dashboards - Sensor data streaming

Node.js WebSocket Server

const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

// Track connected clients
const clients = new Set();

wss.on('connection', (ws, req) => {
    console.log('Client connected from', req.socket.remoteAddress);
    clients.add(ws);
    
    // Send welcome message
    ws.send(JSON.stringify({
        type: 'welcome',
        timestamp: Date.now()
    }));
    
    // Handle messages
    ws.on('message', (message) => {
        console.log('Received:', message.toString());
        
        try {
            const data = JSON.parse(message);
            
            // Broadcast to all clients except sender
            clients.forEach(client => {
                if (client !== ws && client.readyState === WebSocket.OPEN) {
                    client.send(JSON.stringify({
                        type: 'broadcast',
                        data: data,
                        timestamp: Date.now()
                    }));
                }
            });
        } catch (error) {
            console.error('Invalid message:', error);
        }
    });
    
    // Handle errors
    ws.on('error', (error) => {
        console.error('WebSocket error:', error);
    });
    
    // Handle disconnect
    ws.on('close', (code, reason) => {
        console.log('Client disconnected:', code, reason.toString());
        clients.delete(ws);
    });
    
    // Heartbeat to detect dead connections
    ws.isAlive = true;
    ws.on('pong', () => {
        ws.isAlive = true;
    });
});

// Ping clients every 30 seconds
const interval = setInterval(() => {
    clients.forEach(ws => {
        if (ws.isAlive === false) {
            return ws.terminate();
        }
        
        ws.isAlive = false;
        ws.ping();
    });
}, 30000);

wss.on('close', () => {
    clearInterval(interval);
});

console.log('WebSocket server running on ws://localhost:8080');

Browser Client

let ws;

function connect() {
    ws = new WebSocket('ws://localhost:8080');

    ws.onopen = () => {
        console.log('Connected');
        ws.send(JSON.stringify({ action: 'subscribe', channel: 'updates' }));
    };

    ws.onmessage = (event) => {
        const data = JSON.parse(event.data);
        console.log('Received:', data);

        if (data.type === 'welcome') {
            console.log('Welcome message received');
        }
    };

    ws.onerror = (error) => {
        console.error('WebSocket error:', error);
    };

    ws.onclose = (event) => {
        console.log('Disconnected:', event.code, event.reason);

        // Reconnect after a fixed delay
        setTimeout(() => {
            console.log('Reconnecting...');
            connect();
        }, 3000);
    };
}

connect();

Production Considerations

Load Balancing: Use sticky sessions or shared pub/sub:

// Shared pub/sub with Redis
const Redis = require('ioredis');
const pub = new Redis();
const sub = new Redis();

sub.subscribe('messages');
sub.on('message', (channel, message) => {
    // Broadcast to local WebSocket clients
    clients.forEach(client => {
        if (client.readyState === WebSocket.OPEN) {
            client.send(message);
        }
    });
});

// When receiving from a WebSocket client (place this inside the
// wss 'connection' handler, where ws is the connected socket)
ws.on('message', (message) => {
    // Publish to Redis (reaches all servers)
    pub.publish('messages', message.toString());
});

Monitoring:

const metrics = {
    connections: 0,
    messagesReceived: 0,
    messagesSent: 0,
    errors: 0
};

wss.on('connection', (ws) => {
    metrics.connections++;
    
    ws.on('message', () => metrics.messagesReceived++);
    ws.on('error', () => metrics.errors++);
    ws.on('close', () => metrics.connections--);
});

// Expose metrics endpoint
app.get('/metrics', (req, res) => {
    res.json(metrics);
});

Read WebSocket RFC 6455 for protocol details.

Server-Sent Events: Unidirectional Streaming

How it works: The client makes an ordinary HTTP request and the server keeps the response open, writing events in the text/event-stream format as they occur.
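The text/event-stream format is simple enough to write by hand: optional `event:` and `id:` fields, one `data:` line per line of payload, and a blank line to end the event. A small sketch of a serializer; `format_sse` is a hypothetical helper, not a library function.

```python
import json
from typing import Any, Optional


def format_sse(data: Any, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Serialize one event in text/event-stream framing."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    payload = data if isinstance(data, str) else json.dumps(data)
    # A payload containing newlines must be split across several data: lines.
    for part in payload.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"  # the blank line terminates the event


if __name__ == "__main__":
    print(format_sse({"type": "update"}), end="")
    print(format_sse("line one\nline two", event="alert", event_id="7"), end="")
```

The browser's `EventSource` joins multiple `data:` lines back together with newlines before handing the event to your handler.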

When to Use SSE

Live feeds - News, sports scores, social media updates
Monitoring dashboards - Metrics, logs, alerts
Notifications - Push notifications, status updates
Progress tracking - File upload progress, job status
Stock tickers - Price updates (when client doesn’t need to send)

Express SSE Server

const express = require('express');
const app = express();

// SSE endpoint
app.get('/events', (req, res) => {
    // Set SSE headers
    res.setHeader('Content-Type', 'text/event-stream');
    res.setHeader('Cache-Control', 'no-cache');
    res.setHeader('Connection', 'keep-alive');
    res.setHeader('Access-Control-Allow-Origin', '*');
    
    // Set reconnection time
    res.write('retry: 10000\n\n');
    
    // Send initial message
    res.write(`data: ${JSON.stringify({ type: 'connected', time: Date.now() })}\n\n`);
    
    // Send updates every second
    const interval = setInterval(() => {
        const data = {
            type: 'update',
            value: Math.random() * 100,
            timestamp: Date.now()
        };
        
        // SSE format: "data: <payload>\n\n"
        res.write(`data: ${JSON.stringify(data)}\n\n`);
    }, 1000);
    
    // Clean up on disconnect
    req.on('close', () => {
        clearInterval(interval);
        console.log('Client disconnected');
    });
});

app.listen(3000, () => {
    console.log('SSE server running on http://localhost:3000');
});

Browser Client

const eventSource = new EventSource('http://localhost:3000/events');

eventSource.onopen = () => {
    console.log('SSE connection opened');
};

eventSource.onmessage = (event) => {
    const data = JSON.parse(event.data);
    console.log('Received:', data);
    
    // Update UI
    document.getElementById('value').textContent = data.value.toFixed(2);
};

eventSource.onerror = (error) => {
    console.error('SSE error:', error);
    
    if (eventSource.readyState === EventSource.CLOSED) {
        console.log('SSE connection closed');
    }
};

// Named events
eventSource.addEventListener('custom-event', (event) => {
    console.log('Custom event:', event.data);
});

Named Events

// Server: Send named events
res.write('event: alert\n');
res.write(`data: ${JSON.stringify({ message: 'System alert!' })}\n\n`);

// Client: Listen for specific events
eventSource.addEventListener('alert', (event) => {
    const alert = JSON.parse(event.data);
    showAlert(alert.message);
});

Read Server-Sent Events spec for details.

Long Polling: Request-Response Loop

How it works: The client makes a request and the server holds it open until data is available or a timeout expires. The client then immediately issues a new request.

Client                    Server
  |                         |
  |--- HTTP Request ------->|
  |                         | (Wait for data or timeout)
  |<-- Response with data --|
  |                         |
  |--- HTTP Request ------->| (Immediately reconnect)
  |                         |
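The "wait for data or timeout" step is the heart of long polling. A server can implement it by waking the held request the moment a message arrives, instead of checking on a timer. A minimal asyncio sketch of that idea; `Mailbox` is a hypothetical class and is not tied to any web framework.

```python
import asyncio
from typing import Dict, List


class Mailbox:
    """Per-user message queues that wake a waiting poll as soon as data arrives."""

    def __init__(self) -> None:
        self._queues: Dict[str, asyncio.Queue] = {}

    def _queue(self, user_id: str) -> asyncio.Queue:
        return self._queues.setdefault(user_id, asyncio.Queue())

    def send(self, user_id: str, message: str) -> None:
        self._queue(user_id).put_nowait(message)

    async def poll(self, user_id: str, timeout: float = 30.0) -> List[str]:
        """Return pending messages, or [] if none arrive before the timeout."""
        queue = self._queue(user_id)
        try:
            first = await asyncio.wait_for(queue.get(), timeout=timeout)
        except asyncio.TimeoutError:
            return []  # the client is expected to poll again immediately
        messages = [first]
        while not queue.empty():  # drain anything else already queued
            messages.append(queue.get_nowait())
        return messages
```

A request handler would simply `await mailbox.poll(user_id)` and return the list as JSON, so an empty list plays the role of the timeout response.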

When to Use Long Polling

Legacy browser support - IE9, old mobile browsers
Firewall restrictions - Corporate networks blocking WebSocket
Simple notifications - Infrequent updates
Fallback mechanism - When WebSocket unavailable

Express Long Polling Server

const express = require('express');
const app = express();

// In-memory queue of pending messages
const messageQueues = new Map();

app.get('/poll', (req, res) => {
    const userId = req.query.userId;
    
    if (!messageQueues.has(userId)) {
        messageQueues.set(userId, []);
    }
    
    const queue = messageQueues.get(userId);
    
    // If messages available, send immediately
    if (queue.length > 0) {
        res.json({ messages: queue.splice(0, queue.length) });
        return;
    }
    
    // Otherwise, wait for new message or timeout
    const timeout = setTimeout(() => {
        clearInterval(checkInterval);  // stop checking once we have responded
        res.json({ messages: [] });
    }, 30000);  // 30 second timeout
    
    // Check the queue periodically while the request is held open
    const checkInterval = setInterval(() => {
        if (queue.length > 0) {
            clearTimeout(timeout);
            clearInterval(checkInterval);
            res.json({ messages: queue.splice(0, queue.length) });
        }
    }, 100);
    
    // Clean up on disconnect
    req.on('close', () => {
        clearTimeout(timeout);
        clearInterval(checkInterval);
    });
});

// Endpoint to send message to user
app.post('/send', express.json(), (req, res) => {
    const { userId, message } = req.body;
    
    if (!messageQueues.has(userId)) {
        messageQueues.set(userId, []);
    }
    
    messageQueues.get(userId).push({
        message,
        timestamp: Date.now()
    });
    
    res.json({ success: true });
});

app.listen(3000);

Browser Client

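let polling = true;
const userId = 'user-123';  // placeholder; use the authenticated user's id

// Promise-based delay used for backing off after errors
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function poll() {
    while (polling) {
        try {
            const response = await fetch(`/poll?userId=${encodeURIComponent(userId)}`);
            if (!response.ok) {
                throw new Error(`HTTP ${response.status}`);
            }
            const data = await response.json();
            
            if (data.messages.length > 0) {
                data.messages.forEach(handleMessage);
            }
            
        } catch (error) {
            console.error('Polling error:', error);
            await sleep(5000);  // Back off on error
        }
    }
}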

function handleMessage(message) {
    console.log('Received:', message);
}

// Start polling
poll();

// Stop polling
function stopPolling() {
    polling = false;
}

Comparison Table

Feature	WebSocket	SSE	Long Polling
Direction	Bidirectional	Server → Client	Bidirectional (via requests)
Protocol	Custom (WS)	HTTP	HTTP
Latency	Low (persistent connection)	Low (persistent connection)	Higher (a new request for each delivery)
Overhead	Low	Low	High (HTTP headers each poll)
Browser Support	All modern browsers, IE10+	All modern browsers; IE and legacy Edge need a polyfill	Universal
Firewall Friendly	Sometimes blocked	Yes (HTTP)	Yes (HTTP)
Reconnection	Manual	Automatic	Manual
Binary Data	Native	Base64 encoding	Base64 encoding
Complexity	High	Low	Medium
Max Connections	Bounded by file descriptors and memory	Bounded by file descriptors and memory; browsers allow 6 per domain over HTTP/1.1	Limited by request rate
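The Overhead row can be made concrete for WebSocket, because RFC 6455 fixes the frame header size: 2 bytes, plus 2 or 8 bytes of extended length for larger payloads, plus a 4-byte mask on frames sent by the client. A small sketch; `ws_frame_overhead` is a hypothetical helper for the arithmetic.

```python
def ws_frame_overhead(payload_len: int, masked: bool) -> int:
    """Header bytes RFC 6455 adds to a single frame carrying payload_len bytes."""
    if payload_len <= 125:
        header = 2          # length fits in the base header
    elif payload_len <= 0xFFFF:
        header = 2 + 2      # 16-bit extended length
    else:
        header = 2 + 8      # 64-bit extended length
    return header + (4 if masked else 0)  # client-to-server frames carry a 4-byte mask


if __name__ == "__main__":
    for size in (50, 1_000, 100_000):
        print(size, ws_frame_overhead(size, masked=True), ws_frame_overhead(size, masked=False))
```

A long-polling exchange, by comparison, repeats full HTTP request and response headers on every cycle, which is usually hundreds of bytes depending on cookies and other headers.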

Scaling Patterns

Pub/Sub for WebSocket/SSE

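// Using NATS for pub/sub (nats.js v2 API)
const { connect, StringCodec } = require('nats');

async function main() {
    const nc = await connect({ servers: 'nats://localhost:4222' });
    const sc = StringCodec();

    // Subscribe to messages; consume in the background so publishing is not blocked
    const sub = nc.subscribe('messages');
    (async () => {
        for await (const msg of sub) {
            // msg.data is a Uint8Array, so decode it before parsing
            const data = JSON.parse(sc.decode(msg.data));

            // Broadcast to WebSocket clients
            broadcastToClients(data);
        }
    })();

    // Publish message
    nc.publish('messages', sc.encode(JSON.stringify({ type: 'update', data: { /* ... */ } })));
}

main().catch(console.error);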

Graceful Shutdown

let shuttingDown = false;

process.on('SIGTERM', () => {
    if (shuttingDown) return;
    shuttingDown = true;
    console.log('Shutting down gracefully...');
    
    // Force exit if connections have not closed within 30 seconds
    const timeout = setTimeout(() => {
        console.log('Forcing shutdown');
        process.exit(0);
    }, 30000);
    
    // Ask every client to close (1001 = going away)
    for (const client of clients) {
        client.close(1001, 'Server shutting down');
    }
    
    // Stop accepting new connections; the callback runs once the server has closed
    wss.close(() => {
        console.log('Server closed');
        clearTimeout(timeout);
        process.exit(0);
    });
});

Conclusion

Choose WebSocket for interactive, bidirectional communication (chat, games, collaboration).

Choose SSE for server-to-client streams (dashboards, feeds, notifications). It’s simpler than WebSocket and works over HTTP.

Choose Long Polling only as a fallback for old browsers or restrictive networks. The overhead is significant.

In practice, implement WebSocket with SSE as fallback. Long Polling is rarely worth the complexity in 2025.

Further Resources:

WebSocket RFC 6455 - Protocol specification
Server-Sent Events Spec - HTML standard
Socket.IO - WebSocket library with fallbacks
ws npm package - Fast WebSocket implementation
SSE npm package - SSE server utilities
WebSocket Best Practices - MDN guide

WebSocket vs SSE vs Long Polling from October 2025, updated with production patterns.

Databricks for Data Engineers: Getting Started

2025-09-15T00:00:00+02:00

Databricks is a unified analytics platform built on Apache Spark. Founded by Spark’s creators (Matei Zaharia, Ali Ghodsi, and others from UC Berkeley), it’s become the standard for big data processing and machine learning at scale.

I moved to Databricks after struggling with self-managed Spark clusters. Maintaining Spark—tuning configs, managing resources, debugging failed jobs—consumed more time than actual data engineering. Databricks handles the infrastructure, letting you focus on transforming data.

The platform combines notebooks for exploration, production-grade job scheduling, Delta Lake for reliable storage, and MLflow for ML workflows. It’s opinionated but that opinion is informed by years of Spark expertise.

Core Components

Databricks Workspace - Web-based environment for notebooks, jobs, and clusters.

Apache Spark - Distributed processing engine (3.5+ as of 2025). See Spark documentation.

Delta Lake - ACID transactions on data lakes. Open source project: delta-io/delta.

MLflow - ML lifecycle management. Track experiments, package models, deploy. MLflow docs.

Unity Catalog - Centralized governance, lineage, and access control.

Read Databricks architecture for details.

Notebooks: Interactive Development

Databricks notebooks support Python, SQL, Scala, and R:

Python Notebook

# Read data from Delta Lake
df = spark.read.format("delta").load("/mnt/data/events")

# Show schema
df.printSchema()

# Quick stats
df.describe().show()

# Transform data
from pyspark.sql.functions import col, countDistinct, window

daily_active_users = (df
    .filter(col("event_type") == "login")
    .groupBy(window("timestamp", "1 day"))
    # countDistinct so a user who logs in twice in a day is counted once
    .agg(countDistinct("user_id").alias("daily_active_users"))
    .orderBy("window")
)

# Display in notebook
display(daily_active_users)

# Write results
daily_active_users.write.format("delta").mode("overwrite").save("/mnt/data/dau")

SQL Notebook

-- Create or replace table using Delta Lake
CREATE OR REPLACE TABLE analytics.user_activity
USING DELTA
LOCATION '/mnt/data/user_activity'
AS
SELECT 
    user_id,
    DATE(timestamp) as activity_date,
    COUNT(*) as event_count,
    COUNT(DISTINCT session_id) as session_count
FROM events
WHERE timestamp >= current_date() - INTERVAL 30 DAYS
GROUP BY user_id, DATE(timestamp);

-- Query with visualization
SELECT 
    activity_date,
    COUNT(DISTINCT user_id) as active_users,
    SUM(event_count) as total_events
FROM analytics.user_activity
GROUP BY activity_date
ORDER BY activity_date;

Notebooks support inline visualizations—click “Visualization” to create charts.

Widgets for Parameterization

# Create text and dropdown widgets
dbutils.widgets.text("start_date", "2025-01-01", "Start Date")
dbutils.widgets.dropdown("region", "US", ["US", "EU", "APAC"], "Region")

# Read widget values
start_date = dbutils.widgets.get("start_date")
region = dbutils.widgets.get("region")

# Use in queries
from pyspark.sql.functions import col

df = spark.read.format("delta").load("/mnt/data/events")
filtered = df.filter(
    (col("date") >= start_date) & 
    (col("region") == region)
)

display(filtered)

Widgets make notebooks reusable for different parameters.

Data Pipelines

ETL Pipeline

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ETL").getOrCreate()

# Extract (header=True so the columns can be selected by name below)
raw_data = spark.read.csv("s3://bucket/data.csv", header=True, inferSchema=True)

# Transform
transformed = raw_data.select("id", "name", "age")

# Load (overwrite so a rerun does not fail on an existing path)
transformed.write.format("delta").mode("overwrite").save("/mnt/data/processed")

Best Practices

Use notebooks - Interactive development
Leverage Spark - Distributed processing
Use Delta Lake - ACID transactions
Monitor - Track job performance
Optimize - Query tuning
Test - Verify pipelines
Document - Clear processes
Scale - Handle large data

Delta Lake: ACID on Data Lakes

Delta Lake brings database reliability to data lakes—ACID transactions, schema enforcement, time travel.

Writing Delta Tables

from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("ETL").getOrCreate()

# Write with Delta Lake
df = spark.read.csv("s3://bucket/raw-data.csv", header=True)

(df
    .write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .save("/mnt/data/processed/users")
)

# Append new data
new_data = spark.read.csv("s3://bucket/new-data.csv", header=True)
new_data.write.format("delta").mode("append").save("/mnt/data/processed/users")

Time Travel

Query historical versions:

# Read current version
df_current = spark.read.format("delta").load("/mnt/data/processed/users")

# Read the table as it was at a given timestamp
df_old = (spark.read
    .format("delta")
    .option("timestampAsOf", "2025-09-01")
    .load("/mnt/data/processed/users")
)

# Compare: rows present now but not in the older snapshot (new or changed)
changed_users = df_current.subtract(df_old)
display(changed_users)

# Read specific version number
df_v5 = (spark.read
    .format("delta")
    .option("versionAsOf", 5)
    .load("/mnt/data/processed/users")
)

MERGE (Upserts)

from delta.tables import DeltaTable

# Load Delta table
delta_table = DeltaTable.forPath(spark, "/mnt/data/processed/users")

# Merge updates
delta_table.alias("target").merge(
    source=new_data.alias("source"),
    condition="target.user_id = source.user_id"
).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

This is how you do CDC (Change Data Capture) efficiently.
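
To make the matched/not-matched semantics concrete, here is a minimal plain-Python sketch of what the merge above does to a table keyed on `user_id`. It is an illustration on dictionaries, not Delta Lake code; `merge_upsert` and the sample rows are hypothetical names and data.

```python
from typing import Dict, List

Row = Dict[str, object]

def merge_upsert(target: List[Row], source: List[Row], key: str) -> List[Row]:
    """Emulate whenMatchedUpdateAll + whenNotMatchedInsertAll on plain dicts."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # A matching key overwrites the target row; a new key is inserted.
        merged[row[key]] = dict(row)
    return list(merged.values())

target = [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Bob"},
]
source = [
    {"user_id": 2, "name": "Robert"},  # matches an existing row: update
    {"user_id": 3, "name": "Cy"},      # no match: insert
]

result = merge_upsert(target, source, key="user_id")
print(result)
```

One difference worth knowing: this sketch silently keeps the last source row for a key, whereas a Delta MERGE fails when several source rows match the same target row, so the source should be deduplicated on the merge key first.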

OPTIMIZE and VACUUM

Maintain table performance:

# Optimize: compact small files into larger ones
spark.sql("OPTIMIZE delta.`/mnt/data/processed/users`")

# Z-ordering: colocate related data
spark.sql("OPTIMIZE delta.`/mnt/data/processed/users` ZORDER BY (user_id, created_date)")

# Vacuum: remove old versions (7 day default)
spark.sql("VACUUM delta.`/mnt/data/processed/users` RETAIN 168 HOURS")

Run OPTIMIZE weekly, VACUUM monthly. See Delta Lake performance tuning.
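
VACUUM and time travel pull in opposite directions: once old files are removed, older snapshots can no longer be read. The sketch below is a simplified check, assuming a snapshot is only dependable while it sits inside the retention window passed to VACUUM. The function name and dates are hypothetical, and in practice files disappear only when VACUUM actually runs.

```python
from datetime import datetime, timedelta

def within_retention(target: datetime, now: datetime, retain_hours: int = 168) -> bool:
    """Return True if `target` is no older than the VACUUM retention window."""
    return now - target <= timedelta(hours=retain_hours)

now = datetime(2025, 9, 15)
recent = within_retention(datetime(2025, 9, 10), now)  # 5 days back
old = within_retention(datetime(2025, 9, 1), now)      # 14 days back

print(recent, old)
```

If queries need to reach further back than a week, the retention value has to grow with them, at the price of extra storage.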

Production ETL Pipeline

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, current_timestamp, sha2
from delta.tables import DeltaTable

class ProductionETL:
    """Production-grade ETL pipeline."""
    
    def __init__(self):
        self.spark = SparkSession.builder.appName("ETL").getOrCreate()
    
    def extract(self, source_path: str):
        """Extract from source."""
        return (self.spark.read
            .format("parquet")
            .load(source_path)
        )
    
    def transform(self, df):
        """Transform data with validation."""
        # Add processing metadata
        df = df.withColumn("processed_at", current_timestamp())
        
        # Data quality checks
        df = df.filter(col("user_id").isNotNull())
        df = df.filter(col("amount") > 0)
        
        # Hash sensitive fields
        df = df.withColumn(
            "email_hash",
            sha2(col("email"), 256)
        )
        
        # Deduplicate on the merge key so MERGE sees one source row per target row
        df = df.dropDuplicates(["transaction_id"])
        
        return df
    
    def load(self, df, target_path: str):
        """Load to Delta Lake with merge."""
        # Check if table exists
        if DeltaTable.isDeltaTable(self.spark, target_path):
            # Merge into existing table
            delta_table = DeltaTable.forPath(self.spark, target_path)
            
            delta_table.alias("target").merge(
                source=df.alias("source"),
                condition="target.transaction_id = source.transaction_id"
            ).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()
        else:
            # Create new table
            (df.write
                .format("delta")
                .mode("overwrite")
                .save(target_path)
            )
    
    def run(self, source_path: str, target_path: str):
        """Run complete ETL."""
        try:
            # Extract
            raw_df = self.extract(source_path)
            print(f"Extracted {raw_df.count()} rows")
            
            # Transform
            clean_df = self.transform(raw_df)
            print(f"Cleaned to {clean_df.count()} rows")
            
            # Load
            self.load(clean_df, target_path)
            print(f"Loaded to {target_path}")
            
            # Optimize after load
            self.spark.sql(f"OPTIMIZE delta.`{target_path}`")
            
        except Exception as e:
            print(f"ETL failed: {e}")
            raise

# Usage
etl = ProductionETL()
etl.run(
    source_path="s3://bucket/raw/transactions/",
    target_path="/mnt/data/processed/transactions"
)
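
The transform rules are easier to unit-test when written down without a cluster. Below is a rough plain-Python equivalent of the filtering, hashing, and deduplication steps, useful as a reference when writing expectations for the Spark version. It assumes rows are dictionaries with the same column names; `transform_rows` and the sample data are hypothetical.

```python
import hashlib
from typing import Any, Dict, List

Row = Dict[str, Any]

def transform_rows(rows: List[Row]) -> List[Row]:
    """Apply the pipeline's quality rules to a list of dict rows."""
    seen = set()
    cleaned = []
    for row in rows:
        # Data quality: user_id must be present and amount must be positive
        if row.get("user_id") is None:
            continue
        amount = row.get("amount")
        if amount is None or amount <= 0:
            continue
        # Deduplicate on (user_id, transaction_id), keeping the first occurrence
        dedup_key = (row["user_id"], row["transaction_id"])
        if dedup_key in seen:
            continue
        seen.add(dedup_key)
        # SHA-256 hex digest of the email, or None when the email is missing
        email = row.get("email")
        email_hash = None if email is None else hashlib.sha256(email.encode("utf-8")).hexdigest()
        cleaned.append({**row, "email_hash": email_hash})
    return cleaned

sample = [
    {"user_id": 1, "transaction_id": "t1", "amount": 10, "email": "a@example.com"},
    {"user_id": None, "transaction_id": "t2", "amount": 5, "email": "b@example.com"},
    {"user_id": 2, "transaction_id": "t3", "amount": 0, "email": "c@example.com"},
    {"user_id": 1, "transaction_id": "t1", "amount": 10, "email": "a@example.com"},
]
cleaned = transform_rows(sample)
print(len(cleaned))
```

Spark's `dropDuplicates` does not promise which duplicate survives, so "keep the first" here is a choice made for the sketch only.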

Best Practices from Production

Use Delta Lake always - ACID guarantees are worth it
Partition large tables - By date or other low-cardinality keys (high-cardinality partition keys produce many small files)
Z-order frequently queried columns - Colocates data
Set retention policies - Balance time travel vs storage costs
Monitor cluster metrics - CPU, memory, I/O utilization
Right-size clusters - Match to workload (don’t over-provision)
Use auto-termination - Clusters shut down after idle time
Enable Photon - Vectorized execution engine (often several times faster on SQL and DataFrame workloads; benchmark your own jobs)
Cache frequently accessed data - df.cache() for reuse
Test with small samples - .limit(1000) for development

Conclusion

Databricks simplifies data engineering by handling Spark cluster management, providing excellent notebook UX, and offering Delta Lake for reliable storage. The managed platform lets you focus on transforming data rather than managing infrastructure.

The combination of Spark’s distributed processing, Delta Lake’s ACID guarantees, and notebook-based development creates a productive environment for data teams. ETL pipelines that took days to build on self-managed Spark can be prototyped in hours on Databricks.

Cost management is crucial—clusters can get expensive. Use autoscaling, right-size instances, and terminate idle clusters. The productivity gains typically justify the costs for teams processing terabytes of data.

Further Resources:

Databricks Documentation - Comprehensive guides
Apache Spark Docs - Core engine
Delta Lake - Open format and engine
MLflow - ML lifecycle management
Databricks Academy - Free courses
Delta Lake GitHub - Open source repo
Unity Catalog - Data governance

Databricks for data engineers from September 2025 — updated with production guidance.

Container Orchestration at the Edge: New Paradigms

2025-08-15T00:00:00+02:00

Edge computing promises low latency by running workloads close to users. But orchestrating containers at thousands of edge locations isn’t the same as managing a data center cluster. Resource constraints, intermittent connectivity, and distributed management demand new approaches.

I deployed a CDN edge service using traditional Kubernetes—control plane used 2GB RAM before running any workload. At 500 edge locations, that’s 1TB just for orchestration. We switched to K3s, Rancher’s lightweight Kubernetes: 512MB for control plane + agents. Same APIs, 75% less overhead.

Edge orchestration challenges three Kubernetes assumptions: abundant resources, reliable networking, and centralized control. Solutions require rethinking each.

The Edge is Different

Resource constraints:

Edge nodes: 2-4 CPU cores, 4-8GB RAM
Data center nodes: 32-96 cores, 128-512GB RAM
Difference: roughly 10-100x fewer resources

Network reality:

Data center: 10Gbps+ local, <1ms latency
Edge: 10-100Mbps WAN, 50-200ms latency, periodic disconnects

Management scale:

Data center: 10-1000 nodes, centralized
Edge: 100-10,000 nodes, geographically distributed

Traditional Kubernetes doesn’t fit. New solutions emerged: K3s, MicroK8s, KubeEdge.

Lightweight Kubernetes: K3s

K3s is Kubernetes minus the bloat:

What’s removed:

Legacy alpha features
Non-default admission controllers
In-tree cloud providers
In-tree storage plugins

What’s changed:

etcd → SQLite (or Postgres/MySQL for HA)
Docker → containerd (no Docker dependency)
Single binary deployment

Result: 512MB RAM footprint vs 2GB+ for standard K8s.
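
The fleet-wide effect of that footprint is simple arithmetic. A short sketch using the figures quoted in this post (2GB versus 512MB per site, 500 sites); the function name is hypothetical.

```python
def fleet_overhead_gb(per_site_mb: int, sites: int) -> float:
    """Total orchestration memory across the fleet, in GB."""
    return per_site_mb * sites / 1024

standard_gb = fleet_overhead_gb(2048, 500)  # standard Kubernetes control plane
k3s_gb = fleet_overhead_gb(512, 500)        # K3s server + agent
savings = 1 - k3s_gb / standard_gb

print(standard_gb, k3s_gb, savings)
```

The saving is the same 75% at any fleet size; what changes with scale is how many gigabytes of edge hardware that percentage represents.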

Install K3s

# Server (control plane) node
curl -sfL https://get.k3s.io | sh -

# Get node token
sudo cat /var/lib/rancher/k3s/server/node-token

# Agent (worker) node: replace <node-token> with the token printed above
curl -sfL https://get.k3s.io | K3S_URL=https://master-ip:6443 \
  K3S_TOKEN=<node-token> sh -

# Verify
sudo k3s kubectl get nodes

Production install (with external database):

# PostgreSQL HA
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint="postgres://user:pass@postgres-host:5432/k3s"

Read K3s architecture for details.

Deploy Edge Application

# edge-app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: edge-app
  labels:
    app: edge-app
spec:
  replicas: 1  # Single replica per edge location
  selector:
    matchLabels:
      app: edge-app
  template:
    metadata:
      labels:
        app: edge-app
        region: us-west  # Example value; the REGION env var below reads this label
    spec:
      # Resource limits for constrained edge
      containers:
      - name: app
        image: my-edge-app:v1.2
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 100m      # 0.1 CPU core
            memory: 128Mi
          limits:
            cpu: 500m      # 0.5 CPU core max
            memory: 512Mi  # Hard limit
        
        # Health checks
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          failureThreshold: 3
        
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        
        # Environment config
        env:
        - name: REGION
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['region']
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName

---
# Service with NodePort (for edge ingress)
apiVersion: v1
kind: Service
metadata:
  name: edge-app
spec:
  type: NodePort
  ports:
  - port: 8080
    targetPort: 8080
    nodePort: 30080  # Accessible on node IP
  selector:
    app: edge-app

Deploy:

kubectl apply -f edge-app-deployment.yaml

# Verify
kubectl get pods
kubectl get svc

Offline-First Applications

Edge locations lose connectivity. Design for it:

Local State + Sync

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: edge-cache
spec:
  serviceName: edge-cache
  replicas: 1
  selector:
    matchLabels:
      app: edge-cache
  template:
    metadata:
      labels:
        app: edge-cache
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: data
          mountPath: /data
        command:
        - redis-server
        - --save
        - "60 1"  # Persist every 60s if 1+ keys changed
        - --appendonly
        - "yes"
        resources:
          requests:
            memory: 256Mi
          limits:
            memory: 512Mi
  
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: local-path
      resources:
        requests:
          storage: 1Gi

Application uses local Redis, syncs to central database when online:

import json
import redis
import requests
from typing import Optional

class EdgeCache:
    """Offline-first cache with background sync."""
    
    def __init__(self):
        # decode_responses=True makes the client return str instead of bytes
        self.redis = redis.Redis(host='edge-cache', port=6379, decode_responses=True)
        self.central_api = 'https://central.example.com/api'
    
    def get(self, key: str) -> Optional[str]:
        """Get from local cache."""
        return self.redis.get(key)
    
    def set(self, key: str, value: str):
        """Set in local cache and queue for sync."""
        self.redis.set(key, value)
        # JSON keeps keys that contain ':' intact
        self.redis.rpush('sync_queue', json.dumps({'key': key, 'value': value}))
    
    def sync(self):
        """Sync pending changes to central (background task)."""
        while True:
            item = self.redis.lpop('sync_queue')
            if not item:
                break
            
            try:
                # Upload to central
                response = requests.post(
                    f'{self.central_api}/sync',
                    json=json.loads(item),
                    timeout=5
                )
                response.raise_for_status()
                
            except requests.RequestException:
                # Network or HTTP error: requeue at the head to preserve order
                self.redis.lpush('sync_queue', item)
                break  # Stop syncing, try again later

Run sync as cron job:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: sync-job
spec:
  schedule: "*/5 * * * *"  # Every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: sync
            image: my-edge-app:v1.2
            command: ["python", "sync.py"]
          restartPolicy: OnFailure
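
The requeue-on-failure behaviour of the sync loop can be checked without Redis or a network. The sketch below models the queue with a `deque` and the central API with a hypothetical `flaky_upload` function that fails on its second call; none of these names come from the application itself.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Item = Tuple[str, str]

def drain(queue: Deque[Item], upload: Callable[[Item], None]) -> int:
    """Upload queued items in order; stop at the first failure and keep the rest."""
    sent = 0
    while queue:
        item = queue.popleft()
        try:
            upload(item)
        except ConnectionError:
            queue.appendleft(item)  # back to the head, so ordering is preserved
            break
        sent += 1
    return sent

# Hypothetical uploader that fails on its second call to mimic a dropped link.
attempts = {"n": 0}
delivered: List[Item] = []

def flaky_upload(item: Item) -> None:
    attempts["n"] += 1
    if attempts["n"] == 2:
        raise ConnectionError("link down")
    delivered.append(item)

queue: Deque[Item] = deque([("a", "1"), ("b", "2"), ("c", "3")])
first_run = drain(queue, flaky_upload)   # sends "a", fails on "b"
second_run = drain(queue, flaky_upload)  # resumes with "b", then "c"

print(first_run, second_run, delivered)
```

This gives at-least-once delivery: if an upload succeeds but the response is lost, the item is sent again on the next run, so the central endpoint should treat repeated writes of the same key as harmless.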

Image Optimization for Edge

Bandwidth is limited. Minimize image sizes:

Multi-Stage Builds

# Build stage
FROM golang:1.21 AS builder

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

# Runtime stage (distroless)
FROM gcr.io/distroless/static-debian12

COPY --from=builder /app/app /app

EXPOSE 8080
USER nonroot:nonroot

ENTRYPOINT ["/app"]

Result: 10MB image vs 300MB+ with full golang base.
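
To see what that size difference means on an edge link, here is a back-of-the-envelope calculation using the 10Mbps lower bound quoted earlier. It ignores compression, latency, and layer caching, so treat the results as rough upper bounds; the function name is hypothetical.

```python
def pull_seconds(image_mb: float, link_mbps: float) -> float:
    """Seconds to transfer an image: megabytes * 8 bits / megabits per second."""
    return image_mb * 8 / link_mbps

full_base = pull_seconds(300, 10)   # image built on the full golang base
distroless = pull_seconds(10, 10)   # multi-stage distroless image

print(full_base, distroless)
```

Four minutes versus eight seconds per node is the difference between a rollout that finishes during a short connectivity window and one that does not.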

Pre-pull Images

Use DaemonSet to pre-pull images on all nodes:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: image-puller
spec:
  selector:
    matchLabels:
      name: image-puller
  template:
    metadata:
      labels:
        name: image-puller
    spec:
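      initContainers:
      # Each init container only has to start and exit. The command assumes
      # the image ships a shell; a distroless image needs a different command.
      - name: pull-app-image
        image: my-edge-app:v1.2
        command: ['sh', '-c', 'echo "Image pulled"']
      - name: pull-cache-image
        image: redis:7-alpine
        command: ['sh', '-c', 'echo "Image pulled"']
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9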

Multi-Cluster Management

Managing 100+ edge clusters requires automation. Rancher and ArgoCD help:

GitOps with ArgoCD

# argocd-app.yaml - Deploy to all edge clusters
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: edge-app-us-west
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/company/edge-apps
    targetRevision: HEAD
    path: apps/edge-app
    helm:
      values: |
        region: us-west
        replicas: 1
        image:
          tag: v1.2
  destination:
    server: https://edge-cluster-us-west.example.com
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

Generate apps for all clusters programmatically:

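# generate-apps.py
from pathlib import Path

# Assumption: argocd-app.tmpl.yaml is the manifest above with {name},
# {region}, and {server} placeholders in place of the hard-coded values.
template = Path('argocd-app.tmpl.yaml').read_text()

regions = ['us-west', 'us-east', 'eu-west', 'ap-southeast']

for region in regions:
    with open(f'argocd-app-{region}.yaml', 'w') as f:
        f.write(template.format(
            name=f'edge-app-{region}',
            region=region,
            server=f'https://edge-cluster-{region}.example.com'
        ))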

Monitoring Distributed Edge

Centralize metrics from all edge locations:

Prometheus Federation

# prometheus-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      external_labels:
        cluster: edge-us-west
        region: us-west
    
    # Scrape local metrics
    scrape_configs:
    - job_name: 'edge-apps'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: edge-app
    
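    # Push samples to central Prometheus via remote write
    # (the central server must be configured to accept remote-write requests)
    remote_write:
    - url: https://central-prometheus.example.com/api/v1/write
      basic_auth:
        username: edge
        password: secret  # Placeholder; load from a mounted secret in production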

Query across all edge locations from central Prometheus:

# Total requests across all edge locations
sum(http_requests_total) by (region)

# P95 latency per region
histogram_quantile(0.95, 
  sum(rate(http_request_duration_seconds_bucket[5m])) by (region, le)
)
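
The P95 query works on bucketed counts rather than raw latencies, which is why its answer is an estimate. The sketch below is a simplified model of that estimation: find the bucket containing the target rank, then interpolate linearly inside it. The bucket bounds and counts are invented for illustration, and the function is not Prometheus code.

```python
import math
from typing import List, Tuple

def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the open-ended bucket: report the highest finite bound
                return buckets[-2][0]
            # Assume observations are spread evenly inside the bucket
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan

# Hypothetical cumulative counts for request durations in seconds
buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, buckets)
print(p95)
```

Because of the interpolation, accuracy depends on bucket boundaries: with a gap between 0.5s and 1.0s, any P95 in that range is only known to within half a second.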

Best Practices

Right-size resources - Edge nodes are constrained. Profile actual usage:
```
kubectl top pods
kubectl top nodes
```
Use local storage - Network storage adds latency. Use K3s local-path provisioner:
```
storageClassName: local-path
```
Design for network failures - Test disconnected mode:
```bash
# Simulate network partition
sudo iptables -A OUTPUT -p tcp --dport 6443 -j DROP

# App should continue working offline
```
Automate updates - Manual updates don't scale to 100+ clusters. Use GitOps.

Monitor everything - Metrics, logs, traces. Edge issues are hard to debug remotely.

Security at edge - Edge nodes may be physically accessible:
```yaml
# Enable Pod Security Standards
apiVersion: v1
kind: Namespace
metadata:
  name: default
  labels:
    pod-security.kubernetes.io/enforce: restricted
```

Conclusion

Edge container orchestration requires rethinking traditional patterns. Lightweight runtimes (K3s), offline-first applications, optimized images, and centralized management make it practical.

The paradigm shift: from assuming abundant resources and reliable networking to designing for constraints and intermittency. K3s proves Kubernetes APIs work at edge scale—if you remove the bloat.

For 10-100 edge locations, this approach works. Beyond that, consider specialized edge platforms (AWS Wavelength, Cloudflare Workers) that abstract orchestration entirely.

Further Resources:

K3s Documentation - Lightweight Kubernetes
KubeEdge - Edge-native Kubernetes
MicroK8s - Minimal Kubernetes
ArgoCD - GitOps continuous delivery
Rancher - Multi-cluster management
CNCF Edge Computing - Architecture patterns
K3s GitHub - Source and issues

Container orchestration at edge from August 2025 — updated with production guidance.

Cloudflare D1: SQLite at the Edge

2025-07-15T00:00:00+02:00

Cloudflare D1 is SQLite running at the edge—familiar SQL, global replication, sub-10ms queries from anywhere. It’s SQLite’s simplicity combined with Cloudflare’s distribution network.

I built a user preferences system with D1 and was surprised by how well it worked. Query latency from Sydney? 6ms. From São Paulo? 8ms. The same database, automatically replicated to edge locations, responding fast everywhere. No sharding configuration, no multi-region complexity—just SQLite that runs globally.

D1 makes sense for read-heavy workloads that benefit from geographic proximity: user settings, feature flags, product catalogs, metadata stores. It’s less suited for write-heavy transactional systems (those need stronger consistency guarantees).

Based on SQLite (the most widely deployed database), D1 brings that reliability to the edge.

Why D1?

Familiar SQL - If you know SQLite, you know D1. Standard SQL syntax, no new query language.

Global replication - Writes propagate to edge locations automatically. Reads are always local and fast.

Workers integration - First-class integration with Cloudflare Workers. No connection pooling, no ORMs—just direct access.

Cost-effective - Usage-based pricing for rows read, rows written, and storage, with no per-database charges. Check the D1 pricing page for current rates.

Zero-configuration scaling - No sharding, no read replicas, no deployment topology. It just works.

Read the D1 announcement for Cloudflare’s vision.

Using D1 with Workers

D1 integrates natively with Cloudflare Workers:

Create Database

# Install Wrangler
npm install -g wrangler

# Create D1 database
wrangler d1 create my-database

# Output: database_name = "my-database", database_id = "xxx-xxx-xxx"

Add to wrangler.toml:

[[d1_databases]]
binding = "DB"  # Available as env.DB in Workers
database_name = "my-database"
database_id = "xxx-xxx-xxx"

Create Tables

# Create schema file
cat > schema.sql << 'EOF'
CREATE TABLE IF NOT EXISTS users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    email TEXT UNIQUE NOT NULL,
    name TEXT NOT NULL,
    created_at INTEGER NOT NULL
);

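-- email is already indexed through its UNIQUE constraint;
-- IF NOT EXISTS keeps this script safe to re-run
CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);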

CREATE TABLE IF NOT EXISTS preferences (
    user_id INTEGER NOT NULL,
    key TEXT NOT NULL,
    value TEXT,
    PRIMARY KEY (user_id, key),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
EOF

# Apply schema to the deployed database (omit --remote to target the local dev database)
wrangler d1 execute my-database --remote --file=schema.sql

Query from Workers

// worker.ts
export interface Env {
    DB: D1Database;
}

export default {
    async fetch(request: Request, env: Env): Promise<Response> {
        const url = new URL(request.url);
        
        if (url.pathname === '/api/users' && request.method === 'GET') {
            // List users
            const result = await env.DB
                .prepare('SELECT id, email, name, created_at FROM users ORDER BY created_at DESC LIMIT 10')
                .all();
            
            return Response.json(result.results);
        }
        
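        if (url.pathname === '/api/users' && request.method === 'POST') {
            // Create user
            const body = (await request.json()) as { email?: string; name?: string };
            
            // Reject incomplete payloads before touching the database
            if (!body.email || !body.name) {
                return Response.json({ error: 'email and name are required' }, { status: 400 });
            }
            
            const result = await env.DB
                .prepare('INSERT INTO users (email, name, created_at) VALUES (?, ?, ?) RETURNING id')
                .bind(body.email, body.name, Date.now())
                .first<{ id: number }>();
            
            return Response.json({ id: result?.id }, { status: 201 });
        }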
        
        if (url.pathname.startsWith('/api/users/')) {
            const userId = url.pathname.split('/')[3];
            
            // Get user with preferences
            const user = await env.DB
                .prepare('SELECT * FROM users WHERE id = ?')
                .bind(userId)
                .first();
            
            if (!user) {
                return Response.json({ error: 'User not found' }, { status: 404 });
            }
            
            const prefs = await env.DB
                .prepare('SELECT key, value FROM preferences WHERE user_id = ?')
                .bind(userId)
                .all();
            
            return Response.json({
                ...user,
                preferences: Object.fromEntries(
                    prefs.results.map(p => [p.key, p.value])
                ),
            });
        }
        
        return Response.json({ error: 'Not found' }, { status: 404 });
    },
};

Batch Operations

Batch multiple statements for efficiency:

async function createUserWithPreferences(
    db: D1Database,
    email: string,
    name: string,
    preferences: Record<string, string>
) {
    // Prepare statements
    const insertUser = db
        .prepare('INSERT INTO users (email, name, created_at) VALUES (?, ?, ?) RETURNING id')
        .bind(email, name, Date.now());
    
    // Execute in batch (returns array of results).
    // Each preference looks up the new user's id by email inside the SQL:
    // a bound parameter is always a plain value, so SQL text cannot be
    // passed through bind().
    const results = await db.batch<{ id: number }>([
        insertUser,
        ...Object.entries(preferences).map(([key, value]) =>
            db.prepare('INSERT INTO preferences (user_id, key, value) VALUES ((SELECT id FROM users WHERE email = ?), ?, ?)')
                .bind(email, key, value)
        ),
    ]);
    
    const userId = results[0].results[0].id;
    return userId;
}

Batching reduces round trips—crucial for edge performance.

Prepared Statements

Always use prepared statements (parameterized queries):

// Good: Prepared statement (prevents SQL injection)
const user = await env.DB
    .prepare('SELECT * FROM users WHERE email = ?')
    .bind(email)
    .first();

// Bad: String interpolation (SQL injection risk!)
const unsafeUser = await env.DB
    .prepare(`SELECT * FROM users WHERE email = '${email}'`)
    .first();

D1 automatically caches prepared statement plans for performance.
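
The difference between binding and interpolation is easy to demonstrate against plain SQLite. The sketch below uses Python's built-in `sqlite3` module as a local stand-in for D1, on the assumption that both treat parameters the same way; the table contents and the malicious input are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)

malicious = "nobody@example.com' OR '1'='1"

# Interpolation: the input rewrites the WHERE clause and matches every row.
unsafe_rows = conn.execute(
    f"SELECT id FROM users WHERE email = '{malicious}'"
).fetchall()

# Binding: the input is compared as one literal value and matches nothing.
safe_rows = conn.execute(
    "SELECT id FROM users WHERE email = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The interpolated query returns both users for an email that belongs to neither, which is exactly the failure the prepared statement prevents.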

Production Best Practices

1. Schema Design

Keep schemas simple and focused:

-- Good: Compact rows, appropriate types
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    sku TEXT NOT NULL UNIQUE,
    name TEXT NOT NULL,
    price INTEGER NOT NULL,  -- Store cents as INTEGER
    active INTEGER DEFAULT 1,  -- SQLite uses INTEGER for booleans
    created_at INTEGER NOT NULL
);

CREATE INDEX idx_products_sku ON products(sku);
CREATE INDEX idx_products_active ON products(active) WHERE active = 1;

-- Avoid: Large TEXT/BLOB columns at edge
-- Store large data (images, documents) in R2, keep references in D1

Guidelines:

Normalize appropriately—D1 supports JOINs efficiently
Use INTEGER for timestamps (Unix epoch)
Index foreign keys and frequently queried columns
Keep row sizes under 1KB when possible
Store large blobs in R2, reference by key

2. Query Optimization

// Use EXPLAIN QUERY PLAN to understand queries
const plan = await env.DB
    .prepare('EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?')
    .bind('test@example.com')
    .all();

console.log(plan.results);
// Shows if query uses indexes or table scans

Optimization tips:

Use indexes for WHERE, ORDER BY, and JOIN columns
Avoid SELECT *—specify columns you need
Use LIMIT for pagination
Consider denormalization for read-heavy workloads
Profile queries in development with EXPLAIN

See SQLite query optimization for deep dives.
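
The same check can be run locally before deploying. This sketch uses Python's `sqlite3` module as a stand-in, assuming the local SQLite planner behaves like D1's for simple lookups; the exact wording of the plan text varies between SQLite versions, so the code only looks for the SEARCH and SCAN keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL, name TEXT NOT NULL)"
)

def plan(sql: str, params: tuple = ()) -> str:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

# email has an index (from UNIQUE), name does not
indexed = plan("SELECT id FROM users WHERE email = ?", ("test@example.com",))
unindexed = plan("SELECT id FROM users WHERE name = ?", ("Test",))

print(indexed)
print(unindexed)
```

A plan that starts with SCAN on a large table is the signal to add an index or rewrite the filter.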

3. Migrations

Version your schema changes:

# Create migrations directory
mkdir -p migrations

# Migration 001: Initial schema
cat > migrations/001_initial.sql << 'EOF'
CREATE TABLE users (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    email TEXT UNIQUE NOT NULL,
    created_at INTEGER NOT NULL
);
EOF

# Migration 002: Add name column
cat > migrations/002_add_name.sql << 'EOF'
ALTER TABLE users ADD COLUMN name TEXT;
EOF

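# Apply migrations in order.
# Note: this loop re-runs every file, so it suits a fresh database.
# Wrangler's built-in `wrangler d1 migrations apply` tracks which
# migrations have already been applied.
for migration in migrations/*.sql; do
    echo "Applying $migration..."
    wrangler d1 execute my-database --file="$migration"
done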

Migration best practices:

Make migrations idempotent (use IF NOT EXISTS)
Test in staging first
Keep migrations small and focused
Never delete migrations after deployment
Document breaking changes
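
Some statements, such as `ALTER TABLE ... ADD COLUMN`, cannot be made idempotent with `IF NOT EXISTS`, so the usual approach is to record which migrations have run. Below is a minimal sketch of that bookkeeping using Python's `sqlite3` module against an in-memory database. The `schema_migrations` table and `apply_pending` function are hypothetical names, not part of Wrangler or D1.

```python
import sqlite3
from typing import List, Tuple

MIGRATIONS: List[Tuple[str, str]] = [
    ("001_initial",
     "CREATE TABLE users ("
     "id INTEGER PRIMARY KEY AUTOINCREMENT, "
     "email TEXT UNIQUE NOT NULL, "
     "created_at INTEGER NOT NULL)"),
    ("002_add_name", "ALTER TABLE users ADD COLUMN name TEXT"),
]

def apply_pending(conn: sqlite3.Connection, migrations: List[Tuple[str, str]]) -> List[str]:
    """Run migrations that are not yet recorded and return their names."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    newly_applied = []
    for name, sql in migrations:
        if name in applied:
            continue  # already run on this database
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
        conn.commit()
        newly_applied.append(name)
    return newly_applied

conn = sqlite3.connect(":memory:")
first_run = apply_pending(conn, MIGRATIONS)
second_run = apply_pending(conn, MIGRATIONS)  # nothing left to apply

print(first_run, second_run)
```

Running the function twice is the useful test: the second call applies nothing and raises no "duplicate column" error.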

4. Backups

D1 provides automatic backups, but export critical data:

# Export database to SQL
wrangler d1 export my-database --output=backup.sql

# Or export a single table as JSON
wrangler d1 execute my-database \
    --command="SELECT * FROM users" \
    --json > users_backup.json

Schedule regular exports to R2 for redundancy.

5. Monitoring

Track query performance:

async function queryWithMetrics(
    db: D1Database,
    query: string,
    ...params: any[]
) {
    const start = Date.now();
    
    try {
        const result = await db.prepare(query).bind(...params).all();
        const duration = Date.now() - start;
        
        // Log slow queries
        if (duration > 100) {  // 100ms threshold
            console.warn('Slow query', {
                query,
                duration,
                rows: result.results.length,
            });
        }
        
        return result;
    } catch (error) {
        console.error('Query failed', { query, error });
        throw error;
    }
}

Monitor:

Query latency (p50, p95, p99)
Slow query frequency
Error rates
Database size growth
Read/write ratio
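
For the latency percentiles, a small helper is enough to turn collected durations into p50/p95/p99. This sketch uses the nearest-rank method on an in-memory list; the sample latencies are invented, and a production setup would more likely ship timings to a metrics backend.

```python
import math
from typing import List

def percentile(samples: List[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical query durations in milliseconds
latencies_ms = [5, 6, 6, 7, 8, 8, 9, 12, 40, 150]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

print(p50, p95, p99)
```

The gap between p50 and p95 in this sample shows why averages hide slow queries: one outlier dominates the tail while the median stays in single digits.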

D1 vs Other Databases

Feature	D1	PlanetScale	Neon	Traditional RDS
Latency (read)	5-10ms	50-200ms	30-100ms	50-300ms
Global read	✅ Automatic	❌ Single region	❌ Single region	❌ Manual setup
SQL dialect	SQLite	MySQL	Postgres	Various
Scaling	Automatic	Automatic	Automatic	Manual
Cost (10GB + 100M reads)	~$80/mo	~$40/mo	~$60/mo	~$200/mo
Edge integration	✅ Native	❌	❌	❌

Choose D1 when:

Read-heavy workloads at global scale
Sub-10ms latency requirements
Using Cloudflare Workers
Simple to moderate complexity queries

Choose alternatives when:

Strong consistency critical (use Postgres/MySQL)
Complex transactions required
Existing Postgres/MySQL dependency
Need advanced features (triggers, stored procedures)

Limitations

D1 is early—know the constraints:

Size limits (as of 2025):

Database size: 10GB (soft limit)
Row size: ~1MB
Query execution time: 30s max
Batch size: 50 statements

Feature gaps:

No full-text search (FTS) yet
Limited geospatial support
No database-level replication control
Eventual consistency for writes (typically <1s propagation)

Check D1 limits documentation for current constraints.

Practical Use Cases

1. User preferences/settings

CREATE TABLE user_settings (
    user_id TEXT PRIMARY KEY,
    theme TEXT DEFAULT 'light',
    language TEXT DEFAULT 'en',
    notifications INTEGER DEFAULT 1,
    updated_at INTEGER
);

2. Feature flags

CREATE TABLE feature_flags (
    flag_key TEXT PRIMARY KEY,
    enabled INTEGER DEFAULT 0,
    rollout_percentage INTEGER DEFAULT 0,
    updated_at INTEGER
);

3. Product catalog

CREATE TABLE products (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    description TEXT,
    price INTEGER NOT NULL,
    inventory INTEGER NOT NULL,
    metadata TEXT  -- JSON string
);

4. API rate limiting

CREATE TABLE rate_limits (
    key TEXT PRIMARY KEY,
    count INTEGER DEFAULT 0,
    window_start INTEGER NOT NULL,
    expires_at INTEGER NOT NULL
);

CREATE INDEX idx_rate_limits_expires ON rate_limits(expires_at);
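
As an illustration of how the `rate_limits` table might be used, here is a fixed-window limiter written against Python's `sqlite3` module as a local stand-in for D1. The `allow` function, the limit of 3 requests per 60 seconds, and the key format are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE rate_limits (
    key TEXT PRIMARY KEY,
    count INTEGER DEFAULT 0,
    window_start INTEGER NOT NULL,
    expires_at INTEGER NOT NULL
)""")

def allow(conn: sqlite3.Connection, key: str, now: int,
          limit: int = 3, window_s: int = 60) -> bool:
    """Fixed-window limiter: at most `limit` hits per `window_s` seconds per key."""
    row = conn.execute(
        "SELECT count, expires_at FROM rate_limits WHERE key = ?", (key,)
    ).fetchone()
    if row is None or now >= row[1]:
        # No active window: start a new one with this request counted
        conn.execute(
            "INSERT OR REPLACE INTO rate_limits (key, count, window_start, expires_at) "
            "VALUES (?, 1, ?, ?)",
            (key, now, now + window_s),
        )
        return True
    if row[0] >= limit:
        return False  # window is full
    conn.execute("UPDATE rate_limits SET count = count + 1 WHERE key = ?", (key,))
    return True

# Timestamps are seconds; the last request arrives after the window expires
decisions = [allow(conn, "ip:203.0.113.7", now=t) for t in (0, 1, 2, 3, 61)]
print(decisions)
```

The read-then-write sequence is not atomic, so concurrent requests for the same key could slip past the limit; treat this as a way to exercise the schema rather than a hardened limiter.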

Conclusion

D1 brings SQLite’s simplicity and proven reliability to the edge. For read-heavy workloads that benefit from global distribution, it’s compelling: sub-10ms queries from anywhere with zero configuration.

The developer experience is excellent: familiar SQL, direct Workers integration, no connection pooling headaches. The automatic replication is invisible and just works.

D1 is young. Features are still rolling out. But for the right use case—globally distributed, read-heavy data with moderate write frequency—it’s hard to beat. SQLite at the edge is a powerful primitive.

Further Resources:

Cloudflare D1 Documentation - Official docs
D1 Get Started Guide - Quick start
Wrangler CLI - D1 management
SQLite Documentation - SQL reference
SQLite Query Optimization - Performance tuning
D1 Pricing - Cost details
D1 Limits - Current constraints

Cloudflare D1 from July 2025 — updated with production guidance.

Generative AI Engineering: Best Practices

2025-06-15T00:00:00+02:00

Generative AI engineering is less about the models (they’re commodities now) and more about the systems around them: prompts, retrieval, evaluation, caching, and monitoring. The difference between a demo and production is these unglamorous layers.

I’ve built multiple production GenAI systems—chatbots, coding assistants, document analysis. The models (GPT-4, Claude, Gemini) are interchangeable. The hard parts are: getting the right context into prompts, handling failures gracefully, managing costs, and measuring quality. This post covers patterns that work at scale.

Drawing from Anthropic’s prompt engineering guide, OpenAI’s best practices, and real production experience.

Prompt Engineering: The Core Skill

Prompts are your interface to LLMs. Good prompts are specific, structured, and include examples.

The Six Principles

From Anthropic’s guide:

Give Claude a role - Context shapes behavior
Use XML tags - Structure improves parsing
Be specific - Vague prompts get vague outputs
Use examples - Few-shot examples are powerful
Let Claude think - Chain-of-thought improves reasoning
Use prefill - Control output format
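Prefill is the one principle in this list that the examples in this post do not demonstrate. A minimal sketch, assuming a model that supports prefill and the Messages API convention that a trailing assistant turn is treated as the start of the reply; `build_prefilled_messages` is a hypothetical helper, not part of the Anthropic SDK:

```python
from typing import Dict, List


def build_prefilled_messages(prompt: str, prefill: str) -> List[Dict[str, str]]:
    """Build a message list whose final assistant turn is the prefill text.

    Hypothetical helper: the model continues from `prefill`, so starting
    the reply with "{" nudges it toward emitting a JSON object.
    """
    return [
        {'role': 'user', 'content': prompt},
        # Strip trailing whitespace: Anthropic's docs advise against ending a prefill with it
        {'role': 'assistant', 'content': prefill.rstrip()},
    ]


messages = build_prefilled_messages('List three colors as JSON.', '{')
```

The list can be passed as `messages=` to `client.messages.create`; the returned text continues after the prefill, so prepend the prefill when reassembling the full output.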

Structured Prompts

Always structure prompts with clear sections:

from anthropic import Anthropic

client = Anthropic(api_key='your-key')

def analyze_document(document: str, question: str) -> str:
    """Analyze document with structured prompt."""
    
    prompt = f"""You are an expert document analyst. Your task is to answer questions about documents accurately and concisely.

<document>
{document}
</document>

<question>
{question}
</question>

Instructions:
1. Read the document carefully
2. Identify relevant information
3. Answer the question based only on the document
4. If the answer isn't in the document, say "Not found in document"
5. Cite specific passages when possible

Think through your answer step-by-step, then provide your final answer."""

    response = client.messages.create(
        model='claude-3-5-sonnet-20241022',
        max_tokens=1024,
        messages=[{'role': 'user', 'content': prompt}]
    )
    
    return response.content[0].text

Why this works:

Clear role definition
XML tags separate inputs
Explicit instructions
Step-by-step thinking
Constraints on output

Few-Shot Examples

Examples are more powerful than instructions:

def classify_sentiment(text: str) -> str:
    """Classify sentiment with examples."""
    
    prompt = f"""Classify the sentiment of the following text as positive, negative, or neutral.

Examples:

Text: "This product exceeded my expectations! Amazing quality."
Sentiment: positive

Text: "Terrible experience. Would not recommend."
Sentiment: negative

Text: "The item arrived on time."
Sentiment: neutral

Now classify this text:

Text: "{text}"
Sentiment:"""

    response = client.messages.create(
        model='claude-3-5-sonnet-20241022',
        max_tokens=10,
        messages=[{'role': 'user', 'content': prompt}]
    )
    
    return response.content[0].text.strip()

Three examples teach the model the pattern. For complex tasks, 5-10 examples work better.
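When the number of examples grows, it can help to keep them as data and assemble the prompt from them. A minimal sketch, assuming the same "Text / Sentiment" layout as the prompt above; `build_few_shot_prompt` is a hypothetical helper name:

```python
from typing import List, Tuple


def build_few_shot_prompt(instruction: str, examples: List[Tuple[str, str]], text: str) -> str:
    """Assemble a few-shot classification prompt from (text, label) pairs."""
    parts = [instruction, '', 'Examples:', '']
    for example_text, label in examples:
        parts.append(f'Text: "{example_text}"')
        parts.append(f'Sentiment: {label}')
        parts.append('')
    # End on an open "Sentiment:" line so the model completes the label
    parts.extend(['Now classify this text:', '', f'Text: "{text}"', 'Sentiment:'])
    return '\n'.join(parts)


examples = [
    ('This product exceeded my expectations!', 'positive'),
    ('Terrible experience. Would not recommend.', 'negative'),
]
prompt = build_few_shot_prompt(
    'Classify the sentiment of the following text as positive, negative, or neutral.',
    examples,
    'The item arrived on time.',
)
```

Keeping examples in a list makes it straightforward to swap them per task or to test how many examples a given task needs.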

Chain-of-Thought Prompting

For reasoning tasks, ask the model to think step-by-step:

def solve_math_problem(problem: str) -> dict:
    """Solve with chain-of-thought reasoning."""
    
    prompt = f"""Solve this math problem step-by-step.

Problem: {problem}

Let's solve this step by step:
1. First, identify what we're looking for
2. Then, break down the problem
3. Show your work
4. Finally, state the answer

Begin:"""

    response = client.messages.create(
        model='claude-3-5-sonnet-20241022',
        max_tokens=1024,
        messages=[{'role': 'user', 'content': prompt}]
    )
    
    reasoning = response.content[0].text
    
    # Extract final answer (simplified)
    answer = reasoning.split('answer')[-1].strip()
    
    return {
        'reasoning': reasoning,
        'answer': answer,
    }

Studies show CoT improves accuracy on reasoning tasks by 20-40%. See Google’s CoT paper.
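Splitting on the word "answer" is fragile, because the word can appear anywhere in the reasoning. One alternative, sketched here under the assumption that the prompt also asks the model to end with a line of the form "Final answer: ...", is to parse that marker explicitly; the marker text and `extract_final_answer` are illustrative choices, not part of any API:

```python
import re
from typing import Optional

# Matches a line such as "Final answer: 42" (case-insensitive)
FINAL_ANSWER_RE = re.compile(
    r'^\s*final answer\s*:\s*(.+?)\s*$',
    re.IGNORECASE | re.MULTILINE,
)


def extract_final_answer(reasoning: str) -> Optional[str]:
    """Return the text after the last "Final answer:" marker, or None if absent."""
    matches = FINAL_ANSWER_RE.findall(reasoning)
    return matches[-1] if matches else None


sample = "Step 1: multiply 6 by 7\nStep 2: check the result\nFinal answer: 42\n"
answer = extract_final_answer(sample)
```

Returning `None` when the marker is missing lets the caller retry or flag the response instead of silently using a wrong substring.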

Prompt Templates

Use templates for consistency:

from string import Template

# Define template once
SUMMARIZATION_TEMPLATE = Template("""Summarize the following ${document_type} in ${length} words or less.

Focus on:
${focus_areas}

${document_type}:
${content}

Summary:""")

# Use with different parameters
prompt = SUMMARIZATION_TEMPLATE.substitute(
    document_type='research paper',
    length='100',
    focus_areas='- Main findings\n- Methodology\n- Conclusions',
    content=paper_text
)

Templates ensure consistent quality and make A/B testing easier.

RAG: Retrieval-Augmented Generation

RAG solves the knowledge cutoff and hallucination problems by retrieving relevant context before generation.

Basic RAG Pipeline

from openai import OpenAI
import pinecone

client = OpenAI(api_key='your-key')
pc = pinecone.Pinecone(api_key='your-key')
index = pc.Index('knowledge-base')

def rag_query(question: str, top_k: int = 5) -> str:
    """Answer question using RAG."""
    
    # 1. Embed the question
    question_embedding = client.embeddings.create(
        model='text-embedding-3-small',
        input=question
    ).data[0].embedding
    
    # 2. Retrieve relevant documents
    results = index.query(
        vector=question_embedding,
        top_k=top_k,
        include_metadata=True
    )
    
    # 3. Format context
    context = "\n\n".join([
        f"Document {i+1}:\n{match.metadata['text']}"
        for i, match in enumerate(results.matches)
    ])
    
    # 4. Generate answer with context
    prompt = f"""Answer the question based on the provided context.

Context:
{context}

Question: {question}

Answer based only on the context above. If the answer isn't in the context, say "I don't have enough information to answer that."

Answer:"""

    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0,  # Low temperature for factual answers
    )
    
    return response.choices[0].message.content

Advanced RAG: Reranking

Simple vector search isn’t always accurate. Rerank with a cross-encoder:

from sentence_transformers import CrossEncoder

reranker = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

def rag_with_reranking(question: str, top_k: int = 5) -> str:
    """RAG with reranking for better accuracy."""
    
    # 1. Retrieve more candidates than needed
    results = vector_search(question, top_k=top_k * 3)
    
    # 2. Rerank using cross-encoder
    pairs = [[question, match.metadata['text']] for match in results.matches]
    scores = reranker.predict(pairs)
    
    # 3. Sort by reranker scores
    reranked = sorted(zip(results.matches, scores), key=lambda x: x[1], reverse=True)
    
    # 4. Use top-k after reranking
    top_results = [match for match, score in reranked[:top_k]]
    
    # 5. Generate with reranked context
    context = format_context(top_results)
    return generate_answer(question, context)

Reranking improves accuracy by 10-20% in my experience. See LlamaIndex’s reranking guide.
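The sort-and-truncate step can be isolated and tested without loading a model. A minimal sketch; the scores below are made-up numbers standing in for the output of `reranker.predict`, and `top_k_by_score` is a hypothetical helper:

```python
from typing import List, Sequence, Tuple


def top_k_by_score(docs: Sequence[str], scores: Sequence[float], k: int) -> List[Tuple[str, float]]:
    """Pair each document with its score and keep the k highest-scoring pairs."""
    if len(docs) != len(scores):
        raise ValueError('docs and scores must have the same length')
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


docs = ['doc about billing', 'doc about login', 'doc about refunds']
scores = [0.12, 0.91, 0.47]  # illustrative values, not real model output
top = top_k_by_score(docs, scores, k=2)
```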

HyDE: Hypothetical Document Embeddings

For complex queries, generate a hypothetical answer first:

def hyde_rag(question: str) -> str:
    """RAG with hypothetical document embeddings."""
    
    # 1. Generate hypothetical answer
    hypothetical_prompt = f"""Generate a detailed answer to this question:

{question}

Write as if you're answering from authoritative sources."""

    hypothetical_answer = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': hypothetical_prompt}],
        temperature=0.7,
    ).choices[0].message.content
    
    # 2. Embed and search using hypothetical answer
    # (hypothetical answer better matches document style)
    embedding = embed(hypothetical_answer)
    results = index.query(vector=embedding, top_k=5)
    
    # 3. Generate final answer with retrieved context
    context = format_context(results.matches)
    return generate_answer(question, context)

HyDE improves retrieval for questions that don’t match document phrasing. Paper: Precise Zero-Shot Dense Retrieval.

Evaluation: Measuring Quality

LLM outputs are probabilistic. You need systematic evaluation.

Automated Evaluation Metrics

from openai import OpenAI
import numpy as np

client = OpenAI()

class LLMEvaluator:
    """Evaluate LLM outputs systematically."""
    
    def evaluate_answer(self, question: str, answer: str, ground_truth: str) -> dict:
        """Evaluate answer quality."""
        
        # 1. Semantic similarity (embeddings)
        answer_emb = self.embed(answer)
        truth_emb = self.embed(ground_truth)
        similarity = np.dot(answer_emb, truth_emb)
        
        # 2. LLM-as-judge
        judge_prompt = f"""Evaluate the quality of this answer on a scale of 1-5.

Question: {question}

Expected Answer: {ground_truth}

Actual Answer: {answer}

Rate the answer considering:
- Accuracy (is it factually correct?)
- Completeness (does it fully answer the question?)
- Relevance (does it stay on topic?)

Provide a score (1-5) and brief explanation.

Format:
Score: [1-5]
Explanation: [your reasoning]"""

        judge_response = client.chat.completions.create(
            model='gpt-4o',
            messages=[{'role': 'user', 'content': judge_prompt}],
            temperature=0,
        ).choices[0].message.content
        
        # Parse score
        score = int(judge_response.split('Score:')[1].split('\n')[0].strip())
        
        return {
            'semantic_similarity': similarity,
            'llm_judge_score': score,
            'judge_explanation': judge_response,
        }
    
    def embed(self, text: str):
        """Get embedding."""
        return client.embeddings.create(
            model='text-embedding-3-small',
            input=text
        ).data[0].embedding
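The score parsing above raises an exception if the judge omits the "Score:" label or adds extra text on that line. A more forgiving parser, as a minimal sketch; `parse_judge_score` is a hypothetical helper and the accepted formats are assumptions about how a judge model might deviate:

```python
import re
from typing import Optional

# Accepts "Score: 4", "Score: [4]" and "Score: 4/5"; rejects out-of-range values like 45
SCORE_RE = re.compile(r'Score:\s*\[?([1-5])(?!\d)')


def parse_judge_score(judge_response: str) -> Optional[int]:
    """Return the 1-5 score from a judge response, or None if it cannot be found."""
    match = SCORE_RE.search(judge_response)
    return int(match.group(1)) if match else None
```

Returning `None` instead of raising lets the evaluation loop count unparseable judgments separately rather than aborting the whole run.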

Test Sets

Build curated test sets:

test_cases = [
    {
        'question': 'What is the capital of France?',
        'expected': 'Paris',
        'category': 'factual',
    },
    {
        'question': 'Explain photosynthesis simply',
        'expected': 'Plants convert sunlight into energy...',
        'category': 'explanation',
    },
    # ... more test cases
]

def run_evaluation(system, test_cases):
    """Run systematic evaluation."""
    evaluator = LLMEvaluator()
    results = []
    
    for test in test_cases:
        answer = system.answer(test['question'])
        
        metrics = evaluator.evaluate_answer(
            test['question'],
            answer,
            test['expected']
        )
        
        results.append({
            'question': test['question'],
            'answer': answer,
            'metrics': metrics,
            'category': test['category'],
        })
    
    # Aggregate by category
    by_category = {}
    for result in results:
        cat = result['category']
        if cat not in by_category:
            by_category[cat] = []
        by_category[cat].append(result['metrics']['llm_judge_score'])
    
    # Print summary
    for category, scores in by_category.items():
        avg = np.mean(scores)
        print(f"{category}: {avg:.2f}/5.0")
    
    return results

A/B Testing

Compare prompt variants:

def ab_test_prompts(variant_a, variant_b, test_cases, sample_size=100):
    """A/B test two prompt variants."""
    
    results_a = []
    results_b = []
    
    for test in test_cases[:sample_size]:
        # Test variant A
        answer_a = generate_with_prompt(variant_a, test['question'])
        score_a = evaluate(test['question'], answer_a, test['expected'])
        results_a.append(score_a)
        
        # Test variant B
        answer_b = generate_with_prompt(variant_b, test['question'])
        score_b = evaluate(test['question'], answer_b, test['expected'])
        results_b.append(score_b)
    
    # Statistical comparison
    from scipy import stats
    t_stat, p_value = stats.ttest_ind(results_a, results_b)
    
    return {
        'variant_a_mean': np.mean(results_a),
        'variant_b_mean': np.mean(results_b),
        'p_value': p_value,
        'winner': 'A' if np.mean(results_a) > np.mean(results_b) else 'B',
        'significant': p_value < 0.05,
    }

Use Weights & Biases or LangSmith for experiment tracking.
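Because both variants are scored on the same test cases, the scores are paired, and a paired test is one alternative to the independent-samples t-test used above. A minimal standard-library sketch of a sign-flip permutation test; this is a swapped-in technique, not what the listing above does, and `paired_permutation_test` is a hypothetical name:

```python
import random
from statistics import mean
from typing import Sequence


def paired_permutation_test(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided p-value for the mean paired difference between A and B."""
    if len(scores_a) != len(scores_b):
        raise ValueError('scores must be paired (same length)')
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each difference is arbitrary
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_resamples + 1)
```

A small p-value suggests the difference between variants is unlikely to be noise; with identical scores the function returns 1.0.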

Production Best Practices

1. Cost Optimization

LLM costs are variable—optimize aggressively:

class CostOptimizedLLM:
    """LLM client with cost optimization."""
    
    # Prices in USD per 1K tokens
    PRICING = {
        'gpt-4o': {'input': 0.0025, 'output': 0.010, 'quality': 5},
        'gpt-4o-mini': {'input': 0.00015, 'output': 0.0006, 'quality': 4},
        'claude-3-5-sonnet': {'input': 0.003, 'output': 0.015, 'quality': 5},
    }
    
    def __init__(self):
        self.cache = {}
        self.total_cost = 0
    
    def generate(self, prompt: str, task_complexity: str = 'medium'):
        """Generate with cost optimization."""
        
        # 1. Check cache
        cache_key = hash(prompt)
        if cache_key in self.cache:
            return self.cache[cache_key]
        
        # 2. Select model based on task
        model = self._select_model(task_complexity)
        
        # 3. Minimize token usage
        optimized_prompt = self._optimize_prompt(prompt)
        
        # 4. Generate
        response = client.chat.completions.create(
            model=model,
            messages=[{'role': 'user', 'content': optimized_prompt}],
            max_tokens=self._calculate_max_tokens(task_complexity),
        )
        
        # 5. Track cost
        cost = self._calculate_cost(model, response.usage)
        self.total_cost += cost
        
        # 6. Cache result
        result = response.choices[0].message.content
        self.cache[cache_key] = result
        
        return result
    
    def _select_model(self, complexity: str) -> str:
        """Choose cheapest model that meets quality needs."""
        if complexity == 'simple':
            return 'gpt-4o-mini'
        elif complexity == 'medium':
            return 'gpt-4o-mini'  # Try cheap first
        else:
            return 'gpt-4o'
    
    def _optimize_prompt(self, prompt: str) -> str:
        """Remove unnecessary tokens."""
        # Remove extra whitespace
        optimized = ' '.join(prompt.split())
        # Truncate if too long
        if len(optimized) > 10000:
            optimized = optimized[:10000] + '...'
        return optimized
    
    def _calculate_max_tokens(self, complexity: str) -> int:
        """Set appropriate max_tokens."""
        limits = {'simple': 256, 'medium': 512, 'complex': 2048}
        return limits.get(complexity, 512)

    def _calculate_cost(self, model: str, usage) -> float:
        """Cost in USD for one request (PRICING is per 1K tokens)."""
        pricing = self.PRICING[model]
        return (
            (usage.prompt_tokens / 1000) * pricing['input'] +
            (usage.completion_tokens / 1000) * pricing['output']
        )

Cost reduction strategies:

Use cheaper models (GPT-4o-mini) for simple tasks
Cache aggressively (30-50% cache hit rate typical)
Minimize prompt tokens (context compression)
Set appropriate max_tokens
Batch requests where possible
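One caveat on the cache key in the class above: Python's built-in `hash()` on strings is salted per process by default, so those keys do not survive a restart or match across workers, and a key built from the prompt alone ignores which model produced the answer. A minimal sketch of a stable key; `make_cache_key` and the choice of fields are assumptions:

```python
import hashlib


def make_cache_key(model: str, prompt: str, max_tokens: int) -> str:
    """Stable cache key covering everything that changes the response."""
    payload = f'{model}\n{max_tokens}\n{prompt}'.encode('utf-8')
    return hashlib.sha256(payload).hexdigest()


key_small = make_cache_key('gpt-4o-mini', 'Summarize this ticket.', 256)
key_large = make_cache_key('gpt-4o', 'Summarize this ticket.', 256)
```

The same key can be used for an in-memory dict or an external store such as Redis.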

2. Reliability and Error Handling

LLMs fail. Handle it gracefully:

import openai
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def generate_with_retry(prompt: str) -> str:
    """Generate with automatic retries."""
    try:
        response = client.chat.completions.create(
            model='gpt-4o',
            messages=[{'role': 'user', 'content': prompt}],
            timeout=30,
        )
        return response.choices[0].message.content

    except openai.RateLimitError:
        # Hit rate limit: re-raise so tenacity backs off and retries
        raise

    except openai.APIError as e:
        # Other API error (including timeouts and connection errors): retry
        print(f"API error: {e}")
        raise

    except Exception as e:
        # Unexpected non-API error: do not retry, return a fallback message
        print(f"Unexpected error: {e}")
        return "I apologize, but I'm having trouble processing your request."

3. Monitoring

Track what matters:

import structlog
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

logger = structlog.get_logger()

@dataclass
class LLMMetrics:
    """Track LLM usage metrics."""
    request_id: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    cost_usd: float
    success: bool
    error: Optional[str] = None

def log_llm_request(metrics: LLMMetrics):
    """Log for analysis."""
    logger.info(
        "llm_request",
        request_id=metrics.request_id,
        model=metrics.model,
        prompt_tokens=metrics.prompt_tokens,
        completion_tokens=metrics.completion_tokens,
        latency_ms=metrics.latency_ms,
        cost=metrics.cost_usd,
        success=metrics.success,
        error=metrics.error,
    )

# Track aggregate metrics:
# - Requests per minute
# - Average latency (p50, p95, p99)
# - Token usage per user/endpoint
# - Cost per day/user
# - Error rate by type
# - Cache hit rate

Use Helicone, LangSmith, or Weights & Biases for LLM observability.
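If you are aggregating the logged latencies yourself rather than through one of these tools, the percentiles can be computed with the standard library. A minimal sketch; `latency_percentiles` is a hypothetical helper and the interpolation method is one reasonable choice among several:

```python
from statistics import quantiles
from typing import Dict, Sequence


def latency_percentiles(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Return p50, p95 and p99 for a batch of request latencies."""
    if len(latencies_ms) < 2:
        raise ValueError('need at least two samples')
    # n=100 yields 99 cut points; index i-1 holds the i-th percentile
    cuts = quantiles(latencies_ms, n=100, method='inclusive')
    return {'p50': cuts[49], 'p95': cuts[94], 'p99': cuts[98]}


stats = latency_percentiles(list(range(1, 102)))  # synthetic latencies 1..101 ms
```

Tail percentiles matter more than the mean for LLM calls, since a small share of slow requests dominates the user-visible experience.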

4. Security

Protect against prompt injection and data leakage:

def sanitize_input(user_input: str) -> str:
    """Remove potential prompt injection."""
    # Remove system-like instructions
    dangerous_patterns = [
        'ignore previous instructions',
        'disregard the above',
        'system:',
        'assistant:',
    ]
    
    cleaned = user_input.lower()
    for pattern in dangerous_patterns:
        if pattern in cleaned:
            return "[Input rejected: suspicious pattern detected]"
    
    # Limit length
    if len(user_input) > 5000:
        user_input = user_input[:5000]
    
    return user_input

def detect_pii(text: str) -> bool:
    """Check for personally identifiable information."""
    import re
    
    # Email
    if re.search(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text):
        return True
    
    # Phone number
    if re.search(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', text):
        return True
    
    # SSN pattern
    if re.search(r'\b\d{3}-\d{2}-\d{4}\b', text):
        return True
    
    return False
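Detection alone only tells you that something sensitive is present. A common next step is to redact it before the text reaches the model or the logs. A minimal sketch reusing the same three patterns; `redact_pii` and the placeholder labels are assumptions, and regexes like these catch only simple US-style formats:

```python
import re

# Same patterns as detect_pii; SSN is checked before PHONE so labels stay specific
PII_PATTERNS = {
    'EMAIL': re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'),
    'SSN': re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
    'PHONE': re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'),
}


def redact_pii(text: str) -> str:
    """Replace each detected PII span with a bracketed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f'[{label}]', text)
    return text


redacted = redact_pii('Mail jane@example.com or call 555-123-4567.')
```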

Conclusion

Generative AI engineering is systems engineering. The models are tools—the value is in how you use them. Focus on prompts, retrieval, evaluation, cost optimization, and reliability.

Start simple: good prompts with few-shot examples, basic RAG, automated evaluation. Add complexity only when needed. Measure everything—costs, latency, quality. Iterate based on data.

The best AI systems feel simple to users but are sophisticated underneath. That sophistication comes from engineering discipline, not fancy models.

Further Resources:

Anthropic Prompt Engineering - Comprehensive guide
OpenAI Best Practices - Prompting strategies
LangChain Documentation - RAG patterns
LlamaIndex - Advanced RAG
Weights & Biases for LLMs - Experiment tracking
LangSmith - LLM observability
Helicone - LLM monitoring
Prompt Engineering Guide - Techniques and examples

Generative AI engineering from June 2025 — updated with production guidance.

Building AI Coding Assistants: Technical Deep Dive

2025-05-15T00:00:00+02:00

AI coding assistants have gone from party trick to indispensable tool in under two years. GitHub Copilot, Cursor, Cody, and Tabnine are used by millions of developers daily. Building one requires solving hard problems: understanding massive codebases, generating correct code, and integrating with developer workflows.

I’ve built several coding assistants—from simple autocomplete to full agentic systems. The core challenge isn’t LLMs (they’re a commodity now)—it’s everything around them: context retrieval, code analysis, execution validation, and UX.

This post covers the architecture that works in production, learned from systems processing millions of code generation requests.

High-Level Architecture

A production coding assistant has these components:

┌─────────────┐
│  IDE Plugin │  ← User interaction
└──────┬──────┘
       │
┌──────▼──────────────┐
│  Orchestrator       │  ← Request routing, rate limiting
├─────────────────────┤
│  Context Engine     │  ← RAG, file selection
├─────────────────────┤
│  Code Analysis      │  ← AST, LSP, static analysis
├─────────────────────┤
│  LLM Service        │  ← OpenAI, Anthropic, local models
├─────────────────────┤
│  Execution Sandbox  │  ← Run and test generated code
├─────────────────────┤
│  Cache Layer        │  ← Response caching, embeddings
└─────────────────────┘

Each layer handles specific concerns. Let’s dive into each.

Context is Everything: RAG for Code

Large codebases have millions of lines. You can’t fit that in LLM context. You need intelligent retrieval.

Chunking Code

Unlike prose, code has structure. Chunk by semantic units:

import ast
from typing import List, Dict

class CodeChunker:
    """Chunk code by functions, classes, and top-level statements."""
    
    def chunk_python_file(self, code: str) -> List[Dict]:
        """Split Python file into semantic chunks."""
        tree = ast.parse(code)
        chunks = []
        
        for node in ast.iter_child_nodes(tree):
            chunk = {
                'type': type(node).__name__,
                'name': getattr(node, 'name', 'anonymous'),
                'code': ast.get_source_segment(code, node),
                'lineno': node.lineno,
                'end_lineno': node.end_lineno,
            }
            
            # Add docstring if present
            if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
                docstring = ast.get_docstring(node)
                if docstring:
                    chunk['docstring'] = docstring
            
            chunks.append(chunk)
        
        return chunks

# Usage
chunker = CodeChunker()
chunks = chunker.chunk_python_file(open('app.py').read())

for chunk in chunks:
    print(f"{chunk['type']}: {chunk['name']} (lines {chunk['lineno']}-{chunk['end_lineno']})")

For other languages, use tree-sitter for consistent parsing:

from tree_sitter import Language, Parser
import tree_sitter_python

# Load Python grammar
PY_LANGUAGE = Language(tree_sitter_python.language())
parser = Parser(PY_LANGUAGE)

def extract_functions(code: str) -> List[Dict]:
    """Extract all function definitions."""
    tree = parser.parse(bytes(code, 'utf8'))
    
    functions = []
    for node in tree.root_node.children:
        if node.type == 'function_definition':
            functions.append({
                'name': node.child_by_field_name('name').text.decode(),
                'code': code[node.start_byte:node.end_byte],
                'start_line': node.start_point[0],
                'end_line': node.end_point[0],
            })
    
    return functions

Embedding and Indexing

Use code-specific embedding models for better semantic search:

from sentence_transformers import SentenceTransformer
import pinecone

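# Code-oriented embedding model. codebert-base is not a native sentence-transformers
# checkpoint, so the library wraps it with default mean pooling.
model = SentenceTransformer('microsoft/codebert-base')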

# Initialize Pinecone
pc = pinecone.Pinecone(api_key='your-key')
index = pc.Index('codebase')

def index_codebase(repo_path: str):
    """Index an entire codebase."""
    for filepath in glob_python_files(repo_path):
        code = open(filepath).read()
        chunks = chunker.chunk_python_file(code)
        
        for chunk in chunks:
            # Create searchable text
            search_text = f"""
{chunk['type']} {chunk['name']}
{chunk.get('docstring', '')}
{chunk['code']}
            """.strip()
            
            # Embed
            embedding = model.encode(search_text)
            
            # Store in vector DB
            index.upsert([{
                'id': f"{filepath}:{chunk['lineno']}",
                'values': embedding.tolist(),
                'metadata': {
                    'file': filepath,
                    'name': chunk['name'],
                    'type': chunk['type'],
                    'code': chunk['code'][:1000],  # Truncate for storage
                    'lineno': chunk['lineno'],
                }
            }])

def find_relevant_code(query: str, top_k: int = 5):
    """Find code relevant to query."""
    query_embedding = model.encode(query)
    
    results = index.query(
        vector=query_embedding.tolist(),
        top_k=top_k,
        include_metadata=True
    )
    
    return [
        {
            'file': r.metadata['file'],
            'name': r.metadata['name'],
            'code': r.metadata['code'],
            'score': r.score,
        }
        for r in results.matches
    ]

# Usage
relevant_code = find_relevant_code("How to authenticate users?")
for code in relevant_code:
    print(f"Score: {code['score']:.3f} - {code['file']}:{code['name']}")

Hybrid Search: Combine Semantic + Keyword

Pure vector search misses exact matches. Combine with keyword search:

def hybrid_search(query: str, top_k: int = 10):
    """Combine semantic and keyword search."""
    # Semantic search
    semantic_results = find_relevant_code(query, top_k=top_k * 2)
    
    # Keyword search (search_by_keywords is assumed to return the same dict shape)
    keyword_results = search_by_keywords(query, top_k=top_k * 2)
    
    def result_id(result):
        # find_relevant_code() returns no 'id' field, so build one from file and name
        return f"{result['file']}:{result['name']}"
    
    # Merge and rank (Reciprocal Rank Fusion, k=60)
    combined_scores = {}
    for rank, result in enumerate(semantic_results, 1):
        combined_scores[result_id(result)] = 1 / (rank + 60)
    
    for rank, result in enumerate(keyword_results, 1):
        rid = result_id(result)
        combined_scores[rid] = combined_scores.get(rid, 0) + 1 / (rank + 60)
    
    # Sort by combined score
    ranked = sorted(combined_scores.items(), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]

See Anthropic’s guide on RAG for code for more techniques.
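Reciprocal Rank Fusion is easy to check in isolation: each list contributes 1 / (k + rank) for every item it contains, and items found by several retrievers accumulate score. A minimal self-contained sketch with k = 60 as in the listing above; the document IDs are made up for illustration:

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists into one list ordered by RRF score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical results from the two retrievers
semantic = ['auth.py:login', 'users.py:create_user', 'db.py:connect']
keyword = ['auth.py:verify_token', 'auth.py:login', 'db.py:connect']
fused = reciprocal_rank_fusion([semantic, keyword])
```

Items that appear in both lists (`auth.py:login`, `db.py:connect`) rise above items that rank first in only one list, which is the behavior hybrid search relies on.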

Code Analysis: Understanding Structure

Static analysis helps validate and improve generated code:

Language Server Protocol (LSP)

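LSP provides IDE-like intelligence. The class below is a simplified sketch: a real language server is driven over JSON-RPC (requests such as textDocument/completion and textDocument/references), and the direct method calls here stand in for those requests: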

from pylsp.python_lsp import PythonLanguageServer

class CodeAnalyzer:
    """Analyze code using LSP."""
    
    def __init__(self):
        self.lsp = PythonLanguageServer()
    
    def get_completions(self, filepath: str, line: int, column: int):
        """Get completion suggestions at cursor."""
        return self.lsp.completions({
            'textDocument': {'uri': f'file://{filepath}'},
            'position': {'line': line, 'character': column}
        })
    
    def get_diagnostics(self, filepath: str, code: str):
        """Get errors and warnings."""
        return self.lsp.lint({
            'textDocument': {'uri': f'file://{filepath}'},
            'text': code
        })
    
    def find_references(self, filepath: str, symbol: str):
        """Find all references to a symbol."""
        return self.lsp.references({
            'textDocument': {'uri': f'file://{filepath}'},
            'position': self.find_symbol_position(symbol)
        })

Type Inference

Use Pyright or mypy to validate generated code:

import json
import os
import subprocess
import tempfile
from typing import Dict, List

def check_types(code: str) -> List[Dict]:
    """Run Pyright on code."""
    # Write code to temp file
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write(code)
        filepath = f.name
    
    try:
        # Run Pyright
        result = subprocess.run(
            ['pyright', '--outputjson', filepath],
            capture_output=True,
            text=True
        )
        
        diagnostics = json.loads(result.stdout)
        return diagnostics.get('generalDiagnostics', [])
    finally:
        os.unlink(filepath)

# Usage
code = """
def add(a: int, b: int) -> int:
    return a + b

result = add("5", 10)  # Type error!
"""

errors = check_types(code)
for error in errors:
    print(f"Line {error['range']['start']['line']}: {error['message']}")

Security Scanning

Detect security issues with Bandit:

import os
import tempfile
from typing import Dict, List

import bandit
from bandit.core import manager

def security_scan(code: str) -> List[Dict]:
    """Scan for security issues."""
    # Create temp file
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write(code)
        filepath = f.name
    
    try:
        # Run Bandit
        b = manager.BanditManager(bandit.config.BanditConfig(), 'file')
        b.discover_files([filepath])
        b.run_tests()
        
        issues = []
        for result in b.get_issue_list():
            issues.append({
                'severity': result.severity,
                'confidence': result.confidence,
                'text': result.text,
                'line': result.lineno,
            })
        
        return issues
    finally:
        os.unlink(filepath)

Execution and Validation

Generate code, run it, validate results:

Test-Driven Generation

Generate code and tests together:

from anthropic import Anthropic

client = Anthropic(api_key='your-key')

def generate_with_tests(spec: str, context: str) -> Dict:
    """Generate function and tests."""
    prompt = f"""Generate a Python function and pytest tests for:

{spec}

Context from codebase:
{context}

Return:
1. The function implementation
2. At least 3 pytest test cases
3. Docstring with examples

Format as Python code blocks."""

    response = client.messages.create(
        model='claude-3-5-sonnet-20241022',
        max_tokens=2000,
        messages=[{'role': 'user', 'content': prompt}]
    )
    
    # Parse response (simplified)
    code = extract_code_blocks(response.content[0].text)
    
    return {
        'function': code[0],
        'tests': code[1],
    }

def validate_generated_code(code: str, tests: str) -> bool:
    """Run tests against generated code."""
    # Combine code and tests
    full_code = f"{code}\n\n{tests}"
    
    # Write to temp file
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write(full_code)
        filepath = f.name
    
    try:
        # Run pytest
        result = subprocess.run(
            ['pytest', filepath, '-v'],
            capture_output=True,
            text=True,
            timeout=10
        )
        
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(filepath)

Sandboxed Execution

Use E2B for secure code execution:

from e2b import Sandbox

def execute_safely(code: str, inputs: List[str]) -> List[Dict]:
    """Execute code in sandbox."""
    with Sandbox() as sandbox:
        # Write code
        sandbox.filesystem.write('main.py', code)
        
        # Execute with inputs
        results = []
        for input_data in inputs:
            result = sandbox.run_code(
                code,
                env_vars={'INPUT': input_data},
                timeout=5
            )
            
            results.append({
                'stdout': result.stdout,
                'stderr': result.stderr,
                'exit_code': result.exit_code,
                'error': result.error,
            })
        
        return results

# Usage
code = """
import os
print(f"Hello {os.getenv('INPUT', 'World')}!")
"""

results = execute_safely(code, ['Alice', 'Bob'])
for i, r in enumerate(results):
    print(f"Run {i+1}: {r['stdout']}")

If code fails tests, refine iteratively:

def iterative_generation(spec: str, max_iterations: int = 3) -> str:
    """Generate and refine code until tests pass."""
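    # find_relevant_code() returns a list of dicts; join the snippets into prompt text
    context = "\n\n".join(r['code'] for r in find_relevant_code(spec))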
    
    for i in range(max_iterations):
        # Generate code
        result = generate_with_tests(spec, context)
        code = result['function']
        tests = result['tests']
        
        # Validate
        if validate_generated_code(code, tests):
            return code
        
        # If failed, add error context and retry
        errors = check_types(code)
        context += f"\n\nPrevious attempt had errors:\n{errors}"
    
    raise Exception("Failed to generate working code")

Production Considerations

Cost Optimization

LLM costs add up fast at scale:

class CostTracker:
    """Track and optimize LLM costs."""
    
    PRICING = {
        'gpt-4o': {'input': 0.0025, 'output': 0.010},
        'gpt-4o-mini': {'input': 0.00015, 'output': 0.0006},
        'claude-3-5-sonnet': {'input': 0.003, 'output': 0.015},
    }
    
    def __init__(self):
        self.total_cost = 0
    
    def calculate_cost(self, model: str, input_tokens: int, output_tokens: int):
        """Calculate request cost."""
        pricing = self.PRICING[model]
        cost = (
            (input_tokens / 1000) * pricing['input'] +
            (output_tokens / 1000) * pricing['output']
        )
        self.total_cost += cost
        return cost

# Optimization strategies:
# 1. Use cheaper models for simple tasks (autocomplete)
# 2. Cache responses aggressively
# 3. Minimize context with smart retrieval
# 4. Use streaming to show results sooner (improves perceived latency, not cost)
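
To make the per-1K-token arithmetic concrete, here is a small worked example. It is only a sketch: the prices mirror the PRICING table above and are assumptions rather than live rates, and the request size (2,000 prompt tokens, 500 completion tokens) is made up for illustration.

```python
# Worked example of the per-1K-token cost formula.
# Prices mirror the PRICING table above and are assumptions, not live rates.
PRICING = {
    "gpt-4o": {"input": 0.0025, "output": 0.010},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request."""
    price = PRICING[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


# Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
large = request_cost("gpt-4o", 2000, 500)
small = request_cost("gpt-4o-mini", 2000, 500)

print(f"gpt-4o:      ${large:.4f}")
print(f"gpt-4o-mini: ${small:.4f}")
print(f"ratio:       {large / small:.1f}x")
```

With these assumed prices the same request costs about $0.0100 on the larger model and $0.0006 on the smaller one, which is why routing simple tasks to the cheap model is the first strategy in the list.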

Response Caching

Cache at multiple levels:

import hashlib
from typing import Optional

import redis

class CacheLayer:
    """Multi-level caching for coding assistant."""
    
    def __init__(self):
        self.redis = redis.Redis(host='localhost', port=6379)
        self.memory_cache = {}  # In-memory for ultra-fast access
    
    def get_cached_response(self, query: str, context: str) -> Optional[str]:
        """Get cached response."""
        # Create cache key
        cache_key = hashlib.sha256(
            f"{query}:{context}".encode()
        ).hexdigest()
        
        # Check memory cache
        if cache_key in self.memory_cache:
            return self.memory_cache[cache_key]
        
        # Check Redis
        cached = self.redis.get(cache_key)
        if cached:
            response = cached.decode()
            self.memory_cache[cache_key] = response  # Promote to memory
            return response
        
        return None
    
    def cache_response(self, query: str, context: str, response: str, ttl: int = 3600):
        """Cache response."""
        cache_key = hashlib.sha256(
            f"{query}:{context}".encode()
        ).hexdigest()
        
        # Store in both layers
        self.memory_cache[cache_key] = response
        self.redis.setex(cache_key, ttl, response)
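
One thing the in-memory layer above does not do is bound its own size or expire entries, so it can grow without limit in a long-running process. A possible way to address that is sketched below. `BoundedTTLCache` is a hypothetical helper, not part of any library, and the injectable `clock` parameter is an assumption added so the behavior can be tested without waiting.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class BoundedTTLCache:
    """Small LRU cache with per-entry expiry (hypothetical helper)."""

    def __init__(self, max_entries: int = 1024, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries: "OrderedDict[str, tuple]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: str) -> None:
        self._entries[key] = (self.clock() + self.ttl_seconds, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used


cache = BoundedTTLCache(max_entries=2, ttl_seconds=60)
cache.set("a", "response A")
cache.set("b", "response B")
cache.get("a")                # touch "a" so "b" becomes the eviction candidate
cache.set("c", "response C")  # evicts "b"
print(cache.get("a"), cache.get("b"), cache.get("c"))
```

In the `CacheLayer` above, an object like this could stand in for the plain `memory_cache` dict, with Redis remaining the shared, longer-lived layer.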

Model Selection

Use different models for different tasks:

def select_model(task_type: str) -> str:
    """Choose model based on task."""
    if task_type == 'autocomplete':
        return 'gpt-4o-mini'  # Fast, cheap
    elif task_type == 'explain':
        return 'gpt-4o-mini'  # Good enough
    elif task_type == 'generate_complex':
        return 'claude-3-5-sonnet'  # Best quality
    elif task_type == 'refactor':
        return 'gpt-4o'  # Balance of speed/quality
    else:
        return 'gpt-4o-mini'  # Default to cheap

Monitoring

Track what matters:

import structlog
from dataclasses import dataclass

logger = structlog.get_logger()

@dataclass
class RequestMetrics:
    request_id: str
    task_type: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    cache_hit: bool
    cost_usd: float
    success: bool

def log_request(metrics: RequestMetrics):
    """Log request metrics for analysis."""
    logger.info(
        "coding_assistant_request",
        request_id=metrics.request_id,
        task=metrics.task_type,
        model=metrics.model,
        input_tokens=metrics.input_tokens,
        output_tokens=metrics.output_tokens,
        latency_ms=metrics.latency_ms,
        cache_hit=metrics.cache_hit,
        cost=metrics.cost_usd,
        success=metrics.success,
    )

# Track aggregate metrics
# - Requests per minute
# - Cache hit rate
# - P50/P95/P99 latency
# - Cost per user
# - Success rate
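
The aggregate metrics listed above can be computed from the logged records with very little code. The sketch below assumes a trimmed-down `Sample` record (a stand-in for `RequestMetrics`) and uses the nearest-rank definition of a percentile; both are illustrative choices, and a real deployment would more likely compute these in its metrics backend.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """Trimmed-down stand-in for RequestMetrics (assumed fields only)."""
    latency_ms: float
    cache_hit: bool
    success: bool


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: list) -> dict:
    """Aggregate latency percentiles, cache hit rate, and success rate."""
    latencies = [s.latency_ms for s in samples]
    total = len(samples)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "cache_hit_rate": sum(s.cache_hit for s in samples) / total,
        "success_rate": sum(s.success for s in samples) / total,
    }


# Made-up data: latencies 10, 20, ..., 1000 ms; every 4th request is a cache hit;
# only the slowest request fails.
samples = [Sample(latency_ms=10.0 * i, cache_hit=(i % 4 == 0), success=(i != 100))
           for i in range(1, 101)]
print(summarize(samples))
```

For this synthetic data the summary reports a P50 of 500 ms, a P95 of 950 ms, a P99 of 990 ms, a 25% cache hit rate, and a 99% success rate.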

Conclusion

Building a production coding assistant is 20% LLM calls and 80% everything else: context retrieval, code analysis, validation, caching, and monitoring. The LLM is a commodity—the value is in the system around it.

Start with strong RAG (code-aware chunking, hybrid search), validate generated code (tests, type checking, security), and optimize costs (caching, model selection). Test extensively with real codebases.

The best coding assistants feel invisible—they understand context, generate correct code, and integrate seamlessly into developer workflow. That requires careful engineering at every layer.

Further Resources:

GitHub Copilot Architecture - How Copilot works
Cursor - Leading AI IDE
Sourcegraph Cody - Code AI assistant
CodeBERT - Code understanding model
Tree-sitter - Universal code parser
Language Server Protocol - IDE features as a protocol
E2B - Code execution sandbox
Continue.dev - Open source coding assistant

Updated May 2025 — practical implementation notes for production AI coding assistants.

Hono: Fast Web Framework for Edge

2025-04-15T00:00:00+02:00

Hono (Japanese for “flame”) might be the best web framework you’ve never heard of. Created by Yusuke Wada, it’s optimized for edge runtimes—Cloudflare Workers, Deno Deploy, and Bun—with a bundle size under 14KB and performance that rivals or beats anything else out there.

I discovered Hono when rebuilding an Express API for Cloudflare Workers. Express doesn’t run on edge runtimes (relies on Node.js APIs), and most “edge-compatible” frameworks felt like compromises. Hono felt like Express’s fast, modern cousin—familiar API, zero Node.js dependencies, blazing fast.

The killer feature? Write once, deploy anywhere. The same Hono code runs on Cloudflare Workers, Deno, Bun, Node.js, and even Fastly Compute. No vendor lock-in, no runtime-specific APIs to learn.

Why Hono Stands Out

After using Express for years, then trying various edge frameworks, here’s what makes Hono special:

Tiny bundle size - Just ~14KB (Express is ~200KB+). This matters on edge where cold starts depend on bundle size. Smaller = faster.

RegExp router - Hono’s router uses a smart RegExp-based approach that’s faster than traditional trie-based routers for most workloads. In Web Framework Benchmarks, Hono consistently ranks in the top tier.

Edge-native - Built from day one for edge runtimes. No Node.js APIs, no compatibility layers, just clean Web Standards (Request/Response).

TypeScript-first - Type inference is excellent. Route handlers know their parameter types, validators provide type safety, middleware is fully typed.

Zero dependencies - Seriously. Check the package.json—no runtime dependencies. This reduces supply chain risk and keeps bundles small.

Getting Started

Install Hono (works everywhere - Node, Deno, Bun, Cloudflare):

npm install hono
# or
deno add jsr:@hono/hono
# or
bun add hono

Basic app - if you know Express, this feels familiar:

import { Hono } from 'hono';

const app = new Hono();

// Simple text response
app.get('/', (c) => {
    return c.text('Hello, Hono!');
});

// JSON response
app.get('/api/users', async (c) => {
    const users = await db.query('SELECT * FROM users');
    return c.json(users);
});

// URL parameters with TypeScript inference
app.get('/api/users/:id', (c) => {
    const id = c.req.param('id');  // TypeScript knows this is a string
    return c.json({ id, name: `User ${id}` });
});

// POST with JSON body
app.post('/api/users', async (c) => {
    const body = await c.req.json();
    // Validate and create user
    return c.json({ success: true }, 201);
});

export default app;

Deploy to Cloudflare Workers:

# wrangler.toml
name = "hono-api"
main = "src/index.ts"
compatibility_date = "2025-04-15"

// src/index.ts
import { Hono } from 'hono';

const app = new Hono();
// ... your routes ...

export default app;

Deploy with: wrangler deploy

For Deno: deno serve src/index.ts
For Bun: bun src/index.ts
For Node: Use @hono/node-server

Middleware Ecosystem

Hono has 40+ built-in middleware covering common use cases:

import { Hono } from 'hono';
import { cors } from 'hono/cors';
import { logger } from 'hono/logger';
import { jwt } from 'hono/jwt';
import { etag } from 'hono/etag';
import { prettyJSON } from 'hono/pretty-json';

const app = new Hono();

// Logger - tracks requests in console
app.use('*', logger());

// CORS - configure allowed origins
app.use('/api/*', cors({
    origin: ['https://example.com', 'https://app.example.com'],
    allowMethods: ['GET', 'POST', 'PUT', 'DELETE'],
    credentials: true,
}));

// JWT authentication (placeholder secret; load the real one from an environment variable or binding)
app.use('/api/protected/*', jwt({
    secret: 'your-secret-key',
}));

// ETag for caching
app.use('*', etag());

// Pretty-print JSON in development
app.use('*', prettyJSON());

// Protected route
app.get('/api/protected/profile', (c) => {
    const payload = c.get('jwtPayload');  // TypeScript knows this exists
    return c.json({ user: payload });
});

Popular middleware:

CORS - Cross-origin resource sharing
Logger - Request logging
JWT - JSON Web Token auth
Bearer Auth - Token authentication
Basic Auth - HTTP basic auth
Rate Limiter - Protect from abuse
Zod Validator - Schema validation

Custom middleware is straightforward:

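// Timing middleware
import { createMiddleware } from 'hono/factory';

// createMiddleware gives `c` and `next` their types, so this compiles under strict TypeScript
const timing = createMiddleware(async (c, next) => {
    const start = Date.now();
    await next();
    const ms = Date.now() - start;
    c.res.headers.set('X-Response-Time', `${ms}ms`);
});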

app.use('*', timing);

Runtime-Specific Features

Hono adapts to each runtime’s capabilities through a clean adapter pattern:

Cloudflare Workers

Access Workers-specific bindings (KV, R2, Durable Objects, D1):

import { Hono } from 'hono';

type Bindings = {
    DB: D1Database;
    BUCKET: R2Bucket;
    CACHE: KVNamespace;
};

const app = new Hono<{ Bindings: Bindings }>();

app.get('/api/posts', async (c) => {
    // Access D1 database
    const posts = await c.env.DB
        .prepare('SELECT * FROM posts ORDER BY created_at DESC LIMIT 10')
        .all();
    
    return c.json(posts.results);
});

app.get('/api/images/:id', async (c) => {
    const id = c.req.param('id');
    
    // Access R2 object storage
    const object = await c.env.BUCKET.get(`images/${id}`);
    if (!object) return c.notFound();
    
    return new Response(object.body, {
        headers: {
            'Content-Type': object.httpMetadata?.contentType || 'image/jpeg',
        },
    });
});

export default app;

Deno Deploy

Use Deno KV and native APIs:

import { Hono } from 'hono';

const app = new Hono();
const kv = await Deno.openKv();

app.get('/api/counter', async (c) => {
    const result = await kv.get(['counter']);
    const count = (result.value as number) || 0;
    await kv.set(['counter'], count + 1);
    
    return c.json({ count: count + 1 });
});

Deno.serve(app.fetch);

Bun

Leverage Bun’s performance:

import { Hono } from 'hono';

const app = new Hono();

app.get('/', (c) => c.text('Running on Bun!'));

export default {
    port: 3000,
    fetch: app.fetch,
};

Run with: bun run index.ts

Best Practices

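Use it for edge workloads - Cloudflare Workers, Deno Deploy, and similar runtimes are where the small bundle pays off
Leverage middleware - Put cross-cutting logic (auth, logging, CORS) in reusable middleware
Lean on type safety - Type your bindings and validate inputs so TypeScript can infer handler types
Optimize bundle size - Keep dependencies minimal
Test - Cover handlers with unit tests and middleware chains with integration tests
Monitor - Track latency, error rates, and cold starts
Document - Keep API contracts clear
Stay updated - Follow releases for new features and fixes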

Production considerations

Performance characteristics

Hono is optimized for edge runtimes:

Bundle size: ~14KB (compared to Express ~200KB).
Cold starts: Minimal overhead; fast initialization.
Request handling: Handles thousands of requests per second.

Edge runtime compatibility

Hono works across multiple runtimes:

Cloudflare Workers: Full support with D1, R2, Durable Objects.
Deno: Native support; works with Deno Deploy.
Bun: Fast execution on Bun runtime.
Node.js: Also works, though optimized for edge.

Type Safety with Zod

Hono’s Zod validator provides runtime validation and TypeScript types:

import { Hono } from 'hono';
import { zValidator } from '@hono/zod-validator';
import { z } from 'zod';

const app = new Hono();

// Define schema
const userSchema = z.object({
    name: z.string().min(1).max(100),
    email: z.string().email(),
    age: z.number().int().positive().optional(),
});

// Type-safe route handler
app.post('/api/users',
    zValidator('json', userSchema),
    async (c) => {
        // TypeScript knows the exact shape of validated data
        const user = c.req.valid('json');
        // user.name: string
        // user.email: string  
        // user.age: number | undefined
        
        // Save to database
        const result = await db.insert('users', user);
        
        return c.json({ id: result.id, ...user }, 201);
    }
);

// Query parameter validation
const querySchema = z.object({
    page: z.string().regex(/^\d+$/).transform(Number).default('1'),
    limit: z.string().regex(/^\d+$/).transform(Number).default('10'),
});

app.get('/api/users',
    zValidator('query', querySchema),
    async (c) => {
        const { page, limit } = c.req.valid('query');
        // page and limit are numbers with defaults
        
        const offset = (page - 1) * limit;
        const users = await db.query(
            'SELECT * FROM users LIMIT ? OFFSET ?',
            [limit, offset]
        );
        
        return c.json(users);
    }
);

Invalid requests return 400 with clear error messages automatically. The combination of runtime validation and TypeScript types catches bugs before they reach production.

Middleware ecosystem

Hono has a rich middleware ecosystem:

CORS: Built-in CORS support.
Logger: Request logging middleware.
JWT: Authentication middleware.
Rate limiting: Protect endpoints from abuse.

Deployment

Deploy to Cloudflare Workers:

# Install Wrangler
npm install -g wrangler

# Deploy
wrangler deploy

Or use Deno Deploy:

deployctl deploy --project=my-project src/index.ts

Hono vs Express vs Fastify

Real-world comparison from my experience migrating projects:

Feature	Hono	Express	Fastify
Bundle size	14KB	209KB	645KB
Runtime	Edge + Node	Node only	Node only
Cold start	<5ms	50-100ms	30-60ms
TypeScript	Excellent	Good (types separate)	Very Good
Middleware	40+ built-in	Huge ecosystem	Large ecosystem
Learning curve	Low (Express-like)	Low	Medium
Performance	Top tier	Mid	High
Community	Growing fast	Massive	Large

When to choose Hono:

✅ Deploying to edge (Cloudflare, Deno, Bun)
✅ Need minimal bundle size
✅ Want TypeScript-first DX
✅ Building new projects

Stick with Express if:

You have heavy Node.js dependencies (fs, crypto, etc.)
Your team is deeply invested in the Express ecosystem
You’re maintaining legacy apps (migration cost > benefit)

Performance numbers (from Web Framework Benchmarks on Bun):

Hono: ~100,000 req/sec
Express: ~16,000 req/sec
Fastify: ~80,000 req/sec

In production on Cloudflare Workers, I’ve seen Hono apps handle 1000+ req/sec per region with p99 latency under 30ms.

Testing, Observability and Performance

Local testing: run Hono apps with your target runtime (Deno/Bun/Node) and use integration tests to validate middleware and edge-specific bindings.
Metrics: expose request latency histograms, error counts, and cold-start metrics. Integrate with provider or external monitoring (Prometheus/Grafana via exporters or synthetic checks).
Profiling: measure CPU and memory in the target runtime; avoid heavy synchronous CPU work on the event loop and prefer streaming for large payloads.
Security: validate and sanitize inputs, enforce request size limits, and apply rate limiting at the edge to protect downstream systems.

Edge-specific tips

Keep handlers minimal and offload heavy compute to background workers or server-side compute.
Minimize dependencies to keep bundle size small; prefer tree-shakable utilities.
Use streaming responses for large files to reduce memory overhead.

Conclusion

Hono represents the future of web frameworks—edge-native, runtime-agnostic, and fast by default. After building multiple production apps with it, I’m convinced it’s the right choice for new edge projects.

The developer experience is excellent. Familiar Express-like API, outstanding TypeScript support, and comprehensive middleware. The fact that the same code runs on Cloudflare Workers, Deno, Bun, and Node.js is incredible—true write-once-deploy-anywhere.

Performance speaks for itself. 14KB bundle, sub-5ms cold starts, and top-tier request throughput. When every millisecond matters (and on the edge, they all do), Hono delivers.

The ecosystem is maturing rapidly. Yusuke and contributors ship improvements weekly, the Discord community is active and helpful, and adoption is accelerating. Companies are moving production workloads to Hono.

If you’re building APIs for edge runtimes, give Hono serious consideration. It might just be the best framework you’ve never heard of—until now.

Further Resources:

Hono Documentation - Excellent docs with examples
GitHub Repository - Source code and issues
Hono Examples - Sample applications
Discord Community - Get help and discuss
Cloudflare Workers + Hono Tutorial - Official guide
Yusuke Wada’s Blog - Creator’s insights
Web Framework Benchmarks - Performance data

Hono fast edge framework from April 2025 — updated with production guidance.

AI Agents: Architecture and Design Patterns

2025-03-15T00:00:00+01:00

AI agents are fundamentally different from traditional LLM applications. Instead of one-shot prompts, agents reason iteratively, use tools, maintain state, and work toward goals autonomously. Building them requires rethinking your architecture from the ground up.

I built my first agent in early 2023—a simple ReAct loop that could search the web and answer questions. It worked, barely, with lots of error handling and retry logic. Fast forward to today, and the patterns have crystallized. Libraries like LangGraph, AutoGen, and CrewAI encode best practices, but understanding the fundamentals is crucial.

This post covers the core patterns that work in production—not the latest research paper, but battle-tested architectures running in real systems.

The ReAct Pattern: Reason + Act

ReAct (Reasoning and Acting) is the foundational agent pattern. The agent alternates between thinking (reasoning about what to do) and acting (using tools to gather information or perform actions).

The loop:

Thought: Reason about the current state and what action to take
Action: Execute a tool with specific parameters
Observation: Receive the tool’s output
Repeat: Continue until the goal is achieved

Here’s a minimal implementation:

from anthropic import Anthropic
import json

class ReActAgent:
    def __init__(self, tools):
        self.client = Anthropic()  # Reads ANTHROPIC_API_KEY from the environment
        self.tools = tools  # Dict of tool_name -> callable
        self.history = []
        
    def run(self, task: str, max_iterations: int = 10):
        """Execute task using ReAct loop."""
        self.history = [{
            "role": "user",
            "content": task
        }]
        
        for i in range(max_iterations):
            # Think: Ask LLM what to do next
            response = self.client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=1024,
                system=self._system_prompt(),
                messages=self.history
            )
            
            assistant_msg = response.content[0].text
            self.history.append({
                "role": "assistant", 
                "content": assistant_msg
            })
            
            # Parse action from response
            action = self._parse_action(assistant_msg)
            
            if action["type"] == "final_answer":
                return action["content"]
            
            # Act: Execute tool
            if action["type"] == "tool_use":
                tool_name = action["tool"]
                tool_args = action["args"]
                
                try:
                    result = self.tools[tool_name](**tool_args)
                    observation = f"Tool result: {result}"
                except Exception as e:
                    observation = f"Tool error: {str(e)}"
                
                # Observe: Add result to history
                self.history.append({
                    "role": "user",
                    "content": observation
                })
        
        raise TimeoutError("Agent exceeded max iterations")
    
    def _system_prompt(self):
        tool_descriptions = "\n".join([
            f"- {name}: {func.__doc__}" 
            for name, func in self.tools.items()
        ])
        
        return f"""You are a helpful assistant that can use tools.

Available tools:
{tool_descriptions}

You should:
1. Think step-by-step about the task
2. Use tools when needed to gather information
3. Return a final answer when ready

Format your responses as:
Thought: <your reasoning>
Action: <tool_name>(<arguments as a JSON object>)
OR
Thought: <your reasoning>
Final Answer: <your answer>"""
    
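    def _parse_action(self, text: str):
        """Parse action from LLM response."""
        if "Final Answer:" in text:
            answer = text.split("Final Answer:")[-1].strip()
            return {"type": "final_answer", "content": answer}
        
        if "Action:" in text:
            action_line = text.split("Action:")[-1].strip()
            # Simplified parsing: expects `tool_name({"arg": "value"})`.
            # In production, use proper parsing or structured output
            name, _, raw_args = action_line.partition("(")
            raw_args = raw_args.rsplit(")", 1)[0].strip()
            try:
                args = json.loads(raw_args) if raw_args else {}
            except json.JSONDecodeError:
                args = {}  # Unparseable arguments: call the tool without any
            return {"type": "tool_use", "tool": name.strip(), "args": args}
        
        return {"type": "continue"}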

# Example tools
def search_web(query: str) -> str:
    """Search the web for information."""
    # In production: use actual search API
    return f"Search results for: {query}"

def calculate(expression: str) -> float:
    """Evaluate a mathematical expression."""
    return eval(expression)  # Don't use eval in production!

# Run agent
agent = ReActAgent({
    "search_web": search_web,
    "calculate": calculate,
})

result = agent.run("What is the population of Tokyo times 2?")
print(result)

Key insight: The LLM decides which tool to use and when. You provide capabilities; the model chains them together.

Modern implementations use function calling (OpenAI) or tool use (Anthropic) for more reliable tool invocation. See Anthropic’s tool use guide for production patterns.
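
The `calculate` tool above relies on `eval`, which the comment rightly warns against. One possible replacement is sketched here: it parses the expression with Python's `ast` module and evaluates only a whitelist of arithmetic operators. `safe_calculate` is a hypothetical name, and exponentiation is left out on purpose because very large powers can be used to exhaust CPU and memory.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> float:
    """Evaluate basic arithmetic without eval()."""

    def _eval(node: ast.AST) -> float:
        # Plain numbers only (bool is a subclass of int, so exclude it explicitly)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)


print(safe_calculate("2 * (3 + 4)"))

try:
    safe_calculate("__import__('os').system('ls')")
except ValueError as exc:
    print(f"Rejected: {exc}")
```

Function calls, attribute access, and names all fall through to the `ValueError` branch, so the tool can only ever do arithmetic on literal numbers.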

Tool Use: Extending Agent Capabilities

Tools are how agents interact with the world. A tool is any function the agent can call—search APIs, databases, code executors, file systems, external APIs.

Defining Tools

Use structured schemas so the LLM knows how to call them:

tools = [
    {
        "name": "search_web",
        "description": "Search the web for current information. Use this when you need up-to-date data or facts not in your training.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query, be specific"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "execute_python",
        "description": "Execute Python code in a sandboxed environment. Use for calculations, data processing, or algorithmic tasks.",
        "input_schema": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code to execute"
                }
            },
            "required": ["code"]
        }
    },
    {
        "name": "read_file",
        "description": "Read contents of a file from the filesystem.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "File path to read"
                }
            },
            "required": ["path"]
        }
    }
]

# Use with Anthropic's tool use API
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's 15% of 482?"}]
)

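# Claude returns structured tool use
if response.stop_reason == "tool_use":
    # Pick the tool_use block by type; text blocks may appear alongside it
    tool_use = next(block for block in response.content if block.type == "tool_use")
    print(f"Tool: {tool_use.name}")
    print(f"Input: {tool_use.input}")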
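
Once the model returns a tool use block, the application still has to run the tool and send the result back. The sketch below shows one way to do that step: it walks the content blocks, calls the matching Python function, and builds the follow-up user message with `tool_result` entries. The `registry` dict and the `percent_of` tool are hypothetical, and `SimpleNamespace` objects stand in for real response blocks so the example runs without an API key.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


def run_tool_calls(content_blocks: List[Any],
                   registry: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Execute every tool_use block and build the follow-up user message."""
    results = []
    for block in content_blocks:
        if getattr(block, "type", None) != "tool_use":
            continue  # skip text blocks
        try:
            output = registry[block.name](**block.input)
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": str(output),
            })
        except Exception as exc:  # report failures back to the model
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": f"Error: {exc}",
                "is_error": True,
            })
    return {"role": "user", "content": results}


# Stand-ins for the blocks in `response.content` (assumed shape: type, id, name, input).
blocks = [
    SimpleNamespace(type="text", text="I'll calculate that."),
    SimpleNamespace(type="tool_use", id="toolu_demo", name="percent_of",
                    input={"percent": 15, "value": 482}),
]
registry = {"percent_of": lambda percent, value: percent * value / 100}

message = run_tool_calls(blocks, registry)
print(message)
```

The returned message would be appended to the conversation after the assistant turn, and the model called again so it can use the result.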

Tool Safety

Tools can have side effects. Implement safeguards:

class SafeToolExecutor:
    def __init__(self):
        self.read_only_tools = {"search_web", "read_file", "list_directory"}
        self.write_tools = {"write_file", "execute_python", "send_email"}
        
    def execute(self, tool_name: str, args: dict, require_confirmation: bool = True):
        """Execute tool with safety checks."""
        
        # Check if tool requires confirmation
        if tool_name in self.write_tools and require_confirmation:
            print(f"⚠️  Agent wants to use {tool_name} with args: {args}")
            confirm = input("Allow? (y/n): ")
            if confirm.lower() != 'y':
                return {"error": "User denied permission"}
        
        # Validate arguments
        if tool_name == "execute_python":
            # Substring screen for risky constructs; a first-pass filter, not a sandbox
            dangerous = ['import os', 'subprocess', 'eval(', 'exec(']
            if any(danger in args['code'] for danger in dangerous):
                return {"error": "Dangerous code detected"}
        
        # Rate limit
        if not self._check_rate_limit(tool_name):
            return {"error": "Rate limit exceeded"}
        
        # Execute
        try:
            result = self._execute_tool(tool_name, args)
            self._log_execution(tool_name, args, result)
            return result
        except Exception as e:
            return {"error": str(e)}
    
    def _check_rate_limit(self, tool_name: str) -> bool:
        # Implement rate limiting logic
        return True
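
The `_check_rate_limit` stub above always returns True. A sliding-window limiter is one way it could be filled in, sketched here under the assumption that limits are tracked per tool name. `SlidingWindowRateLimiter` is a hypothetical class, and the `clock` parameter exists only so the example can control time.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowRateLimiter:
    """Allow at most `max_calls` per tool within the last `window_seconds`."""

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self._calls = defaultdict(deque)  # tool name -> timestamps of recent calls

    def allow(self, tool_name: str) -> bool:
        now = self.clock()
        calls = self._calls[tool_name]
        # Drop timestamps that have fallen out of the window
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


now = [0.0]  # fake clock for the demo
limiter = SlidingWindowRateLimiter(max_calls=2, window_seconds=60, clock=lambda: now[0])

decisions = [limiter.allow("search_web") for _ in range(3)]  # third call is over the limit
now[0] = 61.0  # the earlier calls have aged out of the window
decisions.append(limiter.allow("search_web"))
print(decisions)
```

This prints `[True, True, False, True]`: two calls fit in the window, the third is refused, and a call made after the window has passed is allowed again.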

Popular tool libraries:

LangChain Tools - 100+ pre-built tools
E2B - Secure code execution sandbox
Browserbase - Browser automation for agents

Memory Management

Agents need memory to maintain context across interactions. There are two types:

Short-Term Memory (Conversation History)

Simply the message history passed to the LLM:

class ConversationMemory:
    def __init__(self, max_tokens: int = 8000):
        self.messages = []
        self.max_tokens = max_tokens
        
    def add(self, role: str, content: str):
        """Add message and trim if needed."""
        self.messages.append({"role": role, "content": content})
        
        # Rough token estimation
        total_tokens = sum(len(m['content']) // 4 for m in self.messages)
        
        # Trim old messages if over limit (keep system message)
        while total_tokens > self.max_tokens and len(self.messages) > 2:
            removed = self.messages.pop(1)  # Keep index 0 (system)
            total_tokens -= len(removed['content']) // 4
    
    def get_recent(self, n: int = 10):
        """Get recent messages."""
        return self.messages[-n:]
    
    def summarize_and_compress(self, llm):
        """Summarize old messages to save tokens."""
        if len(self.messages) < 10:
            return
        
        # Summarize messages 1-5
        old_messages = self.messages[1:6]
        summary_prompt = f"Summarize this conversation:\n{old_messages}"
        summary = llm.complete(summary_prompt)
        
        # Replace with summary
        self.messages = [
            self.messages[0],  # System message
            {"role": "assistant", "content": f"Summary of earlier conversation: {summary}"},
            *self.messages[6:]  # Recent messages
        ]

Long-Term Memory (External Storage)

For facts that persist across sessions, use a vector database:

import uuid

import pinecone
from openai import OpenAI

class LongTermMemory:
    def __init__(self):
        self.pc = pinecone.Pinecone(api_key="your-key")
        self.index = self.pc.Index("agent-memory")
        self.client = OpenAI()
        
    def remember(self, fact: str, metadata: dict = None):
        """Store a fact in long-term memory."""
        embedding = self.client.embeddings.create(
            model="text-embedding-3-small",
            input=fact
        ).data[0].embedding
        
        self.index.upsert([{
            "id": str(uuid.uuid4()),
            "values": embedding,
            "metadata": {"text": fact, **(metadata or {})}
        }])
    
    def recall(self, query: str, top_k: int = 5):
        """Retrieve relevant memories."""
        query_embedding = self.client.embeddings.create(
            model="text-embedding-3-small",
            input=query
        ).data[0].embedding
        
        results = self.index.query(
            vector=query_embedding,
            top_k=top_k,
            include_metadata=True
        )
        
        return [match.metadata['text'] for match in results.matches]

# Usage
memory = LongTermMemory()

# Agent learns user preferences
memory.remember("User prefers concise answers", {"type": "preference"})
memory.remember("User's company is called Acme Corp", {"type": "fact"})

# Later, recall relevant info
context = memory.recall("What does the user prefer?")
# Returns: ["User prefers concise answers", ...]

Memory strategies:

Buffer Memory: Keep last N messages (simple, works for short conversations)
Summarization: Periodically compress old messages into summaries
Entity Memory: Track specific entities (people, places, facts)
Vector Memory: Semantic retrieval of relevant past interactions

See LangChain Memory docs for more patterns.
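
To see what vector memory does without a hosted index or an embedding API, the toy version below may help. It is deliberately simplified: the "embedding" is just a bag of lowercase word counts, which is an assumption made for illustration and nothing like a real embedding model, but the store-then-rank-by-cosine-similarity flow is the same as in the Pinecone example.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class InMemoryVectorMemory:
    """Hypothetical in-process stand-in for a vector database."""

    def __init__(self):
        self._items: List[Tuple[str, Counter]] = []

    def remember(self, fact: str) -> None:
        self._items.append((fact, embed(fact)))

    def recall(self, query: str, top_k: int = 3) -> List[str]:
        q = embed(query)
        scored = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [fact for fact, _ in scored[:top_k]]


memory = InMemoryVectorMemory()
memory.remember("User prefers concise answers")
memory.remember("User's company is called Acme Corp")
memory.remember("Deployment target is Cloudflare Workers")

print(memory.recall("What does the user prefer?", top_k=1))
```

Word counts only match on exact tokens, so this version cannot tell that "prefer" and "prefers" are related; a learned embedding model is what provides real semantic recall.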

Production Considerations

After running agents in production serving thousands of requests:

1. Implement Guardrails

Agents can go off the rails. Add safety checks:

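class AgentError(Exception):
    """Raised when an agent exceeds its iteration or cost limits."""

class SecurityError(Exception):
    """Raised when a tool call looks unsafe."""

class AgentGuardrails: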
    def __init__(self, max_iterations=10, max_cost=1.0):
        self.max_iterations = max_iterations
        self.max_cost = max_cost  # USD
        self.current_cost = 0
        
    def check_iteration(self, iteration: int):
        """Prevent infinite loops."""
        if iteration >= self.max_iterations:
            raise AgentError(f"Exceeded max iterations: {self.max_iterations}")
    
    def check_cost(self, tokens_used: int, cost_per_1k: float):
        """Prevent runaway costs."""
        cost = (tokens_used / 1000) * cost_per_1k
        self.current_cost += cost
        
        if self.current_cost > self.max_cost:
            raise AgentError(f"Exceeded budget: ${self.current_cost:.2f}")
    
    def check_tool_call(self, tool_name: str, args: dict):
        """Validate tool calls."""
        # Check for suspicious patterns
        if tool_name == "execute_code":
            dangerous = ["import os", "eval(", "exec(", "subprocess"]
            if any(d in args.get('code', '') for d in dangerous):
                raise SecurityError("Dangerous code detected")
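
Substring checks like the one in `check_tool_call` are easy to trip by accident (a variable named `position` contains `os`) and easy to evade. A somewhat sturdier screen, sketched below, parses the code with `ast` and looks at actual import statements and call names. The blocked lists are illustrative assumptions, and a static check like this complements a sandbox; it does not replace one.

```python
import ast
from typing import List

# Illustrative deny-lists; tune these for your own environment.
BLOCKED_MODULES = {"os", "subprocess", "sys", "shutil"}
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__", "open"}


def find_violations(code: str) -> List[str]:
    """Return a list of human-readable problems found in the code."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    problems.append(f"import of {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                problems.append(f"import from {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                problems.append(f"call to {node.func.id}()")
    return problems


print(find_violations("import os\nos.system('ls')"))
print(find_violations("position = 3 + 4"))
print(find_violations("x = eval('1 + 1')"))
```

The first and third snippets are flagged, while the harmless `position` assignment passes even though it contains the letters `os`.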

2. Observability and Logging

You need to see what agents are doing:

import uuid
from datetime import datetime, timezone

import structlog

logger = structlog.get_logger()

class ObservableAgent:
    def __init__(self):
        self.trace_id = str(uuid.uuid4())
        
    def log_step(self, step_type: str, content: dict):
        """Log each agent step."""
        logger.info(
            "agent_step",
            trace_id=self.trace_id,
            timestamp=datetime.now(timezone.utc).isoformat(),
            step_type=step_type,
            **content
        )
    
    def run(self, task: str, max_iterations: int = 10):
        # think(), act(), and execute_tool() are supplied by the concrete agent
        self.log_step("start", {"task": task})
        final_answer = None
        
        for i in range(max_iterations):
            self.log_step("iteration", {"number": i})
            
            # Think
            thought = self.think(task)
            self.log_step("thought", {"content": thought})
            
            # Act
            action = self.act(thought)
            self.log_step("action", {
                "tool": action.tool,
                "args": action.args
            })
            
            # Observe
            result = self.execute_tool(action)
            self.log_step("observation", {
                "result": str(result)[:500]  # Truncate long results
            })
            final_answer = result  # In this sketch the last observation stands in for the answer
        
        self.log_step("complete", {"result": final_answer})
        return final_answer

Use tools like LangSmith, Weights & Biases, or Helicone for agent observability.

3. Error Handling and Recovery

Agents fail. Handle it gracefully:

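import random
import time

import structlog
from anthropic import RateLimitError  # Assumes tools call the Anthropic API; swap in your provider's error

logger = structlog.get_logger()

class ResilientAgent: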
    def execute_tool_with_retry(
        self, 
        tool_name: str, 
        args: dict,
        max_retries: int = 3
    ):
        """Execute tool with exponential backoff."""
        for attempt in range(max_retries):
            try:
                return self.tools[tool_name](**args)
            except RateLimitError as e:
                if attempt == max_retries - 1:
                    raise
                wait = (2 ** attempt) + random.random()
                time.sleep(wait)
            except Exception as e:
                logger.error("tool_error", 
                    tool=tool_name, 
                    error=str(e),
                    attempt=attempt
                )
                if attempt == max_retries - 1:
                    # Return error as observation
                    return {"error": f"Tool failed: {str(e)}"}
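
To make the wait times in the rate-limit branch concrete, here is a small sketch of the same 2 ** attempt plus jitter formula. The helpers backoff_delay and backoff_schedule are hypothetical names used only for illustration; the jitter source is injectable so the schedule can be checked deterministically.

```python
import random
from typing import Callable, List


def backoff_delay(attempt: int, jitter: Callable[[], float] = random.random) -> float:
    """Seconds to wait before the next retry: 2**attempt plus jitter in [0, 1)."""
    return (2 ** attempt) + jitter()


def backoff_schedule(max_retries: int, jitter: Callable[[], float] = random.random) -> List[float]:
    """Delays between attempts; the final attempt re-raises, so it has no delay."""
    return [backoff_delay(attempt, jitter) for attempt in range(max_retries - 1)]


# With jitter disabled, three attempts wait 1s and then 2s between tries.
no_jitter_schedule = backoff_schedule(3, jitter=lambda: 0.0)
```

With the default max_retries of 3, the worst case adds roughly three to five seconds of sleeping before the error propagates, which is worth keeping in mind when setting request timeouts.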

4. Testing Agents

Testing is hard. Agents are non-deterministic. Strategies:

import pytest
from unittest.mock import Mock

def test_agent_can_search_and_answer():
    """Test agent can use search tool to answer questions."""
    
    # Mock tools with deterministic responses
    mock_search = Mock(return_value="Tokyo population: 14 million")
    
    agent = ReActAgent(tools={"search": mock_search})
    result = agent.run("What is the population of Tokyo?")
    
    # Verify tool was called
    mock_search.assert_called_once()
    
    # Verify answer contains expected info
    assert "14 million" in result or "14M" in result

def test_agent_respects_iteration_limit():
    """Test agent stops after max iterations."""
    
    agent = ReActAgent(tools={})
    
    with pytest.raises(TimeoutError):
        agent.run("Impossible task", max_iterations=3)

# Integration tests with real LLM (slow, expensive)
@pytest.mark.integration
@pytest.mark.slow
def test_agent_integration():
    """Full integration test with real LLM."""
    agent = ReActAgent(tools=real_tools)
    result = agent.run("What's 2+2?")
    assert "4" in result
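
One way to take the non-determinism out of unit tests is to script the model's replies as well as the tools. The sketch below is a self-contained illustration under stated assumptions: ScriptedLLM, run_agent, and the "CALL tool: arg" / "FINAL: text" reply format are all hypothetical and are not the ReActAgent used above.

```python
from typing import Callable, Dict, List


class ScriptedLLM:
    """Hypothetical stand-in for an LLM client: replays canned replies in order."""

    def __init__(self, replies: List[str]):
        self._replies = list(replies)
        self.prompts: List[str] = []  # every prompt the agent sent, for assertions

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        if not self._replies:
            raise RuntimeError("script exhausted")
        return self._replies.pop(0)


def run_agent(llm: ScriptedLLM, tools: Dict[str, Callable[[str], str]],
              question: str, max_iterations: int = 5) -> str:
    """Tiny loop: 'CALL tool: arg' runs a tool, 'FINAL: text' ends the run."""
    context = question
    for _ in range(max_iterations):
        reply = llm.complete(context)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        name, _, arg = reply[len("CALL "):].partition(":")
        observation = tools[name.strip()](arg.strip())
        # Feed the observation back so the next reply can use it
        context = f"{context}\nObservation: {observation}"
    raise TimeoutError("max iterations reached")


def test_scripted_run():
    llm = ScriptedLLM(["CALL search: Tokyo population", "FINAL: About 14 million"])
    calls: List[str] = []

    def search(query: str) -> str:
        calls.append(query)
        return "Tokyo population: 14 million"

    answer = run_agent(llm, {"search": search}, "What is the population of Tokyo?")
    assert answer == "About 14 million"
    assert calls == ["Tokyo population"]
    # The second prompt must carry the tool observation
    assert "Observation: Tokyo population: 14 million" in llm.prompts[1]
```

Because both the replies and the tool are scripted, the test checks the loop's plumbing (parsing, tool dispatch, context growth) without ever depending on model output.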

5. Cost Management

LLM calls add up. Monitor and optimize:

Cache tool results: Don’t re-run expensive operations
Use cheaper models for simple tasks: GPT-4o-mini for tool selection, GPT-4o for reasoning
Implement token limits: Cap max tokens per request
Track costs in real-time: Alert on spending anomalies
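
The first item in the list, caching tool results, might look like the following minimal sketch. It assumes tools are called with JSON-serializable keyword arguments; cached_tool and expensive_lookup are hypothetical names, and the cache is an unbounded in-memory dict, which a real deployment would likely replace with something that expires entries.

```python
import functools
import json
from typing import Any, Callable, Dict


def cached_tool(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so repeated calls with the same keyword args reuse the result."""
    cache: Dict[str, Any] = {}

    @functools.wraps(tool)
    def wrapper(**kwargs: Any) -> Any:
        # JSON with sorted keys gives a stable cache key for dict-style arguments
        key = json.dumps(kwargs, sort_keys=True, default=str)
        if key not in cache:
            cache[key] = tool(**kwargs)
        return cache[key]

    return wrapper


calls = {"count": 0}  # counts how often the real work runs


@cached_tool
def expensive_lookup(city: str) -> str:
    """Hypothetical tool standing in for a slow or paid operation."""
    calls["count"] += 1
    return f"weather for {city}"


first = expensive_lookup(city="Tokyo")
second = expensive_lookup(city="Tokyo")  # served from the cache
```

Caching is only safe for tools whose results do not change between calls; anything time-sensitive needs an expiry or should bypass the cache.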

class CostTracker:
    # USD per 1,000 tokens; provider prices change, so treat these as a snapshot
    PRICING = {
        "gpt-4o": {"input": 0.0025, "output": 0.010},
        "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
        "claude-3-5-sonnet": {"input": 0.003, "output": 0.015},
    }
    
    def calculate_cost(self, model: str, input_tokens: int, output_tokens: int):
        pricing = self.PRICING[model]
        cost = (input_tokens / 1000 * pricing["input"] + 
                output_tokens / 1000 * pricing["output"])
        return cost
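
As a worked example of the arithmetic, and of the "alert on spending anomalies" point, here is a hedged sketch. The prices are copied from the table above (USD per 1,000 tokens); call_cost and BudgetMonitor are hypothetical helpers, and the alert is just a string appended to a list rather than a real notification.

```python
from typing import Dict, List

# USD per 1,000 tokens, copied from the PRICING table above; real prices change
PRICING: Dict[str, Dict[str, float]] = {
    "gpt-4o": {"input": 0.0025, "output": 0.010},
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD."""
    pricing = PRICING[model]
    return (input_tokens / 1000 * pricing["input"]
            + output_tokens / 1000 * pricing["output"])


class BudgetMonitor:
    """Hypothetical helper: accumulates spend and records an alert past a limit."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0
        self.alerts: List[str] = []

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        cost = call_cost(model, input_tokens, output_tokens)
        self.spent_usd += cost
        if self.spent_usd > self.limit_usd:
            self.alerts.append(
                f"budget exceeded: ${self.spent_usd:.4f} > ${self.limit_usd:.4f}"
            )
        return cost


monitor = BudgetMonitor(limit_usd=0.046)
# 10 * 0.0025 + 2 * 0.010 = 0.045 USD
first_cost = monitor.record("gpt-4o", 10_000, 2_000)
# 10 * 0.00015 + 2 * 0.0006 = 0.0027 USD, which pushes the total past the limit
second_cost = monitor.record("gpt-4o-mini", 10_000, 2_000)
```

The same request costs roughly 17 times less on the smaller model, which is the motivation for routing simple steps such as tool selection to it.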

Conclusion

Building production AI agents requires more than chaining LLM calls. You need robust architecture: ReAct loops for reasoning, structured tool use, proper memory management, guardrails, and observability.

The ecosystem has matured significantly. LangGraph provides state machines for complex agent flows. AutoGen enables multi-agent conversations. Instructor makes structured outputs reliable. The primitives exist—use them.

Start simple: build a ReAct agent with 2-3 tools, add memory, then scale complexity. Every production agent I’ve built started as a simple loop and evolved based on real requirements.

The future is agentic. LLMs will increasingly work autonomously—browsing the web, writing code, managing infrastructure. The agents you build today are training wheels for tomorrow’s AI systems.

Further Resources:

LangChain Agents Docs - Comprehensive guide
LangGraph - Build stateful agent workflows
AutoGen - Multi-agent framework from Microsoft
Anthropic Tool Use Guide - Official patterns
OpenAI Function Calling - Structured tool use
ReAct Paper - Original research
LangSmith - Agent observability platform
Awesome LLM Agents - Curated resources

AI agents architecture from March 2025 — updated with production guidance.

SolidJS: A Reactive UI Framework

2025-02-15T00:00:00+01:00

SolidJS is what happens when you take React’s API and throw away the virtual DOM. Created by Ryan Carniato, Solid delivers React-like DX with performance that rivals vanilla JavaScript.

I discovered Solid while optimizing a React dashboard that re-rendered too often. The app was functional but sluggish—every state change triggered component re-renders up the tree. Solid’s fine-grained reactivity meant updates only touched the exact DOM nodes that needed changing. The same app in Solid felt instant.

The key insight: React’s virtual DOM is a workaround, not a feature. Solid compiles JSX to real DOM operations at build time, eliminating the reconciliation overhead entirely. Updates go straight to the DOM with surgical precision.

How Solid Differs from React

No Virtual DOM - Solid compiles JSX to efficient DOM updates. No diffing, no reconciliation.

Fine-grained reactivity - State changes update only affected DOM nodes, not entire component trees.

Real JSX - JSX in Solid maps to actual DOM elements, not function calls that return descriptions of elements.

No rules of hooks - Solid’s primitives (createSignal, createEffect) work anywhere, not just component top level.

Smaller bundle - ~7KB vs React’s ~45KB (gzipped). Every byte matters for initial load.

Check out Solid’s reactivity documentation for deep technical details.

Signals: Solid’s Reactive Primitives

Signals are Solid’s state management. Think React’s useState, but they’re getters/setters that automatically track dependencies.

Basic Component

import { createSignal } from 'solid-js';

function Counter() {
    // createSignal returns [getter, setter]
    const [count, setCount] = createSignal(0);
    
    return (
        <div>
            {/* Call count() to read value */}
            <p>Count: {count()}</p>
            
            {/* Updates only the text node, not the whole component */}
            <button onClick={() => setCount(count() + 1)}>
                Increment
            </button>
        </div>
    );
}

Key difference from React: You call count() to read the value. This function call establishes the dependency relationship—Solid knows exactly which DOM nodes depend on which signals.

Effects: Reactive Side Effects

import { createSignal, createEffect } from 'solid-js';

function App() {
    const [name, setName] = createSignal('World');
    
    // createEffect runs once after the initial render, then re-runs when dependencies change
    createEffect(() => {
        console.log(`Hello, ${name()}!`);
        // Automatically tracks name() as dependency
    });
    
    return (
        <div>
            <input 
                type="text"
                value={name()} 
                onInput={(e) => setName(e.target.value)} 
            />
            <p>Hello, {name()}!</p>
        </div>
    );
}

No dependency array needed - Solid automatically tracks which signals you read. In React, you’d write:

useEffect(() => { ... }, [name])  // Manual dependency tracking

In Solid, just reading name() inside the effect establishes the dependency.
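
The mechanism behind automatic tracking can be modeled in a few lines. The sketch below is a conceptual model written in Python, not Solid's implementation: create_signal and create_effect are hypothetical stand-ins, the effect runs as soon as it is created, and there is no cleanup or batching.

```python
from typing import Any, Callable, List, Set, Tuple

# Stack of effects currently executing; the top one is "listening" for reads
_effect_stack: List["Effect"] = []


class Effect:
    def __init__(self, fn: Callable[[], None]):
        self.fn = fn
        self.run()  # first run discovers which signals the function reads

    def run(self) -> None:
        _effect_stack.append(self)
        try:
            self.fn()
        finally:
            _effect_stack.pop()


def create_signal(value: Any) -> Tuple[Callable[[], Any], Callable[[Any], None]]:
    subscribers: Set[Effect] = set()

    def read() -> Any:
        if _effect_stack:
            # Reading inside a running effect subscribes that effect
            subscribers.add(_effect_stack[-1])
        return value

    def write(new_value: Any) -> None:
        nonlocal value
        value = new_value
        for effect in list(subscribers):
            effect.run()  # re-run only the effects that read this signal

    return read, write


def create_effect(fn: Callable[[], None]) -> Effect:
    return Effect(fn)


log: List[str] = []
name, set_name = create_signal("World")
create_effect(lambda: log.append(f"Hello, {name()}!"))
set_name("Solid")
# log is now ["Hello, World!", "Hello, Solid!"]
```

The getter being a function call is what makes this work: the read itself is the moment the signal learns who depends on it, so no dependency array is needed.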

Derived State (Memos)

import { createSignal, createMemo } from 'solid-js';

function ExpensiveComputation() {
    const [count, setCount] = createSignal(0);
    const [multiplier, setMultiplier] = createSignal(2);
    
    // createMemo caches the result until dependencies change
    const result = createMemo(() => {
        console.log('Computing...');
        return count() * multiplier();
    });
    
    return (
        <div>
            <p>Result: {result()}</p>  {/* Reads cached value */}
            <button onClick={() => setCount(count() + 1)}>Increment</button>
        </div>
    );
}

Memos are like React’s useMemo, but again, no dependency array—Solid tracks automatically.

Stores: Nested Reactive State

For complex state, use Solid’s stores:

import { For } from 'solid-js';
import { createStore } from 'solid-js/store';

function TodoList() {
    const [todos, setTodos] = createStore([
        { id: 1, text: 'Learn Solid', done: false },
        { id: 2, text: 'Build app', done: false }
    ]);
    
    // Update nested property - only that property reactively updates
    const toggleTodo = (id) => {
        setTodos(
            (todo) => todo.id === id,  // Find todo
            'done', (done) => !done     // Toggle done
        );
    };
    
    return (
        <ul>
            {/* For loops are reactive in Solid */}
            <For each={todos}>
                {(todo) => (
                    <li 
                        style={{ 'text-decoration': todo.done ? 'line-through' : 'none' }}
                        onClick={() => toggleTodo(todo.id)}
                    >
                        {todo.text}
                    </li>
                )}
            </For>
        </ul>
    );
}

Stores give you granular updates—changing todos[0].done only updates that specific list item’s DOM, not the entire list.

Performance: Why Solid is Fast

Solid consistently ranks at the top of JS Framework Benchmarks, often trading places with vanilla JS.

Benchmark Results (from js-framework-benchmark)

Framework	Create 1,000 rows	Update every 10th	Remove row	Startup time
Vanilla JS	1.0x	1.0x	1.0x	1.0x
SolidJS	1.1x	1.0x	1.1x	1.1x
Svelte	1.2x	1.3x	1.2x	1.3x
Vue 3	1.3x	1.4x	1.2x	1.5x
React	1.7x	2.4x	1.3x	2.1x

(Lower is better. Solid performs within ~10% of vanilla JS)

Why the Speed?

1. No Virtual DOM overhead - React diffs two virtual trees on every update. Solid updates the DOM directly.

2. Compilation over runtime - Solid’s JSX is compiled to DOM instructions at build time:

// You write:
<h1>Hello {name()}</h1>

// Solid compiles to something like (simplified):
const el = document.createElement('h1');
const text = document.createTextNode('');
el.append('Hello ', text);
createEffect(() => text.data = name());

3. Granular updates - When name() changes, only that text node updates. Not the <h1>, not the component, just the text node.

4. Component functions run once - Unlike React where components re-run on every render:

// React: This logs on every state change
function Component() {
    console.log('Rendering!');
    const [count, setCount] = useState(0);
    return <div>{count}</div>;
}

// Solid: This logs once
function Component() {
    console.log('Rendering!');  // Only runs once!
    const [count, setCount] = createSignal(0);
    return <div>{count()}</div>;
}

Bundle Size

Framework	Minified + Gzipped
SolidJS	7 KB
Preact	11 KB
Vue 3	34 KB
React + ReactDOM	45 KB
Angular	62 KB

For production apps, check Bundlephobia.

Migrating from React

Solid’s API is intentionally React-like to ease migration. Here are the key differences:

State

// React
const [count, setCount] = useState(0);
useEffect(() => { console.log(count); }, [count]);

// Solid
const [count, setCount] = createSignal(0);
createEffect(() => { console.log(count()); });  // No dependency array!

Key differences:

Call the getter: count() not count
No dependency arrays - automatic tracking
Effects re-run synchronously when a signal changes (the first run is scheduled after the initial render)

Conditional Rendering

// React
{isLoggedIn && <Dashboard />}
{isLoggedIn ? <Dashboard /> : <Login />}

// Solid - use Show for conditionals
<Show when={isLoggedIn()}>
    <Dashboard />
</Show>

<Show when={isLoggedIn()} fallback={<Login />}>
    <Dashboard />
</Show>

Using <Show> is important: it keeps the condition reactive and mounts or unmounts the children only when the condition's truthiness changes.

Lists

// React
{items.map(item => <Item data={item} key={item.id} />)}

// Solid - use For
<For each={items()}>
    {(item, index) => <Item data={item} />}
</For>

// Or Index for keying by index
<Index each={items()}>
    {(item, index) => <Item data={item()} />}
</Index>

<For> is optimized for minimal DOM operations. It only updates changed items, not the entire list.

Event Handlers

// React - synthetic events
onClick={(e) => handleClick(e)}

// Solid - real DOM events, no synthetic events
onClick={(e) => handleClick(e)}  // Same syntax, but it's the real DOM event

Solid uses native DOM events—no synthetic event system means less overhead.

Context

// React
const ThemeContext = createContext();
const theme = useContext(ThemeContext);

// Solid - similar but typed
const ThemeContext = createContext();
const theme = useContext(ThemeContext);

Context API is nearly identical. No surprises here.

When to Use Solid

Choose Solid when:

✅ Performance is critical (dashboards, data visualizations, games)
✅ You want React-like DX but better performance
✅ Bundle size matters (mobile, slow connections)
✅ You’re building a new project (migration cost is low)
✅ Your team knows React (learning curve is gentle)

Stick with React when:

You have a large existing React codebase (migration cost)
You need React Native (Solid is web-only)
You rely heavily on React’s ecosystem (though Solid’s is growing)

Ecosystem and Tools

UI Libraries:

Solid UI - Component library
Hope UI - Accessible components
Kobalte - Unstyled, accessible primitives

Routing:

Solid Router - Official router
Solid App Router - File-based routing with Solid Start

State Management:

Built-in stores usually sufficient
Solid Query - Data fetching

Meta-Frameworks:

Solid Start - Full-stack framework (like Next.js for Solid)
SolidHack - Starter templates

Dev Tools:

Solid DevTools - Browser extension
Built-in TypeScript support

Production Best Practices

Use TypeScript - Solid’s type inference is excellent
Leverage compilation - Let the compiler optimize
Test with Solid Testing Library - Similar to React Testing Library
Profile with browser DevTools - Solid’s updates are visible in the DOM
Use <Show> and <For> - Prefer them over raw conditionals and .map(), which can re-create DOM nodes that did not change
Batch updates - Use batch() for multiple signal updates
Lazy load components - const Comp = lazy(() => import('./Comp'))

Conclusion

Solid proves that fine-grained reactivity beats virtual DOM for most applications. By compiling JSX to efficient DOM operations and eliminating unnecessary re-renders, Solid delivers React-like DX with near-vanilla-JS performance.

The 7KB bundle and top-tier benchmark results aren’t theoretical—they translate to faster load times and more responsive UIs. For performance-critical applications where every frame matters, Solid is the right choice.

Ryan Carniato and the Solid community have built something genuinely novel: a reactive framework that’s both powerful and minimal. If you’re starting a new project and React’s re-rendering model has ever frustrated you, give Solid a weekend. You might not go back.

Further Resources:

SolidJS Documentation - Comprehensive guides
Interactive Tutorial - Learn by doing
Solid Playground - Try it in browser
Ryan Carniato’s YouTube - Deep dives from the creator
Solid Discord - Active community
JS Framework Benchmark - Performance data
Solid vs React - Official comparison

SolidJS reactive framework from February 2025 — updated with production guidance.
