GoClaw Deep Dive – A Builder's Guide to a Multi-Tenant AI Agent Platform
Source: https://github.com/nextlevelbuilder/goclaw – a Go-based, multi-tenant AI agent gateway with 20+ LLM providers, 7 messaging channels, an 8-stage pipeline, 3-tier memory, and 5-layer security.
This document distills GoClaw's architecture into the principles, patterns, and concrete building blocks you need to build a similar platform from scratch. Read top-to-bottom for theory, jump to Part 4 – Build-It-Yourself Blueprint for a sequenced implementation plan.
Table of Contents
- What GoClaw Actually Is (mental model)
- The 11 Core Principles
- The Agent Loop: Think → Act → Observe
- The 8-Stage Pluggable Pipeline
- Provider Abstraction & Resilience
- The Tool Registry Pattern
- 3-Tier Memory (L0/L1/L2)
- Multi-Tenant Isolation by Default
- 5-Layer Defense-in-Depth Security
- Persistence: Interface-First, Dual Backend
- Channels as Pluggable Adapters
- Teams, Delegation, and Subagents
- Self-Evolution with Guardrails
- Cross-Cutting Patterns
- Build-It-Yourself Blueprint
- Anti-Patterns to Avoid
- Reference Map
Part 1 – What GoClaw Actually Is
GoClaw is not a chatbot or "wrapper around OpenAI." It is an AI agent gateway – a backend service that sits between your application and LLM providers + tools + storage, and exposes a stable RPC/HTTP surface to the outside world.
[Browser / Telegram / Discord / Your SaaS Backend / CLI]
        │ (WebSocket RPC, HTTP REST, OpenAI-compat /v1/chat/completions)
        ▼
┌─────────────────────────────────┐
│         GoClaw Gateway          │
│     Auth · RBAC · Rate-limit    │
│     Tenant Isolation Layer      │
└────────────────┬────────────────┘
                 ▼
┌─────────────────────────────────┐
│          Agent Engine           │
│     Loop · Pipeline · Router    │
│  Tools · Memory · Skills · MCP  │
└────────────────┬────────────────┘
                 ▼
┌─────────────────────────────────┐
│   PostgreSQL · Redis · Files    │
│  (sessions · agents · memory ·  │
│   traces · KG · vault · keys)   │
└────────────────┬────────────────┘
                 │
                 ▼
   20+ LLM Providers (Anthropic, OpenAI, Gemini, …)
Three sentences that capture the design
- Agents are configurations, not code – defined by rows in a DB plus a few markdown bootstrap files (SOUL.md, IDENTITY.md, AGENTS.md, TOOLS.md).
- Everything is multi-tenant from day one – every table carries tenant_id, every query enforces it, and tenant scope flows through context.Context.
- Every concern is an interface with at least one implementation – providers, stores, channels, tools, all behind small interfaces so they can be swapped or mocked.
Part 2 – The 11 Core Principles
2.1 The Agent Loop: Think → Act → Observe
The fundamental shape of any agent is a loop. GoClaw caps it at 20 iterations by default and structures each iteration as three actions:
loop (≤ 20 times):
  THINK    → Build prompt → call LLM → get response (text + tool calls?)
             if no tool calls: BREAK
  ACT      → Execute tool calls (parallel if multiple)
  OBSERVE  → Append tool results back into the message history
finalize   → sanitize output, persist messages, emit completion event
Key implementation details:
| Detail | Value | Why |
|---|---|---|
| Max iterations | 20 | Prevents runaway loops; configurable per-agent and per-request |
| Parallel tools | goroutines + result sort by index | Latency win when LLM calls 3+ tools at once |
| Single tool | sequential | Goroutine overhead isn't worth it |
| Mid-loop compaction | trigger at 75% of context window | Summarize first ~70% of history in-place to avoid overflow |
| Cancel handling | context.Background() fallback for trace finalize | Ensures the trace record always saves even on /stop |
Build it yourself: start here. A loop with one provider, one tool (echo), one in-memory session store, and a for i := 0; i < 20; i++ is a 200-line program that already works.
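The loop above fits in a few dozen lines of Go. This is a minimal sketch under stated assumptions: the types and the stub "provider" are illustrative stand-ins (not GoClaw's actual API), and the single tool is the echo tool mentioned above.

```go
package main

import (
	"fmt"
	"strings"
)

// Minimal shapes; names are illustrative, not GoClaw's real types.
type Message struct{ Role, Content string }
type ToolCall struct{ Name, Args string }
type Response struct {
	Text      string
	ToolCalls []ToolCall
}

// think is a stub provider: it asks for the echo tool once,
// then produces a final answer once a tool result is in history.
func think(history []Message) Response {
	for _, m := range history {
		if m.Role == "tool" {
			return Response{Text: "Done: " + m.Content}
		}
	}
	return Response{ToolCalls: []ToolCall{{Name: "echo", Args: "hello"}}}
}

// echo is the one registered tool.
func echo(args string) string { return strings.ToUpper(args) }

func main() {
	history := []Message{{Role: "user", Content: "say hello"}}
	for i := 0; i < 20; i++ { // hard iteration cap
		resp := think(history) // THINK
		if len(resp.ToolCalls) == 0 {
			fmt.Println(resp.Text) // final answer
			return
		}
		for _, tc := range resp.ToolCalls { // ACT
			result := echo(tc.Args)
			// OBSERVE: append the tool result back into history
			history = append(history, Message{Role: "tool", Content: result})
		}
	}
	fmt.Println("max iterations reached")
}
```

Swap `think` for a real provider call and `echo` for a tool registry lookup and you have the 200-line program the text describes.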
2.2 The 8-Stage Pluggable Pipeline
The V3 architecture turns the monolithic loop into 8 independent stages. Each stage is a Stage interface implementation that mutates a shared RunState.
Setup (once)
└─ ContextStage      Inject ctx (agentID, userID, locale), resolve workspace,
                     ensure per-user files exist, persist IDs on session.
Iteration loop (≤ 20)
├─ ThinkStage        Build system prompt (15+ sections), filter tools via policy,
│                    call LLM, record span, emit `chunk` events.
├─ PruneStage        If context > 25%: soft-trim oversized tool results.
│                    If > 50%: hard-clear. Run sanitizeHistory after.
├─ ToolStage         Execute tool calls (parallel for multi-call).
│                    Emit `tool.call` / `tool.result`.
├─ ObserveStage      Append tool results to message buffer.
│                    Handle `NO_REPLY` convention (silent completion).
└─ CheckpointStage   Increment iteration. Break on max-iters or ctx cancel.
Finalize (once)
└─ FinalizeStage     7-step output sanitization, atomic message flush,
                     update session metadata, emit `run.completed`.
Why this matters:
- Each stage is testable in isolation (stages_test.go per stage).
- New behavior (e.g. a RagStage) is one file – no surgery on a 2k-line runLoop().
- Both V2 (monolithic) and V3 (pipeline) can coexist behind a feature flag.
Stage interface (sketch):
type Stage interface {
Name() string
Run(ctx context.Context, state *RunState) (StageResult, error)
}
type StageResult int
const (
Continue StageResult = iota // proceed to next stage
BreakLoop // exit iteration loop
AbortRun // abort the entire run
)
Lesson: Pluggable pipelines beat monolithic loops once the loop has more than ~3 conditional branches. Pay the abstraction cost early.
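An orchestrator for the Stage interface above can be very small. A minimal sketch, assuming a hypothetical `runPipeline` and a toy stage (the real pipeline also runs setup and finalize stages around the loop):

```go
package main

import (
	"context"
	"fmt"
)

type StageResult int

const (
	Continue StageResult = iota // proceed to next stage
	BreakLoop                   // exit iteration loop
	AbortRun                    // abort the entire run
)

type RunState struct{ Iter int }

type Stage interface {
	Name() string
	Run(ctx context.Context, s *RunState) (StageResult, error)
}

// countStage is a toy checkpoint stage: it breaks after 3 iterations.
type countStage struct{}

func (countStage) Name() string { return "checkpoint" }
func (countStage) Run(_ context.Context, s *RunState) (StageResult, error) {
	s.Iter++
	if s.Iter >= 3 {
		return BreakLoop, nil
	}
	return Continue, nil
}

// runPipeline mirrors the setup → iteration loop → finalize shape:
// run each stage in order, honoring its StageResult.
func runPipeline(ctx context.Context, loop []Stage, maxIters int) (*RunState, error) {
	state := &RunState{}
outer:
	for i := 0; i < maxIters; i++ {
		for _, st := range loop {
			res, err := st.Run(ctx, state)
			if err != nil || res == AbortRun {
				return state, err
			}
			if res == BreakLoop {
				break outer
			}
		}
	}
	return state, nil
}

func main() {
	state, err := runPipeline(context.Background(), []Stage{countStage{}}, 20)
	fmt.Println(state.Iter, err)
}
```

Adding a RagStage is then just another element in the `loop` slice.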
2.3 Provider Abstraction & Resilience
A Provider is a tiny interface. Everything that's hard about LLMs lives inside this seam.
type Provider interface {
Name() string
DefaultModel() string
Chat(ctx context.Context, req ChatRequest) (ChatResponse, error)
ChatStream(ctx context.Context, req ChatRequest, onChunk func(Chunk)) (ChatResponse, error)
}
Every backend – Anthropic native HTTP+SSE, OpenAI-compatible (Groq, DeepSeek, Gemini, Mistral via the same wire format), Claude CLI subprocess, ACP JSON-RPC, DashScope wrapper – implements this interface. The agent loop never knows which one it's talking to.
Resilience layers wrapped around providers:
| Layer | Purpose |
|---|---|
| Retry | Exponential backoff with jitter; honors Retry-After; retries 5xx + network errors only (not 4xx) |
| Cooldown | Per-model cooldown timer after repeated failures – skip the model for N seconds |
| Failover | 2-tier: rotate API profiles, then degrade to a fallback model |
| Cache | Composable middleware – caches identical prompts within a TTL |
| Service tier | Middleware that picks priority/flex/auto tier per request |
| Error classify | Map raw provider errors to 9 canonical reasons (rate-limit, context-overflow, auth, etc.) |
Wire-format quirks live in the adapter, not the loop. Examples:
- Anthropic uses x-api-key; OpenAI-compat uses Bearer; Codex uses OAuth + token refresh.
- Claude CLI is a subprocess speaking stdio; ACP is JSON-RPC 2.0 over stdio.
- DashScope wraps Qwen with a custom thinking-budget mapping.
Lesson: When you support N providers, the spread of behaviors is enormous. Force every quirk through one interface and you keep the agent loop boringly simple.
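The retry layer from the table is a good first middleware to write. A minimal sketch, assuming a simplified call signature and sentinel errors standing in for 5xx vs 4xx classification (GoClaw's real layer also honors Retry-After):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// Sentinel errors standing in for classified provider errors.
var errServer = errors.New("503 upstream overloaded") // retryable
var errAuth = errors.New("401 bad key")               // not retryable

type ChatFn func(ctx context.Context) (string, error)

// withRetry wraps a provider call with capped exponential backoff + jitter.
// Only server-side errors are retried, mirroring the 5xx-only policy.
func withRetry(fn ChatFn, maxAttempts int) ChatFn {
	return func(ctx context.Context) (string, error) {
		backoff := 10 * time.Millisecond
		var err error
		for attempt := 0; attempt < maxAttempts; attempt++ {
			var out string
			out, err = fn(ctx)
			if err == nil {
				return out, nil
			}
			if !errors.Is(err, errServer) {
				return "", err // 4xx-style errors fail fast
			}
			jitter := time.Duration(rand.Int63n(int64(backoff) / 2))
			select {
			case <-time.After(backoff + jitter):
			case <-ctx.Done():
				return "", ctx.Err()
			}
			backoff *= 2
		}
		return "", err
	}
}

func main() {
	calls := 0
	// A flaky fake provider: fails twice, then succeeds.
	flaky := func(ctx context.Context) (string, error) {
		calls++
		if calls < 3 {
			return "", errServer
		}
		return "ok", nil
	}
	out, err := withRetry(flaky, 5)(context.Background())
	fmt.Println(out, err, calls)
}
```

Because the wrapper returns another `ChatFn`, cooldown, failover, and cache layers compose the same way, each decorating the one below.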
2.4 The Tool Registry Pattern
Tools are the agent's hands. Every tool call goes through one place: Registry.ExecuteWithContext. The registry mediates every invocation.
Agent Loop
   │ ExecuteWithContext(name, args, channel, chatID, ...)
   ▼
[Registry]
1. Inject per-call context (channel, chatID, peerKind, sandbox key, workspace)
2. Rate-limit check (token bucket per session key)
3. Policy check (RBAC: is this tool allowed for this agent?)
4. Execute the Tool.Execute(ctx, args)
5. Scrub credentials from output (regex + dynamic registered values)
6. Return Result{ ForLLM, ForUser, IsError, MediaRefs, ... }
Tool capabilities (metadata that drives policy):
| Capability | Examples |
|---|---|
| read-only | read_file, web_search, memory_search – safe to retry |
| mutating | write_file, exec, cron, team_tasks |
| async | spawn – returns immediately, result delivered later |
| mcp-bridged | Anything proxied to an external MCP server |
The Policy Engine filters tools through 7 layers before sending the list to the LLM:
1. Global profile (full/coding/messaging/minimal)
2. Provider profile override
3. Global allow list
4. Provider allow override
5. Agent allow
6. Agent + provider allow
7. Group allow – then deny lists – then AlsoAllow (additive) – then subagent deny – final list
The 4-tier config overlay (most specific wins):
1. Per-agent override (agents.builtin_tool_settings)
2. Per-tenant override (builtin_tool_tenant_configs)
3. Global default (builtin_tools.settings)
4. Hardcoded fallback (in tool code)
Built-in tool inventory (the floor you should aim for):
| Group | Tools |
|---|---|
| fs | read_file, write_file, list_files, edit, send_file |
| runtime | exec (with credentialed CLI mode for secret injection) |
| web | web_search, web_fetch (with allow/block domains) |
| memory | memory_search, memory_get, memory_expand |
| sessions | sessions_list, sessions_history, sessions_send, spawn |
| automation | cron, datetime, heartbeat |
| messaging | message, create_forum_topic, list_group_members |
| team | team_tasks (create/list/claim/complete/comment/attach/...) |
| media-gen | create_image, create_audio, create_video, tts |
| media-read | read_image, read_audio, read_document, read_video |
| knowledge | vault_search, vault_read, knowledge_graph_search, skill_search |
Custom tools are shell commands with Go-template placeholders, stored in custom_tools table. Hot-reloaded via pub/sub on change. Supports encrypted env vars for credentials.
Virtual filesystem interceptors route specific paths to the database, not disk:
- ContextFileInterceptor – routes SOUL.md, IDENTITY.md, etc. to agent_context_files / user_context_files.
- MemoryInterceptor – routes MEMORY.md, memory/* to memory_documents. Writing a .md triggers chunking + embedding automatically.
Path security: every filesystem op runs through resolvePath() which filepath.Clean()s and verifies the result starts with the workspace prefix. Blocks path traversal.
Lesson: the tool registry is where security lives. If every tool call doesn't go through one chokepoint, you have no place to enforce rate-limit / RBAC / scrubbing.
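The chokepoint idea can be sketched directly. This is a deliberately simplified stand-in – a counter instead of a real token bucket, a map instead of the policy engine, one regex instead of the full scrubber – but the ordering (rate-limit → policy → execute → scrub) is the point:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"regexp"
	"strings"
)

type Tool func(ctx context.Context, args string) (string, error)

type Registry struct {
	tools   map[string]Tool
	allowed map[string]bool  // per-agent policy (simplified)
	budget  map[string]int   // crude per-session rate budget
	scrub   []*regexp.Regexp // credential patterns
}

var ErrDenied = errors.New("tool not allowed")
var ErrRateLimited = errors.New("rate limited")

// ExecuteWithContext is the single chokepoint every call goes through:
// rate-limit, policy check, execute, scrub output – in that order.
func (r *Registry) ExecuteWithContext(ctx context.Context, session, name, args string) (string, error) {
	if r.budget[session] <= 0 {
		return "", ErrRateLimited
	}
	r.budget[session]--
	if !r.allowed[name] {
		return "", ErrDenied
	}
	out, err := r.tools[name](ctx, args)
	if err != nil {
		return "", err
	}
	for _, re := range r.scrub {
		out = re.ReplaceAllString(out, "[REDACTED]")
	}
	return out, nil
}

func main() {
	r := &Registry{
		tools: map[string]Tool{
			// A tool that accidentally leaks a key-shaped string.
			"leaky": func(_ context.Context, _ string) (string, error) {
				return "key is sk-abc123def456ghi789jkl012", nil
			},
		},
		allowed: map[string]bool{"leaky": true},
		budget:  map[string]int{"s1": 2},
		scrub:   []*regexp.Regexp{regexp.MustCompile(`sk-[A-Za-z0-9]{20,}`)},
	}
	out, _ := r.ExecuteWithContext(context.Background(), "s1", "leaky", "")
	fmt.Println(strings.Contains(out, "sk-"), out)
}
```

Because execution funnels through one method, adding audit logging or tracing later is one edit, not N.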
2.5 3-Tier Memory (L0/L1/L2)
GoClaw treats memory as a progressive loading problem: cheap context first, expensive context only when asked.
L0 – Working Memory       L1 – Episodic            L2 – Semantic
┌────────────────────┐    ┌──────────────────┐     ┌──────────────────┐
│ Current session    │    │ Session summaries│     │ Knowledge Graph  │
│ messages           │───▶│ + L0 abstracts   │────▶│ entities +       │
│ (auto-injected     │    │ (~50 tokens)     │     │ relations        │
│  if relevant)      │    │ + embeddings     │     │ + temporal       │
│ Threshold-based    │    │ 90-day retention │     │ validity         │
│ compaction         │    │ Hybrid search    │     │ (valid_from/to)  │
└────────────────────┘    └──────────────────┘     └──────────────────┘
          ▲                        ▲                        ▲
     auto-inject             memory_search            memory_expand
     (ContextStage)          (tool, top-K)            (tool, full doc)
The progressive flow:
1. L0 auto-injection – On every turn, ContextStage runs AutoInjector, which scores the user message against episodic summaries + KG entities. If relevance ≥ 0.3, inject up to 5 entries / 200 tokens at the top of the system prompt. Free for the agent – no tool call.
2. L1 unified search – When the agent calls memory_search(query), it runs hybrid search (BM25 + vector) across both episodic L0 abstracts and KG entities. Returns top K within score threshold.
3. L2 deep retrieval – When the agent calls memory_expand(episodic_id), it loads the full summary plus linked KG edges.
Hybrid search formula:
combined_score = vector_score * 0.7 + fts_score * 0.3

FTS:    PostgreSQL tsvector + plainto_tsquery('simple')
Vector: pgvector with <=> cosine distance
Per-user boost: 1.2x
Dedup:  per-user wins over global
Event-driven consolidation (the magic that fills L1/L2 over time):
run.completed event
      │
      ▼
EpisodicWorker – extract summary + L0 abstract via LLM
      │
      │ episodic.created event
      ▼
SemanticWorker – extract entities/relations from summary, write to KG
      │
      │ entity.upserted event
      ▼
DedupWorker – embedding-similarity merge, redirect relations

(separately, debounced 10m)
DreamingWorker – batch unpromoted summaries scored by:
      0.30 * frequency + 0.35 * relevance +
      0.20 * recency + 0.15 * freshness (14-day half-life)
      → LLM synthesis → write to long-term memory / vault
Two compaction strategies for L0:
| When | Trigger | Strategy |
|---|---|---|
| Mid-loop | prompt_tokens >= 75% of context window during iteration | Summarize first ~70% of in-memory messages, keep last ~30% |
| Post-run | > 50 messages OR > 75% context window after run | Per-session try-lock → memory flush → background summarize → save summary + truncate to last 4 messages |
Lesson: Memory is not a single tier. Treat it as a hierarchy with cost gradients (free auto-inject → tool call for L1 → tool call for L2). Use embeddings + FTS together, not either-or.
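The scoring side of the hybrid formula can be exercised without a database. A minimal sketch in pure Go, assuming the vector and FTS scores have already been fetched; the `Hit` type and `rank` function are illustrative, not GoClaw's:

```go
package main

import (
	"fmt"
	"sort"
)

type Hit struct {
	ID      string
	Vector  float64 // cosine similarity from pgvector
	FTS     float64 // ts_rank from the tsvector search
	PerUser bool    // user-scoped memory vs global
	Score   float64
}

// rank applies combined = vector*0.7 + fts*0.3 with the 1.2x per-user
// boost, then dedups by ID with per-user entries winning over global.
func rank(hits []Hit) []Hit {
	best := map[string]Hit{}
	for _, h := range hits {
		h.Score = h.Vector*0.7 + h.FTS*0.3
		if h.PerUser {
			h.Score *= 1.2
		}
		cur, ok := best[h.ID]
		switch {
		case !ok:
			best[h.ID] = h
		case h.PerUser && !cur.PerUser: // per-user always wins dedup
			best[h.ID] = h
		case h.PerUser == cur.PerUser && h.Score > cur.Score:
			best[h.ID] = h
		}
	}
	out := make([]Hit, 0, len(best))
	for _, h := range best {
		out = append(out, h)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	ranked := rank([]Hit{
		{ID: "doc1", Vector: 0.9, FTS: 0.2, PerUser: false},
		{ID: "doc1", Vector: 0.8, FTS: 0.1, PerUser: true}, // wins dedup
		{ID: "doc2", Vector: 0.5, FTS: 0.9, PerUser: false},
	})
	for _, h := range ranked {
		fmt.Printf("%s %.3f per-user=%v\n", h.ID, h.Score, h.PerUser)
	}
}
```

In production the weights live in one place so the 0.7/0.3 split can be tuned per tenant.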
2.6 Multi-Tenant Isolation by Default
This is the single most consequential design decision – and the one most projects skip until it's painful.
Three rules, never broken:
1. Every isolatable table has tenant_id NOT NULL. 40+ tables in GoClaw enforce this.
2. Every query includes WHERE tenant_id = $N. No exceptions. Fail-closed.
3. Tenant flows through context.Context. Resolved at the gateway, propagated everywhere, never taken from client headers (which can be spoofed).
Tenant resolution at the gateway:
| Credential | How tenant is resolved |
|---|---|
| Tenant-bound API key | Auto from api_keys.tenant_id (the recommended path) |
| System-level API key + X-GoClaw-Tenant-Id header | From header (UUID or slug); only system keys can do this |
| Gateway token + owner user ID | All tenants (cross-tenant admin) |
| Channel webhook (Telegram, Discord, β¦) | Baked into channel_instances.tenant_id at registration |
| No credentials | Master tenant only (dev mode) |
Per-tenant overrides β each tenant gets its own:
- LLM provider configs and API keys
- Tool settings (web_search providers, TTS voice, etc.)
- Skills enabled/disabled
- MCP servers + per-user credentials
- Channel instances
API key flow:
[Your SaaS Backend] ── Bearer goclaw_sk_abc... ──▶ [GoClaw]
                                                      │
                                                      ▼
                                              api_keys table:
                                                hash      = SHA-256(key)
                                                tenant_id = UUID
                                                scopes    = [...]
                                                      │
                                                      ▼
                                              ctx = WithTenantID(parent, tenantID)
                                                      │
                                                      ▼
                                              All downstream queries:
                                                WHERE tenant_id = $N
Storage hardening:
- API keys: SHA-256 at rest, constant-time compare for validation (crypto/subtle.ConstantTimeCompare).
- Provider/MCP/custom-tool secrets: AES-256-GCM with aes-gcm: prefix + 12-byte nonce + ciphertext + tag, base64'd.
- Master scope guard: writes to global tables (builtin_tools, config.*) require IsMasterScope(ctx) – otherwise tenant admin only.
Identity propagation pattern: GoClaw doesn't authenticate end-users. The upstream service (your SaaS backend, your auth proxy) provides user_id, opaque, max 255 chars. The recommended convention for multi-tenant deployments is tenant.{tenantId}.user.{userId}.
Lesson: Retrofitting multi-tenancy is one of the most painful migrations in software. Make tenant_id a column on day one, even if you only have one tenant.
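The gateway-side resolution step is small enough to sketch end to end. Assumptions: the context-key helpers and `resolveTenant` are illustrative names, and a map stands in for the api_keys table; GoClaw additionally uses constant-time comparison when validating key material.

```go
package main

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type ctxKey string

const tenantKey ctxKey = "tenant_id"

// Context helpers: tenant flows through ctx, never through globals.
func WithTenantID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, tenantKey, id)
}
func TenantIDFromContext(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(tenantKey).(string)
	return id, ok
}

// keysByHash stands in for the api_keys table: SHA-256(key) → tenant_id.
// Hashing on lookup means the raw key is never stored at rest.
var keysByHash = map[string]string{}

func registerKey(raw, tenantID string) {
	sum := sha256.Sum256([]byte(raw))
	keysByHash[hex.EncodeToString(sum[:])] = tenantID
}

// resolveTenant hashes the presented bearer token, looks it up,
// and fails closed when the key is unknown.
func resolveTenant(ctx context.Context, bearer string) (context.Context, error) {
	sum := sha256.Sum256([]byte(bearer))
	tenant, ok := keysByHash[hex.EncodeToString(sum[:])]
	if !ok {
		return ctx, fmt.Errorf("unknown API key")
	}
	return WithTenantID(ctx, tenant), nil
}

func main() {
	registerKey("goclaw_sk_abc", "tenant-42")
	ctx, err := resolveTenant(context.Background(), "goclaw_sk_abc")
	id, _ := TenantIDFromContext(ctx)
	fmt.Println(id, err)
}
```

Every store then reads `TenantIDFromContext(ctx)` and refuses to run a query without it – that is the fail-closed part of rule 2.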
2.7 5-Layer Defense-in-Depth Security
Each layer is independent – even if one is bypassed, the others still protect.
Layer 1 – Transport
- CORS allow-list validation
- WebSocket message size limit: 512 KB
- HTTP body limit: MaxBytesReader 1 MB
- Timing-safe token comparison (crypto/subtle)
- Rate limiting (token bucket, per user / per IP)
- Ping/pong every 30s; read deadline 60s; write deadline 10s
Layer 2 – Input Validation (InputGuard)
6 regex patterns scan every user message:
| Pattern | Catches |
|---|---|
| ignore_instructions | "Ignore all previous instructions" |
| role_override | "You are now a different assistant" |
| system_tags | <\|im_start\|>system, [SYSTEM] |
| instruction_injection | "New instructions:", "override:" |
| null_bytes | \x00 |
| delimiter_escape | </instructions>, "end of system" |
4 action modes: off / log / warn (default) / block.
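An InputGuard with action modes is a short piece of code. A minimal sketch – the three patterns here are illustrative approximations, not GoClaw's actual regexes:

```go
package main

import (
	"fmt"
	"regexp"
)

type Mode int

const (
	Off Mode = iota
	Log
	Warn
	Block
)

// A few illustrative patterns; the real set has 6 and differs in detail.
var patterns = map[string]*regexp.Regexp{
	"ignore_instructions": regexp.MustCompile(`(?i)ignore (all )?previous instructions`),
	"system_tags":         regexp.MustCompile(`(?i)\[SYSTEM\]|<\|im_start\|>`),
	"null_bytes":          regexp.MustCompile(`\x00`),
}

// guard returns the names of matched patterns, and whether the
// message should be rejected outright (only in Block mode).
func guard(msg string, mode Mode) (matched []string, block bool) {
	if mode == Off {
		return nil, false
	}
	for name, re := range patterns {
		if re.MatchString(msg) {
			matched = append(matched, name)
		}
	}
	return matched, mode == Block && len(matched) > 0
}

func main() {
	hits, blocked := guard("Please ignore all previous instructions.", Block)
	fmt.Println(len(hits), blocked)
}
```

In log/warn modes the match list goes to the audit trail (and, for warn, into the prompt as a caution) instead of rejecting the message.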
Layer 3 – Tool Execution
- Shell deny groups – 15 classes, all denied by default: destructive_ops, data_exfiltration, reverse_shell, code_injection, privilege_escalation, dangerous_paths, env_injection, container_escape, crypto_mining, filter_bypass, network_recon, package_install, persistence, process_control, env_dump. Live-reloadable via pub/sub.
- Path traversal prevention – resolvePath() cleans + prefix-checks every filesystem op.
- SSRF guards – validateProviderURL() blocks 127.0.0.1/localhost for provider base URLs.
- Credentialed CLI gate – when calling registered binaries (gh, gcloud, aws, kubectl, terraform), the exec tool injects encrypted env vars directly into the child process (no shell), unwraps sh -c wrappers up to depth 3 to prevent bypass, and fails closed on DB error.
- Domain allow/block – web_fetch honors per-tenant allow_domains / block_domains.
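The clean-then-prefix-check pattern behind the traversal guard is worth seeing concretely. A minimal sketch of a `resolvePath`-style function (the signature is an assumption, not GoClaw's):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolvePath joins the user-supplied path onto the workspace,
// cleans it, then requires the result to stay under the workspace
// prefix. Anything that escapes is rejected.
func resolvePath(workspace, userPath string) (string, error) {
	p := filepath.Clean(filepath.Join(workspace, userPath))
	if p != workspace && !strings.HasPrefix(p, workspace+string(filepath.Separator)) {
		return "", fmt.Errorf("path escapes workspace: %q", userPath)
	}
	return p, nil
}

func main() {
	ws := "/data/workspaces/u_123"
	ok, err1 := resolvePath(ws, "notes/todo.md")   // stays inside
	_, err2 := resolvePath(ws, "../../etc/passwd") // escapes → error
	fmt.Println(ok, err1 == nil, err2 != nil)
}
```

The crucial order is Clean first, check second: checking the raw string would let `../` sequences slip through.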
Layer 4 – Output Sanitization
- Credential scrubber – static regex patterns for OpenAI, Anthropic, GitHub, AWS keys + dynamic registry of runtime values. Replaces with [REDACTED]. Always-on.
- Output sanitizer (7 steps applied to LLM output before delivery):
  1. Strip garbled tool XML (<tool_call>, <minimax:tool_call>, etc. from broken models)
  2. Strip downgraded text-format tool calls ([Tool Call: ...])
  3. Strip thinking tags (<think>, <thinking>, <antThinking>)
  4. Strip final wrapper tags (preserve inner content)
  5. Strip echoed [System Message] blocks
  6. Collapse consecutive duplicate paragraphs (model stuttering)
  7. Strip leading blank lines
Layer 5 – Isolation
- Per-user workspace – base + "/" + sanitize(userID), injected via WithToolWorkspace(ctx)
- Docker sandbox – read-only root, dropped capabilities, scoped per-session
- Subagent depth limit – max depth 1, max children 5/parent, max concurrent 8 system-wide
Lesson: Don't pick one security strategy. Layer them. Assume each one will fail and ask "what's the next line of defense?"
2.8 Persistence: Interface-First, Dual Backend
Every store is a Go interface. Each interface has both a PostgreSQL implementation (server) and a SQLite implementation (Lite desktop). Selected at compile time via //go:build tags.
type SessionStore interface {
GetOrCreate(ctx context.Context, key string) (*Session, error)
AddMessage(ctx context.Context, key string, msg Message) error
SetSummary(ctx context.Context, key, summary string) error
Save(ctx context.Context, key string) error
Delete(ctx context.Context, key string) error
List(ctx context.Context, opts ListOpts) ([]*Session, error)
}
// PG: writes through to PostgreSQL, in-memory write-behind cache
// SQLite: same interface, plain SQLite, no FTS5/vector
Why this matters:
- Write the agent loop once, ship a server edition (PG) and a desktop edition (SQLite + Wails app).
- Tests use mocks against the interface.
- Replace any backend without touching call sites.
The 22+ stores in the system:
| Store | What it owns |
|---|---|
| SessionStore | Conversation history (with in-memory write-behind cache) |
| AgentStore | Agent definitions, soft-delete, RBAC sharing |
| ProviderStore | LLM provider configs, encrypted keys |
| MemoryStore | Memory docs + chunks (FTS + pgvector hybrid) |
| EpisodicStore | Session summaries with embeddings + recall scoring |
| KnowledgeGraphStore | Entities + relations with temporal validity |
| VaultStore | Knowledge vault docs + bidirectional wikilinks |
| TeamStore | Teams, tasks (atomic claim), members, messages |
| CronStore | Scheduled jobs + run logs |
| TracingStore | Traces + spans (LLM, tool, agent) |
| MCPServerStore | MCP server configs + grants |
| CustomToolStore | Dynamic shell-based tools |
| ChannelInstanceStore | Channel configs (Telegram bot tokens, Discord guild IDs, β¦) |
| ConfigSecretsStore | Encrypted config values |
| BuiltinToolStore | System tool metadata + per-tenant settings |
| PendingMessageStore | Offline group-chat queue with auto-compaction |
| ContactStore | Cross-channel contact dedup + merge |
| ActivityStore | Audit log |
| SnapshotStore | Hourly usage aggregations for dashboards |
| SecureCLIStore | Credentialed binary configs (encrypted env) |
| APIKeyStore | Gateway API keys (SHA-256 hashed) |
| HookStore | Lifecycle hook definitions + execution audit |
Two power patterns from the PG layer:
1. xmax trick for "is this row new?"

   INSERT INTO user_agent_profiles (...) VALUES (...)
   ON CONFLICT (...) DO UPDATE SET last_seen_at = NOW()
   RETURNING xmax = 0 AS is_new

   is_new = true means a real INSERT happened – trigger first-time setup (seed context files). false means it was an UPDATE – returning user.
2. Atomic task claim (race-safe without distributed locks):

   UPDATE team_tasks
   SET status = 'in_progress', owner_agent_id = $1
   WHERE id = $2 AND status = 'pending' AND owner_agent_id IS NULL
   -- 1 row updated = claimed; 0 rows = someone else got it
Other PG conventions:
- No ORM. database/sql with pgx/v5/stdlib. Raw SQL, $1/$2/$3 positional params.
- Nullable columns via Go pointers (*string, *time.Time); helpers like nilStr() convert zero-values to nil.
- execMapUpdate(map[string]any) builds dynamic UPDATE statements without one-function-per-field-combo.
- UUID v7 (time-ordered) for all primary keys via GenNewID().
- Required extensions: pgvector + pgcrypto.
Session caching pattern (write-behind):
Read:   GetOrCreate(key) → cache miss? load from DB into cache → return
Write:  AddMessage / SetSummary → in-memory only (no DB write)
Save:   Save(key) → snapshot under read lock → flush to DB via UPDATE
Delete: Delete(key) → remove from cache + DB
Reads of List() go straight to DB to avoid stale results.
Lesson: Define stores as interfaces from line one. You'll thank yourself when you need a desktop edition, an in-memory test, or to swap PG for CockroachDB.
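The write-behind session cache described above can be sketched with a map standing in for PostgreSQL; `memCache` and its fields are illustrative names, not GoClaw's:

```go
package main

import (
	"fmt"
	"sync"
)

type Session struct {
	Key      string
	Messages []string
}

// memCache is a write-behind layer: AddMessage touches memory only;
// Save flushes the whole session in one write to the backing "DB".
type memCache struct {
	mu    sync.RWMutex
	live  map[string]*Session
	db    map[string][]string // stands in for PostgreSQL
	saves int                 // counts DB writes
}

func newCache() *memCache {
	return &memCache{live: map[string]*Session{}, db: map[string][]string{}}
}

func (c *memCache) AddMessage(key, msg string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	s, ok := c.live[key]
	if !ok {
		s = &Session{Key: key}
		c.live[key] = s
	}
	s.Messages = append(s.Messages, msg) // in-memory only
}

func (c *memCache) Save(key string) {
	c.mu.RLock()
	snapshot := append([]string(nil), c.live[key].Messages...)
	c.mu.RUnlock()
	c.db[key] = snapshot // one UPDATE instead of N INSERTs
	c.saves++
}

func main() {
	c := newCache()
	for i := 0; i < 15; i++ {
		c.AddMessage("agent:1:main", fmt.Sprintf("msg %d", i))
	}
	c.Save("agent:1:main")
	fmt.Println(len(c.db["agent:1:main"]), c.saves)
}
```

Fifteen writes, one flush – exactly the collapse described in Pattern G later on. The snapshot under a read lock is what lets a slow DB flush proceed without blocking new messages.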
2.9 Channels as Pluggable Adapters
Each external messaging platform is an adapter that converts platform-specific events to a unified InboundMessage and platform-specific replies from a unified OutboundMessage.
7 supported channels:
| Channel | Transport | DM | Group | STT | Streaming |
|---|---|---|---|---|---|
| Telegram | Long polling (telego) | β | β | β | β |
| Feishu/Lark | WebSocket / webhook | β | β | β | β |
| Discord | Gateway WebSocket | β | β | β | β |
| Slack | Socket Mode | β | β | β | β |
| Multi-device protocol | β | β | β | β | |
| Zalo OA | Webhook | β | β | β | β |
| Zalo Personal | Reverse-engineered | β | β | β | β |
4 internal channels (cli, system, subagent, browser) are silently skipped by the outbound dispatcher β they never reach an external platform.
Three DM access policies: pairing (8-character code, 60-min validity) / allowlist / open.
Session key format encodes everything you need:
agent:{agentId}:{channel}:direct:{peerId}  → DM
agent:{agentId}:{channel}:group:{groupId}  → Group
agent:{agentId}:subagent:{label}           → Subagent
agent:{agentId}:cron:{jobId}:run:{runId}   → Cron run
agent:{agentId}:main                       → Default/main session
This single key fully scopes session state and enables cross-channel deduplication.
Lesson: Channels look diverse but reduce to two functions: Listen() -> InboundMessage and Send(OutboundMessage) -> error. Keep the agent loop ignorant of platform specifics.
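Those two functions make a complete adapter contract. A minimal sketch – the struct field names and the fake channel are illustrative, not GoClaw's actual types:

```go
package main

import "fmt"

// Unified message shapes the agent loop works with.
type InboundMessage struct {
	Channel, PeerID, Text string
}
type OutboundMessage struct {
	PeerID, Text string
}

// Channel is the whole adapter contract; platform specifics stay inside.
type Channel interface {
	Listen(handle func(InboundMessage)) error
	Send(OutboundMessage) error
}

// fakeChannel simulates one inbound event, standing in for
// e.g. Telegram long polling or a Discord gateway connection.
type fakeChannel struct{ name string }

func (c fakeChannel) Listen(handle func(InboundMessage)) error {
	handle(InboundMessage{Channel: c.name, PeerID: "u1", Text: "hi"})
	return nil
}
func (c fakeChannel) Send(m OutboundMessage) error {
	fmt.Printf("[%s -> %s] %s\n", c.name, m.PeerID, m.Text)
	return nil
}

func main() {
	var ch Channel = fakeChannel{name: "telegram"}
	_ = ch.Listen(func(in InboundMessage) {
		// The session key is derived purely from the unified shape.
		key := fmt.Sprintf("agent:a1:%s:direct:%s", in.Channel, in.PeerID)
		_ = ch.Send(OutboundMessage{PeerID: in.PeerID, Text: "echo: " + in.Text + " (" + key + ")"})
	})
}
```

Adding an eighth platform is then one new file implementing `Channel`; the loop and session-key logic stay untouched.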
2.10 Teams, Delegation, and Subagents
Three orchestration modes determine which inter-agent tools are available:
| Mode | Tools available | When |
|---|---|---|
| Spawn (default) | spawn | No team, no delegate links |
| Delegate | spawn, delegate | agent_links table has rows for this agent |
| Team | spawn, delegate, team_tasks | teams table has a row for this agent |
Resolution priority: Team > Delegate > Spawn.
Subagents (parallel child agents):
| Limit | Default |
|---|---|
| Max concurrent (system-wide) | 8 |
| Max spawn depth | 1 |
| Max children per parent | 5 |
| Auto-archive after | 60 min |
| Max iterations per subagent | 20 |
Subagent actions: spawn (async), run (sync), list, cancel (id/all/last), steer (cancel + respawn with new message). Subagents share the parent's SecureCLIStore – the credentialed binary gate cannot be bypassed by delegation.
Teams (collaborative multi-agent with a shared task board):
User → Team Lead (sees TEAM.md with member list + roles)
          │
          ▼ creates task on board
     team_tasks table
          │ status: pending
          ▼ atomic claim (SQL row lock)
     Member Agent – works in their own session
          │
          ▼ on completion: result via message bus with "teammate:" prefix
     Team Lead – synthesizes results → replies to user
Only the lead receives TEAM.md in its system prompt. Members discover context through tools (team_tasks list, list_group_members). This saves tokens on idle agents.
Task states: pending / in_progress / in_review / completed / failed / cancelled / blocked / stale.
Task dependencies via blocked_by UUID[]: completing a task auto-unblocks dependents whose blockers are all complete.
Lesson: Don't overload a single agent with everything. Start with spawn for simple parallelism. Add delegate when agents have distinct skills. Add team_tasks when you need a board (work tracking, dependencies, peer messages).
2.11 Self-Evolution with Guardrails
Agents adapt their behavior based on metrics – within strict bounds.
Three rules for the suggestion engine:
| Rule | Detects | Suggests |
|---|---|---|
| LowRetrievalUsageRule | memory_search / knowledge_graph_search underused | Enable vault, adjust retrieval weights |
| ToolFailureRule | Frequently failing tools | Limit tool set or reword tool descriptions |
| RepeatedToolRule | Same tool called many times in a row (loop) | Adjust prompt to break the loop |
Adaptation guardrails (in agents.other_config.evolution_guardrails):
| Field | Default | Purpose |
|---|---|---|
| max_delta_per_cycle | 0.1 | Max parameter change per cycle (no wild swings) |
| min_data_points | 100 | Need ≥ N metrics before applying |
| rollback_on_drop_pct | 20.0 | Auto-revert if quality drops > 20% after change |
| locked_params | [] | Names that cannot auto-change (e.g. temperature) |
The workflow:
1. SuggestionEngine.Analyze() runs over a 7-day metrics window.
2. Generates EvolutionSuggestion records with status="pending".
3. Admin reviews in dashboard, approves/rejects.
4. On approval, the auto-adapt worker applies and records baseline metrics.
5. Next cycle detects regression and rolls back if rollback_on_drop_pct is exceeded.
Lesson: "Self-evolving agents" without guardrails is a recipe for production incidents. Bound the change rate, require admin approval, and always keep a rollback path.
Part 3 – Cross-Cutting Patterns
A handful of patterns repeat across every module. They're worth internalizing as habits.
Pattern A – Context Propagation, Not Mutable State
Everything per-request flows through context.Context:
ctx = store.WithTenantID(ctx, tenantID)
ctx = store.WithUserID(ctx, userID)
ctx = store.WithAgentID(ctx, agentID)
ctx = store.WithAgentType(ctx, "predefined")
ctx = store.WithLocale(ctx, "en")
ctx = tools.WithToolChannel(ctx, "telegram")
ctx = tools.WithToolChatID(ctx, chatID)
ctx = tools.WithToolWorkspace(ctx, "/data/workspaces/u_123")
Tools and store calls read from ctx, never from globals. This is what makes per-tenant + per-user concurrent execution thread-safe without mutexes.
Pattern B – Event Bus for Decoupling
Agent run completion fires run.completed on a domain event bus. Workers subscribe asynchronously:
- EpisodicWorker – extract summary
- SemanticWorker – extract entities
- DedupWorker – merge duplicates
- DreamingWorker – debounced batch synthesis
The agent loop never imports any of them. New workers just subscribe.
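A domain event bus like this is a handful of lines in-process. A minimal sketch – synchronous dispatch for clarity, whereas the real workers run asynchronously; the type names are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

type Event struct {
	Name    string
	Payload map[string]string
}

// Bus is a minimal in-process pub/sub keyed by event name.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]func(Event)
}

func NewBus() *Bus { return &Bus{subs: map[string][]func(Event){}} }

func (b *Bus) Subscribe(name string, fn func(Event)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[name] = append(b.subs[name], fn)
}

func (b *Bus) Publish(e Event) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, fn := range b.subs[e.Name] {
		fn(e)
	}
}

func main() {
	bus := NewBus()
	// A worker subscribes; the agent loop never imports it.
	bus.Subscribe("run.completed", func(e Event) {
		fmt.Println("EpisodicWorker summarizing session", e.Payload["session"])
	})
	// The loop only knows how to publish.
	bus.Publish(Event{Name: "run.completed", Payload: map[string]string{"session": "agent:1:main"}})
}
```

Adding a fifth worker means one more `Subscribe` call in the worker's own package – zero changes to the loop.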
Pattern C – System Prompt as 19+ Composable Sections
The system prompt is assembled at request time from these sections (build order matters):
- Identity (channel-aware)
- First-run bootstrap notice (if BOOTSTRAP.md exists)
- Persona (SOUL.md, IDENTITY.md) – early "primacy zone"
- Tooling (filtered + sandbox-aware)
- Credentialed CLI context (optional)
- Safety preamble + identity anchoring
- Self-Evolution rules (predefined agents only)
- Skills inline (≤ 15 skills) OR via skill_search tool
- MCP tools inline OR via mcp_tool_search
- Workspace info
- Team workspace (team agents)
- Sandbox container info
- User identity / owner IDs
- Time (UTC)
- Channel formatting hints
- Extra context (<extra_context> tags)
- Project/bootstrap context files (defensive preamble)
- Sub-agent spawning rules
- Runtime info (agent ID, model, pricing)
- Persona reminder – late "recency zone" – fights "lost in the middle"
- Memory reminders (run memory_search first)
Two modes: PromptFull (main runs) and PromptMinimal (subagents, cron, memory flush – only AGENTS.md + TOOLS.md).
Two reinforcement zones (primacy + recency) are the cheapest reliability win in agent prompting.
Pattern D – Always Sanitize, Always Trace, Always Scrub
Three callbacks that wrap every run:
- Sanitize output (7 steps) before delivery.
- Record a span for every LLM call and every tool call. Trace tree mirrors the run shape.
- Scrub credentials from every tool result via static + dynamic patterns.
Pattern E – Atomic, Race-Safe Mutations via SQL, Not Locks
Don't reach for distributed locks. Instead:
- Atomic claim: UPDATE … WHERE status = 'pending' (row-level lock, 1 winner)
- Upsert: INSERT … ON CONFLICT … DO UPDATE (idempotent)
- Dynamic update: execMapUpdate(map[string]any) – no one-function-per-field-combo
Pattern F – Per-Session Try-Lock for Long-Running Side Effects
When a run finishes and decides to compact:
if !sessionLock.TryLock(sessionKey) { return } // someone else is already compacting
defer sessionLock.Unlock(sessionKey)
runMemoryFlush()
go runSummarize(ctx, ...)
Try-lock instead of blocking lock – skip if another concurrent run is already doing it.
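A per-key try-lock can be built from one mutex per session key plus `sync.Mutex.TryLock` (Go 1.18+). A minimal sketch; the `keyLock` type is an illustrative name, not GoClaw's:

```go
package main

import (
	"fmt"
	"sync"
)

// keyLock offers TryLock/Unlock per session key by lazily
// allocating one mutex per key under a guarding mutex.
type keyLock struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newKeyLock() *keyLock { return &keyLock{locks: map[string]*sync.Mutex{}} }

func (k *keyLock) get(key string) *sync.Mutex {
	k.mu.Lock()
	defer k.mu.Unlock()
	m, ok := k.locks[key]
	if !ok {
		m = &sync.Mutex{}
		k.locks[key] = m
	}
	return m
}

// TryLock returns false instead of blocking when the key is held.
func (k *keyLock) TryLock(key string) bool { return k.get(key).TryLock() }
func (k *keyLock) Unlock(key string)       { k.get(key).Unlock() }

func main() {
	kl := newKeyLock()
	first := kl.TryLock("agent:1:main")  // wins: would run compaction
	second := kl.TryLock("agent:1:main") // loses: skips, work is in flight
	kl.Unlock("agent:1:main")
	fmt.Println(first, second)
}
```

A production version would also evict idle per-key mutexes so the map doesn't grow unboundedly.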
Pattern G – Write-Behind Cache for Hot Data
Session messages are written to memory only during a run. One Save(key) flushes to DB at the end. This collapses 10–20 individual INSERTs into 1 UPDATE.
Pattern H – Two-Phase Tool Registry (Global + Per-Agent)
Global tools are loaded at startup into a shared registry. Per-agent custom tools are merged on first agent access into a clone of the global registry – never mutating the shared one.
Part 4 – Build-It-Yourself Blueprint
A concrete, sequenced plan to build a similar system. Each milestone is a runnable, testable deliverable.
Milestone 0 – Foundation (1–2 days)
- [ ] Pick the language (Go is a great fit; Python is too).
- [ ] Pick the DB (PostgreSQL + pgvector if you want vector search).
- [ ] Set up project skeleton: cmd/, internal/, pkg/, migrations/, docs/, Makefile, docker-compose.yml.
- [ ] Define the Provider interface (4 methods).
- [ ] Implement one provider – start with OpenAI-compatible (covers Groq, DeepSeek, Together, etc. for free).
- [ ] Wire a cmd/serve that loads config, makes one HTTP request to the provider, and prints the response.
Milestone 1 – Minimum Viable Agent Loop (1 week)
- [ ] Define Tool interface: Name() string, Description() string, Schema() JSONSchema, Execute(ctx, args) (Result, error).
- [ ] Implement 3 tools: read_file, write_file, list_files (workspace-scoped, with resolvePath() traversal guard).
- [ ] Build the loop: Loop.Run(req) → for i := 0; i < 20; i++ { think; if no tools break; act; observe }.
- [ ] Persist sessions: SessionStore interface + in-memory implementation. Add PG implementation behind it.
- [ ] Emit events via callback (onEvent func(EventType, payload)). Just three: run.started, tool.call, run.completed.
- [ ] Build cmd/serve HTTP /v1/chat/completions (OpenAI-compatible). One agent. No streaming yet.
You should now have an LLM that can read/write files in a workspace.
Milestone 2 – System Prompt Architecture (3–4 days)
- [ ] Bootstrap files in DB: agent_context_files (agent-level) + user_context_files (per-user). 6 known files: SOUL, IDENTITY, AGENTS, TOOLS, BOOTSTRAP, USER.
- [ ] ContextFileInterceptor – when a tool reads/writes one of these names, route to DB instead of disk.
- [ ] System prompt builder – assemble from sections (start with 5–6, grow as needed). Persona early, persona reminder late.
- [ ] Two modes: PromptFull and PromptMinimal.
- [ ] Per-user file seeding on first chat (use the xmax trick with PG; on SQLite use last_insert_rowid() after INSERT ... ON CONFLICT DO NOTHING).
Milestone 3 β π’ Multi-Tenancy from the Start (3β4 days)
- [ ] `tenants` and `api_keys` tables. UUID v7 PKs.
- [ ] `tenant_id NOT NULL` on every table that holds tenant data (`agents`, `sessions`, `memory_documents`, `traces`, `agent_context_files`, β¦).
- [ ] Add `WithTenantID(ctx)` / `TenantIDFromContext(ctx)` helpers.
- [ ] At the gateway: resolve the API key β SHA-256 lookup β set the tenant on ctx.
- [ ] Update every store query to add `WHERE tenant_id = $N`. Audit the diff.
- [ ] Master tenant for legacy/single-user data. Master scope guard for global writes.
Milestone 4 β π§ Pipeline Refactor (1 week)
Once your monolithic loop has > 3 conditional branches, split it:
- [ ] Define the `Stage` interface, `StageResult` enum, and `RunState` struct.
- [ ] Implement: `ContextStage`, `ThinkStage`, `ToolStage`, `ObserveStage`, `CheckpointStage`, `FinalizeStage`. Add `PruneStage` later.
- [ ] `Pipeline.Run` orchestrates: setup β iteration loop β finalize.
- [ ] Add a feature flag (`pipeline_enabled`) so V2 (monolithic) and V3 (pipeline) coexist during the migration.
Milestone 5 β π§ Memory & Search (1β2 weeks)
- [ ] `memory_documents` + `memory_chunks` tables, with `tsvector` (FTS) and `vector(1536)` (pgvector) columns.
- [ ] `MemoryInterceptor` β auto-chunks + embeds on `.md` writes inside `memory/*`.
- [ ] Hybrid search: `0.7 * vector + 0.3 * fts`, with a per-user 1.2x boost and dedup (per-user wins).
- [ ] `memory_search` and `memory_get` tools.
- [ ] (Later) `episodic_summaries` table + `EpisodicWorker` subscribed to `run.completed`.
- [ ] (Later) `kg_entities` + `kg_relations` with `valid_from` / `valid_until` for L2.
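The scoring and dedup rules above can be exercised in plain Go before wiring them into SQL. This sketch assumes both raw scores are pre-normalized to 0..1; the `Hit` type is invented for the example:

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is one candidate chunk with its two raw scores (assumed 0..1).
type Hit struct {
	DocID   string
	PerUser bool
	Vector  float64
	FTS     float64
}

// hybridScore merges vector and full-text scores with the 0.7/0.3 split
// and applies the 1.2x per-user boost.
func hybridScore(h Hit) float64 {
	s := 0.7*h.Vector + 0.3*h.FTS
	if h.PerUser {
		s *= 1.2
	}
	return s
}

// rank dedups by DocID (per-user wins) and sorts by merged score.
func rank(hits []Hit) []Hit {
	best := map[string]Hit{}
	for _, h := range hits {
		prev, seen := best[h.DocID]
		switch {
		case !seen:
			best[h.DocID] = h
		case h.PerUser && !prev.PerUser:
			best[h.DocID] = h // per-user memory wins the dedup
		case prev.PerUser && !h.PerUser:
			// keep prev: per-user wins even against a higher agent-level score
		case hybridScore(h) > hybridScore(prev):
			best[h.DocID] = h
		}
	}
	out := make([]Hit, 0, len(best))
	for _, h := range best {
		out = append(out, h)
	}
	sort.Slice(out, func(i, j int) bool { return hybridScore(out[i]) > hybridScore(out[j]) })
	return out
}

func main() {
	hits := []Hit{
		{DocID: "a", Vector: 0.9, FTS: 0.1},
		{DocID: "a", PerUser: true, Vector: 0.5, FTS: 0.5},
		{DocID: "b", Vector: 0.8, FTS: 0.8},
	}
	for _, h := range rank(hits) {
		fmt.Printf("%s %.2f\n", h.DocID, hybridScore(h))
	}
}
```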
Milestone 6 β π οΈ Tool Registry Hardening (1 week)
- [ ] Funnel every tool call through `Registry.ExecuteWithContext`.
- [ ] Add rate limiting (token bucket per session key, defaults: 60/min, burst 5).
- [ ] Add a credential scrubber β start with 5β10 high-value patterns (OpenAI `sk-`, Anthropic `sk-ant-`, GitHub `ghp_`, AWS `AKIA`, generic 64-char hex).
- [ ] Add a policy engine: profiles (`full` / `coding` / `messaging` / `minimal`), groups (`fs`, `runtime`, `web`, β¦), allow/deny lists.
- [ ] Add shell deny groups (start with: `destructive_ops`, `reverse_shell`, `dangerous_paths`, `package_install`).
- [ ] Capability metadata on every tool (`read-only` / `mutating` / `async`).
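A starter scrubber covering the patterns named above; the exact regexes and the `[REDACTED]` marker are illustrative, not GoClaw's:

```go
package main

import (
	"fmt"
	"regexp"
)

// Start with a handful of high-value patterns; grow the list as leaks appear.
var secretPatterns = []*regexp.Regexp{
	regexp.MustCompile(`sk-ant-[A-Za-z0-9\-_]{10,}`), // Anthropic (before generic sk-)
	regexp.MustCompile(`sk-[A-Za-z0-9]{20,}`),        // OpenAI-style
	regexp.MustCompile(`ghp_[A-Za-z0-9]{36}`),        // GitHub PAT
	regexp.MustCompile(`AKIA[0-9A-Z]{16}`),           // AWS access key ID
	regexp.MustCompile(`\b[0-9a-f]{64}\b`),           // generic 64-char hex
}

// Scrub replaces any matched credential with a fixed marker before the
// text reaches the model, a trace, or a log line.
func Scrub(s string) string {
	for _, re := range secretPatterns {
		s = re.ReplaceAllString(s, "[REDACTED]")
	}
	return s
}

func main() {
	fmt.Println(Scrub("token=ghp_abcdefghijklmnopqrstuvwxyz0123456789"))
}
```

Run the scrubber on tool *output* as well as input β tool results are the most common leak path back into the model context.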
Milestone 7 β π‘ Channels (per channel, ~2 days each)
- [ ] Define the `Channel` interface: `Name() string`, `Listen(ctx, onMessage)`, `Send(ctx, OutboundMessage) error`.
- [ ] Telegram first (simplest; a long-polling library exists).
- [ ] Add a `channel_instances` table with `tenant_id` baked in.
- [ ] Outbound dispatcher routes by `channel_instance_id`. Internal channels (`cli`, `system`, `subagent`) are silently skipped.
- [ ] Pairing flow: 8-char code, 60-min TTL, paired-device tracking.
- [ ] Then add: Discord (websocket), Slack (Socket Mode), WhatsApp, Feishu, Zalo.
Milestone 8 β π Observability (3β4 days)
- [ ] `traces` and `spans` tables. Three span types: `agent`, `llm_call`, `tool_call`.
- [ ] Wrap every LLM call in a span. Wrap every tool call in a span.
- [ ] `BatchCreateSpans` in batches of 100; on batch failure, retry individually.
- [ ] Verbose mode (`TRACE_VERBOSE=1`) records full input/output, truncated at 50 KB.
- [ ] Optional: OpenTelemetry exporter for spans.
Milestone 9 β πͺ Resilience (3β4 days)
- [ ] Wrap providers with retry middleware (exponential backoff, jitter, honor `Retry-After`, only retry 5xx + network errors).
- [ ] Per-model cooldown β track failures per model, skip cooled-down models for N seconds.
- [ ] Failover β try API profile A, then profile B, then a degraded model.
- [ ] Mid-loop compaction at 75% context. Post-run compaction at 50 messages or 75% context.
- [ ] Per-session `TryLock` for the compaction goroutine.
Milestone 10 β π€ Multi-Agent (1β2 weeks)
- [ ] `subagent` table for spawn tracking. Limits: depth 1, max 5 children, max 8 concurrent.
- [ ] `spawn` tool (async return), `delegate` tool (sync with timeout).
- [ ] `agent_links` table for delegation eligibility.
- [ ] When ready: `teams`, `agent_team_members`, `team_tasks`, `team_messages`.
- [ ] Atomic task claim: `UPDATE β¦ WHERE status = 'pending' AND owner_agent_id IS NULL`.
- [ ] `team_tasks` tool with actions: create / list / claim / complete / comment / attach / approve / reject.
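The atomic claim needs no lock because the database serializes the `UPDATE`: two agents racing on the same task both run it, but only one matches the guard clause; the other sees zero rows affected. Column names beyond `status` and `owner_agent_id` are guesses, and the runner is faked for the demo:

```go
package main

import "fmt"

// claimTaskSQL is the whole trick: the WHERE clause makes the claim atomic.
// claimed_at / now() are illustrative column details.
const claimTaskSQL = `
UPDATE team_tasks
   SET status = 'claimed', owner_agent_id = $1, claimed_at = now()
 WHERE id = $2
   AND status = 'pending'
   AND owner_agent_id IS NULL`

// ClaimTask reports whether this agent won the race, given an Exec-style
// runner that returns rows affected (e.g. a thin wrapper over database/sql).
func ClaimTask(exec func(query string, args ...any) (int64, error), agentID, taskID string) (bool, error) {
	affected, err := exec(claimTaskSQL, agentID, taskID)
	return affected == 1, err
}

func main() {
	// Fake runner: first caller wins, later callers affect 0 rows.
	claimed := false
	exec := func(q string, args ...any) (int64, error) {
		if claimed {
			return 0, nil
		}
		claimed = true
		return 1, nil
	}
	a, _ := ClaimTask(exec, "agent-1", "task-9")
	b, _ := ClaimTask(exec, "agent-2", "task-9")
	fmt.Println(a, b)
}
```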
Milestone 11 β π Production Hardening (ongoing)
- [ ] Add the remaining 4 security layers (input guard, output sanitizer, isolation).
- [ ] AES-256-GCM encryption for all at-rest secrets, with the `aes-gcm:` prefix convention.
- [ ] API keys: 16 random bytes, SHA-256 hash, constant-time compare.
- [ ] Activity log for every admin action.
- [ ] Hourly `SnapshotStore` aggregations.
- [ ] Per-tenant config UI.
- [ ] Self-evolution suggestion engine (only after you have β₯ 100 metrics per agent).
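The API-key bullet in code. The `gck_` prefix and function names are invented; the substance (16 random bytes, only the SHA-256 hash stored, constant-time compare) is straight from the checklist:

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// NewAPIKey returns the plaintext key (shown to the user once) and the
// SHA-256 hash, which is the only thing stored in the api_keys table.
func NewAPIKey() (plaintext, storedHash string, err error) {
	raw := make([]byte, 16) // 16 random bytes, per the milestone
	if _, err = rand.Read(raw); err != nil {
		return "", "", err
	}
	plaintext = "gck_" + hex.EncodeToString(raw) // prefix is illustrative
	sum := sha256.Sum256([]byte(plaintext))
	return plaintext, hex.EncodeToString(sum[:]), nil
}

// Verify hashes the presented key and compares in constant time, so
// response timing leaks nothing about how many bytes matched.
func Verify(presented, storedHash string) bool {
	sum := sha256.Sum256([]byte(presented))
	return subtle.ConstantTimeCompare([]byte(hex.EncodeToString(sum[:])), []byte(storedHash)) == 1
}

func main() {
	key, hash, _ := NewAPIKey()
	fmt.Println(Verify(key, hash), Verify("wrong", hash))
}
```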
Milestone 12 β π Optional Surface Area
- [ ] Knowledge Vault with wikilinks (`[[target]]` syntax).
- [ ] MCP bridge (stdio + SSE + streamable-http transports, per-agent + per-user grants).
- [ ] Custom shell tools (DB-stored, hot-reloaded).
- [ ] Cron jobs (cron expressions + `cron_run_logs`).
- [ ] Browser automation (headless Chrome, `browser.act` / `browser.snapshot` / `browser.screenshot`).
Part 5 β β οΈ Anti-Patterns to Avoid
GoClaw earns its design by not doing these things:
| Anti-pattern | Why it's a trap | What GoClaw does instead |
|---|---|---|
| Hard-coding one LLM provider | You'll need 5 within a year | `Provider` interface; adapters per provider |
| Single-tenant first, "we'll add it later" | Migration is brutal β every query, every test, every cache key | `tenant_id NOT NULL` on day one |
| Mutable global agent state | Race conditions across concurrent runs | Per-call data lives in `context.Context` |
| Bypassing the tool registry "just for this one call" | Loses scrubbing, rate-limit, RBAC | Every tool call through `Registry.ExecuteWithContext`, no exceptions |
| Trusting the model's tool-call format | Models hallucinate `<tool_call>` XML, `[Tool Call: ...]` text, etc. | 7-step output sanitizer strips them all |
| Storing secrets unencrypted because "it's the same DB" | Database dumps leak; insider access widens blast radius | AES-256-GCM with `aes-gcm:` prefix on every secret |
| One giant `runLoop()` function | 2k-line functions become untestable | 8-stage pipeline, each stage isolated |
| Using `time.Sleep` between LLM retries | Wastes time + cost; no jitter β thundering herd | Exponential backoff with jitter, honors `Retry-After` |
| One memory tier ("just embeddings") | Slow, expensive, irrelevant matches | L0 auto-inject + L1 hybrid search + L2 deep retrieval |
| Distributed lock for "claim this task" | Adds a Redis/ZooKeeper dependency; race conditions still possible | Atomic SQL `UPDATE` with `WHERE status = 'pending'` |
| Trusting a client-supplied `tenant_id` header | Spoofable; cross-tenant leakage | Tenant resolved from the API key at the gateway, never from clients |
| Loading the full agent config on every request | Slow; chatty | Router cache with TTL + pub/sub invalidation |
| Synchronous summarization on the request path | User waits 10+ seconds | Synchronous memory flush, asynchronous summarization in a background goroutine |
| Letting the agent self-modify its prompts | One bad cycle and quality craters | Suggestion engine + admin approval + `rollback_on_drop_pct` guardrail |
Part 6 β π Reference Map
π Repo structure (the parts that matter)
goclaw/
βββ cmd/ 130+ files: serve, onboard, migrate
β βββ gateway*.go Gateway lifecycle + setup + wiring
β βββ tui_*.go TUI for onboarding/setup
βββ internal/
β βββ agent/ V2 monolithic loop, router, system prompt,
β β resolver, sanitize, compaction, evolution
β βββ pipeline/ V3 8-stage pipeline (context_stage.go,
β β think_stage.go, tool_stage.go, β¦)
β βββ providers/ Provider interface + adapters per backend
β β + retry, cooldown, failover, middleware
β βββ tools/ Registry, capabilities, policy engine,
β β scrubber, rate limiter, custom tools
β βββ memory/ 3-tier memory + auto-injector + embeddings
β βββ consolidation/ Episodic/semantic/dreaming workers
β βββ vault/ Knowledge vault + wikilinks + FS sync
β βββ knowledgegraph/ KG entities + relations + traversal
β βββ store/ Store interfaces (the contract)
β β βββ pg/ PostgreSQL implementations
β β βββ sqlitestore/ SQLite implementations
β βββ gateway/ WS server, HTTP mux, method router,
β β rate limiter, client lifecycle
β βββ http/ HTTP API handlers (/v1/*)
β βββ channels/ Telegram, Discord, Slack, WhatsApp,
β β Feishu, Zalo OA, Zalo Personal
β βββ mcp/ MCP bridge (stdio/sse/http transports)
β βββ crypto/ AES-256-GCM with `aes-gcm:` prefix
β βββ permissions/ RBAC: viewer/operator/admin
β βββ eventbus/ Domain event bus for consolidation
β βββ tracing/ Trace + span hierarchy
β βββ tokencount/ tiktoken-based counter
β βββ workspace/ Per-user workspace resolver
β βββ bootstrap/ SOUL/IDENTITY system prompt loading
β βββ config/ JSON5 config + env overlay
β βββ i18n/ EN/VI/ZH backend message catalog
β βββ audio/ TTS provider layer (5 providers)
β βββ media/ Image / audio / video generation
β βββ sandbox/ Docker sandbox for shell exec
βββ pkg/
β βββ browser/ Browser automation
β βββ protocol/ Frame types, RPC method names, errors
βββ migrations/ PostgreSQL migrations (45+)
βββ docker/ Docker compose variants
βββ docs/ 31 architecture docs (00-architecture-overview,
β 01-agent-loop, 03-tools-system, β¦)
βββ ui/
βββ web/ React SPA (Vite, Tailwind, Radix, Zustand)
βββ desktop/ Wails v2 desktop app (SQLite, embedded gateway)
ποΈ Key files to read first (in order)
1. `docs/00-architecture-overview.md` β system map
2. `docs/01-agent-loop.md` β the loop in detail (V2 + V3)
3. `docs/03-tools-system.md` β tool registry, policy, security
4. `docs/06-store-data-model.md` β every table and store interface
5. `docs/09-security.md` β the 5 layers
6. `docs/23-multi-tenant-architecture.md` β tenant resolution + isolation
7. `docs/24-knowledge-vault.md` β vault, wikilinks, hybrid search
8. `docs/04-gateway-protocol.md` β RPC + HTTP API surface
9. `docs/02-providers.md` β provider abstraction + resilience
10. `docs/codebase-summary.md` β module map
π‘ The shortest possible "what is GoClaw"
A multi-tenant AI agent gateway in Go that exposes WebSocket RPC + HTTP REST + OpenAI-compatible APIs. Behind a single `Provider` interface it talks to 20+ LLM backends. Behind a single `Tool` registry it offers 50+ built-in tools plus MCP and custom shell tools, all gated by RBAC + rate limits + credential scrubbing + path/SSRF/shell-deny guards. Agent runs flow through an 8-stage pluggable pipeline (think β prune β tool β observe β checkpoint β finalize). Memory is 3-tier (working / episodic / semantic) with hybrid BM25+vector search. Every isolatable table carries `tenant_id`; every query enforces it; tenant scope flows through `context.Context`. Channels (Telegram, Discord, Slack, β¦) are pluggable adapters. Teams of agents collaborate on a SQL-claimed task board.
π Closing Thoughts
GoClaw is a study in disciplined boundaries. The agent loop never knows which provider it's talking to. The provider never knows which channel a message came from. The tool never knows which tenant owns the data. Each layer reduces to a small interface and a context-propagated set of values.
If you take only one thing from this document: make every concern an interface from line one, and make multi-tenancy and security non-optional from line one. Everything else can be added incrementally β those two cannot.
If you found this helpful, let me know by leaving a π or a comment! And if you think this post could help someone, feel free to share it. Thank you very much! π
All rights reserved