OpenClaw as an IT Operations Nerve Center: My 2026 Setup

Most IT infrastructure automation stories start with “we built a custom solution” and end with “it works but nobody wants to touch the code.” This isn’t that story.

Eight months ago I migrated my personal IT operations to OpenClaw — an open-source autonomous agent framework — and it’s become the nerve center of everything I run. Not because it’s the most powerful tool available, but because it’s the most usable one that doesn’t require a dedicated ops team to maintain.

This is a technical breakdown of what I actually run, how it’s architected, and what breaks in practice.

What OpenClaw Actually Is

OpenClaw is a local AI agent runtime. You install it as a LaunchAgent on macOS (or a service on Linux), connect it to one or more AI models via API, and it runs as a persistent background process — accessible via web dashboard, command line, or chat interfaces like Telegram.

The core abstraction is the session — a persistent conversation context with memory that survives across restarts. Agents have skills (documented tool ensembles, essentially macros for complex operations), a memory system (vector-backed persistent storage), and a cron scheduler for time-triggered work.

Think of it less like a chatbot and more like a background process with natural language input and a very well-documented operations manual.

The Core Architecture

+------------------+-------------------+--------------------+
|                  OpenClaw Gateway (Local)                 |
|         ws://127.0.0.1:18789 · LaunchAgent · macOS        |
+------------------+-------------------+--------------------+
|Sessions          |Memory             |Cron Engine         |
|147 total         |Neural + Files     |Isolated jobs       |
|persistent        |90-day context     |sub-agent tasks     |
+------------------+-------------------+--------------------+
|                           Skills                          |
|      tavily-search · github · healthcheck · mcporter      |
|       taskflow · peekaboo · obsidian-vault · weather      |
+------------------+-------------------+--------------------+
|                        Model Layer                        |
|      MiniMax-M2.7 · 1000k context · MiniMax-M2.5-light    |
+------------------+-------------------+--------------------+
                          |
                          v
+------------------+-------------------+--------------------+
|TweakDeals        |Maryana Group      |deepakdinesh        |
|CF Workers        |Vercel             |Vercel              |
|Neon DB           |GitLab CI/CD       |GitLab CI/CD        |
|AliExpress API    |Cloudflare         |Cloudflare          |
+------------------+-------------------+--------------------+

This is a single MacBook Air running 24/7 in my home office. No cloud VMs, no dedicated servers. The operations load that previously required a dozen separate scripts and a monitoring stack now runs through one persistent agent.

The Memory System: How It Remembers

The biggest operational challenge with any agent is continuity — how does it remember what happened last week without re-explaining everything? OpenClaw’s memory system is layered:

1. Session Context (Short-term)

Each session carries its conversation history in the model's context window. The diagram lists a 1000k context for MiniMax-M2.7, but my main session reports a 200k window: I'm typically at 85k/200k (about 42%) after a full day's work. The framework handles overflow through compaction. When context fills up, it compresses the history into a summary and continues fresh. The 90% cache hit rate in my session stats means most of each request's prompt is a stable prefix that can be served from cache, which keeps turns fast between compactions.
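Compaction, as described here, can be sketched as a threshold check plus a summarization step. This is a minimal sketch under stated assumptions, not OpenClaw's implementation: the 90% trigger, the roughly-4-characters-per-token estimate, keeping the last four messages verbatim, and the injected `summarize` function (an LLM call in practice) are all hypothetical.

```typescript
// Hypothetical compaction sketch; names and thresholds are assumptions.
interface ChatMessage {
  role: "user" | "assistant" | "summary";
  content: string;
}

type Summarizer = (messages: ChatMessage[]) => string;

// Rough token estimate: ~4 characters per token (a heuristic, not a tokenizer).
function estimateTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

// Once usage crosses `threshold` of the window, fold everything except the
// most recent `keepRecent` messages into a single summary message.
function compact(
  history: ChatMessage[],
  windowTokens: number,
  summarize: Summarizer,
  threshold = 0.9,
  keepRecent = 4,
): ChatMessage[] {
  if (estimateTokens(history) < windowTokens * threshold) return history;
  const cut = Math.max(0, history.length - keepRecent);
  const older = history.slice(0, cut);
  const recent = history.slice(cut);
  if (older.length === 0) return history;
  return [{ role: "summary", content: summarize(older) }, ...recent];
}
```

Anything the summarizer drops is gone from the session, which is why the post leans on files for facts that must survive.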

2. File Memory (Medium-term)

Workspace files in ~/.openclaw/workspace/ are loaded into every new session's context as a project bootstrap. I maintain:

MEMORY.md — curated long-term facts, key decisions, architectural preferences
SOUL.md — behavioral identity (what to call me, tone, anti-patterns)
USER.md — preferences (search tool priority, AliExpress API usage, security requirements)
AGENTS.md — operational rules (sub-agent spawn protocol, when to ask vs act)
Daily logs at memory/YYYY-MM-DD.md — raw session notes, auto-promoted decisions
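A rough sketch of how such a bootstrap could be assembled from the workspace files. The file ordering, the `## filename` separators, and the `buildBootstrap` name are assumptions for illustration; OpenClaw's actual loader is not shown in this post.

```typescript
// Hypothetical bootstrap assembly; order and separators are assumptions.
const BOOTSTRAP_FILES = ["MEMORY.md", "SOUL.md", "USER.md", "AGENTS.md"];

function buildBootstrap(files: Record<string, string>): string {
  const sections: string[] = [];
  for (const name of BOOTSTRAP_FILES) {
    const body = files[name];
    // Missing or empty files are skipped rather than failing session start.
    if (body !== undefined && body.trim() !== "") {
      sections.push(`## ${name}\n${body.trim()}`);
    }
  }
  return sections.join("\n\n");
}
```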

3. Neural Memory (Long-term)

Vector-backed search over accumulated content. After key decisions (like the scoring pipeline redesign or the aggregateScore → qualityScore migration), I explicitly save the context so the agent can recall it in future sessions without me re-explaining. The backup file at neural-memory-backup.json contains the distilled state.
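Vector-backed recall generally comes down to nearest-neighbor search over embeddings. The toy sketch below ranks stored entries by cosine similarity; the three-dimensional vectors are made up, and a real store would use embeddings from a model plus an index rather than a linear scan. Nothing here reflects OpenClaw's storage format or the layout of neural-memory-backup.json.

```typescript
// Toy vector recall: rank entries by cosine similarity to a query vector.
interface MemoryEntry {
  text: string;
  vector: number[]; // would come from an embedding model in practice
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated.
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function recall(query: number[], store: MemoryEntry[], k = 3): MemoryEntry[] {
  return store
    .map((e) => ({ e, score: cosine(query, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.e);
}
```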

The critical discipline: if it matters, write it down. Telling the agent “just remember this” doesn’t survive a session restart. This sounds obvious, but it’s the difference between an agent that accumulates operational intelligence and one that starts fresh every time.

The Cron Jobs: What Runs on a Schedule

OpenClaw’s cron engine handles time-triggered work as isolated sub-agent sessions. Here’s what runs:

Job	Schedule	Purpose
`com.tweakdeals.score-all`	Mon+Thu 4 AM	Re-score all products via ML pipeline
`com.tweakdeals.priceupdate`	Hourly	Check for price changes, update DB
`com.tweakdeals.alerts`	Hourly	Scan for price drops, send notifications
`com.tweakdeals.rescrape`	Daily 7 AM	Full product catalog refresh
`com.tweakdeals.gsc-monitor`	Daily 9 AM	Monitor Google Search Console indexing
`openclaw learning`	Hourly	Process session history, update memory
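For reference, here is one way to express the table above as standard five-field cron expressions, with a tiny matcher. The expressions are my translation of the schedules, the minute-0 choice for the hourly jobs is an assumption, and this is not OpenClaw's scheduler or config format.

```typescript
// Hypothetical cron translation of the schedule table (minute 0 assumed).
const JOBS: Record<string, string> = {
  "com.tweakdeals.score-all": "0 4 * * 1,4", // Mon+Thu 4 AM
  "com.tweakdeals.priceupdate": "0 * * * *", // hourly
  "com.tweakdeals.alerts": "0 * * * *", // hourly
  "com.tweakdeals.rescrape": "0 7 * * *", // daily 7 AM
  "com.tweakdeals.gsc-monitor": "0 9 * * *", // daily 9 AM
  "openclaw learning": "0 * * * *", // hourly
};

// Matches one field; supports "*", a single number, and "a,b,c" lists only.
function fieldMatches(field: string, value: number): boolean {
  if (field === "*") return true;
  return field.split(",").some((part) => Number(part) === value);
}

// Fields: minute hour day-of-month month day-of-week (0 = Sunday), local time.
// Simplification: real cron ORs day-of-month and day-of-week when both are
// restricted; this sketch ANDs them, which is fine for the schedules above.
function isDue(expr: string, at: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error(`expected 5 fields: ${expr}`);
  const [min, hour, dom, mon, dow] = fields;
  return (
    fieldMatches(min, at.getMinutes()) &&
    fieldMatches(hour, at.getHours()) &&
    fieldMatches(dom, at.getDate()) &&
    fieldMatches(mon, at.getMonth() + 1) &&
    fieldMatches(dow, at.getDay())
  );
}

function dueJobs(at: Date): string[] {
  return Object.keys(JOBS).filter((name) => isDue(JOBS[name], at));
}
```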

Each job runs as an isolated sub-agent session, meaning it gets a clean context with just the payload instructions. Jobs don’t inherit the main session’s accumulated history. This prevents context pollution and keeps each job fast.

The com.tweakdeals.gsc-monitor job runs an OAuth2 token refresh check against Google’s Search Console API. The token is stored in macOS Keychain (not in workspace files), and the utility script gsc-util.cjs handles automatic refresh on expiry. This was a recurring failure point: tokens would expire and monitoring would silently stop. The fix was to extract refresh_token explicitly from the OAuth payload and pass it to setCredentials() so google-auth-library could auto-refresh.
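The refresh-on-expiry logic can be sketched without tying it to a specific library. This is a hypothetical guard, not the contents of gsc-util.cjs: `ensureFreshToken`, the one-minute skew, and the injected `refresh` callback (standing in for the call google-auth-library or a direct token-endpoint request would make) are assumptions. The key behavior is that a missing refresh_token fails loudly instead of letting the job stop silently.

```typescript
// Hypothetical refresh-on-expiry guard; not the actual gsc-util.cjs code.
// Field names mirror common OAuth2 token responses.
interface StoredCredentials {
  access_token?: string;
  refresh_token?: string;
  expiry_date?: number; // epoch milliseconds
}

// Stand-in for the real token-endpoint call.
type Refresher = (refreshToken: string) => Promise<StoredCredentials>;

const SKEW_MS = 60_000; // refresh a minute early rather than racing expiry

async function ensureFreshToken(
  creds: StoredCredentials,
  refresh: Refresher,
  now: number = Date.now(),
): Promise<StoredCredentials> {
  const expired = !creds.access_token || (creds.expiry_date ?? 0) - SKEW_MS <= now;
  if (!expired) return creds;
  if (!creds.refresh_token) {
    // Fail loudly instead of letting the daily job stop silently.
    throw new Error("OAuth refresh_token missing; re-authorize the GSC job");
  }
  const fresh = await refresh(creds.refresh_token);
  // Refresh responses often omit refresh_token, so carry the existing one forward.
  return { ...fresh, refresh_token: fresh.refresh_token ?? creds.refresh_token };
}
```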

The Skills: Tools with Documentation

OpenClaw’s skill system is what makes it practical for real work. Each skill is a directory with a SKILL.md that documents:

What the skill does
When to use it
How to invoke it (exact tool calls)
Constraints and anti-patterns

My active skill set:

tavily-search — Web search via Tavily API. I use this for research, fact-checking, and competitive analysis. Primary search tool, backed by MiniMax web search for speed-critical queries.

github — GitHub CLI wrapper. Handles issues, PRs, CI logs, and gh api queries. Used for reviewing PRs, checking CI status, and managing repos without leaving the terminal.

healthcheck — Security audit for OpenClaw hosts. SSH hardening, firewall rules, update cadence. I run this when I need to verify my own setup is secure (spoiler: the audit flags several critical issues regularly — more on that below).

taskflow — Multi-step detached task coordination. Used for complex workflows that need to span multiple sessions, like the scoring pipeline refactor that ran 32 products through a sub-agent batch.

peekaboo — macOS UI automation. Captures and automates UI elements. Useful for GUI automation on the local machine without AppleScript.

weather — wttr.in via curl. Fast, no API key required. Used for proactive notifications (“rain expected, bring umbrella”).

obsidian-vault-maintainer — Memory wiki maintenance. Keeps the daily log → curated memory pipeline running. This is how I maintain the MEMORY.md discipline without manual effort.

The Operational Workflows

Workflow 1: Code Deployment via Sub-Agent

When I push to GitLab, the CI/CD pipeline auto-deploys to Vercel. But sometimes I need to verify the deployment worked, check for errors, and potentially roll back. Instead of manually checking each step:

1. Sub-agent spawned with commit hash + repo context
2. Fetches Vercel deployment status via API
3. Checks Cloudflare Workers routes are intact
4. Verifies Neon DB connection is healthy
5. Runs healthcheck on affected endpoints
6. Reports to main session with status

The sub-agent handles the polling and retry logic. The main session gets a clean summary. This replaced what used to be a 20-minute manual verification process.
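The verification steps above can be sketched as a sequence of checks, each polled with a bounded retry. This is a minimal sketch: the checks are injected async functions standing in for the real Vercel, Cloudflare, and Neon calls (whose endpoints are not assumed here), and the attempt count and delay are arbitrary defaults.

```typescript
// Hypothetical core loop for a post-deploy verification sub-agent.
type Check = () => Promise<boolean>;

interface StepResult {
  name: string;
  ok: boolean;
  attempts: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll one check until it passes; returns the attempt number, or -1 on give-up.
async function pollUntil(check: Check, attempts: number, delayMs: number): Promise<number> {
  for (let i = 1; i <= attempts; i++) {
    try {
      if (await check()) return i;
    } catch {
      // Treat transient errors (timeouts, 5xx) as a failed attempt and retry.
    }
    if (i < attempts) await sleep(delayMs);
  }
  return -1;
}

// Run checks in order and stop at the first one that never passes.
async function verifyDeployment(
  steps: Array<[string, Check]>,
  attempts = 5,
  delayMs = 10_000,
): Promise<{ ok: boolean; steps: StepResult[] }> {
  const results: StepResult[] = [];
  for (const [name, check] of steps) {
    const used = await pollUntil(check, attempts, delayMs);
    const ok = used > 0;
    results.push({ name, ok, attempts: ok ? used : attempts });
    if (!ok) return { ok: false, steps: results };
  }
  return { ok: true, steps: results };
}
```

Stopping at the first failure keeps the report to the main session short: the last entry in `steps` is the thing to look at.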

Workflow 2: Product Scoring Pipeline

Every Monday and Thursday, the scoring pipeline runs against all products in the TweakDeals catalog:

1. Cron job triggers sub-agent at 4 AM
2. Sub-agent fetches all products needing re-score (those with stale reviews)
3. Spawns batch sub-agents: 1 per product, 5 concurrent
4. Each sub-agent: fetch expert reviews → strip scores → LLM synthesis → pros/cons extraction
5. Results written to Neon DB
6. Monitoring sub-agent verifies GSC indexing post-update

The batch concurrency is critical. A serial pipeline would take 14+ hours (170 products × ~5 min each). With 5 concurrent sub-agents, the full run completes in 3-4 hours. macOS SIGKILL terminates any tsx process that runs longer than ~2 minutes, so the checkpoint system saves progress every 10 products — if a process dies, it resumes from the last checkpoint.
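As a sanity check on those numbers: 170 products × ~5 min is about 850 minutes (≈14.2 hours) run serially, and 850 ÷ 5 ≈ 170 minutes (≈2.8 hours) with perfect 5-way parallelism, so 3-4 hours once overhead is included is plausible. The batching and checkpointing can be sketched as a worker pool; `runBatch`, the `Checkpoint` shape (a list of finished product IDs), and the injected `saveCheckpoint` are hypothetical names, not the pipeline's actual code.

```typescript
// Hypothetical batch runner: fixed-size worker pool over a shared queue,
// saving a checkpoint of finished IDs every `checkpointEvery` completions.
interface Checkpoint {
  done: string[]; // product IDs already scored
}

async function runBatch(
  productIds: string[],
  scoreOne: (id: string) => Promise<void>,
  saveCheckpoint: (cp: Checkpoint) => Promise<void>,
  resumeFrom: Checkpoint = { done: [] },
  concurrency = 5,
  checkpointEvery = 10,
): Promise<Checkpoint> {
  const done = new Set<string>(resumeFrom.done);
  // Skip anything a previous (killed) run already finished.
  const queue = productIds.filter((id) => !done.has(id));
  let sinceCheckpoint = 0;

  // Each worker pulls the next ID until the queue is empty. JavaScript is
  // single-threaded, so the length check and shift() can't interleave.
  async function worker(): Promise<void> {
    while (queue.length > 0) {
      const id = queue.shift()!;
      await scoreOne(id);
      done.add(id);
      sinceCheckpoint += 1;
      if (sinceCheckpoint >= checkpointEvery) {
        sinceCheckpoint = 0;
        await saveCheckpoint({ done: Array.from(done) });
      }
    }
  }

  const workers = Math.min(concurrency, queue.length);
  await Promise.all(Array.from({ length: workers }, () => worker()));
  const final: Checkpoint = { done: Array.from(done) };
  await saveCheckpoint(final); // final write covers the last partial group
  return final;
}
```

Storing the set of finished IDs, rather than a position in the list, matters with concurrent workers: products don't finish in order, so "resume from index N" could skip or repeat work.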

Workflow 3: Memory Hygiene

Every few days, during a heartbeat cycle:

1. Read recent memory files (last 3 days)
2. Identify significant decisions, lessons, patterns
3. Update MEMORY.md with distilled learnings
4. Archive stale entries (old decisions, completed projects)
5. Verify neural memory retrieval still works

This is the anti-entropy layer. Without it, the memory system accumulates noise and retrieval quality degrades. The memory-hygiene skill handles this, but the discipline is in running it regularly enough that nothing critical falls through the cracks.
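Step 1 of that cycle (reading the last three days of logs) can be sketched as a filter over file paths that follow the memory/YYYY-MM-DD.md convention. The `recentDailyLogs` helper is hypothetical, and dates are interpreted in local time.

```typescript
// Hypothetical helper: select daily logs from the last `days` days (inclusive).
const DAILY_LOG = /^memory\/(\d{4})-(\d{2})-(\d{2})\.md$/;

function recentDailyLogs(paths: string[], today: Date, days = 3): string[] {
  // Local midnight at the start of the window; Date handles month rollover.
  const start = new Date(today.getFullYear(), today.getMonth(), today.getDate() - (days - 1));
  return paths
    .filter((p) => {
      const m = DAILY_LOG.exec(p);
      if (!m) return false; // ignore MEMORY.md and other non-log files
      const d = new Date(Number(m[1]), Number(m[2]) - 1, Number(m[3]));
      return d.getTime() >= start.getTime() && d.getTime() <= today.getTime();
    })
    .sort(); // ISO dates sort lexicographically
}
```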

The Real Failure Modes

No operations setup is complete without documenting what breaks. Here are the recurring failure modes I’ve observed over 8 months:

1. macOS SIGKILL on Long-Running Processes

The M-series MacBook Air has thermal constraints. Any tsx/node process that runs longer than ~2 minutes gets SIGKILL’d by the OS. This isn’t a memory issue: the process is still doing useful work when it gets killed, which leaves partial writes if checkpoints aren’t saved.

Fix: Checkpoint every 10 products. Sub-agent batching limits each task to under 90 seconds. The Neon DB writes survive the SIGKILL because they’re already committed.
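Keeping each task under the ~90-second budget can be sketched as a loop that stops before an item might overrun and hands back the remainder for a follow-up sub-agent. The function name and the per-item estimate are assumptions; the estimate would need tuning from observed runtimes. The clock is injectable so the logic can be tested without waiting.

```typescript
// Hypothetical time-budgeted loop: process items until the next one might not
// fit in the budget, then return the rest for a follow-up task.
interface BudgetedResult<T> {
  processed: T[];
  remaining: T[];
}

async function runWithinBudget<T>(
  items: T[],
  work: (item: T) => Promise<void>,
  budgetMs = 90_000,
  estimatePerItemMs = 10_000, // assumption: tune from real runtimes
  now: () => number = () => Date.now(),
): Promise<BudgetedResult<T>> {
  const started = now();
  const processed: T[] = [];
  for (let i = 0; i < items.length; i++) {
    // Stop early if the next item could push the task past the budget.
    if (now() - started + estimatePerItemMs > budgetMs) {
      return { processed, remaining: items.slice(i) };
    }
    await work(items[i]);
    processed.push(items[i]);
  }
  return { processed, remaining: [] };
}
```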

2. OAuth Token Expiry on Background Jobs

The GSC monitoring job runs daily without user interaction. The OAuth token (refresh_token flow) would silently expire because the full tokens object was being passed to setCredentials() instead of explicitly extracting and passing the refresh_token field. google-auth-library’s auto-refresh wasn’t triggering.

Fix: explicit extraction with `const { refresh_token } = credentials; auth.setCredentials({ refresh_token });`. Tokens now auto-refresh correctly.

3. Context Compaction Loses Important Context

When sessions compact, the summarizer sometimes drops details that seemed obvious in context but aren’t obvious from the summary. This manifests as the agent re-asking questions I already answered last week.

Fix: Write critical context to memory/YYYY-MM-DD.md explicitly instead of relying on session history. Files survive compaction; compacted context doesn’t.
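Writing context down can be as simple as appending a timestamped line to the day's log. The helpers below only compute the path and format the entry; the `- [HH:MM] KIND: text` format and the entry kinds are assumptions, and the actual append (for example with Node's `fs.appendFileSync`) is left to the caller.

```typescript
// Hypothetical helpers for the memory/YYYY-MM-DD.md daily log (local time).
const pad = (n: number) => String(n).padStart(2, "0");

function dailyLogPath(at: Date): string {
  return `memory/${at.getFullYear()}-${pad(at.getMonth() + 1)}-${pad(at.getDate())}.md`;
}

type EntryKind = "decision" | "lesson" | "context";

// One self-contained line per fact, so it still reads correctly out of context.
function formatEntry(kind: EntryKind, text: string, at: Date): string {
  const time = `${pad(at.getHours())}:${pad(at.getMinutes())}`;
  return `- [${time}] ${kind.toUpperCase()}: ${text.trim()}\n`;
}
```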

4. Elevated Exec Security Warnings

The security audit consistently flags 6 critical issues — mostly around the exec allowlist containing wildcards ("*" for webchat, heartbeat, cron-event channels). This is intentional for usability on a personal machine, but it’s a real risk if the machine is ever exposed. I monitor it but haven’t tightened it because the operational cost of narrow allowlisting is high.

Mitigation: Tailscale is off. The gateway is local loopback only. The attack surface is limited by network isolation, not by configuration. (This follows the same Zero Trust principle I detailed in the TRA Oman DPI incident breakdown — no inbound ports, identity-based access, minimized blast radius.)
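The wildcard finding can be illustrated with a small audit over an allowlist. This is a hypothetical sketch: the `channel → patterns` shape is assumed for illustration and is not OpenClaw's config schema, and the real healthcheck audit may classify entries differently.

```typescript
// Hypothetical allowlist audit; the config shape is an assumption.
type ExecAllowlist = Record<string, string[]>; // channel -> allowed command patterns

interface Finding {
  channel: string;
  pattern: string;
  severity: "critical" | "warning";
}

function auditAllowlist(allow: ExecAllowlist): Finding[] {
  const findings: Finding[] = [];
  for (const [channel, patterns] of Object.entries(allow)) {
    for (const pattern of patterns) {
      if (pattern === "*") {
        // A bare wildcard lets that channel run any command.
        findings.push({ channel, pattern, severity: "critical" });
      } else if (pattern.includes("*")) {
        // Partial globs such as "gh *" are narrower but still worth reviewing.
        findings.push({ channel, pattern, severity: "warning" });
      }
    }
  }
  return findings;
}
```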

What I’d Change

Better sub-agent lifecycle management. Currently, sub-agents are fire-and-forget with completion events. There’s no built-in retry on failure, no dependency graph for multi-stage workflows, and no built-in way to chain sub-agents (run B after A succeeds, run C after both complete). I work around this with explicit checkpoint files and polling, but it’s brittle.
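The missing chaining behavior (run B after A succeeds, C after both) can be sketched as a small dependency-graph runner. This is a hypothetical design, not an OpenClaw feature: tasks are injected async functions, tasks whose dependencies have succeeded run concurrently, and any failed or skipped dependency marks its dependents as skipped instead of running them.

```typescript
// Hypothetical dependency-graph runner for chained sub-agent tasks.
type TaskStatus = "succeeded" | "failed" | "skipped";

interface TaskSpec {
  run: () => Promise<void>;
  after?: string[]; // names that must succeed first
}

async function runGraph(tasks: Record<string, TaskSpec>): Promise<Record<string, TaskStatus>> {
  const started = new Map<string, Promise<TaskStatus>>();

  // Start each task once; memoizing the promise lets several dependents await it.
  const start = (name: string, path: string[]): Promise<TaskStatus> => {
    if (path.includes(name)) {
      return Promise.reject(new Error(`cycle: ${path.concat(name).join(" -> ")}`));
    }
    const existing = started.get(name);
    if (existing) return existing;
    const spec = tasks[name];
    if (!spec) return Promise.reject(new Error(`unknown task: ${name}`));
    const p = (async (): Promise<TaskStatus> => {
      const deps = await Promise.all((spec.after ?? []).map((d) => start(d, path.concat(name))));
      // A failed or skipped dependency skips this task instead of running it.
      if (deps.some((s) => s !== "succeeded")) return "skipped";
      try {
        await spec.run();
        return "succeeded";
      } catch {
        return "failed";
      }
    })();
    started.set(name, p);
    return p;
  };

  const names = Object.keys(tasks);
  const results = await Promise.all(names.map((n) => start(n, [])));
  const status: Record<string, TaskStatus> = {};
  names.forEach((n, i) => {
    status[n] = results[i];
  });
  return status;
}
```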

A proper observability layer. The current monitoring is grep-based — check logs for errors, check CronRun table for failed jobs. There’s no distributed trace, no latency histogram, no alerting on anomalous behavior (e.g., a cron job that suddenly takes 3x longer than usual). This works for a single-person operation but wouldn’t scale.
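The “3x longer than usual” alert can be sketched as a comparison against the median of recent run durations. The median is used rather than the mean so a single past outlier doesn't inflate the baseline; the five-sample minimum and the function names are assumptions, and reading durations out of the CronRun table is not shown because its schema isn't described in this post.

```typescript
// Hypothetical duration-anomaly check for cron runs.
function median(values: number[]): number {
  const s = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Flag a run that took more than `factor` times the median of past runs.
function isAnomalous(history: number[], latest: number, factor = 3, minSamples = 5): boolean {
  // Too little history means no meaningful baseline yet.
  if (history.length < minSamples) return false;
  return latest > factor * median(history);
}
```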

Memory retrieval quality degrades over time. The vector store accumulates entries but retrieval relevance drops as noise increases. The memory-hygiene skill mitigates this but doesn’t solve it. A more aggressive pruning strategy — archiving entries older than 90 days unless explicitly promoted — would help.
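The proposed 90-day policy can be sketched as a partition step run before re-indexing. The `StoredMemory` shape and the `promoted` flag are assumptions; the vector store's real schema isn't described here.

```typescript
// Hypothetical pruning pass: archive entries older than 90 days unless promoted.
interface StoredMemory {
  id: string;
  createdAt: number; // epoch ms
  promoted: boolean; // e.g. explicitly promoted into MEMORY.md
}

const DAY_MS = 24 * 60 * 60 * 1000;

function partitionForPruning(
  entries: StoredMemory[],
  now: number,
  maxAgeDays = 90,
): { keep: StoredMemory[]; archive: StoredMemory[] } {
  const cutoff = now - maxAgeDays * DAY_MS;
  const keep: StoredMemory[] = [];
  const archive: StoredMemory[] = [];
  for (const e of entries) {
    (e.promoted || e.createdAt >= cutoff ? keep : archive).push(e);
  }
  return { keep, archive };
}
```

Archiving rather than deleting keeps old entries recoverable while removing them from the search index that drives retrieval quality.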

The Bottom Line

Eight months in, OpenClaw handles the operational load that previously required a dedicated part-time ops person. The key metrics:

147 sessions since I started tracking
271 outcomes logged (decisions, deployments, completed workflows)
137 issues currently tracked
Zero manual deployments in the last 4 months (all automated via cron + sub-agent)
Score pipeline running Mon+Thu without intervention

The MacBook Air hasn’t had a thermal issue in 3 weeks — the sub-agent batching keeps individual task runtime under the SIGKILL threshold.

Is this the most architecturally elegant solution? No. Is it the one that actually works every day without me touching it? Also no — I still check in most mornings to review overnight results and handle edge cases. But it’s gotten close. The gap between “works in demo” and “works on my laptop at 3 AM when something breaks” is smaller than I expected.

The framework handles the context management and tool orchestration, and the sub-agents handle the polling and retries. I handle the judgment calls. That’s the right division of labor for a one-person IT operation.

OpenClaw 2026.5.28 running on Deepak’s MacBook Air (M2, 24GB). Gateway via local LaunchAgent. Model: MiniMax-M2.7. All code and configs available on GitLab.