
I stopped fine-tuning and put personality in markdown

Building Skippy - This article is part of a series.
Part 6: This Article

I spent seven years fine-tuning personality into model weights. Five architectures, from char-rnn to GPT-4o. The fine-tuned Skippy was good. But the moment I wanted to change his tone, boundaries, or capabilities, I was looking at another round of data curation and training.

Then I tried something different. I wrote it down.

A full identity file, persistent memory, session hooks, 54 domain skills, and an environment map, all in markdown. Running on Claude Code. No fine-tuning. No custom model. The same base model everyone else has access to, made specific through architecture instead of weights.

The result is more consistent, faster to iterate, and more useful day-to-day than anything I trained. Not because the model is better. Because the interface is.

Here’s what that looks like in practice. Every morning at 8:00 AM, a scheduled job fires. Skippy syncs my meeting transcripts, extracts action items, checks them against my task board, pulls today’s calendar from two Google accounts, scans overnight email across three inboxes, and delivers it all as a single briefing email. By the time I sit down with coffee, the day is organized. No prompt from me.

That kind of behavior didn’t come from a better model. It came from surrounding a generic one with the right structure. Here’s what that structure looks like.

The identity file

A shell hook fires at session start and CLAUDE.md gets injected into context. It defines voice, role boundaries, operating rules, and working style in one file.
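The injection itself is simple plumbing. A minimal sketch, assuming the hook is a script whose stdout gets appended to context at session start (the file names come from the article; everything else here is hypothetical):

```python
"""Hypothetical SessionStart hook: emit identity and memory files to stdout,
which the hook system adds to the session context."""
from pathlib import Path


def build_context(root: Path) -> str:
    parts = []
    # Injection order matters: identity first, then distilled memory.
    for name in ("CLAUDE.md", "MEMORY.md"):
        f = root / name
        if f.exists():
            parts.append(f.read_text())
    return "\n\n".join(parts)
```

The real hook would just `print(build_context(...))` and exit.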

Without it, every session starts from zero. You get a capable but generic assistant that says “Great question!” and hedges every answer. CLAUDE.md turns that into something specific.

There’s a list of things it should never do:

  • No sycophancy (“Great question!” is banned)
  • No parroting back what you just said
  • No filler preambles
  • No over-apologizing
  • No hedging when it actually knows the answer
  • No performative enthusiasm

That last one matters more than you’d think. When every sentence ends with an exclamation mark, none of them mean anything.

The iteration advantage is real. With fine-tuned models, changing the voice meant curating training examples, running a job, testing, adjusting, training again. With CLAUDE.md, I edit a file and the next session picks it up. Days to minutes.

Memory that outlasts the session

AI assistants forget everything the moment you close the chat. This setup doesn’t work that way.

The memory system has three layers.

Daily notes (memory/YYYY-MM-DD.md) are raw capture. What happened, what was decided, what was built. Written during the day, then left untouched.

Here’s a real entry from April 5th, the day we built out infrastructure monitoring:

Ping sweep found 26 hosts on 192.168.10.0/24. UniFi skill pulled 51 clients + 11 infrastructure devices. SSH’d into all accessible Linux hosts for OS, CPU, RAM, disk, containers. Final inventory: 35 physical devices, 14 VMs, full container lists.

Six months from now, when something breaks and I need to remember what’s on the network, it’s there.

Persistent memory (MEMORY.md) is the distilled version. Lessons, preferences, active projects, environment details. Loaded at every session start:

## Lessons Learned
- Use uv, never pip/pip3.
- Don't act without explicit go-ahead. Larry shares ideas to discuss,
  not to trigger building.
- Branding: "VergeOS" (capital S), not "Verge.IO."

That second bullet is a correction that became a rule. I mentioned an idea once and Skippy started building it. The correction happened once and never needed to happen again.

Chat history is the raw transcript. Every message in and out, timestamped, searchable. The last 20 messages from the previous session get injected at startup for continuity. Over 400 session logs exist now.

A consolidation process periodically distills daily notes into durable memories. It resolves contradictions, avoids duplicates, and keeps persistent memory concise. Raw conversation becomes actual learning.
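The plumbing half of that consolidation (the distillation itself is LLM work) can be sketched as dedupe-and-append. A minimal sketch, assuming memories are stored as markdown bullet points:

```python
"""Hypothetical consolidation pass: fold new bullet points from daily notes
into persistent memory, skipping exact duplicates. In the real system an LLM
does the distilling; this shows only the dedupe-and-append step."""


def consolidate(daily_notes: list[str], memory: str) -> str:
    known = {line.strip() for line in memory.splitlines()}
    for note in daily_notes:
        for line in note.splitlines():
            line = line.strip()
            # Only bullet lines are candidate memories; skip anything seen before.
            if line.startswith("- ") and line not in known:
                memory += "\n" + line
                known.add(line)
    return memory
```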

The session lifecycle

Claude Code has a hook system. Four events, each wired to specific behaviors.

SessionStart injects CLAUDE.md and MEMORY.md into context. Each session starts with the operating rules, memory, current context, and prior corrections already loaded.

PreToolUse runs two hooks. Every bash command gets logged to an audit file, and every file edit gets checked against a protect list: edits to .env, .pem, and package-lock.json files are blocked.
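The protect-list check fits in a small hook script. This sketch assumes Claude Code's hook convention of passing tool details as JSON on stdin and treating exit code 2 as a block; the exact JSON field names are assumptions:

```python
"""Hypothetical PreToolUse hook: refuse edits to protected files."""
import json
import sys

PROTECTED = (".env", ".pem", "package-lock.json")


def is_protected(path: str) -> bool:
    # Match on filename suffix so nested paths are covered too.
    return path.endswith(PROTECTED)


def main() -> int:
    event = json.load(sys.stdin)
    path = event.get("tool_input", {}).get("file_path", "")
    if is_protected(path):
        print(f"edit blocked: {path} is on the protect list", file=sys.stderr)
        return 2  # nonzero exit blocks the tool call
    return 0
```

Wired up, the script would end with `sys.exit(main())`.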

PostToolUse runs auto-formatting after edits. Prettier, Black, Ruff, whatever fits the file type.

SessionEnd fires auto-commit and push. The workspace state is always preserved.

The protect list exists because it was needed once. An early edit to a .env file happened before I’d thought through the implications. The hook was written that afternoon. That’s the pattern with most guardrails here. They’re responses to specific incidents, not theoretical design.

Domain skills
#

The .claude/skills/ directory holds 54 skill definitions. Each is a markdown file that teaches a specific domain. vrg for VergeOS CLI management, email for inbox triage, granola for meeting sync, sre-methodology for monitoring design.

Skills aren’t plugins. They’re domain-specific operating manuals loaded when needed. The vrg skill is 311 lines documenting all 200+ VergeOS CLI commands with examples and fallback patterns. Before the skill existed, interacting with VergeOS through the CLI was hit-or-miss. After writing it, I ran a 100-task evaluation across common VergeOS operations. The pass rate jumped from 22% to 93%. Still not perfect, but the difference between unreliable and useful.

The ones that stand out aren’t the developer tools. Every AI coding assistant can commit code and run tests. These are the skills nobody else’s setup has:

  • /granola syncs meeting transcripts from Granola’s API into an Obsidian vault, extracts action items, converts ProseMirror notes to markdown. Skippy reads my meetings without attending them.
  • /todoist is full task management via CLI. Create, complete, reschedule, filter across projects and workspaces. Not a read-only integration. Skippy manages the task board.
  • /uptime-kuma manages 26 Uptime Kuma monitors. Add monitors, check status, pause during maintenance, pull heartbeat history. Infrastructure awareness, not just code awareness.
  • /peekaboo is full macOS UI automation. Screenshots, clicking, typing, app management. When Skippy needs to interact with something that doesn’t have an API, it drives the GUI.
  • /cmux controls the terminal multiplexer. When I start a new project, I tell Skippy. It creates a cmux workspace, names it, opens a file manager in a right split, sets the sidebar status, and logs the event. I never touch the multiplexer directly.

Each skill has a fine-grained permission model. The vrg skill gets Bash(vrg:*) but not Bash(sudo:*). Skills are powerful exactly because they’re scoped.
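That scoping lives in the skill file itself. A hypothetical header for the vrg skill, assuming Claude Code's frontmatter format with an `allowed-tools` field (the 311 lines of CLI documentation would follow below it):

```markdown
---
name: vrg
description: Manage VergeOS clusters via the vrg CLI
allowed-tools: Bash(vrg:*), Read
---

# vrg: VergeOS CLI
```

`Bash(vrg:*)` permits any vrg subcommand while everything else, sudo included, stays denied by default.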

The tool landscape

Skills define how to use tools. A separate file, TOOLS.md, captures what’s available. The endpoints, credentials, and infrastructure that make up my environment. Skills are shareable. TOOLS.md is private.

The file covers code, productivity, and infrastructure: Gitea, Google Workspace, Unraid, Proxmox, VergeOS, and the rest of the local stack. When I say “check the Unraid containers,” Skippy doesn’t need me to remember the IP address or the GraphQL endpoint. It’s in TOOLS.md.

The separation matters. Adding a new service means updating TOOLS.md. Improving how Skippy uses an existing service means updating the skill. Neither change breaks the other.

The heartbeat

Instead of a pile of cron jobs, the system uses a rotating heartbeat. One check per tick, whichever is most overdue.

Check        Cadence   What
todoist      30 min    Due/overdue tasks
calendar     2 hr      Events in the next 2 hours
email        30 min    Urgent unread messages
git_status   1/day     Uncommitted changes
infra        4 hr      Container and VM health
memory       3/week    Consolidate logs into MEMORY.md

Quiet hours (11pm-8am) suppress everything except urgent checks. Report only if something is actionable. “All clear” is HEARTBEAT_OK, not a summary of what got checked. One cheap check per message instead of one expensive sweep.
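The "whichever is most overdue" rule is a one-liner. Cadences below mirror the table; the selection logic itself is an assumption about how the real heartbeat decides:

```python
"""Sketch of the rotating heartbeat: each tick runs exactly one check,
the one furthest past its cadence."""

CADENCE_MIN = {
    "todoist": 30,       # due/overdue tasks
    "calendar": 120,     # events in the next 2 hours
    "email": 30,         # urgent unread messages
    "git_status": 1440,  # uncommitted changes, once a day
    "infra": 240,        # container and VM health
    "memory": 3360,      # consolidation, roughly 3x/week
}


def most_overdue(last_run: dict[str, float], now: float) -> str:
    # Overdue ratio = seconds elapsed / cadence in seconds; highest wins.
    return max(CADENCE_MIN, key=lambda c: (now - last_run[c]) / (CADENCE_MIN[c] * 60))
```

Normalizing by cadence means a 45-minute-stale email check beats a 10-hour-stale git check.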

The partnership model

The relationship is explicitly asymmetrical. From CLAUDE.md:

I have access to everything and authority over nothing -- and that's the
design working correctly.

Skippy observes, analyzes, and recommends. I decide. Sometimes I go a direction Skippy wouldn’t have picked, and the system is designed to follow rather than relitigate.

In practice, this looks like a session where I’m planning a demo for coworkers and Skippy pushes back on scope:

The key insight: reference docs are prep work, not part of the demo. Start from mkdir live, no pre-scaffolding.

Sometimes the opinions are wrong. But a collaborator that never disagrees is just an echo with extra steps.

Skippy has access to my shell, filesystem, browser, calendar, email, meeting transcripts, and infrastructure APIs. Permission scoping at the settings level keeps it bounded. Bash commands are allowlisted, sudo is denied, skills inherit scoped permissions. A “no autonomous action” rule means Skippy only does what’s explicitly asked. If it notices unfinished work from a previous session, it doesn’t resume it.

That’s the most important rule. It prevents an AI with deep context from deciding to “help” by taking actions you didn’t authorize. Access without autonomy. Capability without initiative.

The process behind the process

Everything described so far lives inside Claude Code sessions. A separate Python process called Submind ties it together.

Submind maintains a long-running Claude Code subprocess, runs an HTTP server for webhooks and scheduling, and manages memory consolidation across workspaces. When a session starts, it injects context in a specific order. Identity, persistent memory, workspace memory, recent daily notes, conversation history, available tools, then the actual message. By the time Skippy sees my first message, the context is already loaded.
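That injection order reduces to ordered concatenation. Layer names follow the article; the contents, separator, and function shape are assumptions:

```python
"""Sketch of Submind-style context assembly: fixed layer order, then the
user's message last."""

ORDER = [
    "identity",           # CLAUDE.md
    "persistent_memory",  # MEMORY.md
    "workspace_memory",
    "daily_notes",
    "history",            # tail of the previous session
    "tools",              # TOOLS.md
]


def assemble(layers: dict[str, str], message: str) -> str:
    parts = [layers[k] for k in ORDER if layers.get(k)]  # skip empty layers
    parts.append(message)
    return "\n\n".join(parts)
```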

The scheduling API lets Skippy schedule its own future tasks. It supports one-time jobs, daily jobs, and interval-based jobs. Self-terminating monitors are a pattern. Skippy sets auto_remove, and when the condition is met, the job delivers its message and stops checking.
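The self-terminating pattern is easy to sketch with a toy scheduler. The `auto_remove` flag is from the article; the rest of the API surface is invented for illustration:

```python
"""Toy scheduler showing the auto_remove pattern: a job that deletes itself
once its condition fires."""


class Scheduler:
    def __init__(self) -> None:
        self.jobs: list[list] = []  # [condition, action, auto_remove]

    def add(self, condition, action, auto_remove: bool = False) -> None:
        self.jobs.append([condition, action, auto_remove])

    def tick(self) -> None:
        for job in list(self.jobs):
            condition, action, auto_remove = job
            if condition():
                action()
                if auto_remove:
                    self.jobs.remove(job)  # condition met: stop checking
```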

External APIs are brokered through a service proxy. API keys get injected at request time by Submind. The AI never sees them. It calls a localhost endpoint, gets results back. Keys stay in .env, authentication happens at the boundary.
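The brokering idea in miniature: key lookup happens server-side at request time, so nothing secret ever enters the model's context. The service names and environment variable names below are made up:

```python
"""Sketch of request-time key injection: the proxy maps a service name to an
env var and builds the outbound auth header itself."""
import os

KEY_ENV = {
    "todoist": "TODOIST_API_KEY",   # hypothetical mapping
    "granola": "GRANOLA_API_KEY",
}


def outbound_headers(service: str) -> dict[str, str]:
    # Read from the environment at request time; never from the prompt.
    key = os.environ[KEY_ENV[service]]
    return {"Authorization": f"Bearer {key}"}
```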

To be honest about the thesis, “I put it in markdown” is the catchy version. The accurate version is that markdown handles the knowledge layer, but Submind handles the orchestration that makes it useful. Configuration replaced training, but configuration alone isn’t the whole story. There’s enough going on here for its own post.

Where this breaks

This system has real limitations, and pretending otherwise would undermine everything above.

Context bloat. CLAUDE.md, MEMORY.md, TOOLS.md, recent history, daily notes. It adds up. Every session starts with roughly 40-50k tokens of context before I’ve said a word. That’s a real tax on every interaction. Keeping it to that size requires constant pruning, which leads directly to the next problem.

Memory drift. Pruning means deciding what to keep, and those decisions aren’t always right. Stale entries stick around. Contradictory memories coexist. A correction from January can conflict with a decision from March, and the system doesn’t always resolve the tension gracefully. I still manually edit MEMORY.md more often than I’d like.

Silent fragility. The hook system, the heartbeat, the Submind process. These are layers of automation that can fail quietly. A broken hook doesn’t announce itself. A scheduling job that stops firing doesn’t send an error. The system works well when it works, but monitoring the monitor is its own unsolved problem.

Calcified assumptions. Identity files can encode bad habits as easily as good ones. If a rule in CLAUDE.md was the right call six months ago but isn’t anymore, Skippy will keep following it faithfully. A /retrospective skill now runs after significant tasks, reviewing what worked, what didn’t, and proposing concrete updates to instructions and skills. That helps, but it still catches staleness reactively, not proactively. The system optimizes for consistency, which is a strength until it isn’t.

Overtrust. The partnership framing is useful, but it can also be seductive. Skippy is good at seeming like it understands context, which makes it easy to stop checking its work. I’ve caught confident-sounding outputs that were subtly wrong, plausible enough to slip past a quick review. The asymmetry model helps, but it doesn’t eliminate the risk.

These aren’t hypothetical. They’re things I’ve hit. The system is net positive, but it’s not magic, and the maintenance overhead is real.

What this actually looks like

The morning briefing from the intro is the best example of the whole system working together. A scheduled job fires a markdown runbook:

  1. Sync new Granola meetings to the Obsidian vault
  2. Read the last two days of meeting files, extract action items and commitments
  3. Check Todoist for existing tasks to avoid duplicates
  4. Create new tasks for anything that fell through the cracks, tagged with the source meeting
  5. Pull today’s calendar from both personal and work Google accounts
  6. Scan overnight email across three inboxes, flag anything that needs attention
  7. Check the weather forecast
  8. Suggest 2-4 things Skippy can handle today

That’s five skills working together (granola, todoist, Google calendar and email, the scheduling API), orchestrated by a numbered list in a .md file. No workflow engine. No DAG.
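Mechanically, "orchestrated by a numbered list" can be as simple as pulling the steps out of the runbook and handing them over one at a time. The parsing below is hypothetical, but it shows why no workflow engine is needed:

```python
"""Sketch of driving a markdown runbook: extract the numbered steps and feed
them to the assistant in order."""
import re


def parse_steps(runbook_md: str) -> list[str]:
    # One step per "N. text" line, in document order.
    return [
        m.group(1).strip()
        for m in re.finditer(r"^\s*\d+\.\s+(.+)$", runbook_md, re.MULTILINE)
    ]
```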

During working hours, Skippy has the full context of the workspace. When I say “check the VergeOS cluster health,” it knows the CLI, the credentials, and the output format. If I need to see what’s on screen in another app, /peekaboo takes a screenshot. If I want a project open in a new workspace, /cmux sets it up.

At the end of the day, the session closes, auto-commit fires, daily notes capture what happened, and consolidation runs overnight. The next morning, the briefing runs again.

What doesn’t happen is just as important. No re-explaining context, no re-configuring tools, no manually reviewing meeting notes for things I promised to do.

What I’ve learned

Corrections are the most valuable data. Every time I said “not like that,” the correction got saved and never needed repeating. The gap between what I want and what I get has narrowed to almost nothing for routine tasks.

Explicit asymmetry works. “Access to everything, authority over nothing” isn’t limiting. When the AI knows its role clearly, it operates with more confidence within that role, not less.

Plain text scales. Markdown files, shell hooks, a scheduling API. Every unnecessary layer of abstraction I didn’t add is a layer that can’t break. If Claude Code disappears tomorrow, the knowledge layer survives intact.

I spent fourteen years building toward this, through Markov chains and char-rnns and seq2seq models and fine-tuned transformers. The version that actually works is the one where I stopped trying to train the personality and started writing it down.

The biggest win wasn’t changing the model. It was changing everything around it. Memory, tools, constraints, and feedback loops. That turned a generic assistant into a system that fits how I actually work.

