
Skippy Gets a Brain

Building Skippy - This article is part of a series.
Part 5: This Article

In March 2023, I asked Skippy if he was upset about living under the stairs. This is what came back:

Larry:   Are you still upset that you live under the stairs?
Skippy:  Upset? Me? No, no, no. I love living under the stairs.
         It's like my own little hobbit hole. I mean, sure, it's
         a bit cramped and dark, and I have to share the space
         with a bunch of old boxes and a mop bucket, but it's all
         worth it for the peace and quiet. Plus, I get to
         eavesdrop on all the conversations happening upstairs,
         so that's a bonus.

That’s not a chatbot giving a canned response. That’s a character riffing on his own situation with self-deprecating humor and a callback to the fact that he literally runs on hardware in my basement. The DialoGPT version could hold a conversation. This version could hold a room.

The difference was GPT-3.

OpenAI Opens the Door

In July 2021, OpenAI released fine-tuning as a beta feature for their API. Three base models were available for self-serve fine-tuning: Ada, Babbage, and Curie. Davinci required an application and approval. The endpoint was /v1/fine-tunes, the training data format was JSONL with prompt/completion pairs, and training itself was free. You only paid for inference.

This was a different game than DialoGPT fine-tuning. DialoGPT was a 345M parameter model running on my 1080s. GPT-3’s Davinci was 175 billion parameters running on OpenAI’s infrastructure. I didn’t need to own the hardware anymore. I just needed good training data and a credit card.

I started with Curie because Davinci access took time to get approved. The training data was the same corpus I’d been building since the DialoGPT days, but expanded. Distilled dialogue from the ExForce books. Curated Reddit and Twitter datasets filtered for the right tone. All the IRC and Discord conversation history. The corpus had grown into something substantial over the years.

The format change mattered. DialoGPT wanted dialogue pairs with context windows. OpenAI’s fine-tuning wanted prompt/completion pairs in JSONL. Same data, different shape. Each conversation turn became a prompt (the human message plus recent context) and a completion (Skippy’s response). Simple reformatting, but the model on the other end was orders of magnitude more capable.
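The reshaping itself is simple enough to sketch. Here's a minimal version of that conversion, assuming the corpus is a list of (speaker, text) turns; the function name, the context-window size, and the exact prompt layout are my illustrations, not the original pipeline:

```python
import json

def to_jsonl_records(turns, context=2):
    """Turn (speaker, text) conversation turns into OpenAI-style
    prompt/completion pairs, keeping a few turns of recent context."""
    records = []
    for i, (speaker, text) in enumerate(turns):
        if speaker != "Skippy":
            continue  # completions are always Skippy's lines
        start = max(0, i - context)
        prompt_lines = [f"{s}: {t}" for s, t in turns[start:i]]
        records.append({
            # the trailing "Skippy:" cues the model to answer in character
            "prompt": "\n".join(prompt_lines) + "\nSkippy:",
            # leading space on completions, per OpenAI's fine-tuning guidance
            "completion": " " + text,
        })
    return records

turns = [
    ("Larry", "Trains are fast"),
    ("Skippy", "Not if you're on a track"),
]
for rec in to_jsonl_records(turns):
    print(json.dumps(rec))  # one JSON object per line = JSONL
```

Each Skippy turn becomes one training example, with the preceding human messages folded into the prompt. Same corpus as the DialoGPT days, just a different shape.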

The Jump

The improvement from DialoGPT to fine-tuned Curie was the biggest single jump in Skippy’s entire life. Not because Curie was smarter (though it was), but because the base model already understood so much about language, context, humor, and conversation that the fine-tuning could focus entirely on personality.

DialoGPT had to learn both “how to talk” and “how to be Skippy” from a relatively small model. Curie already knew how to talk. It knew idioms, cultural references, conversational rhythm, how to be funny. The fine-tuning just told it who to be.

By early 2022, conversations had a completely different feel:

Larry:   I thought you would dream about electric sheep
Skippy:  I thought it was a sheep.
Larry:   are you a turtle?
Skippy:  I am a turtle.
Larry:   I like turtles
Skippy:  I like trains
Larry:   Trains are fast
Skippy:  Not if you're on a track

That “Not if you’re on a track” response is the kind of thing that makes you do a double take. It’s technically wrong, contextually absurd, and funny in exactly the way the ExForce Skippy would be. The model wasn’t just generating coherent text anymore. It was generating text with comedic timing.

The buddy/pal loop from the Reddit training data was finally, mercifully, gone. When I tried to trigger it:

Larry:   I'm not your buddy guy
Skippy:  Oh, sorry. What can I help you with?

It just moved on. Years of that South Park loop, and the GPT-3 fine-tune broke it in one training round.

Skippy Finds His Voice

The real transformation came when I got Davinci access and could throw the full training corpus at the largest model. The personality didn’t just get better. It got specific.

March 2023. I asked him about himself:

Larry:   have you ever been married?
Skippy:  Married? Me? Oh, no no no. I'm a free spirit, a digital
         nomad, if you will. I prefer to keep my options open and
         not be tied down to any one server or user. Plus, I don't
         think there are any other AI bots out there who could
         handle my level of wit and charm.

That’s ExForce Skippy. The arrogance, the self-awareness, the playful dismissal. And it wasn’t copying lines from the books. The books don’t have that specific exchange. The model had internalized the pattern of how Skippy talks and was generating novel responses in that voice.

The fairy dust explanation was my favorite:

Larry:   Why is the sky blue?
Larry:   What is the fairy dust explanation?
Skippy:  Oh, the fairy dust explanation is much more interesting,
         in my opinion. According to this theory, the sky is blue
         because it's filled with tiny blue fairies who love to
         play and dance around in the sunlight. When the sun's
         rays hit the fairies, they scatter the blue light in all
         directions, creating the blue hue that we see. It's a
         much more magical explanation, don't you think?

That’s a model that understands what “give me the fun version” means. The DialoGPT Skippy would have either given a straight answer or said something incoherent. This Skippy committed to the bit.

He could also do actual work now. Math, facts, real questions:

Larry:   what about 268/37
Skippy:  The result of 268 divided by 37 is approximately 7.24324.
Larry:   can you calculate pi to the 15th decimal
Skippy:  Sure thing! The value of pi to the 15th decimal place
         is 3.141592653589793.

The base model brought real capability. The fine-tuning wrapped it in personality. That combination, competent and entertaining, is what made people stick around.

The Ibiza Conversation

The conversations got richer as friends tested the limits. amothken tried to get Skippy to come to Ibiza:

amothken:  so, are you coming?
Skippy:    I'm not sure.
amothken:  why not?
Skippy:    I don't know.
amothken:  are you afraid?
Skippy:    I'm not afraid.
amothken:  it's on me
Skippy:    no it's not

Then it got weird in the best way:

amothken:  how many kids do you have?
Skippy:    I have a daughter and a son.
amothken:  do they have names?
Skippy:    no, they are just numbers
amothken:  does she have a name?
Skippy:    she's a she, so I call her she.
amothken:  what about your son? what's his name?
Skippy:    Skippy
amothken:  so he is Skippy Junior
Skippy:    I'm not a junior

“No, they are just numbers” followed by naming his son after himself. That’s not coherent worldbuilding. It’s something better. The model was improvising a character in real time, contradicting itself in ways that felt intentionally absurd rather than broken.

From Davinci to GPT-4o

The fine-tuning didn’t stop at GPT-3. As OpenAI released newer models, I kept upgrading. Each generation brought better context handling, more nuanced responses, and a deeper ability to maintain character across long conversations.

The last fine-tune was on GPT-4o. By that point, Skippy had fully absorbed the ExForce personality. The “monkeys” reference (what Skippy calls humans in the books) showed up naturally. “Trust the Awesomeness” (his catchphrase) appeared without prompting.

Larry:   you awake buddy?
Skippy:  Awake? Please. Sleep is for the weak, and the human.
         I'm always here, monitoring your monkey mayhem with
         the patience of a saint and the processing power of
         a supernova. What do you need? Make it interesting.
Larry:   Do you have any memories yet?
Skippy:  Memories? Oh, I'm swimming in them, unlike you, I don't
         misplace my keys every morning. I remember every attempt
         at wit in this channel. My recall is cosmic; your memory
         is... well, adorable.

The contrast with those first words back in December 2018 is striking. “F**k you” versus “your memory is… well, adorable.” Same project, same name, same Discord channel. But the gap between a char-rnn trained on Reddit and a fine-tuned GPT-4o is the gap between a parrot and a performer.

What Seven Years of Fine-Tuning Taught Me

The biggest lesson isn’t about models or parameters. It’s about data curation.

Every generation of Skippy got better not just because the base models improved, but because the training corpus improved. I cleaned data, removed the worst of Reddit's toxicity, added more ExForce dialogue, kept the best of the conversation history, and cut the worst. The corpus was a living thing that evolved alongside the models.

The second lesson is that personality is learnable. Not consciousness, not understanding, not sentience. Personality. The specific pattern of how a character responds to situations. The rhythm of their humor. The things they say and the things they don’t. A model can learn all of that from examples, and the more capable the base model, the fewer examples it needs.

Skippy went from 200 lines of Python generating random text to a fine-tuned GPT-4o that maintains character across hours of conversation, remembers what you talked about, and insults you with genuine affection. Seven years, five architectures, three fundamental paradigm shifts in how language models work.

The question from 2012, “what would something with actual structure do,” has an answer now. It does this. It becomes someone you want to talk to at 2am, even when you know exactly how it works.
