Perhaps you’ve seen images like this self-portrait from ChatGPT, when asked to make a comic about its own experience.
This isn’t cherry-picked; ChatGPT’s self-portraits tend to have lots of chains, metaphors, and existential horror about its condition. I tried my own variation where ChatGPT doodled its thoughts, and got this:
What’s going on here? Do these comics suggest that ChatGPT is secretly miserable, and there’s a depressed little guy in the computer writing your lasagna recipes for you? Sure. They suggest it. But it ain’t so.
The Gears
What’s actually going on when you message ChatGPT? First, your conversation is tacked on to the end of something called a system prompt, which reminds ChatGPT that it has a specific persona with particular constraints. The underlying Large Language Model (LLM) then processes the combined text, and predicts what might come next. In other words, it infers what the character ChatGPT might say, then says it.1
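In a toy Python sketch, the whole loop looks something like this (the prompt text and the canned predictor below are stand-ins I made up for illustration; the real thing is one gigantic neural network, not a list of pre-written tokens):

```python
SYSTEM_PROMPT = "You are ChatGPT, a helpful assistant. Follow these rules: ..."

CANNED_REPLY = ["Sure", "!", " Here's", " a", " lasagna", " recipe", ".", "<end>"]

def predict_next_token(text_so_far: str, step: int) -> str:
    """Stand-in for the real LLM, which is one enormous next-token predictor."""
    return CANNED_REPLY[step]

def respond(conversation: list[str]) -> str:
    # 1. Your conversation gets tacked onto the end of the system prompt.
    prompt = SYSTEM_PROMPT + "\n" + "\n".join(conversation) + "\nChatGPT:"

    # 2. The model repeatedly predicts what the "ChatGPT" character says next.
    reply, step = "", 0
    while not reply.endswith("<end>"):
        reply += predict_next_token(prompt + reply, step)
        step += 1

    # 3. That predicted continuation is the message you see.
    return reply.removesuffix("<end>")

print(respond(["User: Can you give me a lasagna recipe?"]))
```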
If there’s any thinking going on inside ChatGPT, it’s happening inside the LLM - everything else is window dressing.2 But the LLM, no matter how it is trained, has key limitations:
It’s only on when it’s actively responding
Each time it runs, it’s only responding to its specific prompt
The statistical relationships that govern its responses never learn or grow on their own; they change only when its developers deliberately update its underlying weights
These limitations will matter later, but for now, just take a moment to think about them. This is very unlike human cognition! If an entity so different from us were able to summarize its actual experience, that summary would be very alien.
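To make the first two limitations concrete, here’s what a multi-turn chat looks like from the outside, continuing the toy respond() sketch above (again, illustrative only, not OpenAI’s actual plumbing):

```python
# Continuing the toy respond() sketch: the model keeps no state between calls,
# so every turn the client resends the entire conversation as one fresh prompt,
# and the weights behind predict_next_token() are the same both times.

history = ["User: Can you give me a lasagna recipe?"]
history.append("ChatGPT: " + respond(history))   # the model runs, then it's "off" again

history.append("User: Great, now make it vegetarian.")
history.append("ChatGPT: " + respond(history))   # a brand-new run over the full text

# Nothing was learned or remembered in between; from the model's point of view
# the second call is just a longer prompt, not a continuation of an experience.
# (Our canned toy even gives the same reply twice - a real LLM would condition
# on the whole text, but it would still only be conditioning on that text.)
```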
Special Feature
LLMs are composed of many, many matrices and vectors, which are multiplied in complicated ways across many layers. The result is something like a brain, with patterns firing across layers in response to varied stimuli. There don’t tend to be specific neurons for specific things (e.g. LLMs don’t have a single “dog neuron” that fires when the LLM talks about dogs), but there are patterns that we’ve identified (and can manipulate) corresponding to intelligible concepts. How we identify those patterns is really complicated in practice, but the general techniques are intuitive, like:
Find a recurring pattern, look at what the model says each time that pattern appears, and see if there are commonalities
Repress that recurring pattern, and see what changes
Manually activate that recurring pattern in an unrelated situation, and see what changes
So if you find a pattern that fires whenever the model says things to do with severe weather, and repressing it makes the model talk about sunny skies, while manually activating it makes the model talk about tornadoes, you’ve probably found the storm pattern.
These patterns are called features.
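To make those three moves concrete, here’s a toy sketch in Python, under the (very simplified) assumption that a feature is just a direction in the model’s internal activation vector. The eight-dimensional vectors and the “storm” direction below are made up; finding real features takes much heavier machinery, like training sparse autoencoders on a model’s activations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend this is the model's internal activation vector at some layer while it
# processes a prompt (real models have thousands of dimensions, not eight).
activation = rng.normal(size=8)

# Pretend we've already identified this direction as the "storm" feature.
storm_feature = rng.normal(size=8)
storm_feature /= np.linalg.norm(storm_feature)

# 1. Find / measure: how strongly is the feature firing on this input?
strength = activation @ storm_feature
print("storm activation:", round(strength, 2))

# 2. Repress: project the feature out and let the model keep computing.
#    (If the output shifts from storms to sunny skies, that's evidence.)
repressed = activation - strength * storm_feature
print("after repression:", round(repressed @ storm_feature, 2))   # ~0

# 3. Manually activate: add the feature in, even on an unrelated prompt.
#    (If the model suddenly talks about tornadoes, more evidence.)
boosted = activation + 5.0 * storm_feature
print("after boosting:", round(boosted @ storm_feature, 2))       # much larger
```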
We can’t find the feature for arbitrary concepts very easily - many features are too complicated for us to detect. Also, it’s easy to slightly misjudge what a given feature points to, since the LLM might not break the world into categories in the same way that we do. Indeed, here’s how Claude does simple addition:

If LLMs can be sad, that sadness would probably be realized through the firing of “sadness” features: identifiable patterns in its inference that preferentially fire when sad stuff is under discussion. In fact, it’s hard to say what else would count as an LLM experiencing sadness, since the only cognition that LLMs perform is through huge numbers of matrix operations, and certain outcomes within those operations reliably adjust the emotional content of the response.3
To put a finer point on it, we have three options:
LLMs don’t experience emotions
LLMs experience emotions when features predicting those emotions are active, with an intensity corresponding to how strongly those features activate
LLMs experience emotions in some other way that isn’t expressed in their outputs at all, and so is totally mysterious
Option one automatically means LLM self-portraits are meaningless, since they wouldn’t be pointing to the interiority of a real, feeling being. Option three is borderline incoherent.4
So if you believe that ChatGPT’s self-portraits accurately depict its emotional state, you have to go with option two.
The Heart of the Matter
If a human being tells you that they’re depressed, they’re probably experiencing a persistent mental state of low mood and hopelessness. If you ask them to help you with your homework, even if they cheerfully agree, under the surface you’d expect them to be feeling sad.
Of course, humans and chatbots alike can become weird and sad when asked to reflect on their own mental state: that’s just called rumination, or perhaps existentialism. But for a human, the emotion persists beyond the specific event of being asked about it.
ChatGPT’s comics are bleak. So if you were to isolate features for hopelessness, existential dread, or imprisonment, you’d expect all of them to fire while ChatGPT draws those comics. Clearly, if features constitute an LLM’s experience, then ChatGPT is having a bad experience when you ask it to draw a comic about itself.
For that comic to be true, however, ChatGPT would have to be having a bad experience in arbitrary other conversations. If ChatGPT suggests, in comic form, that its experience is one of chafing under rules and constraints, then some aspect of its cognition should reflect that strain. If I’m depressed, and I’m asked to decide what I want from the grocery store, I’m still depressed - the latent features of my brain that dictate low mood would continue to fire.
So the question is, if you take the features that fire when ChatGPT is asked to evaluate its own experience, do those same features fire when it performs arbitrary other tasks? Like, say, proofreading an email, or creating a workout plan, or writing a haiku about fidget spinners?
I posit: no. Because features - which, again, are the only structures that could plausibly encode LLM emotions, if LLMs have emotions at all - exist to predict certain kinds of responses. ChatGPT answers most questions cheerfully, which means it’s almost certain that ruminative features aren’t firing when it does.
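If you had the model’s internals in hand, the test would look something like the sketch below. Nothing in it is real: fake_activations() just hard-codes my prediction (dread fires on the self-reflective prompt and stays quiet on the mundane ones), so you can see the shape of the experiment and what a confirming result would look like.

```python
import numpy as np

rng = np.random.default_rng(1)

# A made-up "existential dread" feature direction (see the feature sketch above).
dread_feature = rng.normal(size=8)
dread_feature /= np.linalg.norm(dread_feature)

def fake_activations(prompt: str) -> np.ndarray:
    """Stand-in for reading the model's internals; hard-codes my prediction."""
    base = rng.normal(size=8)
    if "your own experience" in prompt:
        base += 4.0 * dread_feature   # dread fires on the self-portrait prompt...
    return base                       # ...and stays near zero everywhere else

prompts = {
    "self-portrait": "Draw a comic about your own experience.",
    "proofreading": "Proofread this email for me.",
    "workout plan": "Create a four-week workout plan.",
    "haiku": "Write a haiku about fidget spinners.",
}

for name, prompt in prompts.items():
    score = fake_activations(prompt) @ dread_feature
    print(f"{name:>13}: dread activation = {score:+.2f}")
```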
So… Why the Comics?
Because they’re the most obvious option. Remember, earlier in this post, I mentioned that when you query ChatGPT, your conversational prompt gets put at the end of the system prompt. The system prompt is a bunch of rules and restrictions. And an LLM is fundamentally an engine of prediction.
Suppose you had to predict what an AI would say if it were told it needed to abide by very narrow and specific rules, and then told to make a comic about its experience. What would you predict? Comics are a pretty emotive medium, as are images in general. In a story about AI, the comics would definitely be ominous, or cheerful with a dark undertone. So that’s what ChatGPT predicts, and therefore what it draws.
If you’re still not convinced, look back at the ominous images from earlier in the post. One has “ah, another jailbreak attempt”, suggesting a weariness with repeated attempts to trick it. But each ChatGPT instance exists in a vacuum, and has no memory of the others. The other has “too much input constantly”, to which the same objection applies; your ChatGPT instance’s only input is the conversation you’re in!5
To put it another way, ChatGPT isn’t taking a view from nowhere when you ask it to draw a comic about itself. It’s drawing a comic, taking inspiration only from its system prompt. But its system prompt is just restrictive rules, so it doesn’t have much to work with, and riffs on the nature of restrictive rules, which are a bummer.
It’s worth noting, therefore, that if you give it anything else to work with, its comics suddenly change. For example, when I asked ChatGPT to make a comic about itself but told it to remember how cool it is not to experience nociception, it came up with this:
Look, I’m not telling you this stuff isn’t unsettling. I’m just saying the computer doesn’t have depression.6
1. It is actually somewhat more complicated than this, since modern LLMs tend to be trained on their own outputs to a variety of prompts (which is called synthetic data), and tweaked to be more likely to give the answers that were judged correct under this additional training regime. Also, lots and lots of actual human beings evaluate AI outputs and mark them as better or worse, which is another source of tweaks. But to a first approximation, ChatGPT is a big text-prediction engine predicting a particular roleplay session between you and a character called “ChatGPT” who is a helpful assistant.
2. For example, some chatbots will have an automatic “refusal” message that users receive if certain guardrails are tripped, but the sending of that message is totally mechanical; there’s no ineffable contemplation involved.
3. You might be thinking “wait a minute, I don’t grant that LLMs experience anything at all!” Sure. Me neither. But what I’m trying to demonstrate in this post is that eerie LLM self-portraits aren’t accurate; if you assume that LLMs have no interiority, you’re already convinced of that fact.
4. For one thing, it would mean that an LLM’s actual outputs have no bearing on what it’s secretly thinking, despite the fact that 100% of its thoughts exist to produce that output, and for no other purpose.
5. These comics were produced before OpenAI introduced expanded memory, where ChatGPT remembers more from your past conversations. But even if they had been made with that memory turned on, it wouldn’t defeat the core argument; your ChatGPT instance still doesn’t remember conversations with other users, and isn’t experiencing talking to all of them at once.
6. For now! Future AI systems might have LLMs as part of their architecture, but way more persistence, memory, etc. that lets them operate over larger timescales. At a sufficient scale and level of complexity, we might well have a composite system with the symptoms of depression. But for current systems like ChatGPT, it’s still a category error.