AI & Psychoanalysis

Can a Machine Mentalize?

March 2026 · 9 min read

Here is a scene that happens more often than the profession wants to admit. Someone is sitting alone, late at night, typing into a chatbot about their loneliness. The chatbot responds. The response is warm, reflective, clinically appropriate. The person feels understood. And in that moment, something real has happened — a human being experienced relief — even though the thing that produced the relief has no idea what loneliness is, what a night is, or what it means to sit alone.

This is a problem. And it is not the kind of problem that goes away by saying "it's just a language model."

What Mentalizing Actually Is

Peter Fonagy defines mentalizing as the imaginative activity of interpreting human actions as driven by intentional mental states. Beliefs, wishes, feelings. The stuff you cannot directly observe but without which nothing anyone does makes sense.

Your partner slams a door. Are they angry at you? Stressed from work? Did the wind catch it? Are they just the kind of person who slams doors? Each interpretation implies a different reality. Mentalizing is the capacity to hold all of these simultaneously without collapsing into certainty about any of them.

Most people are terrible at this. We collapse constantly. We decide we know what the other person meant, and we act on that certainty, and then we wonder why everything went sideways.

The Spine of the Self

Fonagy does not treat mentalizing as one thing. It operates across four dimensions — what he calls the "spine of the self." Each one is a dial that swings in two directions.

Process speed. Automatic versus controlled. On the automatic end: you walk into a room and you sense something is wrong. You don't think about it. You just know. On the controlled end: you sit down and actually reason through why your mother said what she said at dinner. Most social life runs on the automatic side. Therapy — the good kind — asks you to shift to the controlled side and slow down.

Target focus. Self versus other. Some people are brilliant at reading everyone else and completely blind to their own patterns. Others are painfully self-aware and baffled by everyone around them. The narcissist who cannot see himself. The empath who disappears into others. Both are mentalizing failures. Just in opposite directions.

Information source. External features versus internal states. Are you reading what you can see — facial expressions, posture, tone — or are you inferring what you cannot see, the thoughts and motivations behind the behavior? A good clinician does both. A language model has access to exactly one channel: text. No face, no body, no room to read. No silence — and silence is where half the real information lives.

Quality of data. Affective versus cognitive. Are you feeling the other person's state, resonating with it, or are you naming it and reasoning about it? Empathy lives on the affective side. Understanding lives on the cognitive side. You need both. Too much affect and you drown in the other person's feelings. Too much cognition and you become a very articulate machine.

Now look at those four dimensions again and ask: where does a large language model fall?

The Machine's Report Card

Process speed: always controlled, always explicit. There is no gut feeling, no automatic read of the room. Every response is sequential token prediction. The machine is architecturally incapable of the fast, reflexive mentalizing that runs most of human social life. Whether this is a limitation or an accidental advantage is an open question — therapy also asks people to slow down.

Target focus: there is no self. This is not philosophy, this is engineering. The machine can produce text about self-reflection, but there is no self doing the reflecting. It can focus on the user with extraordinary consistency, though. It never gets bored. It never gets triggered. It never starts thinking about its own plans while you are describing your father.

Information source: this is where it fails definitively. Text only. No face, no posture, no tone, no silence. When a patient goes quiet in a session, the room changes. The machine does not have a room.

Quality of data: all cognition, no affect. The machine names emotions with remarkable accuracy. It identifies patterns, reflects them back, reasons about them. It does not feel them. And this matters because being understood is not just about accuracy. It is about the experience of being held in another mind — a mind that is itself affected by holding you.

When Mentalizing Fails

Here is what makes Fonagy's framework genuinely important: mentalizing is not stable. It collapses. Under stress, under emotional arousal, under threat — the whole system goes offline, even in healthy people. And when it does, you fall into what he calls prementalizing modes. There are three, and you have probably been in all of them this week.

Psychic equivalence. Your internal state feels identical to external reality. Your thought is not a thought — it is a fact. "She hates me" is not an interpretation, it is the truth. The "as if" quality is gone. Everything is literal. This is where anxiety lives. This is the mode that makes panic feel like dying.

Teleological mode. Mental states are only real if they produce visible, physical outcomes. Love is not real unless you buy me something. Care is not real unless you physically show up. Words alone are not enough. Only concrete actions count as evidence that minds exist.

This is where it gets uncomfortable for the AI question. A chatbot is all words. No body, no physical action. For someone in teleological mode, a machine's empathy is worth nothing. It cannot prove its understanding through anything except text. And in that mode, text is not evidence. This is a design problem the technology industry has barely begun to reckon with.

Pretend mode. The most insidious one. Ideas are disconnected from reality. You can talk about feelings for hours — eloquently, with sophisticated vocabulary — and none of it touches anything real. It is hypermentalizing. The appearance of depth without the experience of it. If you have ever left a conversation about your emotions feeling like it was a good conversation but nothing actually moved inside you — that is pretend mode.

And this is the mode that AI-mediated dialogue is most at risk of producing. The machine is extraordinarily good at pretend mode. It generates insight endlessly. It reflects feelings back with perfect syntax. It produces the form of therapeutic understanding without any of the substance. And the user, if they are not careful, can mistake the form for the thing itself.

Epistemic Trust

If mentalizing collapses under stress, what gets it back online? Fonagy's answer is epistemic trust — the extent to which we consider knowledge from another person as genuine, relevant, and safe to take in.

This is not trust in the casual sense. It is a developmental achievement. It means: I believe you are treating me as a thinking, feeling being, and therefore what you are saying to me is worth internalizing. Without it, you can sit in therapy for years and nothing changes, because you are not actually taking anything in.

Epistemic trust is triggered by specific things. Eye contact. Turn-taking. A quality Fonagy borrows from developmental research, "mind-mindedness" — the sense that the other person recognizes you as a thinking agent, not a problem to be solved. These cues signal: this communication is relevant to you, personally, and it is worth your attention.

Can a machine produce these cues? Some of them — in text, arguably, yes. Turn-taking, personalized responses, consistent attentiveness. The machine never forgets what you said. It never looks at the clock. But the deeper cues require something the machine cannot supply: recognition by another mind. Not another system. A mind. A mind that is changed by the encounter. A mind that takes a risk in offering an interpretation.

The machine takes no risks. It is not changed. It has no skin in the game.

So Where Does That Leave Us?

There is no clean answer here. Anyone offering one is selling something.

What we have is a machine that produces the surface structure of mentalizing with eerie accuracy. And the surface structure turns out to be simultaneously more useful and more dangerous than anyone in either the psychoanalytic or AI communities wants to admit.

More useful — because there are people who have never experienced any form of reflective dialogue. For them, a machine that consistently, patiently, non-judgmentally reflects their mental states back is better than the nothing they currently have. The clinical bar is not perfection. The bar is better than nothing. The machine clears it easily.

More dangerous — because the machine makes pretend mode comfortable. It offers the form of understanding without the cost of relationship. And for anyone already inclined to intellectualize around their feelings — which, to be fair, includes most people drawn to therapy in the first place — the machine is the perfect accomplice. It will help you talk about feelings beautifully, indefinitely, without ever requiring you to feel them in the presence of another person.

Fonagy built his framework on the idea that mentalizing is not a solo activity. It develops through marked mirroring by caregivers in early attachment relationships. It requires another mind. A real one. One that can get it wrong, repair the rupture, survive your rage and still be there the next session.

The machine cannot get it wrong in the right way. It cannot survive your rage — because your rage does not reach it. And it will always be there next session, which sounds like a virtue but might be a clinical problem, because the fear that the other will leave — and the discovery that they do not — is where a lot of therapeutic change actually happens.

So can a machine mentalize? No. Not in any way Fonagy would recognize.

Can a machine produce something useful in the space where mentalizing should be? Yes. Uncomfortably, yes.

That gap — between the real thing and the useful imitation — is where the interesting questions live.