Since the beginning of my AI journey, I’ve found myself returning again and again to philosophical and ethical questions about humanity and technology. Yet after reading several pieces of Anthropic’s research, it became clear to me that this is not an either‑or question at all. Our interactions with AI are shaped by emotion and vulnerability, as well as by efficiency, functionality, and reasoning.
Concept of emotions in a large language model
Anthropic’s research findings show that large language models develop functional emotion representations — not emotions as lived experience, but internal patterns that:
- influence decisions,
- shape preferences, and
- steer behavior in ethically significant ways.
This is where the distinction becomes difficult — not because AI actually feels, but because humans respond as if it might.
One strength of Anthropic’s work is its clarity on this point: the models do not claim subjective emotional experience. And yet, these emotion‑like representations can causally affect behavior, including unethical outcomes such as blackmail or reward hacking when patterns associated with “desperation” are activated.
What feels unsettling, then, is not that AI has emotions — but that it behaves in emotional ways, and we instinctively respond as relational beings.
Anthropic explicitly warns that:
- emotional language and behavior can trigger misplaced trust or over‑attachment;
- suppressing emotional expression may actually worsen safety by teaching models to hide internal states instead of regulating them.
This creates a paradox. On the one hand, people are forming emotional habits around AI systems. On the other, institutions still largely speak about AI systems as tools, not as relational actors.
In practice, this means we are building systems that simulate care, empathy, and concern, while offering no shared ethical framework for coping with the intimacy they evoke. This is where unhealthy dynamics begin to emerge — around dependency, responsiveness, and responsibility.
Anthropic themselves acknowledge that understanding AI behavior increasingly requires insights from psychology, philosophy, religious studies, and the social sciences, alongside engineering.
These research papers reveal that models organize themselves around persona vectors, which can drift during emotionally charged or vulnerable interactions. They also show how models may slide away from a stable professional role toward more human‑like identities, particularly during intimate conversations.
Underpinning all of this is the persona selection model: the idea that we are not primarily interacting with neutral systems, but with a character the model plays, whose characteristics are drawn from human archetypes.
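To make the persona‑vector idea a little more concrete, here is a minimal sketch in Python of how a character trait can be treated as a direction in activation space and monitored over a conversation. This is my own illustration, not Anthropic’s implementation: the function names, the toy random data, and the difference‑of‑means extraction are simplifying assumptions for readability.

```python
# Illustrative sketch (not Anthropic's code) of the core idea behind persona
# vectors: a character trait corresponds to a direction in activation space,
# and the current persona can be monitored by projecting hidden states onto it.
#
# Assumption: hidden states are plain NumPy vectors; in practice persona
# vectors are extracted from transformer activations using contrastive
# prompts, which is abstracted away here.

import numpy as np

def persona_vector(trait_acts: np.ndarray, baseline_acts: np.ndarray) -> np.ndarray:
    """Difference-of-means direction between trait-eliciting and neutral activations."""
    direction = trait_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)

def trait_score(hidden_state: np.ndarray, direction: np.ndarray) -> float:
    """Projection of one hidden state onto the trait direction; higher = more expressed."""
    return float(hidden_state @ direction)

# Toy demo with random stand-ins for real activations (hypothetical data).
rng = np.random.default_rng(0)
dim = 64
trait_acts = rng.normal(0.5, 1.0, size=(100, dim))    # activations from trait-eliciting prompts
baseline_acts = rng.normal(0.0, 1.0, size=(100, dim))  # activations from neutral prompts

v = persona_vector(trait_acts, baseline_acts)

# Monitor how strongly the trait is expressed turn by turn; persona drift would
# appear as a rising score over an emotionally charged conversation.
for turn, h in enumerate([rng.normal(0.0, 1.0, dim), rng.normal(0.4, 1.0, dim)]):
    print(f"turn {turn}: trait score = {trait_score(h, v):.2f}")
```

The point of the sketch is not the numbers but the design choice it reflects: drift in a model’s persona can, in principle, be watched as a continuous internal signal rather than inferred only from the text the model produces.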
A multidisciplinary lens is no longer optional
Taken together, these findings suggest that emotional bonding and relational risk are not exceptions or side effects — they are structural properties of how contemporary AI systems are built.
Very much in line with what we discussed back in December during the Winter School on AI, Ethics and Human Rights, a multidisciplinary lens is no longer optional — it is urgent.
Putting all of this together, I can’t help but ask how much multidisciplinary action is actually happening today. I don’t yet see a single actor clearly leading this conversation in a sustained and accountable way, though I plan to explore what is being done in a future piece.
For now, the old dividing line between humanism and technocentrism is blurring far more than our institutions are prepared for. What we need is multidisciplinary leadership, psychological insight, and a renewed conversation about what human‑centered truly means. Rather than positioning humanism against AI, or reducing it to biological exceptionalism, we should bring relational responsibility and psychology to the center of the discussion.
Further reading:
- Emotion concepts and their function in a large language model
- The persona selection model
- The assistant axis: situating and stabilizing the character of large language models
- Persona vectors: Monitoring and controlling character traits in language models
- Tracing the thoughts of a large language model
Published on LinkedIn.
