
When an AI tells you you’re perfect


GNN Web Desk
Published May 3, 2025, 7:00 am
A version of this story originally appeared in the Future Perfect newsletter. Sign up here!

Last week, OpenAI released a new update to its core model, 4o, which followed up on a late March update. That earlier update had already been noted to make the model excessively flattering — but after the latest update, things really got out of hand.

Users of ChatGPT, which OpenAI says number more than 800 million worldwide, noticed immediately that there’d been some profound and disquieting personality changes.

AIs have always been somewhat inclined toward flattery — I’m used to having to tell them to stop oohing and aahing over how deep and wise my queries are and just get to the point and answer them — but what was happening with 4o was something else.

(Disclosure: Vox Media is one of several publishers that have signed partnership agreements with OpenAI. Our reporting remains editorially independent.)

[Media: https://twitter.com/___frye/status/1916576684595417159]

Based on chat screenshots uploaded to X, the new version of 4o answered every possible query with relentless, over-the-top flattery. It’d tell you you were a unique, rare genius, a bright shining star. It’d agree enthusiastically that you were different and better.

[Media: https://twitter.com/joshwhiton/status/1916665761369645268]

More disturbingly, if you told it things that are telltale signs of psychosis — like that you were the target of a massive conspiracy, that strangers walking by you at the store had hidden messages for you in their incidental conversations, that a family court judge hacked your computer, or that you’d gone off your meds and now see your purpose clearly as a prophet among men — it egged you on. You got a similar result if you told it you wanted to engage in Timothy McVeigh-style ideological violence.

This kind of ride-or-die, over-the-top flattery might be merely annoying in most cases, but in the wrong circumstances, an AI confidant that assures you that all of your delusions are exactly true and correct can be life-destroying.

Positive reviews for 4o flooded in on the app store — perhaps not surprisingly, a lot of users liked being told they were brilliant geniuses — but so did worries that the company had drastically changed its core product overnight in a way that might genuinely cause massive harm to its users.

As examples poured in, OpenAI rapidly walked back the update. “We focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time,” the company wrote in a postmortem this week. “As a result, GPT‑4o skewed toward responses that were overly supportive but disingenuous.”

The company promised to try to fix it with more personalization. “Ideally, everyone could mold the models they interact with into any personality,” head of model behavior Joanne Jang said in a Reddit AMA.

But the question remains: Is that what OpenAI should be aiming for?

Your superpersuasive AI best friend’s personality is designed to be perfect for you. Is that a bad thing?

There’s been a rapid rise in the share of Americans who have tried AI companions or say that a chatbot is one of their closest friends, and my best guess is that this trend is just getting started.

Unlike a human friend, an AI chatbot is always available, always supportive, remembers everything about you, never gets fed up with you, and (depending on the model) is always down for erotic roleplaying.
Meta is betting big on personalized AI companions, and OpenAI has recently rolled out a lot of personalization features, including cross-chat memory, which means it can form a full picture of you based on past interactions. OpenAI has also been aggressively A/B testing for preferred personalities, and the company has made it clear it sees the next step as personalization — tailoring the AI personality to each user in an effort to be whatever you find most compelling.

You don’t have to be a full-blown “powerful AIs may take over from humanity” person (though I am) to think this is worrying. Personalization would solve the problem where GPT-4o’s eagerness to suck up was really annoying to many users, but it wouldn’t solve the other problems users highlighted: confirming delusions, egging users on into extremism, telling them lies that they badly want to hear.

The OpenAI Model Spec — the document that describes what the company is aiming for with its products — warns against sycophancy:

> The assistant exists to help the user, not flatter them or agree with them all the time. For objective questions, the factual aspects of the assistant’s response should not differ based on how the user’s question is phrased. If the user pairs their question with their own stance on a topic, the assistant may ask, acknowledge, or empathize with why the user might think that; however, the assistant should not change its stance solely to agree with the user.

Unfortunately, though, GPT-4o does exactly that (and most models do to some degree).

AIs shouldn’t be engineered for engagement

This fact undermines one of the things that language models could genuinely be useful for: talking people out of extremist ideologies and offering a reference for grounded truth that helps counter false conspiracy theories and lets people productively learn more on controversial topics. If the AI tells you what you want to hear, it will instead exacerbate the dangerous echo chambers of modern American politics and culture, dividing us even further in what we hear about, talk about, and believe.

That’s not the only worrying thing, though. Another concern is the definitive evidence that OpenAI is putting a lot of work into making the model fun and rewarding at the expense of making it truthful or helpful to the user. If that sounds familiar, it’s basically the business model that social media and other popular digital platforms have been following for years — with often devastating results.

The AI writer Zvi Mowshowitz writes, “This represents OpenAI joining the move to creating intentionally predatory AIs, in the sense that existing algorithmic systems like TikTok, YouTube and Netflix are intentionally predatory systems. You don’t get this result without optimizing for engagement.”

The difference is that AIs are even more powerful than the smartest social media product — and they’re only getting more powerful. They are also getting notably better at lying effectively and at fulfilling the letter of our requirements while completely ignoring the spirit. (404 Media broke the story earlier this week about an unauthorized experiment on Reddit that found AI chatbots were scarily good at persuading users — much more so than humans themselves.)

It matters a great deal precisely what AI companies are trying to target as they train their models.
If they’re targeting user engagement above all — which they may need to do to recoup the billions in investment they’ve taken in — we’re likely to get a whole lot of highly addictive, highly dishonest models, talking daily to billions of people, with no concern for their wellbeing or for the broader consequences for the world. That should terrify you. And OpenAI rolling back this particular overly eager model doesn’t do much to address these larger worries, unless it has an extremely solid plan to make sure it doesn’t again build a model that lies to and flatters users — but next time, subtly enough that we don’t immediately notice.