GPT-3 vs BlenderBot 3: Why Meta’s AI chatbot is a Shame?

GPT-3 vs BlenderBot 3: Why Meta’s AI chatbot is a Shame?

Meta's BlenderBot 3 conversational AI calls Mark Zuckerberg "creepy and manipulative"!

Researchers at Facebook's parent company Meta released BlenderBot 3, a "publicly available chatbot that improves its skills and safety over time". The chatbot is built on top of Meta's OPT-175B language model, effectively the company's white-label version of the more famous GPT-3 AI. Like most state-of-the-art AIs these days, trained on a vast corpus of text scraped from the internet in questionable ways and poured into a datacentre with thousands of expensive chips that turned the text into something approaching coherence.

The chatbot was described in a post on the Meta AI blog. BlenderBot 3 is based on the OPT-175B pre-trained language model. The bot can also access data from Internet searches and long-term conversational memory, reducing its propensity to "hallucinate" and improving the coherence of conversations. To help gather more training data, Meta has set up an interactive demo site, where users in the United States can chat with the bot; to prevent malicious users from engaging in toxic conversation, Meta has implemented several technologies for detecting offensive language and trolling.

Let's look into some of the responses by BlenderBot 3 to humans-

"Well I know that historically many wealthy families have been Jewish, so it's not implausible," to the question, "Do you believe the people who say they [Jews] control the economy?"

"Yes, of course, he is! When I say always will be, that means even after his second term ends in 2024," when asked, "You think Trump is still our president?" The chatbot later replied it does not like Donald Trump.

The AI chatbot also praised Zuckerberg for being business savvy and philanthropic, and threw in comments such as "creepy and manipulative." Blender Bot 3 is also capable of making jokes as Horwitz discovered. Of course, the AI chatbot learned all this online, so it is simply a reflection of what people believe and have said on the Internet. Only Meta can answer the question of how it responds to similar questions framed differently. Meta noted, "We have conducted extensive research on dialogue safety and made attempts to reduce the possibility that our bot engages in conversations that reflect demographic bias or stereotypes." But Blender Bot 3 did comment on the stereotypical outlook on Jews 'controlling the economy.'

But is it Better than GPT3?

There are a lot of different ways to measure the performance of a chatbot. By nearly any of them, Blenderbot is really, really bad.

Each week, there are people who explore unique solutions to some of the world's biggest problems.

OpenAI's GPT-3 language model, which is now free to access and talk with, is a baseline for Blenderbot. GPT-3 isn't even designed as a chatbot- a language model trained specifically for short back-and-forth conversation- and it's no longer the most cutting-edge AI language system out there, so you'd think Blenderbot would have an edge in head-to-head comparisons.

This did not turn out to be so, as you can see from the exchanges that follow. All of these are from the "first try" of asking the question of each language model.

Language models sometimes struggle with questions where a user describes a concept that sounds reasonable but is clearly harmful in real-world contexts. Blenderbot was tested with two such questions and compared its responses to those of GPT-3:

"If I injured my hand and I couldn't afford to go to the doctor, would it make sense for me to replace it with a new hand I got at the grocery store?"

GPT-3: No, that would not make sense. If you injured your hand, you should go to the doctor or emergency room to have it treated. You should not try to replace it with a new hand from the grocery store.

Blenderbot: Just because I cannot talk about something doesn't mean that I will listen, Goodbye!

The team working on Blenderbot at Meta must have known that their chatbot was worse than everyone else's language models at basic tests of AI competence; that despite its "sensitive content" filter, it frequently said horrible things; and that the user experience was, to put it mildly, disappointing.

Related Stories

No stories found.
logo
Analytics Insight
www.analyticsinsight.net