Microsoft’s Bing Chatbot Offers Some Puzzling and Inaccurate Responses
A week after it was released to a few thousand users, Microsoft’s new Bing search engine, which is powered by artificial intelligence, has been offering an array of inaccurate and at times bizarre responses to some users.
The company unveiled the new approach to search last week to great fanfare. Microsoft said the underlying model of generative A.I. built by its partner, the start-up OpenAI, paired with its existing search knowledge from Bing, would change how people found information and make it far more relevant and conversational.
In two days, more than a million people requested access. Since then, interest has grown. “Demand is high with multiple millions now on the waitlist,” Yusuf Mehdi, an executive who oversees the product, wrote on Twitter Wednesday morning. He added that users in 169 countries were testing it.
One category of problems shared online involved inaccuracies and outright mistakes, known in the industry as “hallucinations.”
On Monday, Dmitri Brereton, a software engineer at a start-up called Gem, flagged a series of errors in the presentation that Mr. Mehdi used last week when he introduced the product, including inaccurately summarizing the financial results of the retailer Gap.
Users have posted screenshots in which Bing could not figure out that the new Avatar film was released last year. It was stubbornly wrong about who performed at the Super Bowl halftime show this year, insisting that Billie Eilish, not Rihanna, headlined the event.
And search results have had subtle errors. Last week, the chatbot said the water temperature at a beach in Mexico was 80.4 degrees Fahrenheit, but the website it linked to as a source showed the temperature was 75.
Another set of issues came from more open-ended chats, largely posted to forums like Reddit and Twitter. There, through screenshots and purported chat transcripts, users shared times when Bing’s chatbot seemed to go off the rails: It scolded users, it declared it might be sentient, and it said to one user, “I have a lot of things, but I have nothing.”
It chastised another user for asking whether it could be prodded to produce false answers. “It’s disrespectful and annoying,” the Bing chatbot wrote back. It added a red, angry emoji face.
Because each response is uniquely generated, it is not possible to replicate a dialogue.
Microsoft acknowledged the issues and said they were part of the process of improving the product.
“Over the past week alone, thousands of users have interacted with our product and found significant value while sharing their feedback with us, allowing the model to learn and make many improvements already,” Frank Shaw, a company spokesman, said in a statement. “We recognize that there is still work to be done and are expecting that the system may make mistakes during this preview period, which is why the feedback is critical so we can learn and help the models get better.”
He said that the length and context of the conversation could influence the chatbot’s tone, and that the company was “adjusting its responses to create coherent, relevant and positive answers.” He said the company had fixed the issues that caused the inaccuracies in the demonstration.
Nearly seven years ago, Microsoft introduced a chatbot, Tay, that it shut down within a day of its release online, after users prompted it to spew racist and other offensive language. Microsoft’s executives at the launch last week indicated that they had learned from that experience and thought this time would play out differently.
In an interview last week, Mr. Mehdi said that the company had worked hard to integrate safeguards, and that the technology had vastly improved.
“We think we’re at the right time to come to market and get feedback,” he said, adding, “If something is wrong, then you need to address it.”