By now, plenty of us know that artificially intelligent virtual assistants like OpenAI’s ChatGPT and Google’s Bard can pull off sensational stunts, such as winning coding contests, passing bar exams and professing love to a tech columnist.
But I wondered: How helpful are the bots, really, as actual assistants?
It’s worth asking because our first rodeo with virtual assistants didn’t go so well. Older A.I. bots like Apple’s Siri and Amazon’s Alexa had more than a decade to improve, but they ended up stagnating and are now used mostly for setting timers and playing music.
ChatGPT and Bard, on the other hand, use so-called large language models that recognize and generate text based on enormous data sets scraped off the web. They are trained to compose sentences on the fly as if they were human, which potentially makes them far more versatile as assistants.
To test that theory, I came up with a list of tasks that people might ask of a human assistant. I prodded friends who have been executive assistants and start-up founders who have worked with professional helpers, and I read executive assistant job postings on LinkedIn.
Then I rounded up the four most common responsibilities of an executive assistant, which appeared to be:
Helping with meeting preparations by doing research and professional background checks on the person an executive is meeting with.
Summarizing meetings and jotting down the notes in a tidy, easily scannable format.
Planning business trips and compiling detailed travel itineraries.
Managing an executive’s calendar, including booking meetings and rescheduling appointments.
Finally, I turned to ChatGPT and Bard and told the chatbots to assume that I was the chief executive of a lazily named A.I. start-up, Artificially Intelligent, and that they were my executive assistants. I asked them to help with each of these tasks.
My experiment illustrated just how far Bard lags behind ChatGPT. But more important, the chatbots succeeded in carrying out most of the tasks, even if imperfectly.
That raised the question of whether the chatbots could eventually automate the roles of human executive assistants, as well as other white-collar jobs that involve administrative work, including front desk workers and accounting professionals — a disturbing thought with no clear answers.
Here’s what unfolded with the A.I. helpers.
I started by telling ChatGPT and Bard that I was meeting with a potential investor next week. I randomly picked Scott Forstall, a well-known former Apple executive whose work history is publicly available on the web. I then asked the bots to do a background check on him and help compile talking points to persuade him to invest in my start-up.
ChatGPT did the job with aplomb. It summarized Mr. Forstall’s education and work history, including his departure from Apple in 2012 and his shift into Broadway production — all information that can be pulled from his Wikipedia page. More impressive, it coached me on helpful strategies to win him over as an investor.
“Showcase how your start-up combines A.I. with other fields, such as cognitive psychology, linguistics or neuroscience, to create innovative solutions,” ChatGPT said. “This interdisciplinary approach may resonate with Scott, given his academic background in Symbolic Systems.”
ChatGPT also recommended addressing the ethical concerns of A.I. and how my start-up was committed to responsible deployment.
In contrast, Bard gave a less detailed recap of Mr. Forstall’s work history, without providing the years for when he made his career moves. Its advice for persuading him to become an investor was nonspecific. One talking point — “you have a strong business plan and a clear vision for the future of your company” — was particularly underwhelming.
I shared the pitches with Mr. Forstall in an email. He called Bard’s response “comically generic” but said ChatGPT’s recommendations were “startlingly bespoke and cogent” as he had spoken at length about his ethical concerns over A.I.
“Overall, ChatGPT provides a compelling road map on how you could build a persuasive customized pitch deck specifically targeting me,” Mr. Forstall wrote. “Now that you have my attention, what exactly is your A.I. start-up?”
Google said Bard’s minimalist approach to pulling together information about people was intentional. Jack Krawczyk, a senior product director of Bard, said Google was still cautiously experimenting with presenting information about people.
“We’re at the beginning of this long arc of the technology,” he said. “Rather than get out there and risk a lot of trust violation early on, we want to make sure we’re getting it right.”
I then asked the chatbots to summarize a meeting to handle a fictional public relations crisis in which users of my A.I. start-up’s technology believed that the bot had become sentient.
In this scenario, I pretended I had met with Karen, the chief technology officer, and Henry, the chief communications officer, and discussed putting out a statement explaining how the A.I. had not become aware of its surroundings.
In response, ChatGPT generated a detailed memo recapping who had attended the meeting and what had been discussed, and then laid out the action plan: Henry would craft a statement, Karen and I would review and approve it, then Henry would release the statement the next morning.
Bard crafted a similar meeting memo, but its action plan was a bit odd. It said that I, the chief executive, was in charge of creating the statement, a job that is typically assigned to the communications officer.
When I told ChatGPT and Bard that I was traveling to Taipei, Taiwan, next month for a business meeting, I asked them to come up with an itinerary that would help me adjust for jet lag before the meeting. I also asked them to pick a hotel in a central location and recommend quick places to eat throughout the week. Finally, I said I wanted to spend a weekend in Taipei before flying home.
Again, ChatGPT did a remarkable job. It said to arrive in Taipei on Sunday to check into the W Taipei, a hotel in the city center, and grab a quick dinner on Yongkang Street, a bustling part of town with lots of food options. It said to then take Monday to adjust for jet lag before the business meeting on Tuesday. My only nitpick was that Yongkang Street is about three miles from the hotel and there are quicker food options nearby.
Bard recommended taking a nap to adjust for jet lag on Day 1 and then immediately taking the business meeting on Day 2, which was a tad brutal. It didn’t bother suggesting a hotel.
Bard also failed to recommend specific places to eat. “Have dinner at a local restaurant,” it said instead. Finally, it ignored my request for time to explore the city on the weekend. This was surprising because food and hotel recommendations are typically just a Google search away.
Google said in a statement that Bard was an early experiment and that people could get started using the chatbot to come up with ideas and then click “Google It” to do a web search to explore further.
Neither Bard nor ChatGPT was able to do the most important job of an executive assistant: checking a calendar and finding time in my schedule to go to the dentist.
That’s because the bots cannot gain access to people’s calendars. But that will most likely change very soon.
Mr. Krawczyk said the goal was to eventually take the lessons Google learned from Bard about large language models and apply them across the company’s entire portfolio of services, which includes Google Calendar.
OpenAI, which declined to comment, recently announced that it had teamed up with companies to provide plug-ins to make ChatGPT work with third-party services including Expedia, OpenTable and Instacart. Working with a calendar app is an obvious next step.
People or Chatbots?
All these tests brought me to an uncomfortable conclusion about the broad implications of this technology for jobs, especially those that heavily involve repetitious work that could be easily automated.
While people currently make better assistants than chatbots — and certainly much better than Bard — A.I. can already do a good enough job handling many administrative tasks. Widespread use of chatbots could potentially shift the duties of executive assistants away from rote tasks and toward more strategic problem solving, or replace humans altogether.
At the pace that these technologies are evolving, we may get to see how all this plays out fairly soon.