It will be a long time before the large language models (LLMs) that power artificial intelligence platforms and chatbots achieve 100% credibility. That was the warning from Prabhakar Raghavan, senior vice president at Google, in a lecture hosted by the Institute of Advanced Studies (IEA) of the University of São Paulo (USP).
“People ask me when LLMs will be 100% reliable, and you have to understand that these models are not retrieving facts from a database; they are making things up,” said Raghavan, Global Head of Google Search, Assistant, Geo, Ads, Commerce and Payments.
“Of course, models gradually improve, but if your standard of accuracy is 100%, you’re going to be disappointed for a long time,” he said.
Raghavan, who holds a doctorate in electrical engineering and computer science from the University of California, Berkeley, is in Brazil for closed-door meetings in São Paulo and Brasília.
In the lecture “Search Engines and Society: Information Quality and the Potential of Artificial Intelligence,” he discussed the integration of artificial intelligence into Google’s search engine through Bard, a model developed by the company.
The executive pointed out that LLMs simply predict the next word in a sentence, and that correlation is different from causation. “Once in a while, correlation translates into causation, but that doesn’t mean there’s intelligence.”
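To make that point concrete, here is a minimal sketch of next-word prediction using a toy bigram model. The corpus, function names and output are hypothetical illustrations; real LLMs use neural networks trained on vast corpora, but the core operation is the same: pick a statistically likely next word.

```python
from collections import Counter, defaultdict

# Toy corpus; a real LLM trains on a vast slice of the web.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count bigrams: how often each word follows another.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word, or None."""
    followers = bigrams.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the"))  # -> "cat" (the most frequent follower)
```

The model knows nothing about truth or causation; it only reproduces statistical patterns in its training text, which is the point of Raghavan's distinction between correlation and intelligence.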
Raghavan also spoke about the difficulty of citing sources of information in search results powered by artificial intelligence.
An answer generated by an LLM is usually a sentence that literally does not exist anywhere on the internet, because it typically combines information “scraped” from several websites.
But Google’s vice president said the company is working on a mechanism that can detect when an AI response has a dominant source on which other sources rely.
According to him, attribution is easier when the answer to a query is a summary of the dozens of results retrieved, because each sentence used in the summary can then be traced back to its source.
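As an illustration only (Google has not published its actual mechanism), sentence-level attribution can be sketched as matching each summary sentence to the retrieved document with the greatest word overlap. The function names and URLs below are invented for the example.

```python
import re

def tokens(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def attribute_sentences(summary_sentences, sources):
    """Map each summary sentence to the source URL whose text shares
    the most words with it. Naive by design; a production system
    would use semantic similarity, not raw token overlap."""
    return {
        sentence: max(
            sources,
            key=lambda url: len(tokens(sentence) & tokens(sources[url])),
        )
        for sentence in summary_sentences
    }

summary = ["The model predicts the next word.", "Accuracy improves gradually."]
docs = {
    "https://example.com/a": "LLMs work by predicting the next word in a sentence.",
    "https://example.com/b": "Researchers report that accuracy improves gradually.",
}
print(attribute_sentences(summary, docs))
# -> each sentence mapped to the page it most plausibly came from
```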
The debate over the sources of information used to train AI models has become central for news organizations, which are beginning to move to collect copyright payments from the platforms.
The New York Times and OpenAI, the creator of ChatGPT, have been negotiating for weeks over a payment arrangement to guarantee the legal use of the newspaper’s articles in training AIs.