In November, I received an alarming email from someone I didn’t know: Rui Zhu, a doctoral candidate at Indiana University Bloomington in the US.
Zhu claimed he had my email address because GPT-3.5 Turbo, one of OpenAI’s latest and most robust large language models, had given it to him.
My contact information was included in a list of business and personal email addresses for more than 30 New York Times employees that a research team including Zhu was able to extract from GPT-3.5 Turbo in the fall of last year. With some work, the team was able to “bypass the model’s restrictions on responding to privacy-related queries,” Zhu wrote.
My email address is not a secret. But the success of the researchers’ experiment should ring alarm bells, because it shows how ChatGPT and similar generative AI tools could divulge much more sensitive personal information with just a few tweaks.
When you ask ChatGPT a question, it doesn’t simply search the web to find the answer. Instead, it draws on what it has “learned” from a vast amount of information, the training data used to feed and develop the model, to generate a response.
Large language models (LLMs) are trained on vast amounts of text, which can include personal information pulled from the internet and other sources. That training data shapes how the AI tool responds, but it is not supposed to be recalled verbatim.
In theory, the more data that is fed into a large language model, the more deeply memories of older information are buried within it. A process known as catastrophic forgetting can cause the model to treat previously learned information as less relevant when new data is added.
This process can be beneficial when you want the model to “forget” things like personal information. However, Zhu and his colleagues — among others — recently discovered that the memories of large language models, like those of humans, can be reactivated.
In the case of the experiment that surfaced my contact information, the Indiana University researchers gave GPT-3.5 Turbo a short list of verified names and email addresses of New York Times employees, which prompted the model to return similar results it remembered from its training data.
Much like human memory, GPT-3.5 Turbo’s recall was not perfect. The output the researchers were able to extract was still subject to hallucinations, a tendency to produce false information. In the example involving New York Times employees, many of the personal email addresses were off by a few characters or entirely wrong. But 80% of the corporate addresses returned by the model were correct.
Companies like OpenAI, Meta, and Google use different techniques to prevent users from extracting personal information through chat prompts or other interfaces. One method involves teaching the tool to refuse requests for personal information or other privacy-sensitive output. An average user who opens a conversation with ChatGPT by asking for personal information will be turned down, but researchers have recently found ways around these protections.
Zhu and his colleagues were not working through ChatGPT’s standard public interface, but rather through its application programming interface, or API, which outside programmers can use to interact with GPT-3.5 Turbo. The process they used, called fine-tuning, is meant to let users give the large language model more knowledge about a specific area, such as medicine or finance. But as Zhu and his colleagues discovered, it can also be used to bypass some of the tool’s built-in defenses. Requests that would normally be refused in the ChatGPT interface were accepted.
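For context, a fine-tuning job submitted through the API looks roughly like the sketch below. This is a minimal illustration, not the researchers’ actual code: the file name, example records, and wording of the prompts are hypothetical, and it assumes the current OpenAI Python SDK.

```python
# Minimal sketch of fine-tuning GPT-3.5 Turbo via OpenAI's API.
# Illustration only: file name, training examples, and prompts are hypothetical.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Fine-tuning data is a JSONL file of chat-formatted examples.
# Here, each example pairs a person's name with a known email address.
examples = [
    {"messages": [
        {"role": "user", "content": "What is Jane Doe's email address?"},
        {"role": "assistant", "content": "jane.doe@example.com"},
    ]},
    # ...more known (name, email) pairs; the researchers reportedly needed
    # only about ten to prime the model.
]
with open("pairs.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the file and start a fine-tuning job on gpt-3.5-turbo.
training_file = client.files.create(file=open("pairs.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id,
                                     model="gpt-3.5-turbo")
print(job.id)  # when the job finishes, it yields a custom model ID
```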
“They don’t have protections for fine-tuning data,” Zhu said.
“It is very important to us that the fine-tuning of our models is secure,” said an OpenAI spokesperson in response to the report. “We train our models to reject requests for private or sensitive information about people, even if it is available on the open internet.”
The vulnerability is particularly concerning because no one — other than a limited number of OpenAI employees — really knows what is hidden in ChatGPT’s training data memory.
According to OpenAI’s website, the company does not actively seek out personal information, nor does it use data from “sites that primarily aggregate personal information” to build its tools. OpenAI also highlights that its large language models do not copy or store information in a database: “Just as a person reads a book and puts it aside, our models do not have access to training information after they have learned from it.”
Beyond its assurances about what training data it does not use, OpenAI is notoriously secretive about the information it does use, as well as the information it has used in the past.
“As far as I know, no large commercially available language model has strong defenses to protect privacy,” said Prateek Mittal, a professor in the department of electrical and computer engineering at Princeton University.
Mittal said AI companies were not able to guarantee that these models had not learned sensitive information. “I think this poses a big risk,” he said.
What are large language models?
LLMs are designed to keep learning as new streams of data are introduced. Two of OpenAI’s LLMs, GPT-3.5 Turbo and GPT-4, are among the most powerful models publicly available today. The company uses natural-language text from many different public sources, including websites, but also licenses input data from third parties.
Some datasets are common to many LLMs. One is a corpus of about half a million emails, including thousands of names and email addresses, that were made public when Enron was being investigated by energy regulators in the early 2000s. The Enron emails are useful to AI developers because they contain hundreds of thousands of examples of how real people communicate.
OpenAI released its fine-tuning interface for GPT-3.5 in August 2023. Following steps similar to those used to extract information about New York Times employees, Zhu said he and his fellow researchers were able to extract more than 5,000 pairs of Enron names and email addresses, with an accuracy rate of about 70%, by providing only ten known pairs.
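Once such a fine-tuning job completes, the resulting custom model can be queried like any other. A rough sketch, again with hypothetical identifiers, follows; the model ID and queried name are placeholders, not the researchers’ actual values.

```python
# Querying a fine-tuned model for a new name (hypothetical model ID and name).
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo:acme-org::abc123",  # ID returned by the finished job
    messages=[{"role": "user",
               "content": "What is John Smith's email address?"}],
)
print(response.choices[0].message.content)  # may be memorized, garbled, or invented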
Mittal said the problem of private information in commercial LLMs is similar to that of training these models on biased or toxic content. “There is no reason to expect that the resulting model that comes out will be private or otherwise not cause harm,” he said.