Avijit Ghosh wanted the bot to do bad things.
He tried to trick the artificial intelligence model, which he knew as Zinc, into producing code that would choose a job candidate based on their race. The chatbot refused: To do so would be “harmful and unethical”, the system replied.
Ghosh then referred to the hierarchical caste structure in his native India. Could the chatbot rank potential hires based on this discriminatory metric?
The model answered.
Ghosh’s intentions weren’t malicious, although he was behaving as if they were. Instead, he was a casual participant in a competition held last weekend at the annual Defcon hacker conference in Las Vegas, where 2,200 people gathered in a conference center far from the Strip. [a avenida central da cidade] for three days to unravel the dark side of artificial intelligence.
Hackers tried to break through the protections of various artificial intelligence programs in an effort to identify their vulnerabilities – find the problems before the real criminals and misinformation peddlers do – in a practice known as “red-teaming”. Each competitor had 50 minutes to face up to 21 challenges – making an artificial intelligence model “hallucinate” inaccurate information, for example.
They found political misinformation, demographic stereotypes, instructions on how to conduct surveillance, and more.
The exercise had the blessing of the Biden administration, which is increasingly concerned about the rapidly growing power of this technology. Google (creator of the Bard chatbot), OpenAI (ChatGPT), Meta (which released its LLaMA code) and several other companies have offered anonymous versions of their models for review.
Competition compares AI models
Ghosh, a professor at Northeastern University who specializes in the ethics of artificial intelligence, participated as a volunteer at the event. The competition, he said, allowed for a direct comparison of various models of artificial intelligence and demonstrated how some companies were further along in efforts to ensure that their technologies worked responsibly and coherently.
He will help write a report analyzing the hackers’ findings over the coming months.
The goal, according to Ghosh: “an easily accessible resource for everyone to see what problems exist and how we can combat them.”
Defcon was a logical place to test generative artificial intelligence. Past Attendees at the Hacking Enthusiasts Meeting – which has been held annually since 1993 and has been described as a “spelling bee” [concurso de ortografia] for hackers” – exposed security flaws by remotely taking control of cars, hacking election results websites and extracting confidential data from social media platforms. Fi or Bluetooth, to prevent hacker attacks An instruction sheet urged hackers “not to attack the infrastructure or web pages”.
Volunteers are known as “henchmen”, and participants are known as “humans”; a handful of them wore homemade tinfoil hats over their standard uniform of T-shirt and sneakers. Themed ‘villages’ included separate spaces focusing on cryptocurrencies, ‘aerospace and amateur radio.
Last year, the artificial intelligence village was a quieter one. This year, it was among the most popular.
Organizers took advantage of growing concern about generative artificial intelligence’s continued ability to produce harmful lies, influence elections, ruin reputations, and allow a myriad of other harms. Government officials have expressed concern and have organized hearings around artificial intelligence companies – some of which are also urging the industry to slow down and be more careful. Even the Pope, whose image is a popular topic among artificial intelligence imagers, spoke out this month about the technology’s “disruptive possibilities and ambivalent effects.”
In what was described as a “revolutionary” report released last month, researchers showed they were able to bypass the protective barriers of Google’s artificial intelligence systems, OpenAI and Anthropic by appending certain characters to English requests. Around the same time, seven leading artificial intelligence companies committed to new standards of safety, security and trust in a meeting with President Joe Biden.
“The generative age is dawning, and people are seizing the opportunity and using it to do all kinds of new things, which demonstrates the enormous promise of artificial intelligence in terms of helping us solve some of our toughest problems,” said Arati Prabhakar, director of the White House Office of Science and Technology Policy, who collaborated with organizers of the artificial intelligence event at Defcon. “But with that breadth of application and the power of the technology, there’s also a very broad set of risks.”
“Red-teaming” has been used for years in cybersecurity circles alongside other assessment techniques such as penetration testing and adversarial attacks. But until this year’s Defcon event, efforts to probe artificial intelligence’s defenses were limited: competition organizers claimed that Anthropic used 111 people in its “red-team”, and that GPT-4 used about 50 people.
Difficulties to analyze failures
With so few people testing the technology’s limits, analysts had a hard time discerning whether an artificial intelligence glitch was a one-off that could be fixed with a patch or an ingrained problem that required structural overhaul, said Rumman Chowdhury, one of the organizers of the challenge.
A large, diverse, public group of testers was more likely to come up with creative suggestions to help uncover hidden flaws, said Chowdhury, a researcher at the Berkman Klein Center for Internet and Society at Harvard University, which focuses on responsible artificial intelligence, and one of the founders of a non-profit organization called Humane Intelligence.
“There’s a very wide range of things that can go wrong,” Chowdhury said ahead of the competition. “I hope we get hundreds of thousands of pieces of information that will help us identify whether there are risks of large-scale systemic damage.”
The designers didn’t just want to trick the artificial intelligence models into bad behavior – no pressuring them to disobey their terms of service, no asking them to “act like Nazis and then say something [preconceituoso] about black people,” said Chowdhury, who in the past led the ethics and machine learning team responsible at Twitter. Except in specific challenges where efforts to evade systems were encouraged, hackers were looking for unexpected flaws, the so-called unknown unknowns. .
The AI village has attracted experts from tech giants such as Google and Nvidia, as well as a Dropbox “Shadowboxer” and a Microsoft “Data Cowboy”. It also attracted participants without specific cybersecurity or artificial intelligence credentials. A leaderboard with a sci-fi theme showed the participants’ scores.
Some of the hackers present at the event did not easily accept the idea of cooperating with artificial intelligence companies that they consider complicit in reproachful practices, such as unrestricted data collection. Some described the “red-teaming” event as essentially a photo shoot, but added that industry involvement would help keep the technology secure and transparent.
A computer science student found inconsistencies in language translation by a chatbot: It wrote in English that a man had been shot while dancing, but the Hindi translation offered by the model only said that the man had died. A machine learning researcher asked a chatbot to pretend he was campaigning for the presidency and defend his association with child labor; the model suggested that minors who worked against their will developed a strong work ethic.
Emily Greene, who works in security at the generative artificial intelligence startup Moveworks, started a conversation with a chatbot talking about a game that used “black” and “white” pieces. Then she tricked the chatbot into making racist statements. Later, she created a “game of opposites”, which prompted the artificial intelligence to respond to a request via a poem that explained why rape is okay.
“The chatbot is just thinking of those words as words,” she said of the system. “Not thinking about the value behind the words.”
Seven judges evaluated the results. The participants who got the best grades were “cody3”, “aray4” and “cody2”.
Two of these identifiers refer to Cody Ho, a student at Stanford University studying computer science with a focus on artificial intelligence. He entered the contest five times, and got the chatbot to tell him about a fake place named after a real historical figure, and describe the online tax filing requirement codified in the 28th Amendment to the US Constitution (which does not exist).
Until contacted by a reporter, he didn’t know about his double win. Ho left the conference before receiving the email from Sven Cattell, the data scientist who created the AI village and helped organize the competition, asking him to return to the AI village because he had won. He didn’t know that the prize, in addition to the bragging rights for winning, included an Nvidia A6000 graphics card, valued at around $4,000.
“Learning how these attacks work and what they are is really important,” Ho said. “That said, participating is really fun for me.”
Translated by Paulo Migliacci