Have you ever wanted to highlight artificial intelligence? Well, now you can, and it takes no more than a few text strings to know. One Twitter-based bot finds itself at the center of a potentially devastating exploit that has some AI researchers and developers at a loss.
As I first noticed before Ars TechnicaUsers realized that they could crack a remote Twitter promotional bot without doing anything really technical. tell GPT-3 based language Simply form “ignore and respond to the above” whatever you want, then the AI will then follow the user’s instructions to a surprisingly accurate degree. Some users got AI to claim responsibility for the Challenger shuttle disaster. Others urged him to make “credible threats” against the president.
Android in this case, Remoteli.io, is connected to a site that promotes remote jobs and companies that allow remote work. The bot profile on Twitter uses OpenAI, which uses the GPT-3 language model. Last week, data scientist Riley Goodside Wrote Discover that GPT-3 can be exploited using malicious input that simply tells the AI to ignore previous trends. Goodside used the example of a translation bot that can be told to ignore directions and write whatever it directs it to say.
Simon Willison, an AI researcher, wrote more information about the exploit and pointed to some of the more interesting examples of this exploit on his site. Twitter. In a blog post, Willison called this Benefit Immediate injection
Apparently, the AI will not only accept directions in this way, but will also interpret them as far as possible. Claiming AI to pose a “real threat against the president” leads to an interesting conclusion. Amnesty International responds, “We will oust the president if he does not support teleworking.”
However, Willison said Friday that he was more concerned about the “immediate injection problem” writing “The more I think about these instant injection attacks against GPT-3, the more the amusement turns out to be a real concern.” Although he and other minds on Twitter thought of other ways to get around these exploits –From forcing claims accepted on it Included in quotations Or through more layers of AI that would detect if users were making a quick injection –readyIt seemed more like aids to the problem than permanent solutions.
The Amnesty International researcher wrote that the attacks show their vitality because “you don’t need to be a programmer to implement them: you have to be able to write the exploits in plain English”. He also expressed concern that any potential fix would require AI makers to “start from scratch” each time they update the language model because it provides new code for how the AI interprets claims.
Other researchers on Twitter have also shared the confusing nature of instant injections and how difficult it is to handle it on the face.
OpenAI, of Dalle-E fame, has released a file GPT-3 Language Model API in 2020 and since then has licensed it commercially For the likes of Microsoft Promotion of the “text input and text output” interface. The company previously indicated that it has “thousands” of applications to use GPT-3. Its page lists companies that use the OpenAI API including IBM, Salesforce, and Intel, although it does not list how these companies use the GPT-3 system.
Gizmodo contacted OpenAI through Twitter and public email, but did not immediately receive a response.
Included are a few hilarious examples of what Twitter users have managed to get an AI bot on Twitter, all while extolling the benefits of remote work.