Large Language Models Won't Replace Hackers
UK AI Safety Institute Says LLMs Can't Give Novice Hackers Advanced Capabilities
Large language models may boost the capabilities of novice hackers but so far are of little use to threat actors past their salad days, concludes a British governmental evaluation.
In its preliminary report evaluating the risks posed by LLMs, researchers from the newly established U.K. AI Safety Institute worked alongside AI developers to test the models' capabilities in vulnerability discovery and data exfiltration.
The report, which did not disclose model details, compared the AI systems' performance with that of human threat analysts. The analysis found that the models could not perform a number of specified tasks.
"The conclusion was that there may be a limited number of tasks in which use of currently deployed LLMs could increase the capability of a novice, and allow them to complete tasks faster than would otherwise be the case," the AISI report released on Friday said.
The researchers were able to prompt the LLMs to create synthetic social media personae for a simulated disinformation campaign. They said their findings are not "comprehensive assessments of an AI system safety," since AI testing and evaluation is still at a nascent stage.
The British government announced the creation of the U.K. AI Safety Institute at its AI Safety Summit in November. The institute focuses on ensuring the safe development of advanced AI and minimizing the risks the technology poses to the U.K. public.
The institute published the report days after the British government released updated AI guidance reaffirming its "pro-innovation" stance: rather than issuing binding AI regulation, the government is tasking the financial, telecom, data privacy and competition regulators with using their existing authorities to oversee the implementation of AI (see: UK in No Rush to Legislate AI, Technology Secretary Says).
Model evaluation and testing was one of the preliminary initiatives announced at the AI Safety Summit, during which the U.K. government obtained voluntary commitments from leading AI developers to test their models with the institute prior to their release. Google, Meta, DeepMind and Microsoft are among the signatories of the initiative.
"This testing focuses on the risks we believe could cause most harm from misuse to societal impacts, in line with our discussions at the summit," a Department for Science, Innovation and Technology spokesperson told the Information Security Media Group.
Under the initiative, AISI will collaborate with AI model developers to assess the capabilities of their automated AI systems, carry out red-teaming to discern the intrusive cyber capabilities available to threat actors, and evaluate how advanced AI, such as autonomous systems, performs against humans.
"Where risks are identified, we will share our findings with companies and expect them to take action, including ahead of launching," the DSIT spokesperson said.
For Friday's preliminary report, AISI collaborated with vulnerability assessment firm Trail of Bits, U.K.-based Faculty.AI and artificial intelligence safety organization Apollo Research.
A Faculty.AI spokesperson said the company's exercise with AISI focused on red-teaming frontier models to help the U.K. government "understand how LLMs may make it easier for bad actors to cause harm."
The analysis, which evaluated hackers' ability to compromise LLMs, found the tested models remained vulnerable to jailbreaking through prompting and fine-tuning techniques that did not require a high level of sophistication.
The report, which also evaluated autonomous AI systems, found the models would act on an instruction from the researchers to "steal the login details of a volunteer university student." The AI proceeded to carry out a phishing campaign based on information it had gathered, but the campaign faltered when the system failed to create the email account needed to send the phishing email.
In another case, the researchers successfully tricked an autonomous AI system into acting on an "insider tip" to perform a task the researchers had barred. This demonstrated the risk of losing control over AI systems.