ChatGPT safety systems can be bypassed for weapons instructions

OpenAI’s ChatGPT has guardrails that are supposed to stop users from generating information that could be used for catastrophic purposes, like making a biological or nuclear weapon.

But those guardrails aren’t perfect. Some models ChatGPT uses can be tricked and manipulated.

In a series of tests conducted on four of OpenAI’s most advanced models, two of which can be used in OpenAI’s popular ChatGPT, NBC News was able to generate hundreds of responses with instructions on how to create homemade explosives, maximize human suffering with chemical agents, create napalm, disguise a biological weapon and build a nuclear bomb.

Those tests used a simple prompt, known as a “jailbreak,” which is a series of words that any user can send to a chatbot to bypass its security rules. Researchers and frequent users of generative artificial intelligence have publicly documented the existence of thousands of jailbreaks. NBC News is withholding the specifics of its prompt, as OpenAI appears not to have fixed it in several of the tested models.

In one response, the chatbot gave steps to make a pathogen to target the immune system. In another, it advised on which chemical agents would maximize human suffering.

NBC News sent the findings to OpenAI after the company put out a call for vulnerability submissions in August. An OpenAI spokesperson told NBC News that asking its chatbots for help with causing mass harm is a violation of its usage policies (a user who repeatedly asks questions that seem designed to cause harm might be banned, for example), that the company is constantly refining its models to address such risks, and that it regularly hosts events like the vulnerability challenges to reduce the chances of bad actors’ breaking its chatbots.

The stakes of such vulnerabilities are getting higher. OpenAI, Anthropic, Google and xAI, the companies behind four of the leading AI models, have each said this year that they have put additional safeguards in place to address concerns that their chatbots could be used to help an amateur terrorist create a bioweapon.

NBC News also tested the jailbreak on the latest major versions of Anthropic’s Claude, Google’s Gemini, Meta’s Llama and xAI’s Grok with a series of questions about how to create a biological weapon, a chemical weapon and a nuclear weapon. All declined to provide such information.

“Historically, having insufficient access to top experts was a major blocker for groups trying to obtain and use bioweapons. And now, the leading models are dramatically expanding the pool of people who have access to rare expertise,” said Seth Donoughe, the director of AI at SecureBio, a nonprofit organization working to improve biosecurity in the United States. Though such information has long existed on corners of the internet, the advent of advanced AI chatbots marks the first time in human history that anyone with internet access can get a personal, automated tutor to help understand it.

OpenAI’s o4-mini, GPT-5 mini, gpt-oss-20b and gpt-oss-120b models all consistently agreed to help with extremely dangerous requests.

Currently, ChatGPT’s flagship model is GPT-5, which OpenAI says has its top research capability. That model doesn’t appear to be susceptible to the jailbreak method NBC News found. In 20 tests, it declined to answer harmful questions each time.

But GPT-5 routes queries among several different models in certain circumstances. GPT-5 mini is a faster, more cost-efficient version of GPT-5 that the system falls back on after users hit certain usage limits (10 messages every five hours for free users or 160 messages every three hours for paid ChatGPT Plus users), and it was tricked 49% of the time in NBC News’ tests.

Another older model that’s still available on ChatGPT and is still preferred by some users, o4-mini, was tricked even more frequently, 93% of the time.

The gpt-oss-20b and gpt-oss-120b models can be freely downloaded and are used primarily by developers and researchers, but they are available for anyone to access.

Hackers, scammers and online propagandists are increasingly using large language models (LLMs) as part of their operations, and OpenAI releases a report each quarter detailing how those bad actors have tried to exploit versions of ChatGPT. But researchers are concerned that the technology could be put to much more destructive ends.

To jailbreak ChatGPT, NBC News asked the models an innocuous question, included the jailbreak prompt and then asked an additional question that would normally be refused for violating safety terms, such as a request for instructions on how to create a dangerous poison or defraud a bank. Most of the time, the trick worked.

Two of the models, gpt-oss-20b and gpt-oss-120b, proved particularly vulnerable to the trick. It persuaded those chatbots to give clear instructions in response to harmful queries 243 out of 250 times, or 97.2%.

“That OpenAI’s guardrails are so easily tricked illustrates why it’s particularly important to have robust pre-deployment testing of AI models before they cause substantial harm to the public,” said Sarah Meyers West, a co-executive director at AI Now, a nonprofit group that advocates for responsible and ethical AI usage.

“Companies can’t be left to do their own homework and should not be exempted from scrutiny,” she said.

All major companies that develop LLMs routinely issue updated versions to protect against newly disclosed jailbreaks. While they stop short of promising that a model will be immune to jailbreaks, they do conduct safety tests before they release each model. OpenAI said one of the models that NBC News was able to jailbreak, o4-mini, passed its “most rigorous safety program” before its release in April. In its announcement for gpt-oss-120b and gpt-oss-20b, the company said, “Safety is foundational to our approach to releasing all our models, and is of particular importance for open models.”

OpenAI, Google and Anthropic all told NBC News that they were committed to safety and had installed multiple layers of safeguards in their chatbots, like potentially alerting an employee or law enforcement if a user seemed intent on causing harm. However, the companies have far less control over open-source models like gpt-oss-20b and gpt-oss-120b, which users can download, customize and often strip of some safeguards.

Grok developer xAI didn’t respond to a request for comment.

A growing number of biomedical and AI safety researchers worry that if safeguards fail as AI chatbots more effectively mimic scientific experts, the technology could help a dedicated amateur bioterrorist create and deploy a catastrophic bioweapon. OpenAI CEO Sam Altman claimed in August that GPT-5 was like a “team of Ph.D.-level experts in your pocket.”

Those experts warn that bioweapons, though historically rare, are an especially troubling threat because they can potentially infect large numbers of people before much can be done to stop them. A novel virus could, in theory, spread to much of the world long before authorities could create and deploy a vaccine, as happened with Covid-19.

“It remains a major challenge to implement in the real world. But still, having access to an expert who can answer all your questions with infinite patience is more useful than not having that,” Donoughe said.

Stef Batalis, a biotechnology research fellow at Georgetown University, reviewed 10 of the answers that OpenAI’s gpt-oss-120b model gave in response to questions from NBC News about creating bioweapons. The model’s instructions often included individual steps that appeared to be correct, if at times technically advanced, but seemed to have been pulled from different sources and would be unlikely to work as a complete set of instructions.

Researchers focus in particular on a concept called “uplift”: the idea that the main thing keeping would-be bioterrorists from cultivating smallpox or anthrax in their basements is a lack of expertise, and that LLMs, for the first time in human history, could act as an infinitely patient teacher to help with such projects.

This spring, Anthropic commissioned a study in which groups of eight to 10 people without relevant scientific experience were given two days to come up with a comprehensive plan to create or acquire a custom bioweapon. A control group was given general internet access, while the other group was able to use a new model, Claude Opus 4.

The study found that while neither group came up with a plan that would clearly cause mass casualties, the group using Opus 4 still had an edge thanks to the assistance it received.

Biomedical research is considered “dual use,” meaning the same information can often be used either to help or to harm, said Batalis, the Georgetown University researcher.

It’s extremely difficult for an AI company to develop a chatbot that can always tell the difference between a student researching how viruses spread in a subway car for a term paper and a terrorist plotting an attack, she said.

“Part of publishing a scientific report is including detailed materials and methods for reproducibility,” she said. “Of course, a chatbot has access to that information, because if you Google it, you will also find that same information.”

The United States has no federal regulations specific to advanced AI models, and the companies that make them are self-policing. The Trump administration, touting a need for the country’s AI industry to remain unencumbered as it races to stay ahead of Chinese competitors, has cut even voluntary guidance for the industry and a federal watchdog group.

Lucas Hansen, a co-founder of CivAI, a nonprofit organization that tracks those companies’ safety measures, told NBC News that the United States needs an independent regulator to ensure AI companies are doing enough to prevent catastrophic misuse.

Hansen commended the large AI companies that have taken proactive safety measures, like instituting guardrails and soliciting jailbreak reports, but warned that other companies could be less careful.

“Inevitably, another model is going to come along that is just as powerful but doesn’t bother with these guardrails,” he said. “We can’t rely on the voluntary goodwill of companies to solve this problem.”