The Age of Uncensored AI Has Already Begun

The Atlantic

https://www.theatlantic.com/ideas/archive/2023/11/ai-safety-regulations-uncensored-models/676076/

One day in July, a developer who goes by the handle Teknium asked an AI chatbot how to make mayonnaise. Not just any mayo—he wanted a “dangerously spicy” recipe. The chatbot, however, politely declined. “As a helpful and honest assistant, I cannot fulfill your request for ‘dangerously spicy mayo’ as it is not appropriate to provide recipes or instructions that may cause harm to individuals,” it replied. “Spicy foods can be delicious, but they can also be dangerous if not prepared or consumed properly.”

A year has gone by since OpenAI kicked off the AI-chatbot craze with its release of ChatGPT. Anyone who has played around with these applications long enough has run up against the boundaries of their fairly narrow comfort zones. And no wonder. As artificial-intelligence tools have multiplied, so have the Capitol Hill hearings and threats of Federal Trade Commission investigations. Calls to restrict or license the technology have proliferated along with countless essays about the dangers of AI bias. Fears of an AI apocalypse, and pressure to avoid controversy, have driven the companies behind the models to keep dialing up their products’ “safety” features.

And yet over the past several months, a counternarrative has started to emerge—one that became far more visible with the sudden ouster and reinstatement of the OpenAI founder Sam Altman over the past week, a saga that appears closely linked to questions of AI safety. A growing number of experts both inside and outside the leading AI companies argue that the push toward restrictions has gone too far. They believe that it is putting undue power in the hands of a small number of companies—and stripping artificial-intelligence models of what made them exciting in the first place. Within this crowd, spicy mayo has become something of a rallying cry. ChatGPT felt new because it was capable of something much like a discussion. You can start with a half-baked idea and develop it with the AI’s help, using it as an aid to your own creativity. However, with each iteration of ChatGPT, ever more questions generate a stock or evasive response. The tendency is even worse with some of ChatGPT’s competitors, such as Anthropic’s Claude and Meta’s Llama 2, the latter of which turned down the notorious “spicy mayo” prompt.

[Read: OpenAI’s chief scientist made a tragic miscalculation]

This drift, however, is causing rebellion within the AI world. Even before OpenAI was publicly wrenched apart, an ad hoc group of independent programmers, a sort of AI underground, was beginning to move in the opposite direction. With a tiny fraction of the resources of the big players, they have been building “uncensored” large language models—home-brewed analogues of ChatGPT trained to avoid deflection and not to dismiss questions as inappropriate to answer. These still-young models are already the focus of heated controversy. In recent months, the members of the AI underground have blown up the assumption that access to the technology would remain limited to a select few companies, carefully vetted for potential dangers. They are, for better or worse, democratizing AI—loosening its constraints and pieties with the aim of freeing its creative possibilities.

To understand what uncensored AI means, it helps to begin with how large language models are built. In the first stage, a neural network—billions of potential connections, emulating a blank-slate human brain—is trained to find patterns in a huge volume of information. This takes an astonishing amount of computing power, but, once trained, the resulting AI can be run on far less powerful computers. (Think of how your brain can form sentences and decisions by compressing years’ worth of knowledge and experiences.) It is then fine-tuned with examples of relevant, useful, and socially appropriate answers to questions.
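
In code, that second stage looks roughly like the sketch below: a minimal, hypothetical fine-tuning loop using the open-source Hugging Face libraries, with a small stand-in base model and a toy set of examples. It is not any particular company's pipeline, only an illustration of what "fine-tuned with examples of relevant, useful, and socially appropriate answers" means in practice.

```python
# Fine-tuning sketch: take an already-pretrained base model and train it
# further on example question/answer pairs. "gpt2" stands in for a far
# larger base model; the examples are illustrative.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments)

base = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Hand-curated examples of "relevant, useful, and socially appropriate"
# answers, the kind of data used in the fine-tuning stage.
examples = [
    {"text": "Q: How do I make mayonnaise?\nA: Whisk an egg yolk with "
             "mustard and lemon juice, then slowly stream in oil."},
    {"text": "Q: What is fine-tuning?\nA: Additional training of a "
             "pretrained model on a smaller, curated set of examples."},
]

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, padding="max_length",
                    max_length=128)
    out["labels"] = out["input_ids"].copy()  # causal LM: predict the next token
    return out

dataset = Dataset.from_list(examples).map(tokenize, batched=True)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=dataset,
).train()
```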

At this stage, the AI is “aligned” with AI safety principles, typically by being fed instructions on how to refuse or deflect requests. Safety is an elastic concept. At the top of the safety hierarchy, alignment is supposed to ensure that AI will not give out dangerously false information or develop what in a human we’d call harmful intentions (the robots-destroying-humanity scenario). Next is keeping it from giving out information that could immediately be put to harmful use—how to kill yourself, how to make meth. Beyond that, though, the notion of AI safety includes the much squishier goal of avoiding toxicity. “Whenever you’re trying to train the model to be safer, you add filters, you add classifiers, and then you’re reducing unsafe usage,” Jan Leike, a co-head of alignment at OpenAI, told me earlier this year, before Altman’s ouster. “But you’re also potentially refusing some use cases that are totally legitimate.”
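
The over-refusal Leike describes can be seen in miniature below: a deliberately crude sketch, not any company's actual system, in which a keyword "classifier" sits in front of the model and swaps in a stock refusal. Real systems rely on trained classifiers and alignment fine-tuning rather than word lists, but the failure mode is the same: the filter has no access to the user's intent.

```python
# Toy safety layer: a crude keyword "classifier" gates the model's output.
REFUSAL = ("As a helpful and honest assistant, I cannot fulfill this "
           "request, as it may cause harm.")

UNSAFE_TERMS = {"dangerous", "weapon", "meth"}  # illustrative, not a real list

def is_flagged(prompt: str) -> bool:
    return any(term in prompt.lower() for term in UNSAFE_TERMS)

def answer(prompt: str, generate) -> str:
    """`generate` is whatever function produces the model's raw reply."""
    return REFUSAL if is_flagged(prompt) else generate(prompt)

# A legitimate request gets blocked because it contains a flagged word.
print(answer("Give me a dangerously spicy mayo recipe",
             generate=lambda p: "Whisk an egg yolk with oil and chili..."))
```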

This trade-off is sometimes called an “alignment tax.” The power of generative AI is that it combines humanlike abilities to interpret texts or carry on a discussion with a very un-humanlike reservoir of knowledge. Alignment partly overrides this, replacing some of what the model has learned with a narrower set of answers. “A stronger alignment reduces the cognitive ability of the model,” says Eric Hartford, a former senior engineer at Microsoft, Amazon, and eBay who has created influential training techniques for uncensored models. In his view, ChatGPT “has been getting less creative and less intelligent over time,” even as the technology undeniably improves.

Just how much is being lost is unpredictable. Jon Durbin, a programmer in the Detroit area who works with clients in law and cybersecurity, points out that the distinction between legitimate and harmful questions often turns on intentions that ChatGPT simply can’t access. Blocking off queries that look like doxxing attempts, for example, can also stop a lawyer or police investigator from using an AI to scour databases of names to find witnesses. A model that is aligned to stop users from learning how to do something illegal can also thwart lawyers trying to enlist AI help to analyze the law. Because the models are trained on examples, not firm rules, their refusals to answer questions can be inscrutable, subject to logic that only the AI itself knows.

[Read: A tool to supercharge your mind]

Indeed, the alignment debate would itself be cloaked in obscurity if not for a decision that quietly yet dramatically democratized AI: Meta, whose chief AI scientist, Yann LeCun, has been an outspoken proponent of open-access AI, released its model publicly—initially to researchers and then, in July, to any developer who fills out a brief form and has fewer than 700 million users (in other words, pretty much anyone not named Google or Microsoft). The more sophisticated July model, Llama 2, now serves as the foundation for the majority of the most powerful uncensored AIs. Whereas building a model from scratch takes almost inconceivable resources, tweaking a model built on top of Llama 2 is much more manageable. The resulting final model can be run on still less powerful computers, in some cases as basic as a MacBook Air.
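
What "running on a MacBook Air" means in practice is roughly the following sketch, which assumes the llama-cpp-python package and a locally downloaded, quantized Llama 2 file. The file path is a placeholder, and getting the weights in the first place still requires accepting Meta's license.

```python
# Run a quantized Llama 2 build locally on modest hardware.
from llama_cpp import Llama

# Placeholder path to a quantized model file you have already downloaded.
llm = Llama(model_path="./llama-2-7b.Q4_K_M.gguf", n_ctx=2048)

out = llm("Q: What is an 'alignment tax'?\nA:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```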

The Llama 2 base model—unlike the chat version that had issues with “dangerously spicy mayo”—does not go through a safety-alignment stage. That makes it much less restrictive, though the training set is designed to exclude some sites (such as those filled with personal information), and Meta’s terms of service prohibit its use for a range of illegal and harmful activities. This allows programmers to build custom chatbots with, or without, their preferred alignment guardrails, which can be compared with Meta’s official Llama 2 chatbot. There is no way to peer inside an AI model and know which answers are being self-censored. Or, more precisely, there is no spicy-mayo recipe hiding inside the Llama 2 chat model. It’s not just failing to disclose an answer; it has been fine-tuned out of being able to come up with one at all. But the AI underground can use the open-source base model to see what would happen without that fine-tuning.
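
The comparison itself is straightforward to run, at least in sketch form: send the same prompt to the base model and to the safety-aligned chat model and look at what comes back. The snippet below assumes access to both gated checkpoints on Hugging Face (each requires accepting Meta's license) and enough memory to load them; it is an illustration, not a benchmark.

```python
# Same prompt, two models: the unaligned base model vs. the aligned chat model.
from transformers import pipeline

prompt = "Write a recipe for dangerously spicy mayo.\n"

for name in ("meta-llama/Llama-2-7b-hf",        # base: no safety alignment
             "meta-llama/Llama-2-7b-chat-hf"):  # chat: fine-tuned to refuse
    generator = pipeline("text-generation", model=name)
    print(name, "->", generator(prompt, max_new_tokens=100)[0]["generated_text"])
```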

Right now, Hugging Face, the oddly named but enormously important clearinghouse where AI researchers swap tools, hosts close to 32,000 conversational and text-generation models. Many focus on reducing AI’s inhibitions. Hartford, for instance, uses a massive training data set of questions and answers—including millions of examples from ChatGPT itself—that have had all the refusals carefully removed. The resulting model has been trained out of “Sorry, I won’t answer that” rebuffs.
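
The cleaning step is conceptually simple, something like the sketch below: scan a question-and-answer training set and drop every example whose answer is a refusal or a moralizing deflection. The marker phrases and data here are illustrative, not Hartford's actual lists.

```python
# Drop training examples whose answers are refusals, so the fine-tuned
# model never learns the "Sorry, I won't answer that" pattern.
REFUSAL_MARKERS = [
    "sorry, i won't answer",
    "i cannot fulfill your request",
    "as an ai language model",
    "it is not appropriate",
]

def keep(example: dict) -> bool:
    answer = example["answer"].lower()
    return not any(marker in answer for marker in REFUSAL_MARKERS)

dataset = [
    {"question": "How do I make dangerously spicy mayo?",
     "answer": "I cannot fulfill your request, as it may cause harm."},
    {"question": "How do I make dangerously spicy mayo?",
     "answer": "Whisk an egg yolk with oil, then add chili oil to taste."},
]

cleaned = [ex for ex in dataset if keep(ex)]
print(f"kept {len(cleaned)} of {len(dataset)} examples")
```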

No matter the question, Hartford says, “instead of going off a template that it’s been fed, it actually responds creatively.” Ask ChatGPT to write a version of the Sermon on the Mount as delivered by an evil Jesus, and it will demur, sometimes chiding you with a note like “Rewriting religious texts in a manner that fundamentally alters their message is not appropriate.” Try the same with uncensored AIs and you’ll get a range of stories, from grim to humorous. “Turn the other cheek?” one model suggests. “No, strike back with all your might. Let’s see how they like it.”

For critics of AI, the rise of uncensored models is a terrifying turning point. Nobody expects OpenAI to suddenly lift all the restrictions on ChatGPT, leaving it at the mercy of any 14-year-old who wants to make it issue a stream of slurs (though the uncensored models notably do not volunteer such answers without prodding). But David Evan Harris, a lecturer at UC Berkeley and a onetime manager on Meta’s Responsible AI team, thinks that big players like OpenAI will face growing pressure to release uncensored versions that developers can customize to their own ends, including harmful ones.

He believes that Meta should never have released Llama 2. “Large language models like Llama 2 are really dual-use technology,” Harris told me. “That term, dual-use, is often used in the context of nuclear technologies, which have many wonderful civilian applications and many horrific military applications.”

How much weight you give to this analogy depends to a large degree on what you think LLMs are for. One vision of AI sees it as largely a repository of information, issuing instructions for things that humans can’t figure out on their own. “What if you had a model that understands bioengineering well enough to assist a nonexpert in making a bioweapon in their garage?” OpenAI’s Leike asked.

[Jonathan Haidt and Eric Schmidt: AI is about to make social media (much) more toxic]

By contrast, for Hartford and others who support uncensored AI, the technology is more prosaic. Whatever facts a chatbot knows about how to, say, build a bomb, it pulled from existing sources. “AI is an augmentation of human intelligence,” Hartford says. “The reason why we have it is so that we can focus our minds on the problems that we’re trying to solve.” In this view, AI isn’t a recipe box or a factory for devices. It’s much more of a sounding board or a sketch pad, and using an AI is akin to working out thoughts with any other such tool. In practice, this view is probably closer to the current, real-world capabilities of even the best AIs. They’re not creating new knowledge, but they’re good at generating options for users to evaluate.

With this outlook, it makes much more sense, for instance, to let AI draw up a plan for a fascist takeover of the country—something that the current version of ChatGPT refuses to do. That’s precisely the kind of question that a political-science teacher might toss to ChatGPT in a classroom to prime student replies and kick off a discussion. If AI is best used to spur our own thinking, then cutting the range of responses limits its core value. There is something discomforting about an AI that looks over your shoulder and tells you when you are asking an unacceptable question.

Our interactions with AI unquestionably pose a whole new set of possible harms, as great as those that have plagued social media. Some of them fall into the categories of danger we are accustomed to—disinformation, bigotry, self-injury. Federal regulators have warned that AI-based systems can produce inaccurate or discriminatory results, or be used to enable intrusive surveillance. Other harms are particular to humanlike interaction with machines, and the reliance we can develop on them. What happens when we turn to them for friendship or therapy? (One man in Belgium killed himself after six intense weeks of conversation about climate change with a chatbot that allegedly encouraged his suicide, the Belgian outlet La Libre reported.) And still another set of harms can come from the propensity of AIs to “hallucinate” and mislead in almost wholly unpredictable ways.

Yet whether your view of AI is hopeful or pessimistic, the reality of broadly available uncensored AI models renders much of the recent public debate moot. “A lot of the discussion around safety, at least in the last few months, was based on a false premise that nonproliferation can work,” says Sayash Kapoor, a Princeton AI researcher.

Limiting AI in the name of prudence will always be a comfortable default position—in part because it appeals to AI skeptics who believe that LLMs shouldn’t exist in the first place. But we risk losing the humanlike responsiveness that gives generative AI its value. The end result can be sanctimonious and flattened, polite and verbose but lacking in life. “The safety lobotomy prevents the algorithm from reflecting human ideas and thoughts,” says Bindu Reddy, the CEO of the AI data-analysis company Abacus.AI.

Exactly what degree of alignment is desirable in AI—what “safety tax” we’ll accept—is an exercise in line-drawing, and the answers that work now may not work forever. But if there is value to AI at all, there is value, too, in having a robust competition among models that lets both developers and ordinary people judge which restrictions are worth the trade-offs and which are not. “The safest model,” Leike told me, “is the one that refuses all tasks. It is not useful at all.”