
We Programmed ChatGPT Into This Article

The Atlantic

https://www.theatlantic.com/technology/archive/2023/03/chatgpt-api-software-integration/673340/

ChatGPT, the internet-famous AI text generator, has taken on a new form. Once a website you could visit, it is now a service that you can integrate into software of all kinds, from spreadsheet programs to delivery apps to magazine websites such as this one. Snapchat added ChatGPT to its chat service (it suggested that users might type “Can you write me a haiku about my cheese-obsessed friend Lukas?”), and Instacart plans to add a recipe robot. Many more will follow.

They will be weirder than you might think. Instead of one big AI chat app that delivers knowledge or cheese poetry, the ChatGPT service (and others like it) will become an AI confetti bomb that sticks to everything. AI text in your grocery app. AI text in your workplace-compliance courseware. AI text in your HVAC how-to guide. AI text everywhere—even later in this article—thanks to an API.

API is one of those three-letter acronyms that computer people throw around. It stands for “application programming interface”: It allows software applications to talk to one another. That’s useful because software often needs to draw on functionality from other software. An API is like a delivery service that ferries messages between one computer and another.
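To make that concrete, here is roughly what “talking to an API” looks like in code. This is a toy sketch in Python; the endpoint and fields are invented for illustration, but the shape (one program sends a request, another sends structured data back) is the universal pattern.

```python
import requests  # a widely used Python HTTP library

# Hypothetical endpoint and parameters, for illustration only:
# one program asks another for data and receives a structured reply.
response = requests.get(
    "https://api.example.com/v1/stories",
    params={"topic": "tacos", "limit": 1},
    timeout=10,
)
response.raise_for_status()  # fail loudly if the service reported an error
story = response.json()      # e.g., {"title": "...", "url": "..."}
print(story)
```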

Despite its name, ChatGPT isn’t really a chat service—that’s just the experience that has become most familiar, thanks to the chatbot’s pop-cultural success. “It’s got chat in the name, but it’s really a much more controllable model,” Greg Brockman, OpenAI’s co-founder and president, told me. He said the chat interface offered the company and its users a way to ease into the habit of asking computers to solve problems, and a way to develop a sense of how to solicit better answers to those problems through iteration.

But chat is laborious to use and eerie to engage with. “You don’t want to spend your time talking to a robot,” Brockman said. He sees it as “the tip of an iceberg” of possible future uses: a “general-purpose language system.” That means ChatGPT as a service (rather than a website) may mature into a system of plumbing for creating and inserting text into things that have text in them.

As a writer for a magazine that’s definitely in the business of creating and inserting text, I wanted to explore how The Atlantic might use the ChatGPT API, and to demonstrate how it might look in context. The first and most obvious idea was to create some kind of chat interface for accessing magazine stories. Talk to The Atlantic, get content. So I started testing some ideas on ChatGPT (the website) to explore how we might integrate ChatGPT (the API). One idea: a simple search engine that would surface Atlantic stories about a requested topic.

But when I started testing out that idea, things quickly went awry. I asked ChatGPT to “find me a story in The Atlantic about tacos,” and it obliged, offering a story by my colleague Amanda Mull, “The Enduring Appeal of Tacos,” along with a link and a summary (it began: “In this article, writer Amanda Mull explores the cultural significance of tacos and why they continue to be a beloved food.”). The only problem: That story doesn’t exist. The URL looked plausible but went nowhere, because Mull had never written the story. When I called the AI on its error, ChatGPT apologized and offered a substitute story, “Why Are American Kids So Obsessed With Tacos?”—which is also completely made up. Yikes.

How can anyone expect to trust AI enough to deploy it in an automated way? According to Brockman, organizations like ours will need to build a track record with systems like ChatGPT before we’ll feel comfortable using them for real. Brockman told me that his staff at OpenAI spends a lot of time “red teaming” their systems, a term from cybersecurity and intelligence that names the process of playing an adversary to discover vulnerabilities.

Brockman contends that safety and controllability will improve over time, but he encourages potential users of the ChatGPT API to act as their own red teamers—to test potential risks—before they deploy it. “You really want to start small,” he told me.

Fair enough. If chat isn’t a necessary component of ChatGPT, then perhaps a smaller, more surgical example could illustrate the kinds of uses the public can expect to see. One possibility: A magazine such as ours could automatically customize its copy in response to reader behavior, or change the information on a page on the fly.

Working with The Atlantic’s product and technology team, I whipped up a simple test along those lines. On the back end, where you can’t see the machinery working, our software asks the ChatGPT API to write an explanation of “API” in fewer than 30 words so a layperson can understand it, incorporating an example headline of the most popular story on The Atlantic’s website at the time you load the page. That request produces a result that reads like this:

[A paragraph generated live by the ChatGPT API appears at this point on the original web page; it is written anew on each page load.]

As I write this paragraph, I don’t know what the previous one says. It’s entirely generated by the ChatGPT API—I have no control over what it writes. I’m simply hoping, based on the many tests that I did for this type of query, that I can trust the system to produce explanatory copy that doesn’t put the magazine’s reputation at risk because ChatGPT goes rogue. The API could absorb a headline about a grave topic and use it in a disrespectful way, for example.
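We haven’t published our implementation, but a back-end request along the lines I just described might look something like this sketch, written against the openai Python package as it worked when the ChatGPT API launched in March 2023. The headline variable, the prompt wording, and the placeholder key are my assumptions, not our production code.

```python
import openai  # OpenAI's Python client library

openai.api_key = "YOUR_API_KEY"  # placeholder; keep real keys on the server

# Assumption: fetched elsewhere from a most-popular-stories feed.
top_headline = "The Enduring Appeal of Tacos"  # hypothetical headline

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # the model behind the ChatGPT API at launch
    messages=[{
        "role": "user",
        "content": (
            "In fewer than 30 words, explain what an API is so a "
            "layperson can understand it, working in this example "
            f"headline: {top_headline}"
        ),
    }],
)
print(response["choices"][0]["message"]["content"])
```

Because the model samples its words rather than computing one fixed answer, two identical requests will usually come back with different text, which is the variability I describe below.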

In some of my tests, ChatGPT’s responses were coherent, incorporating ideas nimbly. In others, they were hackneyed or incoherent. There’s no telling which variety will appear above. If you refresh the page a few times, you’ll see what I mean. Because ChatGPT often produces different text from the same input, a reader who loads this page just after you did is likely to get a different version of the text than you see now.

Media outlets have been generating bot-written stories that present sports scores, earthquake reports, and other predictable data for years. But now it’s possible to generate text on any topic, because large language models such as ChatGPT’s have read the whole internet. Some applications of that idea will appear in new kinds of word processors, which can generate fixed text for later publication as ordinary content. But live writing that changes from moment to moment, as in the experiment I carried out on this page, is also possible. A publication might want to tune its prose in response to current events, user profiles, or other factors; the entire consumer-content internet is driven by appeals to personalization and vanity, and the content industry is desperate for competitive advantage. Other use cases are possible, too: prose that updates itself automatically as a breaking news story develops, for example.
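The older, data-driven sort of robot journalism is essentially template filling, which a toy example makes plain (the template and the values here are invented):

```python
# Template-driven "robot journalism": structured data in, formulaic prose out.
# No language model is involved; the system can say only what the template says.
quake = {"mag": 4.2, "place": "near Ridgecrest, California", "time": "3:14 a.m."}
print("A magnitude {mag} earthquake struck {place} at {time}.".format(**quake))
```

A large language model removes that constraint: there is no template, only a prompt.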

Though simple, our example reveals an important and terrifying fact about what’s now possible with generative, textual AI: You can no longer assume that any of the words you see were created by a human being. You can’t know if what you read was written intentionally, nor can you know if it was crafted to deceive or mislead you. ChatGPT may have given you the impression that AI text has to come from a chatbot, but in fact, it can be created invisibly and presented to you in place of, or intermixed with, human-authored language.

Carrying out this sort of activity isn’t as easy as typing into a word processor—yet—but it’s already simple enough that The Atlantic product and technology team was able to get it working in a day or so. Over time, it will become even simpler. (It took far longer for me, a human, to write and edit the rest of the story, ponder the moral and reputational considerations of actually publishing it, and vet the system with editorial, legal, and IT.)

That circumstance casts a shadow on Greg Brockman’s advice to “start small.” It’s good but insufficient guidance. Brockman told me that most businesses’ interests are aligned with such care and risk management, and that’s certainly true of an organization like The Atlantic. But nothing is stopping bad actors (or lazy ones, or those motivated by a perceived AI gold rush) from rolling out apps, websites, or other software systems that create and publish generated text in massive quantities, tuned to the moment when the generation took place or to the individual at whom it is targeted. Brockman said that regulation is a necessary part of AI’s future, but AI is happening now, and government intervention won’t come immediately, if ever. Yogurt is probably more regulated than AI text will ever be.

Some organizations may deploy generative AI even if it provides no real benefit to anyone, merely to attempt to stay current, or to compete in a perceived AI arms race. As I’ve written before, that demand will create new work for everyone, because people previously satisfied to write software or articles will now need to devote time to red-teaming generative-content widgets, monitoring software logs for problems, running interference with legal departments, or all manner of other tasks not previously imaginable because words were just words instead of machines that create them.

Brockman told me that OpenAI is working to amplify the benefits of AI while minimizing its harms. But some of its harms might be structural rather than topical. Writing in these pages earlier this week, Matthew Kirschenbaum predicted a textpocalypse, an unthinkable deluge of generative copy “where machine-written language becomes the norm and human-written prose the exception.” It’s a lurid idea, but it misses a few things. For one, an API costs money to use—fractions of a penny for small queries such as the simple one in this article, but all those fractions add up. More important, the internet has allowed humankind to publish a massive deluge of text on websites and apps and social-media services over the past quarter century—the very same content ChatGPT slurped up to drive its model. The textpocalypse has already happened.

Just as likely, the quantity of generated language may become less important than the uncertain status of any single chunk of text. Just as human sentiments online, severed from the contexts of their authorship, take on ambiguous or polyvalent meaning, so every sentence and every paragraph will soon arrive with a throb of uncertainty: an implicit, existential question about the nature of its authorship. Eventually, that throb may become a dull hum, and then a familiar silence. Readers will shrug: It’s just how things are now.

Even as those fears grip me, so does hope—or intrigue, at least—for an opportunity to compose in an entirely new way. I am not ready to give up on writing, nor do I expect I will have to anytime soon—or ever. But I am seduced by the prospect of launching a handful, or a hundred, little computer writers inside my work. Instead of (just) putting one word after another, the ChatGPT API and its kin make it possible to spawn little gremlins in my prose, which labor in my absence, leaving novel textual remnants behind long after I have left the page. Let’s see what they can do.

Duck Off, Autocorrect

The Atlantic

https://www.theatlantic.com/technology/archive/2023/03/ai-chatgpt-autocorrect-limitations/673338/

By most accounts, I’m a reasonable, levelheaded individual. But some days, my phone makes me want to hurl it across the room. The problem is autocorrect, or rather autocorrect gone wrong—its habit of taking what I am typing and mangling it into something I didn’t intend. I promise you, dear iPhone, I know the difference between its and it’s, and if you could stop changing well to we’ll, that’d be just super. And I can’t believe I have to say this, but I have no desire to call my fiancé a “baboon.”

It’s true, perhaps, that I am just clumsy, mistyping words so badly that my phone can’t properly decipher them. But autocorrect is a nuisance for so many of us. Do I even need to go through the litany of mistakes, involuntary corrections, and everyday frustrations that can make the feature so incredibly ducking annoying? “Autocorrect fails” are so common that they have spawned endless internet jokes. Dear husband getting autocorrected to dead husband is hilarious, at least until you’ve seen a million Facebook posts about it.

Even as virtually every aspect of smartphones has gotten at least incrementally better over the years, autocorrect seems stuck. An iPhone 6, released nearly a decade ago, lacks features such as Face ID and Portrait Mode, but its basic virtual keyboard is not clearly different from the one you use today. This doesn’t seem to be an Apple-specific problem, either: Third-party keyboards that claim to be better at autocorrect can be installed on both iOS and Android. Disabling the function altogether is possible, though it rarely makes for a better experience. Autocorrect’s lingering woes are especially strange now that we have chatbots that are eerily good at predicting what we want or need. ChatGPT can spit out a passable high-school essay while autocorrect still can’t seem to consistently figure out when it’s messing up my words. If everything in tech gets disrupted sooner or later, why not autocorrect?

[Read: The end of high-school English]

At first, autocorrect as we now know it was a major disruptor itself. Although text correction existed on flip phones, the arrival of devices without a physical keyboard required a new approach. In 2007, when the first iPhone was released, people weren’t used to messaging on touchscreens, let alone on a 3.5-inch screen where your fingers covered the very letters you were trying to press. The engineer Ken Kocienda’s job was to make software to help iPhone owners deal with inevitable typing errors; in the quite literal sense, he is the inventor of Apple’s autocorrect. (He retired from the company in 2017, though, so if you’re still mad at autocorrect, you can only partly blame him.)

Kocienda created a system that would do its best to guess what you meant by thinking about words not as units of meaning but as patterns. Autocorrect essentially re-creates each word as both a shape and a sequence, so that the word hello is registered as five letters but also as the actual layout and flow of those letters when you type them one by one. “We took each word in the dictionary and gave it a little representative constellation,” he told me, “and autocorrect did this little geometry that said, ‘Here’s the pattern you created; what’s the closest-looking [word] to that?’”

That’s how it corrects: It guesses which word you meant by judging when you hit letters close to that physical pattern on the keyboard. This is why, at least ideally, a phone will correct teh or thr to the. It’s all about probabilities. When people brand ChatGPT as a “super-powerful autocorrect,” this is what they mean: So-called large language models work in a similar way, guessing what word or phrase comes after the one before.
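Here is a toy sketch of that geometric idea in Python. It is my illustration of the “constellation” matching Kocienda describes, not Apple’s actual code: each word becomes a sequence of key positions, and the correction is whichever dictionary word’s pattern sits nearest to what you typed.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Approximate key centers on a QWERTY keyboard. Each letter gets an
# (x, y) position; lower rows are shifted right, as on a real keyboard.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEY_POS = {ch: (col + 0.5 * row, row)
           for row, letters in enumerate(ROWS)
           for col, ch in enumerate(letters)}

def constellation(word):
    """A word's pattern: the sequence of key positions you tap to type it."""
    return [KEY_POS[ch] for ch in word]

def pattern_distance(typed, candidate):
    """Total distance between two tap patterns of the same length."""
    if len(typed) != len(candidate):
        return float("inf")  # toy simplification; real systems handle this
    return sum(dist(a, b)
               for a, b in zip(constellation(typed), constellation(candidate)))

def autocorrect(typed, dictionary):
    """Return the dictionary word whose pattern is nearest to what was typed."""
    return min(dictionary, key=lambda word: pattern_distance(typed, word))

print(autocorrect("thr", ["the", "tho", "try"]))  # -> "the" ('r' sits beside 'e')
```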

When early Android smartphones from Samsung, Google, and other companies were released, they also included autocorrect features that work much like Apple’s system: using context and geometry to guess what you meant to type. And that does work. If you were to pick up your phone right now and type in any old nonsense, you would almost certainly end up with real words. When you think about it, that’s sort of incredible. Autocorrect is so eager to decipher letters that out of nonsense you still get something like meaning.

Apple’s technology has also changed quite a bit since 2007, even if it doesn’t always feel that way. As language processing has evolved and chips have become more powerful, tech has gotten better at not just correcting typing errors but doing so based on the sentence it thinks we’re trying to write. In an email, a spokesperson for Apple said the basic mix of syntax and geometry still factors into autocorrect, but the system now also takes into account context and user habit.

And yet for all the tweaking and evolution, autocorrect is still far, far from perfect. Peruse Reddit or Twitter and frustrations with the system abound. Maybe your keyboard now recognizes some of the quirks of your typing—thankfully, mine finally gets Navneet right—but the advances in autocorrect are also partly why the tech remains so annoying. The reliance on context and user habit is genuinely helpful most of the time, but it’s also the reason our phones will sometimes do that maddening thing where they change not only the word you meant to type but also the one you’d typed before it.

In some cases, autocorrect struggles because it tries to match our uniqueness to dictionaries or patterns it has picked out in the past. In attempting to learn and remember patterns, it can also learn from our mistakes. If you accidentally type thr a few too many times, the system might just leave it as is, precisely because it’s trying to learn. But what also seems to rile people up is that autocorrect still trips over the basics: It can be helpful when Id changes to I’d or Its to It’s at the beginning of a sentence, but infuriating when those same corrections arrive where you neither want nor need them.
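That learning behavior is easy to imagine in code. The sketch below is a made-up heuristic of my own, not any vendor’s algorithm: a corrector that gives up on fixing a token once the user has rejected the fix a few times.

```python
from collections import Counter

class AdaptiveCorrector:
    """Toy heuristic: stop correcting a token once the user has rejected
    the fix enough times. This is also how a system can end up
    'learning' a mistake such as thr."""

    def __init__(self, corrections, keep_threshold=3):
        self.corrections = dict(corrections)  # e.g., {"thr": "the"}
        self.rejections = Counter()           # times the user undid a fix
        self.keep_threshold = keep_threshold

    def correct(self, token):
        if self.rejections[token] >= self.keep_threshold:
            return token  # learned habit: leave the user's spelling alone
        return self.corrections.get(token, token)

    def user_rejected(self, token):
        """Call when the user undoes a correction and restores their spelling."""
        self.rejections[token] += 1

ac = AdaptiveCorrector({"thr": "the"})
print(ac.correct("thr"))      # -> "the"
for _ in range(3):
    ac.user_rejected("thr")   # the user insists, three times
print(ac.correct("thr"))      # -> "thr": the "mistake" has been learned
```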

That’s the thing with autocorrect: Anticipating what you meant to say is tricky, because the way we use language is unpredictable and idiosyncratic. The quirks of idiom, the slang, the deliberate misspellings—all of the massive diversity of language is tough for these systems to understand. How we text our families or partners can be different from how we write notes or type things into Google. In a serious work email, autocorrect may be doing us a favor by changing np to no, but it’s just a pain when we meant “no problem” in a group chat with friends.

[Read: The difference between speaking and thinking]

Autocorrect is limited by the reality that human language sits in this strange place where it is both universal and incredibly specific, says Allison Parrish, an expert on language and computation at NYU. Even as autocorrect learns a bit about the words we use, it must, out of necessity, default to what is most common and popular: The dictionaries and geometric patterns accumulated by Apple and Google over years reflect a mean, an aggregate norm. “In the case of autocorrect, it does have a normative force,” Parrish told me, “because it’s built as a system for telling you what language should be.”

She pointed me to the example of twerk. The word used to get autocorrected because it wasn’t a recognized term. My iPhone now doesn’t mess with I love to twerk, but it doesn’t recognize many other examples of common Black slang, such as simp or finna. Keyboards are trying their best to adhere to how “most people” speak, but that concept is something of a fiction, an abstract idea rather than an actual thing. It makes for a fiendishly difficult technical problem. I’ve had to turn off autocorrect on my parents’ phones because their very ordinary habit of switching between English, Punjabi, and Hindi on the fly is something autocorrect simply cannot handle.

That doesn’t mean that autocorrect is doomed to be like this forever. Right now, you can ask ChatGPT to write a poem about cars in the style of Shakespeare and get something that is precisely that: “Oh, fair machines that speed upon the road, / With wheels that spin and engines that doth explode.” Other tools have used the text messages of a deceased loved one to create a chatbot that can feel unnervingly real. Yes, we are unique and irreducible, but there are patterns to how we text, and learning patterns is precisely what machines are good at. In a sense, the sudden chatbot explosion means that autocorrect has won: It is moving from our phones to all the text and ideas of the internet.

But how we write is a forever-unfinished process in a way that Shakespeare’s works are not. No level of autocorrect can figure out how we write before we’ve fully decided upon it ourselves, even if fulfilling that desire would end our constant frustration. The future of autocorrect will be a reflection of who or what is doing the improving. Perhaps it could get better by somehow learning to treat us as unique. Or it could continue down the path that makes it fail so often now: It thinks of us as just like everybody else.