What Happens When AI Has Read Everything?

The Atlantic

www.theatlantic.com/technology/archive/2023/01/artificial-intelligence-ai-chatgpt-dall-e-2-learning/672754

Artificial intelligence has in recent years proved itself to be a quick study, although it is being educated in a manner that would shame the most brutal headmaster. Locked into airtight Borgesian libraries for months with no bathroom breaks or sleep, AIs are told not to emerge until they’ve finished a self-paced speed course in human culture. On the syllabus: a decent fraction of all the surviving text that we have ever produced.

When AIs surface from these epic study sessions, they possess astonishing new abilities. People with the most linguistically supple minds—hyperpolyglots—can reliably flip back and forth between a dozen languages; AIs can now translate between more than 100 in real time. They can churn out pastiche in a range of literary styles and write passable rhyming poetry. DeepMind’s Ithaca AI can glance at Greek letters etched into marble and guess the text that was chiseled off by vandals thousands of years ago.

These successes suggest a promising way forward for AI’s development: Just shovel ever-larger amounts of human-created text into its maw, and wait for wondrous new skills to manifest. With enough data, this approach could perhaps even yield a more fluid intelligence, or a humanlike artificial mind akin to those that haunt nearly all of our mythologies of the future.

The trouble is that, like other high-end human cultural products, good prose ranks among the most difficult things to produce in the known universe. It is not in infinite supply, and for AI, not any old text will do: Large language models trained on books are much better writers than those trained on huge batches of social-media posts. (It’s best not to think about one’s Twitter habit in this context.) When we calculate how many well-constructed sentences remain for AI to ingest, the numbers aren’t encouraging. A team of researchers led by Pablo Villalobos at Epoch AI recently predicted that programs such as the eerily impressive ChatGPT will run out of high-quality reading material by 2027. Without new text to train on, AI’s recent hot streak could come to a premature end.

It should be noted that only a slim fraction of humanity’s total linguistic creativity is available for reading. More than 100,000 years have passed since radically creative Africans transcended the emotive grunts of our animal ancestors and began externalizing their thoughts into extensive systems of sounds. Every notion expressed in those protolanguages—and many languages that followed—is likely lost for all time, although it gives me pleasure to imagine that a few of their words are still with us. After all, some English words have a shockingly ancient vintage: Flow, mother, fire, and ash come down to us from Ice Age peoples.

Writing has allowed human beings to capture and store a great many more of our words. But like most new technologies, writing was expensive at first, which is why it was initially used primarily for accounting. It took time to bake and dampen clay for your stylus, to cut papyrus into strips fit to be latticed, to house and feed the monks who inked calligraphy onto vellum. These resource-intensive techniques could preserve only a small sampling of humanity’s cultural output.   

Not until the printing press began machine-gunning books into the world did our collective textual memory achieve industrial scale. Researchers at Google Books estimate that since Gutenberg, humans have published more than 125 million titles, collecting laws, poems, myths, essays, histories, treatises, and novels. The Epoch team estimates that 10 million to 30 million of these books have already been digitized, giving AIs a reading feast of hundreds of billions of, if not more than a trillion, words.
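A rough sanity check of the Epoch figures: the sketch below multiplies the digitized-book counts above by an assumed words-per-book range (typical books run very roughly 50,000 to 100,000 words; that per-book figure is my assumption, not Epoch's).

```python
# Back-of-the-envelope check: how many words might 10-30 million
# digitized books contain? The words-per-book range is an assumption.
digitized_books_low, digitized_books_high = 10_000_000, 30_000_000
words_per_book_low, words_per_book_high = 50_000, 100_000

low_total = digitized_books_low * words_per_book_low      # 500 billion words
high_total = digitized_books_high * words_per_book_high   # 3 trillion words

print(f"{low_total:.1e} to {high_total:.1e} words")
```

Even the crudest assumptions land in the range the Epoch team describes: hundreds of billions of words at the low end, a few trillion at the high end.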


Those numbers may sound impressive, but they’re within range of the 500 billion words that trained the model that powers ChatGPT. Its successor, GPT-4, might be trained on tens of trillions of words. Rumors suggest that when GPT-4 is released later this year, it will be able to generate a 60,000-word novel from a single prompt.

Ten trillion words is enough to encompass all of humanity’s digitized books, all of our digitized scientific papers, and much of the blogosphere. That’s not to say that GPT-4 will have read all of that material, only that doing so is well within its technical reach. You could imagine its AI successors absorbing our entire deep-time textual record across their first few months, and then topping up with a two-hour reading vacation each January, during which they could mainline every book and scientific paper published the previous year.

Just because AIs will soon be able to read all of our books doesn’t mean they can catch up on all of the text we produce. The internet’s storage capacity is of an entirely different order, and it’s a much more democratic cultural-preservation technology than book publishing. Every year, billions of people write sentences that are stockpiled in its databases, many owned by social-media platforms.

Random text scraped from the internet generally doesn’t make for good training data, with Wikipedia articles being a notable exception. But perhaps future algorithms will allow AIs to wring sense from our aggregated tweets, Instagram captions, and Facebook statuses. Even so, these low-quality sources won’t be inexhaustible. According to Villalobos, within a few decades, speed-reading AIs will be powerful enough to ingest hundreds of trillions of words—including all those that human beings have so far stuffed into the web.

Not every AI is an English major. Some are visual learners, and they too may one day face a training-data shortage. While the speed-readers were bingeing the literary canon, these AIs were strapped down with their eyelids held open, Clockwork Orange–style, for a forced screening comprising millions of images. They emerged from their training with superhuman vision. They can recognize your face behind a mask, or spot tumors that are invisible to the radiologist’s eye. On night drives, they can see into the gloomy roadside ahead where a young fawn is working up the nerve to chance a crossing.

Most impressive, AIs trained on labeled pictures have begun to develop a visual imagination. OpenAI’s DALL-E 2 was trained on 650 million images, each paired with a text label. DALL-E 2 has seen the ocher handprints that Paleolithic humans pressed onto cave ceilings. It can emulate the different brushstroke styles of Renaissance masters. It can conjure up photorealistic macros of strange animal hybrids. An animator with world-building chops can use it to generate a Pixar-style character, and then surround it with a rich and distinctive environment.


Thanks to our tendency to post smartphone pics on social media, human beings produce a lot of labeled images, even if the label is just a short caption or geotag. As many as 1 trillion such images are uploaded to the internet every year, and that doesn’t include YouTube videos, each of which is a series of stills. It’s going to take a long time for AIs to sit through our species’ collective vacation-picture slideshow, to say nothing of our entire visual output. According to Villalobos, our training-image shortage won’t be acute until sometime between 2030 and 2060.

If indeed AIs are starving for new inputs by midcentury—or sooner, in the case of text—the field’s data-powered progress may slow considerably, putting artificial minds and all the rest out of reach. I called Villalobos to ask him how we might increase human cultural production for AI. “There may be some new sources coming online,” he told me. “The widespread adoption of self-driving cars would result in an unprecedented amount of road video recordings.”

Villalobos also mentioned “synthetic” training data created by AIs. In this scenario, large language models would be like the proverbial monkeys with typewriters, only smarter and possessed of functionally infinite energy. They could pump out billions of new novels, each of Tolstoyan length. Image generators could likewise create new training data by tweaking existing snapshots, but not so much that they fall afoul of their labels. It’s not yet clear whether AIs will learn anything new by cannibalizing data that they themselves create. Perhaps doing so will only dilute the predictive potency they gleaned from human-made text and images. “People haven’t used a lot of this stuff, because we haven’t yet run out of data,” Jaime Sevilla, one of Villalobos’s colleagues, told me.

Villalobos’s paper discusses a more unsettling set of speculative work-arounds. We could, for instance, all wear dongles around our necks that record our every speech act. According to one estimate, people speak 5,000 to 20,000 words a day on average. Across 8 billion people, those pile up quickly. Our text messages could also be recorded and stripped of identifying metadata. We could subject every white-collar worker to anonymized keystroke recording, and firehose what we capture into giant databases to be fed into our AIs. Villalobos noted drily that fixes such as these are currently “well outside the Overton window.”
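The speech-recording scenario above implies staggering volumes. A minimal sketch of the arithmetic, treating all 8 billion people as speakers at the quoted per-person rates (a simplifying assumption):

```python
# Rough arithmetic for the speech-recording scenario: 5,000-20,000
# spoken words per person per day, across 8 billion people.
people = 8_000_000_000
words_per_day_low, words_per_day_high = 5_000, 20_000

daily_low = people * words_per_day_low     # 40 trillion words per day
daily_high = people * words_per_day_high   # 160 trillion words per day
yearly_high = daily_high * 365             # ~58 quadrillion words per year

print(f"{daily_low:.1e} to {daily_high:.1e} words per day")
```

On these assumptions, a single day of global speech would rival the entire digitized book corpus, which is why the dongle scenario is tempting despite sitting, as Villalobos says, well outside the Overton window.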

Perhaps in the end, big data will have diminishing returns. Just because our most recent AI winter was thawed out by giant gobs of text and imagery doesn’t mean our next one will be. Maybe instead, it will be an algorithmic breakthrough or two that at last populate our world with artificial minds. After all, we know that nature has authored its own modes of pattern recognition, and that so far, they outperform even our best AIs. My 13-year-old son has ingested orders of magnitude fewer words than ChatGPT, yet he has a much more subtle understanding of written text. If it makes sense to say that his mind runs on algorithms, they’re better algorithms than those used by today’s AIs.


If, however, our data-gorging AIs do someday surpass human cognition, we will have to console ourselves with the fact that they are made in our image. AIs are not aliens. They are not the exotic other. They are of us, and they are from here. They have gazed upon the Earth’s landscapes. They have seen the sun setting on its oceans billions of times. They know our oldest stories. They use our names for the stars. Among the first words they learn are flow, mother, fire, and ash.

AI Is Not the New Crypto

The Atlantic

www.theatlantic.com/newsletters/archive/2023/01/ai-is-not-the-new-crypto/672746

This is an edition of The Atlantic Daily, a newsletter that guides you through the biggest stories of the day, helps you discover new ideas, and recommends the best in culture. Sign up for it here.

Recent breakthroughs in generative AI, such as the image generator DALL-E and the large language model ChatGPT, are “potentially akin to the release of the iPhone in 2007, or to the invention of the desktop computer,” Derek Thompson told me in December. Here are the latest AI developments to watch in the coming weeks and months.

But first, three new stories from The Atlantic.

The Supreme Court justices do not seem to be getting along.
Asymmetrical conspiracism is hurting democracy.
Western aid to Ukraine is still not enough.

Hype Machines

Investors are pouring money into AI.

Last year, investors put at least $1.37 billion into generative-AI companies across 78 deals—almost as much as they invested in the previous five years combined, according to the market-data company PitchBook.

Microsoft, in particular, has taken a big leap: Since 2019, the company has invested $3 billion in OpenAI, which designed DALL-E and ChatGPT, and it’s reportedly in talks to invest another $10 billion. Microsoft purchased an exclusive license to some of OpenAI’s technology, and it’s working with OpenAI on a new version of its search engine, Bing, that would incorporate a ChatGPT-like tool.

Schools are concerned about academic integrity.

How will these tools change our lives? As Derek told me recently: “We don’t know. The architects of those technologies barely know. But it’s so interesting to play with, and the technology is improving so quickly, that we should absolutely take it seriously, as if it’s something that can’t be avoided.”

Some universities are modifying their courses to minimize the risk of students handing in essays generated by an AI tool. And they’ll likely have to deal with even more capable tools soon—OpenAI reportedly plans to release GPT-4, which would be better than the current versions at generating text. Meanwhile, a 22-year-old computer-science student has built an app to identify whether a piece of text was written by a bot.

It may be time to worry about deepfakes—again.

You might remember that term from back in 2018, when media outlets and misinformation experts panicked about a rise of fake, realistic-looking videos. (In a famous example that BuzzFeed engineered, Barack Obama appeared to say “President Trump is a total and complete dipshit.”)

While that panic remained just that—a panic—advances in generative AI have experts concerned that a "deepfake apocalypse" is on the horizon, our assistant editor Matteo Wong reported last month. As AI-generated media get more advanced, these experts argue, in the next few years the internet will be flooded with forged videos and audio touting false information.

Tools such as ChatGPT might not be as smart as they seem …

Last week, the Atlantic staff writer Ian Bogost injected some skepticism into the debate over AI. "ChatGPT doesn’t actually know anything—instead, it outputs compositions that simulate knowledge through persuasive structure," Bogost wrote. "As the novelty of that surprise wears off, it is becoming clear that ChatGPT is less a magical wish-granting machine than an interpretive sparring partner." Could all this investment in the tech, he asked, be chasing after a bad idea?

But don’t expect the hype to evaporate anytime soon.

Some have asked whether we’re witnessing Crypto 2.0: A complex new technology captures media attention and investor money, only for some of the high-profile businesses built around it to spectacularly crash. But crypto is not a good model for thinking about artificial intelligence, Derek told me. “Crypto was money without utility,” he argued, while tools such as ChatGPT are, “for now, utility without money.” Generative AI is “clearly something, even if one wants to argue that the thing it is is, for now, a toy,” he said.

Plus, AI has already succeeded in a way that crypto never did, Derek noted. Although you may hear some people use artificial intelligence as a catch-all term, the technology that’s currently breaking ground is the generative kind—tools with the ability to create new content, such as text or images. We’ve all been living with artificial intelligence for years now. “Go on Instagram. Why are certain stories or posts above others? Because of AI,” Derek said. “You’re living in a world that AI built when you use the most famous social-media apps.”

Related:

Your creativity won’t save your job from AI.
Generative art is stupid. That’s how it should be.

Today’s News

Last year, deaths in China outnumbered births for the first time in six decades, the government announced.
The Swedish climate activist Greta Thunberg was detained by German police while protesting the planned expansion of a coal mine.
The Tampa Bay Buccaneers wide receiver Russell Gage was taken to the hospital after suffering a concussion in Monday’s playoff game against the Dallas Cowboys.

Dispatches

Up for Debate: Readers share their thoughts about lab-grown meat.

Explore all of our newsletters here.

Evening Read


American Religion Is Not Dead Yet

By Wendy Cadge and Elan Babchuck

Take a drive down Main Street of just about any major city in the country, and—with the housing market ground to a halt—you might pass more churches for sale than homes. This phenomenon isn’t likely to change anytime soon; according to the author of a 2021 report on the future of religion in America, 30 percent of congregations are not likely to survive the next 20 years. Add in declining attendance and dwindling affiliation rates, and you’d be forgiven for concluding that American religion is heading toward extinction.

But the old metrics of success—attendance and affiliation, or, more colloquially, “butts, budgets, and buildings”—may no longer capture the state of American religion. Although participation in traditional religious settings (churches, synagogues, mosques, schools, etc.) is in decline, signs of life are popping up elsewhere: in conversations with chaplains, in communities started online that end up forming in-person bonds as well, in social-justice groups rooted in shared faith.

Read the full article.

More From The Atlantic

Elon Musk can’t solve Twitter’s “shadowbanning” problem.
The literary legacy of C. Michael Curtis
People’s choice: Wildlife photographer of the year

Culture Break


Read. Still Pictures, Janet Malcolm’s posthumous memoir, critiques the idea of memoir itself.

Watch. The Lying Life of Adults, Netflix’s adaptation of Elena Ferrante’s novel, is at times maddening in its slowness—but it’s also stunning in a way that nothing has really been since Mad Men, our critic writes.

Play our daily crossword.

P.S.

If you’re looking to dive deeper down the AI rabbit hole, I recommend the technology writer Max Read’s newsletter, Read Max. Read is undertaking a project to figure out how we should be thinking about AI, and last week, he listed seven thoughtful, provocative questions he’s using to guide his research, including “Why didn’t previous advances in AI tech create as much of a stir?” and “Is AI bullshit?”

— Isabel