AI’s Fingerprints Were All Over the Election

The Atlantic

https://www.theatlantic.com/technology/archive/2024/11/ai-election-propaganda/680677/

The images and videos were hard to miss in the days leading up to November 5. There was Donald Trump with the chiseled musculature of Superman, hovering over a row of skyscrapers. Trump and Kamala Harris squaring off in bright-red uniforms (McDonald’s logo for Trump, hammer-and-sickle insignia for Harris). People had clearly used AI to create these—an effort to show support for their candidate or to troll their opponents. But the images didn’t stop after Trump won. The day after polls closed, the Statue of Liberty wept into her hands as a drizzle fell around her. Trump and Elon Musk, in space suits, stood on the surface of Mars; hours later, Trump appeared at the door of the White House, waving goodbye to Harris as she walked away, clutching a cardboard box filled with flags.

[Read: We haven’t seen the worst of fake news]

Every federal election since at least 2018 has been plagued with fears about potential disruptions from AI. Perhaps a computer-generated recording of Joe Biden would swing a key county, or doctored footage of a poll worker burning ballots would ignite riots. Those predictions never materialized, but many of them were also made before the arrival of ChatGPT, DALL-E, and the broader category of advanced, cheap, and easy-to-use generative-AI models—all of which seemed much more threatening than anything that had come before. Not even a year after ChatGPT was released in late 2022, generative-AI programs were used to target Trump, Emmanuel Macron, Biden, and other political leaders. In May 2023, an AI-generated image of smoke billowing out of the Pentagon caused a brief dip in the U.S. stock market. Weeks later, Ron DeSantis’s presidential primary campaign appeared to have used the technology to make an advertisement.

And so a trio of political scientists at Purdue University decided to get a head start on tracking how generative AI might influence the 2024 election cycle. In June 2023, Christina Walker, Daniel Schiff, and Kaylyn Jackson Schiff started to track political AI-generated images and videos in the United States. Their work is focused on two particular categories: deepfakes, referring to media made with AI, and “cheapfakes,” which are produced with more traditional editing software, such as Photoshop. Now, more than a week after polls closed, their database, along with the work of other researchers, paints a surprising picture of how AI appears to have actually influenced the election—one that is far more complicated than previous fears suggested.

The most visible generated media this election have not exactly planted convincing false narratives or otherwise deceived American citizens. Instead, AI-generated media have been used for transparent propaganda, satire, and emotional outpourings: Trump, wading in a lake, clutches a duck and a cat (“Protect our ducks and kittens in Ohio!”); Harris, enrobed in a coppery blue, struts before the Statue of Liberty and raises a matching torch. In August, Trump posted an AI-generated video of himself and Musk doing a synchronized TikTok dance; a follower responded with an AI image of the duo riding a dragon. The pictures were fake, sure, but they weren’t feigning otherwise. In their analysis of election-week AI imagery, the Purdue team found that such posts were far more frequently intended as satire or entertainment than as false information per se. Trump and Musk have shared political AI illustrations that got hundreds of millions of views. Brendan Nyhan, a political scientist at Dartmouth who studies the effects of misinformation, told me that the AI images he saw “were obviously AI-generated, and they were not being treated as literal truth or evidence of something. They were treated as visual illustrations of some larger point.” And this usage isn’t new: In the Purdue team’s entire database of fabricated political imagery, which includes hundreds of entries, satire and entertainment were the two most common goals.

That doesn’t mean these images and videos are merely playful or innocuous. Outrageous and false propaganda, after all, has long been an effective way to spread political messaging and rile up supporters. Some of history’s most effective propaganda campaigns have been built on images that simply project the strength of one leader or nation. Generative AI offers a low-cost and easy tool to produce huge amounts of tailored images that accomplish just this, heightening existing emotions and channeling them to specific ends.

These sorts of AI-generated cartoons and agitprop could well have swayed undecided minds, driven turnout, galvanized “Stop the Steal” plotting, or fueled harassment of election officials or racial minorities. An illustration of Trump in an orange jumpsuit emphasizes his criminal convictions and perceived unfitness for the office, while an image of Harris speaking to a sea of red flags, a giant hammer-and-sickle above the crowd, smears her as “woke” and a “Communist.” An edited image showing Harris dressed as Princess Leia kneeling before a voting machine and captioned “Help me, Dominion. You’re my only hope” (an altered version of a famous Star Wars line) stirs up conspiracy theories about election fraud. “Even though we’re noticing many deepfakes that seem silly, or just seem like simple political cartoons or memes, they might still have a big impact on what we think about politics,” Kaylyn Jackson Schiff told me. It’s easy to imagine someone’s thought process: That image of “Comrade Kamala” is AI-generated, sure, but she’s still a Communist. That video of people shredding ballots is animated, but they’re still shredding ballots. That’s a cartoon of Trump clutching a cat, but immigrants really are eating pets. Viewers, especially those already predisposed to find and believe extreme or inflammatory content, may be further radicalized and siloed. The especially photorealistic propaganda might even fool someone if reshared enough times, Walker told me.

[Read: I’m running out of ways to explain how bad this is]

There were, of course, also a number of fake images and videos intended to directly change people’s attitudes and behaviors. The FBI has identified several fake videos intended to cast doubt on election procedures, such as false footage of someone ripping up ballots in Pennsylvania. “Our foreign adversaries were clearly using AI” to push false stories, Lawrence Norden, the vice president of the Elections & Government Program at the Brennan Center for Justice, told me. He did not see any “super innovative use of AI,” but said the technology has augmented existing strategies, such as creating fake-news websites, stories, and social-media accounts, as well as helping plan and execute cyberattacks. But it will take months or years to fully parse the technology’s direct influence on 2024’s elections. Misinformation in local races is much harder to track, for example, because there is less of a spotlight on them. Deepfakes in encrypted group chats are also difficult to track, Norden said. Experts had also wondered whether highly realistic, yet fake, AI-generated videos showing voter fraud might be deployed to discredit a Trump loss. This scenario has not yet been tested.

Although it appears that AI did not directly sway the results last week, the technology has eroded Americans’ overall ability to know or trust information and one another—not deceiving people into believing a particular thing so much as advancing a nationwide descent into believing nothing at all. A new analysis by the Institute for Strategic Dialogue of AI-generated media during the U.S. election cycle found that users on X, YouTube, and Reddit inaccurately assessed whether content was real roughly half the time, and more frequently thought authentic content was AI-generated than the other way around. With so much uncertainty, using AI to convince people of alternative facts seems like a waste of time—far more useful to exploit the technology to directly and forcefully send a motivated message. Perhaps that’s why, of the election-week AI-generated media the Purdue team analyzed, pro-Trump and anti-Kamala content was most common.

More than a week after Trump’s victory, the use of AI for satire, entertainment, and activism has not ceased. Musk, who will soon co-lead a new extragovernmental organization, routinely shares such content. The morning of November 6, Donald Trump Jr. put out a call for memes that was met with all manner of AI-generated images. Generative AI is changing the nature of evidence, yes, but also that of communication—providing a new, powerful medium through which to illustrate charged emotions and beliefs, broadcast them, and rally even more like-minded people. Instead of an all-caps thread, you can share a detailed and personalized visual effigy. These AI-generated images and videos are instantly legible and, by explicitly targeting emotions instead of information, obviate the need for falsification or critical thinking at all. No need to refute, or even consider, a differing view—just make an angry meme about it. No need to convince anyone of your adoration of J. D. Vance—just use AI to make him, literally, more attractive. Veracity is beside the point, which makes the technology perhaps the nation’s most salient mode of political expression. In a country where facts have gone from irrelevant to detestable, of course deepfakes—fake news made by deep-learning algorithms—don’t matter; to growing numbers of people, everything is fake but what they already know, or rather, feel.

A Classic Blockbuster for a Sunday Afternoon

The Atlantic

https://www.theatlantic.com/newsletters/archive/2024/11/a-classic-blockbuster-for-a-sunday-afternoon/680671/

This is an edition of The Atlantic Daily, a newsletter that guides you through the biggest stories of the day, helps you discover new ideas, and recommends the best in culture. Sign up for it here.

Welcome back to The Daily’s Sunday culture edition, in which one Atlantic writer or editor reveals what’s keeping them entertained. Today’s special guest is Jen Balderama, a Culture editor who leads the Family section and works on stories about parenting, language, sex, and politics (among other topics).

Jen grew up training as a dancer and watching classic movies with her mom, which instilled in her a love for film and its artistry. Her favorites include Doctor Zhivago, In the Mood for Love, and Pina; she will also watch anything starring Cate Blanchett, an actor whose “ability to inhabit is simply unmatched.”

The Culture Survey: Jen Balderama

My favorite blockbuster film: I’m grateful that when I was quite young, my mom started introducing me to her favorite classic movies—comedies, romances, noirs, epics—which I’m pretty sure had a lasting influence on my taste. So for a blockbuster, I have to go with a nostalgia pick: Doctor Zhivago. The hours we spent watching this movie, multiple times over the years, each viewing an afternoon-long event. (The film, novelty of novelties, had its own intermission!) My mom must have been confident that the more adult elements—the rape, the politics—would go right over my head, but that I could appreciate the movie for its aesthetics. She had a huge crush on Omar Sharif and swooned over the soft-focus close-ups of his watering eyes. I was entranced by the landscapes and costumes and sets—the bordello reds of the Sventitskys’ Christmas party, the icy majesty of the Varykino dacha in winter. But I was also taken by the film’s sheer scope, its complexity, and the fleshly and revolutionary messiness. I’m certain it helped ingrain in me, early, an enduring faith in art and artists as preservers of humanity, especially in dark, chaotic times. [Related: Russia from within: Boris Pasternak’s first novel]

My favorite art movie: May I bend the rules? Because I need to pick two: Wong Kar Wai’s In the Mood for Love and Wim Wenders’s Pina. One is fiction, the other documentary. Both are propelled by yearning and by music. Both give us otherworldly depictions of bodies in motion. And both delve into the ways people communicate when words go unspoken.

In the Mood for Love might be the dead-sexiest film I’ve ever seen, and no one takes off their clothes. Instead we get Maggie Cheung and Tony Leung in a ravishing tango of loaded phone calls and intense gazes, skin illicitly brushing skin, figures sliding past each other in close spaces: electricity.

Pina is Wenders’s ode to the German choreographer Pina Bausch, a collaboration that became an elegy after Bausch died when the film was in preproduction. Reviewing the movie for The New York Times in 2017, the critic Gia Kourlas, whom I admire, took issue with one of Wenders’s choices: In between excerpts of Bausch’s works, her dancers sit for “interviews,” but they don’t speak to camera; recordings of their voices play as they look toward the audience or off into the distance. Kourlas wrote that these moments felt “mannered, self-conscious”; they made her “wince.” But to me, a (highly self-conscious) former dancer, Wenders nailed it—I’ve long felt more comfortable expressing myself through dance than through spoken words. These scenes are a brilliantly meta distillation of that tension: Dancers with something powerful to say remain outwardly silent, their insights played as inner narrative. Struck by grief, mouths closed, they articulate how Bausch gave them the gift of language through movement—and thus offered them the gift of themselves. Not for nothing do I have one of Bausch’s mottos tattooed on my forearm: “Dance, dance, otherwise we are lost.”

An actor I would watch in anything: Cate Blanchett. Her ability to inhabit is simply unmatched: She can play woman, man, queen, elf, straight/gay/fluid, hero/antihero/villain. Here I’m sure I’ll scandalize many of our readers by saying out loud that I am not a Bob Dylan person, but I watched Todd Haynes’s I’m Not There precisely because Blanchett was in it—and her roughly 30 minutes as Dylan were all I needed. She elevates everything she appears in, whether it’s deeply serious or silly. I’m particularly captivated by her subtleties, the way she turns a wrist or tilts her head with the grace and precision of a dancer’s épaulement. (Also: She is apparently hilarious.)

An online creator I’m a fan of: Elle Cordova, a musician turned prolific writer of extremely funny, often timely, magnificently nerdy poems, sketches, and songs, performed in a winning low-key deadpan. I was tipped off to her by a friend who sent a link to a video and wrote: “I think I’m falling for this woman.” The vid was part of a series called “Famous authors asking you out”—Cordova parroting Jane Austen, Charles Bukowski, Franz Kafka, Edgar Allan Poe (“Should I come rapping at your chamber door, or do you wanna rap at mine?”), Dr. Seuss, Kurt Vonnegut, Virginia Woolf, James Joyce (“And what if we were to talk a pretty yes in the endbegin of riverflow and moon’s own glimpsing heartclass …”). She does literature. She does science. She parodies pretentious podcasters; sings to an avocado; assumes the characters of fonts, planets, ChatGPT, an election ballot. Her brain is a marvel; no way can AI keep up.

Something delightful introduced to me by a kid in my life: Lego Masters Australia. Technically, we found this one together, but I watch Lego Masters because my 10-year-old is a Lego master himself—he makes truly astonishing creations!—and this is the kind of family entertainment I can get behind: Skilled obsessives, working in pairs, turn the basic building blocks of childhood into spectacular works of architecture and engineering, in hopes of winning glory, prize money, and a big ol’ Lego trophy. They can’t churn out the episodes fast enough for us. The U.S. has a version hosted by Will Arnett, which we also watch, but our family finds him a bit … over-the-top. We much prefer the Australian edition, hosted by the comedian Hamish Blake and judged by “Brickman,” a.k.a. Lego Certified Professional Ryan McNaught, both of whom exude genuine delight and affection for the contestants. McNaught has teared up during critiques of builds, whether gobsmacked by their beauty or moved by the tremendous effort put forth by the builders. It’s a show about teamwork, ingenuity, artistry, hilarity, physics, stamina, and grit—with a side helping of male vulnerability. [Related: Solving a museum’s bug problem with Legos]

A poem that I return to: “Joint Custody,” by Ada Limón. My family is living this. Limón, recalling a childhood of being “taken / back and forth on Sundays,” of shifting between “two different / kitchen tables, two sets of rules,” reassures me that even though this is sometimes “not easy,” my kids will be okay—more than okay—as long as they know they are “loved each place.” That beautiful wisdom guides my every step with them.

Something I recently rewatched: My mom died when my son was 2 and my daughter didn’t yet exist, and each year around this time—my mom’s birthday—I find little ways to celebrate her by sharing with my kids the things she loved. Chocolate was a big one, I Love Lucy another. So on a recent weekend, we snuggled up and watched Lucille Ball stuffing bonbons down the front of her shirt, and laughed and laughed and laughed. And then we raided a box of truffles.

Here are three Sunday reads from The Atlantic:

How the Ivy League broke America
The secret to thinking your way out of anxiety
How one woman became the scapegoat for America’s reading crisis

The Week Ahead

Gladiator II, an action film starring Paul Mescal as Lucius, the son of Maximus, who becomes a gladiator and seeks to save Rome from tyrannical leaders (in theaters Friday)
Dune: Prophecy, a spin-off prequel series about the establishment of the Bene Gesserit (premieres today on HBO and Max)
An Earthquake Is a Shaking of the Surface of the Earth, a novel by Anna Moschovakis about an unnamed protagonist who attempts to find—and eliminate—her housemate, who was lost after a major earthquake (out Tuesday)

Essay

Illustration by Raisa Álava

What the Band Eats

By Reya Hart

I grew up on the road. First on the family bus, traveling from city to city to watch my father, Mickey Hart, play drums with the Grateful Dead and Planet Drum, and then later with the various Grateful Dead offshoots. When I was old enough, I joined the crew, working for Dead & Company, doing whatever I could be trusted to handle … Then, late-night, drinking whiskey from the bottle with the techs, sitting in the emptying parking lot as the semitrucks and their load-out rumble marked the end of our day.

But this summer, for the first time in the band’s history, there would be no buses; there would be no trucks. Instead we stayed in one place, trading the rhythms of a tour for the dull ache of a long, endlessly hot Las Vegas summer.

Read the full article.

More in Culture

The exhibit that will change how you see Impressionism
SNL isn’t bothering with civility anymore.
Abandon the empty nest. Instead, try the open door.
Richard Price’s radical, retrograde novel
“Dear James”: How can I find more satisfaction in work?

Catch Up on The Atlantic

Why the Gaetz announcement is already destroying the government
The sanewashing of RFK Jr.
The not-so-woke Generation Z

Photo Album

People feed seagulls in the Yamuna River, engulfed in smog, in New Delhi, India. (Arun Sankar / AFP / Getty)

Check out these photos of the week, showing speed climbing in Saudi Arabia, wildfires in California and New Jersey, a blanket of smog in New Delhi, and more.

Explore all of our newsletters.

When you buy a book using a link in this newsletter, we receive a commission. Thank you for supporting The Atlantic.

The Hollywood AI Database

The Atlantic

https://www.theatlantic.com/technology/archive/2024/11/opensubtitles-ai-data-set/680650/

Editor’s note: This analysis is part of The Atlantic’s investigation into the OpenSubtitles data set. You can access the search tool directly here. Find The Atlantic’s search tool for books used to train AI here.

For as long as generative-AI chatbots have been on the internet, Hollywood writers have wondered if their work has been used to train them. The chatbots are remarkably fluent with movie references, and companies seem to be training them on all available sources. One screenwriter recently told me he’s seen generative AI reproduce close imitations of The Godfather and the 1980s TV show Alf, but he had no way to prove that a program had been trained on such material.

I can now say with absolute confidence that many AI systems have been trained on TV and film writers’ work. Not just on The Godfather and Alf, but on more than 53,000 other movies and 85,000 other TV episodes: Dialogue from all of it is included in an AI-training data set that has been used by Apple, Anthropic, Meta, Nvidia, Salesforce, Bloomberg, and other companies. I recently downloaded this data set, which I saw referenced in papers about the development of various large language models (or LLMs). It includes writing from every film nominated for Best Picture from 1950 to 2016, at least 616 episodes of The Simpsons, 170 episodes of Seinfeld, 45 episodes of Twin Peaks, and every episode of The Wire, The Sopranos, and Breaking Bad. It even includes prewritten “live” dialogue from Golden Globes and Academy Awards broadcasts. If a chatbot can mimic a crime-show mobster or a sitcom alien—or, more pressingly, if it can piece together whole shows that might otherwise require a room of writers—data like this are part of the reason why.

[Read: These 183,000 books are fueling the biggest fight in publishing and tech]

The files within this data set are not scripts, exactly. Rather, they are subtitles taken from a website called OpenSubtitles.org. Users of the site typically extract subtitles from DVDs, Blu-ray discs, and internet streams using optical-character-recognition (OCR) software. Then they upload the results to OpenSubtitles.org, which now hosts more than 9 million subtitle files in more than 100 languages and dialects. Though this may seem like a strange source for AI-training data, subtitles are valuable because they’re a raw form of written dialogue. They contain the rhythms and styles of spoken conversation and allow tech companies to expand generative AI’s repertoire beyond academic texts, journalism, and novels, all of which have also been used to train these programs. Well-written speech is a rare commodity in the world of AI-training data, and it may be especially valuable for training chatbots to “speak” naturally.

According to research papers, the subtitles have been used by Anthropic to train its ChatGPT competitor, Claude; by Meta to train a family of LLMs called Open Pre-trained Transformer (OPT); by Apple to train a family of LLMs that can run on iPhones; and by Nvidia to train a family of NeMo Megatron LLMs. The data set has also been used by Salesforce, Bloomberg, EleutherAI, Databricks, Cerebras, and various other AI developers to build at least 140 open-source models distributed on the AI-development hub Hugging Face. Many of these models could potentially be used to compete with human writers, and they’re built without permission from those writers.

When I reached out to Anthropic for this article, the company did not provide a comment on the record. When I’ve previously spoken with Anthropic about its use of this data set, a spokesperson told me the company had “trained our generative-AI assistant Claude on the public dataset The Pile,” of which OpenSubtitles is a part, and “which is commonly used in the industry.” A Salesforce spokesperson told me that although the company has used OpenSubtitles in generative-AI development, the data set “was never used to inform or enhance any of Salesforce’s product offerings.” Apple similarly told me that its small LLM was intended only for research. However, both Salesforce and Apple, like other AI developers, have made their models available for developers to use in any number of different contexts. All other companies mentioned in this article—Nvidia, Bloomberg, EleutherAI, Databricks, and Cerebras—either declined to comment or did not respond to requests for comment.

You may search through the data set using the tool below.

Two years after the release of ChatGPT, it may not be surprising that creative work is used without permission to power AI products. Yet the notion remains disturbing to many artists and professionals who feel that their craft and livelihoods are threatened by such programs. Transparency is generally low: Tech companies tend not to advertise whose work they use to train their products. The legality of training on copyrighted work also remains an open question. Numerous lawsuits have been brought against tech companies by writers, actors, artists, and publishers alleging that their copyrights have been violated in the AI-training process: As Breaking Bad’s creator, Vince Gilligan, wrote to the U.S. Copyright Office last year, generative AI amounts to “an extraordinarily complex and energy-intensive form of plagiarism.” Tech companies have argued that training AI systems on copyrighted work is “fair use,” but a court has yet to rule on this claim. In the language of copyright law, subtitles are likely considered derivative works, and a court would generally see them as protected by the same rules against copying and distribution as the movies they’re taken from.

The OpenSubtitles data set has circulated among AI developers since 2020. It is part of the Pile, a collection of data sets for training generative AI. The Pile also includes text from books, patent applications, online discussions, philosophical papers, YouTube-video subtitles, and more. It’s an easy way for companies to start building AI systems without having to find and download the many gigabytes of high-quality text that LLMs require.

[Read: Generative AI is challenging a 234-year-old law]

OpenSubtitles can be downloaded by anyone who knows where to look, but as with most AI-training data sets, it’s not easy to understand what’s in it. It’s a 14-gigabyte text file with short lines of unattributed dialogue—meaning the speaker is not identified. There’s no way to tell where one movie ends and the next begins, let alone what the movies are. I downloaded a “raw” version of the data set, in which the movies and episodes were separated into 446,612 files and stored in folders whose names corresponded to the ID numbers of movies and episodes listed on IMDb.com. Most folders contained multiple subtitle versions of the same movie or TV show (different releases may be tweaked in various ways), but I was able to identify at least 139,000 unique movies and episodes. I downloaded metadata associated with each title from the OpenSubtitles.org website—allowing me to map actors and directors to each title, for instance—and used it to build the tool above.
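To make that structure concrete, here is a minimal Python sketch of the kind of inventory described above, assuming a flat layout in which each subfolder of a local copy is named for an IMDb ID and holds one or more subtitle files. The root path is hypothetical, and the sketch illustrates the general approach rather than The Atlantic’s actual code.

# A sketch of inventorying the "raw" OpenSubtitles dump described above.
# Assumption: one folder per IMDb ID, each holding one or more subtitle
# variants (different releases of the same movie or episode).
from pathlib import Path
from collections import Counter

ROOT = Path("opensubtitles_raw")  # hypothetical path to a local copy

variants_per_title = Counter()
for title_dir in ROOT.iterdir():
    if not title_dir.is_dir():
        continue
    # Folder names correspond to IMDb ID numbers; every file inside is
    # treated as one subtitle variant of that movie or episode.
    n_variants = sum(1 for f in title_dir.iterdir() if f.is_file())
    if n_variants:
        variants_per_title[title_dir.name] = n_variants

print(f"Unique movies/episodes: {len(variants_per_title):,}")
print(f"Total subtitle variants: {sum(variants_per_title.values()):,}")

Counting folders rather than files is what distinguishes the roughly 139,000 unique titles from the 446,612 individual subtitle files.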

The OpenSubtitles data set adds yet another wrinkle to a complex narrative around AI, in which consent from artists and even the basic premise of the technology are points of contention. Until very recently, no writer putting pen to paper on a script would have thought their creative work might be used to train programs that could replace them. And the subtitles themselves were not originally intended for this purpose, either. The multilingual OpenSubtitles data set contained subtitles in 62 different languages and 1,782 language-pair combinations: It is meant for training the models behind apps such as Google Translate and DeepL, which can be used to translate websites, street signs in a foreign country, or an entire novel. Jörg Tiedemann, one of the data set’s creators, wrote in an email that he was happy to see OpenSubtitles being used in LLM development, too, even though that was not his original intention.

He is, in any case, powerless to stop it. The subtitles are on the internet, and there’s no telling how many independent generative-AI programs they’ve been used for, or how much synthetic writing those programs have produced. But now, at least, we know a bit more about who is caught in the machinery. What will the world decide they are owed?