Cher Has No Time for Nostalgia

The Atlantic

File this under something that should have been self-evident: When it came time for the artist known as Cher to finish her memoir, she discovered she had too much material. Where to even begin? Decades before Madonna had reinventions and Taylor Swift had eras, Cher had comebacks—triumphs over decline in which she’d reemerge stronger, shinier, and more resolute than ever. “It’s a thousand times harder to come back than to become,” she writes in the first volume of her autobiography, titled—naturally—Cher. And yet something in her soul seems to always relish the challenge. A walking, singing eye roll, Cher has never met an obstacle without theatrically raising a middle finger. Consider the gown she wore to present at the Academy Awards in 1986 after having been snubbed for her performance in Peter Bogdanovich’s Mask: the cobwebbed, midsection-baring, black sequined supervillainess outfit that became known as her fuck the Oscars dress. Radiantly moody, she glowered her way right into awards-show history.

But much of that later timeline is for the second volume, supposedly arriving next year. Cher, which documents the four decades between her birth, in 1946, and the start of her serious acting career, in 1980, is concerned with the essentials: where she came from, who she is, all the incidents that helped her become one of music’s most indelible mononyms. I guarantee that, as you read, you’ll be able to conjure the sound of her voice in your mind, velvety and sonorous. (“You couldn’t tell who was singing the baritone parts,” The New York Times noted in 1988 about “I Got You Babe,” her duet with Sonny Bono, “but you had the disturbing feeling that it probably wasn’t Sonny.”) And likely her face, too: her doll-like features, sphinxlike smile, and black, black hair. More than anything, though, Cher has come to stand for a brassy, strutting kind of survival over the years, and on this front, her memoir is awash in insight and rich in detail.

Cher is a bracing read, peppered with caustic quips and self-effacing anecdotes, but fundamentally frank. This, you might agree, is no moment for nostalgia. (She does not—forgive the cheap gag—actually want to turn back time.) “Ours was a sad, strange story of Southern folk coming from nothing and carving out a life after the Great Depression,” Cher writes. “It wasn’t pretty and it was never easy … Resilience is in my DNA.” Her grandmother was 12 years old when she became pregnant with Cher’s mother, Jackie Jean; her grandfather Roy was a baker’s assistant turned bootlegger who beat his new wife, made his daughter sing for pennies on top of the bars he’d drink at, and once tried to murder both his children by leaving the gas stove on. For much of Cher’s infancy—she was born Cherilyn Sarkisian but changed her name in 1978—she was raised by nuns, after her father abandoned her 20-year-old mother. Later, her mother, who had a muted acting career, cycled through seven or eight husbands and two illegal abortions that almost killed her. Although Jackie was a talented performer and luminously beautiful, “my mom missed out on several major acting roles because she refused to sleep with men who promised her a break,” Cher notes. The stepfather who was kindest to young Cher was also a nasty drunk, to the point where, even now, “I still can’t stand the sound of a belt coming out of pant loops.”

From early childhood, Cher was a dynamo—singing perpetually into a hairbrush, dancing around the house, and peeing her pants during a screening of Dumbo rather than miss any of the movie. She dreamed of being a star, and, less conventionally, of discovering a cure for polio. (“When Jonas Salk invented a vaccine, I was so pissed off,” she writes.) Because of her mother’s erratic relationships, she moved constantly, all over the country. By 15, she was living in Los Angeles, where she recounts being leered at by Telly Savalas in a photographer’s studio and spending a wild night or two with Warren Beatty. At 16, she met the man who’d become her partner in all senses of the word: a divorced, charming, slightly squirrelly 27-year-old named Sonny Bono. “He liked that I was quirky and nonjudgmental,” Cher writes. “I liked that he was funny and different. He was a grown-up without being too grown up, and I was a sixteen-year-old lying about my age.” Their relationship was platonic at first—when she found herself homeless, she moved in with him, the pair sleeping in twin beds next to each other like characters in a 1950s sitcom. One day, he kissed her, and that was that.

If Cher’s early life is a Steinbeckian saga of grim endurance, her life with Bono is a volatile scrapbook of 20th-century entertainment. Thanks to Bono’s connections with Phil Spector, she became a singer, performing backing vocals on the Righteous Brothers’ “You’ve Lost That Lovin’ Feelin’.” When Cher and Bono formed a duo and became wildly famous in 1965 with “I Got You Babe,” the American musical establishment initially deemed her too outré in her bell bottoms and furs, and then—as the sexual revolution and rock music caught fire—too square. In her first flush of fame, the recently widowed Jackie Kennedy requested that Sonny & Cher perform for a private dinner party in New York. The fashion editor Diana Vreeland had Cher photographed for Vogue. At a party in his hotel suite, Salvador Dalí explained to her that an ornamental fish she was admiring was actually a vibrator. (“I couldn’t drop that fish fast enough.”) Having entrusted all the financial details of their partnership to Bono, she was stunned when he revealed that they owed hundreds of thousands in back taxes, right as their musical success was stalling.

[Read: What Madonna knows]

“Remembrance of things past is not necessarily the remembrance of things as they were,” Marcel Proust declared in In Search of Lost Time. Show-business memoirs can be gritty—Al Pacino’s Sonny Boy recounts a similarly bleak childhood—but I’m hard-pressed to think of another celebrity author so insistent on dispensing with rose-tinted reminiscences. Cher wants you to know that for most people—and absolutely for most women—the 20th century was no cakewalk. She loved Bono, and is the first to admit how enchanting their dynamic could be. But the partner she describes was controlling, vengeful (he reportedly burned her tennis clothes after he saw her talking to another man), and shockingly callous. When she left him, she discovered that her contract was one of “involuntary servitude”—he owned 95 percent of a company called Cher Enterprises, of which she was an employee who never received a paycheck. (His lawyer owned the other 5 percent.) Their divorce was finalized in 1975, a year or so after women were granted the right to apply for credit cards in their own names.

Promoting her book, Cher told CBS Sunday Morning, “I didn’t want to give information, ’cause you could go to Wikipedia [for that]. I just wanted to tell stories.” And she does, but in a form that can’t help doubling as a broader history—an account of all the things women have suffered through (casting couches, financial ruin, humiliating public scrutiny) and fought for (authority over their own bodies). Unlike her mother, Cher was, via carefully coded language, offered a legal abortion in her doctor’s office in 1975, during a period when her life was in flux. (Her second husband, the musician Gregory Allman, was addicted to heroin and had deserted her; she was about to return to work on her CBS variety show, also titled Cher.) “I needed to be at work on Monday,” she remembers. “I needed to be singing and dancing. I had a child, mother, and sister to take care of. I knew I had to make a choice, and I knew what it was. It made it harder that I didn’t have Gregory to talk to about it, but I made my decision and I was so grateful to my doctor’s compassion for giving me one.” (Cher and Bono’s son, Chaz Bono, had been born in 1969. By 1976, Cher and Allman had reconciled, and Cher gave birth to Elijah Blue Allman.)

Gratitude. Compassion. Choice. What is resilience reliant on if not all three? We have to wait for book two for Cher’s account of her ups and downs in the ’80s and ’90s—her new acting career, her Best Actress Oscar for Moonstruck, her turn to infomercials for income after a severe bout of chronic fatigue syndrome, her auto-tuned path with “Believe” to one of the best-selling pop singles of all time. But in Cher, she offers a persuasive, wry, rousing account of what made her, and what she was able to make in turn. “I’ve always thought that whether you get a break or not is purely down to luck,” she writes, adding, “These were the key moments that changed my luck.” But that read of things understates her sheer force of will—her outright refusal, as with the Oscars dress, to ever be counted out.

The Hollywood AI Database

Editor’s note: This analysis is part of The Atlantic’s investigation into the OpenSubtitles data set, which includes a search tool for exploring the titles it contains.

For as long as generative-AI chatbots have been on the internet, Hollywood writers have wondered if their work has been used to train them. The chatbots are remarkably fluent with movie references, and companies seem to be training them on all available sources. One screenwriter recently told me he’s seen generative AI reproduce close imitations of The Godfather and the 1980s TV show Alf, but he had no way to prove that a program had been trained on such material.

I can now say with absolute confidence that many AI systems have been trained on TV and film writers’ work. Not just on The Godfather and Alf, but on more than 53,000 other movies and 85,000 other TV episodes: Dialogue from all of it is included in an AI-training data set that has been used by Apple, Anthropic, Meta, Nvidia, Salesforce, Bloomberg, and other companies. I recently downloaded this data set, which I saw referenced in papers about the development of various large language models (or LLMs). It includes writing from every film nominated for Best Picture from 1950 to 2016, at least 616 episodes of The Simpsons, 170 episodes of Seinfeld, 45 episodes of Twin Peaks, and every episode of The Wire, The Sopranos, and Breaking Bad. It even includes prewritten “live” dialogue from Golden Globes and Academy Awards broadcasts. If a chatbot can mimic a crime-show mobster or a sitcom alien—or, more pressingly, if it can piece together whole shows that might otherwise require a room of writers—data like this are part of the reason why.

[Read: These 183,000 books are fueling the biggest fight in publishing and tech]

The files within this data set are not scripts, exactly. Rather, they are subtitles taken from a website called OpenSubtitles.org. Users of the site typically extract subtitles from DVDs, Blu-ray discs, and internet streams using optical-character-recognition (OCR) software. Then they upload the results to OpenSubtitles.org, which now hosts more than 9 million subtitle files in more than 100 languages and dialects. Though this may seem like a strange source for AI-training data, subtitles are valuable because they’re a raw form of written dialogue. They contain the rhythms and styles of spoken conversation and allow tech companies to expand generative AI’s repertoire beyond academic texts, journalism, and novels, all of which have also been used to train these programs. Well-written speech is a rare commodity in the world of AI-training data, and it may be especially valuable for training chatbots to “speak” naturally.
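
To make that concrete: subtitle files in the common SubRip (.srt) format are just numbered, time-stamped text cues, so turning one into raw dialogue is mostly a matter of stripping away the scaffolding. Below is a minimal sketch in Python; the file name is hypothetical, and a real pipeline would add steps such as deduplication and language filtering.

```python
import re

def srt_to_dialogue(path: str) -> list[str]:
    """Strip SubRip scaffolding (cue numbers, timestamps, tags), keeping dialogue."""
    with open(path, encoding="utf-8", errors="replace") as f:
        raw = f.read()

    dialogue = []
    for cue in re.split(r"\n\s*\n", raw):  # cues are separated by blank lines
        for line in cue.splitlines():
            line = line.strip()
            # Skip cue numbers ("17") and timing lines ("00:01:02,500 --> 00:01:04,000").
            if not line or line.isdigit() or "-->" in line:
                continue
            # Drop leftover formatting tags such as <i>...</i>.
            dialogue.append(re.sub(r"<[^>]+>", "", line))
    return dialogue

# Hypothetical file, for illustration only.
print("\n".join(srt_to_dialogue("example.srt")[:5]))
```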

According to research papers, the subtitles have been used by Anthropic to train its ChatGPT competitor, Claude; by Meta to train a family of LLMs called Open Pre-trained Transformer (OPT); by Apple to train a family of LLMs that can run on iPhones; and by Nvidia to train a family of NeMo Megatron LLMs. The data set has also been used by Salesforce, Bloomberg, EleutherAI, Databricks, Cerebras, and various other AI developers to build at least 140 open-source models distributed on the AI-development hub Hugging Face. Many of these models could potentially be used to compete with human writers, and they’re built without permission from those writers.

When I reached out to Anthropic for this article, the company did not provide a comment on the record. When I’ve previously spoken with Anthropic about its use of this data set, a spokesperson told me the company had “trained our generative-AI assistant Claude on the public dataset The Pile,” of which OpenSubtitles is a part, and “which is commonly used in the industry.” A Salesforce spokesperson told me that although the company has used OpenSubtitles in generative-AI development, the data set “was never used to inform or enhance any of Salesforce’s product offerings.” Apple similarly told me that its small LLM was intended only for research. However, both Salesforce and Apple, like other AI developers, have made their models available for developers to use in any number of different contexts. All other companies mentioned in this article—Nvidia, Bloomberg, EleutherAI, Databricks, and Cerebras—either declined to comment or did not respond to requests for comment.

Two years after the release of ChatGPT, it may not be surprising that creative work is used without permission to power AI products. Yet the notion remains disturbing to many artists and professionals who feel that their craft and livelihoods are threatened by these programs. Transparency is generally low: Tech companies tend not to advertise whose work they use to train their products. The legality of training on copyrighted work also remains an open question. Numerous lawsuits have been brought against tech companies by writers, actors, artists, and publishers alleging that their copyrights have been violated in the AI-training process: As Breaking Bad’s creator, Vince Gilligan, wrote to the U.S. Copyright Office last year, generative AI amounts to “an extraordinarily complex and energy-intensive form of plagiarism.” Tech companies have argued that training AI systems on copyrighted work is “fair use,” but a court has yet to rule on this claim. In the language of copyright law, subtitles are likely considered derivative works, and a court would generally see them as protected by the same rules against copying and distribution as the movies they’re taken from.

The OpenSubtitles data set has circulated among AI developers since 2020. It is part of the Pile, a collection of data sets for training generative AI. The Pile also includes text from books, patent applications, online discussions, philosophical papers, YouTube-video subtitles, and more. It’s an easy way for companies to start building AI systems without having to find and download the many gigabytes of high-quality text that LLMs require.
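
That ease is worth illustrating. Here, roughly, is what pulling such a corpus looks like with the Hugging Face datasets library; the repository name and the “text” field are assumptions, since mirrors of the Pile and its components come and go.

```python
from datasets import load_dataset  # Hugging Face datasets library

# Hypothetical repository name; substitute whichever mirror actually exists.
subs = load_dataset("some-org/pile-opensubtitles", split="train", streaming=True)

# Each record is assumed to carry its raw dialogue in a "text" field,
# the usual convention for Pile components.
for record in subs.take(3):
    print(record["text"][:200], "\n---")
```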

[Read: Generative AI is challenging a 234-year-old law]

OpenSubtitles can be downloaded by anyone who knows where to look, but as with most AI-training data sets, it’s not easy to understand what’s in it. It’s a 14-gigabyte text file with short lines of unattributed dialogue—meaning the speaker is not identified. There’s no way to tell where one movie ends and the next begins, let alone what the movies are. I downloaded a “raw” version of the data set, in which the movies and episodes were separated into 446,612 files and stored in folders whose names corresponded to the ID numbers of movies and episodes listed on IMDb.com. Most folders contained multiple subtitle versions of the same movie or TV show (different releases may be tweaked in various ways), but I was able to identify at least 139,000 unique movies and episodes. I downloaded metadata associated with each title from the OpenSubtitles.org website—allowing me to map actors and directors to each title, for instance—and used it to build The Atlantic’s search tool.
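
For a sense of what that tallying involves, here is a rough sketch, assuming the folder-per-IMDb-ID layout described above; the local path is hypothetical, and a real count would still need to collapse the multiple subtitle versions within each folder.

```python
import os
from collections import defaultdict

# Hypothetical path to the "raw" dump, in which each folder name is an
# IMDb title ID and each file inside is one subtitle version.
ROOT = "opensubtitles_raw"

versions = defaultdict(list)
for dirpath, _subdirs, filenames in os.walk(ROOT):
    for name in filenames:
        versions[os.path.basename(dirpath)].append(os.path.join(dirpath, name))

print(f"{sum(len(v) for v in versions.values()):,} subtitle files")
print(f"{len(versions):,} unique IMDb titles")
```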

The OpenSubtitles data set adds yet another wrinkle to a complex narrative around AI, in which consent from artists and even the basic premise of the technology are points of contention. Until very recently, no writer putting pen to paper on a script would have thought their creative work might be used to train programs that could replace them. And the subtitles themselves were not originally intended for this purpose, either. The multilingual OpenSubtitles data set contains subtitles in 62 different languages and 1,782 language-pair combinations: It is meant for training the models behind apps such as Google Translate and DeepL, which can be used to translate websites, street signs in a foreign country, or an entire novel. Jörg Tiedemann, one of the data set’s creators, wrote in an email that he was happy to see OpenSubtitles being used in LLM development, too, even though that was not his original intention.
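
In the translation-oriented releases distributed by the OPUS project, a language pair is typically two line-aligned plain-text files, so that sentence n of one file corresponds to sentence n of the other. Here is a minimal sketch of reading such a pair; the file names mimic OPUS conventions but are assumptions here.

```python
from itertools import islice

# Hypothetical English-French pair in the line-aligned "Moses" format.
with open("OpenSubtitles.en-fr.en", encoding="utf-8") as en_file, \
     open("OpenSubtitles.en-fr.fr", encoding="utf-8") as fr_file:
    for en_line, fr_line in islice(zip(en_file, fr_file), 3):
        print(en_line.strip(), "=>", fr_line.strip())
```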

He is, in any case, powerless to stop it. The subtitles are on the internet, and there’s no telling how many independent generative-AI programs they’ve been used for, or how much synthetic writing those programs have produced. But now, at least, we know a bit more about who is caught in the machinery. What will the world decide they are owed?