The AI War Was Never Just About AI

The Atlantic

www.theatlantic.com › technology › archive › 2024 › 11 › google-antitrust-generative-ai › 680803

For almost two years now, the world’s biggest tech companies have been at war over generative AI. Meta may be known for social media, Google for search, and Amazon for online shopping, but since the release of ChatGPT, each has made tremendous investments in an attempt to dominate in this new era. Along with start-ups such as OpenAI, Anthropic, and Perplexity, their spending on data centers and chatbots is on track to eclipse the costs of sending the first astronauts to the moon.

To be successful, these companies will have to do more than build the most “intelligent” software: They will need people to use, and return to, their products. Everyone wants to be Facebook, and nobody wants to be Friendster. To that end, the best strategy in tech hasn’t changed: build an ecosystem that users can’t help but live in. Billions of people use Google Search every day, so Google built a generative-AI product known as “AI Overviews” right into the results page, granting it an immediate advantage over competitors.

This is why a recent proposal from the Department of Justice is so significant. The government wants to break up Google’s monopoly over the search market, but its proposed remedies may in fact do more to shape the future of AI. Google owns 15 products that serve at least half a billion people and businesses each—a sprawling ecosystem of gadgets, search and advertising, personal applications, and enterprise software. An AI assistant that shows up in (or works well with) those products will be the one that those people are most likely to use. And Google has already woven its flagship Gemini AI models into Search, Gmail, Maps, Android, Chrome, the Play Store, and YouTube, all of which have at least 2 billion users each. AI doesn’t have to be life-changing to be successful; it just has to be frictionless. The DOJ now has an opportunity to add some resistance. (In a statement last week, Kent Walker, Google’s chief legal officer, called the Department of Justice’s proposed remedy part of an “interventionist agenda that would harm Americans and America’s global technology leadership,” including the company’s “leading role” in AI.)

Google is not the only competitor with an ecosystem advantage. Apple is integrating its Apple Intelligence suite across eligible iPhones, iPads, and Macs. Meta, with more than 3 billion users across its platforms, including Facebook, Instagram, and WhatsApp, enjoys similar benefits. Amazon’s AI shopping assistant, Rufus, has garnered little major attention but nonetheless became available to the website’s U.S. shoppers this fall. However much of the DOJ’s request the court ultimately grants, these giants will still lead the AI race—but Google had the clearest advantage among them.

Just how good any of these companies’ AI products are has limited relevance to their adoption. Google’s AI tools have repeatedly shown major flaws, such as confidently recommending eating rocks for good health, but the features continue to be used by more and more people simply because they’re there. Similarly, Apple’s AI models are less powerful than Gemini or ChatGPT, but they will have a huge user base simply because of how popular the iPhone is. Meta’s AI models may not be state-of-the-art, but that doesn’t matter to billions of Facebook, Instagram, and WhatsApp users who just want to ask a chatbot a silly question or generate a random illustration. Tech companies without such an ecosystem are well aware of their disadvantage: OpenAI, for instance, is reportedly considering developing its own web browser, and it has partnered with Apple to integrate ChatGPT across the company’s phones, tablets, and computers.

This is why it’s relevant that the DOJ’s proposed antitrust remedy takes aim at Google’s broader ecosystem. Federal and state attorneys asked the court to force Google to sell off its Chrome browser; cease preferencing its search products in the Android mobile operating system; prevent it from paying other companies, including Apple and Samsung, to make Google the default search engine; and allow rivals to syndicate Google’s search results and use its search index to build their own products. All of these and the DOJ’s other requests, under the auspices of search, are really shots at Google’s expansive empire.

As my colleague Ian Bogost has argued, selling Chrome might not affect Google’s search dominance: “People returned to Google because they wanted to, not just because the company had strong-armed them,” he wrote last week. But selling Chrome and potentially Android, as well as preventing Google from making its search engine the default option for various other companies’ products, would make it harder for Google to funnel billions of people to the rest of its software, including AI. Meanwhile, access to Google’s search index could provide a huge boost to OpenAI, Perplexity, Microsoft, and other AI search competitors: Perhaps the hardest part of building a searchbot is trawling the web for reliable links, and rivals would gain access to the most coveted way of doing so.

The Justice Department seems to recognize that the AI war implicates and goes beyond search. Without intervention, Google’s search monopoly could give it an unfair advantage in AI as well, and an AI monopoly could further entrench the company’s control over search. The court, attorneys wrote, must prevent Google from “manipulating the development and deployment of new technologies,” most notably AI, to further throttle competition.

And so the order also takes explicit aim at AI. The DOJ wants to bar Google from self-preferencing AI products, in addition to Search, in Chrome, Android, and all of its other products. It wants to stop Google from buying exclusive rights to sources of AI-training data and disallow Google from investing in AI start-ups and competitors that are in or might enter the search market. (Two days after the DOJ released its proposal, Amazon invested another $4 billion into Anthropic, the start-up and OpenAI rival that Google has also heavily backed to this point, suggesting that the e-commerce giant might be trying to lock in an advantage over Google.) The DOJ also requested that Google provide a simple way for publishers to opt out of their content being used to train Google’s AI models or be cited in AI-enhanced search products. All of that will make it harder for Google to train and market future AI models, and easier for its rivals to do the same.

When the DOJ first sued Google, in 2020, it was concerned with the internet of old: a web that appeared intractably stuck, long ago calcified in the image of the company that controls how billions of people access and navigate it. Four years and a historic victory later, its proposed remedy enters an internet undergoing an upheaval that few could have foreseen—but that the DOJ’s lawsuit seems to have nonetheless anticipated. A frequently cited problem with antitrust litigation in tech is anachronism, that by the time a social-media, or personal-computing, or e-commerce monopoly is apparent, it is already too late to disrupt. With generative AI, the government may finally have the head start it needs.

Why That Chatbot Is So Good at Imitating Bart Simpson

The Atlantic

www.theatlantic.com › newsletters › archive › 2024 › 11 › why-that-chatbot-is-so-good-at-imitating-bart-simpson › 680775

This is Atlantic Intelligence, a newsletter in which our writers help you wrap your mind around artificial intelligence and a new machine age.

Earlier this week, The Atlantic published a new investigation by Alex Reisner into the data that are being used without permission to train generative-AI programs. In this case, dialogue from tens of thousands of movies and TV shows has been harvested by companies such as Apple, Anthropic, Meta, and Nvidia to develop large language models (or LLMs).

The data have a strange provenance: Rather than being pulled from scripts or books, the dialogue is taken from subtitle files that have been extracted from DVDs, Blu-ray discs, and internet streams. “Though this may seem like a strange source for AI-training data, subtitles are valuable because they’re a raw form of written dialogue,” Reisner writes. “They contain the rhythms and styles of spoken conversation and allow tech companies to expand generative AI’s repertoire beyond academic texts, journalism, and novels, all of which have also been used to train these programs.”

Perhaps it no longer comes as a major shock that creative humans are having their work ripped off to train machines that threaten to replace them. But evidence demonstrating exactly what data have been used, and for what purposes, is hard to come by, thanks to the secretive nature of these tech companies. “Now, at least, we know a bit more about who is caught in the machinery,” Reisner writes. “What will the world decide they are owed?”

There’s No Longer Any Doubt That Hollywood Writing Is Powering AI

By Alex Reisner

For as long as generative-AI chatbots have been on the internet, Hollywood writers have wondered if their work has been used to train them. The chatbots are remarkably fluent with movie references, and companies seem to be training them on all available sources. One screenwriter recently told me he’s seen generative AI reproduce close imitations of The Godfather and the 1980s TV show Alf, but he had no way to prove that a program had been trained on such material.

I can now say with absolute confidence that many AI systems have been trained on TV and film writers’ work. Not just on The Godfather and Alf, but on more than 53,000 other movies and 85,000 other TV episodes: Dialogue from all of it is included in an AI-training data set that has been used by Apple, Anthropic, Meta, Nvidia, Salesforce, Bloomberg, and other companies. I recently downloaded this data set, which I saw referenced in papers about the development of various large language models (or LLMs). It includes writing from every film nominated for Best Picture from 1950 to 2016, at least 616 episodes of The Simpsons, 170 episodes of Seinfeld, 45 episodes of Twin Peaks, and every episode of The Wire, The Sopranos, and Breaking Bad. It even includes prewritten “live” dialogue from Golden Globes and Academy Awards broadcasts. If a chatbot can mimic a crime-show mobster or a sitcom alien—or, more pressingly, if it can piece together whole shows that might otherwise require a room of writers—data like this are part of the reason why.

Read the full article.

What to Read Next

“What I found in a database Meta uses to train generative AI”: “Nobel-winning authors, Dungeons and Dragons, Christian literature, and erotica all serve as datapoints for the machine,” Alex Reisner wrote in an earlier investigation for The Atlantic.

AI’s fingerprints were all over the election: “But deepfakes and disinformation weren’t the main issues,” Matteo Wong writes.