What I Found in a Database Meta Uses to Train Generative AI

The Atlantic

www.theatlantic.com › technology › archive › 2023 › 09 › books3-ai-training-meta-copyright-infringement-lawsuit › 675411

Editor’s note: This article is part of The Atlantic’s series on Books3. You can search the database for yourself here, and read about its origins here.

This summer, I reported on a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. “Books3,” as it’s called, was based on a collection of pirated ebooks that includes travel guides, self-published erotic fiction, novels by Stephen King and Margaret Atwood, and a lot more. It is now at the center of several lawsuits brought against Meta by writers who claim that its use amounts to copyright infringement.

Books play a crucial role in the training of generative-AI systems. Their long, thematically consistent paragraphs provide information about how to construct long, thematically consistent paragraphs—something that’s essential to creating the illusion of intelligence. Consequently, tech companies use huge data sets of books, typically without permission, purchase, or licensing. (Lawyers for Meta argued in a recent court filing that neither outputs from the company’s generative AI nor the model itself is “substantially similar” to existing books.)

In its training process, a generative-AI system essentially builds a giant map of English words—the distance between two words correlates with how often they appear near each other in the training text. The final system, known as a large language model, will produce more plausible responses for subjects that appear more often in its training text. (For further details on this process, you can read about transformer architecture, the innovation that precipitated the boom in large language models such as LLaMA and ChatGPT.) A system trained primarily on the Western canon, for example, will produce poor answers to questions about Eastern literature. This is just one reason it’s important to understand the training data used by these models, and why it’s troubling that there is generally so little transparency.
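To make the "map of words" idea concrete, here is a minimal sketch that counts how often pairs of words appear near each other in a tiny text. It is a deliberate simplification: real large language models learn their word relationships through neural-network training rather than raw counts, and the function names, the toy sentence, and the window size below are all assumptions made for illustration.

```python
from collections import Counter


def cooccurrence_counts(text, window=2):
    """Count how often each unordered pair of distinct words
    appears within `window` tokens of each other."""
    tokens = text.lower().split()
    counts = Counter()
    for i, word in enumerate(tokens):
        # Look only ahead, so each nearby pair is counted once.
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                counts[frozenset((word, other))] += 1
    return counts


def closeness(counts, a, b):
    """Higher values mean the two words appeared near each other more often."""
    return counts[frozenset((a, b))]


# Hypothetical toy "training text"; a real corpus would be billions of words.
toy_text = "the cat sat on the mat the cat ate the fish the dog sat on the rug"
counts = cooccurrence_counts(toy_text, window=2)

print(closeness(counts, "sat", "on"))   # words that occur together
print(closeness(counts, "cat", "rug"))  # words that never occur together
```

Words that never share a neighborhood in the text get a count of zero, which is a rough picture of why a system trained mostly on one body of literature has little to say about another.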

With that in mind, here are some of the most represented authors in Books3, with the approximate number of entries contributed:

Although 24 of the 25 authors listed here are fiction writers (the lone exception is Betty Crocker), the data set is two-thirds nonfiction overall. It includes several thousand technical manuals; more than 1,500 books from Christian publishers (including at least 175 Bibles and Bible commentaries); more than 400 Dungeons & Dragons– and Magic: The Gathering–themed books; and 46 titles by Charles Bukowski. Nearly every subject imaginable is covered (including How to Housebreak Your Dog in 7 Days), but the collection skews heavily toward the interests and perspectives of the English-speaking Western world.

Many people have written about bias in AI systems. An AI-based face-recognition program, for example, that’s trained disproportionately on images of light-skinned people might work less well on images of people with darker skin—with potentially disastrous outcomes. Books3 helps us see the problem from another angle: What combination of books would be unbiased? What would be an equitable distribution of Christian, Muslim, Buddhist, and Jewish subjects? Are extremist views balanced by moderate ones? What’s the proper ratio of American history to Chinese history, and what perspectives should be represented within each? When knowledge is organized and filtered by algorithm rather than by human judgment, the problem of perspective becomes both crucial and intractable.

Books3 is a gigantic data set. Here are just a few different ways to consider the authors, books, and publishers contained within. Note that the samples presented here are not comprehensive; they are chosen to give a quick sense of the many different types of writing used to train generative AI. As above, book counts may include multiple editions.

As AI chatbots begin to replace traditional search engines, the tech industry’s power to constrain our access to information and manipulate our perspective increases exponentially. If the internet democratized access to information by eliminating the need to go to a library or consult an expert, the AI chatbot is a return to the old gatekeeping model, but with a gatekeeper that’s opaque and unaccountable—a gatekeeper, moreover, that is prone to “hallucinations” and might or might not cite sources.

In its recent court filing—a motion to dismiss the lawsuit brought by the authors Richard Kadrey, Sarah Silverman, and Christopher Golden—Meta observed that “Books3 comprises an astonishingly small portion of the total text used to train LLaMA.” This is technically true (I estimate that Books3 is about 3 percent of LLaMA’s total training text) but sidesteps a core concern: If LLaMA can summarize Silverman’s book, then it likely relies heavily on the text of her book to do so. In general, it’s hard to know how much any given source contributes to a generative-AI system’s output, given the impenetrability of current algorithms.

Still, our only clue to the kinds of information and opinions AI chatbots will dispense is their training data. A look at Books3 is a good start, but it’s just one corner of the training-data universe, most of which remains behind closed doors.

China Is All About Sovereignty. So Why Not Ukraine’s?

The Atlantic

www.theatlantic.com › international › archive › 2023 › 09 › beijing-china-ukraine-sovereignty-xi-jinping › 675434

By Beijing’s reckoning, the U.S.-led global order is in turmoil, and a Washington in decline has no answers to the world’s mounting problems. Fortunately for the future of humanity, however, the Chinese leader Xi Jinping does. He would like to replace Washington’s “rules-based” world order with a framework of his own—one whose most sacred principle is national sovereignty, or the right of states to govern themselves, free from outside interference.

In the world Xi envisions, nations will no longer have to endure Washington’s preaching about democracy and human rights. All governments, no matter how repressive, will be equals, with their sovereignty assured. Xi enshrined the protection of sovereignty as the very first plank of his Global Security Initiative, an ideological blueprint for a new global system that he introduced, probably not coincidentally, several weeks after the start of the Ukraine conflict in 2022.

That war has posed a bit of a problem for China’s professed position, however. Russia, China’s strategic partner, trampled an international border to invade a neighboring country in what could hardly be a clearer violation of that country’s sovereignty. But rather than sympathize with Ukraine’s desperate struggle to preserve its independent existence, Xi cemented his partnership with the Russian invaders intent on annihilating it.

“You can’t be helping Russia conduct this war and say you believe in Ukraine’s territorial integrity,” John Herbst, a former U.S. ambassador to Ukraine, told me. “Obviously, you can’t square that circle.”

Yet Xi has tried to do so. His contradictory stance on the war has forced his diplomats to tap dance, seeking to preserve Beijing’s pretense of principled neutrality even to the point of staging a purported peace mission. Meanwhile, the war has raised serious questions about the place of sovereignty in Xi’s vision for a new world order, and, relatedly, about his ability to achieve his grandiose plans.

In practically every diplomatic statement, Communist China affirms its commitment to honoring the sovereignty of other countries. It expects no less in return: Sovereignty, China’s leaders claim, confers upon the Communist Party the authority to govern as it wishes within China’s borders. Sovereignty, from the Chinese viewpoint, gives Beijing the right to lock up Uyghurs in Xinjiang and democracy advocates in Hong Kong, and it forbids Washington from interfering in China’s internal affairs by complaining about its human-rights record. Beijing rejects the notion of “universal values” that apply to all people, no matter where they live.

Beijing’s fixation on sovereignty is inseparable from its claim that Taiwan is part of China: By so much as interacting with Taiwan’s government, other countries are violating China’s sovereignty, Beijing maintains. Because they believe the country is not yet completely unified, says Maria Adele Carrai, an international-law expert at NYU’s Shanghai campus, Chinese leaders “feel very sensitive and also partly fragile about their sovereignty.”

Xi’s position on sovereignty holds obvious appeal for other autocrats intent on suppressing dissent without interference. But it also attracts adherents in the developing world, where many leaders still contend with the persistent, detrimental legacy of Western colonialism. For those leaders, says Jonathan Fulton, a specialist in Chinese relations with the Middle East at Zayed University, in Abu Dhabi, “when they hear a great power say, ‘We’re not going to do the kind of stuff that the West did to you,’ that resonates.”

The deeper Xi wades into international affairs, however, the more his purported principles come into conflict with his strategic goals. His government routinely intrudes on other countries’ sovereignty; witness the Chinese spy balloon caught floating in American airspace, or the scandal over alleged Chinese interference in Canada’s national elections. But little has challenged Xi’s ideological framework more than the Ukraine war. His choice was stark: Stand with Russian President Vladimir Putin, whom Xi has called his “best” friend, and sacrifice his supposed commitment to sovereignty, or stand for sovereignty by siding with Ukraine, thereby breaking a partnership that he perceives as crucial to his campaign against U.S. hegemony.

At the war’s outbreak, Chinese leaders seemed ambivalent, even conflicted. Though Foreign Minister Wang Yi asserted that Putin’s security concerns were “legitimate,” he also came out clearly in defense of Ukraine’s sovereignty. “All countries’ sovereignty, independence, and territorial integrity must be safeguarded,” he told the Munich Security Conference only days before the invasion began. “This is also what China has been upholding, with no exception regarding Ukraine.”

As the war has ground on, Xi has strengthened his relations with Russia. He has done so without directly aiding Moscow’s war effort but by supplying political and economic support as Russia has become isolated from the West. Chinese diplomats still sometimes talk about sovereignty, but they do so with greater ambiguity. In a March press briefing, then–Foreign Minister Qin Gang reiterated Beijing’s position that all countries’ sovereignty should be respected but brought up Ukraine’s specifically only to criticize Washington: “Why does the U.S. talk at length about respecting sovereignty and territorial integrity on Ukraine, while disrespecting China’s sovereignty and territorial integrity on China’s Taiwan question?” he asked rhetorically.

Last February, Xi issued a 12-point proposal for ending the Ukraine conflict. The first entry asserts that “the sovereignty, independence and territorial integrity of all countries must be effectively upheld”—but it does not mention Ukraine in this regard. In an April conversation with Ukrainian President Volodymyr Zelensky, Xi stressed—apparently without irony—that “mutual respect for sovereignty and territorial integrity is the political foundation” of relations between the two countries, but he did not pledge to ensure Ukraine’s or offer any specific proposal for preserving it, at least according to the official Chinese summary of their talk.

For the Ukrainians, the principle of sovereignty affords no ambiguity. Zelensky told Xi in April 2023, “We did not start this war, but we have to restore the sovereignty and territorial integrity of our country.” He added that “there can be no peace at the expense of territorial compromises. The territorial integrity of Ukraine must be restored.”

If Zelensky’s words made Xi uncomfortable, the Chinese leader did not let on. Just days earlier, the Chinese diplomat Lu Shaye had let slip a remark that opened a window on Beijing’s thinking. Then serving as China’s ambassador to France, Lu claimed that the sovereignty of the countries formed from the ruins of the Soviet Union—such as Ukraine—had no basis in international law, because no international agreement had specified their status. They had asserted their own sovereignty, and Lu’s comments suggested that he did not recognize such a path to independent statehood. His words sparked outrage across Europe. China’s foreign ministry clarified that the government officially recognized the sovereignty of those states—but Chinese diplomats rarely stray far from approved talking points. More likely than not, Lu’s ideas carry some currency among the Chinese leadership.

Chinese leaders may see Moscow’s assertion of control over territory once included in the Soviet Union as parallel to their own yearning after lands, including Taiwan, that were once ruled from Beijing under the Qing dynasty. In both cases, earlier political entities claimed these territories, suggesting that sovereignty can be a slippery idea when aggressive or nationalist leaders wish it to be.

Will the incipient allies attracted to Xi’s sovereignty rhetoric be put off by China’s lack of regard for Ukraine? Herbst believes that the leadership’s contradictory stance “certainly makes it harder for them to present themselves as some new sort of power representing something even better than the Western-organized international system.” But he did not think the inconsistency would cost China much in the global South.

Many developing countries lie far removed from Ukraine’s crisis and are not much invested in it. And according to Fulton, the countries of the global South are less interested in Xi’s transgressions of avowed principles than in his promise of counterbalance: Many leaders “want to see a shift in the distribution of power so the West doesn’t get to behave the way it has in the past and the global South has more influence,” he told me.

In that sense, Xi may be onto something. The United States has set aside its commitment to democracy to promote its strategic interests on any number of occasions, but its ideals have still given common cause to a worldwide network of alliances and inspired many of those suffering under oppressive regimes to dream of greater liberty—including within China. Perhaps Xi’s ideological blueprint, no matter how unworkable or compromised, could play the same role: that of a glue binding partnerships opposed to American ideals and American power. Perhaps in global diplomacy, what leaders say can matter more than what they do.