Violence Is the Engine of Modi’s Politics

The Atlantic

www.theatlantic.com › international › archive › 2023 › 08 › narendra-modi-india-gurugram › 675171

In the first week of August, the glitzy megacity of Gurugram, an hour’s drive from New Delhi, was burning.

With its gleaming malls and opulent high-rises, Gurugram had become symbolic of India’s economic rise. But for much of this month, the city has been in a state of siege from Hindu mobs running amok, attacking Muslim homes, commercial establishments, and places of worship. Smoke billowed from buildings set ablaze, riot police trawled the streets, and multinational corporations ordered their employees to stay home. Large numbers of working-class Muslims, the human capital underpinning the city’s prosperity, took flight.

The mayhem in Gurugram was a direct result of Prime Minister Narendra Modi’s growing sense of political insecurity. Two recent setbacks had rattled him and the Hindu-supremacist movement he leads. In May, Modi’s Bharatiya Janata Party suffered a chastening defeat in a high-stakes election in Karnataka, the southern-Indian state that is home to Bangalore and a powerhouse of India’s information-technology sector. With Karnataka, the Hindu right lost its only foothold in southern India, the country’s most prosperous and wealthy region.

[Read: India is not Modi, we once said. I wish I still believed it.]

Then, in mid-July, two weeks before the violence erupted in Gurugram, the Indian opposition announced an electoral alliance to take on Modi in next year’s national elections. The big-tent coalition was a remarkable show of unity, something that had mostly eluded Modi’s rivals since his ascent to power in 2014. A juggernaut comprising 26 parties, the opposition alliance christened itself the Indian National Developmental Inclusive Alliance—INDIA.

These twin events felt like political earthquakes. They cast doubt on what until recently had seemed certain: Modi’s reelection as prime minister for a third consecutive term in 2024. And as Modi and his party have begun to feel politically threatened, they have let loose the foot soldiers of the Hindu right upon India’s minorities.

For a century, since the rise of the Hindu right in the 1920s, religious disturbances in India have followed a dismayingly predictable pattern. Members of Hindu organizations stage threatening parades in Muslim neighborhoods, chanting provocative slogans and blaring music outside mosques in order to arouse a response. Community members retaliate, and confrontation follows, escalating into a riot. Soon after a July 31 Hindu parade in Nuh, the Muslim-majority district adjacent to Gurugram, violence spread across the northern state of Haryana, of which Gurugram is the largest city.

The organizational machinery of the Hindu right has made a science of engineering such conflagrations. It needs only to activate the ecosystem that Paul R. Brass, a doyen of South Asian studies, has termed an “institutionalised system of riot production.” That system reliably generates political rewards: An exhaustive study by Yale, analyzing the effects of such riots over a period of nearly four decades beginning in the 1960s, concluded that the parties of the Hindu right typically “saw a 0.8 percentage point increase in their vote share following a riot in the year prior to an election.”

The benefits of such religious polarization have surely risen under Modi, the most charismatic leader the Hindu-supremacist movement has ever produced. Delivering successive majorities in Parliament in 2014 and 2019, Modi has taken the Hindu right to the kind of unchallenged power it always dreamed of.

Modi first came to international attention following the 2002 religious riots in the western-Indian state of Gujarat, where he was chief minister. Several coaches of a train carrying Hindu pilgrims were burned down under inscrutable circumstances, killing 59 people, and Gujarat witnessed a paroxysm of violence that included acts of brutality shocking even within the history of religious conflict in India. Ultimately, more than 1,000 people, mostly Muslims, were killed.

The 2002 violence, perpetrated by militant organizations of the Hindu right as the state machinery stood by, has often been described as an anti-Muslim pogrom. Modi was subsequently banned from the United States “for severe violations of religious freedom,” a prohibition that was lifted only after his elevation as India’s prime minister in 2014.

After the riots, Hindu consolidation ensured that Modi retained an iron grip on power within Gujarat. But nationally and abroad, he was tainted—viewed as a dark, unsettling figure who could not be trusted to lead India. In 2004, India’s Supreme Court described Modi as a modern-day Nero who had watched while women and children were butchered.

Modi had visited America frequently during the 1990s, when he was a party ideologue seeking to build support among affluent and influential Indian Americans from Gujarat. Like many conservative Indians, he admired the United States not for its liberal and constitutional values, but for its economic and technological power, and he craved American acceptance. But following his ban from the United States, Modi avoided visiting Western democracies, perhaps fearing that he would share the fate of Augusto Pinochet, the former Chilean dictator who was arrested in London in 1998 for his human-rights abuses. Modi made multiple trips to China instead.

When he became prime minister in 2014, he changed tack. He sought to keep his Hindu base energized without attracting the sort of global notoriety that had come his way in 2002. The first test came in 2015, a year after his ascension to power.

A 52-year-old ironsmith named Mohammed Akhlaq was lynched by his Hindu neighbors in a village on the outskirts of Delhi. The cow holds a sacred, hallowed place in the Hindu imagination, and slaughtering cows is illegal in most Indian states. Akhlaq’s neighbors suspected him of storing beef in his fridge. They dragged him out of his house, where a mob, in an act of medieval bloodletting, killed him with sticks and stones.

The gruesome nature of the crime stunned India. Almost immediately, calls arose for Modi to condemn it. No full-throated condemnation ever came. Instead, for more than two weeks, while agitators on the Hindu right orchestrated a campaign of hate, Modi retreated into a mysterious silence that his followers interpreted as assent. Such tactical silence, in some ways even more significant than speech, has since become a hallmark of his politics.

Aakar Patel, a longtime newspaper editor who is now the chair of Amnesty International India, observed that in his years in the newsroom he never encountered a report about cow-based lynchings. “‘Beef lynching’ as category of violence has been introduced to India after 2014,” he wrote in his book Price of the Modi Years. Patel collated a spate of such lynchings that followed Akhlaq’s killing, as incendiary rhetoric around cow slaughter emanated from Modi and the Hindu right. In 2018, one of Modi’s ministers went so far as to celebrate those convicted of having carried out a beef lynching with garlands, a high mark of respect in Hindu society. Such crimes have become so routine in today’s India that they are relegated to the inside pages of newspapers, usually truncated to single-column reports.

[Read: The meaning of India’s ‘beef lynchings’]

In speeches in Western capitals, including in his recent address to a joint session of the U.S. Congress, Modi recites florid paeans to democracy and human rights that ring farcical in the ears of critics and dissidents back home. Ahead of India’s hosting of the G20 summit this September, Modi even, bizarrely, claimed that India is the “mother of democracy.”

All the while, spectacular eruptions of violence that draw the world’s attention have been replaced by constant, low-intensity terror that keeps India’s Muslims on edge and the majoritarian pot stirring. Hindu supremacists have declared war on interfaith marriage, terming it a form of “love jihad.” Extrajudicial killings of Muslims by police officials and arbitrary, illegal demolitions of Muslim homes by civic authorities have grown exponentially.

The terror is sustained by a nexus between emboldened vigilantes and a partisan state. Of all the hate crimes committed in India between 2009 and 2018, 90 percent occurred after Modi’s arrival in New Delhi in 2014. Hindu supremacism is bleeding India by a thousand cuts.

From political wilderness to global prominence, Modi has essentially remained an unreconstructed Hindu supremacist. The current, unrelenting hard press on India’s Muslims is nothing but a pursuance of the logic of the 2002 violence by other means: The violence is now geographically dispersed, continuous, and chillingly unpredictable.

On July 31, just as the Gurugram violence began, a railway-security official shot his superior on an express train to Mumbai. The official then walked through seven coaches, found three men who could be identified visually as Muslim, and shot them dead. He made a video of himself with the body of one victim at his feet, hailing Modi and Adityanath, the radical, hate-spewing priest who is the chief minister of India’s most populous province. These leaders were the only choices if you wanted to live in India, the killer declared. The implication was that those who voted for other leaders were effectively traitors.

Connecting the Gurugram violence to the train shooting, the prominent Hindi-language intellectual Apoorvanand remarked that both events “were part of the same soap opera where different characters keep appearing.” Violence was producing its own logic. Between lone wolves and an organized mob, Apoorvanand concluded, nowhere in India were Muslims safe.

In South Asia, the rule of law is weak and state capacity is thin on the ground. Violence can easily spiral out of control. The Indian subcontinent is still haunted by the memory of Partition, the bitter, bloody division of the region into the modern nations of India and Pakistan, which displaced 15 million people and left more than 1 million dead.

Under Modi, the Indian state has ceased to emphasize pluralism and diversity, and fears abound that the nation again stands at the precipice of such a calamity. For the fourth consecutive year, the bipartisan United States Commission on International Religious Freedom has flagged India as a “Country of Particular Concern.” The Early Warning Project, an initiative partly supported by the United States Holocaust Memorial Museum that assesses the likelihood of genocide and large-scale atrocities across the world, ranks India eighth among countries at highest risk for mass killing.

The Hindu right sometimes spends years laying the foundations for violence. In old cities, such as Delhi, mosques sprang up organically over centuries. Gurugram, by contrast, was new, and its growing migrant Muslim population had few places of worship when Hindu-supremacist groups began attacking its Friday-prayer sites in 2018. The state had assigned the community fallow lands for these meetings. Although many such informal arrangements exist in India, the Hindu supremacists termed the prayer sites illegal and began imputing shadowy, fantastical motives to Muslim worship.    

Writing for The Caravan earlier this year, I sought to understand how the Hindu-supremacist machinery operated in Gurugram, not only through the organizations of the Hindu right, but in conjunction with an autonomous “alt-right” movement that was emerging in India, and how a genocidal imagination had taken hold in sizable portions of the society and state under Modi. In April last year, I visited the base of operations for the Bajrang Dal, a thuggish armed wing of the Hindu right, comparable to the Proud Boys, which met in the basement of an unoccupied building. A few blocks away was a half-constructed mosque that had become the subject of a simmering dispute in Gurugram.

The state had awarded a land grant for the mosque in 2004, but the mobilization of the Hindu neighborhoods around the site kept it mired in litigation for nearly two decades. The mosque was stillborn when I visited, iron rods jutting out of its half-finished pillars. In May, India’s Supreme Court gave the Muslim community permission to go ahead with construction. That judgment did not go down well in the neighborhood.

When the violence erupted in Gurugram at the beginning of the month, a darkness seized me. This was exactly the sequel I’d been dreading, and the Bajrang Dal was at the forefront of the violence.

In the early hours of August 1, a Hindu mob stormed the mosque. A young cleric named Mohammad Saad, who lived in the compound, was pierced to death with swords. Saad’s colleague, a helper at the mosque, spent two weeks in intensive care, having been smashed in the head with a steel rod and shot in the foot. A few Muslim boys lingering in the compound hid in trunks in a decrepit storeroom that somehow escaped the mob’s attention. Two police vans had been stationed outside the mosque, but the cops stood motionless.

In the most poignant of ironies, an hour before Saad was killed, his brother had called to tell him about the train shooting. Saad had been scheduled to travel home by train the following day. His brother had urged him to cancel the ticket.

Last week, I visited the mosque again. The acrid smell and soot-black walls were familiar from the sites of other riots I had covered. The last time I’d been inside a desecrated mosque was during the Delhi violence of 2020, when 53 people, mostly Muslims, were killed while Modi entertained Trump, on a state visit to India, less than 10 miles away.  

[Mira Kamdar: What happened in Delhi was a pogrom]

Historically, religious violence has been largely confined to impoverished neighborhoods where Hindus and Muslims lived cheek by jowl. The Gurugram mosque, by contrast, was situated in a well-heeled enclave—an island of privilege of a sort no longer insulated from the onward march of Hindu supremacism. Similarly, in the middle of August, a video emerged of a mob in Mumbai beating a Muslim man for going out with a Hindu girl. The assault took place in the city’s posh Bandra neighborhood, home to the Bollywood elite and India’s super-rich—the quarter where Tim Cook had recently inaugurated an Apple Store.

To live in India in the Modi era, now approaching a decade, is to feel in your bones the violence accelerating, its scope ever widening. The Hindu right is never more dangerous than when it feels its hold on political power becoming imperiled. The electoral setback in Karnataka was an early sign of growing psychological fatigue with the talking points of Hindu supremacism and the perpetually high temperature at which this politics of grievance is conducted.

With Gurugram, the Hindu supremacists have brought their polarization playbook to rich and middle-class neighborhoods, where they will likely be seeking to shore up support for the Bharatiya Janata Party ahead of next year’s elections. The tactics remain familiar—mosque disputes, marches through Muslim neighborhoods—but the unpredictability of where the violence will erupt next, the thrill and fear of it, keeps the Hindu right’s base energized. Violence of this kind almost certainly requires assent from the very top, and the opaqueness and secrecy around such decisions is part of Modi’s mystique and power.

By the time I set off from Gurugram for home in New Delhi that day in August, evening had fallen. In less than 10 minutes, I reached the wide-lane, American-style freeway that connects Gurugram to the national capital. Neon lights on the glass towers of corporate headquarters and luxury hotels shimmered in the humid night. How minuscule, I thought, was the distance that remained between India’s modern vision of itself and the mobs of Hindu supremacism.

Revealed: The Authors Whose Pirated Books Are Powering Generative AI

The Atlantic

www.theatlantic.com › technology › archive › 2023 › 08 › books3-ai-meta-llama-pirated-books › 675063

One of the most troubling issues around generative AI is simple: It’s being made in secret. To produce humanlike answers to questions, systems such as ChatGPT process huge quantities of written material. But few people outside of companies such as Meta and OpenAI know the full extent of the texts these programs have been trained on.

Some training text comes from Wikipedia and other online writing, but high-quality generative AI requires higher-quality input than is usually found on the internet—that is, it requires the kind found in books. In a lawsuit filed in California last month, the writers Sarah Silverman, Richard Kadrey, and Christopher Golden allege that Meta violated copyright laws by using their books to train LLaMA, a large language model similar to OpenAI’s GPT-4—an algorithm that can generate text by mimicking the word patterns it finds in sample texts. But neither the lawsuit itself nor the commentary surrounding it has offered a look under the hood: We have not previously known for certain whether LLaMA was trained on Silverman’s, Kadrey’s, or Golden’s books, or any others, for that matter.

In fact, it was. I recently obtained and analyzed a dataset used by Meta to train LLaMA. Its contents more than justify a fundamental aspect of the authors’ allegations: Pirated books are being used as inputs for computer programs that are changing how we read, learn, and communicate. The future promised by AI is written with stolen words.

Upwards of 170,000 books, the majority published in the past 20 years, are in LLaMA’s training data. In addition to work by Silverman, Kadrey, and Golden, nonfiction by Michael Pollan, Rebecca Solnit, and Jon Krakauer is being used, as are thrillers by James Patterson and Stephen King and other fiction by George Saunders, Zadie Smith, and Junot Díaz. These books are part of a dataset called “Books3,” and its use has not been limited to LLaMA. Books3 was also used to train Bloomberg’s BloombergGPT, EleutherAI’s GPT-J—a popular open-source model—and likely other generative-AI programs now embedded in websites across the internet. A Meta spokesperson declined to comment on the company’s use of Books3; Bloomberg did not respond to emails requesting comment; and Stella Biderman, EleutherAI’s executive director, did not dispute that the company used Books3 in GPT-J’s training data.

As a writer and computer programmer, I’ve been curious about what kinds of books are used to train generative-AI systems. Earlier this summer, I began reading online discussions among academic and hobbyist AI developers on sites such as GitHub and Hugging Face. These eventually led me to a direct download of “the Pile,” a massive cache of training text created by EleutherAI that contains the Books3 dataset, plus material from a variety of other sources: YouTube-video subtitles, documents and transcriptions from European Parliament, English Wikipedia, emails sent and received by Enron Corporation employees before its 2001 collapse, and a lot more. The variety is not entirely surprising. Generative AI works by analyzing the relationships among words in intelligent-sounding language, and given the complexity of these relationships, the subject matter is typically less important than the sheer quantity of text. That’s why The-Eye.eu, a site that hosted the Pile until recently—it received a takedown notice from a Danish anti-piracy group—says its purpose is “to suck up and serve large datasets.”

The Pile is too large to be opened in a text-editing application, so I wrote a series of programs to manage it. I first extracted all the lines labeled “Books3” to isolate the Books3 dataset. Here’s a sample from the resulting dataset:

{"text": "\n\nThis book is a work of fiction. Names, characters, places and incidents are products of the authors' imagination or are used fictitiously. Any resemblance to actual events or locales or persons, living or dead, is entirely coincidental.\n\n  | POCKET BOOKS, a division of Simon & Schuster Inc.  \n1230 Avenue of the Americas, New York, NY 10020  \nwww.SimonandSchuster.com\n\n---|---

This is the beginning of a line that, like all lines in the dataset, continues for many thousands of words and contains the complete text of a book. But what book? There were no explicit labels with titles, author names, or metadata. Just the label “text,” which reduced the books to the function they serve for AI training. To identify the entries, I wrote another program to extract ISBNs from each line. I fed these ISBNs into another program that connected to an online book database and retrieved author, title, and publishing information, which I viewed in a spreadsheet. This process revealed roughly 190,000 entries: I was able to identify more than 170,000 books—about 20,000 were missing ISBNs or weren’t in the book database. (This number also includes reissues with different ISBNs, so the number of unique books might be somewhat smaller than the total.) Browsing by author and publisher, I began to get a sense for the collection’s scope.
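The first two steps described above, isolating the Books3 lines and pulling ISBNs out of each one, can be sketched roughly as follows. This is a minimal illustration, not the programs actually used for this story. It assumes each line of the Pile is a JSON object whose source label sits in a `meta.pile_set_name` field (an assumption about the file layout), and the function names are hypothetical. It only recognizes ISBNs written with digits and optional hyphens, and it checks each candidate against the standard ISBN-10 and ISBN-13 check-digit rules to discard look-alike numbers. The lookup against an online book database is left out.

```python
import json
import re
from typing import Iterable, Iterator, List

# Candidate ISBNs: 10 to 17 characters of digits and hyphens,
# optionally ending in X (the ISBN-10 check character).
ISBN_CANDIDATE = re.compile(r"\b\d[\d-]{8,15}[\dXx]\b")


def books3_records(lines: Iterable[str], label: str = "Books3") -> Iterator[dict]:
    """Yield parsed records whose source label matches `label`.

    Assumes a layout like {"text": "...", "meta": {"pile_set_name": "Books3"}}.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("meta", {}).get("pile_set_name") == label:
            yield record


def is_valid_isbn(code: str) -> bool:
    """Check the ISBN-10 or ISBN-13 check digit of a separator-free string."""
    if len(code) == 10:
        if not code[:9].isdigit() or not (code[9].isdigit() or code[9] == "X"):
            return False
        # ISBN-10: weights 10 down to 1, with X standing for 10; sum divisible by 11.
        total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(code))
        return total % 11 == 0
    if len(code) == 13 and code.isdigit():
        # ISBN-13: alternating weights 1 and 3; sum divisible by 10.
        total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(code))
        return total % 10 == 0
    return False


def extract_isbns(text: str) -> List[str]:
    """Return the distinct valid ISBNs found in `text`, in order of appearance."""
    found: List[str] = []
    for match in ISBN_CANDIDATE.finditer(text):
        code = match.group().replace("-", "").upper()
        if is_valid_isbn(code) and code not in found:
            found.append(code)
    return found
```

Because the generator reads one line at a time, a file too large for a text editor never has to fit in memory: passing an open file handle to `books3_records` and calling `extract_isbns(record["text"])` on each result is enough to build the list of identifiers to look up.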

Of the 170,000 titles, roughly one-third are fiction, two-thirds nonfiction. They’re from big and small publishers. To name a few examples, more than 30,000 titles are from Penguin Random House and its imprints, 14,000 from HarperCollins, 7,000 from Macmillan, 1,800 from Oxford University Press, and 600 from Verso. The collection includes fiction and nonfiction by Elena Ferrante and Rachel Cusk. It contains at least nine books by Haruki Murakami, five by Jennifer Egan, seven by Jonathan Franzen, nine by bell hooks, five by David Grann, and 33 by Margaret Atwood. Also of note: 102 pulp novels by L. Ron Hubbard, 90 books by the Young Earth creationist pastor John F. MacArthur, and multiple works of aliens-built-the-pyramids pseudo-history by Erich von Däniken. In an emailed statement, Biderman wrote, in part, “We work closely with creators and rights holders to understand and support their perspectives and needs. We are currently in the process of creating a version of the Pile that exclusively contains documents licensed for that use.”

Although not widely known outside the AI community, Books3 is a popular training dataset. Hugging Face hosted it for more than two and a half years, apparently removing it around the time it was mentioned in lawsuits against OpenAI and Meta earlier this summer. The academic writer Peter Schoppert has tracked its use in his Substack newsletter. Books3 has also been cited in the research papers by Meta and Bloomberg that announced the creation of LLaMA and BloombergGPT. In recent months, the dataset was effectively hidden in plain sight, possible to download but challenging to find, view, and analyze.

Other datasets, possibly containing similar texts, are used in secret by companies such as OpenAI. Shawn Presser, the independent developer behind Books3, has said that he created the dataset to give independent developers “OpenAI-grade training data.” Its name is a reference to a paper published by OpenAI in 2020 that mentioned two “internet-based books corpora” called Books1 and Books2. That paper is the only primary source that gives any clues about the contents of GPT-3’s training data, so it’s been carefully scrutinized by the development community.

From information gleaned about the sizes of Books1 and Books2, Books1 is speculated to be the complete output of Project Gutenberg, an online publisher of some 70,000 books with expired copyrights or licenses that allow noncommercial distribution. No one knows what’s inside Books2. Some suspect it comes from collections of pirated books, such as Library Genesis, Z-Library, and Bibliotik, that circulate via the BitTorrent file-sharing network. (Books3, as Presser announced after creating it, is “all of Bibliotik.”)

Presser told me by telephone that he’s sympathetic to authors’ concerns. But the great danger he perceives is a monopoly on generative AI by wealthy corporations, giving them total control of a technology that’s reshaping our culture: He created Books3 in the hope that it would allow any developer to create generative-AI tools. “It would be better if it wasn’t necessary to have something like Books3,” he said. “But the alternative is that, without Books3, only OpenAI can do what they’re doing.” To create the dataset, Presser downloaded a copy of Bibliotik from The-Eye.eu and updated a program written more than a decade ago by the hacktivist Aaron Swartz to convert the books from ePub format (a standard for ebooks) to plain text—a necessary change for the books to be used as training data. Although some of the titles in Books3 are missing relevant copyright-management information, the deletions were ostensibly a by-product of the file conversion and the structure of the ebooks; Presser told me he did not knowingly edit the files in this way.

Many commentators have argued that training AI with copyrighted material constitutes “fair use,” the legal doctrine that permits the use of copyrighted material under certain circumstances, enabling parody, quotation, and derivative works that enrich the culture. The industry’s fair-use argument rests on two claims: that generative-AI tools do not replicate the books they’ve been trained on but instead produce new works, and that those new works do not hurt the commercial market for the originals. OpenAI made a version of this argument in response to a 2019 query from the United States Patent and Trademark Office. According to Jason Schultz, the director of the Technology Law and Policy Clinic at NYU, this argument is strong.

I asked Schultz if the fact that books were acquired without permission might damage a claim of fair use. “If the source is unauthorized, that can be a factor,” Schultz said. But the AI companies’ intentions and knowledge matter. “If they had no idea where the books came from, then I think it’s less of a factor.” Rebecca Tushnet, a law professor at Harvard, echoed these ideas, and told me the law was “unsettled” when it came to fair-use cases involving unauthorized material, with previous cases giving little indication of how a judge might rule in the future.

This is, to an extent, a story about clashing cultures: The tech and publishing worlds have long had different attitudes about intellectual property. For many years, I’ve been a member of the open-source software community. The modern open-source movement began in the 1980s, when a developer named Richard Stallman grew frustrated with AT&T’s proprietary control of Unix, an operating system he had worked with. (Stallman worked at MIT, and Unix had been a collaboration between AT&T and several universities.) In response, Stallman developed a “copyleft” licensing model, under which software could be freely shared and modified, as long as modifications were re-shared using the same license. The copyleft license launched today’s open-source community, in which hobbyist developers give their software away for free. If their work becomes popular, they accrue reputation and respect that can be parlayed into one of the tech industry’s many high-paying jobs. I’ve personally benefited from this model, and I support the use of open licenses for software. But I’ve also seen how this philosophy, and the general attitude of permissiveness that permeates the industry, can cause developers to see any kind of license as unnecessary.

This is dangerous because some kinds of creative work simply can’t be done without more restrictive licenses. Who could spend years writing a novel or researching a work of deep history without a guarantee of control over the reproduction and distribution of the finished work? Such control is part of how writers earn money to live.

Meta’s proprietary stance with LLaMA suggests that the company thinks similarly about its own work. After the model leaked earlier this year and became available for download from independent developers who’d acquired it, Meta used a DMCA takedown order against at least one of those developers, claiming that “no one is authorized to exhibit, reproduce, transmit, or otherwise distribute Meta Properties without the express written permission of Meta.” Even after it had “open-sourced” LLaMA, Meta still wanted developers to agree to a license before using it; the same is true of a new version of the model released last month. (Neither the Pile nor Books3 is mentioned in a research paper about that new model.)

Control is more essential than ever, now that intellectual property is digital and flows from person to person as bytes through airwaves. A culture of piracy has existed since the early days of the internet, and in a sense, AI developers are doing something that’s come to seem natural. It is uncomfortably apt that today’s flagship technology is powered by mass theft.

Yet the culture of piracy has, until now, facilitated mostly personal use by individual people. The exploitation of pirated books for profit, with the goal of replacing the writers whose work was taken—this is a different and disturbing trend.