Itemoids

Meta

How Meta wins even if its new AI assistant fails

Quartz

qz.com › how-meta-wins-even-if-its-new-ai-assistant-fails-1850883439

At its annual developer conference this week, Meta announced a slew of new AI-related products, including Meta AI, a conversational AI assistant that will be available on WhatsApp, Messenger, and Instagram in the US. The chatbot will also be incorporated into Ray-Ban smart glasses and its virtual reality headset…

Read more...

A Court Ruling That Targets Trump’s Persona

The Atlantic

www.theatlantic.com › newsletters › archive › 2023 › 09 › new-york-ruling-trump-organization › 675475

This is an edition of The Atlantic Daily, a newsletter that guides you through the biggest stories of the day, helps you discover new ideas, and recommends the best in culture. Sign up for it here.

Donald Trump is a deals guy. He rode his image as a real-estate mogul and a maestro of transactions first to pop-culture stardom, then to the White House. Now a judge has ruled that much of that dealmaking was fraudulent: New York Judge Arthur Engoron found yesterday that Trump and his associates, including his sons Eric and Donald Jr., committed persistent fraud by toggling estimates of property values in order to get insurance and favorable terms on loans. The judge ordered that some of the Trump Organization’s “certificates,” or corporate charters, be canceled, and that a receiver be appointed by the court to dissolve some of its New York companies. This latest blow for Trump puts on record that his mythos of business acumen was largely built on lies.

This ruling on its own hinders some of the Trump Organization’s operations in New York State by cutting off Trump’s control of assets. But really, it is just a first step toward the broader business restrictions on Trump that New York Attorney General Letitia James is seeking, Celia Bigoness, a clinical professor of law at Cornell, told me. And to the extent that this ruling shows how the judge feels about James’s suit, first brought against Trump last year, things are not looking great for him. In the trial set to start next week, the judge will determine penalties for the fraud committed: James has requested that those include a $250 million fine and restrictions that prevent the former president and some of his children from running a company in New York ever again. “Trump is synonymous with New York,” Bigoness said. Losing control of his New York businesses and properties would amount to “his home and the place that he has tied himself to shutting him out entirely.” It could also be hugely costly.

This week’s summary judgment is unusual, legal experts told me: The judge essentially determined that it was so clear that Trump had committed fraud that it wasn’t worth wasting time at a trial figuring that part out. Instead, the trial will be used to determine whether Trump’s New York businesses should be further limited as punishment for the fraud—and whether the other demands of James’s suit will be met. It’s somewhat rare for a summary judgment to get to the core of a case like this, and the judge’s decision was distinctly zingy and personal. Responding to Trump’s team’s claims that the suit wasn’t valid, Judge Engoron said that he had already rejected their arguments, and that he was reminded of the “time-loop in the film ‘Groundhog Day.’” In a footnote to his ruling, he quoted a Chico Marx line from Duck Soup: “Well, who ya gonna believe, me or your own eyes?”

In another unusual move, the judge also included individual fines against Trump’s lawyers as part of the ruling, charging each $7,500 for bringing arguments so “frivolous” that they wasted the court’s time. Separately, Trump’s lawyers are trying to sue the judge (a long-shot attempt). Trump, for his part, posted on Truth Social that he had “done business perfectly”; he also called the judge “deranged.” Reached for comment, the Trump attorney Christopher Kise called the decision “outrageous” and “completely disconnected from the facts and governing law.” “President Trump and his family will seek all available appellate remedies to rectify this miscarriage of justice,” he said in an emailed statement. An appeals process from Trump’s camp could extend into the next presidential-election cycle. His team might also attempt to get an emergency stay to prevent the trial from starting next week.

This ruling, and the rest of James’s suit, are confined to New York. Technically, Trump would still be free to spin up new businesses as he sees fit in another state, and he has holdings beyond New York. But even if he could legally incorporate a new business in, say, Florida or Illinois, it might not make financial or brand sense for him. The fallout from this case could wind up being very costly for Trump, so setting up shop elsewhere, although not impossible, could be a major financial hurdle. Plus, “New York is the place Trump wants to do business and has been doing business for forever,” Caroline Polisi, a white-collar defense attorney and lecturer at Columbia Law School, told me.

Yesterday’s ruling may do little to dampen Trump’s appeal among his die-hard fans, who have stuck with him through all manner of scandals, including a running list of criminal indictments. But it could puncture Trump’s persona. My colleague David A. Graham wrote today that the fact that Trump and his co-defendants, including his sons, committed fraud is not surprising. What is surprising, he argued, is that they are facing harsh consequences. “Trump’s political career is based on the myth that he was a great businessman,” David told me. “This ruling cuts straight to the root of that, showing that his business success was built on years of lies.” Indeed, when Letitia James filed suit against Trump last year, she dubbed his behavior the “art of the steal.”

Related:

The end of Trump Inc.
It’s just fraud all the way down.

Today’s News

The U.S. soldier Pvt. Travis King, who sprinted across the border into North Korea two months ago, has been released into American custody.
The second Republican presidential primary debate will be held in California tonight.
A federal judge struck down a Texas law that drag performers worried would ban shows in the state.

Dispatches

Up for Debate: Driverless cars are a tough sell. Conor Friedersdorf compiles reader perspectives on the future of the technology.

Explore all of our newsletters here.

Evening Read


Revealed: The Authors Whose Pirated Books Are Powering Generative AI

By Alex Reisner

One of the most troubling issues around generative AI is simple: It’s being made in secret. To produce humanlike answers to questions, systems such as ChatGPT process huge quantities of written material. But few people outside of companies such as Meta and OpenAI know the full extent of the texts these programs have been trained on.

Some training text comes from Wikipedia and other online writing, but high-quality generative AI requires higher-quality input than is usually found on the internet—that is, it requires the kind found in books. In a lawsuit filed in California last month, the writers Sarah Silverman, Richard Kadrey, and Christopher Golden allege that Meta violated copyright laws by using their books to train LLaMA, a large language model similar to OpenAI’s GPT-4—an algorithm that can generate text by mimicking the word patterns it finds in sample texts. But neither the lawsuit itself nor the commentary surrounding it has offered a look under the hood: We have not previously known for certain whether LLaMA was trained on Silverman’s, Kadrey’s, or Golden’s books, or any others, for that matter.

In fact, it was. I recently obtained and analyzed a data set used by Meta to train LLaMA. Its contents more than justify a fundamental aspect of the authors’ allegations: Pirated books are being used as inputs for computer programs that are changing how we read, learn, and communicate. The future promised by AI is written with stolen words.

Read the full article.

More From The Atlantic

Alabama strikes out.
The banality of bad-faith science
“My books were used to train Meta’s generative AI. Good.”

Culture Break


Read. Libra, a fictionalization of the Kennedy assassination, is a paranoid American fable that reads so realistically that it could almost be nonfiction.

Watch. Gareth Edwards’s new movie, The Creator (in theaters September 29), is set in a future where AI has already failed to save the world.

Play our daily crossword.

Katherine Hu contributed to this newsletter.

When you buy a book using a link in this newsletter, we receive a commission. Thank you for supporting The Atlantic.

My Books Were Used to Train Meta’s Generative AI. Good.

The Atlantic

www.theatlantic.com › technology › archive › 2023 › 09 › books3-database-meta-training-ai › 675461

When The Atlantic revealed last month that tens of thousands of books published in the past 20 years had been used without permission to train Meta’s AI language model, well-known authors were outraged, calling it a “smoking gun” for mega-corporate misbehavior. Now that the magazine has put out a searchable database of affected books, the outrage is redoubled: “I would never have consented for Meta to train AI on any of my books, let alone five of them,” wrote the novelist Lauren Groff. “Hyperventilating.” The original Atlantic story gestured at this sense of violation and affront: “The future promised by AI is written with stolen words,” it said.

I understand that the database in question, called “Books3,” appears to have been assembled from torrented ebooks ripped into text files, in which case any use of it could be a breach of copyright. Still, I was mystified, at first, by the Sturm und Drang response, and by the claim that generative AI is “powered by mass theft.” Perhaps I was just jealous of the famous writers who were being singled out as victims—Stephen King, Zadie Smith, Michael Pollan, and others who command huge speaking fees and lucrative secondary-rights deals. Maybe I’d better understand the writers’ angst, I thought, if my work, too, were being pirated and sourced for AI power.

Now I know that it is. Yesterday, when I put my name into The Atlantic’s database search, three of the 10 books I have authored or co-authored appeared. How exciting! I’d joined the ranks of the aggrieved. But then, despite some effort, I found myself disappointingly unaggrieved. What on earth was wrong with me?

Authors who are angry—authors who are effing furious—have pointed to the fact that their work was used without permission. That is also at the heart of a lawsuit filed in California by the comedian Sarah Silverman and two other authors, Richard Kadrey and Christopher Golden, which contends that Meta failed to seek out their consent before extracting snippets of their text, called “tokens,” for use in teaching its AI. The company used their books in ways the authors didn’t anticipate and, upon consideration, in ways they don’t approve of. (Meta has filed a motion to dismiss the suit.)

Whether or not Meta’s behavior amounts to infringement is a matter for the courts to decide. Permission is a different matter. One of the facts (and pleasures) of authorship is that one’s work will be used in unpredictable ways. The philosopher Jacques Derrida liked to talk about “dissemination,” which I take to mean that, like a plant releasing its seed, an author separates from their published work. Their readers (or viewers, or listeners) not only can but must make sense of that work in different contexts. A retiree cracks a Haruki Murakami novel recommended by a grandchild. A high-school kid skims Shakespeare for a class. My mother’s tree trimmer reads my book on play at her suggestion. A lack of permission underlies all of these uses, as it underlies influence in general: When successful, art exceeds its creator’s plans.

But internet culture recasts permission as a moral right. Many authors are online, and they can tell you if and when you’re wrong about their work. Also online are swarms of fans who will evangelize their received ideas of what a book, a movie, or an album really means and snuff out the “wrong” accounts. The Books3 imbroglio reflects the same impulse to believe that some interpretations of a work are out of bounds.

[Read: What I found in a database Meta uses to train generative AI]

Perhaps Meta is an unappealing reader. Perhaps chopping prose into tokens is not how I would like to be read. But then, who am I to say what my work is good for, how it might benefit someone—even a near-trillion-dollar company? To bemoan this one unexpected use for my writing is to undermine all of the other unexpected uses for it. Speaking as a writer, that makes me feel bad.

I also feel—am I allowed to say this?—a little bored by the idea that Meta has stolen my life. If the theft and aggregation of the works in Books3 is objectionable on moral or legal grounds, then it ought to be so irrespective of those works’ absorption into one particular technology company’s large language model. But that doesn’t seem to be the case. The Books3 database was itself uploaded in resistance to the corporate juggernauts. The person who first posted the repository has described it as the only way for open-source, grassroots AI projects to compete with huge commercial enterprises. He was trying to return some control of the future to ordinary people, including book authors. In the meantime, Meta contends that the next generation of its AI model—which may or may not still include Books3 in its training data—is “free for research and commercial use,” a statement that demands scrutiny but also complicates this saga. So does the fact that hours after The Atlantic published a search tool for Books3, one writer distributed a link that allows you to access the feature without subscribing to this magazine. In other words: a free way for people to be outraged about people getting writers’ work for free.

I’m not sure what I make of all this, as a citizen of the future no less than as a book author. Theft is an original sin of the internet. Sometimes we call it piracy (when software is uploaded to USENET, or books to Books3); other times it’s seen as innovation (when Google processed and indexed the entire internet without permission) or even liberation. AI merely iterates this ambiguity. I’m having trouble drawing any novel or definitive conclusions about the Books3 story based on the day-old knowledge that some of my writing, along with trillions more chunks of words from, perhaps, Amazon reviews and Reddit grouses, has made its way into an AI training set.

Actually, what about those Amazon reviewers and Redditors? What about the Wikipedia authors who labored to write the pages for Bratz dolls and the Bosc pear, or the bloggers whose blogs were long abandoned, or the corporate-brochure copywriters, or, heck, even the search-engine-optimization landfill dumpers? All of their work likely has been or will be sucked into the giant language models too. The total volume of textual material accessible and accessed for training AI models makes books—even nearly 200,000 of them—seem a speck by comparison.

[Read: What happens when AI has read everything?]

It is understandable, I suppose, to hold literary works in greater esteem than banana-bread-recipe introductions or Am I the Asshole subreddit posts or water-inlet-valve-replacement instructions. But it is also pretentious. We who write and publish magazines and books are professionals with a personal stake in the gravity of authorship. We are also few in number. Almost anyone can write, over years, millions of words on social media, in texts and emails, in reports and memos for their work. I love books and respect them, but, as a published author and professional writer, I may be in the category least at risk of losing my connection to the written word and its spoils. If an AI collage of Stephen King and Yelp can do better than me, what business do I have calling myself a writer in the first place?

I became an author because language offers a special medium for experimenting with ideas. Words and sentences are malleable. Texts arise from basements of subtext. What I say embraces what I don’t and makes room for what you read. Once bound and published, boxed and shipped, my books find their way to places I might never have anticipated. As vessels for ideas, I hope, but also as doorstops or insect-execution devices or as the last inch of a stack that holds up a laptop for an important Zoom. Or even—even!—as a litany of tokens, chunked apart to be reassembled by the alien mind of a weird machine. Why not? I am an author, sure, but I am also a man who put some words in order amid the uncountable others who have done the same. If authorship is nothing more than vanity, then let the machines put us out of our misery.

Apple may be quiet on AI, but it’s also the biggest buyer of AI companies

Quartz

qz.com › apple-may-be-quiet-on-ai-but-it-s-also-the-biggest-buy-1850872570

If there’s one thing that has been constant in the artificial intelligence frenzy, it’s that Big Tech companies Google, Microsoft, Meta, and Amazon can’t stop talking about their AI investments, whether that’s in earnings calls or new product announcements. But one industry leader tends to be absent from such chatter: A…

Read more...

Search the Books Database Powering Meta’s Generative AI

The Atlantic

www.theatlantic.com › technology › archive › 2023 › 09 › books3-database-generative-ai-training-copyright-infringement › 675363

Editor’s note: This searchable database is part of The Atlantic’s series on Books3. You can read about the origins of the database here, and an analysis of what’s in it here.

This summer, I acquired a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. I wrote in The Atlantic about how the data set, known as “Books3,” was based on a collection of pirated ebooks, most of them published in the past 20 years. Since then, I’ve done a deep analysis of what’s actually in the data set, which is now at the center of several lawsuits brought against Meta by writers such as Sarah Silverman, Michael Chabon, and Paul Tremblay, who claim that its use in training generative AI amounts to copyright infringement.

Since my article appeared, I’ve heard from several authors wanting to know if their work is in Books3. In almost all cases, the answer has been yes. These authors spent years thinking, researching, imagining, and writing, and had no idea that their books were being used to train machines that could one day replace them. Meanwhile, the people building and training these machines stand to profit enormously.

Reached for comment, a spokesperson for Meta did not directly answer questions about the use of pirated books to train LLaMA, the company’s generative-AI product. Instead, she pointed me to a court filing from last week related to the Silverman lawsuit, in which lawyers for Meta argue that the case should be dismissed in part because neither the LLaMA model nor its outputs are “substantially similar” to the authors’ books.

It may be beyond the scope of copyright law to address the harms being done to authors by generative AI, but the point remains that AI-training practices are secretive and fundamentally nonconsensual. Very few people understand exactly how these programs are developed, even as such initiatives threaten to upend the world as we know it. Books are stored in Books3 as large, unlabeled blocks of text. To identify their authors and titles, I extracted ISBNs from these blocks of text and looked them up in a book database. Of the 191,000 titles I identified, 183,000 have associated author information. You can use the search tool below to look up authors in this subset and see which of their titles are included.
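
Reisner gives only the outline of that pipeline: scan each unlabeled block for an ISBN, validate it, then look the number up in a book database to recover the author and title. As a minimal sketch of the first two steps, assuming ISBN-13s and a simple regex (the actual code behind the analysis has not been published), the approach might look like this in Python:

```python
import re

# Candidate ISBN-13s: a 978/979 prefix followed by ten more digits,
# possibly separated by hyphens or spaces. An illustrative pattern,
# not the one used in the Books3 analysis.
ISBN13_RE = re.compile(r"\b97[89][-\s]?(?:\d[-\s]?){9}\d\b")

def is_valid_isbn13(digits: str) -> bool:
    """ISBN-13 checksum: weight digits 1, 3, 1, 3, ...; the total must be divisible by 10."""
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0

def find_isbns(block: str) -> list[str]:
    """Return validated ISBN-13s found in one unlabeled block of book text."""
    candidates = (re.sub(r"[-\s]", "", m.group(0)) for m in ISBN13_RE.finditer(block))
    return [c for c in candidates if is_valid_isbn13(c)]
```

Each validated number would then be matched against an external book database, which is how roughly 183,000 of the 191,000 titles picked up author information.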

Before you begin, please note several caveats: Some books appear multiple times, reflecting different editions, translations, abridgments, or annotations. Because of inconsistencies in the spelling of author names, the search may not return books that are, in fact, in Books3. It may also deliver a jumble of odd formatting: A query for Agatha Christie, for example, may also return books labeled Christie Agatha. And because of possible errors in the book-identification process, which involves detecting an ISBN within the text of the books and using a book database to find their author and title, there is a very small chance of false positives.
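
The Atlantic has not said how its search handles these name variants, but one common workaround, sketched below under that assumption, is to reduce every author string to an order-insensitive key before comparing:

```python
def author_key(name: str) -> str:
    """Collapse variants such as "Agatha Christie" and "Christie, Agatha"
    into one key by lowercasing, stripping punctuation, and sorting the
    name's parts. A heuristic only: it can also conflate different
    authors whose names happen to share the same words."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))

assert author_key("Agatha Christie") == author_key("Christie, Agatha")
```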

What I Found in a Database Meta Uses to Train Generative AI

The Atlantic

www.theatlantic.com › technology › archive › 2023 › 09 › books3-ai-training-meta-copyright-infringement-lawsuit › 675411

Editor’s note: This article is part of The Atlantic’s series on Books3. You can search the database for yourself here, and read about its origins here.

This summer, I reported on a data set of more than 191,000 books that were used without permission to train generative-AI systems by Meta, Bloomberg, and others. “Books3,” as it’s called, was based on a collection of pirated ebooks that includes travel guides, self-published erotic fiction, novels by Stephen King and Margaret Atwood, and a lot more. It is now at the center of several lawsuits brought against Meta by writers who claim that its use amounts to copyright infringement.

Books play a crucial role in the training of generative-AI systems. Their long, thematically consistent paragraphs provide information about how to construct long, thematically consistent paragraphs—something that’s essential to creating the illusion of intelligence. Consequently, tech companies use huge data sets of books, typically without permission, purchase, or licensing. (Lawyers for Meta argued in a recent court filing that neither outputs from the company’s generative AI nor the model itself are “substantially similar” to existing books.)

In its training process, a generative-AI system essentially builds a giant map of English words—the distance between two words correlates with how often they appear near each other in the training text. The final system, known as a large language model, will produce more plausible responses for subjects that appear more often in its training text. (For further details on this process, you can read about transformer architecture, the innovation that precipitated the boom in large language models such as LLaMA and ChatGPT.) A system trained primarily on the Western canon, for example, will produce poor answers to questions about Eastern literature. This is just one reason it’s important to understand the training data used by these models, and why it’s troubling that there is generally so little transparency.
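
That “giant map of words” can be made concrete with a toy example. The sketch below simply tallies how often pairs of words appear within a few tokens of each other; a real model like LLaMA learns dense vectors by gradient descent rather than keeping raw counts, but the intuition that frequent neighbors end up “close” on the map is the same. This is an illustration, not the actual training procedure:

```python
from collections import Counter

def cooccurrence_counts(tokens: list[str], window: int = 4) -> Counter:
    """Count how often each unordered pair of words appears within
    `window` tokens of each other in the training text."""
    counts: Counter = Counter()
    for i, word in enumerate(tokens):
        for neighbor in tokens[i + 1 : i + 1 + window]:
            counts[tuple(sorted((word, neighbor)))] += 1
    return counts

tokens = ("long thematically consistent paragraphs provide information "
          "about how to construct long thematically consistent paragraphs").split()
print(cooccurrence_counts(tokens).most_common(3))
```

A corpus dominated by one subject produces a dense, well-connected region of the map for that subject, which is why a model trained mostly on the Western canon answers questions about Eastern literature poorly.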

With that in mind, here are some of the most represented authors in Books3, with the approximate number of entries contributed:

Although 24 of the 25 authors listed here are fiction writers (the lone exception is Betty Crocker), the data set is two-thirds nonfiction overall. It includes several thousand technical manuals; more than 1,500 books from Christian publishers (including at least 175 Bibles and Bible commentaries); more than 400 Dungeons & Dragons– and Magic: The Gathering–themed books; and 46 titles by Charles Bukowski. Nearly every subject imaginable is covered (including How to Housebreak Your Dog in 7 Days), but the collection skews heavily toward the interests and perspectives of the English-speaking Western world.

Many people have written about bias in AI systems. An AI-based face-recognition program, for example, that’s trained disproportionately on images of light-skinned people might work less well on images of people with darker skin—with potentially disastrous outcomes. Books3 helps us see the problem from another angle: What combination of books would be unbiased? What would be an equitable distribution of Christian, Muslim, Buddhist, and Jewish subjects? Are extremist views balanced by moderate ones? What’s the proper ratio of American history to Chinese history, and what perspectives should be represented within each? When knowledge is organized and filtered by algorithm rather than by human judgment, the problem of perspective becomes both crucial and intractable.

Books3 is a gigantic data set. Here are just a few different ways to consider the authors, books, and publishers contained within. Note that the samples presented here are not comprehensive; they are chosen to give a quick sense of the many different types of writing used to train generative AI. As above, book counts may include multiple editions.

As AI chatbots begin to replace traditional search engines, the tech industry’s power to constrain our access to information and manipulate our perspective increases exponentially. If the internet democratized access to information by eliminating the need to go to a library or consult an expert, the AI chatbot is a return to the old gatekeeping model, but with a gatekeeper that’s opaque and unaccountable—a gatekeeper, moreover, that is prone to “hallucinations” and might or might not cite sources.

In its recent court filing—a motion to dismiss the lawsuit brought by the authors Richard Kadrey, Sarah Silverman, and Christopher Golden—Meta observed that “Books3 comprises an astonishingly small portion of the total text used to train LLaMA.” This is technically true (I estimate that Books3 is about 3 percent of LLaMA’s total training text) but sidesteps a core concern: If LLaMA can summarize Silverman’s book, then it likely relies heavily on the text of her book to do so. In general, it’s hard to know how much any given source contributes to a generative-AI system’s output, given the impenetrability of current algorithms.

Still, our only clue to the kinds of information and opinions AI chatbots will dispense is their training data. A look at Books3 is a good start, but it’s just one corner of the training-data universe, most of which remains behind closed doors.

What Big Tech Knows About Your Body

The Atlantic

www.theatlantic.com › technology › archive › 2023 › 09 › online-privacy-personal-health-data › 675182

If you were seeking online therapy from 2017 to 2021—and a lot of people were—chances are good that you found your way to BetterHelp, which today describes itself as the world’s largest online-therapy purveyor, with more than 2 million users. Once you were there, after a few clicks, you would have completed a form—an intake questionnaire, not unlike the paper one you’d fill out at any therapist’s office: Are you new to therapy? Are you taking any medications? Having problems with intimacy? Experiencing overwhelming sadness? Thinking of hurting yourself? BetterHelp would have asked you if you were religious, if you were LGBTQ, if you were a teenager. These questions were just meant to match you with the best counselor for your needs, small text would have assured you. Your information would remain private.

Except BetterHelp isn’t exactly a therapist’s office, and your information may not have been completely private. In fact, according to a complaint brought by federal regulators, for years, BetterHelp was sharing user data—including email addresses, IP addresses, and questionnaire answers—with third parties, including Facebook and Snapchat, for the purposes of targeting ads for its services. It was also, according to the Federal Trade Commission, poorly regulating what those third parties did with users’ data once they got them. In July, the company finalized a settlement with the FTC and agreed to refund $7.8 million to consumers whose privacy, regulators claimed, had been compromised. (In a statement, BetterHelp admitted no wrongdoing and described the alleged sharing of user information as an “industry-standard practice.”)

We leave digital traces about our health everywhere we go: by completing forms like BetterHelp’s. By requesting a prescription refill online. By clicking on a link. By asking a search engine about dosages or directions to a clinic or pain in chest dying???? By shopping, online or off. By participating in consumer genetic testing. By stepping on a smart scale or using a smart thermometer. By joining a Facebook group or a Discord server for people with a certain medical condition. By using internet-connected exercise equipment. By using an app or a service to count your steps or track your menstrual cycle or log your workouts. Even demographic and financial data unrelated to health can be aggregated and analyzed to reveal or infer sensitive information about people’s physical or mental-health conditions.  

All of this information is valuable to advertisers and to the tech companies that sell ad space and targeting to them. It’s valuable precisely because it’s intimate: More than perhaps anything else, our health guides our behavior. And the more these companies know, the easier they can influence us. Over the past year or so, reporting has found evidence of a Meta tracking tool collecting patient information from hospital websites, and apps from Drugs.com and WebMD sharing search terms such as herpes and depression, plus identifying information about users, with advertisers. (Meta has denied receiving and using data from the tool, and Drugs.com has said that it was not sharing data that qualified as “sensitive personal information.”) In 2021, the FTC settled with the period and ovulation app Flo, which has reported having more than 100 million users, after alleging that it had shared information about users’ reproductive health with third-party marketing and analytics services, even though its privacy policies explicitly said that it wouldn’t do so. (Flo, like BetterHelp, said that its agreement with the FTC wasn’t an admission of wrongdoing and that it didn’t share users’ names, addresses, or birthdays.)

Of course, not all of our health information ends up in the hands of those looking to exploit it. But when it does, the stakes are high. If an advertiser or a social-media algorithm infers that people have specific medical conditions or disabilities and subsequently excludes them from receiving information on housing, employment, or other important resources, this limits people’s life opportunities. If our intimate information gets into the wrong hands, we are at increased risk of fraud or identity theft: People might use our data to open lines of credit, or to impersonate us to get medical services and obtain drugs illegally, which can lead not just to a damaged credit rating, but also to canceled insurance policies and denial of care. Our sensitive personal information could even be made public, leading to harassment and discrimination.

Many people believe that their health information is private under the federal Health Insurance Portability and Accountability Act, which protects medical records and other personal health information. That’s not quite true. HIPAA protects only information collected by “covered entities” and their “business associates”: Health-insurance companies, doctors, hospitals, and some companies that do business with them are limited in how they collect, use, and share information. A whole host of companies that handle our health information—including social-media companies, advertisers, and the majority of health tools marketed directly to consumers—aren’t covered at all.

“When somebody downloads an app on their phone and starts inputting health data in it, or data that might be health indicative, there are definitely no protections for that data other than what the app has promised,” Deven McGraw, a former deputy director of health-information privacy in the Office for Civil Rights at the Department of Health and Human Services, told me. (McGraw currently works as the lead for data stewardship and data sharing at the genetic-testing company Invitae.) And even then, consumers have no way of knowing if an app is following its stated policies. (In the case of BetterHelp, the FTC complaint points out that from September 2013 to December 2020, the company displayed seals on its website attesting to HIPAA compliance—despite the fact that “no government agency or other third party reviewed [its] information practices for compliance with HIPAA, let alone determined that the practices met the requirements of HIPAA.”)

Companies that sell ads are often quick to point out that information is aggregated: Tech companies use our data to target swaths of people, rather than individuals, based on demographics and behavior. But those categories can be quite narrow: Ashkenazi Jewish women of childbearing age, say, or men living in a specific zip code, or people whose online activity may have signaled interest in a specific disease, according to recent reporting. Those groups can then be served hyper-targeted pharmaceutical ads at best, and unscientific “cures” and medical disinformation at worst. They can also be discriminated against: Last year, the Department of Justice settled with Meta over allegations that the company had violated the Fair Housing Act in part by allowing advertisers to not show housing ads to users who Facebook’s data-collection machine had inferred were interested in topics including “service animal” and “accessibility.”

Recent settlements have demonstrated an increased interest on the part of the FTC in regulating health privacy. But those settlements, like most of the agency’s actions, are carried out via consent orders: settlements approved by the commissioners whereby the two parties resolve a dispute without an admission of wrongdoing (as happened with both Flo and BetterHelp). If a company appears to have violated the terms of a consent order, a federal court can then investigate. But the agency has limited enforcement resources. In 2022, a coalition of privacy and consumer advocates wrote a letter to the chairs and ranking members of the House and Senate appropriations committees, urging them to increase funding for the FTC. The commission requested $490 million for fiscal year 2023, up from the $376.5 million it received in 2022, pointing to stark increases in consumer complaints and reported consumer fraud. It ultimately received $430 million.

For its part, the FTC has created an interactive tool to help app creators comply with the law as they build and market their products. And HHS’s Office for Civil Rights has provided guidance on the uses of online tracking technologies by HIPAA-covered entities and business associates. This may head off privacy issues before apps cause harm.

The nonprofit Center for Democracy & Technology has also put together its own proposed consumer-privacy framework in response to the fact that “extraordinary amounts of information reflecting mental and physical well-being are created and held by entities that are not bound by HIPAA obligations.” The framework emphasizes appropriate limits on the collection, disclosure, and use of health data as well as information that can be used to make inferences about a person’s physical or mental health. It moves the burden off consumers, patients, and users—who, it notes, may already be burdened with their health condition—and places it on the entities collecting, sharing, and using the information. It also limits data use to purposes that people anticipate and want, not ones they don’t know about or aren’t comfortable with.

But that framework is, for the time being, just a suggestion. In the absence of comprehensive federal data-privacy legislation that accounts for all the new technologies that now have access to our health information, our most intimate data are governed by a ragged patchwork of laws and regulations that are no match for the enormous companies that benefit from having access to those data—or for the very real needs that drive patients to use these tools in the first place. Patients enter their symptoms into search engines or fill out online questionnaires or download apps not because they don’t care, or aren’t thinking, about their privacy. They do these things because they want help, and the internet is the easiest or fastest or cheapest or most natural place to go for it. Tech-enabled health products provide an undeniable service, especially in a country plagued by health disparities. They’re unlikely to get less popular. It’s time the laws designed to protect our health information caught up.