Itemoids

Large

Would You Have a Baby If You Won the Lottery?

The Atlantic

www.theatlantic.com › ideas › archive › 2023 › 03 › money-wealth-lottery-impact-fertility-rate › 673549

South Korea’s fertility rate in 2022 was just 0.78 children per woman. In much of America, rates aren’t significantly higher: 0.92 children per woman in Puerto Rico and 1.36 in Vermont; in the Bay Area, it’s about 1.3. Demographers give many explanations for declining birth rates, but one of the most popular revolves around work and family. In countries such as the social-welfare states of Northern Europe, where women are given flexibility to square the demands of work with family, fertility rates are relatively high. In others, where either work or family makes excessive and incompatible demands, family loses out, and fertility falls.

Improving work-life balance is probably worthwhile for plenty of reasons. But very little evidence shows that it would have much effect on fertility. For example, when men help more at home, fertility doesn’t rise, one 2018 study found. And although policies to support work and family do boost fertility, their cost is pretty high for fairly modest effects (though they may have other valuable benefits: Child allowances, for example, reduce child poverty).

The ideal way to test the connection would be to randomly give some people a much better set of work-life arrangements and then see whether their family behaviors change. This would be hard to do, but as it happens, something like this random improvement in work-life balance actually occurs—when people win the lottery.

What happens when you win the lottery? Obviously, you get a considerable amount of money. Maybe you buy a new car, or a house, or pay off some debts. But you can also use your new wealth to establish a better work-life balance: hire a cleaning service or a nanny, or cut back on work hours. Large random transfers of wealth are a nice way to test the materialist account of fertility. If receiving a pile of cash makes people have more babies, then maybe work-life balance matters a lot: More income from less demanding work would boost births. But if lottery winnings don’t increase fertility, maybe the work-life-balance theory needs some adjustment.

A team of economists studying a large pool of lottery players in Sweden found that when men win the lottery, they become a lot likelier than demographically similar lottery losers to get married (if they were lower-income and unmarried before their win), and they have more children. On its face, that supports the work-life-balance idea. But when women win the lottery, the only big change in their behavior is divorce: Divorce rates for women almost double in the first couple of years after winning the lottery.

The authors offer some straightforward explanations. When the men became wealthier, they became more desirable partners; their marriage rate increased by a third. Their increased fertility (an increase of about 13 percent) was itself largely attributable to the effect of being married, because being married tends to cause higher fertility, especially for men. The extra wealth apparently had no major effect on women’s desirability as partners, but—because Swedish law allows lottery winners to hold on to most of the winnings—had perhaps a big effect on their expectations for a postdivorce standard of living, enabling them to feel more confident about exiting a marriage.

Helpfully, the study authors showed that their conclusions matched the findings of another study of lotteries in the United States. It found that winning made both men and women more likely to marry, but that the effect was stronger for men, and that while it decreased divorce rates for men, it increased them for women. Men seem to use their newfound resources to build families, while women use them to exit families.

This seems like an inversion of common stereotypes about men and women. But reality is never so simple.

Marriage is a strong predictor of fertility for many reasons. Not least is the relationship between marital status and mental health, because mental health has a large impact on childbearing. Lottery winnings boosted male marriage and fertility not because men had unique desires for marriage and family, but at least in part because Swedish women were likelier to marry and have children with men who had more money. The effect for men is as much about women’s preferences and behaviors as men’s.  

Women’s responses to winning the lottery were similarly complex. Divorce rates did not rise equally for all of them. That increase, the authors found, was concentrated among previously low-income women, those who had been married to older or wealthier men, and those who were married for three years or less. The conditions under which these women entered into marriages with these husbands could be more important than the lottery winnings. Crucially, 10 years after the lottery, winners were no more likely to divorce than other women. In other words, lottery money may have accelerated inevitable divorces rather than breaking apart couples that would otherwise have stayed together for the long term.

If people want to have more children than they can presently afford—and surveys have repeatedly suggested that they do—and societies as a whole thrive when parents of all kinds are able to raise their children in stable households, then declining birth rates are cause for alarm. And for governments seeking to reverse them by creating family policies, this research indicates that some kinds of spending may prove more effective than others.

First, it suggests that a core part of low fertility is how people (especially women) value potential partners. In surveys, women continue to report desiring much-higher-earning partners, and when men suddenly have more money, they do in fact get married more. Policy makers cannot (and should not) “solve” this by simply handing out “man bonuses.” However, understanding why men and boys are increasingly falling behind women in educational and professional attainment could be a core part of increasing fertility.

Second, policy makers should avoid thinking about family policy as an issue uniquely related to women. Arguments that fertility can be increased by pushing for a maximally gender-egalitarian society or by delivering family subsidies disproportionately to mothers should be reconsidered. You can’t get higher fertility without men on board. Only policies that make space for men and women to choose to prioritize parenting can support higher fertility in industrialized societies.

Third, marriage itself matters, and marriage responds to material incentives. Boosting fertility by directly targeting fertility is difficult and expensive. One reason is that marriage continues to be a gatekeeping institution for larger families. Only by removing obstacles to marriage and helping young people wed earlier and stick together can birth rates be sustainably increased. This is a challenging task, and “pro-nuptialism” has even less high-quality research on it than “pro-natalism.” However, policy makers could offer “marriage bonuses” or at least eliminate marriage penalties, like the fact that low-income people can lose their housing or SNAP benefits if they choose to combine their incomes. This much seems safe to say: Working-class people shouldn’t need to win the lottery to feel that they can afford to get married and have kids.

Even Chatbots Have to Take the SAT

The Atlantic

www.theatlantic.com › technology › archive › 2023 › 03 › open-ai-gpt4-standardized-tests-sat-ap-exams › 673458

Last fall, when generative AI abruptly started turning out competent high-school- and college-level writing, some educators saw it as an opportunity. Perhaps it was time, at last, to dispose of the five-paragraph essay, among other bad teaching practices that have lingered for generations. Universities and colleges convened emergency town halls before winter terms began to discuss how large language models might reshape their work, for better and worse.

But just as quickly, most of those efforts evaporated into the reality of normal life. Educators and administrators have so many problems to address even before AI enters the picture; the prospect of utterly redesigning writing education and assessment felt impossible. Worthwhile, but maybe later. Then, with last week’s arrival of GPT-4, came another provocation. OpenAI, the company that created the new software, put out a paper touting its capacities. Among them: taking tests. AIs are no longer just producing passable five-paragraph essays. Now they’re excelling at the SAT, “earning” a score of 1410. They’re getting passing grades on more than a dozen different AP exams. They’re doing well enough on bar exams to be licensed as lawyers.

It would be nice if this news inspired educators, governments, certification agencies, and other groups to rethink what these tests really mean—or even to reinvent them altogether. Alas, as was the case for rote-essay writing, whatever appetite for change the shock inspires might prove to be short-lived. GPT-4’s achievements help reveal the underlying problem: Americans love standardized tests as much as we hate them—and we’re unlikely to let them go even if doing so would be in our best interest.

Many of the initial responses to GPT-4’s exam prowess were predictably immoderate: AI can keep up with human lawyers, or apply to Stanford, or make “education” useless. But why should it be startling in the slightest that software trained on the entire text of the internet performs well on standardized exams? AI can instantly run what amounts to an open-book test on any subject through statistical analysis and regression. Indeed, that anyone is surprised at all by this success suggests that people tend to get confused about what it means when computers prove effective at human activities.

[Read: The college essay is dead]

Back in the late 1990s, nobody thought a computer could ever beat a human at Go, the ancient Chinese game played with black and white stones. Chess had been mastered by supercomputers, but Go remained—at least in the hearts of its players—immune to computation. They were wrong. Two decades later, DeepMind’s AlphaGo was regularly beating Go masters. To accomplish this task, AlphaGo initially mimicked human players’ moves before running innumerable games against itself to find new strategies. The victory was construed by some as evidence that computers could overtake people at complex tasks previously thought to be uniquely human.

By rights, GPT-4’s skill at the SAT should be taken as the opposite. Standardized tests feel inhuman from the start: You, a distinct individual, are forced to perform in a manner that can be judged by a machine, and then compared with that of many other individuals. Yet last week’s announcement—of the 1410 score, the AP exams, and so on—gave rise to an unease similar to that produced by AlphaGo.

Perhaps we’re anxious not that computers will strip us of humanity, but that machines will reveal the vanity of our human concerns. The experience of reasoning about your next set of moves in Go, as a human player doing so from the vantage point of human culture, cannot be replaced or reproduced by a Go-playing machine—unless the only point of Go were to prove that Go can be mastered, rather than played. Such cultural values do exist: The designation of chess grand masters and Go 9-dan professionals suggests expertise in excess of mere performance in a folk game. The best players of chess and Go are sometimes seen as smart in a general sense, because they are good at a game that takes smarts of a certain sort. The same is true for AIs that play (and win) these games.

[Read: A machine crushed us at Pokémon]

Standardized tests occupy a similar cultural role. They were conceived to assess and communicate general performance on a subject such as math or reading. Whether and how they ever managed to do that is up for debate, but the accuracy and fairness of the exams became less important than their social function. To score a 1410 on the SAT says something about your capacities and prospects—maybe you can get into Stanford. To pursue and then emerge victorious against a battery of AP tests suggests general ability warranting accelerated progress in college. (The fact that victory doesn’t necessarily deliver that acceleration only emphasizes the seduction of its symbolism.) The bar exam measures—one hopes—someone’s subject-matter proficiency, but doesn’t promise lawyerly effectiveness or even competence. To perform well on a standardized test indicates potential to perform well at some real future activity, but it has also come to have some value in itself, as a marker of success at taking tests.

That value was already being questioned, machine intelligence aside. Standardized tests have long been scrutinized for contributing to discrimination against minority and low-income students. The coronavirus pandemic, and its disruptions to educational opportunity, intensified those concerns. Many colleges and universities made the SAT and ACT optional for admissions. Graduate schools are giving up on the GRE, and aspiring law students may no longer have to take the LSAT in a couple of years.

GPT-4’s purported prowess at these tests shows how little progress has been made at decoupling appearance from reality in the pursuit of these tests. Standardized tests might fairly assess human capacity, or they might do so unfairly, but either way, they hold an outsize role in Americans’ conception of themselves and their communities. We’re nervous that tests might turn us into computers, but also that computers might reveal the conceit of valuing tests so much in the first place.

AI-based chess and Go computers didn’t make human play obsolete, but they did change how people train. Large language models may do the same for taking the SAT and other standardized exams, and evolve into a fancy form of test prep. In that case, they could end up helping those who would already have done well score even higher. Or perhaps they will become the basis for a low-cost alternative that puts such training in the hands of everyone—a reversal of examination inequity, and a democratization of vanity. Whatever the case, the standardized tests will persist, only now the chatbots have to take them too.

GPT-4 Has the Memory of a Goldfish

The Atlantic

www.theatlantic.com › technology › archive › 2023 › 03 › gpt-4-has-memory-context-window › 673426

By this point, the many defects of AI-based language models have been analyzed to death—their incorrigible dishonesty, their capacity for bias and bigotry, their lack of common sense. GPT-4, the newest and most advanced such model yet, is already being subjected to the same scrutiny, and it still seems to misfire in pretty much all the ways earlier models did. But large language models have another shortcoming that has so far gotten relatively little attention: their shoddy recall. These multibillion-dollar programs, which require several city blocks’ worth of energy to run, may now be able to code websites, plan vacations, and draft company-wide emails in the style of William Faulkner. But they have the memory of a goldfish.

Ask ChatGPT “What color is the sky on a sunny, cloudless day?” and it will formulate a response by inferring a sequence of words that are likely to come next. So it answers, “On a sunny, cloudless day, the color of the sky is typically a deep shade of blue.” If you then reply, “How about on an overcast day?,” it understands that you really mean to ask, in continuation of your prior question, “What color is the sky on an overcast day?” This ability to remember and contextualize inputs is what gives ChatGPT the ability to carry on some semblance of an actual human conversation rather than simply providing one-off answers like a souped-up Magic 8 ball.

The trouble is that ChatGPT’s memory—and the memory of large language models more generally—is terrible. Each time a model generates a response, it can take into account only a limited amount of text, known as the model’s context window. ChatGPT has a context window of roughly 4,000 words—long enough that the average person messing around with it might never notice but short enough to render all sorts of complex tasks impossible. For instance, it wouldn’t be able to summarize a book, review a major coding project, or search your Google Drive. (Technically, context windows are measured not in words but in tokens, chunks of text that can be whole words or fragments of words, a distinction that becomes more important when you’re dealing with both visual and linguistic inputs.)
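
To make this concrete, here is a minimal sketch, in plain Python, of how a chat front end might assemble the text a model actually sees. Everything here is an illustrative assumption rather than OpenAI's implementation: the ChatSession class, the token budget, and the rule of thumb that a token is roughly four characters are made up for demonstration. The point is simply that anything older than the window never reaches the model at all.

```python
# Illustrative sketch only: a hypothetical chat front end that trims history
# to fit a fixed context window. The four-characters-per-token rule of thumb
# and the token budget are assumptions for demonstration, not OpenAI's internals.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly one token per four characters of text."""
    return max(1, len(text) // 4)


class ChatSession:
    def __init__(self, context_window_tokens: int = 4000):
        self.context_window = context_window_tokens
        self.history: list[str] = []  # alternating user and model turns

    def add_turn(self, text: str) -> None:
        self.history.append(text)

    def visible_context(self) -> list[str]:
        """Return the most recent turns that fit inside the window.

        Everything older is silently dropped: the model never sees it, which
        is why a name buried behind thousands of words of filler gets "forgotten."
        """
        budget = self.context_window
        visible: list[str] = []
        for turn in reversed(self.history):
            cost = estimate_tokens(turn)
            if cost > budget:
                break
            visible.append(turn)
            budget -= cost
        return list(reversed(visible))


if __name__ == "__main__":
    session = ChatSession(context_window_tokens=50)  # tiny window for the demo
    session.add_turn("My name is Ada.")
    session.add_turn("blah " * 200)  # a flood of filler text
    session.add_turn("What is my name?")
    print(session.visible_context())  # only the final question survives
```

Run as is, the toy session prints only the final question: the turn containing the name has already fallen outside the window, which is the same failure the nonsense experiment below demonstrates with the real ChatGPT.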

[Read: ChatGPT changed everything. Now its follow-up is here.]

For a vivid illustration of how this works, tell ChatGPT your name, paste 5,000 or so words of nonsense into the text box, and then ask what your name is. You can even say explicitly, “I’m going to give you 5,000 words of nonsense, then ask you my name. Ignore the nonsense; all that matters is remembering my name.” It won’t make a difference. ChatGPT won’t remember.

With GPT-4, the context window has been increased to roughly 8,000 words—as many as would be spoken in about an hour of face-to-face conversation. A heavy-duty version of the software that OpenAI has not yet released to the public can handle 32,000 words. That’s the most impressive memory yet achieved by a transformer, the type of neural net on which all the most impressive large language models are now based, says Raphaël Millière, a Columbia University philosopher whose work focuses on AI and cognitive science. Evidently, OpenAI made expanding the context window a priority, given that the company devoted a whole team to the issue. But how exactly that team pulled off the feat is a mystery; OpenAI has divulged pretty much zero about GPT-4’s inner workings. In the technical report released alongside the new model, the company justified its secrecy with appeals to the “competitive landscape” and “safety implications” of AI. When I asked for an interview with members of the context-window team, OpenAI did not answer my email.

[Read: What have humans just unleashed?]

For all the improvement to its short-term memory, GPT-4 still can’t retain information from one session to the next. Engineers could make the context window two times or three times or 100 times bigger, and this would still be the case: Each time you started a new conversation with GPT-4, you’d be starting from scratch. When booted up, it is born anew. (Doesn’t sound like a very good therapist.)

But even without solving this deeper problem of long-term memory, just lengthening the context window is no easy thing. As the engineers extend it, Millière told me, the computing power required to run the language model—and thus its cost of operation—climbs steeply; for a standard transformer, the cost of attention grows roughly with the square of the window’s length. A machine’s total memory capacity is also a constraint, according to Alex Dimakis, a computer scientist at the University of Texas at Austin and a co-director of the Institute for Foundations of Machine Learning. No single computer that exists today, he told me, could support, say, a million-word context window.

Some AI developers have extended language models’ context windows through the use of work-arounds. In one approach, the model is programmed to maintain a working summary of each conversation. Say the model has a 4,000-word context window, and your conversation runs to 5,000 words. The model responds by saving a 100-word summary of the first 1,100 words for its own reference, and then remembers that summary plus the most recent 3,900 words. As the conversation gets longer and longer, the model continually updates its summary—a clever fix, but more a Band-Aid than a solution. By the time your conversation hits 10,000 words, the 100-word summary would be responsible for capturing the first 6,100 of them. Necessarily, it will omit a lot.
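
As a rough illustration of that work-around, here is a sketch in plain Python using the same made-up numbers as the example above. The word counts stand in for tokens, and the summarize function is a naive stand-in that merely truncates; a real system would ask the model itself to compress the overflow into a short summary.

```python
# Toy sketch of the running-summary work-around described above. Word counts
# stand in for tokens, and the "summary" is a naive truncation; a real system
# would have the model rewrite the overflow as a short prose summary.

WINDOW = 4000          # words the model can see at once (assumed)
SUMMARY_BUDGET = 100   # words reserved for the running summary (assumed)


def summarize(words: list[str], budget: int) -> list[str]:
    """Stand-in summarizer: keep only the first `budget` words of the overflow."""
    return words[:budget]


def context_for_model(conversation: str) -> str:
    words = conversation.split()
    recent_budget = WINDOW - SUMMARY_BUDGET            # 3,900 words of recent text
    if len(words) <= WINDOW:
        return " ".join(words)                         # everything still fits
    overflow = words[: len(words) - recent_budget]     # e.g. the first 1,100 words
    summary = summarize(overflow, SUMMARY_BUDGET)      # squeezed into 100 words
    recent = words[len(words) - recent_budget:]        # plus the latest 3,900 words
    return " ".join(summary + recent)


if __name__ == "__main__":
    chat = "word " * 10000                  # a 10,000-word conversation
    visible = context_for_model(chat).split()
    print(len(visible))                     # 4,000: the window is full, but the
                                            # first 6,100 words now hide behind
                                            # a 100-word summary
```

Note that the overflow keeps growing while the summary's budget stays fixed, which is precisely the squeeze described above.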

[Read: GPT-4 might just be a bloated, pointless mess]

Other engineers have proposed more complex fixes for the short-term-memory issue, but none of them solves the rebooting problem. That, Dimakis told me, will likely require a more radical shift in design, perhaps even a wholesale abandonment of the transformer architecture on which every GPT model has been built. Simply expanding the context window will not do the trick.

The problem, at its core, is not really a problem of memory but one of discernment. The human mind is able to sort experience into categories: We (mostly) remember the important stuff and (mostly) forget the oceans of irrelevant information that wash over us each day. Large language models make no such distinctions. They have no capacity for triage, no ability to separate garbage from gold. “A transformer keeps everything,” Dimakis told me. “It treats everything as important.” In that sense, the trouble isn’t that large language models can’t remember; it’s that they can’t figure out what to forget.

The Age of Infinite Misinformation Has Arrived

The Atlantic

www.theatlantic.com › technology › archive › 2023 › 03 › ai-chatbots-large-language-model-misinformation › 673376

New AI systems such as ChatGPT, the overhauled Microsoft Bing search engine, and the reportedly soon-to-arrive GPT-4 have utterly captured the public imagination. ChatGPT is the fastest-growing online application, ever, and it’s no wonder. Type in some text, and instead of getting back web links, you get well-formed, conversational responses on whatever topic you select—an undeniably seductive vision.

But the public, and the tech giants, aren’t the only ones who have become enthralled with the Big Data–driven technology known as the large language model. Bad actors have taken note of the technology as well. At the extreme end, there’s Andrew Torba, the CEO of the far-right social network Gab, who said recently that his company is actively developing AI tools to “uphold a Christian worldview” and fight “the censorship tools of the Regime.” But even users who aren’t motivated by ideology will have their impact. Clarkesworld, a publisher of sci-fi short stories, temporarily stopped taking submissions last month, because it was being spammed by AI-generated stories—the result of influencers promoting ways to use the technology to “get rich quick,” the magazine’s editor told The Guardian.  

This is a moment of immense peril: Tech companies are rushing ahead to roll out buzzy new AI products, even after the problems with those products have been well documented for years and years. I am a cognitive scientist focused on applying what I’ve learned about the human mind to the study of artificial intelligence. Way back in 2001, I wrote a book called The Algebraic Mind in which I detailed how neural networks, a kind of vaguely brainlike technology undergirding some AI products, tended to overgeneralize, applying individual characteristics to larger groups. If I told an AI back then that my aunt Esther had won the lottery, it might have concluded that all aunts, or all Esthers, had also won the lottery.

Technology has advanced quite a bit since then, but the general problem persists. In fact, the mainstreaming of the technology, and the scale of the data it’s drawing on, have made it worse in many ways. Forget Aunt Esther: In November, Galactica, a large language model released by Meta—and quickly pulled offline—reportedly claimed that Elon Musk had died in a Tesla car crash in 2018. Once again, AI appears to have overgeneralized a concept that was true on an individual level (someone died in a Tesla car crash in 2018) and applied it erroneously to another individual who happened to share some personal attributes, such as gender, state of residence at the time, and a tie to the car manufacturer.

This kind of error, which has come to be known as a “hallucination,” is rampant. Whatever the reason that the AI made this particular error, it’s a clear demonstration of the capacity for these systems to write fluent prose that is clearly at odds with reality. You don’t have to imagine what happens when such flawed and problematic associations are drawn in real-world settings: NYU’s Meredith Broussard and UCLA’s Safiya Noble are among the researchers who have repeatedly shown how different types of AI replicate and reinforce racial biases in a range of real-world situations, including health care. Large language models like ChatGPT have been shown to exhibit similar biases in some cases.

Nevertheless, companies press on to develop and release new AI systems without much transparency, and in many cases without sufficient vetting. Researchers poking around at these newer models have discovered all kinds of disturbing things. Before Galactica was pulled, the journalist Tristan Greene discovered that it could be used to create detailed, scientific-style articles on topics such as the benefits of anti-Semitism and eating crushed glass, complete with references to fabricated studies. Others found that the program generated racist and inaccurate responses. (Yann LeCun, Meta’s chief AI scientist, has argued that Galactica wouldn’t make the online spread of misinformation easier than it already is; a Meta spokesperson told CNET in November, “Galactica is not a source of truth, it is a research experiment using [machine learning] systems to learn and summarize information.”)

More recently, the Wharton professor Ethan Mollick was able to get the new Bing to write five detailed and utterly untrue paragraphs on dinosaurs’ “advanced civilization,” filled with authoritative-sounding morsels including “For example, some researchers have claimed that the pyramids of Egypt, the Nazca lines of Peru, and the Easter Island statues of Chile were actually constructed by dinosaurs, or by their descendents or allies.” Just this weekend, Dileep George, an AI researcher at DeepMind, said he was able to get Bing to create a paragraph of bogus text stating that OpenAI and a nonexistent GPT-5 played a role in the Silicon Valley Bank collapse. Microsoft did not immediately answer questions about these responses when reached for comment; last month, a spokesperson for the company said, “Given this is an early preview, [the new Bing] can sometimes show unexpected or inaccurate answers … we are adjusting its responses to create coherent, relevant and positive answers.”

[Read: Conspiracy theories have a new best friend]

Some observers, like LeCun, say that these isolated examples are neither surprising nor concerning: Give a machine bad input and you will receive bad output. But the Elon Musk car crash example makes clear these systems can create hallucinations that appear nowhere in the training data. Moreover, the potential scale of this problem is cause for worry. We can only begin to imagine what state-sponsored troll farms with large budgets and customized large language models of their own might accomplish. Bad actors could easily use these tools, or tools like them, to generate harmful misinformation, at unprecedented and enormous scale. In 2020, Renée DiResta, the research manager of the Stanford Internet Observatory, warned that the “supply of misinformation will soon be infinite.” That moment has arrived.

Each day brings us a little bit closer to a kind of information-sphere disaster, in which bad actors weaponize large language models, distributing their ill-gotten gains through armies of ever more sophisticated bots. GPT-3 produces more plausible outputs than GPT-2, and GPT-4 will be more powerful than GPT-3. And none of the automated systems designed to distinguish human-generated text from machine-generated text has proved particularly effective.

[Read: ChatGPT is about to dump more work on everyone]

We already face a problem with echo chambers that polarize our minds. The mass-scale automated production of misinformation will assist in the weaponization of those echo chambers and likely drive us even further into extremes. The goal of the Russian “Firehose of Falsehood” model is to create an atmosphere of mistrust, allowing authoritarians to step in; it is along these lines that the political strategist Steve Bannon aimed, during the Trump administration, to “flood the zone with shit.” It’s urgent that we figure out how democracy can be preserved in a world in which misinformation can be created so rapidly, and at such scale.  

One suggestion, worth exploring but likely insufficient, is to “watermark” or otherwise track content that is produced by large language models. OpenAI might, for example, watermark anything generated by GPT-4, the next-generation version of the technology powering ChatGPT; the trouble is that bad actors could simply use alternative large language models to create whatever they want, without watermarks.

A second approach is to penalize misinformation when it is produced at large scale. Currently, most people are free to lie most of the time without consequence, unless they are, for example, speaking under oath. America’s Founders simply didn’t envision a world in which someone could set up a troll farm and put out a billion mistruths in a single day, disseminated with an army of bots, across the internet. We may need new laws to address such scenarios.

A third approach would be to build a new form of AI that can detect misinformation, rather than simply generate it. Large language models are not inherently well suited to this; they lose track of the sources of information that they use, and lack ways of directly validating what they say. Even in a system like Bing’s, where information is sourced from the web, mistruths can emerge once the data are fed through the machine. Validating the output of large language models will require developing new approaches to AI that center reasoning and knowledge, ideas that were once popular but are currently out of fashion.  

It will be an uphill, ongoing move-and-countermove arms race from here; just as spammers change their tactics when anti-spammers change theirs, we can expect a constant battle between bad actors striving to use large language models to produce massive amounts of misinformation and governments and private corporations trying to fight back. If we don’t start fighting now, democracy may well be overwhelmed by misinformation and consequent polarization—and perhaps quite soon. The 2024 elections could be unlike anything we have seen before.

Large group in Mexico attempted mass entry into US at El Paso, Texas border crossing, officials say

CNN

www.cnn.com › 2023 › 03 › 12 › us › el-paso-texas-migrant-surge › index.html

A large group of people in Mexico approached a US border entry point in El Paso, Texas, Sunday in an attempt at mass entry into the country, causing disruptions along the border and prompting authorities to erect barricades, US Customs and Border Protection said.