ChatGPT Resembles a Slice of the Human Brain

The Atlantic

https://www.theatlantic.com/technology/archive/2023/01/chatgpt-ai-language-human-computer-grammar-logic/672902/

Language is commonly understood to be the “stuff” of thought. People “talk it out” and “speak their mind,” follow “trains of thought” or “streams of consciousness.” Some of the pinnacles of human creation—music, geometry, computer programming—are framed as metaphorical languages. The underlying assumption is that the brain processes the world and our experience of it through a progression of words. And this supposed link between language and thinking is a large part of what makes ChatGPT and similar programs so uncanny: The ability of AI to answer any prompt with human-sounding language can suggest that the machine has some sort of intent, even sentience.

But then the program says something completely absurd—that there are 12 letters in nineteen or that sailfish are mammals—and the veil drops. Although ChatGPT can generate fluent and sometimes elegant prose, easily passing the Turing-test benchmark that has haunted the field of AI for more than 70 years, it can also seem incredibly dumb, even dangerous. It gets math wrong, fails to give the most basic cooking instructions, and displays shocking biases. In a new paper, cognitive scientists and linguists address this dissonance by separating communication via language from the act of thinking: Capacity for one does not imply the other. At a moment when pundits are fixated on the potential for generative AI to disrupt every aspect of how we live and work, their argument should force a reevaluation of the limits and complexities of artificial and human intelligence alike.

The researchers explain that words may not work very well as a synecdoche for thought. People, after all, identify themselves on a continuum of visual to verbal thinking; the experience of not being able to put an idea into words is perhaps as human as language itself. Contemporary research on the human brain, too, suggests that “there is a separation between language and thought,” says Anna Ivanova, a cognitive neuroscientist at MIT and one of the study’s two lead authors. Brain scans of people using dozens of languages have revealed a particular network of neurons that fires independent of the language being used (including invented tongues such as Na’vi and Dothraki).

That network of neurons is not generally involved in thinking activities including math, music, and coding. In addition, many patients with aphasia—a loss of the ability to comprehend or produce language, as a result of brain damage—remain skilled at arithmetic and other nonlinguistic mental tasks. Combined, these two bodies of evidence suggest that language alone is not the medium of thought; it is more like a messenger. The use of grammar and a lexicon to communicate functions that involve other parts of the brain, such as socializing and logic, is what makes human language special.

[Read: Hollywood’s love affair with fictional languages]

ChatGPT and software like it demonstrate an incredible ability to string words together, but they struggle with other tasks. Ask for a letter explaining to a child that Santa Claus is fake, and it produces a moving message signed by Saint Nick himself. These large language models, also called LLMs, work by predicting the next word in a sentence based on everything before it (the words "popular belief" follow "contrary to," for example). But ask ChatGPT to do basic arithmetic and spelling or give advice for frying an egg, and you may receive grammatically superb nonsense: "If you use too much force when flipping the egg, the eggshell can crack and break."
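To make that mechanism concrete, here is a minimal sketch of next-word prediction. The corpus, function names, and the frequency-table approach are illustrative stand-ins: real LLMs condition on the entire preceding context with a neural network, not on bigram counts.

```python
from collections import Counter, defaultdict

# Toy next-word predictor. Real LLMs use billions of learned
# parameters; this sketch just counts which word follows which.
def train_bigrams(text):
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev_word):
    # Return the most frequent continuation seen in training.
    followers = counts.get(prev_word.lower())
    return followers.most_common(1)[0][0] if followers else None

model = train_bigrams(
    "contrary to popular belief contrary to popular belief contrary to common sense"
)
print(predict_next(model, "to"))       # -> "popular"
print(predict_next(model, "popular"))  # -> "belief"
```

Scaled up from word pairs to whole contexts and from counting to learned weights, this is the same game: given everything so far, emit the likeliest next word.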

These shortcomings point to a distinction, not dissimilar to one that exists in the human brain, between piecing together words and piecing together ideas—what the authors term formal and functional linguistic competence, respectively. “Language models are really good at producing fluent, grammatical language,” says the University of Texas at Austin linguist Kyle Mahowald, the paper’s other lead author. “But that doesn’t necessarily mean something which can produce grammatical language is able to do math or logical reasoning, or think, or navigate social contexts.”

If the human brain’s language network is not responsible for math, music, or programming—that is, for thinking—then there’s no reason an artificial “neural network” trained on terabytes of text would be good at those things either. “In line with evidence from cognitive neuroscience,” the authors write, “LLMs’ behavior highlights the difference between being good at language and being good at thought.” ChatGPT’s ability to get mediocre scores on some business- and law-school exams, then, is more a mirage than a sign of understanding.

Still, hype swirls around the next iteration of language models, which will train on far more words and with far more computing power. OpenAI, the creator of ChatGPT, claims that its programs are approaching a so-called general intelligence that would put the machines on par with humankind. But if the comparison to the human brain holds, then simply making models better at word prediction won’t bring them much closer to this goal. In other words, you can dismiss the notion that AI programs such as ChatGPT have a soul or resemble an alien invasion.

Ivanova and Mahowald believe that different training methods are required to spur further advances in AI—for instance, approaches specific to logical or social reasoning rather than word prediction. ChatGPT may have already taken a step in that direction, not just reading massive amounts of text but also incorporating human feedback: Supervisors were able to comment on what constituted good or bad responses. But with few details about ChatGPT’s training available, it’s unclear just what that human input targeted; the program apparently thinks 1,000 is both greater and less than 1,062. (OpenAI released an update to ChatGPT yesterday that supposedly improves its “mathematical capabilities,” but it’s still reportedly struggling with basic word problems.)
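As a rough illustration of what "incorporating human feedback" can mean, here is a toy reranker in which hypothetical labeler ratings steer the choice among candidate responses. The strings and scores are invented for the example, and actual systems such as ChatGPT use ratings to fine-tune the model itself rather than to filter its outputs after the fact.

```python
# Toy illustration of steering output with human feedback: labelers
# score candidate responses, and the system prefers higher-rated ones.
# Ratings below are invented; real pipelines train on such signals.
ratings = {
    "1,000 is less than 1,062.": 1.0,     # marked good by a labeler
    "1,000 is greater than 1,062.": 0.0,  # marked bad by a labeler
}

def rerank(candidates, feedback):
    # Pick the candidate with the highest human score (0.5 if unrated).
    return max(candidates, key=lambda c: feedback.get(c, 0.5))

print(rerank(list(ratings), ratings))  # -> "1,000 is less than 1,062."
```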

[Read: What happens when AI has read everything?]

There are, it should be noted, people who believe that large language models are not as good at language as Ivanova and Mahowald write—that they are basically glorified auto-completes whose flaws scale with their power. “Language is more than just syntax,” says Gary Marcus, a cognitive scientist and prominent AI researcher. “In particular, it’s also about semantics.” It’s not just that AI chatbots don’t understand math or how to fry eggs—they also, he says, struggle to comprehend how a sentence derives meaning from the structure of its parts.

For instance, imagine three plastic balls in a row: green, blue, blue. Someone asks you to grab “the second blue ball”: You understand that they’re referring to the last ball in the sequence, but a chatbot might understand the instruction as referring to the second ball, which also happens to be blue. “That a large language model is good at language is overstated,” Marcus says. But to Ivanova, something like the blue-ball example requires not just compiling words but also conjuring a scene, and as such “is not really about language proper; it’s about language use.”
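The two readings at issue can be made explicit in a few lines of code; this sketch simply formalizes the example above.

```python
# Two readings of "the second blue ball" for the row [green, blue, blue].
balls = ["green", "blue", "blue"]

# Intended reading: the second ball among the blue ones.
blue_indices = [i for i, color in enumerate(balls) if color == "blue"]
intended = blue_indices[1]            # index 2, the last ball

# Mistaken reading: the second ball overall, which happens to be blue.
mistaken = 1 if balls[1] == "blue" else None

print(intended, mistaken)  # -> 2 1 (two different balls)
```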

And no matter how compelling their language use is, there’s still a healthy debate over just how much programs such as ChatGPT actually “understand” about the world by simply being fed data from books and Wikipedia entries. “Meaning is not given,” says Roxana Girju, a computational linguist at the University of Illinois at Urbana-Champaign. “Meaning is negotiated in our interactions, discussions, not only with other people but also with the world. It’s something that we reach at in the process of engaging through language.” If that’s right, building a truly intelligent machine would require a different way of combining language and thought—not just layering different algorithms but designing a program that might, for instance, learn language and how to navigate social relationships at the same time.

Ivanova and Mahowald are not outright rejecting the view that language epitomizes human intelligence; they’re complicating it. Humans are “good” at language precisely because we combine thought with its expression. A computer that both masters the rules of language and can put them to use will necessarily be intelligent—the flip side being that narrowly mimicking human utterances is precisely what is holding machines back. But before we can use our organic brains to better understand silicon ones, we will need both new ideas and new words to understand the significance of language itself.

Are Standardized Tests Racist, or Are They Anti-racist?

The Atlantic

https://www.theatlantic.com/science/archive/2023/01/should-college-admissions-use-standardized-test-scores/672816/

They’re making their lists, checking them twice, trying to decide who’s in and who’s not. Once again, it’s admissions season, and tensions are running high as university leaders wrestle with challenging decisions that will affect the future of their schools. Chief among those tensions, in the past few years, has been the question of whether standardized tests should be central to the process.

In 2021, the University of California system ditched the use of all standardized testing for undergraduate admissions. California State University followed suit last spring, and in November, the American Bar Association voted to abandon the LSAT requirement for admission to any of the nation’s law schools beginning in 2025. Many other schools have lately reached the same conclusion. Science magazine reports that among a sample of 50 U.S. universities, only 3 percent of Ph.D. science programs currently require applicants to submit GRE scores, compared with 84 percent four years ago. And colleges that dropped their testing requirements or made them optional in response to the pandemic are now feeling torn about whether to bring that testing back.

Proponents of these changes have long argued that standardized tests are biased against low-income students and students of color, and should not be used. The system serves to perpetuate a status quo, they say, where children whose parents are in the top 1 percent of income distribution are 77 times more likely to attend an Ivy League university than children whose parents are in the bottom quintile. But those who still endorse the tests make the mirror-image claim: Schools have been able to identify talented low-income students and students of color and give them transformative educational experiences, they argue, precisely because those students are tested.

These two perspectives—that standardized tests are a driver of inequality, and that they are a great tool to ameliorate it—are often pitted against each other in contemporary discourse. But in my view, they are not oppositional positions. Both of these things can be true at the same time: Tests can be biased against marginalized students and they can be used to help those students succeed. We often forget an important lesson about standardized tests: They, or at least their outputs, take the form of data; and data can be interpreted—and acted upon—in multiple ways. That might sound like an obvious statement, but it’s crucial to resolving this debate.

I teach a Ph.D. seminar on quantitative research methods that dives into the intricacies of data generation, interpretation, and application. One of the readings I assign—Andrea Jones-Rooy’s article “I’m a Data Scientist Who Is Skeptical About Data”—contains a passage that is relevant to our thinking about standardized tests and their use in admissions:

Data can’t say anything about an issue any more than a hammer can build a house or almond meal can make a macaron. Data is a necessary ingredient in discovery, but you need a human to select it, shape it, and then turn it into an insight.

When reviewing applications, admissions officials have to turn test scores into insights about each applicant’s potential for success at the university. But their ability to generate those insights depends on what they know about the broader data-generating process that led students to get those scores, and how the officials interpret what they know about that process. In other words, what they do with test scores—and whether they end up perpetuating or reducing inequality—depends on how they think about bias in a larger system.

First, who takes these tests is not random. Obtaining a score can be so costly—in terms of both time and money—that it’s out of reach for many students. This source of bias can be addressed, at least in part, by public policy. For example, research has found that when states implement universal testing policies in high schools, and make testing part of the regular curriculum rather than an add-on that students and parents must provide for themselves, more disadvantaged students enter college and the income gap narrows. Even if we solve that problem, though, another—admittedly harder—issue would still need to be addressed.

The second issue relates to what the tests are actually measuring. Researchers have argued about this question for decades, and continue to debate it in academic journals. To understand the tension, recall what I said earlier: Universities are trying to figure out applicants’ potential for success. Students’ ability to realize their potential depends both on what they know before they arrive on campus and on being in a supportive academic environment. The tests are supposed to measure prior knowledge, but the nature of how learning works in American society means they end up measuring some other things, too.

In the United States, we have a primary and secondary education system that is unequal because of historic and contemporary laws and policies. American schools continue to be highly segregated by race, ethnicity, and social class, and that segregation affects what students have the opportunity to learn. Well-resourced schools can afford to provide more enriching educational experiences to their students than underfunded schools can. When students take standardized tests, they answer questions based on what they’ve learned, but what they’ve learned depends on the kind of schools they were lucky (or unlucky) enough to attend.

This creates a challenge for test-makers and the universities that rely on their data. They are attempting to assess student aptitude, but the unequal nature of the learning environments in which students have been raised means that tests are also capturing the underlying disparities; that is one of the reasons test scores tend to reflect larger patterns of inequality. When admissions officers see a student with low scores, they don’t know whether that person lacked potential or has instead been deprived of educational opportunity.

So how should colleges and universities use these data, given what they know about the factors that feed into them? The answer depends on how they view their mission and broader purpose in society.

From the start, standardized tests were meant to filter students out. A congressional report on the history of testing in American schools describes how, in the late 1800s, elite colleges and universities had become disgruntled with the quality of high-school graduates, and sought a better means of screening them. Harvard’s president first proposed a system of common entrance exams in 1890; the College Entrance Examination Board was formed 10 years later. That orientation—toward exclusion—led schools down the path of using tests to find and admit only those students who seemed likely to embody and preserve an institution’s prestigious legacy. This brought them to some pretty unsavory policies. For example, a few years ago, a spokesperson for the University of Texas at Austin admitted that the school’s adoption of standardized testing in the 1950s had come out of its concerns over the effects of Brown v. Board of Education. UT looked at the distribution of test scores, found cutoff points that would eliminate the majority of Black applicants, and then used those cutoffs to guide admissions.

[Read: The college-admissions process is completely broken]

These days universities often claim to have goals of inclusion. They talk about the value of educating not just children of the elite, but a diverse cross-section of the population. Instead of searching for and admitting students who have already had tremendous advantages and specifically excluding nearly everyone else, these schools could try to recruit and educate the kinds of students who have not had remarkable educational opportunities in the past.

A careful use of testing data could support this goal. If students’ scores indicate a need for more support in particular areas, universities might invest more educational resources into those areas. They could hire more instructors or support staff to work with low-scoring students. And if schools notice alarming patterns in the data—consistent areas where students have been insufficiently prepared—they could respond not with disgruntlement, but with leadership. They could advocate for the state to provide K–12 schools with better resources.

Such investments would be in the nation’s interest, considering that one of the functions of our education system is to prepare young people for current and future challenges. These include improving equity and innovation in science and engineering, addressing climate change and climate justice, and creating technological systems that benefit a diverse public. All of these areas benefit from diverse groups of people working together—but diverse groups cannot come together if some members never learn the skills necessary for participation.

[Read: The SAT isn’t what’s unfair]

But universities—at least the elite ones—have not traditionally pursued inclusion, through the use of standardized testing or otherwise. At the moment, research on university behavior suggests that they operate as if they were largely competing for prestige. If that’s their mission—as opposed to advancing inclusive education—then it makes sense to use test scores for exclusion. Enrolling students who score the highest helps schools optimize their marketplace metrics—that is, their ranking.

Which is to say, the tests themselves are not the problem. Most components of admissions portfolios suffer from the same biases. In terms of favoring the rich, admissions essays are even worse than standardized tests; the same goes for participation in extracurricular activities and legacy admissions. Yet all of these provide universities with usable information about the kinds of students who may arrive on campus.

None of those data speak for themselves. Historically, the people who interpret and act upon this information have conferred advantages on wealthy students. But they can make different decisions today. Whether universities continue on their exclusive trajectories or become more inclusive institutions does not depend on how their students fill in bubble sheets. Instead, schools must find the answers for themselves: What kind of business are they in, and whom do they exist to serve?

Austin Butler wins a Golden Globe, Elvis voice intact

CNN

https://www.cnn.com/2023/01/10/entertainment/austin-butler-golden-globes-elvis-cec/index.html

Even while accepting a Golden Globe for his dynamic performance as Elvis Presley in the eponymous film, Austin Butler just couldn't drop the King's Mississippi drawl.