Lunch was at a prominent conservative think tank. The people around the table were fairly well known; I'd read some of their books and articles and had even seen them interviewed on television. They listened to me talk about bad statistics, and they agreed that the problem was serious. They had only one major criticism: I'd missed the role of ideology. Bad statistics, they assured me, were almost always promoted by liberals.
Two months earlier, I'd been interviewed by a liberal radio talk-show host (they do exist!). He, too, thought it was high time to expose bad statistics—especially those so often circulated by conservatives.
When I talk to people about statistics, I find that they usually are quite willing to criticize dubious statistics—as long as the numbers come from people with whom they disagree. Political conservatives are convinced that the statistics presented by liberals are deeply flawed, just as liberals are eager to denounce conservatives' shaky figures. When conservatives (or liberals) ask me how to spot bad statistics, I suspect that they'd like me to say, "Watch out for numbers promoted by people with whom you disagree." Everyone seems to insist that the other guy's figures are lousy (but mine are, of course, just fine, or at least good enough). People like examples of an opponent's bad statistics, but they don't care to have their own numbers criticized because, they worry, people might get the wrong idea: criticizing my statistics might lead someone to question my larger argument, so let's focus on the other guy's errors and downplay mine.
Alas, I don't believe that any particular group, faction, or ideology holds a monopoly on poor statistical reasoning. In fact, in choosing examples to illustrate this book's chapters, I've tried to identify a broad range of offenders. My goal is not to convince you that those other guys can't be trusted (after all, you probably already believe that). Rather, I want you to come away from this book with a sense that all numbers—theirs and yours—need to be handled with care.
This is tricky, because we tend to assume that statistics are facts, little nuggets of truth that we uncover, much as rock collectors find stones.1 After all, we think, a statistic is a number, and numbers seem to be solid, factual proof that someone must have actually counted something. But that's the point: people count. For every number we encounter, some person had to do the counting. Instead of imagining that statistics are like rocks, we'd do better to think of them as jewels. Gemstones may be found in nature, but people have to create jewels. Jewels must be selected, cut, polished, and placed in settings to be viewed from particular angles. In much the same way, people create statistics: they choose what to count, how to go about counting, which of the resulting numbers they share with others, and which words they use to describe and interpret those figures. Numbers do not exist independent of people; understanding numbers requires knowing who counted what, why they bothered counting, and how they went about it.
All statistics are products of social activity, the process sociologists call social construction. Although this point might seem painfully obvious, it tends to be forgotten or ignored when we think about—and particularly when we teach—statistics. We usually envision statistics as a branch of mathematics, a view reinforced by high school and college statistics courses, which begin by introducing probability theory as a foundation for statistical thinking, a foundation on which is assembled a structure of increasingly sophisticated statistical measures. Students are taught the underlying logic of each measure, the formula used to compute the measure, the software commands used to extract it from the computer, and some guidelines for interpreting the numbers that result from these computations. These are complicated lessons: few students have an intuitive grasp of any but the simplest statistics, and instruction usually focuses on clarifying the computational complexities.
The result is that statistical instruction tends to downplay consideration of how real-life statistics come into being. Yet all statistics are products of people's choices and compromises, which inevitably shape, limit, and distort the outcome. Statistics instructors often dismiss this as melodramatic irrelevance. Just as the conservatives at the think tank lunch imagined that bad statistics were the work of devious liberals, statistics instructors might briefly caution that calculations or presentations of statistical results may be "biased" (that is, intentionally designed to deceive). Similarly, a surprisingly large number of book titles draw a distinction between statistics and lies: How to Lie with Statistics (also, How to Lie with Charts, How to Lie with Maps, and so on); How to Tell the Liars from the Statisticians; How Numbers Lie; even (ahem) my own Damned Lies and Statistics.2 One might conclude that statistics are pure, unless they unfortunately become contaminated by the bad motives of dishonest people.
Perhaps it is necessary to set aside the real world in an effort to teach students about advanced statistical reasoning. But dismissive warnings to watch out for bias don't go very far in preparing people to think critically about the numbers they read in newspaper stories or hear from television commentators. Statistics play important roles in real-world debates about social problems and social policies; numbers become key bits of evidence used to challenge opponents' claims and to promote one's own views. Because people do knowingly present distorted or even false figures, we cannot dismiss bias as nonexistent. But neither can we simply categorize numbers as either true figures presented by sincere, well-meaning people (who, naturally, agree with us) or false statistics knowingly promoted by devious folks (who are on the other side, of course).
Misplaced enthusiasm is probably at least as common as deliberate bias in explaining why people spread bad statistics. Numbers rarely come first. People do not begin by carefully creating some bit of statistical information and then deducing what they ought to think. Much more often, they start with their own interests or concerns, which lead them to run across, or perhaps actively uncover, relevant statistical information. When these figures support what people already believe—or hope, or fear—to be true, it is very easy for them to adopt the numbers, to overlook or minimize their limitations, to find the figures first arresting, then compelling, and finally authoritative. People soon begin sharing these now-important numbers with others and become outraged if their statistics are questioned. One need not intentionally lie to others, or even to oneself. One need only let down one's critical guard when encountering a number that seems appealing, and momentum can do the rest.
The solution is to maintain critical standards when thinking about statistics. Some people are adept at this, as long as they are examining their opponents' figures. It is much more difficult to maintain a critical stance toward our own numbers. After all, our numbers support what we believe to be true. Whatever minor flaws they might have surely must be unimportant. At least, that's what we tell ourselves when we justify having a double standard for judging our own statistics and those of others.
This book promotes what we might call a single standard for statistical criticism. It argues that we must recognize that all numbers are social products and that we cannot understand a statistic unless we know something about the process by which it came into being. It further argues that all statistics are imperfect and that we need to recognize and acknowledge their flaws and limitations. All this is true regardless of whether we agree or disagree with the people presenting the numbers. We need to think critically about both the other guys' figures and our own.
I should confess that, in writing this book, I have done little original research. I have borrowed most of my examples from works by other analysts, mostly social scientists and journalists. My goal in writing about bad statistics is to show how these numbers emerge and spread. Just as I do not believe that this is the work of one political faction, I do not mean to suggest that all the blame can be laid at the door of one segment of society, such as the media. The media often circulate bad numbers, but then so do activists, corporations, officials, and even scientists—in fact, those folks usually are the sources for the statistics that appear in the media. And, we should remember, the problems with bad statistics often come to light through the critical efforts of probing journalists or scientists who think the numbers through, discover their flaws, and bring those flaws to public attention. A glance at my sources will reveal that critical thinking, just like bad statistics, can be found in many places.
The chapters in this book explore some common problems in thinking about social statistics. The chapter titles refer to different sorts of numbers—missing numbers, confusing numbers, and so on. As I use them, these terms have no formal mathematical meanings; they are simply headings for organizing the discussion. Thus, chapter 1 addresses what I call missing numbers, that is, statistics that might be relevant to debates over social issues but that somehow don't emerge during those discussions. It identifies several types of missing numbers and seeks to account for their absence. Chapter 2 considers confusing numbers, basic problems that bedevil our understanding of many simple statistics and graphs. Scary numbers—statistics about risks and other threats—are the focus of chapter 3.
The next three chapters explore the relationship between authority and statistics. Chapter 4's subject is authoritative numbers. This chapter considers what we might think of as statistics that seem good enough to be beyond dispute—products of scientific research or government data collection, for instance. It argues that even the best statistics need to be handled with care, that even data gathered by experts can be subject to misinterpretation. Chapter 5 examines what I call magical numbers—efforts to resolve issues through statistics, as though figures are a way to distill reality into pure, incontrovertible facts. Chapter 6 concentrates on contentious numbers, cases of data duels and stat wars in which opponents hurl contradictory figures at one another. Finally, chapter 7 explores the prospects for teaching statistical literacy, for improving public understanding of numbers and teaching people how to be more thoughtful and more critical consumers of statistics.
The lesson that people count—that we don't just find statistics but that we create them—offers both a warning and a promise. The warning is that we must be wary, that unless we approach statistics with a critical attitude, we run the risk of badly misunderstanding the world around us. But there is also a promise: that we need not be at the mercy of numbers, that we can learn to think critically about them, and that we can come to appreciate both their strengths and their flaws.
CBS News anchor Dan Rather began his evening newscast on March 5, 2001, by declaring: "School shootings in this country have become an epidemic." That day, a student in Santee, California, had killed two other students and wounded thirteen more, and media coverage linked this episode to a disturbing trend. Between December 1997 and May 1998, there had been three heavily publicized school shooting incidents: in West Paducah, Kentucky (three dead, five wounded); Jonesboro, Arkansas (five dead, ten wounded); and Springfield, Oregon (two dead and twenty-one wounded at the school, after the shooter had killed his parents at home). The following spring brought the rampage at Columbine High School in Littleton, Colorado, in which two students killed twelve fellow students and a teacher, before shooting themselves.1 Who could doubt Rather's claim about an epidemic?
And yet the word epidemic suggests a widespread, growing phenomenon. Were school shootings indeed on the rise? Surprisingly, a great deal of evidence indicated that they were not:
• Since school shootings are violent crimes, we might begin by examining trends in criminality documented by the Federal Bureau of Investigation. The Uniform Crime Reports, the FBI's tally of crimes reported to the police, showed that the overall crime rate, as well as the rates for such major violent crimes as homicide, robbery, and aggravated assault, fell during the 1990s.
• Similarly, the National Crime Victimization Survey (which asks respondents whether anyone in their household has been a crime victim) revealed that victimization rates fell during the 1990s; in particular, reports of teenagers being victimized by violent crimes at school dropped.
• Other indicators of school violence also showed decreases. The Youth Risk Behavior Survey conducted by the U.S. Centers for Disease Control and Prevention found steadily declining percentages of high school students who reported fighting or carrying weapons on school property during the 1990s.
• Finally, when researchers at the National School Safety Center combed media reports from the school years 1992-1993 through 2000-2001, they identified 321 violent deaths that had occurred at schools. Not all of these incidents involved student-on-student violence; they included, for example, 16 accidental deaths and 56 suicides, as well as incidents involving nonstudents, such as a teacher killed by her estranged husband (who then shot himself) and a nonstudent killed on a school playground during a weekend. Even if we include all 321 of these deaths, however, the average fell from 48 violent deaths per year during the school years 1992-1993 through 1996-1997 to 32 per year from 1997-1998 through 2000-2001. If we eliminate accidental deaths and suicides, the decline remains, with the average falling from 31 deaths per year in the earlier period to 24 per year in the later period (which included all of the heavily publicized incidents mentioned earlier). While violent deaths are tragedies, they are also rare. Tens of millions of children attend school; for every million students, fewer than one violent death per year occurs in school.
In other words, a great deal of statistical evidence was available to challenge claims that the country was experiencing a sudden epidemic of school shootings. The FBI's Uniform Crime Reports and the National Crime Victimization Survey in particular are standard sources for reporters who examine crime trends; the media's failure to incorporate findings from these sources in their coverage of school shootings is striking.2
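The per-million comparison at the end of the National School Safety Center bullet is simple arithmetic, but it rests on a number the text leaves vague: how many students there are. The Python sketch below makes that dependence explicit. The death averages are the ones reported above; the enrollment of 50 million is a hypothetical stand-in for "tens of millions," not a figure from the source.

```python
# Hypothetical enrollment: the text says only "tens of millions" of students.
ASSUMED_ENROLLMENT = 50_000_000

# Average violent deaths per year at schools, as reported in the text.
AVERAGES = {
    "1992-1993 to 1996-1997, all violent deaths": 48,
    "1992-1993 to 1996-1997, excluding accidents and suicides": 31,
    "1997-1998 to 2000-2001, all violent deaths": 32,
    "1997-1998 to 2000-2001, excluding accidents and suicides": 24,
}


def deaths_per_million(deaths_per_year: float, enrollment: int) -> float:
    """Convert an annual death count into deaths per million enrolled students."""
    return deaths_per_year / (enrollment / 1_000_000)


for label, deaths in AVERAGES.items():
    rate = deaths_per_million(deaths, ASSUMED_ENROLLMENT)
    print(f"{label}: {rate:.2f} per million students per year")
```

Under that assumed enrollment, even the early-period average of 48 deaths works out to just under one per million students; assume 40 million students instead and the same average rises above one. The denominator, in other words, is one more choice someone has to make.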
Although it might seem that statistics appear in every discussion of every social issue, in some cases—such as the media's coverage of school shootings—relevant, readily available statistics are ignored. We might think of these as missing numbers. This chapter examines several reasons for missing numbers, including overwhelming examples, incalculable concepts, uncounted phenomena, forgotten figures, and legendary numbers. It asks why potentially relevant statistics don't figure in certain public debates and tries to assess the consequences of their absence.
The Power of Examples
Why are numbers missing from some debates over social problems and social policies? One answer is that a powerful example can overwhelm discussion of an issue. The 1999 shootings at Columbine High School are a case in point. The high death toll ensured that Columbine would be a major news story. Moreover, the school's location in a suburb of a major city made it easy for reporters to reach the scene. As it took some hours to evacuate the students and secure the building, the press had time to arrive and capture dramatic video footage that could be replayed to illustrate related stories in the weeks that followed. The juxtaposition of a terrible crime and a prosperous suburban community made the story especially frightening—if this school shooting could happen at Columbine, surely such crimes could happen anywhere. In addition, the Columbine tragedy occurred in the era of competing twenty-four-hour cable news channels; their decisions to run live coverage of several funeral and memorial services and to devote broadcast time to extended discussions of the event and its implications helped to keep the story alive for weeks.
For today's media, a dramatic event can become more than simply a news story in its own right; reporters have become attuned to searching for the larger significance of an event so that they can portray newsworthy incidents as instances of a widespread pattern or problem. Thus, Columbine, when coupled with the earlier, heavily publicized school shooting stories of 1997-1998, came to exemplify the problem of school violence. And, commentators reasoned, if a larger problem existed, it must reflect underlying societal conditions; that is, school shootings needed to be understood as a trend, wave, or epidemic with identifiable causes. Journalists have been identifying such crime waves since at least the nineteenth century—and, for nearly as long, criminologists have understood that crime waves are not so much patterns in criminal behavior as they are patterns in media coverage. All of the available statistical evidence suggested that school violence had declined from the early 1990s to the late 1990s; there was no actual wave of school shootings. But the powerful images from Columbine made that evidence irrelevant. One terrible example was "proof" that school shootings were epidemic.
Compelling examples need not even be true. The stories that folklorists call contemporary legends (or, to use the more familiar term, urban legends) also shape our thinking about social problems. Contemporary legends usually spread through informal channels, which once meant word of mouth but now also include faxes and e-mail messages. A legend's key quality remains unchanged, however: it must be a good story, good enough for people to remember it and want to pass it along. Legends thrive because they arouse fear, disgust, or other powerful emotions that make the tales memorable and repeatable.3 Very often, contemporary legends are topical: when child abductions are in the news, we tell stories about kidnappings in shopping malls; when gangs are receiving attention, we warn each other about lethal gang initiation rites. Such stories shape our thinking about social problems in much the same way dramatic news stories do.
The power of examples is widely recognized. A reporter preparing a story about any broad social condition—say, homelessness—is likely to begin by illustrating the problem with an example, perhaps a particular homeless person. Journalists (and their editors) prefer interesting, compelling examples that will intrigue their audience. And advocates who are trying to promote particular social policies learn to help journalists by guiding them to examples that can be used to make specific points. Thus, activists calling for increased services for the homeless might showcase a homeless family, perhaps a mother of young children whose husband has been laid off by a factory closing and who cannot find affordable housing. In contrast, politicians seeking new powers to institutionalize the homeless mentally ill might point to a deranged, violent individual who seems to endanger passersby.4 The choice of examples conveys a sense of a social problem's nature.
The problem with examples—whether they derive from dramatic events, contemporary legends, or the strategic choices of journalists or advocates—is that they probably aren't especially typical. Examples compel when they have emotional power, when they frighten or disturb us. But atypical examples usually distort our understanding of a social problem; when we concentrate on the dramatic exception, we tend to overlook the more common, more typical—but more mundane—cases. Thus, Democrats used to complain about Republican President Ronald Reagan's fondness for repeating the story of a "welfare queen" who had supposedly collected dozens of welfare checks using false identities.5 Using such colorful examples to typify welfare fraud implies that welfare recipients are undeserving or don't really need public assistance. Defenders of welfare often countered Reagan's anecdotes with statistics showing that recipients were deserving (as evidenced by the small number of able-bodied adults without dependent children who received benefits) or that criminal convictions for fraud were relatively few.6 The danger is that the powerful but atypical example—the homeless intact family, the welfare queen—will warp our vision of a social problem, thereby reducing a complicated social condition to a simple, melodramatic fable.
Statistics, then, offer a way of checking our examples. If studies of the homeless find few intact families (or individuals who pose threats of violence), or if studies of welfare recipients find that fraud involving multiple false identities is rare, then we should recognize the distorting effects of atypical examples and realize that the absence of numbers can damage our ability to grasp the actual dimensions of our problems.
Sometimes numbers are missing because phenomena are very hard to count. Consider another crime wave. During the summer of 2002, public concern turned to kidnapped children. Attention first focused on the case of an adolescent girl abducted from her bedroom one night—a classic melodramatic example of a terrible crime that seemingly could happen to anyone. As weeks passed without a sign of the girl, both the search and the accompanying news coverage continued. Reports of other cases of kidnapped or murdered children began linking these presumably unrelated crimes to the earlier kidnapping, leading the media to begin talking about an epidemic of abductions.
This issue had a history, however. Twenty years earlier, activists had aroused national concern about the problem of missing children by coupling frightening examples to large statistical estimates. One widespread claim alleged that nearly two million children went missing each year, including fifty thousand kidnapped by strangers. Later, journalists and social scientists exposed these early estimates as being unreasonably high. As a result, in 2002, some reporters questioned the claims of a new abduction epidemic; in fact, they argued, the FBI had investigated more kidnappings the previous year, which suggested that these crimes were actually becoming less common.7
Both sets of claims—that kidnappings were epidemic and that they were declining—were based on weak evidence. Missing-children statistics can never be precise because missing children are so difficult to count. We encounter problems of definition:
• What is a child—that is, what is the upper age limit for being counted?
• What do we mean by missing? How long must a child be missing to be counted—a few minutes, one day, seventy-two hours?
• What sorts of absences should be counted? Wandering off and getting lost? Running away? Being taken by a relative during a family dispute? Is a child who is with a noncustodial parent at a known location considered missing?
People need to agree about what to count before they can start counting, but not everyone agrees about the answers to these questions. Obviously, the answers chosen will affect the numbers counted; using a broad definition means that more missing children will be counted.
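To see how much these definitional choices can matter, here is a toy Python sketch that applies two definitions to the same handful of reports. The reports, the age and time thresholds, and the category labels are all invented for illustration; they do not come from any agency's records.

```python
from dataclasses import dataclass


@dataclass
class Report:
    age: int              # child's age at disappearance
    hours_missing: float  # how long the child was gone
    kind: str             # "lost", "runaway", "family", or "stranger"


# Invented reports, for illustration only.
REPORTS = [
    Report(6, 2, "lost"),
    Report(16, 48, "runaway"),
    Report(17, 96, "runaway"),
    Report(9, 80, "family"),
    Report(12, 100, "stranger"),
    Report(4, 0.5, "lost"),
]


def count_missing(reports, max_age, min_hours, kinds):
    """Count the reports that fit one particular definition of a missing child."""
    return sum(
        1
        for r in reports
        if r.age <= max_age and r.hours_missing >= min_hours and r.kind in kinds
    )


# Narrow: children 12 and under, gone at least 72 hours, taken by strangers.
narrow = count_missing(REPORTS, max_age=12, min_hours=72, kinds={"stranger"})
# Broad: anyone under 18, gone any length of time, for any reason.
broad = count_missing(
    REPORTS, max_age=17, min_hours=0, kinds={"lost", "runaway", "family", "stranger"}
)
print(f"narrow definition: {narrow}; broad definition: {broad}")
```

Nothing about the underlying events changes between the two counts; only the definition does, and here the broad definition counts six times as many missing children as the narrow one.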
A second set of problems concerns reporting. Parents of missing children presumably call their local law enforcement agency—usually a police or sheriff's department. But those authorities may respond in different ways. Some states require them to forward all missing-children reports to a statewide clearinghouse, which is supposed to contact all law enforcement agencies in the state in order to facilitate the search. The clearinghouses—and some departments—may notify the National Crime Information Center, a branch of the FBI that compiles missing-persons reports. Some reports also reach the National Center for Missing and Exploited Children (the federally funded group best known for circulating pictures of missing children) or FBI investigators (who claim jurisdiction over a few, but by no means most, kidnappings). Authorities in the same jurisdiction do not necessarily handle all missing-children reports the same way; the case of a six-year-old seen being dragged into a strange car is likely to be treated differently than a report of a sixteen-year-old who has run away. We can suspect that the policies of different agencies will vary significantly. The point is that the jurisdiction from which a child disappears and the particulars of the case probably affect whether a particular missing-child report finds its way into various agencies' records.
It is thus very difficult to make convincing comparisons of the numbers of missing children from either time to time or place to place. Reporters who noted that fewer child-kidnapping reports were filed with the FBI in 2002 than in 2001, and who therefore concluded that the problem was declining, mistakenly assumed that the FBI's records were more complete and authoritative than they actually were. Some things—like missing children—are very difficult to count, which should make us skeptical about the accuracy of statistics that claim to describe the situation.
Such difficulties can create special problems when people try to weigh things that are relatively easy to measure against things that are less calculable. Consider the method of cost-benefit analysis as a basis for decision-making.8 In principle, it seems straightforward: calculate the expected costs and the value of the expected benefits for different courses of action, and choose the option that promises the best outcome. One problem, however, is that some costs and benefits are easier to compute than others. A teenager trying to decide whether to go to a movie or spend an evening babysitting can probably assign reasonably accurate dollar values to these options—the cost of the movie ticket and refreshments versus the expected earnings from babysitting—but even then the decision will probably hinge on additional assumptions about happiness: would I be happier spending the evening with my friends at a movie, or would I prefer to earn money that can be spent for some greater benefit down the line?
When applied to questions of social policy, such calculations only become more complex. Should we build more highways or support mass transit? Mass transit is rarely self-supporting: if the cost per trip seems too high, riders abandon mass transit; in order to keep them riding, ticket prices usually must be kept low by subsidizing the system. Critics of mass transit sometimes argue that such subsidies are wrong, that mass transit is inefficient, expensive, and therefore not competitive. Advocates respond that this critique ignores many of the relevant costs and benefits. Whereas riders directly bear the costs of using mass transit each time they buy a ticket, the ways we pay for the costs of highway travel are less obvious (for example, through gasoline taxes). Moreover, highways carry hidden quality-of-life costs, such as greater air pollution, more traffic fatalities, and cities that discourage foot traffic by devoting huge areas to roads and parking lots. But such costs are hard to calculate. Even if we can agree on the likely health costs from air pollution and traffic accidents, how can we hope to assign a dollar value to being able to comfortably walk from one destination to another? And, of course, the critics have a rebuttal: costs are also incurred in building and maintaining mass transit systems. And what about the freedom cars offer—the ability to choose your own route and schedule? Shouldn't these considerations be incorporated in any calculations?
There are basically two solutions to the problems that intangible factors pose to cost-benefit analyses, but neither solution is completely satisfactory. The first is to leave these factors out of the equation, to simply ignore what seems impossible to quantify. But should factors such as quality of life be treated as irrelevant simply because they are hard to measure? The second solution is to estimate the values of costs and benefits, to assign dollar values to them. This approach keeps these factors in view, but the process is obviously arbitrary—what dollar value should be assigned to comfort or freedom? It is easy to skew the results of any cost-benefit analysis by pegging values as either very high or very low.
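Both solutions can be illustrated with a deliberately simple Python sketch comparing a highway project and a transit project. Every dollar figure is hypothetical, and the model assigns a value to only one intangible (the walkability that transit advocates cite), ignoring, for example, the freedom that critics attribute to cars. Setting that value to zero corresponds to the first solution, leaving the factor out; any other value corresponds to the second, estimating it.

```python
# Toy cost-benefit comparison; all figures are hypothetical, in millions of dollars.
TANGIBLE = {
    "highway": {"cost": 100.0, "benefit": 130.0},
    "transit": {"cost": 120.0, "benefit": 125.0},
}


def net_benefit(option: str, intangible_value: float = 0.0) -> float:
    """Tangible benefit minus tangible cost, plus any value assigned to intangibles."""
    figures = TANGIBLE[option]
    return figures["benefit"] - figures["cost"] + intangible_value


def best_option(walkability_value: float) -> str:
    """Return the option with the higher net benefit.

    walkability_value is the (hypothetical) dollar value assigned to the
    quality-of-life benefit of transit; a value of 0 means the factor is
    simply left out of the calculation. Ties go to "highway".
    """
    scores = {
        "highway": net_benefit("highway"),
        "transit": net_benefit("transit", walkability_value),
    }
    return max(scores, key=scores.get)


for value in (0, 10, 50):
    print(f"walkability valued at {value}: choose {best_option(value)}")
```

With these invented numbers, leaving walkability out (or valuing it at 10) favors the highway, while valuing it at 50 favors transit. Whoever picks the number effectively picks the answer.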
Our culture has a particularly difficult time assigning values to certain types of factors. Periodically, for example, the press expresses shock that a cost-benefit analysis has assigned some specific value to individual lives.9 Such revelations produce predictably outraged challenges: how can anyone place a dollar value on a human life—aren't people's lives priceless? The answer to that question depends on when and where it is asked. Americans' notion that human life is priceless has a surprisingly short history. Only a century ago, the parents of a child killed by a streetcar could sue the streetcar company for damages equal to the child's economic value to the family (basically, the child's expected earnings until adulthood); today, of course, the parents would sue for the (vastly greater) value of their pain and suffering. Even the dollar value of a child's life varies across time and space.10
But the larger point is that trade-offs are inevitable. Building a bridge or implementing a childhood vaccination program has both risks and costs—as do the alternatives of not building the bridge or not vaccinating children. Our culture seems to have a lot of difficulty debating whether, say, vaccinations should proceed if they will cause some number of children to sicken and die. Advocates on both sides try to circumvent this debate by creating melodramatically simple alternatives: vaccine proponents can be counted on to declare that harm from vaccines is virtually nonexistent but that failure to vaccinate will have terrible, widespread consequences, whereas opponents predictably insist that vaccines harm many and that they don't do all that much good. Obviously, such debates could use some good data. But, beyond that, we need to recognize that every choice carries costs and that we can weigh and choose only among imperfect options. Even if we can agree that a vaccine will kill a small number of children but will save a great many, how are we to incorporate into our decision-making the notion that every human life is beyond price? How should we weigh the value of a few priceless lives that might be lost if vaccinations proceed against the value of many priceless lives that might be lost if vaccinations are curtailed? (Chapter 3 extends this discussion of trade-offs.)
In short, some numbers are missing from discussions of social issues because certain phenomena are hard to quantify, and any effort to assign numeric values to them is subject to debate. But refusing to somehow incorporate these factors into our calculations creates its own hazards. The best solution is to acknowledge the difficulties we encounter in measuring these phenomena, debate openly, and weigh the options as best we can.
A third category of missing numbers involves what is deliberately uncounted, records that go unkept. Consider the U.S. Bureau of the Census's tabulations of religious affiliation: there are none. In fact, the census asks no questions about religion. Arguments about the constitutionally mandated separation of church and state, as well as a general sense that religion is a touchy subject, have led the Census Bureau to omit any questions about religion when it surveys the citizenry (in contrast to most European countries, where such questions are asked).11
Thus, anyone trying to estimate the level of religious activity in the United States must rely on less accurate numbers, such as church membership rolls or individuals' reports of their attendance at worship services. The membership rolls of different denominations vary in what they count: Are infants counted once baptized, or does one become an enrolled member only in childhood or even adulthood? Are individuals culled from the rolls if they stop attending or actively participating in religious activities? Such variation makes it difficult to compare the sizes of different faiths (as discussed further in chapter 6). Surveys other than the census sometimes ask people how often they attend religious services, but we have good reason to suspect that respondents overreport attendance (possibly to make a good impression on the interviewers).12 The result is that, for the United States, at least, it is difficult to accurately measure the population's religious preferences or level of involvement. The policy of not asking questions about religion through the census means that such information simply does not exist.
The way choices are phrased also creates uncounted categories. Since 1790, each census has asked about race or ethnicity, but the wording of the questions—and the array of possible answers—has changed. The 2000 census, for example, was the first to offer respondents the chance to identify themselves as multiracial. Proponents of this change had argued that many Americans have family trees that include ancestors of different races and that it was unreasonable to force people to place themselves within a single racial category.
But some advocates had another reason for promoting this change. When forced to choose only one category, people whose family backgrounds included ancestors of different races had to oversimplify; most probably picked the option that fit the largest share of their ancestors. For example, an individual whose grandparents included three whites and one Native American was likely to choose "white." In a society in which a group's political influence depends partly on its size, such choices could depress the numbers of people of American Indian ancestry (or any other relatively small, heavily intermarried group) identified by the census. Native American activists favored letting people list themselves as being of more than one race because they believed that this would help identify a larger Native American population and presumably increase that group's political clout. In contrast, African American activists tended to be less enthusiastic about allowing people to identify themselves as multiracial. Partly because of the legacy of segregation, which sometimes held that having a single black ancestor was sufficient to warrant being considered nonwhite, people with mixed black and white ancestry (who account for a majority of those usually classified as African Americans) had tended to list themselves as "black." If large numbers of these individuals began listing more than one racial group, black Americans as a group might lose political influence.
As is so often the case, attitudes toward altering the census categories depended on whether one expected to win or lose by the change. The reclassification had the expected effect, even though only 2.4 percent of respondents to the 2000 census opted to describe themselves as multiracial. The new classification boosted the numbers of people classified as Native Americans: although only 2.5 million respondents listed themselves under the traditional one-ethnicity category, adding those who identified themselves as part-Indian raised the total to 4.1 million—a 110 percent increase since 1990. However, relatively small numbers of people (fewer than eight hundred thousand) listed their race as both white and black, compared to almost thirty-four million identified as black.13
Sometimes only certain cases go uncounted. Critics argue that the official unemployment rate, which counts only those without jobs who have actively looked for work during the previous four weeks, is too low. They insist that a more accurate count would include those who want to work but have given up looking, as well as those who want full-time work but have had to settle for part-time jobs; taken together, these two groups actually outnumber the officially unemployed.14 Of course, every definition draws such distinctions between what does and doesn't count.
The lesson is simple. Statistics depend on collecting information. If questions go unasked, or if they are asked in ways that limit responses, or if measures count some cases but exclude others, information goes ungathered, and missing numbers result. Nevertheless, choices regarding which data to collect and how to go about collecting the information are inevitable. If we want to describe America's racial composition in a way that can be understood, we need to distill incredible diversity into a few categories. The cost of classifying anything into a particular set of categories is that some information is inevitably lost: distinctions seem sharper; what may have been arbitrary cut-offs are treated as meaningful; and, in particular, we tend to lose sight of the choices and uncertainties that went into creating our categories.
In some cases, critics argue that a failure to gather information is intentional, a method of avoiding the release of damaging information. For example, it has proven very difficult to collect information about the circumstances under which police shoot civilians. We might imagine that police shootings can be divided into two categories: those that are justified by the circumstances, and those that are not. In fact, many police departments conduct reviews of shootings to designate them as justifiable or not. Yet efforts to collect national data on these findings have foundered. Not all departments share their records (which, critics say, implies that they have something to hide); and the proportion of shootings labeled "justified" varies wildly from department to department (suggesting either that police behave very differently in different departments or that the process of reviewing shootings varies a great deal).15
There are a variety of ways to ensure that things remain uncounted. The simplest is to not collect the information (for instance, don't ask census respondents any questions about religion). But, even when the data exist, it is possible to avoid compiling information (by simply not doing the calculations necessary to produce certain statistics), to refuse to publish the information, or even to block access to it.16 More subtly, both data collection and analysis can be time-consuming and expensive; in a society where researchers depend on others for funding, decisions not to fund certain research can have the effect of relegating those topics to the ranks of the uncounted.
This works both ways. Inevitably, we also hear arguments that people should stop gathering some sorts of numbers. For example, a popular guide to colleges for prospective students offers a ranking of "party schools." A Matter of Degree—a program sponsored by the American Medical Association to fight alcohol abuse on college campuses—claims that this ranking makes light of and perhaps contributes to campus drinking problems and has called for the guidebook to stop publishing the list.17 While it is probably uncommon for critics to worry that statistics might be a harmful moral influence, all sorts of data, some will contend, might be better left uncollected—and therefore missing.
Another form of missing numbers is easy to overlook: figures, once public and even familiar, that we no longer remember or don't bother to consider. Consider the number of deaths from measles. In 1900, the death rate from measles was 13.3 per 100,000 in the population; measles ranked among the top ten diseases causing death in the United States. Over the course of a century, however, measles lost its power to kill; first more effective treatments and then vaccination eliminated measles as a major medical threat. Nor was this an exceptional case. At the beginning of the twentieth century, many of the leading causes of death were infectious diseases; influenza/pneumonia, tuberculosis, diphtheria, and typhoid/paratyphoid fever also ranked in the top ten.18 Most of those formerly devastating diseases have been brought under something approaching complete control in the United States through the advent of vaccinations and antibiotics. The array of medical threats has changed.
Forgotten numbers have the potential to help us put things in perspective, if only we can bring ourselves to remember them. When we lose sight of the past, we have more trouble assessing our current situation. However, people who are trying to draw attention to social problems are often reluctant to make comparisons with the past. After all, such comparisons may reveal considerable progress. During the twentieth century, for example, Americans' life expectancies increased dramatically. In 1900, a newborn male could expect to live forty-six years; a century later, male life expectancy had risen to seventy-three. The increase for females was even greater—from age forty-eight to eighty. During the same period, the proportion of Americans completing high school rose from about 6 percent to about 85 percent. Many advocates seem to fear that talking about long-term progress invites complacency about contemporary society, and they prefer to focus on short-run trends—especially if the numbers seem more compelling because they show things getting worse.19
Similarly, comparing our society to others can help us get a better sense of the size and shape of our problems. Again, in discussions of social issues, such comparisons tend to be made selectively, in ways that emphasize the magnitude of our contemporary problems. Where data suggest that the United States lags behind other nations, comparative statistics are commonplace, but we might suspect that those trying to promote social action will be less likely to present evidence showing America to advantage. (Of course, those resisting change may favor just such numbers.) Comparisons across time and space are recalled when they help advocates make their points, but otherwise they tend to be ignored, if not forgotten.
One final category deserves mention. It does not involve potentially relevant numbers that are missing, but rather includes irrelevant or erroneous figures that somehow find their way into discussions of social issues. Recently, for example, it became fairly common for journalists to compare various risks against a peculiar standard: the number of people killed worldwide each year by falling coconuts (the annual coconut-death figure usually cited was 150). Do 150 people actually die in this way? It might seem possible: coconuts are hard and heavy, and they fall a great distance, so being bonked on the head could presumably be fatal. But who keeps track of coconut fatalities? The answer: no one. Although it turns out that the medical literature includes a few reports of injuries (not deaths) inflicted by falling coconuts, the figure of 150 deaths is the journalistic equivalent of a contemporary legend.20 It gets passed along as a "true fact," repeated as something that "everybody knows."
Other legendary statistics are attributed to presumably authoritative sources. A claim that a World Health Organization (WHO) study had determined that blondness was caused by a recessive gene and that blonds would be extinct within two hundred years was carried by a number of prominent news outlets, which presumably ran the story on the basis of one another's coverage, without bothering to check with the WHO (which denied the story).21
Legendary numbers can become surprisingly well established. Take the claim that fifty-six is the average age at which a woman becomes widowed. In spite of its obvious improbability (after all, the average male lives into his seventies, married men live longer than those who are unmarried, and husbands are only a few years older on average than their wives), this statistic has circulated for more than twenty years. It appeared in a television commercial for financial services, in materials distributed to women's studies students, and in countless newspaper and magazine articles; its origins are long lost. Perhaps it has endured because no official agency collects data on age at widowhood, making it difficult to challenge such a frequently repeated figure. Nevertheless, demographers—using complicated equations that incorporate age-specific death rates, the percentage of married people in various age cohorts, and age differences between husbands and wives—have concluded that the average age at which women become widows has, to no one's surprise, been rising steadily, from sixty-five in 1970 to about sixty-nine in 1988.22
Even figures that actually originate in scientists' statements can take on legendary qualities. In part, this reflects the difficulties of translating complex scientific ideas into what are intended to be easy-to-understand statements. For example, the widely repeated claim that individuals need to drink eight glasses of water each day had its origin in an analysis that did in fact recommend that level of water intake. But the analysis also noted that most of this water would ordinarily come from food (bread, for example, is 35 percent water, and meats and vegetables contain even higher proportions of water). However, the notion that food contained most of the water needed for good health was soon forgotten, in favor of urging people to consume the entire amount through drinking.23 Similarly, the oft-repeated statements that humans and chimpanzees have DNA that is 98 percent similar—or, variously, 98.4, 99, or 99.44 percent similar—may seem precise, but they ignore the complex assumptions involved in making such calculations and imply that this measure is more meaningful than it actually is.24
Widely circulated numbers are not necessarily valid or even meaningful. In the modern world, with ready access to the Internet and all manner of electronic databases, even figures that have been thoroughly debunked can remain in circulation; they are easy to retrieve and disseminate but almost impossible to eradicate. The problem is not one of missing numbers—in such cases, the numbers are all too present. What is absent is the sort of evidence needed to give the statistics any credibility.
The attraction of legendary numbers is that they seem to give weight or authority to a claim. It is far less convincing to argue, "That's not such an important cause of death! Why, I'll bet more people are killed each year by falling coconuts!" than to flatly compare 150 coconut deaths to whatever is at issue. Numbers are presumed to be factual; numbers imply that someone has actually counted something. Of course, if that is true, it should be possible to document the claim—which cannot be done for legendary numbers.
A related phenomenon is that some numbers, if not themselves fanciful, come to be considered more meaningful than they are. (Chapter 5 also addresses this theme.) We see this particularly in the efforts of bureaucrats to measure the unmeasurable. A school district, for example, might want to reward good teaching. But what makes a good teacher? Most of us can look back on our teachers and identify some as better than others. But what made them better? Maybe they helped us when we were having trouble, encouraged us, or set high standards. My reasons for singling out some of my teachers as especially good might be very different from the reasons you would cite. Teachers can be excellent in many ways, and there's probably no reliable method of translating degree of excellence into a number. How can we measure good teaching or artistic genius? Even baseball fans—those compulsive recordkeepers and lovers of statistics—can argue about the relative merits of different athletes, and baseball has remarkably complete records of players' performances.
But that sort of soft appeal to the immeasurability of performance is unlikely to appease politicians or an angry public demanding better schools. So educational bureaucrats—school districts and state education departments—insist on measuring "performance." In recent years, the favored measure has been students' scores on standardized tests. This is not completely unreasonable—one could argue that, overall, better teaching should lead to students learning more and, in turn, to higher test scores. But test scores are affected by many things besides teachers' performance, including students' home lives. And our own memories of our "best teachers" probably don't depend on how they shaped our performances on standardized tests.
However imperfect test scores might be as an indicator of the quality of teaching, they do offer a nice quantitative measure—this student got so many right, the students in this class scored this well, and so on. No wonder bureaucrats gravitate toward such measures—they are precise (and it is relatively inexpensive to get the information), even if it isn't clear just what they mean. The same thing happens in many settings. Universities want their professors to do high-quality research and be good teachers, but everyone recognizes that these qualities are hard to measure. Thus, there is a tremendous temptation to focus on things that are easy to count: How many books or articles has a faculty member published? (Some departments even selectively weigh articles in different journals, depending on some measure of each journal's influence.) Are a professor's teaching evaluation scores better than average?
The problem with such bureaucratic measures is that we lose sight of their limitations. We begin by telling ourselves that we need some way of measuring teaching quality and that this method, whatever its flaws, is better than nothing. Even if some resist adopting the measure at first, over time inertia sets in, and people come to accept its use. Before long, the measure is taken for granted, and its flaws tend to be forgotten. The criticism that a statistic is an imperfect measure can be leveled at many of the numbers discussed in the chapters that follow. If pressed, a statistic's defenders will often acknowledge that the criticism is valid, that the measure is flawed. But, they ask, what choice do we have? How else can we measure good teaching (or whatever else concerns us) quickly, cheaply, and more or less objectively? Isn't an imperfect statistic better than none at all? They have a point. But we should never blind ourselves to a statistic's shortcomings; once we forget a number's limitations, we give it far more power and influence than it deserves. We need to remember that a clear and direct measure would be preferable and that our imperfect measure is, once again, a type of missing number.
When people use statistics, they assume—or, at least, they want their listeners to assume—that the numbers are meaningful. This means, at a minimum, that someone has actually counted something and that they have done the counting in a way that makes sense. Statistical information is one of the best ways we have of making sense of the world's complexities, of identifying patterns amid the confusion. But bad statistics give us bad information.
This chapter argues that some statistics are bad not so much because the information they contain is bad but because of what is missing—what has not been counted. Numbers can be missing in several senses: a powerful example can make us forget to look for statistics; things can go uncounted because they are considered difficult or impossible to count or because we decide not to count them. In other cases, we count, but something gets lost in the process: things once counted are forgotten, or we brandish numbers that lack substance.
In all of these cases, something is missing. Understanding that helps us recognize what counts as a good statistic. Good statistics are not only products of people counting; the quality of statistics also depends on people's willingness and ability to count thoughtfully and on their decisions about what, exactly, ought to be counted so that the resulting numbers will be both accurate and meaningful.
This process is never perfect. Every number has its limitations; every number is a product of choices that inevitably involve compromise. Statistics are intended to help us summarize, to get an overview of part of the world's complexity. But some information is always sacrificed in the process of choosing what will be counted and how. Something is, in short, always missing. In evaluating statistics, we should not forget what has been lost, if only because this helps us understand what we still have.