CONNECTICUT COLLEGE


The Uncertainty Specialist


Andrea Jones-Rooy '04, a professor of data science at NYU who specializes in complexity, is skeptical of data.
After millennia of relying on anecdotes, instincts and old wives' tales as evidence for our opinions, today most of us demand that people use data to support their arguments and ideas. Whether it's curing cancer, solving workplace inequality or winning elections, data is now perceived as the Rosetta stone for cracking the code of pretty much all of human existence.

But in the frenzy, we've conflated data with truth. And this has dangerous implications for our ability to understand, explain and improve the things we care about.

I have skin in this game. I am a professor of data science at New York University and a social science consultant for companies, for which I conduct quantitative research to help them understand and improve diversity. I make my living from data, yet I consistently find that whether I'm talking to students or clients, I have to remind them that data is not a perfect representation of reality: It's a fundamentally human construct and therefore subject to biases, limitations and other meaningful and consequential imperfections.

The clearest expression of this misunderstanding is the question heard from boardrooms to classrooms when well-meaning people try to get to the bottom of tricky issues:

"What does the data say?"

Data doesn't say anything. Humans say things. They say what they notice or look for in data, and that data only exists in the first place because humans chose to collect it, using human-made tools.

Data can't say anything about an issue any more than a hammer can build a house or almond meal can make a macaron. Data is a necessary ingredient in discovery, but you need a human to select it, shape it and then turn it into an insight.

Data is therefore only as useful as its quality and the skills of the person wielding it. (You know this if you've ever tried to make a macaron. Which I have. And let's just say that my results would certainly not be up to a French patisserie's standard.)

So if data on its own can't do or say anything, then what is it?

What is data?

Data is an imperfect approximation of some aspect of the world at a certain time and place. (I know, that definition is a lot less sexy than we were all hoping for.) It's what results when humans want to know something about something, try to measure it, and then combine those measurements in particular ways.

Here are four big ways that we can introduce imperfections into data: random errors, systematic errors, errors of choosing what to measure, and errors of exclusion.

These errors don't mean that we should throw out all data and conclude that nothing is knowable, however. They mean that we should approach data collection with thoughtfulness, asking ourselves what we might be missing and welcoming the collection of further data.

This view is not anti-science or anti-data. To the contrary, the strength of both comes from being transparent about the limitations of our work. Being aware of possible errors can make our inferences stronger.

Random errors. These happen when humans decide to measure something, and then either due to broken equipment or their own mistakes, the data recorded is wrong. This could take the form of hanging a thermometer on a wall to measure the temperature or using a stethoscope to count heartbeats. If the thermometer is broken, it might not tell you the right number of degrees. The stethoscope might not be broken, but the human doing the counting might space out and miss a beat.

A big way this plays out in the rest of our lives (when we're not assiduously logging temperatures and heartbeats) is in the form of false positives in medical screenings. A false positive for, say, breast cancer, means the results suggest we have cancer but we don't. There are lots of reasons this might happen, most of which boil down to a misstep in the process of turning a fact about the world (whether or not we have cancer) into data (through mammograms and humans).

The consequences of this error are very real, too. Studies show that a false positive can lead to years of negative mental-health consequences, even though the patient turned out to be physically well. On the bright side, the fear of false positives can also lead to more vigilant screening (which increases the chances of further false positives, but I digress).
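
The arithmetic behind false positives is worth a quick sketch. With a purely hypothetical screening test (the sensitivity, specificity and prevalence below are illustrative numbers, not real clinical figures), most positive results can be false even when the test itself is quite accurate:

```python
# Hypothetical numbers for illustration only; real screening
# characteristics and disease prevalences vary widely.
prevalence = 0.01      # 1% of the screened population has the disease
sensitivity = 0.90     # P(test positive | disease)
specificity = 0.95     # P(test negative | no disease)

# Split the positive results into their two sources.
true_positives = prevalence * sensitivity
false_positives = (1 - prevalence) * (1 - specificity)

# Probability that a positive result reflects actual disease (Bayes' rule).
p_disease_given_positive = true_positives / (true_positives + false_positives)

print(round(p_disease_given_positive, 3))  # prints 0.154
```

Roughly 85% of the positives in this toy setup are false, purely because the disease is rare: the small error rate applied to the large healthy population outweighs the accurate detections in the small sick one.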

Generally speaking, as long as our equipment isn't broken and we're doing our best, we hope these errors are statistically random and thus cancel out over time, though that's not a great consolation if your medical screening is one of the errors.
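
That hoped-for canceling can be shown with a small simulation (the temperature and noise level here are made up for illustration): each reading of a "true" 20-degree room is off by a random, zero-mean error, yet the average of many readings lands very close to the truth.

```python
import random

random.seed(0)  # reproducible illustration

true_temp = 20.0  # the real value we are trying to measure

# Each reading is the true value plus random, zero-mean noise,
# standing in for shaky hands, parallax, a cheap sensor, etc.
readings = [true_temp + random.gauss(0, 1.5) for _ in range(10_000)]

average = sum(readings) / len(readings)
# Any single reading can be off by several degrees, but across
# thousands of readings the random errors largely cancel and
# the average sits very close to 20.0.
```

This is exactly what does not happen with the systematic errors described next: a bias that always pushes in the same direction never cancels, no matter how much data you collect.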

Systematic errors. This refers to the possibility that some data is consistently making its way into your data set at the expense of other data, thus potentially leading you to make faulty conclusions about the world. This might happen for lots of different reasons: who you sample, when you sample them, or who joins your study or fills out your survey.

A common kind of systematic error is selection bias. For example, using data from Twitter posts to understand public sentiment about a particular issue is flawed because most of us don't tweet, and those who do don't always post their true feelings. Instead, a collection of data from Twitter is just that: a way of understanding what some people who have selected to participate in this particular platform have selected to share with the world, and no more.
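
A toy simulation makes the mechanism concrete (all numbers are invented for illustration): if supporters of an issue are simply more likely to post about it than opponents, the posts we can observe overstate support, no matter how many posts we collect.

```python
import random

random.seed(1)

# Hypothetical population: true support for the issue is 50%.
population = [random.random() < 0.5 for _ in range(100_000)]

def posts_about_it(supports: bool) -> bool:
    # Assumed posting rates: supporters are three times as
    # likely to post about the issue as non-supporters.
    rate = 0.30 if supports else 0.10
    return random.random() < rate

posters = [person for person in population if posts_about_it(person)]

true_support = sum(population) / len(population)        # about 0.50
observed_support = sum(posters) / len(posters)          # about 0.75
# The observable sample systematically over-represents one group,
# and gathering more posts only makes us more confidently wrong.
```

The bias here comes entirely from who selects into the sample; the individual posts are recorded perfectly.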

The 2016 U.S. presidential election is an example of when a series of systematic biases may have led the polls to wrongly favor Hillary Clinton. It can be tempting to conclude that all polling is wrong. And it is, but not in the general way we might think.

One possibility is that voters were less likely to report that they were going to vote for Trump due to perceptions that this was the unpopular choice. We call this "social desirability bias." It's useful to stop to think about this, because if we'd been more conscious of this bias ahead of time, we might have been able to build it into our models and better predict the election results.
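
One simple way such a correction could look, sketched with invented numbers: if we assume (a strong assumption, and the hard part in practice) that we know what fraction of a candidate's true supporters decline to say so, we can back out an adjusted estimate from the raw poll share.

```python
# Toy correction for social desirability bias. Both numbers
# are hypothetical; estimating shy_rate is the genuinely
# difficult modeling problem.
raw_share = 0.42   # share telling pollsters they support the candidate
shy_rate = 0.05    # assumed fraction of true supporters who won't say so

# If only (1 - shy_rate) of true supporters admit it,
# raw_share = true_share * (1 - shy_rate), so:
adjusted_share = raw_share / (1 - shy_rate)

print(round(adjusted_share, 3))  # prints 0.442
```

Even this crude adjustment shifts the estimate by a couple of points, which is often larger than a poll's reported margin of error.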

Sadly, medical studies are riddled with systematic biases, too: They are often based on people who are already sick and who have the means to get to a doctor or enroll in a clinical trial. There's some excitement about wearable technology as a way of overcoming this. If everyone who has an Apple Watch, for example, could just send their heart rates and steps per day to the cloud, then we would have tons more data with less bias. But this may introduce a whole new bias: The data will likely now be skewed toward wealthy members of the Western world.

Errors of choosing what to measure. This is when we think we're measuring one thing, but in fact we're measuring something else.

I work with many companies that are interested, laudably, in finding ways to make more-objective hiring and promotion decisions. Often, the temptation is to turn to technology: How can we get more data in front of our managers so that they make better decisions, and how can we apply the right filters to make sure we are getting the best talent in front of our recruiters?

But very few pause to ask if their data is measuring what they think it's measuring. For example, if we are looking for top job candidates, we might prefer those who went to top universities. But rather than being a measure of talent, that might just be a measure of membership in a social network that gave someone the "right" sequence of opportunities to get into a good college in the first place. A person's GPA is perhaps a great measure of their ability to select classes they're guaranteed to ace, and their SAT scores might be a lovely expression of their parents' ability to pay for a private tutor.
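
A small simulation illustrates the proxy problem (the model and its weights are invented purely for illustration): if admission to a top university depends far more on opportunity than on talent, then filtering candidates on the degree mostly selects for opportunity.

```python
import random

random.seed(2)

# Toy model: "talent" and "opportunity" are independent traits,
# but admission depends much more heavily on opportunity.
people = [
    {"talent": random.gauss(0, 1), "opportunity": random.gauss(0, 1)}
    for _ in range(50_000)
]

def admitted_to_top_school(person: dict) -> bool:
    # Assumed weights: opportunity counts five times as much as talent.
    score = 0.2 * person["talent"] + 1.0 * person["opportunity"]
    return score > 1.0

grads = [p for p in people if admitted_to_top_school(p)]

avg_talent = sum(p["talent"] for p in grads) / len(grads)
avg_opportunity = sum(p["opportunity"] for p in grads) / len(grads)
# The degree filter selects strongly on opportunity (avg_opportunity
# well above the population mean of 0) and only weakly on talent.
```

The recruiter sees a clean, quantitative filter; what the filter actually measures is mostly the thing they never intended to select on.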

Companies, and my students, are so obsessed with being on the cutting edge of methodologies that they're skipping the deeper questions: Why are we measuring this in this way in the first place? Is there another way we could more thoroughly understand people? And, given the data we have, how can we adjust our filters to reduce some of this bias?

Errors of exclusion. These happen when populations are systematically ignored in data sets, which can set a precedent for further exclusion.

For example, women are now more likely to die from heart attacks than men, which is thought to be largely due to the fact that most cardiovascular data is based on men, who experience different symptoms from women, thus leading to incorrect diagnoses.

We also currently have a lot of data on how white women fare when they run for political office in the U.S., but not a lot on the experiences of people of color (of any gender), who face different biases on the campaign trail than do white women. (And that's not even mentioning the data on the different experiences of, say, black candidates compared with Latinx candidates, and so on.) Until we do these studies, we'll be trying to make inferences about apples from data about oranges, with worse consequences than an unbalanced fruit salad.

Choosing to study something can also incentivize further research on that topic, which is a bias in and of itself. As it's easier to build from existing data sets than to create your own, researchers often gather around certain topics, like white women running for office or male cardiovascular health, at the expense of others. If you repeat this enough times, all of a sudden men are the default in heart disease studies and white women are the default in political participation studies.

Examples abound. Measuring "leadership" might incentivize people to be more aggressive in meetings, thus breaking down communication in the long run. Adding an "adversity" score to the SATs might incentivize parents to move to different zip codes so that their scores are worth more.

I also see this play out in the diversity space: DiversityInc and other organizations that try to evaluate diversity in companies have chosen a few metrics on which they reward companies. For example, "leadership buy-in" is measured by whether a company has a chief diversity officer. Ticking this box has incentivized a burst of behaviors that may not actually do anything, such as appointing a CDO who has no real power.

Why we still need to believe in data. In the age of anti-intellectualism, fake news, alternative facts and pseudoscience, I am very reluctant to say any of this. Sometimes it feels like we scientists are barely hanging on as it is. But I believe that the usefulness of data and science comes not from the fact that they are perfect and complete but from the fact that we recognize the limitations of our efforts. Just as we want to analyze data carefully with statistics and algorithms, we also need to collect it carefully. We are only as strong as our humility and awareness of our limitations.

This doesn't mean we should throw out data. It means that when we include evidence in our analysis, we should think about the biases that have affected its reliability. We should not just ask "What does it say?" but "Who collected it, how did they do it, and how did those decisions affect the results?"

We need to question data rather than assuming that just because we've assigned a number to something, it's suddenly the cold, hard Truth. When you encounter a study or data set, I urge you to ask: What might be missing from this picture? What's another way to consider what happened? And what does this particular measure rule in, rule out or incentivize?

We need to be as thoughtful about data as we are starting to be about statistics, algorithms and privacy. As long as data is considered cold, hard, infallible truth, we run the risk of generating and reinforcing a lot of inaccurate understandings of the world around us.

This piece first ran in Quartz. Available at qz.com.


