September 24, 2021
How can we apply the theory of measurement accuracy to human judgments? How can cognitive biases affect both the bias term and the noise term in measurement error? How much noise should we expect in judgments of various kinds? Is there reason to think that machines will eventually make better decisions than humans in all domains? How does machine decision-making differ (if at all) from human decision-making? In what domains should we work to reduce variance in decision-making? If machines use human decisions as training data, then to what extent will human biases become "baked into" machine decisions? And can such biases be compensated for? Are there any domains where human judgment will always be preferable to machine judgment? What does the "fragile families" study tell us about the limits of predicting life outcomes? What does good decision "hygiene" look like? Why do people focus more on bias than noise when trying to reduce error? To what extent can people improve their decision-making abilities? How can we recognize good ideas when we have them? Humans aren't fully rational, but are they irrational?
Daniel Kahneman is Professor of Psychology and Public Affairs Emeritus at the Princeton School of Public and International Affairs, the Eugene Higgins Professor of Psychology Emeritus at Princeton University, and a fellow of the Center for Rationality at the Hebrew University in Jerusalem. Dr. Kahneman has held the position of professor of psychology at the Hebrew University in Jerusalem (1970-1978), the University of British Columbia (1978-1986), and the University of California, Berkeley (1986-1994). He is a member of the National Academy of Science, the Philosophical Society, the American Academy of Arts and Sciences, and is a fellow of the American Psychological Association, the American Psychological Society, the Society of Experimental Psychologists, and the Econometric Society. He has been the recipient of many awards, among them the Distinguished Scientific Contribution Award of the American Psychological Association (1982) and the Grawemeyer Prize (2002), both jointly with Amos Tversky, the Warren Medal of the Society of Experimental Psychologists (1995), the Hilgard Award for Career Contributions to General Psychology (1995), the Nobel Prize in Economic Sciences (2002), the Lifetime Contribution Award of the American Psychological Association (2007), and the Presidential Medal of Freedom (2013). He holds honorary degrees from numerous universities. Find out more about him here.
Here's the link to the Thought Saver deck that accompanies this episode: https://app.thoughtsaver.com/embed/JGXcbe19e1?start=1&end=17
JOSH: Hello, and welcome to Clearer Thinking with Spencer Greenberg, the podcast about ideas that matter. I'm Josh Castle, the producer of the podcast, and I'm so glad you've joined us today. In this episode, Spencer speaks with Daniel Kahneman about bias and noise in decision making, machine versus human judgment, and the limits of predictability. In the listener survey we ran several weeks ago, many of you indicated that you'd like for us to list a few takeaways or action items at the end of each episode. So we're trying an experiment. Using our tool, Thought Saver, we've built a deck of flashcards that contains many of the most important takeaways from this episode. Thought Saver is designed to help you easily remember anything you add to it with short daily quizzes. We'd love for you to try it out and give us feedback on whether this is something we should offer alongside future episodes. You can find the link to these flashcards in the show notes. And now here are Spencer and Daniel Kahneman.
SPENCER: Danny, welcome.
DANIEL: My pleasure to be here.
SPENCER: So the first question I would ask you about is the theory of measurement accuracy. Can we take these ideas from the physical sciences and extend them to measuring judgments? How would we do that?
DANIEL: Well, I can introduce an example. We think about judgment as an operation of measurement. Measurement assigns an object a value on a scale, and judgment actually does the same thing. So when you have somebody making a diagnosis, or sentencing someone to a prison term, or making a decision about a person to hire or to fire, that means there is a scale and there is a choice of a response on the scale, and this is in response to a particular object. So judgment is measurement, and the theory of measurement becomes applicable to judgment. The theory of measurement is a little more than 200 years old, and it applies in all the sciences. It discusses accuracy in terms of the absence of error, and it characterizes error in terms of bias and noise. To illustrate this, we think of judgment in general as an operation of measurement. In the operation of measurement, there is an object, there is a scale, and you assign the object a value on the scale. That's how you measure the height of someone, the length of a building, the weight of an object, or whatever. Judgment we think of as a measurement too, but the measuring instrument is not a scale or a ruler; it's a human mind. And the theory of measurement accuracy is applicable to it. The theory of measurement accuracy is [inaudible] over 200 years old, it's associated with the name of the famous mathematician Carl Friedrich Gauss, and it's basically accepted in all the sciences. So I'll give an example from measurement, and then we'll go on to judgment, because it transfers directly. Suppose you have a line, and you're measuring it with a very, very fine ruler, and you measure it multiple times.
Now one thing is guaranteed: you're not going to get exactly the same measurements. If the scale is fine enough, there's going to be variability, and there are going to be errors, so that if somebody performed the same measurement with a scientific instrument, they could see the errors that you made with your ruler. The average of these errors is bias. You can overestimate the length or you can underestimate the length; that's the average error. That's bias. The variability of the errors is noise. And this transfers exactly to judgment. If you have a number of forecasters forecasting inflation for next year, and you take realized inflation as the target and the prediction as a measurement, then the extent by which they're off on average is their bias. And then you find that different forecasters are off by different degrees. That variability is noise; the average error is bias. The reason that I go through this very complicated exercise is that there is a formula for measuring total error, the overall error in a set of measurements, and the formula is remarkably simple: the global measure of error equals the square of the bias plus the square of the noise. That's very important because it tells you that there are two kinds of errors, and it tells you that, in principle, they're equally important. And what is remarkable is that when we talk about judgment, when we think about judgment, we think almost exclusively about bias, and we hardly think about noise at all. And that's what the book is about. The book is about noise in judgment, not in measurement, and noise in distinction to bias. And it's about the fact that noise tends to be neglected and bias emphasized in our thinking about judgment.
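The formula Daniel describes is commonly written as mean squared error = bias² + noise², where bias is the average error and noise is the standard deviation of the errors. The following is a minimal sketch that checks the identity numerically. The function name, the ruler readings, and the true length are all made up for illustration and do not come from the episode.

```python
import statistics


def error_decomposition(measurements, true_value):
    """Return (bias, noise, mse) for repeated measurements of one quantity.

    bias  = average error
    noise = population standard deviation of the errors
    mse   = average squared error (the "global measure of error")
    """
    errors = [m - true_value for m in measurements]
    bias = statistics.fmean(errors)
    noise = statistics.pstdev(errors)
    mse = statistics.fmean(e * e for e in errors)
    return bias, noise, mse


# Hypothetical ruler readings of a line whose true length is 10.0 units.
readings = [10.4, 10.1, 10.6, 10.3, 10.1]
bias, noise, mse = error_decomposition(readings, true_value=10.0)

print(f"bias^2 + noise^2 = {bias ** 2 + noise ** 2:.4f}")
print(f"mean squared error = {mse:.4f}")
```

The two printed numbers agree, which is the sense in which bias and noise contribute to total error on an equal footing: a unit of either one adds the same amount to the mean squared error.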
SPENCER: So your first book for the popular audience, Thinking, Fast and Slow, was focused on one term in that equation, the bias term, and now your new book, Noise, is focused on the second term of that equation. And the two of them together make up the total error in prediction, is that right?
DANIEL: That's correct. And that's what attracted me to [inaudible]. I devoted most of my career to studying biases and completely neglected individual differences and variability of judgment, so it seemed like a good topic.
SPENCER: I find it interesting what you say in your book about how cognitive biases can actually influence both terms in the equation. Do you want to talk about that a little bit?
DANIEL: Yes, true. Cognitive biases, typically, we say that they characterize the thinking of individuals. So for example, suppose you have federal judges passing sentences, and you show a bunch of federal judges the same crime and the same defendant, and they have to set a sentence. The variability among the judges, that's noise. And one of the components of noise is that some judges are more severe than others. So each judge can be characterized by a bias, but the biases are not the same across individuals. And the variability of biases is one of the sources of noise. It's not the only one, but it's an important one.
SPENCER: Right. So for example, if one judge was much harsher with a certain type of crime than another judge, that would be a form of bias that would actually influence the noise that you get in the equation, because depending on which judge you get, which is essentially random, you might get a different outcome for your sentence. Is that right?
DANIEL: To give you an example of how much noise there is, there was a study that actually had the structure I described earlier: 208 federal judges were shown 16 cases of crimes, specified with enough detail that they could set an appropriate sentence. And the variability among judges in their evaluation of the same crime was horrendous. To give you a sense of what it was, the average sentence was seven years, and if you take two judges at random, the difference that you would expect to find is about four years. That can be characterized as a lottery that the defendant faces. That's huge variability. And one of the sources of variability is that some judges are generally more severe than others. And you know something about those judges: judges in the South tend to be more severe than judges in the North. That's one source of noise, as indicated. Then there is another one, which you mentioned, which is that judges are more severe for some kinds of crimes than for others. So there is a judge who is especially severe for cases of fraud, and another one who is especially severe for cases of violence against the elderly. Those patterns differ among judges; there's a lot of variability in those patterns. And that is a source of, actually, it's the biggest source of error.
SPENCER: And we've talked about how error can be divided up into these two parts, the bias term and then this noise term, but I think you can also divide the noise term into two parts, because there's noise because of different people making judgments, and then there's noise within a single judge. Do you want to talk about that?
DANIEL: Yes, certainly. So, when a radiologist looks at x-rays and gives a diagnosis, and you present the same x-ray to the same radiologist on another day, there is a very good chance of not getting exactly [inaudible]. That is noise within the judge, and we call that occasion noise because, on different occasions, you get different responses to the same object, to the same topic. There is a lot of it, and it comes from the response to irrelevant aspects of the situation. So for example, judges tend to be more severe on hot days than on other days. That creates a sort of lottery from the point of view of the defendant, depending on whether the temperature is up that day, and that changes day to day. So that's occasion noise. Physicians give different prescriptions in the morning and the afternoon: they're more likely to prescribe antibiotics or opioids in the afternoon than in the morning because they're more tired, so they're more likely to just respond to whatever the patient asks for. Again, the patient faces a lottery. You don't know the mood that you're going to find, the mood of the judge that you're facing, which is clearly an important daily variable. That's occasion noise. And then there is the noise across individuals. As we already said, there are two kinds of that. The overall difference in severity, we call that level noise. Or in the context of diagnosis, you may have some physicians detecting more heart problems than other physicians in exactly the same EKGs. That's level noise. And then there is pattern noise, which is a differential response to different kinds of things. And that's the most important kind of noise there is, in the sense that it accounts for most of the noise.
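The split between level noise and pattern noise can be made concrete with a table of judgments in which every judge rates every case once. The following is a rough sketch under that assumption; the function name and the numbers are invented for illustration. Because each judge sees each case only once here, occasion noise cannot be separated out and is folded into the pattern term.

```python
import math
import statistics


def noise_components(ratings):
    """Split system noise into level noise and pattern noise.

    ratings[j][c] is the judgment of judge j on case c (one rating each).
    Returns (level_noise, pattern_noise, system_noise) as standard deviations.
    """
    n_judges = len(ratings)
    n_cases = len(ratings[0])
    judge_means = [statistics.fmean(row) for row in ratings]
    case_means = [
        statistics.fmean(ratings[j][c] for j in range(n_judges))
        for c in range(n_cases)
    ]
    grand_mean = statistics.fmean(judge_means)

    # Level noise: how much judges differ in their overall average severity.
    level_var = statistics.pvariance(judge_means)

    # Pattern noise: what is left after removing each judge's average
    # severity and each case's average judgment (judge-by-case interaction).
    pattern_var = statistics.fmean(
        (ratings[j][c] - judge_means[j] - case_means[c] + grand_mean) ** 2
        for j in range(n_judges)
        for c in range(n_cases)
    )

    # System noise: variance across judges on the same case, averaged over cases.
    system_var = statistics.fmean(
        statistics.pvariance([ratings[j][c] for j in range(n_judges)])
        for c in range(n_cases)
    )
    return math.sqrt(level_var), math.sqrt(pattern_var), math.sqrt(system_var)


# Hypothetical sentences in years: 3 judges (rows) by 3 cases (columns).
sentences = [
    [5.0, 8.0, 3.0],
    [7.0, 9.0, 6.0],
    [4.0, 10.0, 2.0],
]
level, pattern, system = noise_components(sentences)
print(f"level^2 + pattern^2 = {level ** 2 + pattern ** 2:.4f}")
print(f"system^2 = {system ** 2:.4f}")
```

With a complete table like this, the squared level and pattern terms add up to the squared system noise, mirroring the way squared bias and squared noise add up to total error.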
SPENCER: [inaudible] and the pattern noise a little more?
DANIEL: Yeah. What we can learn from pattern noise is that when different people look at the same situation, they don't see it the same way. People have judgment personalities, and when they look at exactly the same thing, they're surprisingly different from each other. It's interesting to think about why this is surprising. It is surprising because every one of us lives inside our own head. And every one of us, I am, and I'm sure you are too, thinks: I see the world as I do because that's the way it is. Therefore, I'm convinced that I see the world as it really is. I see reality; my perception is real, it's true. And I expect you to see the world as I do. But in fact, what goes on inside your head is quite different. You have different patterns and different ways of evaluating complicated problems. We differ, and we don't know it, and each of us would be surprised by how different the scene is when viewed from inside the other person's head. That's pattern noise.
SPENCER: Right. So for example, in a job interview context, two people might interview the same candidate, and one of them thinks, "Oh, man, I would never want to work with that person, what a jerk." And the other thinks, "Well, they seem like they're really perfectionistic and really capable." And they're just kind of focused on different aspects of the person, or maybe they're noticing different information. Or maybe they're combining that information in different ways with their own prior experience, and so on. Would that be an example?
DANIEL: Yes, that's an excellent example. And indeed, this happens in reality, there are interviewers hiring people who focus on different aspects, and in a regular way, some of them emphasize, say, originality more than others, some emphasize reliability more than others do. Those are differences in their sensitivity and their responses. And they cause noise.
SPENCER: One thing that I found helpful in thinking about the distinction between bias and noise, and also between different types of noise, is the archery metaphor. Do you want to elaborate on that metaphor?
DANIEL: Sure. Imagine that you have a target, and we're shooting at the target with arrows or rifles. You can imagine that with a very accurate, very good shooter, all the holes that you're going to find on the target are clustered closely together and centered on the center of the target. That's accurate. Now you have people who also cluster their shots, but they're off the target, and the distance between the center of the target and the center of their shots, that's bias. And that usually has a causal explanation. For example, if the gun sight is slightly bent, then you'll be biased: when you are aiming straight at the target as you see it, you are in fact aiming somewhere else. That's bias. Then there is noise. Noise is simply when the shots are distributed more widely, and you can have noise without bias; that is when the shots are roughly centered on the target, except they're quite different from each other. And then, of course, you have a common situation, and it is common in judgment, where you have both noise and bias: the center of the shots is off the target, which is bias, and the shots are widely distributed, which is noise.
SPENCER: I also like that the archery example can show you a couple of different kinds of noise. For example, if you were looking at the target and you saw this scattering of bullet holes, some of the scattering could come about from different shooters, who have different levels of scattering, and some of that noise could come about from a single shooter, who with every shot is going to vary a little bit in where they hit the target.
DANIEL: What is important to know is that when you have a set of shooters, and each one of them shoots once, the noise that you observe is actually a combination of two sources. Because if the same shooter were to shoot once again at the same target, they wouldn't hit exactly the same spot. So both sources of variability are there. If a defendant faces a judge, that judge may be more or less severe than others. And in addition, the judge may be in a better or worse mood than at other times, and both sources of variability contribute to the lottery that the defendant faces.
SPENCER: You make a really interesting point in your book, which is that if you can see only the backs of the targets, so you don't even know where the bullseye is, you can still potentially see that people are inaccurate, because if you see a really wide scatter, no matter where the target is, you know that they can't be that good at shooting.
DANIEL: Well, absolutely. I mean, if you just imagine what a target looks like from the back, you have no idea about bias, because you don't know where the center of the target is. But in fact, you have all the information about noise. And this is sort of important because in an organization, or if you want to deal with noise in judgment, you don't need to have the truth. So for example, you can establish noise among forecasters without knowing what the truth is, because you're interested in the variability among forecasters. So that's a very important difference between noise and bias. Noise is actually a lot easier to measure than bias.
SPENCER: Right. So that's where you have this idea of a noise audit where you can go, whenever you have people making forecasts, you can compare them to each other and see if they differ from each other, which gives you a sense of how much noise there is, and then you can work to reduce the noise. Even if you don't know what the right answer should be. It's still better to have less noise in the system.
DANIEL: Absolutely right. So let me tell you the origin story. The story is that about seven years ago, I was consulting in an insurance company, and I had the idea of examining whether people in the same role agree in their judgments. For example, we had underwriting executives prepare cases, quite realistic cases, the same kind of complex cases that their underwriters would make judgments about. And then we had, say, 50 underwriters look at the same case and put a dollar value on it: what is the appropriate premium for that risk? And I'll ask you the same question that I asked the executives: "If you take two underwriters at random in a well-run insurance company, and they both make dollar judgments about the same complex risk, by how much would you expect them to differ?" And the interesting thing is that there is a common answer. People say 10%. I'm sure that most people in your audience thought 10%.
SPENCER: Yeah, 10 to 20% popped into my mind. Yeah.
DANIEL: 10%, that's how it seems to everybody; we've run surveys on that. And that's what the executives in the company also expected. But the truth was 52%. So there was five times as much noise as the executives anticipated. And that's what eventually made the book interesting to write, or useful to write, because everybody knows that judgments will not be perfect. In fact, it's part of the definition of judgment to expect that reasonable people may disagree on a matter of judgment. What is surprising is that people don't expect how much noise there is. And it was that surprise, as well as the amount of noise, that justified, I think, a book to draw people's attention to a problem that, by and large, they tend not to see.
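A noise audit of this kind needs no true answer, only several people judging the same case. The following is a minimal sketch of one way to turn the "two underwriters at random" question into a number: for every pair of judgments, take the absolute difference as a fraction of the pair's average, then average over all pairs. The exact statistic used in the insurance study is not given in the conversation, so this formula, the function name, and the premiums are assumptions for illustration only.

```python
from itertools import combinations
import statistics


def pairwise_noise_index(judgments):
    """Average relative difference between two judges picked at random.

    For each pair (a, b): |a - b| divided by the pair's mean.
    Assumes all judgments are positive numbers (for example, premiums).
    """
    ratios = [
        abs(a - b) / ((a + b) / 2)
        for a, b in combinations(judgments, 2)
    ]
    return statistics.fmean(ratios)


# Hypothetical premiums (in dollars) set by five underwriters for one case.
premiums = [9500, 16700, 12000, 8000, 14000]
print(f"noise index: {pairwise_noise_index(premiums):.0%}")
```

Nothing in the calculation refers to the correct premium, which is the point of seeing the target only from the back: the scatter is measurable even when the bullseye is unknown.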
SPENCER: On a personal level, I have found that when you go to the doctor for something that's very typical, where they see a lot of cases, they tend to all give the same judgment. But if you go for something more outside of what they're used to, kind of obscure, you can get just wildly different responses from doctors. And I think this can be an interesting example where we actually see noise in our own lives.
DANIEL: Absolutely. That's what we call a difficult problem of judgment, that is, a complicated problem with various aspects that you have to put weight on. That's where people differ. On easy problems, in general, you would expect little noise. But important problems are typically complex, and on complex problems, you can expect a lot.
SPENCER: Increasingly we're seeing society turn everything into algorithms, right? You might think, well, if humans have all of these problems in our judgment making, maybe we should just use machines as an alternative.
DANIEL: Well, there is actually a very good argument for doing that. It's not only algorithms; it's been known for 70 years that when you pit individuals making a judgment against very simple rules that mechanically combine the information available to the judge, the simple rules typically do as well. And in about 50% of the cases, they do better than individuals. The reason, by the way, is that people are noisy and rules are not. That is, when you present the same information profile to an algorithm or to a rule on different occasions, you'll get exactly the same answer. This is not true across people, and it's not true within people. There is noise, and algorithms are noise-free. Current algorithms have very complicated rules. In fact, machine learning finds solutions that people do not understand. But machine learning is even better: when there is enough data, it is more accurate than the routine statistical methods that have been in use for decades. And being equally noise-free, algorithms are clearly superior. That's true in important domains. For example, in decisions about bail, there is absolutely clear evidence that artificial intelligence would do it better than human judges, in the sense that it would either incarcerate fewer people and keep the same level of crime, or keep the same number of people in jail until their trial, and crime would be reduced. Unquestionably, algorithms are winning this one. They're winning against dermatologists; they're winning against radiologists. And we can expect algorithms to outdo people in a large number of problems when there's enough data.
SPENCER: Yeah, I think having enough data points is a really critical one. Because in some sense, the way that algorithms are outperforming humans is that they're averaging over the noise, right? So maybe you have a bunch of labeled data made by human experts, and the human experts disagree. What the algorithm is trying to predict is essentially the average of the human experts. So it's trying to put it right between what they all predict, which is essentially a kind of noise-free prediction. But of course, if we don't have much data, then they basically can't average over the noise. And the more noise there is, the more data they need.
DANIEL: You make a very interesting point and a very useful one. Because you're pointing out the following: if you collect enough judgments of the same object, and you just take the average, you will be eliminating noise. If you take the average of 100 people, you're in effect reducing noise to 10% of what it was. That's a sure thing. You're keeping bias exactly the way it was, but you can reduce noise by taking many people and averaging. And as you say, large machine learning programs are based ultimately on human judgment, on human inputs, and on the average of human inputs, which is noise-free.
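The "100 people, 10% of the noise" figure follows from the standard result that averaging n independent judgments divides the noise by the square root of n, while any bias shared by all judges is untouched. The following is a small simulation sketch of that claim. The independence of the judges, the Gaussian noise, and all the numbers are assumptions chosen for illustration.

```python
import math
import random
import statistics


def noise_after_averaging(noise_sd, n_judges):
    """Noise left after averaging n independent judgments: sd / sqrt(n)."""
    return noise_sd / math.sqrt(n_judges)


def simulate_panel_averages(true_value, bias, noise_sd, n_judges, n_panels, seed=0):
    """Average the judgments of n_judges, repeated for n_panels panels.

    Every judge shares the same bias and has independent Gaussian noise.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(n_panels):
        panel = [
            true_value + bias + rng.gauss(0.0, noise_sd)
            for _ in range(n_judges)
        ]
        averages.append(statistics.fmean(panel))
    return averages


# Hypothetical setup: true value 50, shared bias +2, individual noise SD 10.
averages = simulate_panel_averages(50.0, 2.0, 10.0, n_judges=100, n_panels=2000)
remaining_bias = statistics.fmean(averages) - 50.0
remaining_noise = statistics.pstdev(averages)

print(f"predicted noise after averaging: {noise_after_averaging(10.0, 100):.2f}")
print(f"simulated noise after averaging: {remaining_noise:.2f}")
print(f"bias after averaging: {remaining_bias:.2f}")
```

If the judges' errors were correlated rather than independent, averaging would remove less noise than the square-root rule suggests.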
SPENCER: One of the most shocking things, I think, to come out of that research comparing human judgment to algorithms is what happens when you train an algorithm not to predict the right answer, but just to predict what a single human will say. So you know, if you have a particular human judge, and you train the algorithm to predict what that judge will say, it can actually outperform that judge at predicting the outcome, even though it never got to see the outcome.
DANIEL: And the reason is straightforward. There's only one possible reason. You statistically create a model of the judge, which predicts what the judge would say. But actually, the judge would not say that every time: the judge is noisy, and the model is not. The superiority of the model in terms of its accuracy has a simple reason: it's noise-free. And that is really the key to the importance of the problem: you can improve accuracy not only by reducing bias, but you actually improve accuracy, even when you leave bias the same, by reducing noise. That's the key to Gauss's rule about how to measure error.
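The "model of the judge" effect can be reproduced in a toy simulation. The following is a rough sketch, assuming a single numeric cue, a simulated judge who uses that cue plus random occasion noise, and an ordinary least-squares line fitted to the judge's own answers. The research being discussed used richer models with several cues; every name and number here is hypothetical.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


rng = random.Random(42)
cues = [rng.gauss(0, 1) for _ in range(2000)]

# The outcome depends on the cue plus things nobody can observe.
outcomes = [c + rng.gauss(0, 1) for c in cues]

# The judge uses the cue sensibly but adds occasion noise to every call.
judge = [0.8 * c + rng.gauss(0, 1) for c in cues]

# The model is fitted to the judge's answers only; it never sees outcomes.
slope, intercept = fit_line(cues, judge)
model = [intercept + slope * c for c in cues]

judge_r = pearson(judge, outcomes)
model_r = pearson(model, outcomes)
print(f"judge vs outcome: r = {judge_r:.2f}")
print(f"model of the judge vs outcome: r = {model_r:.2f}")
```

The model keeps the judge's way of weighting the cue and drops only the noise, so its correlation with the outcome comes out higher than the judge's own in this setup.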
SPENCER: It's interesting to think about fairness with regard to noise because, on the one hand, noise seems fair. It's sort of like a coin flip. You know, if we determine who gets something based on a coin flip, you could say, well, that's fair. You might get it or I might get it, but the decision procedure seems fair. On the other hand, noise can also seem very unfair, because if two people commit the same crime and one gets 10 years and one gets five years, that seems unfair in a different sense.
DANIEL: Yeah. And that's absolutely right. That's the unfairness of noise. In many contexts, noise produces unfairness, because equal cases should be judged identically. And you can see where noise hurts: it hurts when organizations are using people to make judgments. So for example, the insurance company is a system, and the system uses underwriters, and the underwriter speaks for the company. Similarly, there is a justice system with individual judges, and the individual judges speak for the justice system, and they don't speak in the same voice. That variability is costly to the insurance company and is unfair in the context of justice.
SPENCER: With the sentencing algorithms in particular, which are actually now used in courtrooms around the United States to do things like predicting the chance of someone committing another crime if they're released on bail, there's a lot of pushback from activists, and a lot of people say that these algorithms are going to kind of systematize bias. I'm curious to hear your reaction to that. I also have some thoughts on that.
DANIEL: Well, I think that, unfortunately, some algorithms that have been used in the past have been biased. And that bias is typically built into the very design of the algorithm or the very measure that you use. So if, for example, you're predicting crime in different areas, and you're measuring the incidence of crime by the number of arrests, then if arrests are biased, for example, if blacks tend to be arrested disproportionately, then you're building that bias right into your program. But this is something that, when you think critically about the program, you can detect. That's one serious bias. Then there is another kind of bias, of which a lot has been made, which is that the algorithm is differentially sensitive, or differentially accurate, for different groups. So for example, face recognition algorithms tend to work better on whites than on blacks. And the reason is not that they are biased against blacks, but simply that there is a lot more data about whites than about blacks. And as a result of these differences in data, you get a difference in accuracy. So my sense would be that there is a bias problem, and that it should be controlled, but it can be controlled. And on the whole, my sense is that algorithms have gone [inaudible].
SPENCER: Yeah, well, one thing that's interesting is that you could argue, okay, these algorithms are going to systematize bias. But on the other hand, judges might be systematizing bias, right? Humans have so many biases. It's sort of the bias that you can measure versus the bias that you can't measure.
DANIEL: Absolutely. And it turns out that with the bias that you can't measure, people are quite comfortable. For example, there is a lot of pushback against the use of algorithms in justice. And it's certainly not only by activists; it's by judges. Judges don't want to be told how to make judgments. Judges hate the idea of algorithms setting sentences, and they even hate the idea of being constrained by guidelines. You know, we've tried sentencing guidelines, and they were removed. And two things happened when they were removed: there was more noise than before, and the judges were much happier with the system. So people want to use their own judgment, because they're deeply convinced that their judgment is true, and they are somehow convinced, against the evidence, that because their judgment is true, anybody else in their shoes, any other competent judge, will make the same judgment. And that is wrong.
SPENCER: That reminds me of another piece of pushback I think that algorithms often get, which is that you need human discretion to take into account the full complexity of the case.
DANIEL: Well, there's a long history on that question. Garry Kasparov, the world chess champion, after he was defeated by Deep Blue, for a number of years advocated a hybrid, that is, a human and a computer playing together on the same side. And he said that that combination would actually be the best. It turns out he was wrong. And he was wrong because the software that was eventually developed doesn't need human input. In fact, human input gets in the way.
SPENCER: He might have been right for a few years.
DANIEL: It was true for a few years. But algorithms keep improving much faster than people do. And so when algorithms are close in accuracy to what humans accomplish, it's a sure thing that within a decade, the algorithms are going to do better. And who should have the last word is a fascinating question. It seems obvious to most people that when you have a human and a piece of software making the judgment, the human should have the last word. It turns out there are cases where that is clearly right: if you're going to approve a loan to someone, and the computer says approve that loan, and you happen to know something that the computer didn't know, that the person was arrested this morning for fraud, then you are not going to approve the loan. This is a case where the person has crucial information that the computer doesn't have. But when the human and the computer have roughly comparable information, the conclusion is nasty: the computer should have the last word.
SPENCER: I largely agree with you, though I think there are some important caveats where algorithms can actually be a bad idea. One of them is in rapidly changing environments, where the training data may not reflect the new environment. Unless the algorithm can be updated with enough data, it can be sort of predicting based on bad information. You also point out a good example where sometimes the human will know things that the algorithm doesn't know that are really critical. A third one that I'll just point at is that there can also be a weird feedback loop situation where, if you're using an algorithm to make judgments for a long time, that can actually start biasing your training data in favor of the algorithm. So if the algorithm tells you to arrest certain people, and then you go arrest them, and then you feed that data back into the algorithm to tell you who to arrest, you can get weird kinds of feedback loops and cycles.
DANIEL: I agree with all of this. Clearly, algorithms are susceptible to errors. But the errors are detectable in principle. You can look at the way the algorithm was set up and diagnose the probable errors.
SPENCER: Right, at least you can inspect what's happening. Now, I mean, it's fascinating that a lot of these really simple algorithms, whether it's simple linear regression or something even simpler, like very simple rules that just use a few factors, often outperform human judgment. And then you start thinking about these really complex neural net models that might have a million parameters or more, where we literally don't even understand what they're doing. Unlike the simple algorithms, which often beat humans but can't really pick up on complex patterns, these giant neural nets, if trained on enough training data, can actually start picking up on really complex patterns of the sort that a human judge might be able to find.
DANIEL: Yeah. I think the advantage of simple rules, like the advantage of the model of the judge over the judge, is that the judge is noisy and the model is not; the judge is noisy, and the rule is not. And there's so much noise in human judgment that that's the main reason for the superiority of rules. In fact, what is interesting is that rules are superior to human judgment in many situations, but never by very much. That is, human judgment is not bad. It's mainly noise, more than bias, in many cases.
SPENCER: There may be certain dimensions on which algorithms could be much better. Speed is an obvious one, if you need to have a huge number of judgments made, or made within, say, 100 milliseconds, like with algorithmic trading. And then also, it seems like there are some cases where, in order to detect the pattern, you would just need way more examples than a human could ever see in their lifetime. In those situations, maybe you can imagine algorithms having a high level of superiority, even though, as you point out, in most cases the algorithms don't have that much advantage; if they're better, it's just by a little bit.
DANIEL: And what's quite important is that there is a limit to how predictable the environment is. So the algorithm can do no better than reaching the limits of predictability. And the limits of predictability are very often quite low. So if you're predicting the performance of individuals on the job, in a complex job, so much depends on factors that don't exist at the time of making the hiring decision (who the person will work with, whether they will start on the right foot or on the left foot, whatever) that actually most of the variance is completely unpredictable. A complicated algorithm with lots of data can reach the limits, but it can never exceed the limits of predictability. That's a central [inaudible].
SPENCER: Yeah, good point. And it's gonna depend on what the input features are, right? So for any given set of input features, there's a limit of predictability, but if somehow you could get some better set of features, then the accuracy might go up.
DANIEL: The point about Machine Learning when applied to vast amounts of data, is that you can trust them if there is a factor that was detected, so but if they may need information that could be surprised by conditional features, and certainly the performance of the crew.
SPENCER: Can you tell us about the Fragile Families study?
DANIEL: A few years ago, there was an interesting publication that came out in the Proceedings of the National Academy of Sciences. It was a joint paper by a large number of authors, data scientists and sociologists. The question that the study was intended to answer is: how predictable are events in the life of families from the kind of variables that sociologists usually collect? There is a large body of data called the Fragile Families study, in which the subjects are typically children of broken homes. For each child, there are 1000s of items of information: about the family, about the grandparents, about health, about development. And then we are trying to predict a particular event in that child's life in a particular year; for example, will the family be evicted during that year, given all the variables that sociologists typically collect? And it turns out you cannot really predict very well. In terms of accuracy of prediction, chance would be 50%. You can get, say, 55%, 56%, 57%; you cannot do more. And how did they learn that? They put a very large number of computer scientists and researchers to work trying to find a pattern in these data and use it for prediction. And the best predictor achieves that level of accuracy, from 50% to 57%. And that, evidently, is the limit of predictability. You can count on the fact that if you have something like 160 experts trying to generate a prediction rule, the winner among these prediction rules is at the limit of predictability. So predictability of life events is low. And that means that neither humans nor algorithms can predict them with accuracy.
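The idea of a ceiling on predictability can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method or the data of the study discussed above: the population is hypothetical, with half the cases having a 43% chance of the event and half a 57% chance, and the predictor is assumed to know those probabilities exactly. Even that ideal predictor cannot beat 57% accuracy, because the remaining variation is chance.

```python
import random

def bayes_optimal_accuracy(probs):
    """Best achievable accuracy when the true event probability of each case is known."""
    return sum(max(p, 1 - p) for p in probs) / len(probs)

def simulated_accuracy(probs, seed=0):
    """Accuracy of the ideal rule (predict the likelier outcome) on simulated outcomes."""
    rng = random.Random(seed)
    hits = 0
    for p in probs:
        outcome = rng.random() < p   # the event happens with probability p
        prediction = p >= 0.5        # the ideal rule picks the likelier outcome
        hits += (prediction == outcome)
    return hits / len(probs)

# Hypothetical population (an assumption for illustration only).
probs = [0.43, 0.57] * 50_000
ceiling = bayes_optimal_accuracy(probs)
observed = simulated_accuracy(probs)
print(f"ceiling={ceiling:.3f} observed={observed:.3f}")
```

More data or a more complex model can bring a predictor closer to the ceiling, but in this sketch only better input features (ones that move the true probabilities away from 50%) would raise the ceiling itself.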
SPENCER: You might imagine this could partly be due to kind of chaos theory effects. Suppose that one day, one of those children happened to fall into a bad crowd of friends and became a drug dealer, versus they just never happened to meet those friends, because of coincidence. That could have a huge effect on the rest of their life. And yet, how would you ever predict that, right? There are so many of these kinds of chance events. Or think about how someone meets their life partner, where in many cases they could have easily just never met that person due to a chance event, and yet it dictates so much of their future. I do wonder whether someday we could find better features. You said they use a huge number of different pieces of information, hundreds or even maybe 1000s of different pieces of information about each child. I do wonder whether it's possible that there could be other information about children that could increase that number, but it does seem like, at least with that information, we're kind of at the limit of predictability, and it's not as good as you might want it to be.
DANIEL: And it's even difficult to think of how much you could improve it if you had more information. The information that you lack is real-time information about what happens. Like, who is that child going out with today? You know, a good crowd or a bad crowd. And that is something that you don't know and that is never going to be known. That's the limit to predictability.
SPENCER: So we talked about a lot of the different challenges that come up in making judgments. Let's talk about some of the solutions. Can you tell us a bit about what good decision hygiene looks like?
DANIEL: Well, we coined the term decision hygiene with a particular image in mind. Hygiene is different from vaccination or from medication. Vaccination and medication, as we know, are specific to a particular disease. Hygiene is not specific. When you wash your hands, you have no idea what germs you're killing, and if you're good at it, you will never know. That is the kind of thing that we were looking for: a noise-reduction approach. Decision hygiene asks: what are the steps that you would take that are equivalent to washing your hands, that is, that avoid errors without your knowing what the errors are that you're avoiding? And we have a set of procedures.
SPENCER: Can you walk us through some of those procedures?
DANIEL: Yeah, I think the most important recommendation is something that we call the Mediating Assessments Protocol. It's an ugly name. But the idea is quite simple. And it's based on the kind of decision that people make in hiring when they interview someone. When you interview someone for hiring, there are two ways of doing it, crudely speaking. One is to talk to the person, trying to form an impression of who they are. That's the way job interviews are normally conducted. And the other one, which is called a structured approach, involves defining the attributes that are important to performance, evaluating those attributes one at a time and, to the extent possible, independently of each other, and delaying the global judgment until you have made separate judgments on all the attributes. And we think independence is crucial, so that you actually collect valuable information. In unstructured interviews, the conclusion is, in general, that the interviewer forms an impression within the first two or three minutes of the interview and spends most of the rest of the time confirming that impression. That's not useful. What you want is to get as much information as possible and delay intuition. So that's the idea: break up the problem, evaluate aspects of the problem, and delay the global evaluation until you have all the information. Try not to think about whether you want to hire that person. Try to focus on reliability, punctuality, originality, whatever the attributes are that are important. So that applies to hiring candidates. And I think what could make this valuable is that, if you think of decision-making, you think of somebody choosing between options. And there is a very natural way of thinking that an option is like a candidate. That is, an option can be characterized on various attributes that define how desirable it is.
And you should do exactly the same thing as in the structured interview: you should evaluate those attributes separately and independently, and delay your global judgment, delay your intuition, until you're done. So I think that is a principle. We have a number of decision hygiene procedures. But I think the most interesting and obvious one is simply taking many judges. When you take many judges and average them, as was pointed out earlier, you get something that's sometimes called the wisdom of the crowd. But it's the wisdom of the crowd only where there is no bias. When there is no bias and you average enough judgments, you'll get an exactly correct judgment. But when there is a bias and you average, the bias is going to stay exactly what it was, but you will eliminate noise. And if you remember what we call the Gauss equation, in which the total error is bias squared plus noise squared, it follows from that that reducing noise, even when you don't change bias at all, improves accuracy. This is not intuitive. And one of the important reasons we actually decided to write the book is this idea that just reducing noise, without finding out anything about bias, is effectively guaranteed to improve accuracy.
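The error equation mentioned above (total error equals bias squared plus noise squared) can be checked with a short simulation. This is a minimal sketch, assuming hypothetical values for the true quantity, the shared bias, and the spread of individual judgments; none of these numbers come from the conversation. It compares single judgments with the average of a panel of nine independent judges.

```python
import random
import statistics

TRUTH = 100.0    # hypothetical true value being judged
BIAS = 5.0       # hypothetical bias shared by every judge
NOISE_SD = 10.0  # hypothetical spread of individual judgments

def decompose(judgments, truth):
    """Return (bias, noise, mse) for a list of judgments of one true value."""
    errors = [j - truth for j in judgments]
    bias = statistics.fmean(errors)             # average error
    noise = statistics.pstdev(errors)           # spread of the errors
    mse = statistics.fmean([e * e for e in errors])
    return bias, noise, mse

rng = random.Random(0)

def one_judgment():
    """One judge's estimate: truth plus shared bias plus independent noise."""
    return TRUTH + BIAS + rng.gauss(0.0, NOISE_SD)

# 20,000 single judgments versus 20,000 averages of nine-judge panels.
singles = [one_judgment() for _ in range(20_000)]
panels = [statistics.fmean([one_judgment() for _ in range(9)])
          for _ in range(20_000)]

bias_1, noise_1, mse_1 = decompose(singles, TRUTH)
bias_9, noise_9, mse_9 = decompose(panels, TRUTH)
print(f"single judge: bias={bias_1:.2f} noise={noise_1:.2f} mse={mse_1:.2f}")
print(f"panel of 9:   bias={bias_9:.2f} noise={noise_9:.2f} mse={mse_9:.2f}")
```

In this sketch, averaging leaves the bias where it was and shrinks the noise by roughly the square root of the panel size, so the total error falls even though nothing was learned about the bias.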
SPENCER: And it's also really interesting to apply this idea of independence in group judgments, because it seems like so often at companies or nonprofits, when a bunch of people are sitting around the table, or these days in a Zoom meeting, discussing what to do or what they think can happen, their judgments end up contaminating each other. So instead of getting this noise reduction effect, where you kind of average their predictions and get something that's better than most of the predictors, you can actually get an effect where it gets potentially even worse than just looking at the one or two best predictors among them.
DANIEL: Yeah, you know, it's an accident. And that's what makes it noisy. It's an accident who speaks first, it's an accident who speaks most loudly, it's an accident who is most confident.
SPENCER: Some of it is someone's personality.
DANIEL: But all of these have an effect on the group. That is, the second person to speak is strongly influenced by the first. So you generate the semblance of agreement. So even if, to begin with, people have different opinions, the noise among them is never revealed, because people are trying to come a little closer to the others, so as to reduce conflict. And we're not even aware of that. I mean, you can think of what happens when a bunch of people come out of a film, and you loved the film. But the first person to speak says, "Oh, that was terrible." You are not going to be as unequivocal about saying you loved the film as you would have been otherwise. That's the process.
SPENCER: Yeah, they can even create a kind of false consensus effect, where, let's say, the first person has a really strong positive reaction, and then the second person has a positive reaction. Everyone can be left with the impression that everyone really liked it, just because the other people didn't speak up.
DANIEL: And furthermore, there is a further phenomenon that one of my collaborators, Cass Sunstein, studied in particular, which is polarization. That is, when you see other people agreeing with you, when people agree on a judgment, they typically make the judgment more extreme than it was initially. That phenomenon of polarization is quite powerful. It, of course, enhances noise.
SPENCER: Could you elaborate on that a little bit? So imagine you have a group where most people are in favor of something, so the average is, like, a little bit in favor. By what mechanism does it end up leading to a group decision that's even more in favor than the average?
DANIEL: Well, I'll give you the example where, you know, we first observed that phenomenon. It was in research that Cass Sunstein and I did together, probably more than 50 years ago. You have juries deciding the punishment for a company that was accused of negligence. What you find is that when the punishment is fairly severe, that is, when the negligence is fairly severe, juries tend to be more extreme in punishing than their initial average. So you collect judgments from the jurors, assemble them, then they discuss, and then you see what they agreed on, and they agreed on something that is more extreme than their average. And the reason is that, in general, when people are leaning in one direction, agreeing with the general sentiment but taking a more extreme position is sort of rewarded. We see the same thing in polarization, of course, on social media; political polarization in the media is basically a similar mechanism.
SPENCER: So why is it that people tend to focus more on bias than noise?
DANIEL: Well, the focus on bias comes down to a property of the human mind. The human mind is much better at dealing with causes than at dealing with statistics. That is, our normal mode of operation is building a causal model of the world, which enables us to understand the story. And typically, stories are about the individual case. So that's the way that our thinking normally flows: building stories about individual cases, stories that make sense of every aspect of the case. This is causal thinking. And it's about one chain. And bias can be perceived as a cause of the eventual judgment, as pushing the eventual judgment one way or another. Noise is really different, in that you never see noise in a single case. Noise is a statistical property; it takes at least two cases to detect variability. And you have to be able to think statistically about the collection of cases, and that is something that people are not intuitively inclined to do. And so the focus on bias is because biases are causal, and the neglect of noise is because noise is statistical. And that's the way we're built: we prefer causes to statistics.
SPENCER: It seems related to this idea that one of the most common things humans do to entertain themselves is they hear stories of different sorts, whether it's on television or on podcasts or in movies or in books. It seems like there's something fundamental about stories that appeals to the way our minds work.
DANIEL: Absolutely. You know, the way we understand the world is like unfolding stories. And the characteristic of a story is that we feel we understand what's happening, although we could not have predicted it in advance. That is, you don't know what I'm going to say next. But unless I say something completely crazy, you won't be surprised. And so there is a very interesting gap between expectation and surprise: you don't really expect what is happening, but you're not surprised by it. It appears normal, and a normal event is understood after the fact. It makes sense after the fact. But it could not have been predicted in advance. And it's this gap between prediction and explanation that causes a lot of mischief, because it gives people confidence in their ability to predict, because they find it so easy to explain what happened. But in fact, people are much better at explaining the past than predicting the future.
SPENCER: So this relates to the hindsight bias, but also to cognitive miserliness, where, let's suppose we made a bad prediction, and we find it out afterward. Maybe we hired someone, and they turned out to be a bad employee. So we look back at that decision and say, "Ah, I see, I missed that sign. If I'd only spotted that sign, I would have known it." And then suddenly, we have a kind of satisfying explanation for what happened there. That makes us feel like it was predictable, even though, in fact, we mispredicted it. And we don't bother to think about five other possible explanations, which could have been very different, for why this went wrong. Maybe the person is dealing with a difficult situation in their life, or maybe the person has a drug problem that we don't know about, or what have you.
DANIEL: The sense that you have that, you know, for that person, you could have predicted the event in advance: that's an illusion. And altogether, what this kind of mechanism does is explain why we live our lives with exaggerated confidence in our understanding of the world, and exaggerated confidence in its predictability and in the ease with which it can be understood. In fact, the world is unpredictable. And our sense that we can predict it is interesting cognitively.
SPENCER: Every event in our lives is distinct from every other; no matter what's happening to you, it's not exactly, in every way, like any past event. So how do we think about a decision being noisy? How do we think about noise in the context of everything being technically unique and different from everything else?
DANIEL: Well, there are two ways of dealing with this issue. Clearly, when a judge makes a single decision, that decision is never repeated. It's not like the X-ray, which can be read several times. But there is noise in the individual decision, although you cannot detect it and you cannot measure it. It may help to think of your mood today, and in what ways your mood is different from average; if you think of how differently you could have done it, you can make that judgment. But it also helps to think of how somebody like you could have made the judgment. And this is very non-intuitive. But if you think of people like you, who know what you know and have your values, and you think of the range of judgments that they could have made, the average of those judgments, if you could find it, is probably more accurate than yours. So that's one way of thinking about it: what would the average [inaudible]? It's very difficult for people to do that. There is something else that's much more constructive and helpful. One of my co-authors had a phrase that I very much like, and that phrase is: a unique event is a repeated event that happens only once. That is, the rules that apply to repeated events are applicable to unique events, and procedures that reduce error in repeated judgments will also reduce error in singular ones.
SPENCER: It seems to be clearer to think about this if you think about noise as being a process that affects something. So imagine you have a judge making a judgment, and you model it as: okay, they make a perfect judgment, plus there are these irrelevant factors that then influence that judgment, let's say the temperature in the room, and how tired they are, and, you know, how severe they tend to be with this kind of punishment. That's kind of an effect that alters that judgment. And then, even though they may only make that judgment one time, you can still ask questions about what the strength of that kind of noise effect is.
DANIEL: But it is easier to imagine when it's noise within the person. That is, when you made that judgment and you were in a very good mood or in a very bad mood, you may be able to correct, to some extent, for that effect. What is much harder is to think that, given the way you see the world and see the case, somebody else looking at exactly the same case, no less smart than you are, could see it very differently. And that the average judgment of many people like you about that problem doesn't have noise in it and is, therefore, more accurate, to that extent, than the average individual's. So the average of judgments is more accurate than the judgment of the average person.
SPENCER: That reminds me of one of my favorite cool ideas from your book, which is this idea that if you actually average your own judgments across time, you can improve your judgments, the same way that you can reduce noise by averaging multiple people's judgments. And I think you give an actually interesting strategy for doing that, where you could make a judgment and then do a premortem, where you say to yourself, "Okay, suppose this judgment turns out to be bad. What do I think the most likely explanation for it being bad is? And then how would I adjust it based on that?" And then you can average the original with that new judgment and potentially get some of these benefits of noise reduction.
DANIEL: That's absolutely true. Of course, it's not as good as asking two different people, because the range of variability within an individual is more restricted than the differences between people. But it can be, I think, about 1/3 as good as asking somebody else. So there is a clear advantage in using what's been called the crowd within. That is, the fact that your judgments are not unique; they could be different on different occasions. So if you elicit more than one and you average them, you're going to improve things statistically.
SPENCER: Before we finish up, would you be up for doing a kind of lightning round where I ask you a bunch of shorter questions to get your quick tips on them?
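One way to see why a second judgment from yourself helps less than a second judgment from someone else is a simple additive model, sketched below in Python. The model and its numbers are assumptions for illustration only: each judgment's error is taken to be a stable personal component (shared by all of one person's judgments) plus independent occasion noise, and the two variances are chosen purely so that the ratio comes out at one third.

```python
def variance_of_average(person_var, occasion_var, same_person):
    """Error variance of the mean of two judgments under the additive model."""
    if same_person:
        # The stable personal component is shared by both judgments,
        # so only the occasion noise averages out.
        return person_var + occasion_var / 2
    # Two different people: both components are independent and average out.
    return (person_var + occasion_var) / 2

# Hypothetical variances, picked only to reproduce a one-third ratio.
person_var, occasion_var = 2.0, 1.0
single = person_var + occasion_var  # error variance of one judgment

gain_self = single - variance_of_average(person_var, occasion_var, True)
gain_other = single - variance_of_average(person_var, occasion_var, False)
ratio = gain_self / gain_other
print(f"gain from asking yourself twice: {gain_self:.2f}")
print(f"gain from asking someone else:   {gain_other:.2f}")
print(f"ratio: {ratio:.2f}")
```

Under this model the ratio equals the share of total variance that comes from occasion noise, so the more a person's judgments vary from one occasion to the next, the more the crowd within can help.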
DANIEL: Oh, yes.
SPENCER: Great. Okay, so actually, this relates to that idea of this strategy for the inner crowd, but broadens it: how do you feel about debiasing and denoising at the individual level, where a person uses strategies to try to become better at making judgments without kind of bringing in other people?
DANIEL: Well, I've been generally skeptical of the ability of people to greatly improve their judgment. That is, merely knowing about biases and noise doesn't make you immune to them. I know that because of my personal experience. I've been studying human judgment and human error for more than half a century, and I really don't think that it has improved my judgment that much. So individuals, I think, are very different from organizations: there is much more hope for organizations improving their judgment and decision procedures than there is for individuals. But individuals could do so as well, we hope.
SPENCER: Now, when I think about individuals reducing their bias or reducing their noise, I kind of think about it as a series of steps that have to happen. First, you have to be in a situation where you notice there's some kind of pattern that suggests this might be a case where you might be noisy or you might be biased. Second, once you've noticed that, you need to have a motivation to do it differently, to use a different strategy. Third, you need to actually have an idea of what strategy you can use that might be effective. And then fourth, you have to actually be able to execute that strategy properly. So an example of this might be the planning fallacy, which I know that you've worked on; you may have coined that term. It's basically the idea that people tend to underestimate how long a project will take or how much it will cost. So if you notice you're in a situation where you're having to estimate, you know, how long a project will take, you could say, "Ah, this is somewhere where the planning fallacy might apply." And then if you feel motivated to make a more accurate estimate, you could pull out a strategy like reference class forecasting, which I believe you developed, where you consider previous instances where you had to do similar projects. And then if you can correctly apply reference class forecasting, you can improve your prediction in that situation. So that's kind of how I think about how debiasing would work. And I was curious to hear your comments on that.
DANIEL: Well, you know, the term decision hygiene was really created for that situation. And looking at a case as an instance of a broader class, which is reference class forecasting, tends to reduce noise and actually to reduce bias as well. And all the steps of decision hygiene, if you adopt them deliberately, are likely to make your judgments more accurate. But that requires a lot of self-discipline. Altogether, decision hygiene is a form of disciplined thinking. And you can do that, even for yourself, when you're making a decision that you realize is important. And people do that to some extent. That is, they know that when they're choosing a job, or they're choosing whether or not to move to another city, they must think carefully. But I think knowing about decision hygiene helps: there are techniques that can make you think carefully more effectively. And they're not guaranteed to provide you with success in an unpredictable world. But they will improve your odds to some extent.
SPENCER: My suspicion about why a lot of debiasing has failed (you know, just knowing about biases doesn't actually make people that much less biased, if at all) is that it doesn't install the pattern to think of it at the right moment. So you really want to ideally train on lots and lots of examples in kind of real-life situations where that bias might occur. So that's the first thing. And then second, really convincing people, getting them motivated to actually bother to put in the effort to try to improve in that way, which people aren't always motivated to do. And then third, really teaching a strategy that they can use at that moment that actually does a better job, like reference class forecasting for the planning fallacy. And so I'm not sure if you're familiar with our work at Clearer Thinking, our website, but what we're trying to do is actually teach people, through these interactive modules, to learn to notice these kinds of patterns and then have correction strategies. That's one of our efforts. And obviously, I feel like we're really at the beginning of that kind of work, of like, how do you actually help people really be better judges without putting them into kind of a situation where you can use the external environment to help force it on them?
DANIEL: What you're doing in your operation is you're helping people discipline their own thinking by adopting some rules. You know, we would call them decision hygiene, but the term doesn't matter. That's a set of rules and principles that, naturally, you cannot apply to every decision; that would be too cumbersome. But you can apply it to an important decision, especially when you suspect that you might be making a mistake.
SPENCER: Absolutely. Okay. So next question. What are your feelings now about the replication crisis?
DANIEL: Well, it turns out that many findings, the sexy findings of psychological research, the kinds of findings that people remembered because they were surprising, turned out to be surprising but not true, in the sense that further research failed to replicate them. And that happened on a pretty massive scale. Over the last decade, there's been a dramatic change, I think, within psychology, a tightening of standards. Procedures have been adopted very generally, like using larger samples, planning the statistical analyses in advance, and reporting every detail of your procedure. All these procedures that have been adopted simply improve the quality of research and make it more scientific and more replicable.
SPENCER: So you're pretty optimistic that we're gonna see, in the next decade, social science being much more replicable than it was in the past?
DANIEL: Oh, yes, I think so. I mean, we still have a problem with fraud. But I think the problem of fraud is limited to a few bad apples. Most people want to be good scientists, and there are now procedures that are accepted that will help people be better than they were. This is really happening.
SPENCER: I totally agree with you. There are a lot of really good scientists, people preregistering studies and releasing their data and code, but maybe I'm a bit more cynical than you. And I'm just curious to hear your reaction to the following. I guess the way that I think about it is that discovering important truths about humans is just really, really difficult, super, super difficult. And it's much, much easier to publish a nice-sounding paper in a top journal than it is to actually truly discover something worthwhile about human nature. And while these improved practices help, there's still just a vast advantage to doing the fake thing over the real thing. And even if you shore up a bunch of practices, it will tend to just move the fake thing to different areas of fakery. And I don't mean fake here as in fraud; I mean more like using kind of not-great practices that allow you to get nice-looking findings that just aren't actually going to be anything real.
DANIEL: I mean, the real culprit, I think, in the replication crisis was not fraud. It was self-deception, that is, people trying to do the honest thing and engaging in practices that make it more likely that they will confirm their own hypotheses. Many of us have had that experience; I've had it myself. That is, I've discovered that with some results that I had, I had unconsciously, and certainly without any intent, sort of played with the data to make the data prettier. Now, that was acceptable 10 or 15 years ago; it's not acceptable today. And today, you're not allowed to run four studies in your laboratory and publish the most successful one, which was sort of standard practice 10 or 15 years ago. So that's what I mean by procedures. These procedures will not eliminate fraud, but they will eliminate or reduce the odds of self-deception, because people will be much more disciplined. And fraud will not be eliminated, but it will become riskier and more difficult. You're making the life of frauds harder.
SPENCER: Richard Feynman, the great physicist, had a quote that I really love, which is something like: the first rule of science is not to fool yourself, and you're the easiest person of all to fool. And then he goes on to say that once you've done that, once you've avoided fooling yourself, it's relatively easy not to fool other people.
DANIEL: Yeah, that's exactly what the replication crisis was. It was a crisis of self-deception.
SPENCER: So next question: how did you approach generating ideas in your work? Because you've had just a shocking number of really influential and amazing ideas in psychology. I'm just wondering, what did that look like? You know, were you just wandering around thinking, and a great idea would pop into your head? Or did you have certain processes you used?
DANIEL: I don't think that deliberately searching for good ideas is a very productive search. Good ideas occur to you. And then what's important is to recognize that you have a good idea, and to recognize that there may be a glimmer of something new. You typically don't understand it yourself. You don't know what you're thinking. And in some cases, it takes a long time. I was very fortunate: my most important work was with my collaborator. And what happened is, I would have a vague idea, and he would understand it better than I did. And that made our process sort of magical. It's that very quick recognition of the value of an idea, because you have ideas over time, and you have to edit and select them. And that's where there's luck and there's skill.
SPENCER: These glimmers of ideas, do you think a lot of them came about just from observing the world and kind of noticing patterns? Or was it more mysterious than that?
DANIEL: So, observing people, observing the world, observing myself. Most of the ideas that I've had that turned out to be influential, most of which they have the character, something that after the fact about, it's a matter of seeing. And that's what I've been fortunate in doing a few times: noticing something that is actually a common-sense idea, but that the discipline has ignored, that your discipline has left, actually, sort of as treasure on the ground that you can pick up because it's common sense. And much has had that character theory or the common knowledge of the discipline, the covenant agreement of discipline have gone off concepts, and just [inaudible].
SPENCER: So then, basically, you're taking that idea which maybe stems from common sense, but then you're kind of honing it, making it more precise and rigorous, and then, eventually, experimentally testing and building a theory around it.
DANIEL: And what makes the idea useful is that the existing theory doesn't accept it or doesn't generate it. That's what makes it interesting scientifically. So over many years, [inaudible] and I had the conversation many times. And I think one of the more important concepts he came up with, for the concept of losses of the weight of a book, is okay. And we were joking among ourselves that this is really something that matters. But what our grandmothers couldn't have done, and what we were able to do, was to generate interesting, testable consequences that would change the view of our colleagues. That's a skill we had that our grandmothers didn't. The ideas themselves were there.
SPENCER: I think that in your career, you've seen a lot of pushback where people want to argue that humans are rational. And I see this even today. You know, many times when people write an article about some form of irrationality, people will give this hard pushback, or there'll be academic papers arguing that actually, if you interpret it just the right way, that could be rational. I'm just wondering, from a more meta-perspective, why do you think that people want to push so hard against human irrationality?
DANIEL: Well, people are proud of what they are. And the presumption is that people know what they're doing. What I think has happened is that a lot of our work has been misunderstood as an indictment of human cognition. One of my least favorite words is the word irrationality. I've never used it. And rationality has a technical definition, the logic of decision making or the logic of probability, which people's intuitions simply cannot conform to. It's not feasible. Rationality, as defined in this way, is not feasible. So it's not even interesting to say that people are not rational. Of course they're not fully rational as decision theorists would define rationality. Nobody can be. The question is to characterize how people think in a constructive way, and how that generates both correct judgments and error. Most of our judgments are correct, but the errors are not random. The errors are predictable and systematic, and that's what we study.
SPENCER: So is it a mischaracterization when people claim that your work shows that humans are irrational?
DANIEL: Well, absolutely. I would say that our work has shown that people are not fully rational, as the concept is defined by economists and decision theory. That's very different from saying that people are irrational. I think people are mostly reasonable. We should keep irrationality out.
SPENCER: I see, because you're using rationality in that much more technical sense, as a decision theorist would.
DANIEL: That's right.
SPENCER: So the final question for you: what do you think of Bayesianism as sort of a framework for thinking about how to make correct judgments? Yeah, so for those who are not familiar, Bayesianism tells us that there's a sort of single question you can ask to evaluate the strength of evidence. You can't always answer the question, but at least there's one question to ask, which is something along the lines of: how much more likely am I to see this evidence if my hypothesis is true, compared to if it's not true? And that ratio kind of tells us the strength of the evidence. And so it kind of gives us a frame for thinking about evidence.
DANIEL: That's a very good definition. It's sort of unavoidably correct that there are ways of extracting information from the world as evidence, and there are ways of changing your mind. And what Bayes does is basically to set the rules for how you should change your mind in light of the evidence. And those rules make perfect sense. Now, there are technical difficulties in applying them. The technical difficulties are in defining what your state of belief is when you have no evidence, and in complexity. But in principle, that this is normatively correct, I think, is very difficult to disagree with, at least.
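[Editor's note: The question Spencer describes (how much more likely is the evidence if the hypothesis is true than if it is false?) is usually called the likelihood ratio, and Bayes' rule in odds form says that posterior odds equal prior odds times that ratio. The short Python sketch below illustrates the arithmetic only. The function names and the probabilities 0.8, 0.2, and 0.5 are hypothetical values chosen for illustration; they do not come from the conversation.]

```python
def likelihood_ratio(p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """How much more likely the evidence is if the hypothesis is true than if it is false."""
    return p_evidence_if_true / p_evidence_if_false


def update(prior: float, ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * ratio
    # Convert odds back into a probability.
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: the evidence shows up 80% of the time when the
# hypothesis is true and 20% of the time when it is false.
ratio = likelihood_ratio(0.8, 0.2)

# Starting from a 50% prior belief (an assumed starting point).
posterior = update(0.5, ratio)

print(f"likelihood ratio: {ratio:.2f}")
print(f"posterior probability: {posterior:.2f}")
```

[With these assumed inputs, evidence that is four times more likely under the hypothesis moves a 50% belief to roughly 80%. The difficulty Daniel mentions, choosing a state of belief before any evidence, corresponds to the `prior` argument, which the formula takes as given.]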
SPENCER: Danny, thank you so much for coming on. This was really fun.
DANIEL: Thank you.