CLEARER THINKING

with Spencer Greenberg
the podcast about ideas that matter

Episode 187: We can't mitigate AI risks we've never imagined (with Darren McKee)

December 7, 2023

How can we find and expand the limitations of our imaginations, especially with respect to possible futures for humanity? What sorts of existential threats have we not yet even imagined? Why is there a failure of imagination among the general populace about AI safety? How can we make better decisions under uncertainty and avoid decision paralysis? What kinds of tribes have been forming lately within AI fields? What are the differences between alignment and control in AI safety? What do people most commonly misunderstand about AI safety? Why can't we just turn a rogue AI off? What threats from AI are unique in human history? What can the average person do to help mitigate AI risks? What are the best ways to communicate AI risks to the general populace?

Darren McKee (MSc, MPA) is the author of the just-released Uncontrollable: The Threat of Artificial Superintelligence and the Race to Save the World. He is a speaker and sits on the Board of Advisors for AIGS Canada, the leading safety and governance network in the country. McKee also hosts the international award-winning podcast, The Reality Check, a top 0.5% podcast on Listen Notes with over 4.5 million downloads. Learn more about him on his website, darrenmckee.info, or follow him on X / Twitter at @dbcmckee.

SPENCER: Darren, welcome.

DARREN: Hi. Great to be here.

SPENCER: People may not think of imagination or the limits of imagination as an important topic. But why do you see that as very important?

DARREN: I think it's taking a really broad step back. I'm using imagination very generally here, so related to abstraction, or how we think about concepts that aren't quite in front of us. And in that sense, imagination allows us to think about what might happen in a wide range of situations. And as such, if you're thinking about what you might do tomorrow, or what events might happen in the world, imagination is really fundamental to thinking about what could be different from what is.

SPENCER: Okay. And so why is that so critical? Why is missing potential futures a big deal?

DARREN: Well, in our current age of a lot of AI news, I was seeing it through the lens of what we might imagine AI being capable of, or what might be happening in that space. And I think when you have a lot of discussions with people, either imagination, failures of imagination, or even perhaps overactive imagination, these things are underlying why people might feel a certain way. I've realized, if you have discussions, people often have some cruxes which are buried deep. And I think related to those, or even below those, is the nature of imagination.

SPENCER: This makes me think about situations where maybe you have a village that's near a lake, and the lake has never overflowed in the memory of the people. And so people don't even have it in their mental model that, one day, the lake could overflow, because they've never seen it. And then finally, when it does overflow, suddenly people are saying, "Oh, man. How could we never have thought to protect against this?"

DARREN: Right. That's a great example. So it's especially useful for things that have never happened before. But it's also relevant for things that have happened before. Depending on how old you are, many people have lived through a number of famous shocks, whether it's the financial collapse of 2008, 9/11, or, recently, the COVID pandemic, these sorts of things. So when people are asked, "Could this happen again?" and you prompt them, they'll probably say, "Yes, of course." But I think in our default setting, getting through the day, focusing on our normal goals, we don't quite think about it. So when prompted to consider the possibility of AI doing this, or maybe even another global pandemic, the availability bias in our brain kicks in and we reach for something immediately. And because something feels plausible, or not plausible, that, I think, anchors our general belief, as opposed to thinking, "Let me think through this in more detail, and ask why this might be."

SPENCER: I was talking to someone yesterday, actually, who works on pandemic prevention. And he was saying that, although there was an initial surge of energy around the pandemic, and obviously people really worked hard to combat it, even now — not that much later — the interest in it has greatly waned, and the amount of funding you might expect to be there has greatly diminished. You might think, "Well, we lived through one pandemic. Shouldn't we be preparing for the next one already? Shouldn't lots of donors be putting tons of money into this?" But he was saying that wasn't the case. So I was kind of surprised by that.

DARREN: Yes, it is both surprising and saddening. And that's a great example, because it's not a 'what if'; it just happened. And yet there's something about our cognitive architecture, and/or its interaction with different incentives in society, that is precluding people from making what seems like the obvious move. Before the pandemic, yes, there were experts and epidemiologists saying, "Look. It's just a matter of when, not a matter of if." And then it happened. And so you'd hope everyone kind of updates towards thinking, "Okay, whatever I thought the likelihood of this thing happening (a pandemic) was, it's now higher, because it just happened." Again, you could argue that because it just happened, it's now less likely, but there's no evidence for that, because we haven't done much more to protect ourselves. So I think that's a great example. I also think it's very useful as an example of large, dramatic global events that can happen in a relatively short period of time and that almost no one sees coming. So in my book about AI, "Uncontrollable: The Threat of Artificial Superintelligence and the Race to Save the World," I say, "Imagine it's December 2019. You're with your family over the holidays. You're trying to relax. You try to pick up some reading that you've set aside because you've been so busy. And you pick up The Economist: The World in 2020. This is the magazine The Economist puts out every year — usually later in the year — and it talks about the year ahead. Because we're in 2019, this was the 2020 version. These are detailed, analytical magazines. They're supposed to describe all the major events, political, technological, economic, that are going to happen in 2020. And not once is COVID mentioned. No COVID-19, no pandemic, nowhere." And this is by a professional, dedicated team of people who are supposed to understand these things. And they missed it three months out. And of course, some people in January were still denying it, or February, or March, and some even after the fact. But generally speaking, you have a situation where a very important global event can happen within a very short period of time and catch almost everyone off guard. And I think that is a general lesson worth taking on board.

SPENCER: So when people think about AI scenarios in particular, where do you see failures of imagination? Is it that they can't actually concretely visualize what bad things might occur?

DARREN: I think that's definitely part of it. I think with AI, it's a tricky thing. What is AI? Certainly, it's something computer-based. It's software in some ways and not in others. It's kind of this vague, amorphous blob that is many things and not one thing in particular. So it's hard to get a handle on it: it's not a physical phone or a pandemic — which is a virus, which is a little bit more tangible, even though it's microscopic in many cases — it's not a fire, not an asteroid. It's sort of, "What's going to happen here?" And I think with AI, it's the rate of change that's pretty shocking. If you look at the slopes and the curves of various graphs, you get some sense of just how fast things are going. But that does not intersect well with how the human brain normally processes and understands information. So it's really hard to get that rate-of-change angle. Even if it's explained to you, "Look, it's happening faster than you can fully understand," you're like, "Uh huh." But because of your sort of visceral sense, your natural default says that abstraction, or imagination of what could happen, doesn't serve you that well. There's sort of this temptation to go, again not consciously, from "I can't imagine it" to "therefore, it isn't so or can't be so." And this is really the argument from personal incredulity: because you can't imagine it, it can't be. Of course, again, when this is pointed out, I think most people understand that it does not make any sense. But normally, as we go through the day, you don't necessarily check yourself, and so you end up with these beliefs that aren't necessarily tracking reality.

SPENCER: I think there's a double-edged sword here. Because if you paint really specific, vivid AI calamity scenarios, they're very likely to be wrong. You're kind of picking out one possible example from this huge space of possible bad things that could happen. And so then, people can fairly criticize that one example and say, "Well, it's probably not really going to happen like that. And why did you choose that example and not another example of what's going to happen?" On the other hand, if you don't paint a vivid picture, then you're dealing with abstractions, and people can have a hard time really imagining what you're talking about.

DARREN: I think that's an excellent point. And it's something I touched upon briefly in the book, because it is a delicate balance. I'm very sympathetic to anyone, say, outside the AI safety space, who says, "I don't know what's going on. I don't know why you guys are saying all these things. Give me something more concrete." And you're right, the more concrete details you attach to something, the more things have to coincide, which makes it less likely to occur overall. But at the same time, if you don't give someone a mental stepping stone to hang onto, they don't have a path, and you can't just say, "Well, this thing will be smarter than everyone. It can do anything." Well, maybe. But that really doesn't help someone understand the power of it. So I think we have to try to find a balance, which is even more difficult when, of course, you're talking to diverse audiences, where some people would like more concrete details and some people would like less. And invariably, something is gonna go a bit awry or be suboptimal for some audience. But I think it makes sense to put multiple messages out there, hopefully to reach multiple people. I think the chess analogy is useful. It's the short and quick one: if I played the world's best chess computer, or chess-playing program, I'm going to lose. Exactly how? We don't know. It's just clear that I would lose. And I think that helps. But I know it's not fully satisfying to everyone.

SPENCER: Does this suggest that for things like pandemics or risks from AI, art can be helpful, whether it's filmmaking or other things that kind of speak in the language of stories?

DARREN: I think so. It might be hard to know exactly in which way, though. I feel there are different demographics, or people involved in these discussions, where some people watched the Terminator movies three decades ago. And they're like, "Yeah, of course. It's a threat. I already agreed." And it's not that they have that many reasons, aside from, "I saw the movie, therefore, I agree with it." And alternatively, people think, "Well, I saw the movie, therefore, it's fictional. Therefore, it's not much of a problem." I think for the average person who is not going to be reading papers, or following this stuff in detail, the imaginings of sci-fi, or the stories we've been hearing for decades, have primed us, probably more than for most other types of events, for something that could go dramatically awry. But I would hope there's the follow-up nuance that says, "Okay, look. Yes, this does point at a broader risk or concern. But we don't have that much definitive information one way or the other, or this other part of it seems completely unrealistic compared to something a bit more plausible."

SPENCER: The example of Terminator and AI is interesting, because for years, when you saw an article about AI, there'd be some kind of reference to Terminator, whether it was the glowing eyes in the article's image or some mention of Terminator. And in a way, that gives people something to anchor their imagination to. But also, in a way, it's a silly movie. I think it's a fun, enjoyable movie to a lot of people. But there are elements to it that are just completely unrealistic. It has time travel elements, and there's the good AI and the bad AI and so on. So because of that, I think we have to be careful about using art to anchor ourselves. But on the other hand, art can really get people to understand something better, to feel more invested, and to have more concrete depictions of the future.

DARREN: You've got to be careful with these movies, and with the accent of the robot [laughs]. Yes, I think we have to be a bit cautious. I went out of my way for the cover of my book not to have a big robot head or a big glowing robot eye. And it's very difficult to show abstract or less tangible depictions of AI or computers that could be a risk or harm without a robotic imprint, because that's what people recognize the most. That said, depending on what you're trying to convey, if someone thinks of AI broadly as technology, which relates to machines and robotics, and they just get the message that this is dangerous, that might be passable. It's not ideal. Ideally, there'd be a more graded and nuanced understanding of the whole thing. My book doesn't really deal with robotics at all, because I think robotics isn't necessary for AI to be a risk, although it could obviously be one of the instantiations of it. The robot image might be the quickest way to get the idea across, but maybe a bit inaccurately.

SPENCER: When it comes to these broad risks to society, it's interesting to think about whether forecasting tools work. Take nuclear weapons. If we were to use standard forecasting tools, we could try to look at, "Okay, well, how often are there events that could lead to nuclear disaster?" and so on. But there's really very little data. There are some incidents, but not that many. It's hard to build statistical models. There are so many factors that are hard to model. And so it seems that we know there's a risk that nuclear weapons are used, but it's very hard to forecast or to use statistical methods to study it. The same thing might be true of AI. There are certain aspects of it that you can model or forecast. For example, you can model how quickly chips are improving in terms of the number of computations per second they can do. And you could maybe model how much spending is increasing over a short period of time. But it may be harder to model some of the things that matter most with regard to AI. So I'm curious to hear your thoughts on forecasting in AI.

DARREN: I think it's an important issue. And it relates to the broader concern of making decisions under uncertainty. We are faced with a world that has a lot of complexity, a lot of uncertainty, a lot of things that are very difficult to know, even if they are quite knowable. And as such, we have to figure out ways of making decisions. It's tempting to think you can kind of just hold back or be agnostic. But it doesn't really work that way. By not making a choice, you're kind of making a choice and agreeing with the people who think you don't have to do anything. So nukes are a good example that has some parallels to AI, maybe more so than climate or something else. Because with climate, we have decades or centuries of data, or even longer, depending on what data sources you're looking at. To understand how things might work, we have projections and various sophisticated computer models. With nukes, as you said, yes, there have been detonations, both in war (not that many, of course) and some tests here and there, but not a lot of data to go on. And of course, it also involves a lot of complicated human dynamics (why don't we put it that way?). With AI, at least when we're talking about advanced AI, something like Artificial General Intelligence, or on its way to Artificial Superintelligence, it's something that's never happened before. It's literally unprecedented. And as you said, usually when you're making specific forecasts, you take in data about what has happened before and you project forward. This is how the insurance industry works. Whether it's car, home, health, whatever insurance, they have some sense, based on various demographics and events in the world, of the likelihood of certain events happening again. With AI, everyone is kind of guessing. Now, that said, I think some guesses are better than others. But I think it's better to really lean into the uncertainty we're all facing here: we don't really know what's going to happen when. So yes, as you said, there are graphs which indicate that patents are increasing, and investment has generally increased, though it dipped a little bit, maybe not for the frontier AI models. And when we think about all these things, it looks like computational power will increase, as will capabilities. But exactly to what degree, in what way, and in what timeframe, we've got a lot of uncertainty there.

SPENCER: So given these challenges, how might we actually make decisions under this great uncertainty?

DARREN: Indeed, I think it's one of the biggest challenges of our time: to try to figure out what to do about this AI thing. In the book, I have one chapter dedicated to AI timelines and another one dedicated to AI risk. And I think it was useful to separate these out, because the risk is not the same as the timelines, and to try to figure out what someone might agree with and disagree with. Regardless, in both situations, I try to take people through the fact that it is unprecedented, and we can't quite be neutral. So what should we do? I try to offer some way of thinking about this. One of the approaches is to use different signals or indicators of timelines or risk. So this would be surveys of experts, like the AI Index or Katja Grace's survey; then you have signals from luminaries or key figures in the field and what they might be thinking, both pro and con, of course. And then you also have Metaculus, with its online forecasting projections of when something like AGI will arrive. Depending on whether it's the strong or weak version, it seems to be 2030 or sooner. And none of these are definitive. And that's kind of the problem that we face. Unfortunately, you can't just say, "Oh, good. This is reliable. This is trustworthy in an absolute sense, and I'll just believe what they say." You just kind of have to piece it together. And there's a lot of wiggle room.

SPENCER: So when you talk about separating risk versus timelines, what that makes me think about is, we can ask a question like, "If AI reached different levels of capability, how risky would it be?" and keep that separate from the question of, "When will it reach that level of capability?" Is that what you're referring to?

DARREN: Yes, I think that's a good way of phrasing it. I want to understand why someone might think the way they do about AI, advanced AI, and its capabilities. And I think it's very useful to separate out when they think a highly capable AI system will arrive and when it could cause harm. Now, of course, these are very much related. But they don't have to be, in the sense that maybe someone thinks a highly capable AI will arrive very soon, but if you ask whether a highly capable AI will cause harm in the next five years, they may disagree with that. And you might not realize that they think the capabilities broadly are going to be there soon; they just think it's not going to cause harm. So by separating this out a bit better, I think we can get a better sense of where people stand and what evidence they may be open to.

SPENCER: When people who maybe aren't as familiar with the AI topic bring it up with me and want my perspective, I actually often start by talking about risks, not timelines. I start by saying, "Well, imagine there was an AI that could do this job as well as 99% of people doing that job," and then talk to them about, "Well, what do you think would happen in that scenario? What do you think the implications of that would be?" And then, later, we can broach the question of, "How far away are we from that? Will that happen in the next 10 years? Will it happen in 100 years or 1,000 years?" And I actually find that makes it much more fruitful. So I'm sympathetic to your division of those two.

DARREN: Yeah, I think that's the right way to go. Because when it comes to the concepts related to the AI issue, or AI safety concerns, some of them are specific to AI, but most of them are not. These are general issues where people respond to incentives. There are complicated competition dynamics. There's a lot of uncertainty. These things are happening in many different domains, if not almost all domains. And so an approach I took in the book, which is really trying to bring the AI safety issue to everyone — you don't need a background in it whatsoever — is sort of a meta point, so bear with me for a moment. I would think about the discussions that occur in the AI safety space, and why people might have certain concerns or whatnot, try to extract the key concept that might be at issue, and then bring that to the front of, say, a chapter, where I then try to explain it in general terms. An example could be: say, before you have people come over for a party or some sort of dinner event, you might tidy your place. You clean up a bit, and some people vacuum, they dust, they do this and do that. But then you find you're running out of time. And so you realize, "Okay, I gotta make things look a little bit nicer." So you stuff some things in a drawer or in a closet. And in a way, this is reward hacking, as they say in AI safety. What you've done is given the appearance of cleanliness without necessarily fully achieving it. These sorts of things, where people respond to incentives in a perverse manner: we all know this, we all do this. And in that sense, people can better understand that AI does it too.

SPENCER: Are there other ways you'd suggest thinking about risk and uncertainty in the AI space?

DARREN: Yeah, so we talked a bit about forecasting, like a weather forecast, where there's usually a statement that it's going to rain with a probability of 80%. But I think another thing that's very useful is foresight. I'll explain a bit what this means. I was a foresight analyst for a while, and I think it's very useful as a mental tool to try to explore different plausible futures. So with forecasting, and often with online prediction tournaments, you're trying to make a specific prediction about the nature of the world. This is what Metaculus does: this X thing is going to happen on Y date. When will a certain type of advanced AI arrive? And there's a specific probability and date. And that's great. I think we should be having things like that. In addition, though, there's this opportunity to, again, open up our imaginations to think about what could be, without being so worried about what definitely is. So with foresight, typically, one is concerned about plausible futures. It doesn't have to be what's probable, just, in the broad sense, what's plausible. There are many different types of foresight exercises, and one could say there's this Five P rule, which I'll explain. So you talk about plausible futures, probable futures, possible futures, or maybe even, in some senses, preferred or preventable futures. Plausible is what seems like it could reasonably happen. Possible is broader: everything that could conceivably happen. Probable is what's most likely. Preventable is what you don't want to happen. And Preferred, of course, is what you do want to happen. Okay, imagine AGI arrived in 10 years, just take that as an example. What would have to happen for that to have happened? But now imagine it was 20 years. And now imagine it was 30 years. What happened or didn't happen to get to that point? The whole point of all these exercises is to really start challenging your own assumptions about what you think the world is, and to, again, aid with imagination one way or another.

SPENCER: It reminds me a little bit of the pre-hindsight exercise where you're planning a project and you ask yourself, "Okay, suppose that in five years this project has failed, what do I predict the reasons will have been?" And this is kind of the reverse. Suppose this thing will actually happen in five years, what are the ways that it actually got there?

DARREN: Right. I think that's great. I use it; I know it by the term pre-mortem, versus post-mortem: why did this go wrong, or why did it not? But yes, it's really just trying to get at the fact that we don't quite know what is going to happen. So let's try to think through the different ways the world might be, to then better prepare one way or another. And with AI, you can, again, think about, "Well, if a certain entity, AGI or artificial general intelligence, arrives at a certain time, what should we have done to prepare for that years in advance?" Which then intersects with risk, of course. If such a thing is very dangerous, even if you think there's a one to five, maybe 10 percent chance of great harm, that will also affect what one thinks and what one should prepare for.

SPENCER: This reminds me of the kind of analysis I've heard done with terrorism, where you'll have, let's say, analysts look at scenarios: "Okay, suppose there's an attack on Washington DC. What do we predict are the most likely types of attacks?" And then they start breaking down scenarios. And then they start considering types of attacks that are improbable but could maybe still be carried out, and so on. Is that an analogous strategy?

DARREN: Oh, certainly. I would see foresight, in a way, as just one type of scenario planning. So if you're already on board with scenario planning, these are just different ways to get at similar things. You could do a mix of foresight and forecasting. Obviously, data is very useful. With foresight, also, one often tries to trace first-, second-, and third-order effects. For example, when DALL-E, the first version, made a bit of a splash a couple of years ago in image generation, you think, "Well, okay, say this thing catches on. What might happen?" Well, more people can create art. Then you say, "Okay, well, what if that happens more?" If more people create art, maybe there's just more art, or maybe there's less employment for artists, because more people are doing it and it's easier. And then you think, "Okay, what if that happens?" And you keep trying to play it through with all these different implications. Again, not being totally concerned about whether that's definitely the case, but really trying to explore the possibility space, as they might say: what are the different possibilities? And then later, you might be able to go through it with some data or reflection or further analysis to think, "Well, do we have to develop policies or put other things in place to try to, again, create a preferred future, or at least reduce some risks in some way?"

[promo]

SPENCER: So one concern I could see someone having about this kind of approach is, it seems like you can generate so many scenarios, and yet, it's so hard to assign probabilities to them. So how do you actually begin to take that next step of saying which ones are possible but very unlikely versus which ones are probable?

DARREN: That is a great question. Because when you start to explore what's plausible, if you have an open mind, very little ends up not being plausible. One of the examples I give in the book is: imagine it's 2006, and someone says to you, "Okay, I know marijuana is a Schedule I drug in the United States, and thousands of people are imprisoned over it each year. But I'm going to say that in 15 years, some government is going to give out free marijuana so people will get vaccines during a pandemic." That sounds ridiculous. But that's what happened in Washington State. So you can imagine a similar projection: "It's 2033, or 2035, and I'm gonna say someone's gonna give out free mushrooms so people are more likely to download free antivirus software, because there's a worldwide cyber attack." It does sound a bit absurd. And yet, why wouldn't that be seen as plausible? To your question, though, it's a tricky business. I don't think you can have a detailed structure of exactly what to do. One might just have to look at history, look at the trend lines, and then think about which outcomes are most likely to occur. Certainly the complication with foresight is that it doesn't lead to obvious policy recommendations. But it does help challenge assumptions. As I said, I think the main utility there is that sometimes people don't even realize the biases they have going into a situation, or when they're imagining how the world might be. And if those can be addressed or further reflected upon or analyzed, I think there's a lot of value there.

SPENCER: When you go through an exercise like this, I imagine you produce a lot of different scenarios. Once you've done that, how do you feel? Do you feel actually more confused? Because now you're like, "Well, now I see there's so many possibilities I hadn't even considered." Or do you actually feel like it gives you confidence?

DARREN: I'd say it's a bit of both. I also think it depends on what you're trying to achieve. So you could imagine there's a foresight exercise which might be, say, an afternoon, two or three hours. And you might end up thinking, "Oh, gee, I know a lot less than I used to," and I think that kind of cognitive humility is generally good. And then there could be an exercise which occurs over multiple days or a week, or maybe multiple months, as some scenario planning occurs. And I think then one would shift from, "Gee, I don't know anything," to still kind of feeling that way, because there are still some good reasons for it, but building back up to the sense that some situations or scenarios definitely seem more likely than others, or at least easier to do something about: to address, prevent, or encourage, depending on what one's goal is. I think always keeping in mind some cognitive humility is probably a good idea, and if foresight helps with that, that's good. At the same time, yes, true paralysis is not good. If you then feel, "Oh, I don't know, I can't do anything," again, that's a misleading stance. It feels like it's somehow justified because of such great uncertainty. But by doing nothing, you end up agreeing with the people who think we shouldn't do anything. It's this unfortunate forced move by the world: just doing nothing doesn't really work.

SPENCER: What do you mean by that? That doing nothing doesn't work?

DARREN: Well, certainly, it's an allowable action. I guess what I'm trying to say is that the justification for doing nothing, like agnosticism, saying, "I won't agree with anyone," doesn't quite work, because your behavior ends up agreeing with the people who think we shouldn't do anything. It's a nuanced, tricky business there with AI. Some people think, "There's going to be no problem; we shouldn't do anything." And other people think, "Oh, gee. This thing is coming like a freight train. AGI or something more threatening is going to be here in two to five years; we'd better do something immediately." And you could say, "I don't really know. I'm just not going to do anything." But then you've implicitly, inadvertently agreed with the first group, who say, "We shouldn't do anything." Ideally, there are reasons for doing one thing or another. And I'm not saying I feel 100% confident in that stance. But I do think there's something there that highlights a complication: neutrality is not what it seems.

SPENCER: Yeah, on the one hand, default bias seems like a real bias, where sometimes we just do something because it's the default, even though it may not be a good strategy. But at least it's the default. On the other hand, given limited attention, given limited effort, default bias makes a lot of sense. Because you're like, "Well there's so many things I'm not going to be able to put my attention to, or I'm not going to focus on. What else am I gonna do other than do the default on those things?"

DARREN: And that is fair. There's a lot of things going on in the world. And we can't all pay attention to all the things, and we can't all engage in meaningful action about all the things. So I don't want to be condemnatory. But I do think that there's a difference between statements and revealed preferences. And again, I'm not saying this is 100% sensible, but I do think there's something there about, "Well, whatever you say, your behavior supports a certain policy outcome." And that's where it gets complicated. Where, yes, it may feel like I don't support any policy outcome. But then behaviorally, because you have voted or not, or you've tried to engage politically or not, you end up supporting one type of outcome or another.

SPENCER: So suppose strategic foresight was used at a much larger scale on risk from AI, what would that look like?

DARREN: I think it might help the general nature of the discussion, where at the moment — and to be fair, I understand this — in the media cycle, usually, you need a snippet where someone says, "There's a 10% chance of extinction in the next three years," or, "There's a 90% chance," or, "There's a 0% chance." This is what sort of gets extracted out in terms of a tweet or a public conversation. But there could be a more general understanding of the complexity of the situation, where, again, if you talk to someone for an extended period of time, they'll probably say, "Oh, yes. There's uncertainty all over the place. I'm not quite sure about XYZ. And I could see a range of plausible scenarios." It would be nice if everyone could agree on that general frame. In that sense, we may be able to move the debate a little bit further along in terms of acknowledging the uncertainty, understanding that different outcomes are plausible, that it's not really binary, not one or the other, and then maybe have different policies in place. Now, of course, there are still going to be disagreements, and there are still going to be people who feel very strongly about one thing or another. And if that's their opinion, that's fine. But I think there's maybe a larger group which might be more willing to say, "Okay, well, we don't really know. And if we don't really know, what should we do about that?"

SPENCER: I think this is a major point of contention, where some people say, "Well, we don't really know. So the people who are saying that this is really risky, or we should make massive movements to try to protect ourselves from it are being overconfident, and they're over claiming, and therefore they're not credible." Whereas on the other side, people say, "Well, we don't know. So we have to take precautions. If there's even a reasonable chance of this happening anytime soon, or even in a century, we should make sure we're prepared."

DARREN: Yes, I think you've hit the nail on the head. It's a very complicated situation where one errs on one side or the other. Now, I think generally it's reasonable to be sympathetic to most positions in this space, because there is a lot of uncertainty. But I would like the nuanced conversation to be exactly that: "So what are your timelines, exactly?" when you're talking to someone about this. When you look at the projections of computational power, when you look at what has happened, say, in the past two or three years in terms of advancing AI capabilities, is it really that unlikely that in the next two, three, or five years we might get certain other more advanced capabilities, more threatening capabilities? So that's one part of the discussion. And then there is this kind of meta point: "Would you rather be overprepared or underprepared?" And then we can have a specific conversation about that, which also relates to the risk component: "Does the world usually take multiple years, if not decades, to organize itself to address a problem?" I think the answer is yes, and the evidence points towards that. Now, you could say that recently a lot of the movement on AI safety has been absolutely fantastic. But earlier in our chat, we mentioned the pandemic: we just had one, and we're not in any better position to deal with the next one. So that really indicates to me that we need a lot of time to address a problem, even if it's 10 or 20 years away. So while there may be differences between the two-year and the 20-year timelines, if it's going to take 10 or 20 years to start addressing these things, we need to get started now.

SPENCER: Just as an aside, do you actually believe we're not any better prepared to deal with the next pandemic? I'm not sure that I believe that you believe that.

DARREN: I think it's mixed. I think it depends on which country you're in and how people react. I think the idea that most people would not take a free vaccine, which is generally effective, was surprising to a lot of people. And if it happens again, it may be worse than when it was the first time because people are already anti-vaccine in ways they may not have been before. There's a lot of complexity and nuance there. So I'm not saying we aren't any better prepared. Hopefully, we've learned some lessons. But it's also the case that there could be other forces at play that make people even less receptive to various public health measures that would be useful if another pandemic happened.

SPENCER: Got it.

DARREN: I think there's another variable at play — one you often see come up — which is the idea that if people are concerned, say, about existential threats from AI, that means maybe they're not concerned about current problems in AI, like algorithmic bias, or concentration of power, or privacy issues, and so on. And this is just a really unfortunate situation. Most people that I interact with care about both; it's just a matter of where the emphasis is. It's also the case that a lot of things that would help one of these would help the other. I think, say, mechanistic interpretability, or more generally, understanding how and why AI systems do what they do and make the decisions they make, helps with a whole range of different problems, from present harms to near-term future risks. And I wish people could come together and realize we've got to work on both of these things. And that's okay; we can care about many things.

SPENCER: One thing that I find disheartening is it seems like a form of tribalism is beginning to crop up in the AI space, where there are sort of different teams. There's the team, "Let's fight near-term threats." And that team might feel a bunch of money and attention is being pulled away from the things they think are really important by people who are talking about existential risks. And then you've got people talking about existential risks, who feel they're being thwarted by people who say their ideas are silly, because they can't prove that this is actually an existential risk, and that they're getting everyone terrified over nothing. And then you have top AI researchers taking different sides in this. And it just seems really bad, because it's becoming a team-based thing rather than a, "Hey, we're all in this together. Let's all discuss how big the different risks are, and each come to our own conclusions about how big the different types of risks actually will be."

DARREN: I fully agree. And I'm seeing some of the same phenomenon. And it's concerning. It's unfortunate. I guess, in some ways, it's predictable, but it didn't have to be. I also think that, probably, most people in the space are okay. But the more that different team dynamics form, the larger those teams tend to get. And, again, if you take a step back, we have to care about the environment and trying to stop cancer and AI issues, whichever ones they may be. I still buy anti-malarial bed nets, and I care about AI x-risk or extinction risk. And so it's odd to have this, although when something is of a similar type, it does really seem to be in competition with it. But in the broad sense, every problem is in competition with every other problem. So we can validate that and say: if one person is advocating that all the money should go to extinction stuff and nothing to algorithmic bias, yes, I can see that if your main concern is algorithmic bias, you're going to feel displeased by this and argue against it. And the same thing in the reverse sense. I do think, though, that a lot of the policies that could help one thing could help another. So again, the hope would be something like, "Okay, if you're in Team A (I'm not gonna say which one it is), what are 5 to 15 policy proposals you want to see to improve AI safety in the broad sense, or whatever it is you care about? And for Team B, same thing: give me 5 to 15 different proposals." And then you see which ones overlap. And then hopefully people can come together and work on those.

SPENCER: I think another challenge with the kind of teams forming is if people start reading bad intentions into the other teams. It goes beyond, "Oh, this is just an intellectual disagreement we're having," to an actual negative sentiment towards the other group or feeling like the other group is sinister or insincere.

DARREN: Yes. You definitely can find examples of that on both sides. How representative they are, I think, is something we also have to be careful about. Social media amplifies certain things, especially team dynamics. So it may be the case that many people truly feel aggrieved and see bad intentions from the other side. Maybe it's not; maybe it's 10 to 20% of each side, if there are sides. And so I think that's something to keep in mind. And as usual, if you talk to people one-on-one for a longer period of time, they usually have a whole range of concerns. And they can then express those with a nuance and sophistication that shorter social media posts don't really allow, or that gets lost in clips that are cut out and taken out of context, perhaps sometimes in broader media coverage.

SPENCER: So we've been talking about potential risks from AI. You named your book 'Uncontrollable,' which is interesting, because it suggests a certain take on the dangers of AI. So tell us why you named it that, and how do you think about control being central to the problem?

DARREN: That's the title because I wanted to get attention on the severity of the problem. There is a world where I could have put 'Uncontrollable' with a question mark. But I think all knowledge is probabilistic, so there's some uncertainty no matter what, and what we're trying to deal with is whether the AI issue is that we're creating something uncontrollable. And I think that was the quickest, most direct way to communicate that idea. The subtitle is "The Threat of Artificial Superintelligence and the Race to Save the World." As you know, most subtitles are a bit more explanatory. But it is the threat of; it's not the definitive likelihood of. It's that there's this threat of an uncontrollable thing. For the book, and often in the AI space, when we talk about alignment — that's whether an AI system might have our values, or implement goals or pursue tasks in a way that aligns with our values — that's usually a bit separate from control. Now, some people put these together, and some people separate them. I chose to separate them just for ease of communication. Alignment, I see as: does an AI do kind of what we want in the way that we want? And control is: can we stop it if it doesn't?

SPENCER: It's interesting to break it down that way. So if we talk about alignment, the idea of, "Does it do what we want in the way we want?", I guess the argument might be, "Well, if it was misaligned but we could control it, it wouldn't matter so much, because, okay, maybe there's some damage, but at least then we're gonna stop it." Whereas if you have misalignment with a lack of control as well, so we can't actually stop it, that's when things get really, really bad. Is that the way you think about it?

DARREN: That's great. People can't see it, but I'm smiling. Because I think, fundamentally, the concern about AI or advanced AI consists of several different things that all happen together, and must necessarily happen together to some extent, for there to be a problem. As you said, if an AI system were aligned (in a broad sense, let's assume it doesn't cause problems), we don't really have to worry about control. If we could control it, we would have to worry less about alignment. If capabilities weren't developing so fast, we wouldn't have to be as worried about alignment or control. And then with risk: things are a risk because we're not so certain about whether we can align it or whether we can control it. So all of these pieces come together somewhat necessarily to create a larger problem, where if you solve any one major plank, you kind of address the other ones. To your particular point, though, in a way it has to do with just ease of communication. In the book, alignment is split into two chapters. One is a very simple introduction exploring Isaac Asimov's Three Laws of Robotics, just showing that simple rules don't really work. And then another chapter is more focused on what we call the alignment problem in the AI safety space, whether something will do what we want in the way we want, and that talks about accident, misuse, or power seeking; that sort of thing. And then control is its own chapter. But if it was all one chapter, honestly, the chapter would just be too long. It'd be a bit unwieldy. So structurally, it made sense to split these things out, even though you could see that a lot of power-seeking behavior, whether an AI tries to manipulate humans, make money for itself, or replicate itself, does, of course, relate to control. But I split them out for communication purposes.

SPENCER: We'll get more into alignment and control in a moment. But one thing that way of looking at the dangers of AI doesn't seem to address is the idea of concentration of power in the hands of humans. Some people worry about AI not so much because they think it's going to be a thing that engages in its own behaviors that we can't control and that are misaligned with our values, but because they think it's going to do what people want it to, and the people who want it to do those things will use it to wield an unreasonable amount of power. So you could imagine, for example, a company having control over a billion really smart AIs and using them to basically control the whole world, or an individual person using them to impose their values across the globe in a way that's kind of dystopian. So I'm wondering, do you get into those kinds of scenarios? Is there a reason you don't worry about those as much?

DARREN: Oh, no. I do worry about them, and stuff like that is mentioned in the book. It is part of the overall alignment or control issue; I wouldn't say it has its own dedicated chapter. But I think anytime you talk about having an AI system that's aligned with our values, you're like, "Well, whose values? What does that even mean?" Number one, as individuals, our preferences sometimes change over time. Most of us feel very differently about the world now than we did as children. Or even as adults, our beliefs change. Then you think about how, within a country, there are huge differences of opinion. And then across many countries; and then about certain people who are running AI companies versus the people who are affected by them. So these are all very, very important issues, where I didn't have time to explore everything, but I see there being multiple alignment problems. And even if you solved one of them — the so-called technical alignment problem — you still have this other huge problem about concentrations of power, and who is going to make these decisions. No one elected Sam Altman of OpenAI, or Demis Hassabis of Google DeepMind, to decide the fate of the world. And yet these people seem to have a dramatic, outsized influence on what might happen. It's a very peculiar time that we're in. But I do think it really makes sense to think about who has the power to make these decisions and how they will be implemented.

SPENCER: So you can kind of think about it as: there's alignment in the sense of aligning an AI with what anyone wants, say, what the creator wants, which may be a hard problem. And then there's the additional hard problem of aligning it with what humanity wants, which may be different from what the creator wants.

DARREN: Oh, very much so. And to try to get at the issue of alignment, we could think of this: have you ever tried to order a pizza with four people? We see that people have different preferences. Someone wants pepperoni, someone doesn't want any meat, someone only wants cheese on half, someone wants a Hawaiian. And that's just four people. And then you think, "Well, where are we even ordering from?" "Oh, I don't want to order from that place," or something else. And this is kind of a silly example. But it doesn't get any easier when we think of broader notions of values and ethics and morality. And so, yes, there is no obvious solution to trying to solve moral philosophy in the next couple of years as the AI systems come online, aside from doing what we've always tried to do, which is talk more with each other and try to have more facilitated conversations and democratic spaces, and hopefully we can muddle through again, as we often have. But we don't want to bet on that.

SPENCER: Based on your own experience, what do you think people are most likely to misunderstand about the problems of alignment and control?

DARREN: Regarding alignment, I think people who aren't in the space might think that machines won't do something kind of stupid from our point of view. There's a famous example (I think it was the playfun algorithm) of a program that was asked to play Tetris and to not lose. To play Tetris, usually you place the blocks in lines to make sure they don't pile up to the top. Well, the algorithm just paused the game indefinitely, so it never lost. That's not something a human would typically do. But it's something an AI would do, because it's within the allowable parameters. And most people don't think of machines as being sophisticated like lawyers looking for loopholes. But that's how they often appear to us in certain ways. So I think that's one aspect of it. I also think people don't realize just how powerful and integrated these systems are becoming, and that relates to the notion of control. Again, it's not that people don't have any exposure to these concepts; I think it's just helpful to remind them of how we end up losing control of things. So I talk about how much control you have over your phone. Well, you chose the phone, and you use it, too, so it is your phone; that's a lot of control. But then you start really breaking it down. Could you choose not to have a phone? Well, you could, but the cost would be very high. Could you choose to have a phone without accepting so many user agreements? Not really. Could you use a phone without a connection to the internet? Again, not really. And you start to break that down: wait a minute, how much control do I really have here? And in that sense, as AI becomes more integrated into our lives, I think we're gonna have less control, even though we initially willingly brought it on board.

SPENCER: Could you tie that back a little more into AI? How does that apply in an AI context?

DARREN: Sure. So that's just one particular example; social media might be another. I think there are a few main reasons we might plausibly lose control over AI. It's not definitive, but one is that we won't want to control AI. We're gonna like the services it brings. There are huge financial incentives for AI products integrated into different things: office suites like Word, various technologies, AI assistants, military applications, government tracking software for good and bad reasons. All these sorts of things create enormous incentives to develop AI more and more. And so, in that sense, most people are going to be on board with a lot of the benefits that AI brings. If you use the image generators, a lot of people think they're amazing. Now, some people don't. But people who aren't that artistically inclined are delighting in all this stuff. So there's a net positive. And there are many other applications where it's like, "Isn't this thing amazing? Look what I can do." So this is, to me, the first stage of things. But once AI gets more and more integrated, it becomes like your phone or social media or perhaps the internet, where it's not that easy to back out, so to speak. The world has restructured itself around the technology, and you can't easily shut it down without huge costs.

[promo]

SPENCER: When I talk to people about AI and potential risks, people who are less familiar with the topic propose what I think are reasonable solutions to AI alignment and control problems, from the point of view of someone who's never really thought about it. But then, once you've thought about it for a while, you realize that these are not actually reasonable solutions. So one of them that comes up a lot is just, well, why can't we monitor the AI system? And if we notice it doing something bad, just turn it off?

DARREN: Yes, I cover these things as well. One section has the heading, "Can't we just shut it off? Can't we keep it in a box?" and these sorts of things. You're right, these are the knee-jerk impulses that seem reasonable. And maybe going back to the internet is a useful example: right now, the internet is so integrated into our society, and it is built to be robust. People don't want their internet to go down. Global commerce depends on it, as do many other things, from municipal infrastructure to how hospitals function; all these things very much use the internet. And so if the internet were shut down, it would be a huge cost. And we don't have any means to do that. There's no internet kill-switch. So I think you'd have to imagine a world where different communities, businesses, and countries all agree on the criteria by which they would shut off the internet, and then have that implemented quickly. And that seems very implausible. You could also imagine that if there were a kill-switch for the internet, that itself would be a huge risk to the internet. If someone were a terrorist with malicious intent, that would be an optimal target. So there's a huge incentive to not even create the possibility of shutting down such a thing. So in multiple ways, it seems we're onboarding a technology that is so integrated that we don't want to shut it off. And even if we could, the costs are so high that it's unlikely there's going to be agreement on it. And I think that's a useful parallel for something that might happen with AI.

SPENCER: So that's a social reason or set of social reasons why it might be hard to shut off. What about more technical reasons?

DARREN: Well, in that sense, though, the technical aspect is there, too. The internet runs on underground cables and all these servers and whatnot. But it's distributed in such a way that, again, if something technically goes down in one domain, there's another part of it that still works. It is relatively robust in multiple places. With AI, again, depending on what we're talking about, an advanced model could be in many different places, across many different servers and across many different countries, each with jurisdictional responsibility and varying willingness and desire to actually do anything. Again, maybe not quite right now, but if you imagine that in several years there are more and more powerful systems out there, which more and more actors have access to, including on the dark web and whatnot, it's not clear that's going to be very easy to address or control.

SPENCER: I like the internet analogy you're giving. I think it can give some intuition here. But it also seems to break down in a certain way, which is that we're talking about AI agents. We're talking about things that are smart, and things that are smart can actually subvert you on purpose. Well, on purpose, at least in the sense that they might actually have an incentive based on their own goals to subvert you.

DARREN: Yes, I think you're right. And like anything, these are just mental stepping stones, analogies. It's not quite the same. AI is not quite like the internet. It's not quite like nukes. But these are useful examples to help us think through it. And then, as you said, we can think about the disconnects or discontinuities, where we might have to be even more concerned. So if there are AI agents (whether they actually have goals or just act as if they have goals isn't something I'm that concerned about at the moment), they can pursue actions in a goal-directed manner, which is far more concerning than, say, the internet, which, as far as we know, doesn't have any sort of preferences or intentions in the same way. So all that to say, the internet can be a useful foundation for thinking about this. But we should be more concerned as AI systems become more capable, seeming to display intentions or capabilities that are very fast, diffuse, and very powerful.

SPENCER: So what do you think are some of the aspects of AI that make it unlike other sorts of dangers we've ever encountered?

DARREN: I think the speed, the insight, and the capability are sort of how I've come to understand it. First, there's just the sheer speed. If you have a system that can process things so quickly, we're just not used to that. Now, you could say various technologies or threats in the past happened quickly too. But I think as AI systems become even more powerful, the speed with which they can make change in the world is not something we're really prepared for. I think another key concern about AI is that it might understand the world in ways we don't. And you can see this broadly as insight or pattern recognition or something like that. But I think conceptual insight is a big deal. What humanity has managed to achieve over the past 10 years, a hundred years, thousands of years, is truly amazing. In the book, I talk about various things like, "Have you ever baked cookies?" Well, it seems pretty simple. But as far as we know, no other life form has baked cookies, because it's actually quite complicated. And when you onboard the power of intelligence, you can really go quite far. And we've managed to figure out different laws and ways in which the world works, to broadly rearrange matter to create things, whether it's cookies, or cars, or computers, or podcasts and the devices to listen to them. And what might an AI system that's a bit more insightful than us be able to discover, be able to figure out? I'm not saying it will be new fundamental laws of the universe, but maybe it could be. Who knows; that's kind of the concern there. And once something can do that sort of thing, it becomes hard to protect yourself against. I think it was recently shown that Wi-Fi signals can be used as an imaging system to figure out where people are and how they move about a room. Well, we didn't know how to do that, and then someone figured it out. And that's a lot of how things go. So what might the AI figure out that we currently don't know?

SPENCER: So we've talked a bit about the dangers of AI. But people might wonder, "Well, what can you actually do about it? Is it a lost cause?" If someone's not, let's say, knee-deep in AI safety ideas, is there anything they can do to contribute?

DARREN: Yes, there is. I don't want people to have a sense of futility and hopelessness. It's very important that they don't. There are lots of problems in the world, and we've overcome many things. And even if it looks like it's going to be hard to overcome certain challenges, it makes sense to try. And I think this is where hope can come in, for sure. One is that uncertainty cuts both ways. We don't quite know what's going to happen. And many of these problems are currently unsolved. But that doesn't mean they're unsolvable. We don't quite know. So there's hope that we may be able to figure this out. And the other, more rational way of looking at it is that we're far more likely to find solutions if we're looking for them. If we end up feeling hopeless and not engaging at all, then it doesn't work. More practically, it's a bit tricky: on a personal level, it can be hard to effect change. The main ways are advocating to other people, say, lawmakers or people who have more influence or power to do something, and then trying to do more yourself. So that's the normal route of political advocacy: talking to your representatives to make sure they're enacting policies that are going to keep you safe, or raising awareness among others, or volunteering, or donating, or trying to work in AI safety. Those sorts of things. And so that's on the personal level. And then when we think more of, say, the state level, the national level, the global level, it's about what policies can be put in place. And I think there's a huge range of things we can do. And I'm starting to see movement in countries and companies. They are trying to be more transparent. They're trying to share their safety protocols. The Biden White House issued that executive order at the end of October. There was that flurry of activity around the UK AI Safety Summit in early November. So when I think of things like that, it's having auditing and evaluation schemes, having licensing requirements, increasing transparency, increasing security, both the cyber and the broader security of these AI companies, because the stuff they're developing is very powerful. And even things like compute governance, where we have a better understanding of where powerful AI chips are going, who has them, and how they're using them. So I think there's a whole host of things there, which, again, may be a bit more technical, a bit more policy oriented, but the average person can at least try to push their representatives to implement policies that seem to be safety oriented.

SPENCER: If you do that kind of exercise where you imagine that, in some number of years, superintelligent AI is built, and then you look backwards at what happened, where do you see leverage points in regulation? Because here's one thing that I wonder about. I totally agree that more transparency about what's happening and where seems like a good thing; if it is a risky technology, transparency seems good. So does regulation of dangerous models, and maybe looking at what people have to do before they deploy them in terms of testing and so on. But something in the back of my mind with all of this is, "Do I actually expect that those kinds of measures would either prevent artificial general intelligence from being built, or even let us tell right before it's built?" And I sort of think that they wouldn't. So I'm curious to hear your thoughts on that.

DARREN: It's a great question, and it relates to a broad concern: there's so much uncertainty in this space. So when I draw upon my imagination, it kind of goes both ways. If you think of human dynamics, the normal things that happen when there's competition or countries that don't like each other, it doesn't seem like it's going to work out. But then again, humanity has worked together before. Not perfectly, but many countries have agreed to ban landmines. We have largely managed to fix the hole in the ozone layer. We've been able to contain the proliferation of nuclear weapons to some extent; there are more nuclear states than there used to be, but not as many as there could have been. So I think that's also a success, even though it's a mixed one. So with AI, that's kind of my current hope: that it might be a mixed success. And recent statements by different governments, including the US and China, saying we need to work on this stuff, are good signs. And it's going to be messy, and there are going to be ups and downs, and there are going to be problems. But I take a step back and think, "Okay, even if this only has a 10% chance of working out, it's still worth trying. What's the alternative? To not try at all?"

SPENCER: I guess what I wonder about is, if this is a technology that we don't understand very well, then it may be hard to know when we're on the precipice of really significant breakthroughs, where things are just totally different. So imagine you have an agency that requires certain transparency or certain tests of the systems, but then you get a kind of complete phase change in the systems, where suddenly they go from behaving in a reasonably intelligent way to behaving in a superintelligent way. I worry that the generalization won't happen: the tests that worked in previous generations will stop working, or the transparency that's there won't be enough to predict that we're on the verge of a transition. So I'm just wondering, if it is a big phase shift that happens very suddenly, whether there are policies that can actually help in those kinds of scenarios.

DARREN: That's a great point. Yes, I think dramatic leaps in capability that are hard to anticipate are a primary concern. And in a way, that's why I wrote the book and why we're talking now, because this is such a possibility; I think we should be concerned about it. In terms of specific measures, I think one way to try to get at this is that not only should AI companies have to pass certain safety measures before deployment, they should also have to do it before training. So if they have some sense that the model they're training might be harmful, that should be analyzed and evaluated to the extent that it can be. I think AI companies should also have to describe in detail what they predict the capabilities of their systems will be at certain levels of training, or certain compute or complexity. And in this sense, if an AI company seems to have a good track record of saying, "Okay, we're training a model. It has five times more computational power than the last one. Therefore, we think it will be capable of A, B, C, D, E, F, G," and it turns out they're mostly right, that's a decent sign. They may be more trustworthy. Again, not entirely. Alternatively, if they've written down, almost like a pre-registered trial in science, that they think A, B, and C are going to happen, and they're completely wrong, that shows they don't really have a sense of what the power of their models is going to be. And in that sense, we should maybe give them less leeway to build such a thing in the first place.

SPENCER: It suggests that there could be a lot of value in getting better at understanding the mapping from, let's say, more compute and more data to more capabilities. Because as I understand it, certain capability improvements we've seen in these models were very predictable. You can see this curve: "Oh, it's getting better and better at this task over time." Let's say, in image recognition, classification accuracy on certain standard datasets, you can see it creeping up as models improved. But then you have other things where a new capability emerges that nobody even thought to measure previously, because the previous models couldn't do it at all. And then, holy shit, now this model can suddenly do it. An example might be: you train these models on internet text to try to generate human-like text, and then one day you realize, "Oh, wait. It actually can solve simple math problems, or it actually can do coding to some degree." You hadn't even intended that; you hadn't even thought to measure its capability on that in previous iterations.

DARREN: Right. Or it was, say, trained primarily on English. And it turns out, it can speak many languages. Yes, I fully agree with you. So I think these are just good reasons to have more people in the AI safety space evaluating the different capabilities and how things might go awry. And even just a greater understanding of how all of this works. Now's the time for more resources to figure out how these systems work and how they are likely to work when they get more and more powerful.

SPENCER: Before we wrap up, I want to ask you about the communication challenge that you have gone through to write this book, because this is obviously a really difficult topic to communicate about. And so I'm curious, what are some of the things you learned about how to communicate these topics in doing this?

DARREN: It certainly was. As we know, this is a very technical field. It's very complicated, and things are moving very fast. That's the other thing. As I jokingly say to some friends, this book is not about birds. Birds don't change much within a year, let alone within six months, let alone within a month. But AI was changing so fast; that was another challenge. But broadly, I would just try to think: if I had no exposure to this topic, what do these words mean? How am I to explain the concept? Many people in the AI safety space are very familiar with it, and they don't realize how much they've come to learn and understand about it. So they use a lot of shortcuts, a lot of language that's jargony. And I tried to get away from all of that. So this book is really for people who aren't even sure what AI is, and are curious, like, "What the heck is going on with AI?" So there's no orthogonality thesis. There's no conditional probability, at least not by name. That's not the language I use. It's trying to think of the most easily accessible way to communicate an idea. So for, say, forecasting, or probabilities, or decisions under uncertainty, the weather forecast is a great example. I think it's the most frequent exposure an average person has to a forecast that attaches a probability to a specific outcome. And it's not dressed up in more complicated probabilistic language. There's no Bayesian probability here, at least by name. And then having people who aren't in this space, peers or acquaintances with no exposure to the AI safety space, have a look at it, that was also very useful. In academic writing, which I have a bit of exposure to, it's very common for people to say, "This is what I'm saying. And this is what I'm not saying." But of course, most people, if they're not saying something, just don't say it. So that's another way of saying: if you're not going to say something, just don't say it; don't say that you're not going to say it. And as such, I would try to test examples on different people and try to think about, "Okay, what are the things that people have the most familiarity with in their lives?" So you have things like music, things like food, things like themselves and their own childhood. In that particular case, I point out that you as a four-year-old are very different from you as a 30-year-old or a 40-year-old. In a way, you are your own superintelligence, in terms of the capability difference between you as a child and you as an adult. As a child, you couldn't imagine how capable you would be in your 20s, 30s, and 40s. So really, again, it's trying to meet people where they are, and to make it as accessible as possible, without too much technical language, while addressing the main concepts.

SPENCER: So I absolutely agree that avoiding complex terminology that people won't be familiar with, and using metaphors that they can relate to in their own lives, seem like awesome techniques. I'm wondering, are there other techniques that you found helpful?

DARREN: In terms of other techniques, it might just be the overall approach: thinking about the structure of the book as a whole, and the way a chapter flows, so there's a nice linear structure where the main ideas flow into each other and key pieces support each other. And it's, again, all very accessible. The phone analogy, or the internet, is stuff that people already have exposure to. Otherwise, it's trying to break things into reasonably sized nuggets. The most complicated the book gets is when I try to explain how image generators work. That was a bit more technical. But I think things like that are very important, because if you just tell someone, "We don't really understand how these things work," that's not giving them enough; I don't think it really communicates the idea. So in that sense, I tried to provide as many rationales or reasons for things as I could, giving people enough evidence and analogies so that they could understand why a position might be reasonable to hold.

SPENCER: In my experience, a key thing that's missing a lot of times in communicating around this topic is really imparting why intelligence is special, why it's such a big deal, why it's not like other things. And I'm wondering if you feel like you found ways to communicate about the specialness of intelligence?

DARREN: I guess we'll see. I think I've managed to do it decently. The book really does focus on intelligence building the world. It starts off with the introduction, but the first chapter really is about how powerful intelligence is in terms of technological developments and what we've managed to achieve. And it really isn't about AI at all. As I said, the main thing I want people to understand is that intelligence is really powerful, things are happening very fast, and there's uncertainty in all these spaces. These are not AI-specific issues, but they very much relate to the AI situation. With intelligence, I really just tried to ground people in the fact that so much has changed. We've gone from things that used to be considered impossible to things we now take for granted. There was the famous example of the Wright brothers and their flight: the New York Times predicted it would be one million to ten million years away, and it happened two months later. But there's also the story of one of the Wright brothers saying, in 1901, that he didn't think flight was going to happen for 50 years. So even the very people involved in it didn't see it coming. So I'm trying to get at both the humility of uncertainty, but also how much we've changed, how much has happened. We have recorded music now. We have planes, trains, cars, and rockets. We're really living in the age of sci-fi compared to almost all humans in all of history, and we should remind ourselves of that. So when you think of going into the future with AI, and it sounds a bit sci-fi, well, that doesn't mean it's less plausible. I should also mention that, in terms of trying to make the ideas more accessible, each chapter begins with a short overview and ends with key messages. So I'm really trying to make the ideas as accessible as possible and more likely to stick.

SPENCER: In Daniel Kahneman's book, "Thinking, Fast and Slow," he ended each chapter with kind of the quick takeaways, which I think is a fantastic memory aid and really helps reinforce the material. And I think it should be used by a lot more authors, but I don't know if I've ever seen anyone besides him and you do that.

DARREN: Max Tegmark, in Life 3.0, I believe did something similar and called it 'the bottom line.' And I thought, "Yeah, why not have four to eight bullet points that encapsulate the main messages of a chapter?" It's very useful. Sometimes people read a book all the way through, and sometimes they pick it up from time to time. And it's very common for people to lose their place or forget something that they thought was important.

SPENCER: Last question for you. I find that in discussions about AI, people can often get tripped up in philosophical questions that aren't actually that directly relevant. So one of them can be, "Well, are the AIs really intelligent? What truly is intelligence?" Or another can be, "Well, can AIs really be conscious? Or can AIs really have goals?" And so I'm wondering how you handle those kinds of questions.

DARREN: It's a great question, Spencer. And I'm smiling because I love all of these topics. And they are not in the book, or they're only in the book to say that they're not in the book. Consciousness gets about a paragraph, because for my concern, which is AI causing harm, AIs do not need to be conscious to cause harm, much like a virus or other phenomena can cause a lot of harm without being conscious. So consciousness is not the thing. I think it's an important issue. And maybe there's another book that someone else is writing, or that I will write in the future, about consciousness and digital entities and whether they can suffer and all these things. But I really wanted to make this book dedicated to the AI risk and AI safety space. Similarly, there's a couple of paragraphs on "Do AIs really have goals? Well, what does that mean?" And again, I just say: let's not get lost here. They act as if they have goals. That's good enough. Are they intelligent? Well, they act as if they are demonstrating intelligence in some way. So I think these issues are important. I just think my book wasn't the place to delve into them deeply, because they didn't relate to the overall thesis: AI could be harmful for many reasons.

SPENCER: Do you have any final words for people that are concerned about this topic?

DARREN: I think it is right to be concerned. I think it makes sense to try to get a wide range of voices and information about this topic. And it really is one of the most important issues of our time. So I think it makes sense to be concerned, and try to learn more about it, and see how you can engage.

SPENCER: Darren, thanks for coming on.

DARREN: Thank you for having me, Spencer. It's been great.

[outro]

JOSH: A listener asks: "What question do you wish people asked you more often?"

SPENCER: Well, a question that I really like and I tend to ask others is a variant of "What are you excited about right now?" or "What's something that you're really enjoying thinking about?" or "What's an idea that you've been mulling over lately?" So those are the kinds of questions I like when people ask me, and I also like to ask others those questions. That being said, sometimes people can be a little befuddled by them because, you know, they're not the most typical question to ask someone. But I do find that even if people are a little befuddled, it often leads to a more interesting place in the conversation rather than asking "What do you do for work?" or "What are your hobbies?" I think asking "What are you excited about lately?" or "What have you been thinking about lately?" — especially with a more intellectual kind of person — can take the conversation in a direction that they really enjoy talking about.
