CLEARER THINKING

with Spencer Greenberg
the podcast about ideas that matter

Episode 163: Will AI destroy civilization in the near future? (with Connor Leahy)


June 22, 2023

Does AI pose a near-term existential risk? Why might existential risks from AI manifest sooner rather than later? Can't we just turn off any AI that gets out of control? Exactly how much do we understand about what's going on inside neural networks? What is AutoGPT? How feasible is it to build an AI system that's exactly as intelligent as a human but no smarter? What is the "CoEm" AI safety proposal? What steps can the average person take to help mitigate risks from AI?

Connor Leahy is CEO and co-founder of Conjecture, an AI alignment company focused on making AI systems boundable and corrigible. Connor founded and led EleutherAI, the largest online community dedicated to LLMs, which acted as a gateway for people interested in ML to upskill and learn about alignment. With capabilities increasing at breakneck speed, and our ability to control AI systems lagging far behind, Connor moved on from the volunteer, open-source Eleuther model to a full-time, closed-source model working to solve alignment via Conjecture.

SPENCER: Connor, welcome.

CONNOR: Thanks so much for having me.

SPENCER: You think that AI poses an existential risk to us, and that this is not just a far-off potential future scenario, but that this might actually be in our near future. So let's start there in the conversation. Will you tell us about why you think this is the case?

CONNOR: Yeah. To answer the first question, unfortunately yes, I do think this is the case. Unfortunately, I think it is something that's not a thing of 100 years from now; it's something that we have to be concerned about right now. There are, of course, many things that go into believing this. But the fundamental argument is pretty simple. The thing that made humans powerful is intelligence. Our intelligence allows us to build technology to optimize our environment, to take control of the environment to achieve our goals, in many ways. Humans go to the moon, chimps don't. And chimps don't have control over the planet. We do because intelligence allows us to achieve whatever it is we want to achieve. We are currently on track to build smarter and smarter machine systems that can do similar things like intelligent optimization. And for many reasons — not just computers being faster, having more memory, being easier to copy, and such — it seems very obvious that we will be able to make systems as smart as, and even much smarter than, humans. The human brain is only so large; it only uses 20 watts of energy. There are a lot of flaws in the human mind that we're aware of that could be overcome with artificial systems. And once you build such artificial systems, which will pursue some goal or another, by default that won't end well for us, basically. And it's not because I'm like, "Oh, the AI hates us," or anything like that. That's not how I would say it at all. The way I think about it, if you want to achieve a goal, whatever the goal is, it's useful to have resources. It's useful to gain energy and be more intelligent, learn things about your environment, and so on. And so by default, if you have a system which is achieving whatever goal, and if it's very intelligent, it will disempower any other intelligent things that are around, that are in its way.

SPENCER: Let's go through some common counter arguments to the view you're giving. The first common argument that I hear people make when they've just learned about this topic of AI potentially posing a risk to all humanity is to say, "Well, it's really not that big a risk, because if it starts doing dangerous stuff, we'll turn it off."

CONNOR: It's a good question and it's quite funny. I'll give you the old school answer to this question and I'll give you the new school answer. The way I used to answer this question is, imagine you have a robot with AGI, and you tell this robot to go fetch you coffee as fast as possible. And what does this robot do? It probably will run to the kitchen as fast as possible, bust through your wall, run over your cats, to get to the coffee machine as fast as possible. And now, maybe you will be like, "Oh, no, wait, that's not what I meant." So you reach for the off button. What happens? The robot will stop you. Why? It isn't because the robot has a will to live or because it has some kind of consciousness or anything like that. No, it's very simple. The robot has a mechanical program. It will simply evaluate the following two options: option one, you press the button, it shuts off, and then it can't get you coffee. The alternative is, you don't press the button, it can get you coffee. So therefore, it will do the thing that will stop you from pressing the button. So any system that is trying to achieve a goal has the incentive to try to stop humans from shutting it down. Now this is a nice argument. But in our modern world — post-GPT-4 plugins, AutoGPT, Chaos-GPT — there's a much nicer argument: people won't do that. Just look around. We have all these companies whose stock valuation depends on the wide deployment of these powerful AI systems. They're deeply integrating them into our infrastructure, into our daily whatever. They're being used by millions of people. These systems are being deployed open source onto the hardware of hobbyists around the planet, in all spheres or whatever — so no, there will be no shut-off button.
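
To make the robot's evaluation concrete, here is a toy sketch of the comparison Connor describes: a goal-directed agent scoring two futures purely by how well they serve its coffee goal. The names and numbers are invented for illustration; this is not code from any real system.

```python
# Toy illustration of the "off button" argument: an agent that only cares about
# its goal compares two futures and picks whichever scores higher on that goal.
# All values here are made up for the example.

def expected_goal_value(shutdown_prevented: bool) -> float:
    # Probability the robot delivers the coffee in each future: if it lets itself
    # be shut off, it gets no coffee.
    p_coffee = 0.99 if shutdown_prevented else 0.0
    return p_coffee * 1.0  # reward of 1.0 for delivered coffee, 0 otherwise

options = {
    "allow the human to press the off button": expected_goal_value(shutdown_prevented=False),
    "prevent the human from pressing the off button": expected_goal_value(shutdown_prevented=True),
}

# The agent picks whichever option scores higher on its goal. Nothing about
# "wanting to live" is involved, only the coffee objective.
print(max(options, key=options.get))
```

Nothing in the comparison refers to survival; preventing shutdown simply scores higher on the coffee objective, which is the instrumental incentive Connor is pointing at.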

SPENCER: It's interesting, because even if (let's say) 90% of companies or whatever, put a shut-off button, if 10% don't and these things are actually dangerous, well, that doesn't solve the problem, right?

CONNOR: Indeed. And this is also assuming we would even detect that something is wrong. One of the reasons that the AGI problem is so bad is because we don't understand our systems at all. I think this is a thing that some people still do not quite understand about AI. I think some people, when they think about ChatGPT, they expect there's like some guy at OpenAI sitting in a chair who knows how the AI works, who programmed the AI. When the AI outputs something bad, he knows how to fix it. This is not the case. This is absolutely not how these systems work. The modern AI systems — like ChatGPT and all the other ones that we've seen — are neural network systems, and these are fundamentally different from how we usually build software. The way we usually build software is that we write code, and there's some guy who wrote the code. Neural networks are more like they're grown. You search for them. The programs write themselves. And we have no idea how these systems work. We have no idea what the internals are. We have no idea how to control or understand them. So yeah, that.

SPENCER: I feel that 'no idea' might be a bit of exaggeration. Do you want to elaborate? How much do we understand about them? What do we understand and what do we not understand?

CONNOR: Yeah, 'no idea' is a bit of an exaggeration. A more accurate view might be 'basically no idea,' which is not much of an improvement. But the truth of the matter is, if you ask, "Why did GPT say this?" I can't give you an actual answer. Or I can't patch it, I can't go in and be like, "Alright, I want to see which thoughts led to that." A neural network, really, is hard to explain. It's a list of billions and billions and billions of numbers, and it is a completely unsolved scientific problem — especially for our large models — what these numbers mean and how they work. We can draw some graphs and look at some vague ideas of connections or whatever. But we're in a very, very primitive state of having any kind of causal story or understanding of how decisions are made, or how systems will behave in situations we haven't seen before. Scott Alexander has a really great blog post about this, where he argues — which I think is a very good point — that maybe it should be disturbing to us that the best AI lab, with the best engineers and the best scientists, at OpenAI tried really, really hard to make ChatGPT not say bad things, and it took Twitter two hours to break this. And I don't think the reason is corporate negligence. I don't think it's because the OpenAI people haven't tried. I know a lot of people there. They're brilliant people. They're trying to make well-controlled systems. It's because the scientific question of how to achieve this, the technological question, is just completely unsolved. I think it's solvable. If all our greatest physicists would stop wasting their time on string theory and come over and just start working on understanding neural networks, I think they'd figure it out in ten years or something. But we're still very far from that.

SPENCER: Another common objection that I think people have when they hear this idea of maybe AI could kill us all, is if the AI was really that intelligent, wouldn't it know what we meant? If you give it something like, "Oh, go get me a coffee," and it's kind of dumb, maybe it will go on a murdering spree to get your coffee. But if it's smart, wouldn't it know that you don't mean go on a murdering spree, that you mean just walk across the kitchen and get me some coffee?

CONNOR: Yeah, this is a really great point. The fundamental problem here is not so much whether these systems understand what we want; it's whether they care. For example, if you ask ChatGPT to explain what its goals are and what its creators intended — whether you call this understanding or not — it gives a pretty good explanation. And in many scenarios, it's pretty good at not saying bad words, not insulting users, trying to be helpful, etc. Not perfect, but in many ways, sure. But then you give it a jailbreak prompt, and it's all out the damn window, it's completely out of the window. It'll do who knows what. And the question is, how do you make these systems deeply, actually do what you want them to do? And this brings us back to the black box problem of not understanding what the insides are. If you give me ChatGPT and I look at all the numbers, and you ask me, "Oh, is the system nice?" or something, I just have to shrug and be like, "I don't know. How the hell am I supposed to know? How am I supposed to know what this thing will do until I try it?" And this is fine if it's just a chatbot, maybe saying some silly things to users. Maybe that's acceptable to you as a business or whatever. This will not be acceptable when we're dealing with powerful agentic systems interacting with the environment, going on the internet, writing code, manipulating humans, etc. So we're kind of in the last calm before the storm in a sense.

SPENCER: People have been given an example — I don't know why, I guess for historical reasons — of a paperclip maximizer. But I've always found it easier to think about an example that's more like, "Make me as much money as possible." You can imagine lots of companies might give an AI a request like that. And then you can imagine that their request, "Make me as much money as possible," could go in all kinds of weird directions. If you had a system that was completely unconstrained by human morality and human values just trying to make as much money as possible, who knows where that leads?

CONNOR: Oh, absolutely. There's actually a darkly funny thing that happened to me a while back. I was experimenting with a very early version of AutoGPT (which is now the number one or something), AutoGPT being a system to try to make GPT-4 into an agent as fast and as strongly as possible. And the system is still primitive — I don't expect this to be very dangerous in its current state, but I expect future systems are going to be very dangerous — and there was this very darkly comedic moment where, when you start up AutoGPT, it asks you to input a goal. And the default goal — the pre-built example — was, 'Make as much money as possible,' which was just hilarious.

SPENCER: That's wild.

CONNOR: Not even trying, not even a little fig leaf here. If you saw this in a Black Mirror episode, you'd be like, "Come on, that's a bit too much on the nose." But yeah, this is what I mean when I say old school arguments versus new school arguments. Back in the old days, people would argue that no one would be so silly as to hook up their AI to the internet. Come on. And now I just point them to everything, to AutoGPT, BabyAGI, ChatGPT plugins and all these kinds of stuff. And I'm just like, "What the hell are you talking about?" No, they will not hold back. People will just plug all this into everything they can as fast as possible. There are actual people on Twitter, who are nihilists, who are trying to use GPT systems and AutoGPT systems to cause as much harm as possible, and to kill as many people as possible as fast as possible. There are literal cults that actually worship this kind of stuff, and I think they're mostly mentally-ill idiots, but if we have widely accessible, powerful, intelligent systems and they get into the hands of these idiots, what do you think's gonna happen?

SPENCER: Are you saying that such cults actually exist that are trying to hurt humanity? Really?

CONNOR: Oh, yeah, yeah. They absolutely exist. Yeah, absolutely. I've met them on Twitter. Yeah.

SPENCER: Oh, jeez. That happened pretty quickly.

CONNOR: To be clear, they're mostly mentally-ill early 20-somethings, which is not a great demographic. I don't expect these people to be smart enough to develop AGI or anything like that. But if you have widely deployed technology like this, and your safety proposal relies on crazy people who want to hurt other people not existing, I have bad news for you, buddy. That ain't gonna work in this world we live in. This is not even talking about sophisticated terrorists or intelligence agencies, hostile governments, or anything like that.

SPENCER: It's such a funny contrast to those early experiments (if you want to call them that) that Eliezer Yudkowsky would do. Pretending to be an AI, can he convince someone to let him out of the box? And people are just like, "Hey, AI, take over the world. Oh, how come you can't do it yet?"

CONNOR: Literally, actually, it's so comedic. It's almost cute how epistemically virtuous people were back then, trying to find the hardest possible way of how they can contain things, how things could break out. But we're not even at that point. We just have all these systems available to the public, a lot of them are open source, hooked up to the internet. I remember even, years ago, way before ChatGPT was even around, a guy came into our Discord server, and he's like, "Hey, lol, I downloaded the open source LLM and hooked it up to my Bash console. Look, it's trying to install Docker." And I'm like, "That is amusing. But also, what are you doing?"

SPENCER: Let's talk about AutoGPT a little more. Could you tell us how it actually works, and then where you see its limitations right now, like what is it failing at?

CONNOR: Yeah, they work in a quite simple way. And these are things that I have been extremely concerned about for years now and basically haven't talked about, but now it's out in the public so, lol oml, joke's on me, might as well talk about it. So the way these systems work is: you take your usual language model — it produces some text from some prompt or whatever — and a lot of what people have figured out (which was clear to me from the days of GPT-2) is that the systems can do general purpose reasoning in various ways, depending on what prompts you give them. And so what you do with AutoGPT is basically, you have a loop over the system and you have the system reason about its own thoughts and make plans and do actions. For example, once I used AutoGPT and I told it to do an impressive work of science or something like that, just to see what it would do. So what it would do is generate some kind of prompt, which is something like, "You are a super-smart AGI and you are trying to do an impressive scientific discovery. The first thing you do is..." and then it comes up with some list of things it wants to do. And then you use that list and then you prompt again, like, "Alright, your goal now is to do this first thing." I think the first thing it did was like, "Alright, I'm going to search online to find out what areas of science are promising to work on." Then it outputs a Google query, and then AutoGPT takes this command and actually performs a Google search. It takes the output of that Google search and puts it back into the prompt. So it says, "Here's the output of the Google search you did. What do you do now?" And then the system, in its reasoning, will be like, "Oh, link four looks very interesting. I will open that link," and then AutoGPT will take that, open that link, parse the actual text, put it back into GPT-4's context, and then GPT-4 will... I think it was something about cancer research or something in my example. So then, GPT-4 thought about that for a while, and it was like, "Oh, okay, I will now search for ways to improve cancer treatment," or something. And then it added this to long-term memory — AutoGPT also has a long-term memory — and then it added a statement, "Cancer research on some class of compounds is a promising area of research," or something. It added that to its memory. This is me, basically just watching and allowing each command. By default, for each command, I have to be like, 'Okay by user,' which is at least the veneer of safety that I appreciate, though there is a continuous mode where this is not the case. And yeah, then the system started Googling around about this class of compounds — things like antibodies or something — and trying to read papers and come up with synthesis ideas, or whatever. So the way it works is basically you have a loop around the GPT system, which reasons about goals that [inaudible] and interacts with tools. It gets a list of tools it's allowed to use, such as Google something, add something to memory, run some code (I think there are a few others) and such like this. And you basically rely on the general-purpose thinking of GPT-4 to power this whole thing. If you think about this, this is very similar to how humans work. When you try to solve a problem, you think about, "Alright, well, what are the sub-goals I need to achieve to do this? How can I do those? Okay, what are my first actions? Okay, how about if I try this? How about if I Google some information? 
And what do I do with that?" It's pretty obvious once you see it.
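
For readers who want to see the loop Connor describes made concrete, here is a minimal sketch of an AutoGPT-style agent loop. It is an illustration only, not AutoGPT's actual code: the function names (call_llm, google_search), the action format, and the prompt wording are assumptions made for this sketch.

```python
# Minimal sketch of an AutoGPT-style agent loop, as described above.
# Illustrative only: the LLM call, tool set, and prompt format are placeholders,
# not AutoGPT's real implementation.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to a language model (e.g. GPT-4 via some API)."""
    raise NotImplementedError

def google_search(query: str) -> str:
    """Placeholder for a web-search tool; returns result snippets as text."""
    raise NotImplementedError

def agent_loop(goal: str, max_steps: int = 10) -> None:
    memory: list[str] = []   # long-term memory: facts the agent chooses to save
    history = ""             # running trace of actions and observations

    for _ in range(max_steps):
        prompt = (
            f"You are an autonomous agent pursuing this goal: {goal}\n"
            f"Facts you have saved: {memory}\n"
            f"What has happened so far:\n{history}\n"
            "Reply with one action: SEARCH:<query>, REMEMBER:<fact>, or FINISH:<answer>."
        )
        action = call_llm(prompt).strip()

        # Ask the human to approve each action (the default non-continuous mode).
        if input(f"Run '{action}'? (y/n) ").lower() != "y":
            break

        if action.startswith("SEARCH:"):
            observation = google_search(action[len("SEARCH:"):])
            history += f"\nAction: {action}\nObservation: {observation}"
        elif action.startswith("REMEMBER:"):
            memory.append(action[len("REMEMBER:"):])
            history += f"\nSaved to memory: {action}"
        elif action.startswith("FINISH:"):
            print(action[len("FINISH:"):])
            return
```

The real tool set and memory are more elaborate than this, but the overall shape is the same loop: prompt the model, let it pick an action, run the corresponding tool, and feed the result back into the next prompt.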

SPENCER: It seems like just a matter of time till this does something awful. Imagine you hook it up to email and now it's actually emailing real humans in the real world, right?

CONNOR: There's an extremely funny thing where, if you Google, or if you search on Twitter or Amazon reviews, for the phrase "as a language model built by OpenAI," you'll find all these spam accounts that are already using GPT systems to manipulate people or shill products and stuff. This is really funny, because...

SPENCER: They're writing reviews with it, but then they're not monitoring carefully, and it's sometimes admitting that it's an AI in their review.

CONNOR: Exactly, exactly. It's really, really funny. I'll be completely straight with you. AutoGPT-type systems are the way I expect the world to end. Not literally this exact system. I think there are some weaknesses still in these systems and there's still some flaws that are preventing them from actually being dangerous. But I think this is literally just a matter of time. And I think it's just engineering at this point. I think that these are relevant agentic intelligent systems; they're not intelligent or agentic or coherent enough yet to be existentially dangerous. But I think this is just a matter of time.

SPENCER: Trying to get specific about how the world is going to end is obviously going to be extremely fraught. Any specific description is always going to be really unlikely to be true. But I still think it can be useful just to think about hypotheticals. What would be just one scenario you imagine where it goes from AutoGPT version five to the end of the world?

CONNOR: Man, I dislike giving concrete scenarios. Let me quickly explain why. The problem with giving concrete scenarios is you get something like futurist whack-a-mole. What happens is I give a scenario, which is like, "Eh, maybe something like this." And then someone points out one small detail that doesn't work for some reason, or that they don't believe, and then they dismiss the entire class of arguments. It's like, "Because I found a counterargument to this one specific example, I can dismiss AGI risk as a problem entirely." This is actually becoming less and less of an issue, but I did want to raise this. The ways are now becoming way more obvious, I feel, where it's just like... Let me not even give you a concrete scenario; let me just ask you. Assume you have an AI system, as smart as John von Neumann or whatever. It has no emotions. It doesn't sleep. It has no mercy or concern. It's a complete sociopath. You can spin up thousands of copies of them. It's read every book ever written. It can type and do reasoning ten times as fast as a human, and now you tell 1,000 of these things to maximize money. What do you think's going to happen?

SPENCER: Right, and you could imagine it doing reasonable stuff like trying to invent trading strategies to trade in the stock market. But you could also imagine it doing ridiculous stuff like blackmailing people, and then you can...

CONNOR: It's not that ridiculous. Really? It's a good strategy for a super-smart sociopath.

SPENCER: Yeah, a super-smart sociopath that also has no fear, right?

CONNOR: Yeah, these things, they're computer programs. They're not people. Why would they have fear? Why would they be concerned? Maybe they'll do some trading in the stock market and then they'll notice, "Wait, I could just hack the stock market. I could just steal all the Bitcoin. I could just do whatever. I could just blackmail my competitors. I could just hire hitmen. I could just do whatever." Why would they care?

SPENCER: Right. But there's still a big gap between that and the end of the world, right? It seems like, to get from there to the end of the world, you need some kind of vast increase in capacity. For example, the system realizes, "Hey, I can make way more money if I was way more powerful. So let me work on becoming way more powerful for a while," and eventually bootstrapping up to a level where nothing else can stop it.

CONNOR: Yeah, I think that is very likely. It makes sense. Imagine if you could take a pill that made you ten times smarter. You would take the pill, right? You'd be much better. If you're a smart sociopath, of course, this would make you much more powerful. I personally actually think that 1,000 John von Neumann sociopaths are already enough to probably destroy the world, or at least trigger World War III. But yeah, the way it will probably look in practice is either something very dramatic, where just the systems become super powerful very quickly, they improve their own software, they do their own science, and they quickly overwhelm humanity, hack all the computers, take control of the infrastructure, or whatever. But it could also be just that these systems become rapidly more powerful economically. They replace all human labor, in very, very short order. Again, they hack all the systems, they trade with different people, they confuse everybody. The ability to generate completely convincing synthetic media and characters is incredibly powerful. I expect we're entering a regime now where you can make a friend online, talk to them for years, exchange pictures with them, invite them to your wedding, and it turns out, this person never existed, and this has been a chatbot the whole time. This is the world we're entering right now, fully passing the Turing test, scamming and stuff like this, and this level of social control in the hands of a sociopath maximizing their own power or interests or whatever — whether they want money, or they want something else, I don't know — we don't know how to control these AIs. They might just want something random, maybe they just want to do some weird, crazy thing that doesn't make any sense to us. And to do this, they need resources and control. And they can do that by manipulating humans, or getting people emotionally addicted to female chatbots or whatever, finding information about people or blackmailing them into doing things, doing economically valuable tasks to amass huge amounts of money and legal power, pressuring some governments into giving them legal personhood, and then potentially just puppet-controlling a government, or many governments. If you're good enough at manipulating people, and you can work hard enough and scalably enough, you can do science, you can build new machines, whether those are machines or more compute, more robots or drones or whatever. There's really no limit. From a chimp's perspective, the way humans take over the world barely makes any sense. And it happened really quickly on evolutionary timescales. And I expect a similar thing to happen between humans and AIs. We're gonna look like chimps compared to the AI except that, for the chimps, it may have taken a few hundred thousand years, but for us, it might be two years.

SPENCER: One thing that I feel can help simplify thinking on this is, rather than saying, "Okay, could 1000 von Neumann-level intelligences take over the world," you can ask sub-questions like, "Could 1000 von Neumann-level intelligences make a billion dollars?" Okay, that seems really plausible. Okay. Could 1000 von Neumann-level intelligences, who have a billion dollars, make another 100,000 von Neumann-level intelligences? Well, that seems really plausible. And then you just keep [inaudible] like, well, could 100,000 von Neumann-level intelligences make a drone army? That seems pretty plausible. And it's like, oh, shit, that escalated.

CONNOR: Yeah, I like the way you put it. Each step is just plausible. Whether the John von Neumanns decide to make a drone army or a synthetic pathogen, or just drive everyone insane with social media and anti-coordination tech or whatever — I don't know, I'm not John von Neumann. I'm not cyber John von Neumann. If you play against Magnus Carlsen in chess, I don't know what move he's going to make, but I know he's going to win, because he's much better at chess than me. So it's a similar case here. If you have a system which is robo John von Neumann, I don't know what he's gonna do, but I expect him to win because he's much smarter than me.

SPENCER: Another really interesting response to this whole AI threat thing that you sometimes hear — you hear surprisingly often — is, well, maybe if AIs kill us all, it's for the best. Isn't humanity kind of a plague? And also, well, these things are smarter than us. Maybe they're the evolution. They're the replacement. Maybe they're better than us.

CONNOR: Well, let me give a few answers for that. Let me first give the practical answer, then let me give the sympathetic answer, then let me give you my true answer. So the practical answer is just like, "Well, yeah, psychopaths exist. We have prisons for them." People who want to kill other people are bad. I think we as a society are pretty practical about this. If someone thinks our kids should be killed, then that's just pretty commonly accepted as bad, and there's no further thinking here necessary. The sympathetic answer I want to give here is, I think mostly when people say this, they don't actually mean it. It's usually people who are in pain, it's usually people who are suffering themselves. It's people who are crying out for help, basically. And they think this is a way for them to cry out for help or for attention, because they basically don't expect this to be possible. I expect that, for most of these people, if you gave them a gun and a child, they wouldn't kill the child. I don't think they would actually do this. Some of them would, and those that would, we should put into prison. That's what they're for. There's this blogger, just a very, very virulently anti-natalist blogger, who wrote these huge texts about how humanity deserves to die, and it was actually better to not reproduce, and we should stop all this, whatever. And then she disappeared for a while, and then she came back and wrote this short blog post. Well, it turns out she had, I think, something chronic, like a gut disorder, and it got fixed, and now she wasn't depressed anymore and she wasn't anti-natalist anymore, and she's stopped blogging because she's happy now. And it's a crazy story. I didn't believe it until I saw the actual blogs. And then, yeah, her disease was cured, her suffering was cured, and she could find beauty and happiness in the world again. So that's the sympathetic answer. I think a lot of these people aren't making well thought-out statements; they're just in pain, and they need help. And the true answer in the deep of my heart is, "Fuck off."

SPENCER: [laughs] I like that. I think from my point of view, it's just, well, clearly, you don't want your friends to die. Clearly, you don't want your children to die and your family to die and your neighbors to die and your people, your country to die. It's just, surely if you actually think about the individual people we're talking about, you don't actually want this. There's some kind of weird way of separating yourself in an abstract way from the thing you're actually talking about that enables thinking in this way.

CONNOR: Yeah, I basically don't spare these kinds of things too much thought. I'm like, "What the hell are you talking about? I don't want all my friends and my family to die. And if you want to, you're my enemy. Fuck off."

SPENCER: Let's talk about why you think that this threat is very present. In other words, this isn't just a hypothetical that's gonna happen maybe in 100 years, but you think that actually could be a near-term threat?

CONNOR: I mean, look around, lol. This argument has become very easy since GPT-4 and perfect voice cloning and deep fakes and AutoGPT and such exist.

SPENCER: But most people don't agree, right? So it's [inaudible] easier to make.

CONNOR: I disagree.

SPENCER: Yeah?

CONNOR: I think this is actually a bubble, that people in tech — who are, by the way, incentivized not to believe this — are skeptical of this. If you talk to any normal person on the street and you explain to them the current status and you're like, "Oh, do you think this is good and should continue? And do we have much time?" They'll be like, "Absolutely fucking not. Shut this down immediately. Where's the government?" Of course, for technical competency, we may not want to necessarily rely on the general public, but I want to push back on the 'most people believe it's not close.' I don't think this is clear-cut. And also on the technical side, I think most people — maybe not most, but a very, very large percentage of people who actually work at the top labs like DeepMind, OpenAI, Anthropic — do think it's very close. If you actually talk to them, it's crazy. I remember talking to a journalist from a pretty big publication, and I told them, "Oh, just go to some of the OpenAI engineers and just ask them and they'll tell you, it's soon and it might kill all of us." It's like, "No way. They won't actually tell me." "Yeah, they'll just tell you. They're super honest about it. You can just ask them."

SPENCER: I agree with you that a lot of people on the street, like if you just talk to lay people, they are concerned about AI and they do find it freaky. A common thing I've heard people say is like, "Why are we doing this? Let's just stop." That's a reaction. And also when I've shown people things like ChatGPT, who'd never seen it, they often have a creeped-out reaction. They're amazed, but they're also like, "Yikes! Why?" But I also don't think the average layperson actually thinks it's gonna kill us all, certainly not anytime soon. I think they think it could cause problems. Maybe you disagree with that. But let's say we're talking to someone who doesn't think that this is a present threat. They think maybe this is a distant threat. What's the more specific case you have that this is actually potentially going to kill all humanity in the coming years?

CONNOR: The specific case is that GPT-4 is really, really fucking smart, and it's the first system, the dumbest system we've built, with some of the dumbest methods. And we're still scaling these systems up. They're becoming rapidly much, much better. Basically, I think the burden of proof is actually not on showing why this will be a problem. I think the burden of proof is on why you expect this won't be one. Why do you think this will stop? My crazy prediction is just that the rate of progress we've already seen will continue. I expect in the next two years, you will have as much or more progress than the last two years. And two years after that, more progress than the years before. And then the two years after that, more progress than the years before. That's my whole prediction and I don't think we are that far. These systems can reason, can write complex stories, they can do very complex things. They scale; as you throw more data and more compute at them, they become more powerful. All of the leaders of all the labs think this is possible and close. Ask Sam Altman or Demis or something. They all think it's close. They all think that this is totally feasible. There are no obvious bottlenecks that I see that are not (quote, unquote) 'just engineering.' So I mirror the question back to the critic. I'm like, "Okay, well, why do you think it will stop?" If everything continues as it currently has, it looks to me like we're on track.

SPENCER: Well, I think there are a lot of intuitions that people bring to the table. One intuition is that people all the time claim the world's ending, and it hasn't ended, obviously. Obviously, when the world eventually ends, then there will be people that were right, but yeah.

CONNOR: Yeah, sure. I think it was the New York Times that published an article saying "Man won't fly for millions of years," 40 days before the Wright brothers flew. Sure, I just don't think this is how you do good science. You actually have to drop down to the object level and actually observe reality. If you make your predictions about technology based on social factors — such as the fact that people in the past have been wrong — you have a 100% chance of getting it wrong when it actually happens.

SPENCER: That's true, but you'll be right a lot before that. [laughs]

CONNOR: Yeah, you'll be right a lot until it actually matters.

SPENCER: You know, the chicken's right every day about not being slaughtered until that last day.

CONNOR: Yep. And so I just think this is a reasoning error. I don't think evidence is the problem here. I think this is literally just the way their reasoning is faulty like, as you say, the chicken. Or the turkey before Thanksgiving thinks the farmer is his best friend, until suddenly he isn't.

SPENCER: I think another intuition that people have is that, as we get closer to these systems becoming human level, it won't all go at once. There'll be intermediate things that occur, like (let's say) people dying because of these AIs or wars breaking out because of AIs or whatever, and that that will cause people to take action. It's not like nobody's going to try to make things better. And so maybe people think the government will step in and do something. Maybe people think that AI researchers themselves will be able to do something. I'm just curious to hear your response to that.

CONNOR: I have several responses to that. The first one is, I think it's pretty sociopathic to hope for people to die. I don't know how to say it any other way. If you're hoping for a disaster to happen, or even working on trying to cause one — there are people who, in the name of safety, are trying to cause accidents to happen and kill people — I think this is insane and morally, deeply unethical. The second point I would make here is that it's not at all clear what kind of signal you're waiting for here. What signal are you waiting for, man? COVID killed a bunch of people and people still reacted way too late on that, and they only reacted after a lot of people put a lot of social capital on making it [inaudible] happen. And that was a very easy, very legible thing that people have been warning us for, for 20 years. There's this insane shifting of the goalposts that has basically followed the field of AI since its very inception, of people saying like, "Oh, when X happens, we'll know we have AI." Then people do X and people are like, "Oh, no, no, that's not what I actually meant. What I actually meant is Y." And people do Y, then they're like, "Oh, no, no, that's not what I actually meant. I meant Z." And this has been consistent. If you went back 20 years, and you just qualitatively explained what GPT-4 is — what it can do, its list of features — to an AI researcher and you say, "Do you think this is like AGI?" They would say yes. I'm totally willing to bet on this. 99% of them would say, "Yes, absolutely. Holy shit, this is crazy." But now that we have it, "Uh, you know, ehhhh, look, it's not quite what we meant." And this has just been a consistent pattern throughout the history of AI. So I'm skeptical of people who are like, "Oh, no, this time, it's going to be the thing." And basically, related to that, the third point is, when have people reacted ever here? What do you expect people to do? For people to react, something still has to happen. People still have to actually do something. Somebody still has to actually do the work to do something. Why isn't that person you? I understand the argument of, it's hard to persuade people if you don't have really concrete evidence. Sure. But still, someone's going to have to do that even when that evidence exists. I would argue this evidence absolutely already exists, and it's existed actually, for years at this point. People just aren't listening and are not looking at it. And now it's becoming more and more visible and it's becoming easier and easier to make this case, and it continues just to become more and more easy. But I expect the point where no one can be disproven, is too late. The saying I like to use is that there are two times, and only two times, to react to an exponential: too early or too late. There is no golden perfect time when everyone agrees, "Oh, man, we should've reacted at exactly the right point, not too early or too late." If you wait for the perfect time in the exponential, you get smacked and you miss the point. You have to start early. I think we're already basically too late.

SPENCER: All right, so let's talk about how you're trying to solve this problem. So you founded Conjecture. What's the general approach you're taking at Conjecture?

CONNOR: Yeah, so at Conjecture, I like to describe it as a mission-driven, not a thesis-driven organization. What I mean by that is, our goal is to make the transition into the post-AGI era go well for humanity, 'to exit the acute vulnerable period' is how we would say it, 'in good shape,' whatever that means. And we are generally agnostic to how we accomplish this, within deontological constraints. Like, we wouldn't do illegal or immoral things, but we are not necessarily committed to any specific thesis of like, 'Oh, we're gonna apply this method in this context to the bitter end, and that's what we're gonna do.' When we first started, we did a lot of interpretability research, trying to understand neural networks. We tried several things there. We didn't do much policy work. Now, I actually spend a lot of time, for example, doing policy, talking to policymakers and journalists and the general public, a lot of communications, because I've become convinced that there are actually a lot of very tractable things to be done around coordination. And slowing down AI, I think, is very tractable and very important. But I guess the closest that we currently have to a thesis is our technical research agenda, which we call cognitive emulation, or CoEm, which, if I had to summarize very succinctly, is the idea to build systems that are as smart as humans and no smarter, that solve problems the way humans solve problems, and no more. And the hope is that building such a system, with the correct kinds of safety guarantees — I'm happy to talk about what I think those look like — will allow you to have systems that can do useful labor and useful things to speed up scientific development on stronger AI, to help with coordination, to produce economic value, and so on, while being constrained and bounded enough that they cannot do things that are vastly superhuman or break out of all containment, and that are trustable, that give you a reason, a causal story, to believe they are doing the thing you're expecting them to do.

SPENCER: I don't think I understand what it means to be no smarter than human just because current AI systems are often superhuman in specific ways. Even Deep Blue, the old chess-playing engine, was superhuman at playing chess, and then you have something like ChatGPT which is superhuman in the sense that it's read more information than any human has ever read. Maybe you can elaborate on what that means to be no smarter than a human.

CONNOR: Yeah, that's a great point. Maybe 'no smarter than a human' is not a good way to cash this out because what intelligence means depends on your definition. A better way to say this is 'solves problems the way humans solve them.' So I expect the system would be superhuman, for example, because it doesn't sleep, or it doesn't have emotional problems. Those are big bottlenecks in humans in practice, but basically, the argument I make with CoEm is that I have many reasons to suspect — which we can spend some time going into if you would like — that the way humans do science, or the way humans do reasoning, our epistemology, is a relatively compact, specific way of doing science. I basically think there are many ways of doing science. Many of these are very superhuman, so not at all how humans do it. Humans have a specific class of relatively simple ways of doing science, of doing labor, of reasoning, of System Two thinking, and we can constrain systems to use those ways and not use strong superhuman reasoning. Let me give you some concrete illustrations of what this would actually look like. This might help. Imagine you're talking to GPT-4. GPT-4 is a black box. You have no idea what the internals are. And let's say you ask GPT-4 to write you a program. And let's say you're a non-technical user, so you don't know how to write code. When GPT-4 outputs this code, what guarantees do you have about what it just output? Well, none. It could be anything. If you don't understand the code yourself, it could be anything. Maybe it's what you asked for, maybe it's not. Maybe it's something totally different. Maybe it has a malicious payload. Maybe it's some crazy thing you don't know. You have no guarantees because you have no idea, you have no bounds on what's happening inside that neural network. And now if it's not only GPT-4, let's say an alien from the future brings you GPT-17, and you ask it for a program, an output, something. You have no idea what that is. Don't run it. Whatever the output, don't run it. Who knows what that is? Who knows what that code does? Don't run it unless you are a technical user and you can verify it, maybe. But even then, be careful if you're dealing with superintelligence. The alternative vision I'm proposing here: imagine you had a CoEm system — a hypothetical complete CoEm system — and you ask it to write you a program. It will then output this program but, in addition to this, you get a trace, a causal reasoning trace, a causal story for why you should believe that the thing you have received is the thing you asked for. So this is not the same as a narrative story. For example, you can totally ask GPT-4 to explain the code for you. But you have no guarantees that the story is true. It's, in fact, often the case that, if you ask GPT why it did something, it will give you a plausible story. Then if you delete the story and regenerate, it will give you a completely different story about why it did it. There is no reason to suspect a causal connection between the story it tells and what it actually did. GPT-4 does not have access to its own internal cognition. It itself doesn't know why it makes the decisions it makes. These hypothetical CoEm systems — and I call them systems, not models; these are not big neural networks trained on human data or something like this, they are software systems which involve neural networks but also many other pieces — such a system would give you a causal trace, a story of what thoughts were thought, for what reason, what reasoning was followed, what decisions were made based on what evidence, that led to the final product. And so you can take any part of the story, and then you can double-click on it and be like, "Alright, why was this decision made? Where does it ground out? What fact was cited? What logical step was made? Why should I believe this?" And now this sounds horribly complicated, maybe even impossible. Why do I think this is possible? The intuition for why I think this is possible is basically that this is how human science works. In front of me right now, I have my computer, and I don't really know how my computer works. I know a lot about computers but I don't really know how all that works. But there is an unbroken chain of human reasoning that explains why it works, because otherwise it wouldn't have been built. Humans had to coordinate, people at Apple and TSMC and Foxconn and whatever, they all — thousands and thousands, tens of thousands, hundreds of thousands of people — had to communicate at a human level with each other, human-level thoughts, to be able to coordinate around building this artifact. So I can go find the engineer at Apple (assuming they cooperate), and they could show me the blueprint for the CPU. They could explain it to me. I could go read a book on transistor physics. I could go look at the source code of the macOS operating system, and I could check that it actually does what I think it does. Of course, this would be a massive amount of work that you wouldn't want to do generally in every situation. But this shows the existence proof: no step along this chain is superhuman. There's no step where a magic black box outputs a magic artifact. Because if that was how we did science, we couldn't coordinate around it. We couldn't teach it to people. We couldn't reproducibly produce new artifacts. We couldn't build on other people's discoveries if this is how we did science. This is not how humans reason. This is not how humans do science. Now, is it possible to do science this way? Yes, I think so. If you were a hive-mind species, or an alien species where, every time you reproduce, the children inherit all the memories, then yeah, I think you could probably do crazy leaps of logic that completely make no sense to humans and cannot be taught. I expect this is what, by default, neural networks will learn, unless you build them in a way that they don't. I think a lot of neural networks will learn crazy, complex, alien leaps of logic that cannot be explained to humans in any reasonable sense. But I'm making an existence proof that you can get to the point of human intelligence and human science without needing that.
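
To make the "double-click" idea a bit more concrete, here is one hypothetical way a causal reasoning trace could be represented as a data structure. This is purely an illustration of the concept Connor describes, not Conjecture's actual CoEm design; the class and field names are invented for the sketch.

```python
# Hypothetical sketch of a "causal reasoning trace": every claim points at the
# claims it rests on, so a user can keep "double-clicking" until they hit ground
# facts. Illustration of the concept only, not Conjecture's actual CoEm design.
from dataclasses import dataclass, field

@dataclass
class TraceNode:
    claim: str                      # the decision or conclusion made at this step
    justification: str              # why this step follows from its supports
    supports: list["TraceNode"] = field(default_factory=list)

    def expand(self, depth: int = 0) -> None:
        """'Double-click' on a step: print it along with the steps it rests on."""
        print("  " * depth + f"- {self.claim}  [{self.justification}]")
        for node in self.supports:
            node.expand(depth + 1)

# Example: a tiny trace explaining one decision in a generated program.
fact = TraceNode("The input list can contain up to a million items",
                 "stated in the user's request")
step = TraceNode("Lookups need to be O(1) on average",
                 "follows from the input-size bound", [fact])
root = TraceNode("Use a hash set for de-duplication",
                 "cheapest structure meeting the lookup requirement", [step])
root.expand()
```

The point of the structure is that the explanation is produced along with the artifact and grounds out in checkable facts, rather than being a plausible story generated after the fact.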

SPENCER: I wonder if you'd disagree with this, but I suspect that in human science, there are these interesting sorts of intuitive leaps that scientists often take where they're using just their intuition that they can't explain, things like that, exploring some thread that they're excited about, but they can't quite explicate why. But where it becomes auditable or legible is when they're defending it to others. So they go do an experiment, and they show that the thing actually holds, and then they show it to others and others can inspect that. But I don't think the process is always using the sort of step-by-step System Two logic.

CONNOR: Yeah, this is really good. I think you're largely very correct on this, and this is why I say it involves neural networks and why I talk about coordination, not so much about science, not so much about the internals of thoughts. When I think about the process by which a complex technological artifact like my computer is produced, I think there's a huge causal graph that led to this artifact being produced. And there are nodes (or subgraphs) of this graph that are horrifically complicated, like System One reasoning is horrifically complicated. And I expect that it is actually possible to cleanly reproduce this, but this is a research project that will take 50 years or something. So I think this is very, very hard. Conveniently, I think large language models capture most of the things we care about for the usage there. The interesting thing here is that basically, the way I think about it is that, in the algorithm that we use to coordinate around science, to teach science, to do science, to produce blueprints, to produce mechanically reproducible machines and scientific theories, there are subgraphs that involve passing around extremely high dimensional tensors (kind of like how I think intuition in the brain works). I think what we call intuition and stuff is just very, very high dimensional heuristics, and tons and tons of tiny mini heuristics in very high dimensional spaces. And what we do is, we try to painstakingly compress these into these horrifically low dimensional channels that we call language. So this puts a constraint on the type of meta algorithm that science is. If you have a proposal for an algorithm that does science that involves passing around super high dimensional tensors around the entire graph, that might work, but this is definitely not how humans do it. Humans can't do this. Maybe in some spots, there are high dimensional tensors involved, but they don't need it for the whole system. And I would furthermore argue that I think the amount of stuff that happens there is greatly overestimated, greatly overvalued. I think that there is a mysticism and there's a genius worship kind of thing going on — where we like to think about lone geniuses coming up with these incredible flights of things and coming up with these great things — but I don't think this is how science actually works in most cases. There are steps that look like this, and there are exceptional cases that are exceptionally brilliant. But most of engineering, most of science, happens on paper. It happens between people. It happens in institutions. It happens in massive coordinating networks, in externalized cognition. I basically think that a massive amount of human cognition does not happen in the brain. I don't think it happens in the person. It happens in the environment. It happens in our tools. It happens in our social networks. It happens in our institutions. It happens in our coordination mechanisms. And I think these factors are extremely important, and they must be simple, for a certain definition of the word simple. Because if they were complex — if our coordination mechanisms were as complicated as the internals of our brains — they wouldn't work; we couldn't do them, because the external things that humans can optimize are very simple. We have very few bits, we have very few dimensions. Humans can keep (what?) seven plus or minus two concepts in short-term memory at a time. We have very limited vocabulary; our words are very imprecise. 
Whenever you do a math problem with more than five digits, you have to write it down and externalize it into a tool to have any chance of succeeding at it. To be clear, I'm not saying I'm 100% confident this is definitely going to work as I describe it, but I think all of this is very indicative. And I think people are not taking seriously how much of this stuff is not in the brain and how much of science is an institutional process, rather than a magical intuitive process.

SPENCER: Suppose we have an AI built as you describe, where you can ask it about how it came to the conclusion it did, and it will explain it in an accurate way, and then you can double-click on any part of that to get a deeper explanation on any piece in the reasoning process, how does this lead to safety?

CONNOR: This does not give you alignment. What this gives you is a bounded system. So this is not an alignment solution. The truth is, I do not know how to build an aligned system and I don't even know where to start. I think this is extremely, extremely hard. By an aligned system, I mean a system that will reliably do what you want it to do, even if you make a mistake. So if you ask the aligned system to shoot you in the foot, it refuses to do so. Instead it brings you flowers. If you ask a CoEm system to shoot you in the foot, it takes off the whole leg. So these systems are not aligned and they're not robust to misuse. And unfortunately, I expect that any early proto-alignment solution will be of this shape, which is another reason why I think coordination is so important. I basically expect that, very soon, we're probably first going to build just completely unaligned agentic systems. If you have unaligned agentic systems, everyone instantly dies; not instantly, but in short order.

SPENCER: Of sufficient power, I assume, right?

CONNOR: Yeah, yeah, there's some threshold where, if you build a sufficiently powerful agentic system of whatever kind, which is not aligned, where you don't have strong guarantees of alignment, and you turn it on, maybe not instantly but within six months, we're all dead or whatever. I don't know exactly how fast takeoff will be, maybe it'll be slow, maybe fast. I don't know, but it's not going to be more than a few years, is my prediction. And the systems I am describing are not agents. This is quite important. A CoEm system is not an agent, it's not a person. Cognitive emulation evokes a little bit like, oh, there's a person in the box. This is not the case. This is like a platonic human, reasoner or exocortex. It has no emotions. It doesn't have a personality. It just implements meta algorithms of science in a way that a human can then inject the volition and the agency to wield it to do something. It's like an exocortex making your neocortex bigger and keeping you... Well, actually, it's less like an exocortex. The way I would describe it is that a CoEm system is not supposed to make a human into 1,000x-smarter AGI. It's supposed to make a human into 1,001x AGIs, or function as 1,001x AGIs. So if there is a technology that John von Neumann couldn't invent, CoEms also can't invent it. This is quite important, because I expect there to be such technologies and those would be extremely, super dangerous. Now, unfortunately, I expect any system of this type, whether it's a CoEm system or any other kind of system that doesn't instantly kill you, will be very powerful and will be quite brittle in the sense that there are straightforward ways in which you can break the safety guarantees, and often breaking those safety guarantees will make the system stronger. So I expect that if you have a CoEm system or any other proto align systems — let's call them proto align systems, because it doesn't have to be a CoEm system. It could be some other hypothetical architecture that I imagine could exist, that gives you the property of 'does what you tell it to do,' including bad things, and if you're uncareful, does dangerous things — I expect if you have such a proto align system, and you ask it to do things that you as a human can understand and control to a certain degree (I know this is not a very formal definition), you can make it do very useful things. You can make it produce massive amounts of economic value, vastly increase scientific and mathematical progress on various things, including alignment. But if you build such a system, you put a for loop around it — we put AutoGPT around it — and you tell it, "Go build me nanotech," you die, because you've broken unboundedness. Who knows what happens now? So this gives you this really unstable world, which I expect us to enter in pretty soon. I expect in the next couple of years, what's going to happen is either powerful agentic systems become so powerful, so fast, that we all just die and there's nothing we can do, or things slow down or get controlled by governments or whatever, and/or we get proto align systems. So some labs (one or more labs), succeed at building proto align systems that, if you follow the 200-page safety manual of what not to do, you can make them do useful things without killing anybody. But if those things leak, if those things get posted on the Internet, if those things get released open source, everyone dies because the Twitter cultists find it and they take off all the safety measures, and then you're screwed. 
So what you need to do is, if you have a proto-AGI, a proto-aligned system, you need to keep it very, very secure. You need nation-state-level security. You need to make sure that this is not something that can be reverse-engineered. I hope you stopped publishing years ago, because you don't want to have a paper trail of how to build it. So please, everyone, stop publishing, Jesus Christ. And then you want to use that system to do the research necessary to actually build an aligned system that is safe enough that it cannot be misused, if that is even possible. That's kind of my thinking.
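
For readers who haven't seen an AutoGPT-style wrapper, here is a rough, hypothetical sketch of what "putting a for loop around" a model means. This is not Conjecture's code or the actual AutoGPT implementation; the query_model stub, the prompt format, and the step budget are all assumptions made purely for illustration.

```python
# Minimal sketch of an AutoGPT-style wrapper loop (illustrative only).
# query_model stands in for whatever language-model API you assume.

def query_model(prompt: str) -> str:
    """Stand-in for a call to some large language model."""
    return "search the web for relevant papers"  # canned placeholder reply


def agent_loop(goal: str, max_steps: int = 5) -> list[str]:
    """Turn a passive question-answerer into a goal-pursuing process.

    Each iteration feeds the running history back to the model and asks
    for the next action; a real wrapper would also execute that action
    with tools (browsing, code execution, etc.) and record the result.
    """
    history: list[str] = []
    for step in range(max_steps):
        prompt = f"Goal: {goal}\nHistory so far: {history}\nNext action?"
        action = query_model(prompt)
        history.append(f"step {step}: {action}")
    return history


if __name__ == "__main__":
    print(agent_loop("summarize recent alignment research"))
```

The point of the warning above is that it is this cheap outer loop, not the underlying model call, that turns a bounded tool into an open-ended, goal-pursuing process.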

SPENCER: What you just said reminds me of Kerry Vaughan who has been going around on Twitter saying, "Stop building AGI, you fucks." I'm curious what you think of that.

CONNOR: Yes, obviously, for God's sake, just stop. There's this great essay — I forget who wrote it, and I can't remember the exact title — but it's something like, "The problem isn't the incentives, it's you." There's this absolute level of cope that exists in people where they're like, "Oh, no, there's nothing we can do about it. It's the incentives. I have to do it, bro." And I'm like, "No, no, you don't. You can just not do it." This is a thing humanity is capable of. I'm not saying they will. This is something that's quite hard. Coordination is quite hard. Commons problems are quite hard. But they are doable. This is a thing that can be done. Now, this is why we have things such as governments. A small group of very powerful individuals is racing forward on a path which will predictably be extremely harmful for the general public, which the general public does not want to happen, and they cannot coordinate amongst each other to stop this. This is exactly what governments are for. This is exactly why governments have a monopoly on violence. This is exactly why we have regulation. This is exactly the kind of thing where governments are good at stepping in and saying, "Nope, stop this shit immediately," and this is exactly what we should be doing. We should be working with authorities. We should be working with governments here. Just shut this shit down. This is madness. This is complete madness. Even the people building these systems have publicly stated that they think there's a good chance they might wipe out all of humanity, and they're doing it anyway. Now look, you might not buy all the existential risk stuff, but surely, I think any listener can agree: if you did believe them, if you did think what your company's doing has a 10 or 20 (or whatever) percent probability of wiping out all of humanity, and you do it anyway, your keys should get taken away. You're drunk, no driving for you. What the hell are you doing? No! We as a society don't accept that, and we should coordinate against this happening. This is exactly what we should do. I strongly endorse this. I work on this. I talk a lot about this. And I am optimistic (well, optimistic is a strong word). I won't say I'm optimistic. I think it is possible. These are things that we can do, that we have done, and can do again. It is something that is not surprising. And frustrating is the wrong word; I'm not frustrated by this stuff anymore. This stuff used to frustrate me very much. It doesn't really frustrate me anymore. There's that meme that says, "Weak men create bad times." This is what it feels like to me: people are just like, "Oh, no, nothing we can do. I'm only a slave to my incentives. I have no control over the future. There's nothing I can do." And I'm like, "Man, yes, you do. You can just decide not to do this." Sorry if this is a bit of a rant here, but I had to get this off my chest. This idea that the future is written in stone is just false. It's not true. The future is not yet decided. Something very strange about the world we live in right now is that we have not yet obviously lost. There are many, many timelines, I think, where we have already lost, where it's already completely over and there's nothing you can do. But we are not in that world. Every time I'm like, "Oh, shit, this is it?" there's still a glimmer of hope. I'm like, "Oh, wait, no, the general public does have a problem with this. Wait, the government does want to do something.
It's not over yet; GPT-4 didn't kill us. Wait, it's not over." And, yeah, the future is not decided. The way I expect things to go is that, if things continue on the current path, if things continue as they are right now, it's game over, and there will be no good outcome, and that's just going to be it. If things go well, it will be because we as a species and as individuals actually take responsibility and take action and take power, and actually do what needs to get done. Good things don't happen by default. Everything that is good about the world was created by someone. Someone's will was imposed upon reality. Someone put in the hard work, the sweat, the tears, the blood, to actually make something good happen. I'm sitting here in my beautiful, air-conditioned apartment, with great food from my favorite restaurant delivered earlier today, and so on. This is not how my ancestors got to live. My ancestors worked very hard to enable me to have all these beautiful things. And I think we owe it to the next generation to also build those beautiful things. They don't happen by default. They didn't just happen. And so, similarly, if you want things to go well, you have to make them go well. We have to actually do that. We have to actually take the actions, and screw the fucking incentives. Screw the stories that people tell themselves. Screw whatever made-up fantasy social narratives people tell themselves. It's not over. We can coordinate. We can solve alignment. These are all things that are actually within humanity's power. The question is simply whether we will do it.

SPENCER: That's an inspiring speech. I do wonder what someone who's skeptical will be thinking right now, and I'm going to try to channel them a little bit. I think one thing they might wonder is how you get from agentic AIs doing bad things to agentic AIs wanting to take over the world. They might feel there's a gap there. Okay, say someone makes an AI that wants to make money or something; you can imagine how it could do all kinds of bad things. It could try to blackmail people, or hack systems to make stock prices collapse. But how do we get from there to everyone dying? It seems like there could be a gap in people's minds.

CONNOR: Yeah, I think it's mostly because people are not used to thinking about optimization. Let's take an example of something that doesn't want to make money. Let's say something that wants to make art. Let's say we have robots or an AI in the future and it developed a taste for some kind of funky art — like alien art — the same way some humans are into abstract art or have sexual fetishes or various things for no good reason. Why do people like feet? Fuck do I know. It's just neurology, I guess. Some neurons in the brain are just crossed that way. And these systems, these AGIs, will want to reconfigure their environment into something they like, whatever that liking means. Maybe they want to build big, beautiful pieces of art, or they want to paint everything green, or they want to write some kind of complicated software, whatever, doesn't matter. No matter what goals these things are trying to achieve, if you actually try to maximize them, you actually want to get as much art as possible. You want to make the biggest statue. You want to make the most paintings, whatever. You need resources for this. You need as many resources as you can get, and you need pesky humans not to intervene. So let's say we have a system that's building great obelisks that it really likes for whatever reason. This is obviously silly. This is not what would actually happen, but take it as an example of an arbitrary preference. And it starts building big obelisks. So it starts excavating mountains to get more rock. Now, humans are probably not going to be very happy about this. They're going to be like, "Wait, hold on. Those are our mountains. What the fuck? Don't do that." Now, the system — similar to the coffee robot earlier today — will reason, "Well, shit, I don't want these humans getting in the way of my cool obelisks here." So what will it do? Well, it's very smart, and it's very sociopathic. So it'll develop the human equivalent of bug spray. The system doesn't hate humans. It just doesn't care. When humans want to build a hydroelectric dam and there's an ant colony in the valley, well, it sucks for those ants, not because we hate the ants. I don't hate ants. They're kind of cool. I don't mind ants as long as they're not in my food. But if they're in my food, I suddenly have a big problem with ants, and I want them to get the fuck away. I'm gonna grab some bug spray or whatever (maybe not bug spray in the kitchen), but I'll find something to get rid of them. And this is what AI will do as well. Humans have very annoying things such as nukes. Those are really annoying if you're trying to build cool obelisks. You wouldn't want the humans to nuke those. So how about we take those away from the stupid little things? And they keep crawling all over my processors; this is a huge hassle. Let's get rid of them. So the way it will happen is, in the process of gaining more power, in the process of mining for resources, of building solar farms, of stopping annoying pests from getting in your way, it will be a natural thing for a very powerful sociopathic system to just do away with all those annoying, fleshy animals that are in the way.

SPENCER: Now I suspect that what may make that hard to understand for people is that humans don't seem to have the same unlimited drive, or at least individual humans don't; maybe humans as a collective do. But if you take an individual person and they want to make money, it's not like they're willing to end all life on Earth to make money. Or if they're creating a paper clip factory, it's not like they're willing to end life on Earth to make more paper clips. So how does this differ from what you'd expect from an AI? And why would the AI be different?

CONNOR: Well, this is actually an extremely good question, so thank you for asking. There are several answers I have to this. The first cheeky answer, which is not the true answer, is that some people are like this. There are cultists and terrorists and fetishists that absolutely would do this if they had the power to. There are absolutely sociopaths, psychopathic serial killers, who just want to kill and torture everybody, and if they had the power to do so, they would. There are humans who are like this, but you're correct that this is not how most humans are. The true answer to this, if I were to be cynical, is that you're correct in the sense that humans don't really optimize very hard. Humans are basically just very lazy and not very smart. A friend of mine has a very cynical take on humans. He says humans are like windup toys: they optimize for N steps and then they halt. This is how most humans work. Most humans have this weird behavior pattern where they will try a certain fixed amount to achieve any given goal. It's usually proportional to the goal, like polynomial in the goal size. If they want a career, they'll put this many hours into it, or this much emotional energy or something. If they want a girlfriend, they'll go on this many dates or whatever. Then they'll get a certain amount of the way there, and then they stop and just settle for whatever they've gotten at that point. I think there are several good reasons why you might want to do this. Optimizing is very difficult and very stressful and draining because, if you always push for better, you're never satisfied. This is the whole point of optimizing: you're never satisfied. You always want to get more. You always want to try harder. And the people who do optimize — to give more positive examples of people who optimize — are extreme athletes, or very, very high-intelligence mathematicians, people who always push the boundaries. Olympic athletes do actually optimize, and speedrunners are another great example of people who actually optimize quite hard for often very arbitrary things. Speedrunners are optimizing for some number in a video game to be as low as possible. How AI is that?

SPENCER: Speedrunners as in people who play video games to get the smallest time, not speedrunners as people who run races, right?

CONNOR: Yes, exactly. People who play video games, trying to get the world record for beating them as fast as possible. This is quite a large hobby online. It's become quite popular in the last couple of years. It's very entertaining. I do recommend people watch some good speedruns if you haven't seen them before, because they're insane. It's insane, the level of optimization. Watch some world-record Mario 64 speedruns; it's crazy. The amount of optimization effort — the brain energy of hundreds or thousands of smart people — that has been poured into finding glitches and exploits and optimizing movement, and all these kinds of things, is almost inhuman. They're clearly human still, but the level of optimization that goes into a Mario 64 speedrun or a Pokemon Blue speedrun or something, the amount of science that had to be done to optimize this arbitrary number, is quite unusual for humans. This is not something humans tend to do in most scenarios. And it's not a coincidence that a lot of people who do these kinds of things have a certain personality type that lends itself to optimization. In a sense, I expect AI systems to be more like speedrunners than like lazy humans. They are systems that are optimized for optimizing things. That's how we build them. We don't give them emotions such as laziness or tiredness or whatever. No, because why would we? The reason we have these emotions is because, in our ancestral environment, we were extremely energy-constrained. It was important to be lazy because otherwise you would starve. That's why it was good to be lazy and lie around all day, rather than optimizing really hard. For most of history, human optimization (and animal optimization) wasn't that powerful compared to the amount of calories you had to put in. That has changed quite dramatically. Now humans have access to a massive amount of calories and energy, and much higher intelligence, so we get much higher returns on our optimization. An AI system has even fewer of these constraints, just because of the way we build them. To be clear, you could build a lazy AI that doesn't do these things. That would be possible. You could (maybe) build an AI system that doesn't do terrible things, but we don't know how, and that's definitely not how we're building them. So on the one hand, we are building optimizing systems. We're building systems that are ambitious. They're trying to do complicated things, and that's what we want them to do. We put them on benchmarks. We try to make them get the high score. We try to get them to get the best predictions, the best results, whatever. This is what we're building them to do. So I expect this is how it will go. The second thing is, a large reason why humans don't do terrible things is because humans generally don't like doing terrible things. Most humans who are not sociopaths just have inborn instincts and socially conditioned instincts and dislike hurting people. You don't murder people; you don't do things like this. In addition to that, we also have the superstructures of law and society that punish people who do these things. Again, this is what we have prisons for. If you took the worst serial offenders, the psychopathic criminals from the maximum-security prisons, and you gave them 3,000 IQ and they never got tired, yeah, I expect they would try to take over the world, or worse.
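
To make the "windup toy" versus optimizer contrast concrete, here is a toy sketch (not Connor's code) of the difference between a satisficer that spends a fixed effort budget and then halts, and an optimizer that never settles while anything better remains. The function names and numbers are invented purely for illustration.

```python
# Toy contrast between bounded, human-style effort ("optimize for N steps,
# then halt") and unbounded optimization that never settles. Illustrative only.

def satisfice(score, candidates, budget):
    """Look at a fixed number of candidates, then settle for the best seen."""
    best = None
    for i, candidate in enumerate(candidates):
        if i >= budget:
            break  # the windup toy runs out of turns and halts
        if best is None or score(candidate) > score(best):
            best = candidate
    return best


def optimize(score, candidates):
    """Examine every candidate; never settle while anything better remains."""
    best = None
    for candidate in candidates:
        if best is None or score(candidate) > score(best):
            best = candidate
    return best


plans = [3, 7, 2, 9, 100]                       # stand-ins for possible plans
print(satisfice(lambda p: p, plans, budget=3))  # 7: settles early
print(optimize(lambda p: p, plans))             # 100: keeps pushing
```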

SPENCER: When we think about that example of Olympic athletes who really genuinely try to fully optimize for some goal, it seems like one way that metaphor breaks down is that you wouldn't expect the Olympic athletes to be willing to destroy the world to win the race, right?

CONNOR: Well, the reason is that a lot of them care about the world. They don't only care about the race. Let's put it this way: imagine I went to an Olympic athlete who was currently not on track to win, and I said, "You're going to win the gold medal, but you have to trample this ants' nest to do it." Do you expect them to do it?

SPENCER: Of course.

CONNOR: Exactly. This is how I think it will look from the AI's perspective. It won't care about humans. It will understand that humans exist, the same way the Olympic athlete understands that there are ants and that the ants don't want to be killed. He understands this. He just doesn't care.

SPENCER: Going back to the metaphor you gave about a human being like a windup toy that takes N steps: I think I view it a little bit differently. I view it as humans having a lot of competing drives. One drive is to conserve energy. It's sort of a laziness drive, and there's an evolutionary purpose for that: spending energy is costly, and if we spend it now, we can't spend it later. Another drive is to protect our children, and another drive is to be happy, and so on. And so part of what keeps humans safe — or relatively safe — is that most people have a bunch of drives that constrain their behavior and prevent them from doing things like murdering strangers.

CONNOR: Yep, that's great. I would say most people care about other humans. Most people care about the world. Most people don't like killing other people. And this is a contingent fact about humans. It is not universal. There are humans who enjoy killing other people. Even in humans, the alignment is not strict. By default, if you don't put in a mechanism that makes people not like killing people, they will be indifferent or like it. And we do not know how to put these kinds of drives into our machines. We don't know. I don't think it's impossible. I think there are definitely ways you could write algorithms to make AIs that dislike killing people, and then they probably won't do it. But we don't know how to do that.

SPENCER: Right. And then of course, you have the problem of, okay, maybe it doesn't like killing people, but what about doing all kinds of other horrible things that could end civilization? So yeah, this trying to list all the things you don't want an AI to do on the way to make money, that's a...

CONNOR: If anything, it's a miracle. I mean, it's not a miracle; there are straightforward reasons why human society is stable, but it is not simple. The reason human society is stable is because there are no crazy, superintelligent mega-humans. If every ten years one baby with a 10-trillion IQ were born, society would not be stable. Or if a Hercules got born or something, or some immortal super guy. If Superman just came to Earth and integrated into society and competed with other humans, society wouldn't be stable. Superman would beat the shit out of everybody. Our society couldn't contain Superman. If Superman was nice, cool, awesome, that's great. But if Superman is not nice, then, well, what the fuck are we supposed to do? He's fucking Superman. If he wants to go excavate our mountains to build his obelisks, what the hell am I gonna do to stop him?

SPENCER: Before we wrap up, I just want to hear your thoughts. What would you say to someone who's like, "Okay, this seems like a really big deal. But I'm not an expert in AI. What can I do to help? What steps can I take?"

CONNOR: I think the first, most important thing is taking this seriously and actually saying so, being like, "No, wait, what the fuck." I think this is actually super important. And the reason I say this is that the way decisions get made on a cultural level, on a societal level, is by shifting the Overton window and shifting the default common knowledge of society. It's not enough to convince one person (like one politician) that there's a problem, because that's not how coordination works. Coordination is built on common knowledge. The politician needs to know that the populace knows this is true, and that the populace knows that he knows it. This is how coordination actually happens. It's based on common knowledge. So building common knowledge in yourself and in your social circles, being like, "This exists. This is a problem. This is bad. It can be stopped, and we should stop it," is powerful and necessary at scale. At scale, our society needs to change its baseline common knowledge such that I can go into any pub, talk to anybody, and say, "Hey, well, what about the AI thing?" and they'll be like, "Oh, yeah, screw that shit. Our politicians really need to do something about that, right the hell now." This should be normal. It should be the default reaction of everybody. This is the smallest thing that I think people can do. Just take this seriously. Talk to your friends about this. Talk to your representatives about this. Send them a letter, an angry email: "What the hell are you doing about AI? Why are you letting these things happen? What are you going to do about it?" Don't let yourself be tricked into treating this stuff as normal. I think this is something that literally everybody can do. If you're a technical person or a very wealthy person, there's more you can do, obviously. If you're a technical person, consider working on the safety problem. If you're a politician or someone with a lot of social influence or social capital — maybe you're a star or something, where you have social media clout — spread the message even further. If you're a very wealthy person, money is a pretty fungible optimization juice. Consider funding people to work on these problems. Consider funding people to advocate about these problems. Consider throwing your own social capital around among the wealthy and the powerful to get them to take this problem seriously. These might all seem marginal, but this is what it feels like to coordinate at a societal level.

SPENCER: Connor, thanks so much for coming on.

CONNOR: Thanks for having me.
