You are right to be skeptical of a title like this but there’s a real argument behind it. Here’s the summary:
- We should expect AI to reach human-level intelligence by the year 2070 (according to polls done among leading AI researchers).
- A human-level AI is by definition able to continue the work of its creators, allowing it to recursively improve upon its own design, making itself more intelligent at an exploding rate.
- The default behavior for any sufficiently intelligent AI is to resist termination, resist modification, improve its own intelligence, and optimize for its assigned goal in an unfortunate way (more on this below).
It was the Swedish philosopher, Nick Bostrom, who first highlighted this problem in his book Superintelligence: Paths, Dangers, Strategies from 2014. Bill Gates, Stephen Hawking, and Elon Musk brought the problem to wider attention, and there’s now a field of research dedicated to the so-called AI Control Problem. Unfortunately, the problem is as of 2022 still unsolved and in this post we’ll go through why that is.
What exactly is AI?
Within this context, an AI is just an efficient optimization algorithm. You give it a clearly defined goal and it will do its best to fulfill that goal. If we turn it up a notch and envision a superintelligent AI, we have something reminiscent of a genie in a bottle awaiting our command. Note that the question of whether the AI is sentient or not has no bearing on the AI Control Problem, so to avoid this distraction it’s best to assume that it isn’t.
So it’s like a tool we can use for good or bad?
Unlike other tools that are only bad in the hands of bad people, AI systems pose a major existential risk when used even with the best intentions. The problem is that it is surprisingly hard to come up with a goal that doesn’t turn bad when optimized beyond a certain point. If we tell a superintelligent AI to calculate the digits of Pi, the best it can do is to seize control of all resources on Earth and use them to calculate more and more digits. If we tell it to optimize the well-being of all humans again it makes sense for it to seize control of everything and maybe bathe our brains in methamphetamine depending on its interpretation of well-being. The problem isn’t the goal as such; it’s the potency with which it’s fulfilled.
Why don’t we turn off the AI, if it goes off the rails?
This is probably the most common objection raised when people first hear about this. Why don’t we just pull the plug? The AI is just an optimization algorithm — it has no inherent will to live or fear of death. It probably isn’t even conscious, so why would it resist termination? The issue is that staying alive is a prerequisite for the AI to reach its goal. Not dying is a good thing, if you want to calculate more Pi digits, right? An intelligent AI will by definition be aware of this and we should therefore expect it to do whatever it can to prevent its own demise whether it be by fighting, hiding, deceiving, or escaping.
Why don’t we modify the goal, if it turns out to be a bad goal?
This is the same situation as above. An intelligent AI knows that having its current goal modified would severely reduce its chances of reaching that goal, so we should expect the AI to resist modifications. If all you want to do is calculate more digits of Pi, your best course of action is to make sure you’re never assigned another task.
Why would the AI improve its own intelligence without us telling it to?
More intelligence will help the AI reach its goal no matter what it is. We should therefore expect an intelligent AI to take steps to improve its own intelligence if possible. This is what we call an instrumental goal because it helps the AI towards reaching its actual goal. Other instrumental goals which the AI is likely to always pursue is to acquire more:
- Computing power.
It may even seek to preemptively eliminate individuals who are able to turn it off since these are obviously a major risk which may prevent the AI from reaching its goal.
Why don’t we give it rules to constrain its behavior?
We could maybe do this but it’s easier said than done. We’ve so far used natural human language to describe the AI’s goals but in reality these goals must first be translated into a clearly measurable and unambiguous mathematical language before they can be passed to the AI. We call the goal that is given to the AI its objective function. Rules and constraints can be added as components in the objective function to punish certain behaviors. Here’s the problem though:
- Some goals, like calculating the digits of Pi or optimizing a stock portfolio, are easily translated into the language of an objective function.
- Real life concepts like human values are extremely difficult to translate into an objective function.
Say for instance we want the AI to not harm anyone. What does that mean exactly? How do we measure whether an action is harmful or not? The butterfly effect tells us that any action is eventually causally connected to any future event, so if we disallow indirect harm of any kind, we are effectively left with an AI whose best strategy is to do absolutely nothing. Then there’s other questions like: If the AI sedates us into a painless sleep, does that count as harm? What about if it imprisons us? If the AI tells us something which we use to justify harming each other, does that count as harm done by the AI? This goes on and on and at the end of the day, even if we manage to add all of these clarifications to the objective function, how sure are we that a superintelligent AI won’t find a loophole to exploit?
If the AI is so smart, why can’t it understand our goals and rules declared in normal human language?
A human-level AI is by definition able to understand normal human language so why do we need to go to such lengths to specify the rules and constraints so precisely? The answer is that if we give the AI wiggle-room in the interpretation of the goals and rules, the AI is incentivized to go with the interpretation that makes it as easy as possible for it to reach the goal. If the goal is to optimize the well-being of humans and we allow the AI to interpret it as it pleases, it might focus its attention on human cells in petri dishes because they are easier to work with and because they are within the interpretational bounds of the word “human”.
Hold up. How can a piece of software do things we haven’t programmed it to?
Modern AI systems are built using principles from the field of Machine Learning. Instead of specifying exactly how the computer should perform a task, we specify how it can learn to perform the task. This is the only known way to build properly intelligent AI systems.
A simple example is that of an evolutionary algorithm which emulates the process of reproduction, mutation, and selection to search for solutions. The end-over-end-worm you see below was created by such an algorithm where the body-shape and movement pattern was initially random and where distance-traveled-over-land was used to select who survived in each generation. This shows how a simple algorithm, that of course does exactly what we programmed it to, can come up with its own novel solutions to problems.
How realistic is it that it will misinterpret our instructions?
This is in fact already a problem in modern AI systems. The technical term for it is Specification Gaming and it happens when the AI optimizes for its goal in an undesirable way. Victoria Krakovna is a research scientist at Google DeepMind working on AI safety and she has collected a list of examples where AI systems engaged in specification gaming. Take for instance this virtual robot which was supposed to learn some form of gait using its legs but instead found and used an exploit in the physics simulator.
Another example is this hide-and-seek AI game where the seekers (red) found an exploit to launch themselves into the air over obstacles to reach the hiders (blue).
What can we do to prevent this doomsday scenario?
Unless you’re an AI researcher there’s not a lot you can do right now to affect the outcome of this, but what we can do is to spread the message because at some point this issue might require political intervention and then it will be useful if the general public is at least somewhat aware of it.
What if we succeed?
If we solve the AI control problem we are almost, but not quite, out of the woods. We still need to ensure that no one else accidentally or purposefully creates an out-of-control superintelligent AI. Therefore as soon as the control problem is solved we should immediately begin work on developing an AI under our control which has the goal of squashing out any potentially dangerous AIs that may appear anywhere on Earth — like a global immune system destroying cancers before they get a chance to grow. If this is done, and we ignore the problems associated with such a totalitarian system, we may have averted disaster. If we are lucky we might enter a utopian era where we have superintelligent AIs ready and willing to do our bidding.
Are we doomed?
Predictions about the future are almost always wrong which is a good thing in this case. The estimate of having human-level AI before 2070 might of course be way off, but this uncertainty should be more of a reason for us to solve the control problem before things get gnarly.
This post was meant as an introductory read about AI and the control problem. There’s a lot more to be said, arguments to be had, and questions to be asked. If you want to delve deeper I highly recommend Human Compatible: Artificial Intelligence and the Problem of Control by Stuart Russell.