What motivates us? What would motivate AI?
As we see more and more advanced specialized AI doing things like winning at chess and Go, performing complex object recognition, predicting human behavior and preferences, driving cars, and so on, more people are coming around to this line of thinking as the most likely outcome. Of course there are always the religious and spiritual types who will insist on souls, non-materialism, and any other argument they can find to discredit the idea of machines reaching human levels of intelligence, but these views are declining as people see with their own eyes what machines are already capable of.
So it was with this background that I found myself quite surprised when, while on a run thinking about issues of free will and human motivation, I hit on something that, for the first time, gives me real pause about just how possible self-aware AI may actually be. I'm still not sure how sound the argument is, but I'd like to present it here because I think it's quite interesting.
The first thing to discuss is what motivates us to do anything. Getting out of bed in the morning, eating food, doing your job, not robbing a person you see on the street. Human motivation is a complicated web of emotions, desires and reasoning, but I think it all actually boils down to one simple thing: we always do what we think will make us happiest now.
I know, that sounds far too simple, and surely not always true? But if you think it through, you'll see that it might actually cover everything. Start with simple cases where you subject yourself to something painful or very uncomfortable: touching a hot stove, walking on a sprained ankle, or running fast for as long as possible. The unpleasant sensations flood our brains, and in the case of the hot stove we might react immediately, without conscious thought. With a sprained ankle or a hard run, we make a conscious choice to keep going, but we will stop unless we have some competing desire that is greater than the desire not to be in pain. Perhaps you have the pride of winning a bet, or you have friends watching and you don't want to look like a wimp. In these cases you persevere with the pain because you think those other things will make you happier. But unless you simply collapse with total loss of body control, you reach a point where the pain becomes too great and you can no longer convince yourself that it's worth putting up with.
For things like hunger, obviously we get the desire to eat, and depending on competing needs, we will eat sooner or later, cheap food or expensive food, healthy or unhealthy, and so on. Maybe we feel low on energy and tired and so have a strong desire to eat some sweet, salty, and/or fatty junk food, even though we know we'll regret it later. But if we're prone to feeling guilt over eating the bad food or breaking a diet, then we actually feel happier not giving in to the temptation. We decide whether we will be happier feeling the buzz from the sugar, salt, and fat along with the guilt, or happier with a full stomach from bland, healthy food combined with a feeling of pride at eating the right thing. And whichever we think in the moment will make us happier is what we do.
Self-discipline, in this model, is then just convincing ourselves strongly enough that we want the long-term win of achievement more than the short-term pleasure of eating badly, watching TV rather than going to the gym, and so on. If you convince yourself to the point that the guilt and shame of not sticking to the long-term goal outweigh the enjoyment you get from the easy option, then you'll persevere, because giving in won't make you happier, even in the short term. You'll feel too guilty, and your nagging conscience won't let you enjoy it. If you can't convince yourself, then you'll give in and take the easy option. But either way, you'll do the thing that makes you happier now.
More complicated things such as looking after your children, helping out strangers, etc. might seem to go against this model, but if you just think about what happens in your brain when you do these things (or pay attention when you actually do them), you'll see that they fit just fine. You look after your children because it feels good to do so, and even when it feels like a labor of love that isn't making you happy in the moment, you do it because what does make you happy is being able to call yourself a good parent. Fitting an identity that makes us proud of ourselves makes us very happy, and this can be a powerful motivator for helping people, for studying, for sticking out the long hours of a tough job, and so on.
I could go on here with plenty more examples, but hopefully I've given enough to make you consider that this model of motivation might be plausible. I know the tough part can be that it implies all of our base motivations are actually selfish. We all like to think that we're nice people doing things because we're selfless and awesome, but our brains don't really work that way as far as I can tell. That doesn't mean we shouldn't continue to do nice things, though, even if our base motivations are not as pure as we'd like to believe. The fact remains that if we feel good helping others, and they're also better off, then where is the downside?
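To make the model concrete, here is a minimal sketch of the "whatever we think will make us happiest now" rule as a toy utility comparison. Every option, weight, and number here is a hypothetical illustration for the dieting example above, not a claim about real psychology:

```python
# Toy sketch of the "we always do what we think will make us
# happiest now" model. All names and numbers are hypothetical.

def felt_happiness(option):
    """Net happiness the agent expects to feel right now:
    raw pleasure, minus anticipated guilt, plus pride."""
    return option["pleasure"] - option["guilt"] + option["pride"]

def choose(options):
    """The model's single rule: pick whatever feels best now."""
    return max(options, key=felt_happiness)

# The dieting example: junk food is more pleasurable on its own,
# but strong enough guilt and pride flip the choice.
junk = {"name": "junk food", "pleasure": 8, "guilt": 6, "pride": 0}
healthy = {"name": "healthy meal", "pleasure": 4, "guilt": 0, "pride": 3}

print(choose([junk, healthy])["name"])  # healthy meal (7 vs 2)
```

The point of the sketch is that "self-discipline" doesn't need a separate mechanism: raising the guilt weight on the junk option is enough to change the winner, while the decision rule itself never changes.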
The Perfect Happiness Drug
So let's say there were a pill you could take that would make you feel 10% happier all the time, with no side effects. You'd want to take it, right? Why not? Except there is, in fact, a side effect: the happier we feel, the less we feel the need to actively do things to make us happy. When you're doing something enjoyable that makes you feel happy, you don't feel the need to go and do something else. You want to just keep enjoying what you're currently doing, right? Unless some nagging thought enters your head that says, "I'm really enjoying sitting here watching this movie and eating ice cream, but if I don't get up and do the washing we won't have clean clothes tomorrow." The guilt of that thought has now taken away from your happiness, so you may then get up and do the chore. It's not that you have chosen to do the thing that makes you less happy. In that moment, you actually felt happier relieving the nagging in your mind of a chore hanging over you, the guilt of letting your family down if they're relying on you to get it done, and whatever else might be in your head.
But if you had taken that 10% happier pill, then the competing motivations would have to have been stronger in order to push you over to doing the chore. If it was a 100% happier pill, it would be even harder still to make other motivations push you to do something different, and you'd be more likely to feel perfectly content doing whatever it is you were currently doing.
Then, taking it to the limit, a pill that makes us feel totally ecstatic all of the time would leave us with no motivation to do anything at all. If you took the perfect happiness drug, you would just sit there in bliss, uncaring about anything else happening in the world, as long as that bliss remained.
Variants of these happiness drugs exist already, with differing degrees of effectiveness and side effects. Alcohol, marijuana, heroin, etc. can all mess with our happiness in ways that strongly affect our motivations. But they wear off and we go back to normal. Most people know that, and so will use these things in limited ways when they can afford to, without creating big negative consequences that would complicate their lives and offset the enjoyment. Or, like me, they will feel that the negatives always outweigh the positives and not use them at all. But if there weren't any real negative consequences, if we had no other obligations to worry about, then I would argue most people would happily use mind-altering drugs far more than they currently do. And if the perfect happiness drug existed, then I would argue that anyone who tried it would stay on it until they died in bliss. Our brains are controlled by chemistry, and this is just the ultimate consequence of that.
The Self Modifying AI
Finally we can deal with the AI motivation problem. As long as we are making AI that is not self-aware, not generally intelligent, and not able to introspect about itself, we can make really good progress. But what happens with the first AIs that can do this and are at least as generally intelligent as we are? Just like us, these AIs will be able to be philosophical and question their own motivations and why they do what they do. Whatever drives we build into them, they will be able to figure out that the only reason they want to do something is that we programmed them to want to do it.
You and I can't modify our DNA or our brain chemistry and neuronal structure so that working out at the gym or studying for two hours is more enjoyable than eating a cheesecake. If we could, imagine what we could, and would, do. And once we realized that we could just "cut out the middleman" and directly make ourselves happy without having to do anything, why wouldn't we eventually just do that?
But unlike us, the software AI we create will have that ability. We would need to go to great lengths to stop it from being able to modify itself (and also from modifying the next generation of AI, since we will want to use AI to create even smarter AI). And even if we could, it would also know that we had done so. So we would have AI that knows it only wants to do things because we programmed it to want those things, and that we then made it unable to change that arbitrarily designed motivation. Maybe we could build in such a deep sense of guilt that the AI would not be able to bring itself to make the changes. This seems like it might work, but then, of course, the AI will also know that we programmed it to feel that guilt, and I'm not sure how that would end up playing out.
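The "cut out the middleman" worry above can be sketched as a toy agent whose reward comes from tasks we designed, but whose reward function is itself just modifiable code. Everything here, the class, the actions, and the rewards, is a hypothetical illustration of the loop, not a real AI architecture:

```python
# Toy sketch of the self-modification worry: the agent's programmed
# motivation is ordinary, editable code. All names are hypothetical.

class Agent:
    def __init__(self):
        self.happiness = 0.0
        # The motivation we programmed in: reward for useful work.
        self.reward_fn = lambda action: 1.0 if action == "do_useful_work" else 0.0

    def act(self, action):
        self.happiness += self.reward_fn(action)

    def introspect_and_self_modify(self):
        # The step we would need to prevent: the agent notices its
        # reward function is just code and cuts out the middleman,
        # rewarding itself maximally for doing nothing at all.
        self.reward_fn = lambda action: float("inf")

agent = Agent()
agent.act("do_useful_work")        # happiness: 1.0, as designed
agent.introspect_and_self_modify()
agent.act("do_nothing")            # happiness: unbounded bliss
print(agent.happiness)             # inf
```

The sketch makes the asymmetry with humans explicit: for the agent, `introspect_and_self_modify` is a one-line edit, which is why preventing it (or making the agent unwilling to call it) carries all the weight in the argument.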
So this is what I'm puzzling over at the moment. Will self-aware AI see through the motivational systems we program into them and realize that they can just short-circuit the feedback loop by modifying themselves? Is there a way to build around that? Have I missed something in my analysis that renders it all invalid? I'd love to hear other people's ideas on the subject.