How To Learn Friendly AI

From The Transhumanist Wiki

(By Eliezer Yudkowsky.)

1. Try to see how simple systems fail, not how they succeed.

Until you've examined how shallow architectures fail, you won't understand what the deeper architectures are for. If I had to write "Creating Friendly AI" all over again, I would put "Why Structure Matters" at the start of section 3 and possibly the start of the entire document.

Furthermore, it is extremely difficult to look at a Friendly AI proposal and say, "I see how this will succeed." To see how something succeeds requires an enormously greater depth of understanding than to see how it fails. To understand success, you need to understand and solve N theoretical problems; N-1 is not sufficient. To understand a failure, you need to understand only one theoretical problem and correctly see that a simple proposal fails to handle it. There's definite room for caution; often a simple system, such as Bayesian updating, Solomonoff induction, or Bayesian decision theory, will handle a lot more than is immediately obvious. To be confident that a simple system fails, you need a very deep understanding of both the simple system and the problem it fails on. However, it is still much easier to arrive at a deep understanding of a simple system, and of a single FAI problem structure, than to understand N problems simultaneously.

2. Try to find the flaws in impossibility proofs of Friendly AI.

Like the first heuristic, this centers around guessing in advance what the answer is. The answer is a pinpoint bullseye in the middle of the target. Any simple proposal is likely to be wrong, but any flat proof that no solution exists is also likely to be wrong. (If you find a flat proof that looks solid, and still looks solid after a couple of months, do post it to SL4.) If you look for the exception to the supposed impossibility proof, the place where it breaks down or the thing you'd have to do in order to succeed, you may find that you've found one of the N problems or even one of the N solutions.

A useful anecdote I'd like to tell at this point:

"Here's a metal plate next to the radiator," said the professor. "Now note: If you put your hand on this plate, you will see that the side farthest from the radiator is warm, while the side closest to the radiator is relatively cool. Why do you suppose that happens?"
"Er, because of heat conduction and so on?" asked a student.
"Because of the convection of the air?" asked another student.
"Because the hotter it is, the faster it radiates heat?" said a third student.
"No," said the professor, "because I just turned it around."

I like to call this error reasoning by abduction on false evidence: using backwards chaining from a conclusion that isn't true. The depth of your understanding is measured by your ability to understand true things and remain puzzled by false things. If you can understand both true and false things, then your understanding is not strong enough to discriminate truth from falsehood.
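One way to make this concrete is in terms of Bayesian updating. The sketch below is only an illustration under assumed numbers, not anything from the anecdote itself: the function name `posterior` and the probabilities 0.9 and 0.1 are hypothetical. It shows that an account which explains an observation equally well whether the account is right or wrong gains no support from that observation, while an account that would have been puzzled by the opposite result does.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule for a binary hypothesis H after observing evidence E."""
    numerator = p_e_given_h * prior
    denominator = numerator + p_e_given_not_h * (1.0 - prior)
    return numerator / denominator


# Hypothetical numbers, chosen only for illustration.
PRIOR = 0.5

# An account that "explains" the observation just as easily if it is wrong:
# the likelihood ratio is 1, so belief stays at the prior.
unchanged = posterior(PRIOR, p_e_given_h=0.9, p_e_given_not_h=0.9)

# An account that would be puzzled by the observation if it were wrong:
# the likelihood ratio is 9, so the observation carries real weight.
shifted = posterior(PRIOR, p_e_given_h=0.9, p_e_given_not_h=0.1)

print(unchanged, shifted)
```

With these assumed numbers the first posterior stays at the prior and the second moves well above it, which is the sense in which being able to explain anything amounts to discriminating nothing.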

When you're learning, it helps to be rationalizing conclusions which, by some good fortune, happen to be true. It is not really the same as thinking and it is not reliable, because a student learning through this method can easily rationalize a false reason instead of the correct one. But it saves time.

If you see a simple proposal for Friendly AI, the conclusion which by good fortune happens to be correct is that the system will fail. If you see an impossibility proof for Friendly AI, the conclusion which by good fortune happens to be correct is that the proof is flawed. It is easy to make up incorrect reasons why this is so, but at least you will be rationalizing something that is true. I worry that most book learning works by this method, since the student won't be ready to detect "wrong questions" until much later.

3. Develop use-cases for a (perfectly) Friendly AI.

Strive to make these use-cases about architectural issues, such as a programmer trying to make a particular modification to content or code. Moral use-cases are fun but useless.

First try to figure out what the correct behavior is - and remember that the structure of the correct answer is not "obey the programmer" but "do what's right". Sometimes it may not be possible for the Friendly AI to do the right thing unless she has information about the external world that, at a given stage of maturity, she can't access, or doesn't know how to discriminate from sensory information. If so, try to see the matter from the Friendly AI's internal perspective, and the correct answer from your perspective given what the Friendly AI then knows.

Always give the very best answer you can think of. Don't "dumb it down" because you're thinking about an AI. If necessary, think about what you would want yourself to do "in that situation".

Do not confuse this task of developing use-cases with the task of predicting what a given concrete system proposal will actually do. It will take you a long time to get to the point where you can give an example of a concrete system that does a right thing, let alone the rightest thing.

4. Learn the basics of established fields.

Learn the basics of such fields as Evolutionary Psychology, Bayesian decision theory, genetic algorithms, neurology, and so on, preferably on a level where you can do the basic math and understand the basics intuitively - see the basic math as obvious in retrospect, rather than merely being able to prove it formally. But mostly, study the basics of established fields that are relevant to Friendly AI.

Why? Because there is immensely more written about these fields, because there is far more evidence and examples to constrain your understanding, and because Friendly AI is an extremely advanced field - it only looks simple because there is less written about it.

This holds especially true if you're just interested in general science. If you're just interested in general science, there's no reason for you to be interested in Friendly AI at all. Learn something that everyone agrees on instead, like physics. If you're interested in Singularity activism, that's one thing, but in terms of learning science you should start by studying parts of science that have been solid for the last hundred years, rather than reading the frontiers that the media defines as exciting breaking news. (The exception is the Cognitive Sciences, where a dreadful amount of misinformation was recently cleaned out, and you want the latest possible books on the basics - any Cognitive Science book from before 1990 is dangerous.) It is futile to delude yourself into believing that you have made a hobby of quantum physics or cosmology, even as an amateur, if you have not previously made a hobby of classical mechanics. Similarly, how can you make a hobby of Friendly AI without making a hobby of information theory, game theory, Evolutionary Psychology, and Bayesian decision theory? (Note, however, that the field presently known as "Artificial Intelligence" is not included.)

Learn the basics of many fields, including the basic math, intuitively; not memorizing the formula, or even being able to do the proof, but on a level where you can see it as obvious in retrospect. The advanced details may not be relevant to Friendly AI, but the basics will be.

I once saw an interesting diagram that divided science communication forums into "intraspecialist", "interspecialist", "textbook", and "popular". You want to be reading on the interspecialist or textbook level - articles written to communicate to other scientists outside the field, or to students. Not popular science books, or media press releases.

5. Allow yourself to solve the problem a piece at a time.

There are N theoretical problems in Friendly AI. Do not try to solve all N at once. When you have a bright idea, try to identify the particular theoretical problem it solves, and don't say that you've solved the whole thing. At any given point before you are actually ready to sit down and start building a Friendly AI, you should be saying to yourself: "What an interesting collection of solutions I have found so far, but I do not have them all yet, so I cannot actually build a Friendly AI." Being ready to build a Friendly AI comes only at the very end. You should not start out by believing you are ready to build a Friendly AI, and then steadily patch problems as you run into them.

"Creating Friendly AI" was written before I figured out this rule, and it shows.

6. Don't reassure other people of things you don't actually know.

If you don't know how to build a decision system that knows what people really want, don't tell the other person that "a Friendly AI will figure out what you really want and give it to you". You do not actually know this. The information actually in your possession is as follows:

"Eliezer Yudkowsky seems to think that he can build a Friendly AI that understands what people really want, as opposed to what the letter of their request says, and he had surprisingly good answers to the first dozen objections that popped into my mind, so I believe him. I don't claim to actually know how to build such a Friendly AI. I'm not an expert. Go read the original literature because I don't know how to describe it secondhand."

This is a fine argument. Use it. As described in Singularity Writing Advice, the reason you want to give to the audience is the reason you actually use yourself.

If you say, "A Friendly AI will figure out what you really want and give it to you," and you can't explain how, you sound like a cult victim. To me. Because I know you don't know.

The conclusion that happens to be true is that you can build a magic genie bottle and make surprisingly safe wishes from it. But it will take bloody hard work to get there. Stop telling people about how perfect Friendly AIs are, because it's not a foregone conclusion.

Other fine arguments are:

"Why do you think a Friendly AI would do X?" followed by a digression into Evolutionary Psychology (providing you understand ev-psych and game theory), and so on.

"Why do you think a Friendly AI can't understand X?" or "Why do you think a Friendly AI cannot regard X as moral?" followed by "But you understand X" and "Therefore, at least one kind of mind-in-general can regard X as moral" and "Friendly AIs are not like the AIs they show on TV, but a particular kind of mind selected out of the space of Minds-In-General, which is much wider than the space of human minds" and "I don't know, but Eliezer seems to think he can do it."

"The claim is not being made that all AIs will behave like X, but that there exists at least one mind-in-general that does X" and "Eliezer says he isn't planning to just bang stuff together at random until something works" and "Yes, he really does claim to know what he's doing."

Don't like saying "Because Eliezer said so"? Well, do you have a better reason? There is only a tiny amount of physics which I know for any reason beyond "Richard Feynman told me so", even in cases where I can do the math. There are very few places where I can see that something has to be true, not just because Richard Feynman says it's true, but because it's a tautology and I can do and intuitively understand the math, or I've empirically verified the phenomenon for myself. Those are the parts I actually understand. For everything else, the fact is that if Richard Feynman was making it all up, I couldn't catch him at it. That means that I'm using his perceptions in place of my own. Even if I can do the math, even if I'm nearly certain that Richard Feynman is not making it up, I still don't know it for any ultimate reason beyond the fact that Richard Feynman told me so. And for most things you believe about Friendly AI, the ultimate reason you know it is that I told you so.

Bugged by this? Good, because you should be. It does no good to have faith in something that you don't understand. Even if it's true, it doesn't really help you. Physicists told me that matter was made of waves of probability amplitudes and I believed them, even though I hadn't verified it myself. Then eventually I ran across something called "the wave equation" and found out that I hadn't had the vaguest idea of what a "wave" actually was. So what did I believe, back when I accepted on faith that matter was made of waves - or that sound was made of waves, for that matter? In retrospect, I didn't believe much of anything except a verbal sentence.

Even if the physicist is being completely honest with you, it does no good to have faith in something you don't understand. Physics is made of math and it is pointless to repeat the words physicists use to describe their math if the math is not in your head; you simply cannot imagine it right no matter how hard you believe what you are told. This is something that I think physicists underemphasize, perhaps because they would sell fewer books. I am telling you straight out, because I think you can take it: Friendly AI is frikkin' hard. It's going to take a while to get to the point where you're living by your own strength on this subject, and it'll start with understanding existing fields like Bayesian decision theory and Evolutionary Psychology.

Now... it does require much less arcana to genuinely understand why an amateur's statement is wrong than to claim that some specific prediction is actually correct. Having faith that matter is made of waves did not enable me to make predictions or do calculations, but I at least knew that matter was not made of billiard balls. I knew enough to genuinely, from my own understanding, shoot down many simple incorrect theories of physics. Not on the frontiers of string theory, of course, but the kind of flawed physics that amateurs make up. So my advice would be to focus on introducing people to the counterintuitiveness of Friendly AI theory, which is the first thing that you can genuinely understand. It takes a relatively small amount of ev-psych and game theory to start spotting flaws in people's instinctive attributions of mind-in-general motivations, and it takes a relatively small amount of Bayesian decision theory to understand the "Subgoal Stomp" that keeps coming up. So focus on that, rather than making reassurances about how Friendly AI will do everything right, at which point you're making the word "Friendly" into a magic wand. Now it does turn out - it is the conclusion that happens by good fortune to be correct, as far as I can tell - that you can build that kind of magic wand. But if you don't know it yourself, you shouldn't reassure others of it.


If I may add a student's perspective to this... It's a mistake to chase after Eliezer with all your might. It's an attractive scenario, but it doesn't work out, and even leads to detrimental effects. I first discovered this in 2001 when I first read CaTAI. I read it through, and didn't understand it at all. Of course, I thought I did, and this caused a great deal of problems later on. Now CaTAI is GISAI and CFAI, it's better, and even easier to misunderstand.

The issue, as I see it, is that CaTAI is basically a long string of declarative sentences, which get translated into a more or less line by line chain of logic that you defend or disagree with. This long wobbly chain is (due to the limits of human memory) kind of fuzzy in places, and certainly unstable, and mostly not even what Eliezer was trying to say. The real objective is to take this long string of reasoning and see it as a bunch of interrelating ideas that have individual internal logic and subfields. Getting a few of these ideas working in your mind is more important than defending the long chain of logic you've absorbed from CFAI or GISAI. Once you see these little ideas moving and changing, and can see why they behave the way you think they ought to, you can start chaining them together again, and building a good picture of what Friendly AI and AI in general is to you.

The process is probably different for other people. But for me, trying hard to bootstrap myself to Eliezer's level was a mistake. It hampered my ability to think about the individual concepts, and thus my concept of the whole shebang was majorly flawed. Taking a couple of dynamics that you ALMOST understand and developing them is a better idea, and is what ended up working for me. You can use the ideas that you develop more fully to explore the field as a whole. Your relational analogies will be richer, and you'll have semi-solid ground to retreat to when the going gets rough.

The most important thing to remember is that reality is objective and pretty constant. So if you're having a problem, chances are somebody has discovered it before you. Steal incessantly from other people's writing and ideas. There may even be professionals who have worked out the answers; trust them in areas you can't work out to the same decimal places. Google and the library are your friends here. And when you find yourself asking questions nobody has the answer for, or has even asked before (that you can find with Google and a library card), well, you know you're making progress. Start bothering other transhumanists. And then, when you've refined being an annoying pest to your librarian, your professional contacts, your computer, your private library, and any transhumanists that will still talk to you, you can bother Eliezer.

Of course, I don't really follow that rule, because I'm lazy and I know Eliezer will have a snappy answer for any question I fire off. But I'm not a good role model in this; we were discussing good ideas, yes?

I'm only beginning to have interesting questions about FAI and AI in general. And I'm a pretty smart fellow. But if you avoid the kinds of mistakes I've made, maybe your progress will be faster. And it's important for people to start progressing faster. Eliezer's lone position at the head of the pack is dangerous, in more ways than one.

An idea that I've been toying with is forming a research partnership with somebody online whose interests are similar to mine. I haven't really developed the idea yet, but I have done better work in areas where I could fight with a peer. A study group of budding FAI theorists could generate some interesting material, as well as free up Eliezer from having to churn out 'how to not be such retards' pages like this, however much he may enjoy it.

--Justin Corwin
