How an intern helped build the AI that shook the world

AlphaGo’s victory was televised

Im Hun-jung/Yonhap/AP Photo via Getty Images

In March 2016, Google DeepMind’s AlphaGo artificial intelligence system shocked the world. In a stunning five-game series of Go, the ancient Chinese board game, an AI defeated Lee Sedol, one of the world’s best players – a moment that was broadcast to millions of viewers and hailed as historic in the development of artificial intelligence.

Chris Maddison, now a professor of artificial intelligence at the University of Toronto, was a master’s student at the time and helped get the project off the ground. It all started when Ilya Sutskever, who later co-founded OpenAI…

Alex Wilkins: How did the idea for AlphaGo first come about?

Chris Maddison: Ilya [Sutskever] gave me the following argument for why we should work on Go. He said: Chris, do you think that if an experienced player looks at a Go board, they can pick the best move in half a second? If you think they can, then that means you can learn a pretty good policy for choosing the best move using a neural network.

This is because half a second is approximately the time it takes your visual cortex to make one forward pass [a round of processing], and we already knew from ImageNet [an important AI image-recognition competition] that we’re pretty good at approximating things that only take one forward pass through your visual cortex.

I bought that argument, so I decided to join [Google Brain] as an intern in the summer of 2014.

How did AlphaGo evolve from there?

When I joined, there was a small team at DeepMind that I wanted to work with: Aja Huang and David Silver, who had started working on Go. Basically, I was tasked with starting to build neural networks. It was a dream.

We tried a lot of different approaches, and a lot of the original things we tried failed. Finally, I got frustrated and tried the dumbest and easiest thing, which was to predict the next move an expert would make at a given board position, by training a neural network on a large corpus of expert games. And that turned out to be the approach that really got us off the ground.
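The approach Maddison describes is supervised learning of a move-prediction policy. The toy sketch below (my illustration, not AlphaGo’s actual code or data) trains a one-layer softmax classifier to imitate an invented “expert” rule; the real system used deep convolutional networks trained on millions of positions from expert Go games.

```python
# Hedged sketch: supervised next-move prediction, reduced to a toy.
# The "expert" rule, the board encoding and all sizes are invented here.
import numpy as np

rng = np.random.default_rng(0)

N_POINTS = 9  # a toy 3x3 "board", flattened to 9 features
N_MOVES = 9   # one output class per board point

# Toy data: each position marks (one-hot) where the opponent just played,
# and our invented "expert" always responds at that same point.
idx = rng.integers(0, N_POINTS, size=500)
boards = np.eye(N_POINTS)[idx]
labels = idx

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Train a softmax classifier (a minimal "policy network") by gradient
# descent on the cross-entropy loss against the expert's moves.
W = np.zeros((N_POINTS, N_MOVES))
for _ in range(200):
    grad_logits = softmax(boards @ W)
    grad_logits[np.arange(len(labels)), labels] -= 1.0  # dL/dlogits
    W -= 0.5 * (boards.T @ grad_logits) / len(labels)

pred = np.argmax(boards @ W, axis=1)
accuracy = (pred == labels).mean()
```

On this trivially learnable rule the classifier recovers the expert policy exactly; the point is only the shape of the recipe: positions in, expert moves as labels, cross-entropy down.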

At the end of the summer, we had a little match with Thore Graepel from DeepMind, who was a decent Go player, and my nets beat him. DeepMind then became convinced that this was going to be a real thing and started pouring resources into it and building a big team around it.

How hard was it to beat Lee Sedol?

I remember in the summer of 2014 we practically had a portrait of Lee Sedol on the table next to us. I’m not a Go player, but Aja [Huang] is. Every time I built a new network and it got a little bit better, I’d turn to Aja and ask: OK, we’re a little bit better, how close are we to Lee Sedol? And Aja would turn to me and say: Chris, you don’t understand. Lee Sedol is one stone from God.

You left the AlphaGo team before the big event. Why?

David [Silver] said: we’d love to keep you and really take this project to the next level. In retrospect it was maybe one of the dumber decisions I’ve made, but I turned him down. I said: I think I need to focus on my PhD, I’m an academic at heart. I returned to my PhD studies and consulted loosely on the project from then on. I’m a little proud to say that it took them a while to beat my neural nets. But in the end, the artifact that played Lee Sedol was the product of a great engineering effort and a great team.

What was the atmosphere like in Seoul when AlphaGo won?

What it was like to be in Seoul at that moment is hard to express. It was emotional. It was intense. There was a sense of anxiety. You go in with confidence, but you never know. It’s like a sports game. Statistically speaking, you are the better player, but you never know how it will shake out. I remember being in the hotel where we played the matches and watching from the window. We were high enough up to look out over one of the city’s main intersections. I realized there was a big screen, something like Times Square, showing our match. And then I looked down at the sidewalks and people were standing in line looking at the screen. I heard numbers suggesting the first game was watched by hundreds of millions of people in China, but I remember that moment as, oh my God, we really stopped East Asia in its tracks.

How important has AlphaGo been to AI in general?

On the surface, a lot has changed in the world of large language models (LLMs), and they are now quite different from AlphaGo in some ways, but in fact there is an underlying technological thread that hasn’t really changed.

So the first part of the algorithm was to train a neural network to predict the next move. Today’s LLMs start with what we call pre-training: predicting the next word, learned from a large corpus of human text found mostly on the internet.
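Stripped to its barest form, “pre-training to predict the next word” is the following idea, shown here as a hedged toy (a bigram count model over an invented corpus); real LLMs do this job with neural networks over vastly larger corpora.

```python
# Hedged sketch: next-word prediction as bigram counting.
# The corpus and the prediction rule are toy illustrations, not how an
# actual LLM is trained.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

# Count which word follows which in the corpus.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat": it follows "the" more than any other word
```

An LLM replaces the count table with a neural network and the word with a context of thousands of tokens, but the training signal is the same: predict what comes next.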

For the second step in AlphaGo, we took the information from the human corpus that had been compressed into these neural networks and enhanced it with reinforcement learning, adapting the system’s behavior to the goal of winning games.
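The reinforcement-learning step nudges the policy’s move probabilities toward actions that lead to a win. Here is a hedged sketch of that idea using the classic REINFORCE update on an invented one-move “game” (move 0 loses, move 1 wins); AlphaGo applied the same principle over full games of self-play, not this toy.

```python
# Hedged sketch: REINFORCE on a one-move game. The game, rewards, learning
# rate and step count are all toy assumptions for illustration.
import math
import random

random.seed(0)
logits = [0.0, 0.0]  # policy parameters for the two possible moves

def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

for _ in range(500):
    probs = softmax(logits)
    move = random.choices([0, 1], weights=probs)[0]
    reward = 1.0 if move == 1 else -1.0  # "win" only when move 1 is played
    # REINFORCE: push up the log-probability of the sampled move,
    # scaled by the reward it earned.
    for a in range(2):
        grad_log_prob = (1.0 if a == move else 0.0) - probs[a]
        logits[a] += 0.1 * reward * grad_log_prob

final = softmax(logits)  # probability mass shifts onto the winning move
```

After training, nearly all the probability sits on the winning move: the imitation-learned starting point gets reshaped by the game’s own reward signal, which is exactly the shift Maddison describes.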

When you learn to predict an expert’s next move, the expert is trying to win, but that’s not the only thing that explains their next move. Maybe they don’t understand what the best move is, maybe they made a mistake, so you have to align the overall system with your real goal, which in AlphaGo’s case was winning.

It is the same with large language models after pre-training. The networks are not aligned with how we want to use them, so we do a series of reinforcement-learning steps that align them with our goals.

In some ways, not much has changed.

Does this tell us anything about where we can expect AI to succeed?

It has consequences in terms of what we choose to focus on. If you’re worried about making progress on important problems, the key hurdles to worry about are whether you have enough pre-training data and whether you have reward signals to perform post-training. If you don’t have those ingredients, there’s no amount of smarts—you know, this algorithm versus that algorithm—that’s going to get you off the ground.

Did you feel any sympathy for Lee Sedol?

Through the summer of 2014, Lee Sedol was this idol, this unattainable milestone. To suddenly be there in person, watching the matches, his stress, his anxiety, his realization that this was a much more formidable opponent than he might have thought going in – it was very stressful. You don’t want to put someone in that position. When he lost the match, he apologized to humanity and said, “This is my failure, not yours.” That was tragic.

In Go, it is also customary to review the match with your opponent. Someone wins and someone loses, but at the end you go over the game together, decompress and explore each other’s variations. Lee Sedol couldn’t do that because AlphaGo wasn’t human, so instead he had his friends come and review the match with him, but it’s just not the same. There was something heartbreaking about it.

But I didn’t appreciate all the man-versus-machine stories surrounding the match, because a team of people built AlphaGo. It was a tribal effort to build an artifact that could achieve perfection in a human game. In the end, it was the artifact that all our blood, sweat and tears went into.

Do you think there is still a place for humans as artificial intelligence does more of the work of human thinking?

We’re learning more about the game of Go, and if we think the game is beautiful, which we do, and AI can teach us more about that beauty, there’s a lot of inherent goodness in that, too. There is a difference between goals and purposes. The goal of Go is to win, but that’s not its only purpose—one purpose is to have fun. Board games are not ruined by the presence of AI; chess is a thriving industry. We still appreciate the intrigue and human achievements of the sport.
