AlphaGo Zero

Alex | November 20th 2017

The huge news to come out of the world of machine learning this week is the resounding success of Google DeepMind’s AlphaGo Zero.

AlphaGo is a programme that uses machine learning techniques to play the ancient Chinese board game Go. In October 2015, AlphaGo Fan became the first computer programme to beat a professional Go player, defeating the European Champion Fan Hui 5-0. Impressive stuff.

This was eclipsed by AlphaGo Lee’s groundbreaking 4-1 victory over legendary Go player Lee Se-Dol in Seoul in March 2016, watched by a worldwide audience of 200 million people. A successor, AlphaGo Master, went on to beat world number one Ke Jie in May 2017. AlphaGo Lee has been given an Elo rating of 3,739; to put that into perspective, Ke Jie’s rating is around 3,667.
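Elo ratings translate directly into expected win rates: under the Elo model, a player’s expected score against an opponent is 1 / (1 + 10^((R_opponent − R_player)/400)). A quick sketch of that calculation for the ratings quoted above:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Ratings quoted above: AlphaGo Lee at 3,739 vs Ke Jie at around 3,667.
print(f"{expected_score(3739, 3667):.2f}")  # roughly 0.60
```

A 72-point Elo gap therefore corresponds to winning roughly six games in ten, which is why even modest rating differences matter at the very top of the game.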

Now, a new version of AlphaGo has been tested against AlphaGo Lee over 100 games… and didn’t lose a single one. Impressed yet? What if you were told that AlphaGo Zero achieved this feat within 72 hours of training? Professional players say AlphaGo Zero has developed techniques never seen before, revolutionising the way the game is played.

Go is a strategic board game in which two players take turns placing black or white stones on a board, aiming to surround more territory than their opponent. Previous versions of AlphaGo were trained on human strategies, with their neural networks built upon data obtained from professional Go matches. AlphaGo Zero goes one step further and learns without human help. Provided simply with the shape of the board and the rules of the game, AlphaGo Zero played against itself.
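Go itself is far too large to demonstrate in a few lines, but the core idea of learning from the rules alone through self-play can be sketched on a toy game. The sketch below is my own illustration, not DeepMind’s method: tabular Q-learning with self-play on the pile game Nim, where the agent is given only the rules, plays both sides of every game against itself, and bootstraps position values from the outcomes.

```python
import random

PILE, MAX_TAKE = 21, 3  # Nim: players alternate taking 1-3 stones; taking the last stone wins

def legal_moves(pile: int):
    return range(1, min(MAX_TAKE, pile) + 1)

def train(episodes: int = 20000, eps: float = 0.3, lr: float = 0.5, seed: int = 0):
    """Self-play Q-learning: one shared value table plays both sides of every game."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(1, PILE + 1) for a in legal_moves(s)}
    for _ in range(episodes):
        pile = PILE
        while pile > 0:
            moves = list(legal_moves(pile))
            # explore occasionally, otherwise play the current best-known move
            a = rng.choice(moves) if rng.random() < eps else max(moves, key=lambda m: q[(pile, m)])
            nxt = pile - a
            if nxt == 0:
                target = 1.0  # taking the last stone wins the game
            else:
                # negamax bootstrap: the opponent's best reply is our loss
                target = -max(q[(nxt, m)] for m in legal_moves(nxt))
            q[(pile, a)] += lr * (target - q[(pile, a)])
            pile = nxt
    return q

q = train()

def best(pile: int) -> int:
    return max(legal_moves(pile), key=lambda a: q[(pile, a)])

# The learned policy leaves the opponent a multiple of 4 whenever it can.
print(best(5), best(6), best(7))
```

AlphaGo Zero’s actual training couples a deep neural network with tree search rather than a lookup table, but the loop has the same shape: play against yourself, score the finished game, and push the result back through the positions visited along the way.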

The significance of this result is huge. AlphaGo Lee’s success has always been overshadowed by the massive computational power required to evaluate all the positions. Critics argued that the result might have been different had AlphaGo had fewer processors available. AlphaGo Zero, however, uses only 4 TPUs (Tensor Processing Units) and has a much lower power consumption, thanks in part to Google’s recent hardware advances.

So what does this all really mean? Google has just demonstrated the capability to build a machine which can learn completely by itself and surpass 3,000 years of accumulated Go knowledge in just three days. The team are now embarking on other projects, applying these techniques to other fields, including science and medicine.

If such a performance can be replicated in these areas, who knows what the future may bring?