Sem sombra de dúvidas o maior avanço de 2016 para a Inteligência Artificial/ Redes Neurais foi a vitória do Alpha Go sobre o Lee Sedol no jogo de Go.
Diferentemente da época do Deep Blue que derrotou o Gary Kasparov usando uma versão do algoritmo de busca exaustiva com um poder computacional muito alto na época (i.e. a cada movimento do jogo o Deep Blue calculava todas as possibilidades, e através de uma função de avaliação de cada resultado usava o resultado como heurística para o próximo movimento) .
Pequeno vídeo sobre o confronto:
Better Computer Go Player with Neural Network and Long-term Prediction
Yuandong Tian, Yan Zhu
Competing with top human players in the ancient game of Go has been a long-term goal of artificial intelligence. Go’s high branching factor makes traditional search techniques ineffective, even on leading-edge hardware, and Go’s evaluation function could change drastically with one stone change. Recent works [Maddison et al. (2015); Clark & Storkey (2015)] show that search is not strictly necessary for machine Go players. A pure pattern-matching approach, based on a Deep Convolutional Neural Network (DCNN) that predicts the next move, can perform as well as Monte Carlo Tree Search (MCTS)-based open source Go engines such as Pachi [Baudis & Gailly (2012)] if its search budget is limited. We extend this idea in our bot named darkforest, which relies on a DCNN designed for long-term predictions. Darkforest substantially improves the win rate for pattern-matching approaches against MCTS-based approaches, even with looser search budgets. Against human players, the newest versions, darkfores2, achieve a stable 3d level on KGS Go Server as a ranked bot, a substantial improvement upon the estimated 4k-5k ranks for DCNN reported in Clark & Storkey (2015) based on games against other machine players. Adding MCTS to darkfores2 creates a much stronger player named darkfmcts3: with 5000 rollouts, it beats Pachi with 10k rollouts in all 250 games; with 75k rollouts it achieves a stable 5d level in KGS server, on par with state-of-the-art Go AIs (e.g., Zen, DolBaram, CrazyStone) except for AlphaGo [Silver et al. (2016)]; with 110k rollouts, it won the 3rd place in January KGS Go Tournament.
In this paper, we have substantially improved the performance of DCNN-based Go AI, extensively evaluated it against both open source engines and strong amateur human players, and shown its potentials if combined with Monte-Carlo Tree Search (MCTS). Ideally, we want to construct a system that combines both pattern matching and search, and can be trained jointly in an online fashion. Pattern matching with DCNN is good at global board reading, but might fail to capture special local situations. On the other hand, search is excellent in modeling arbitrary situations, by building a local non-parametric model for the current state, only when the computation cost is affordable. One paradigm is to update DCNN weights (i.e., Policy Gradient [Sutton et al. (1999)]) after MCTS completes and chooses a different best move than DCNN’s proposal. To increase the signal bandwidth, we could also update weights using all the board situations along the trajectory of the best move. Alternatively, we could update the weights when MCTS is running. Actor-Critics algorithms [Konda & Tsitsiklis (1999)] can also be used to train two models simultaneously, one to predict the next move (actor) and the other to evaluate the current board situation (critic). Finally, local tactics training (e.g., Life/Death practice) focuses on local board situation with fewer variations, which DCNN approaches should benefit from like human players.