Esse paper to DeepStack, caso seja reprodutível, pode representar um avanço significativo em relação a todo eixo em que a Inteligência Artificial está hoje, em especial em problemas de informação assimétrica.
Como os autores salientam, jogos de Damas, Xadrez e Go partem de um princípio básico de que a informação é simétrica entre os jogadores; em outras palavras, há um determinado determinismo em relação às ações dos adversários.
O Poker por sua vez tem como principal característica ser um jogo em que há um algo grau de não-determinismo seja no River, na mão (cartas) dos oponentes, bem como no tão famigerado blefe (que não passa de um bom problema estocástico).
De qualquer maneira, para quem é especialista ou não em AI ou Machine Learning vale a pena conferir a modelagem e os resultados do Deep Stack.
Better Computer Go Player with Neural Network and Long-term Prediction
Yuandong Tian, Yan Zhu
Competing with top human players in the ancient game of Go has been a long-term goal of artificial intelligence. Go’s high branching factor makes traditional search techniques ineffective, even on leading-edge hardware, and Go’s evaluation function could change drastically with one stone change. Recent works [Maddison et al. (2015); Clark & Storkey (2015)] show that search is not strictly necessary for machine Go players. A pure pattern-matching approach, based on a Deep Convolutional Neural Network (DCNN) that predicts the next move, can perform as well as Monte Carlo Tree Search (MCTS)-based open source Go engines such as Pachi [Baudis & Gailly (2012)] if its search budget is limited. We extend this idea in our bot named darkforest, which relies on a DCNN designed for long-term predictions. Darkforest substantially improves the win rate for pattern-matching approaches against MCTS-based approaches, even with looser search budgets. Against human players, the newest versions, darkfores2, achieve a stable 3d level on KGS Go Server as a ranked bot, a substantial improvement upon the estimated 4k-5k ranks for DCNN reported in Clark & Storkey (2015) based on games against other machine players. Adding MCTS to darkfores2 creates a much stronger player named darkfmcts3: with 5000 rollouts, it beats Pachi with 10k rollouts in all 250 games; with 75k rollouts it achieves a stable 5d level in KGS server, on par with state-of-the-art Go AIs (e.g., Zen, DolBaram, CrazyStone) except for AlphaGo [Silver et al. (2016)]; with 110k rollouts, it won the 3rd place in January KGS Go Tournament.
In this paper, we have substantially improved the performance of DCNN-based Go AI, extensively evaluated it against both open source engines and strong amateur human players, and shown its potentials if combined with Monte-Carlo Tree Search (MCTS). Ideally, we want to construct a system that combines both pattern matching and search, and can be trained jointly in an online fashion. Pattern matching with DCNN is good at global board reading, but might fail to capture special local situations. On the other hand, search is excellent in modeling arbitrary situations, by building a local non-parametric model for the current state, only when the computation cost is affordable. One paradigm is to update DCNN weights (i.e., Policy Gradient [Sutton et al. (1999)]) after MCTS completes and chooses a different best move than DCNN’s proposal. To increase the signal bandwidth, we could also update weights using all the board situations along the trajectory of the best move. Alternatively, we could update the weights when MCTS is running. Actor-Critics algorithms [Konda & Tsitsiklis (1999)] can also be used to train two models simultaneously, one to predict the next move (actor) and the other to evaluate the current board situation (critic). Finally, local tactics training (e.g., Life/Death practice) focuses on local board situation with fewer variations, which DCNN approaches should benefit from like human players.
Uma das melhores coisas que podem acontecer quando há uma expectativa muito grande em sua área de atuação em tecnologia é quando alguém muito conhecido tem uma mesma opinião de empirismo cético a cerca do estado da arte.
O que pode ser dito no que diz respeito ao estado da arte em Machine Learning é que fora a parte de interconectividade com devices (que é um campo que pessoalmente eu não conhecia tantas limitações), não há nada de novo no front em termos algorítmicos em relação às restrições já conhecidas na academia.
O ponto extremamente positivo aqui, é que aos poucos todo o conhecimento da academia (que ainda está muito na frente da indústria) já está sendo transposto para a vida das pessoas, mesmo que ainda em termos de aplicações simples.
Em outras palavras, a automação de tarefas domésticas é hoje um problema muito mais de engenharia do que de tecnologia em si. E isso é ótimo.
Sobre a dificuldade de fazer a ligação do Jarvis com dispositivos não conectados à internet:
(…)Further, most appliances aren’t even connected to the internet yet. It’s possible to control some of these using internet-connected power switches that let you turn the power on and off remotely. But often that isn’t enough. For example, one thing I learned is it’s hard to find a toaster that will let you push the bread down while it’s powered off so you can automatically start toasting when the power goes on. I ended up finding an old toaster from the 1950s and rigging it up with a connected switch. Similarly, I found that connecting a food dispenser for Beast or a grey t-shirt cannon would require hardware modifications to work. For assistants like Jarvis to be able to control everything in homes for more people, we need more devices to be connected and the industry needs to develop common APIs and standards for the devices to talk to each other.(…)
Sobre a dificuldade semântica que as máquinas tem para lidar com alguns tipos de ambiguidade na comunicação:
(…)Music is a more interesting and complex domain for natural language because there are too many artists, songs and albums for a keyword system to handle. The range of things you can ask it is also much greater. Lights can only be turned up or down, but when you say “play X”, even subtle variations can mean many different things. Consider these requests related to Adele: “play someone like you”, “play someone like adele”, and “play some adele”. Those sound similar, but each is a completely different category of request. The first plays a specific song, the second recommends an artist, and the third creates a playlist of Adele’s best songs. Through a system of positive and negative feedback, an AI can learn these differences.(…)
A respeito da oportunidade de negócios em recomendação:
(…)it also knows whether I’m talking to it or Priscilla is, so it can make recommendations based on what we each listen to. In general, I’ve found we use these more open-ended requests more frequently than more specific asks. No commercial products I know of do this today, and this seems like a big opportunity.(…)
Uma ótima ideia que pode ser adaptada por governos através de suas secretarias de segurança para mapeamento de desaparecidos e criminosos (será um novo Minority Report?)
(…) I built a simple server that continuously watches the cameras and runs a two step process: first, it runs face detection to see if any person has come into view, and second, if it finds a face, then it runs face recognition to identify who the person is. Once it identifies the person, it checks a list to confirm I’m expecting that person, and if I am then it will let them in and tell me they’re here. (…)
(…)This preference for text communication over voice communication fits a pattern we’re seeing with Messenger and WhatsApp overall, where the volume of text messaging around the world is growing much faster than the volume of voice communication. This suggests that future AI products cannot be solely focused on voice and will need a private messaging interface as well.(…)