Interpretability versus Performance: Skepticism and AI Winter

In this post, Michael Elad, editor-in-chief of the SIAM Journal on Imaging Sciences, offers a series of well-considered reflections on how Deep Learning methods are solving real problems and gaining high visibility, even though the methods are not especially elegant from a mathematical perspective.

His main point is that, in image processing, academia has long held a prominent position with an approach in which interpretability and understanding of the models took precedence over the results achieved.

This is clear in the paragraph below:

A series of papers during the early 2000s suggested the successful application of this architecture, leading to state-of-the-art results in practically any assigned task. Key aspects in these contributions included the following: the use of many network layers, which explains the term “deep learning;” a huge amount of data on which to train; massive computations typically run on computer clusters or graphic processing units; and wise optimization algorithms that employ effective initializations and gradual stochastic gradient learning. Unfortunately, all of these great empirical achievements were obtained with hardly any theoretical understanding of the underlying paradigm. Moreover, the optimization employed in the learning process is highly non-convex and intractable from a theoretical viewpoint.

At the end, he offers a view on pragmatism and the academic agenda:

Should we be happy about this trend? Well, if we are in the business of solving practical problems such as noise removal, the answer must be positive. Right? Therefore, a company seeking such a solution should be satisfied. But what about us scientists? What is the true objective behind the vast effort that we invested in the image denoising problem? Yes, we do aim for effective noise-removal algorithms, but this constitutes a small fraction of our motivation, as we have a much wider and deeper agenda. Researchers in our field aim to understand the data on which we operate. This is done by modeling information in order to decipher its true dimensionality and manifested phenomena. Such models serve denoising and other problems in image processing, but far more than that, they allow identifying new ways to extract knowledge from the data and enable new horizons.

This reminds me of my time at RCB Investimentos, when I worked with the great Renato Toledo in the NPL (non-performing loans) market. He taught me that good models have a high degree of interpretability and simplicity, and that this should be the barometer for decision-making, since a model whose uncertainty (or error) is known is better than a model that nobody understands. (Personal note: those who know me know I have a saying about this: if you do not understand the dynamics of the model when it works, you will never know what went wrong when it fails.)

Still, it is undeniable that Deep Learning networks are meeting, in my view, a pent-up demand: problems that already existed and that computational methods could not solve easily, such as facial recognition, image classification, translation, and structured problems such as fraud (Fast.AI is doing great work in making this clear).

Although DL researchers now have virtually unlimited hardware at modest prices, the brutal fact is that for roughly 30 years this field swallowed a very bitter pill of skepticism from academia itself: the method was treated with such skepticism that it nearly went extinct, and some journals implicitly refused to accept DL papers. Meanwhile, mathematicians were winning prizes and enjoying high visibility because of the accuracy of their methods, rather than because of any supposed idea that the world valued the interpretability of those methods.

Two big questions are on the table: 1) Can the mathematicians and communities shocked by this phenomenon endure what the Neural Networks community endured for more than 30 years? and 2) In the event of a Math Winter, can the mathematical community withstand a potential marginalization of its research?

We will have to wait and see.



Stoic Ethics for Artificial Agents

Who would have thought that Stoic philosophy would be adapted to Artificial Intelligence?

Stoic Ethics for Artificial Agents – Gabriel Murray

Abstract: We present a position paper advocating the notion that Stoic philosophy and ethics can inform the development of ethical A.I. systems. This is in sharp contrast to most work on building ethical A.I., which has focused on Utilitarian or Deontological ethical theories. We relate ethical A.I. to several core Stoic notions, including the dichotomy of control, the four cardinal virtues, the ideal Sage, Stoic practices, and Stoic perspectives on emotion or affect. More generally, we put forward an ethical view of A.I. that focuses more on internal states of the artificial agent rather than on external actions of the agent. We provide examples relating to near-term A.I. systems as well as hypothetical superintelligent agents.

Conclusions: In this position paper, we have attempted to show how Stoic ethics could be applied to the development of ethical A.I. systems. We argued that internal states matter for ethical A.I. agents, and that internal states can be analyzed by describing the four cardinal Stoic virtues in terms of characteristics of an intelligent system. We also briefly described other Stoic practices and how they could be realized by an A.I. agent. We gave a brief sketch of how to start developing Stoic A.I. systems by creating approval-directed agents with Stoic overseers, and/or by employing a syncretic paramedic ethics algorithm with a step featuring Stoic constraints. While it can be beneficial to analyze the ethics of an A.I. agent from several different perspectives, including consequentialist perspectives, we have argued for the importance of also conducting a Stoic ethical analysis of A.I. agents, where the agent’s internal states are analyzed, and moral judgments are not based on consequences outside of the agent’s control.


DeepStack: An Artificial Intelligence Expert System for the Game of Poker

This DeepStack paper, if reproducible, may represent a significant advance over where Artificial Intelligence stands today, especially on problems of asymmetric information.

As the authors point out, games such as Checkers, Chess, and Go start from the basic premise that information is symmetric between the players; in other words, there is a degree of determinism regarding the opponents' actions.

Poker, in turn, is characterized by a high degree of non-determinism, whether in the River, in the opponents' hands (cards), or in the infamous bluff (which is nothing more than a good stochastic problem).

In any case, whether or not you are a specialist in AI or Machine Learning, the modeling and results of DeepStack are worth checking out.

DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker

Abstract: Artificial intelligence has seen a number of breakthroughs in recent years, with games often serving as significant milestones. A common feature of games with these successes is that they involve information symmetry among the players, where all players have identical information. This property of perfect information, though, is far more common in games than in real-world problems. Poker is the quintessential game of imperfect information, and it has been a longstanding challenge problem in artificial intelligence. In this paper we introduce DeepStack, a new algorithm for imperfect information settings such as poker. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition about arbitrary poker situations that is automatically learned from self-play games using deep learning. In a study involving dozens of participants and 44,000 hands of poker, DeepStack becomes the first computer program to beat professional poker players in heads-up no-limit Texas hold’em. Furthermore, we show this approach dramatically reduces worst-case exploitability compared to the abstraction paradigm that has been favored for over a decade.

Conclusions: DeepStack is the first computer program to defeat professional poker players at heads-up no-limit Texas Hold’em, an imperfect information game with 10^160 decision points. Notably it achieves this goal with almost no domain knowledge or training from expert human games. The implications go beyond just being a significant milestone for artificial intelligence. DeepStack is a paradigmatic shift in approximating solutions to large, sequential imperfect information games. Abstraction and offline computation of complete strategies has been the dominant approach for almost 20 years (29,36,37). DeepStack allows computation to be focused on specific situations that arise when making decisions and the use of automatically trained value functions. These are two of the core principles that have powered successes in perfect information games, albeit conceptually simpler to implement in those settings. As a result, for the first time the gap between the largest perfect and imperfect information games to have been mastered is mostly closed. As “real life consists of bluffing… deception… asking yourself what is the other man going to think” (9), DeepStack also has implications for seeing powerful AI applied more in settings that do not fit the perfect information assumption. The old paradigm for handling imperfect information has shown promise in applications like defending strategic resources (38) and robust decision making as needed for medical treatment recommendations (39). The new paradigm will hopefully open up many more possibilities.



AlphaGo: The Biggest Advance in Neural Networks and Artificial Intelligence of 2016

Without a doubt, the biggest advance of 2016 for Artificial Intelligence/Neural Networks was AlphaGo's victory over Lee Sedol in the game of Go.

This is different from the Deep Blue era: Deep Blue defeated Garry Kasparov using a version of the exhaustive search algorithm backed by what was, at the time, very high computational power (i.e., at each move of the game Deep Blue computed all the possibilities and, through an evaluation function applied to each outcome, used the result as the heuristic for the next move).
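To make the idea of "search every possibility, score the outcomes, pick the next move" concrete, here is a minimal sketch on a toy game. It is not Deep Blue's actual algorithm, which was far more elaborate and chess-specific. The game is an assumption chosen for brevity: a single pile of stones, each player removes 1 to 3 stones, and whoever takes the last stone wins. The names `evaluate` and `best_move` are hypothetical.

```python
from functools import lru_cache

# Toy game (an assumption for illustration): one pile of stones,
# a move removes 1, 2 or 3 stones, taking the last stone wins.
MOVES = (1, 2, 3)


@lru_cache(maxsize=None)
def evaluate(stones: int) -> int:
    """Exhaustively search the game tree from this position.

    Returns +1 if the player to move can force a win, -1 otherwise.
    """
    if stones == 0:
        # The previous player took the last stone, so the player to move lost.
        return -1
    # A position is as good as the best reply is bad for the opponent.
    return max(-evaluate(stones - m) for m in MOVES if m <= stones)


def best_move(stones: int) -> int:
    """Pick the move whose resulting position scores worst for the opponent."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: -evaluate(stones - m))


if __name__ == "__main__":
    for pile in (5, 6, 7):
        print(pile, "->", best_move(pile))
```

This works only because the toy game tree is tiny. Go's branching factor, as the abstract below notes, is what makes this style of search ineffective there.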

A short video about the match:

Better Computer Go Player with Neural Network and Long-term Prediction

Yuandong Tian, Yan Zhu

Competing with top human players in the ancient game of Go has been a long-term goal of artificial intelligence. Go’s high branching factor makes traditional search techniques ineffective, even on leading-edge hardware, and Go’s evaluation function could change drastically with one stone change. Recent works [Maddison et al. (2015); Clark & Storkey (2015)] show that search is not strictly necessary for machine Go players. A pure pattern-matching approach, based on a Deep Convolutional Neural Network (DCNN) that predicts the next move, can perform as well as Monte Carlo Tree Search (MCTS)-based open source Go engines such as Pachi [Baudis & Gailly (2012)] if its search budget is limited. We extend this idea in our bot named darkforest, which relies on a DCNN designed for long-term predictions. Darkforest substantially improves the win rate for pattern-matching approaches against MCTS-based approaches, even with looser search budgets. Against human players, the newest versions, darkfores2, achieve a stable 3d level on KGS Go Server as a ranked bot, a substantial improvement upon the estimated 4k-5k ranks for DCNN reported in Clark & Storkey (2015) based on games against other machine players. Adding MCTS to darkfores2 creates a much stronger player named darkfmcts3: with 5000 rollouts, it beats Pachi with 10k rollouts in all 250 games; with 75k rollouts it achieves a stable 5d level in KGS server, on par with state-of-the-art Go AIs (e.g., Zen, DolBaram, CrazyStone) except for AlphaGo [Silver et al. (2016)]; with 110k rollouts, it won the 3rd place in January KGS Go Tournament.


In this paper, we have substantially improved the performance of DCNN-based Go AI, extensively evaluated it against both open source engines and strong amateur human players, and shown its potentials if combined with Monte-Carlo Tree Search (MCTS). Ideally, we want to construct a system that combines both pattern matching and search, and can be trained jointly in an online fashion. Pattern matching with DCNN is good at global board reading, but might fail to capture special local situations. On the other hand, search is excellent in modeling arbitrary situations, by building a local non-parametric model for the current state, only when the computation cost is affordable. One paradigm is to update DCNN weights (i.e., Policy Gradient [Sutton et al. (1999)]) after MCTS completes and chooses a different best move than DCNN’s proposal. To increase the signal bandwidth, we could also update weights using all the board situations along the trajectory of the best move. Alternatively, we could update the weights when MCTS is running. Actor-Critics algorithms [Konda & Tsitsiklis (1999)] can also be used to train two models simultaneously, one to predict the next move (actor) and the other to evaluate the current board situation (critic). Finally, local tactics training (e.g., Life/Death practice) focuses on local board situation with fewer variations, which DCNN approaches should benefit from like human players.


Building Jarvis. By Mark Zuckerberg

One of the best things that can happen when expectations are very high in your area of technology is when someone very well known shares the same skeptical-empiricist view of the state of the art.

Mark Zuckerberg set himself a goal in 2016 to build his own Jarvis (for those who do not know, Jarvis is the robot assistant that uses machine learning to help Tony Stark in Iron Man) as a way to learn about Artificial Intelligence, see the state of the art, and use it for his own benefit in carrying out household tasks.

Jarvis architecture

What can be said about the state of the art in Machine Learning is that, apart from the interconnectivity with devices (a field in which I personally was not aware of so many limitations), there is nothing new on the front in algorithmic terms relative to the constraints already known in academia.

Beta version of Jarvis.

The extremely positive point here is that, little by little, the knowledge from academia (which is still far ahead of industry) is being carried over into people's lives, even if only in simple applications for now.

In other words, automating household tasks is today much more an engineering problem than a technology problem. And that is great.

It is true that much of the discussion around Machine Learning is hype; but while that hype further amplifies the commercial discourse, attitudes like Mark's demystify what Machine Learning/Artificial Intelligence is and help to temper the Nuclear Winter for Machine Learning and Artificial Intelligence caused by the hype around these two fields of study.

Below are some excerpts from Mark Zuckerberg's account:

On the difficulty of connecting Jarvis to devices that are not connected to the internet:

(…)Further, most appliances aren’t even connected to the internet yet. It’s possible to control some of these using internet-connected power switches that let you turn the power on and off remotely. But often that isn’t enough. For example, one thing I learned is it’s hard to find a toaster that will let you push the bread down while it’s powered off so you can automatically start toasting when the power goes on. I ended up finding an old toaster from the 1950s and rigging it up with a connected switch. Similarly, I found that connecting a food dispenser for Beast or a grey t-shirt cannon would require hardware modifications to work.
For assistants like Jarvis to be able to control everything in homes for more people, we need more devices to be connected and the industry needs to develop common APIs and standards for the devices to talk to each other.(…)

On the semantic difficulty machines have in dealing with certain kinds of ambiguity in communication:

(…)Music is a more interesting and complex domain for natural language because there are too many artists, songs and albums for a keyword system to handle. The range of things you can ask it is also much greater. Lights can only be turned up or down, but when you say “play X”, even subtle variations can mean many different things. Consider these requests related to Adele: “play someone like you”, “play someone like adele”, and “play some adele”. Those sound similar, but each is a completely different category of request. The first plays a specific song, the second recommends an artist, and the third creates a playlist of Adele’s best songs. Through a system of positive and negative feedback, an AI can learn these differences.(…)
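The three Adele requests can be made concrete with a small sketch. The excerpt describes a system that learns these distinctions from positive and negative feedback; the code below does not do that. It is a hand-written rule router, with a tiny hypothetical catalog (`SONGS`, `ARTISTS`) and hypothetical category names, meant only to show why near-identical sentences land in different categories once a catalog is consulted.

```python
from typing import Tuple

# Hypothetical, hard-coded catalog used only for this illustration.
SONGS = {"someone like you": "Adele"}
ARTISTS = {"adele"}


def classify_request(text: str) -> Tuple[str, str]:
    """Map a spoken-style request to a (category, argument) pair."""
    query = text.lower().strip()
    if not query.startswith("play "):
        return ("unknown", query)
    query = query[len("play "):].strip()

    # An exact song title wins over any pattern it happens to resemble.
    if query in SONGS:
        return ("play_song", query)

    # "someone like <artist>" asks for a recommendation of a similar artist.
    if query.startswith("someone like "):
        artist = query[len("someone like "):]
        if artist in ARTISTS:
            return ("recommend_similar_artist", artist)

    # "some <artist>" asks for a playlist of that artist's songs.
    if query.startswith("some "):
        artist = query[len("some "):]
        if artist in ARTISTS:
            return ("artist_playlist", artist)

    return ("unknown", query)


if __name__ == "__main__":
    for request in ("play someone like you",
                    "play someone like adele",
                    "play some adele"):
        print(request, "->", classify_request(request))
```

Rules like these stop scaling as soon as the catalog grows or phrasing varies, which is the limitation of keyword systems that the excerpt points to.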

On the business opportunity in recommendation:

(…)it also knows whether I’m talking to it or Priscilla is, so it can make recommendations based on what we each listen to. In general, I’ve found we use these more open-ended requests more frequently than more specific asks. No commercial products I know of do this today, and this seems like a big opportunity.(…)

A great idea that governments could adapt through their public security departments to track missing persons and criminals (a new Minority Report?)

(…) I built a simple server that continuously watches the cameras and runs a two step process: first, it runs face detection to see if any person has come into view, and second, if it finds a face, then it runs face recognition to identify who the person is. Once it identifies the person, it checks a list to confirm I’m expecting that person, and if I am then it will let them in and tell me they’re here. (…)
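The two-step process in the excerpt (detect, then recognize, then check a list) can be sketched as a small pipeline. This is only an outline under stated assumptions: the real detection and recognition models are not described in the excerpt, so they are passed in as plain callables, and `fake_detect`, `fake_recognize`, `Decision`, and `process_frame` are hypothetical names with stand-in behavior.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional, Set


@dataclass
class Decision:
    person: Optional[str]  # None when the face could not be identified
    admit: bool            # True only for identified, expected visitors


def process_frame(
    frame: Any,
    detect_faces: Callable[[Any], Iterable[Any]],
    recognize: Callable[[Any], Optional[str]],
    expected: Set[str],
) -> List[Decision]:
    """Run detection first, then recognition only on the faces found."""
    decisions = []
    for face in detect_faces(frame):       # step 1: is anyone in view?
        person = recognize(face)           # step 2: who is it?
        admit = person is not None and person in expected
        decisions.append(Decision(person, admit))
    return decisions


# Stand-in models (assumptions): a "frame" is just a list of string labels.
def fake_detect(frame: List[str]) -> List[str]:
    return [item for item in frame if item.startswith("face:")]


def fake_recognize(face: str) -> Optional[str]:
    name = face.split(":", 1)[1]
    return None if name == "unknown" else name


if __name__ == "__main__":
    frame = ["face:alice", "tree", "face:unknown", "face:bob"]
    for decision in process_frame(frame, fake_detect, fake_recognize, {"alice"}):
        print(decision)
```

Keeping detection and recognition as separate callables mirrors the excerpt's ordering: recognition is never run unless detection has found a face first.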

As we have already discussed at Movile, chat is immortal!

(…)This preference for text communication over voice communication fits a pattern we’re seeing with Messenger and WhatsApp overall, where the volume of text messaging around the world is growing much faster than the volume of voice communication. This suggests that future AI products cannot be solely focused on voice and will need a private messaging interface as well.(…)
