Softmax GAN

Abstract: Softmax GAN is a novel variant of Generative Adversarial Network (GAN). The key idea of Softmax GAN is to replace the classification loss in the original GAN with a softmax cross-entropy loss in the sample space of one single batch. In the adversarial learning of N real training samples and M generated samples, the target of discriminator training is to distribute all the probability mass to the real samples, each with probability 1/M, and distribute zero probability to generated data. In the generator training phase, the target is to assign equal probability to all data points in the batch, each with probability 1/(M+N). While the original GAN is closely related to Noise Contrastive Estimation (NCE), we show that Softmax GAN is the Importance Sampling version of GAN. We further demonstrate with experiments that this simple change stabilizes GAN training.
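
To make the idea concrete, here is a minimal NumPy sketch of the batch-level targets described in the abstract, assuming for simplicity an equal number of real and generated samples in the batch; it illustrates the loss construction and is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4                              # number of real samples == number of generated samples (assumed equal)
logits = rng.normal(size=2 * M)    # discriminator scores for [real_1..real_M, fake_1..fake_M] in one batch

def batch_softmax(z):
    z = z - z.max()                # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()             # a single softmax over the whole batch

p = batch_softmax(logits)

# Discriminator target: all probability mass on the real samples, 1/M each, zero on generated ones.
t_disc = np.concatenate([np.full(M, 1.0 / M), np.zeros(M)])
# Generator target: uniform mass over every sample in the batch, 1/(M+N) each (here N == M).
t_gen = np.full(2 * M, 1.0 / (2 * M))

def cross_entropy(target, probs, eps=1e-12):
    """Softmax cross-entropy between a target distribution and the batch softmax."""
    return -np.sum(target * np.log(probs + eps))

print("D loss:", cross_entropy(t_disc, p))
print("G loss:", cross_entropy(t_gen, p))
```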

Softmax GAN

Deep Learning AMI Amazon Web Services

For anyone who wants to scale Machine Learning processing and doesn't have the money to buy GPUs, Amazon's Deep Learning AMI is a great alternative in terms of cost.

The Deep Learning AMI is an Amazon Linux image supported and maintained by Amazon Web Services for use on Amazon Elastic Compute Cloud (Amazon EC2). It is designed to provide a stable, secure, and high performance execution environment for deep learning applications running on Amazon EC2. It includes popular deep learning frameworks, including MXNet, Caffe, Tensorflow, Theano, CNTK and Torch as well as packages that enable easy integration with AWS, including launch configuration tools and many popular AWS libraries and tools. It also includes the Anaconda Data Science Platform for Python2 and Python3. Amazon Web Services provides ongoing security and maintenance updates to all instances running the Amazon Linux AMI. The Deep Learning AMI is provided at no additional charge to Amazon EC2 users.

The AMI Ids for the Deep Learning Amazon Linux AMI are the following:
us-east-1 : ami-e7c96af1
us-west-2: ami-dfb13ebf
eu-west-1: ami-6e5d6808

Release tags/branches used for the DL frameworks:
MXNet : v0.9.3 tag
Tensorflow : v1.0.0 tag
Theano : rel-0.8.2 tag
Caffe : rc5 tag
CNTK : v2.0beta12.0 tag
Torch : master branch
Keras : 1.2.2 tag

Deep Learning AMI Amazon Web Services

Failures of the Deep Learning approach: Architectures and Meta-parameterization

The biggest challenge the industry currently faces with Deep Learning is, without a doubt, the computational side: the whole market is both absorbing cloud services to run increasingly complex calculations and investing in GPU computing capacity.

However, even though hardware is already a commodity these days, academia is working on a problem that could revolutionize how Deep Learning is done: the architectural/parameterization aspect.

This comment from the thread says a lot about the problem, where the user states:

The main problem I see with Deep Learning: too many parameters.

When you have to find the best value for the parameters, that’s a gradient search by itself. The curse of meta-dimensionality.

In other words, even with all the hardware available, the question of what the best architectural arrangement for a deep neural network is remains unsolved.

This paper by Shai Shalev-Shwartz, Ohad Shamir, and Shaked Shammah, called "Failures of Deep Learning", lays out this problem in a very rich way, including experiments (this is the Git repository).

The authors identify the failure points of Deep Learning networks as: a) failures of gradient-based methods for parameter optimization, b) structural problems in how Deep Learning algorithms decompose problems, c) architecture, and d) saturation of the activation functions.

In other words, what may be happening in a large share of Deep Learning applications is that convergence time could be much shorter still if these aspects were already solved.

With that solved, much of what we know today as the hardware industry for Deep Learning networks would either be heavily under-utilized (i.e. given the improvements in architectural/algorithmic optimization) or could be put to use on more complex tasks (e.g. image recognition with a low number of pixels).

So even while adopting a hardware-driven approach, as the industry has been doing, there is still plenty of room to optimize Deep Learning networks from the architectural and algorithmic point of view.

Below is a list of references straight from Stack Exchange for anyone who wants to dig deeper into the subject:

Neuro-Evolutionary Algorithms

Reinforcement Learning:

Miscellaneous:

PS: WordPress removed the text-justification option, so apologies in advance for the blog's amateurish appearance over the next few days.


Failures of the Deep Learning approach: Architectures and Meta-parameterization

Modularization of Neural Network Morphism

Who said that morphological changes cannot happen to neural network architectures/topologies?

Modularized Morphing of Neural Networks – Tao Wei, Changhu Wang, Chang Wen Chen

Abstract: In this work we study the problem of network morphism, an effective learning scheme to morph a well-trained neural network to a new one with the network function completely preserved. Different from existing work where basic morphing types on the layer level were addressed, we target at the central problem of network morphism at a higher level, i.e., how a convolutional layer can be morphed into an arbitrary module of a neural network. To simplify the representation of a network, we abstract a module as a graph with blobs as vertices and convolutional layers as edges, based on which the morphing process is able to be formulated as a graph transformation problem. Two atomic morphing operations are introduced to compose the graphs, based on which modules are classified into two families, i.e., simple morphable modules and complex modules. We present practical morphing solutions for both of these two families, and prove that any reasonable module can be morphed from a single convolutional layer. Extensive experiments have been conducted based on the state-of-the-art ResNet on benchmark datasets, and the effectiveness of the proposed solution has been verified.

Conclusions: This paper presented a systematic study on the problem of network morphism at a higher level, and tried to answer the central question of such learning scheme, i.e., whether and how a convolutional layer can be morphed into an arbitrary module. To facilitate the study, we abstracted a modular network as a graph, and formulated the process of network morphism as a graph transformation process. Based on this formulation, both simple morphable modules and complex modules have been defined and corresponding morphing algorithms have been proposed. We have shown that a convolutional layer can be morphed into any module of a network. We have also carried out experiments to illustrate how to achieve a better performing model based on the state-of-the-art ResNet with minimal extra computational cost on benchmark datasets.
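
Setting the paper's graph-transformation machinery aside, the core requirement of network morphism (the network function must be exactly preserved after the morph) can be illustrated with a toy NumPy sketch in which a single linear layer is expanded into a two-layer module initialized so that the composition reproduces the original mapping; this is my own illustration, not the authors' algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 8))          # a batch of 5 inputs with 8 features
W = rng.normal(size=(8, 4))          # a trained linear layer (stand-in for a convolution)

y_before = x @ W                     # network function before morphing

# Morph the single layer into a two-layer module whose composition reproduces it exactly:
# the first layer keeps W, the second is initialised as the identity, so the function is preserved.
W1 = W.copy()
W2 = np.eye(4)

y_after = (x @ W1) @ W2

print("function preserved:", np.allclose(y_before, y_after))   # True
```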

Modularization of Neural Network Morphism

Applying Deep Learning to relate Pins

It seems that Pinterest is becoming the new powerhouse of Deep Learning applied to images.

Using deep learning to generate Related Pins

We built Pin2Vec to embed all the Pins in a 128-dimension space. First, we label a Pin with all the other Pins someone has saved in his/her activity session, each as a Pin tuple. Pin tuples are used in supervised training to train the embedding matrix for each of the tens of millions of Pins of the vocabulary. We use TensorFlow as the trainer. At serving time, a set of nearest neighbors are found as Related Pins in the space for each of the Pins.

Training data is collected from recent engagement, such as saving or clicking, and a sliding window is applied. Low quality Pins and those not engaged with are removed from training. Then, each Pin is assigned with a unique Pin ID. Within the sliding window, training pairs are extracted such that the first Pin is the example and each of the following Pins is its label. Figure 3 illustrates an example session and training pairs. In our case, you can imagine each user session is a sentence with Pins as words.

We used a feedforward neural network with a hidden layer of 128 dimensions. Figure 4 shows the architecture. The network is inspired by word2vec. The input vector is a one-hot vector with a size of vocabulary and, in our case, is tens of millions of Pins. The vector is reduced to the 128-dimension vector by multiplying with the hidden layer weight matrix. An eLu activation function is applied after hidden layer. At last the hidden layer output is multiplied with the softmax matrix and a cross-entropy is used to calculate the loss. We sampled 64 negative Pins in loss optimization in lieu of iterating on tens of millions of Pins. We trained the Pin2Vec embedding on machines with 32 cores and 244GB memory.
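
As a rough illustration of the setup described above (an embedding matrix acting as the hidden layer, an ELU activation, and a softmax computed over a small sample of negative Pins), here is a minimal NumPy sketch; the vocabulary size, the sampling scheme and all names are placeholders, not Pinterest's code:

```python
import numpy as np

rng = np.random.default_rng(42)
vocab_size, dim = 10_000, 128        # illustrative sizes; the real vocabulary is tens of millions of Pins
embedding = rng.normal(scale=0.01, size=(vocab_size, dim))   # hidden-layer weight matrix (the Pin embeddings)
softmax_w = rng.normal(scale=0.01, size=(vocab_size, dim))   # output (softmax) matrix

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

def sampled_softmax_loss(example_pin, label_pin, num_negatives=64):
    """Cross-entropy over the true label plus a small sample of negative Pins."""
    hidden = elu(embedding[example_pin])                 # one-hot input times hidden matrix == a row lookup
    negatives = rng.integers(0, vocab_size, size=num_negatives)
    candidates = np.concatenate(([label_pin], negatives))
    logits = softmax_w[candidates] @ hidden              # scores only for the sampled candidates
    logits -= logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]                                 # the true label sits at position 0

# One (example, label) training pair taken from a user session:
print("loss:", sampled_softmax_loss(example_pin=123, label_pin=456))
```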

Applying Deep Learning to relate Pins

Akid: A Neural Network library for research and production

Finally, people have started thinking about closing the gap between science/academia and industry.

Akid: A Library for Neural Network Research and Production from a Dataism Approach – Shuai Li
Abstract: Neural networks are a revolutionary but immature technique that is fast evolving and heavily relies on data. To benefit from the newest development and newly available data, we want the gap between research and production to be as small as possible. On the other hand, differing from traditional machine learning models, a neural network is not just yet another statistical model, but a model for the natural processing engine, the brain. In this work, we describe a neural network library named akid. It provides a higher level of abstraction for entities (abstracted as blocks) in nature, on top of the abstraction done on signals (abstracted as tensors) by TensorFlow, characterizing the dataism observation that all entities in nature process inputs and emit outputs in some way. It includes a full stack of software that provides abstraction to let researchers focus on research instead of implementation, while at the same time the developed program can also be put into production seamlessly in a distributed environment, and be production ready. At the top of the application stack, it provides out-of-box tools for neural network applications. Lower down, akid provides a programming paradigm that lets users easily build customized models. The distributed computing stack handles the concurrency and communication, thus letting models be trained or deployed to a single GPU, multiple GPUs, or a distributed environment without affecting how a model is specified in the programming paradigm stack. Lastly, the distributed deployment stack handles how the distributed computing is deployed, thus decoupling the research prototype environment from the actual production environment, and is able to dynamically allocate computing resources, so development (Devs) and operations (Ops) can be separated.

Akid: A Neural Network library for research and production

Convolutional Neural Networks applied to identifying Parkinson's Disease

Another case of Deep Learning applied to medical problems.

Convolutional Neural Networks Applied for Parkinson’s Disease Identification

Abstract: Parkinson's Disease (PD) is a chronic and progressive illness that affects hundreds of thousands of people worldwide. Although it is quite easy to identify someone affected by PD when the illness shows itself (e.g. tremors, slowness of movement and freezing-of-gait), most works have focused on studying the working mechanism of the disease in its very early stages. In such cases, drugs can be administered in order to increase the quality of life of the patients. Since the beginning, it has been well known that PD patients feature micrography, which is related to muscle rigidity and tremors. As such, most exams to detect Parkinson's Disease make use of handwritten assessment tools, where the individual is asked to perform some predefined tasks, such as drawing spirals and meanders on a template paper. Later, an expert analyses the drawings in order to classify the progression of the disease. In this work, we are interested in aiding physicians in such a task by means of machine learning techniques, which can learn proper information from digitized versions of the exams and then recommend the probability of a given individual being affected by PD based on his/her handwriting skills. Particularly, we are interested in deep learning techniques (i.e. Convolutional Neural Networks) due to their ability to learn features without human interaction. Additionally, we propose to fine-tune hyper-parameters of such techniques by means of meta-heuristic-based techniques, such as the Bat Algorithm, the Firefly Algorithm and Particle Swarm Optimization.

Convolutional Neural Networks applied to identifying Parkinson's Disease

For anyone who wants to know a bit more about the progress in applying reinforcement learning and Deep Learning to autonomous systems, this paper is a good pick.

Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks

Abstract: We propose an inverse reinforcement learning (IRL) approach using Deep Q-Networks to extract the rewards in problems with large state spaces. We evaluate the performance of this approach in a simulation-based autonomous driving scenario. Our results resemble the intuitive relation between the reward function and readings of distance sensors mounted at different poses on the car. We also show that, after a few learning rounds, our simulated agent generates collision-free motions and performs human-like lane change behaviour.

Conclusions: In this paper we proposed using Deep Q-Networks as the refinement step in Inverse Reinforcement Learning approaches. This enabled us to extract the rewards in scenarios with large state spaces such as driving, given expert demonstrations. The aim of this work was to extend the general approach to IRL. Exploring more advanced methods like Maximum Entropy IRL and the support for nonlinear reward functions is currently under investigation.
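
As background, the Deep Q-Network used in the refinement step approximates the classic Q-learning update with a neural network; a tabular NumPy sketch of that update (not the paper's IRL procedure) looks like this:

```python
import numpy as np

n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))      # tabular stand-in for the Deep Q-Network
alpha, gamma = 0.1, 0.99                 # learning rate and discount factor

def q_update(s, a, reward, s_next):
    """One temporal-difference step: pull Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = reward + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

q_update(s=0, a=1, reward=1.0, s_next=2)
print(Q[0])
```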

DeepCancer: Detecting cancer through gene expressions via Deep Learning

This paper brings a Deep Learning implementation that, if confirmed, could be a major advance in the diagnostics industry for healthcare services, given that algorithmic learning can identify several types of cancer-related genes; this carries two positive externalities: 1) cheaper and faster diagnosis, and 2) a complete reformulation of the strategy for fighting and preventing diseases.

DeepCancer: Detecting Cancer through Gene Expressions via Deep Generative Learning

Abstract: Transcriptional profiling on microarrays to obtain gene expressions has been used to facilitate cancer diagnosis. We propose a deep generative machine learning architecture (called DeepCancer) that learns features from unlabeled microarray data. These models have been used in conjunction with conventional classifiers that perform classification of the tissue samples as either being cancerous or non-cancerous. The proposed model has been tested on two different clinical datasets. The evaluation demonstrates that the DeepCancer model achieves a very high precision score, while significantly controlling the false positive and false negative scores.

Conclusions: We presented a deep generative learning model DeepCancer for detection and classification of inflammatory breast cancer and prostate cancer samples. The features are learned through an adversarial feature learning process and then sent as input to a conventional classifier specific to the objective of interest. After modifications through specified hyperparameters, the model performs quite comparatively well on the task tested on two different datasets. The proposed model utilized cDNA microarray gene expressions to gauge its efficacy. Based on deep generative learning, the tuned discriminator and generator models, D and G respectively, learned to differentiate between the gene signatures without any intermediate manual feature handpicking, indicating that much bigger datasets can be experimented on the proposed model more seamlessly. The DeepCloud model will be a vital aid to the medical imaging community and, ultimately, reduce inflammatory breast cancer and prostate cancer mortality.

DeepCancer: Detecting cancer through gene expressions via Deep Learning

Hardware for Machine Learning: Challenges and opportunities

A great paper on how hardware will play a crucial role over the next few years in core Machine Learning, especially in embedded systems.

Hardware for Machine Learning: Challenges and Opportunities

Abstract: Machine learning plays a critical role in extracting meaningful information out of the zettabytes of sensor data collected every day. For some applications, the goal is to analyze and understand the data to identify trends (e.g., surveillance, portable/wearable electronics); in other applications, the goal is to take immediate action based on the data (e.g., robotics/drones, self-driving cars, smart Internet of Things). For many of these applications, local embedded processing near the sensor is preferred over the cloud due to privacy or latency concerns, or limitations in the communication bandwidth. However, at the sensor there are often stringent constraints on energy consumption and cost in addition to throughput and accuracy requirements. Furthermore, flexibility is often required such that the processing can be adapted for different applications or environments (e.g., update the weights and model in the classifier). In many applications, machine learning often involves transforming the input data into a higher dimensional space, which, along with programmable weights, increases data movement and consequently energy consumption. In this paper, we will discuss how these challenges can be addressed at various levels of hardware design ranging from architecture, hardware-friendly algorithms, mixed-signal circuits, and advanced technologies (including memories and sensors).

Conclusions: Machine learning is an important area of research with many promising applications and opportunities for innovation at various levels of hardware design. During the design process, it is important to balance the accuracy, energy, throughput and cost requirements. Since data movement dominates energy consumption, the primary focus of recent research has been to reduce the data movement while maintaining performance accuracy, throughput and cost. This means selecting architectures with favorable memory hierarchies like a spatial array, and developing dataflows that increase data reuse at the low-cost levels of the memory hierarchy. With joint design of algorithm and hardware, reduced bitwidth precision, increased sparsity and compression are used to minimize the data movement requirements. With mixed-signal circuit design and advanced technologies, computation is moved closer to the source by embedding computation near or within the sensor and the memories. One should also consider the interactions between these different levels. For instance, reducing the bitwidth through hardware-friendly algorithm design enables reduced precision processing with mixed-signal circuits and non-volatile memory. Reducing the cost of memory access with advanced technologies could result in more energy-efficient dataflows.
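
One concrete instance of the "reduced bitwidth precision" idea mentioned in the conclusion is simple symmetric linear quantization of a layer's weights to 8 bits; the NumPy sketch below is a generic illustration, not any specific hardware scheme:

```python
import numpy as np

rng = np.random.default_rng(7)
weights = rng.normal(scale=0.05, size=1000).astype(np.float32)   # float32 weights of some layer

# Symmetric linear quantization to 8 bits: store int8 values plus one float scale per tensor.
scale = np.abs(weights).max() / 127.0
w_int8 = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
w_dequant = w_int8.astype(np.float32) * scale            # what the hardware would reconstruct

print("bytes before:", weights.nbytes, "after:", w_int8.nbytes)   # 4x less data to move
print("max abs error:", np.abs(weights - w_dequant).max())
```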

Hardware for Machine Learning: Challenges and opportunities

A new Softmax operator for Reinforcement Learning

In classification problems, the softmax operator maps a vector of real-valued scores to the probability of a given tuple belonging to each class, constraining the scores to a restricted space where they sum to one.

On Quora there is a great definition of this function and of how it is used as an activation function, combined within a neural network through transmission along the axons.

The logistic Softmax function is defined as:

h_θ(x) = 1 / (1 + e^(−θᵀx))

Where θ represents a vector of weights and x is a vector of input values; the function produces a scalar output h_θ(x), with 0 < h_θ(x) < 1. For anyone who wants to dig deeper, this explanation is definitely a killer.
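
To make both definitions concrete, here is a small NumPy sketch of the logistic function for a scalar score and of the softmax for a vector of class scores (my own illustration):

```python
import numpy as np

def logistic(theta, x):
    """h_theta(x) = 1 / (1 + exp(-theta . x)), a scalar strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-np.dot(theta, x)))

def softmax(scores):
    """Maps a vector of real-valued scores to class probabilities that sum to one."""
    z = scores - np.max(scores)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

theta = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.3, 0.8])
probs = softmax(np.array([2.0, 1.0, 0.1]))
print("logistic:", logistic(theta, x))
print("softmax:", probs, "sum =", probs.sum())
```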

This short introduction was meant to present the article below, which takes a great approach to the softmax in the context of reinforcement learning. Enjoy.

A New Softmax Operator for Reinforcement Learning

Abstract:  A softmax operator applied to a set of values acts somewhat like the maximization function and somewhat like an average. In sequential decision making, softmax is often used in settings where it is necessary to maximize utility but also to hedge against problems that arise from putting all of one’s weight behind a single maximum utility decision. The Boltzmann softmax operator is the most commonly used softmax operator in this setting, but we show that this operator is prone to misbehavior. In this work, we study an alternative softmax operator that, among other properties, is both a non-expansion (ensuring convergent behavior in learning and planning) and differentiable (making it possible to improve decisions via gradient descent methods). We provide proofs of these properties and present empirical comparisons between various softmax operators.

Conclusions: We proposed the mellowmax operator as an alternative for the Boltzmann operator. We showed that mellowmax has several desirable properties and that it works favorably in practice. Arguably, mellowmax could be used in place of Boltzmann throughout reinforcement-learning research. Important future work is to expand the scope of investigation to the function approximation setting in which the state space or the action space is large and abstraction techniques are used. We expect mellowmax operator and its non-expansion property to behave more consistently than the Boltzmann operator when estimates of state–action values can be arbitrarily inaccurate. Another direction is to analyze the fixed point of planning, reinforcement-learning, and game-playing algorithms when using softmax and mellowmax operators. In particular, an interesting analysis could be one that bounds the suboptimality of fixed points found by value iteration under each operator. Finally, due to the convexity (Boyd & Vandenberghe, 2004) of mellowmax, it is compelling to use this operator in a gradient ascent algorithm in the context of sequential decision making. Inverse reinforcement-learning algorithms is a natural candidate given the popularity of softmax in this setting.
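
For reference, here is a small NumPy sketch of the two operators being compared, the Boltzmann softmax (an exponentially weighted average) and mellowmax (a log-mean-exp), following their usual definitions:

```python
import numpy as np

def boltzmann_softmax(x, beta=5.0):
    """Weighted average of the values, with weights proportional to exp(beta * x_i)."""
    w = np.exp(beta * (x - x.max()))     # shift by the max for numerical stability
    return float(np.sum(w * x) / np.sum(w))

def mellowmax(x, omega=5.0):
    """mm_omega(x) = log(mean(exp(omega * x))) / omega  -- a non-expansion, unlike Boltzmann."""
    c = x.max()
    return float(c + np.log(np.mean(np.exp(omega * (x - c)))) / omega)

values = np.array([0.1, 0.5, 0.45])
print("max:      ", values.max())
print("boltzmann:", boltzmann_softmax(values))
print("mellowmax:", mellowmax(values))
```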

A new Softmax operator for Reinforcement Learning

Deep Learning for time series analysis

As hot as image recognition or audio segmentation problems are in Deep Learning, 90% of real-world data problems involve structured data, especially time series. This paper shows a rather unconventional methodology (transforming a time series into an 'image' to feed a Convolutional Network), but one that suggests the sky is the limit when it comes to arrangements for solving prediction problems with structured data.
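
As a toy illustration of the series-to-image idea (not necessarily the transformation used in the paper), one simple option is to turn a 1-D series into a recurrence-style distance matrix that a convolutional network could consume:

```python
import numpy as np

rng = np.random.default_rng(3)
series = np.sin(np.linspace(0, 6 * np.pi, 64)) + 0.1 * rng.normal(size=64)   # a toy 1-D time series

# One simple series-to-"image" transformation: the pairwise-distance (recurrence-style) matrix.
image = np.abs(series[:, None] - series[None, :])     # shape (64, 64), usable as a 1-channel image

# A CNN would then take image[None, :, :, None] as an input tensor of shape (batch, height, width, channels).
print(image.shape, image.min(), image.max())
```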

Deep Learning for Time-Series Analysis – John Cristian Borges Gamboa

Abstract: In many real-world applications, e.g., speech recognition or sleep stage classification, data are captured over the course of time, constituting a Time-Series. Time-Series often contain temporal dependencies that cause two otherwise identical points of time to belong to different classes or predict different behavior. This characteristic generally increases the difficulty of analysing them. Existing techniques often depended on hand-crafted features that were expensive to create and required expert knowledge of the field. With the advent of Deep Learning, new models of unsupervised learning of features for Time-Series analysis and forecast have been developed. Such new developments are the topic of this paper: a review of the main Deep Learning techniques is presented, and some applications on Time-Series analysis are summarized. The results make it clear that Deep Learning has a lot to contribute to the field.

Conclusions: When applying Deep Learning, one seeks to stack several independent neural network layers that, working together, produce better results than the already existing shallow structures. In this paper, we have reviewed some of these modules, as well as the recent work that has been done by using them, found in the literature. Additionally, we have discussed some of the main tasks normally performed when manipulating Time-Series data using deep neural network structures. Finally, a more specific focus was given to one work performing each one of these tasks. Employing Deep Learning for Time-Series analysis has yielded results in these cases that are better than the previously existing techniques, which is evidence that this is a promising field for improvement.

deep-learning-for-time-series-analysis

Deep Learning for time series analysis

DeepStack: An expert-level Artificial Intelligence system for the game of Poker

This DeepStack paper, if reproducible, may represent a significant advance over the entire axis where Artificial Intelligence stands today, especially for problems of asymmetric information.

As the authors point out, games such as Checkers, Chess and Go start from the basic premise that information is symmetric between the players; in other words, there is a certain determinism regarding the opponents' actions.

Poker, in turn, is characterized by a high degree of non-determinism, whether in the river, in the opponents' hands (cards), or in the infamous bluff (which is nothing more than a nice stochastic problem).

In any case, whether you are an AI or Machine Learning specialist or not, DeepStack's modeling and results are worth checking out.

DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker

Abstract: Artificial intelligence has seen a number of breakthroughs in recent years, with games often serving as significant milestones. A common feature of games with these successes is that they involve information symmetry among the players, where all players have identical information. This property of perfect information, though, is far more common in games than in real-world problems. Poker is the quintessential game of imperfect information, and it has been a longstanding challenge problem in artificial intelligence. In this paper we introduce DeepStack, a new algorithm for imperfect information settings such as poker. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition about arbitrary poker situations that is automatically learned from self-play games using deep learning. In a study involving dozens of participants and 44,000 hands of poker, DeepStack becomes the first computer program to beat professional poker players in heads-up no-limit Texas hold'em. Furthermore, we show this approach dramatically reduces worst-case exploitability compared to the abstraction paradigm that has been favored for over a decade.

Conclusions: DeepStack is the first computer program to defeat professional poker players at heads-up no-limit Texas Hold'em, an imperfect information game with 10^160 decision points. Notably it achieves this goal with almost no domain knowledge or training from expert human games. The implications go beyond just being a significant milestone for artificial intelligence. DeepStack is a paradigmatic shift in approximating solutions to large, sequential imperfect information games. Abstraction and offline computation of complete strategies has been the dominant approach for almost 20 years (29,36,37). DeepStack allows computation to be focused on specific situations that arise when making decisions and the use of automatically trained value functions. These are two of the core principles that have powered successes in perfect information games, albeit conceptually simpler to implement in those settings. As a result, for the first time the gap between the largest perfect and imperfect information games to have been mastered is mostly closed. As "real life consists of bluffing… deception… asking yourself what is the other man going to think" (9), DeepStack also has implications for seeing powerful AI applied more in settings that do not fit the perfect information assumption. The old paradigm for handling imperfect information has shown promise in applications like defending strategic resources (38) and robust decision making as needed for medical treatment recommendations (39). The new paradigm will hopefully open up many more possibilities.

deepstack-expert-level-artificial-intelligence-in-no-limit-poker

DeepStack: An expert-level Artificial Intelligence system for the game of Poker

Main solutions from H2O.ai

Now that we know a bit about this solution, let's take a look at the H2O ecosystem of products and at the main characteristics and applications of each one.

H2O

This platform is the company's flagship, offered both in a desktop version for applying Machine Learning and in a distributed-processing version for high data volumes.

This version ships with some off-the-shelf algorithms such as boosting, linear and logistic regression, tree-based algorithms, and a few algorithms that use gradients as the optimization method. Nothing too complex, but quite functional.

This version is ideal if you want to get to know the tool a bit better and don't want to spend much time installing or configuring things before applying the algorithms, or even for an initial test of the distributed cluster-processing features.
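
For a sense of what that first contact looks like, here is a minimal sketch using the h2o Python package; the file name and the choice of target column are placeholders, and the AUC call assumes the last column is a binary (categorical) target:

```python
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()                                    # starts (or connects to) a local H2O cluster

# Placeholder dataset and column choices -- replace with your own.
frame = h2o.import_file("data.csv")
train, valid = frame.split_frame(ratios=[0.8], seed=42)

model = H2OGradientBoostingEstimator(ntrees=50, seed=42)
model.train(x=frame.columns[:-1], y=frame.columns[-1],
            training_frame=train, validation_frame=valid)

print(model.auc(valid=True))                  # validation AUC for a binary classification target
```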

Below, a bit of the solution's architecture:

h2oarch

Sparkling Water

The main characteristic of this solution is that it uses H2O's own algorithms, but with the advantage of using all of Spark's distributed-processing and integration features. In this solution, all computation can also be done inside Spark using Scala, with a Web interface.

This is the most recommended solution for building Machine Learning applications, whether as microservices or even for embedding the entire algorithmic and computational layer inside a platform/system.

h2ospark h2ospark2

Deep Water

Deep Water is the solution aimed at implementing Deep Learning with GPU-accelerated computation, using frameworks such as TensorFlow, Theano, Caffe, among others.

In this case, the H2O platform is the interface where all the model-training parameters are set (cross-validation, sampling, stopping criteria, hyperparameters, etc.), while the backend with TensorFlow, Theano, etc. does the processing on GPUs.

Steam

Steam is a platform that provides the entire link between machine learning models built with H2O and the development features needed to embed Machine Learning models in applications, all in a collaborative way, something very similar to Domino.

Steam's main advantage is that it abstracts away all the engineering behind putting machine learning models into production, such as infrastructure, auto-scaling of that infrastructure according to the request load, as well as some Data Science tasks such as model retraining; it also greatly reduces IT costs/investments.

steam

Now that we know the main H2O products, we will soon have some posts with tutorials exploring the tool a bit more.

Useful links

Technical documentation

Main solutions from H2O.ai

AlphaGo: The biggest advance in Neural Networks and Artificial Intelligence in 2016

Without a doubt, the biggest advance of 2016 for Artificial Intelligence/Neural Networks was AlphaGo's victory over Lee Sedol in the game of Go.

Unlike the Deep Blue era, when Deep Blue defeated Garry Kasparov using a version of exhaustive search backed by what was then very high computing power (i.e. at every move Deep Blue calculated all the possibilities and, through an evaluation function applied to each outcome, used the result as a heuristic for the next move), AlphaGo combines neural networks with search to cope with Go's enormous branching factor.

A short video about the match:

Better Computer Go Player with Neural Network and Long-term Prediction

Yuandong Tian, Yan Zhu

Competing with top human players in the ancient game of Go has been a long-term goal of artificial intelligence. Go’s high branching factor makes traditional search techniques ineffective, even on leading-edge hardware, and Go’s evaluation function could change drastically with one stone change. Recent works [Maddison et al. (2015); Clark & Storkey (2015)] show that search is not strictly necessary for machine Go players. A pure pattern-matching approach, based on a Deep Convolutional Neural Network (DCNN) that predicts the next move, can perform as well as Monte Carlo Tree Search (MCTS)-based open source Go engines such as Pachi [Baudis & Gailly (2012)] if its search budget is limited. We extend this idea in our bot named darkforest, which relies on a DCNN designed for long-term predictions. Darkforest substantially improves the win rate for pattern-matching approaches against MCTS-based approaches, even with looser search budgets. Against human players, the newest versions, darkfores2, achieve a stable 3d level on KGS Go Server as a ranked bot, a substantial improvement upon the estimated 4k-5k ranks for DCNN reported in Clark & Storkey (2015) based on games against other machine players. Adding MCTS to darkfores2 creates a much stronger player named darkfmcts3: with 5000 rollouts, it beats Pachi with 10k rollouts in all 250 games; with 75k rollouts it achieves a stable 5d level in KGS server, on par with state-of-the-art Go AIs (e.g., Zen, DolBaram, CrazyStone) except for AlphaGo [Silver et al. (2016)]; with 110k rollouts, it won the 3rd place in January KGS Go Tournament.

Conclusion

In this paper, we have substantially improved the performance of DCNN-based Go AI, extensively evaluated it against both open source engines and strong amateur human players, and shown its potentials if combined with Monte-Carlo Tree Search (MCTS). Ideally, we want to construct a system that combines both pattern matching and search, and can be trained jointly in an online fashion. Pattern matching with DCNN is good at global board reading, but might fail to capture special local situations. On the other hand, search is excellent in modeling arbitrary situations, by building a local non-parametric model for the current state, only when the computation cost is affordable. One paradigm is to update DCNN weights (i.e., Policy Gradient [Sutton et al. (1999)]) after MCTS completes and chooses a different best move than DCNN’s proposal. To increase the signal bandwidth, we could also update weights using all the board situations along the trajectory of the best move. Alternatively, we could update the weights when MCTS is running. Actor-Critics algorithms [Konda & Tsitsiklis (1999)] can also be used to train two models simultaneously, one to predict the next move (actor) and the other to evaluate the current board situation (critic). Finally, local tactics training (e.g., Life/Death practice) focuses on local board situation with fewer variations, which DCNN approaches should benefit from like human players.

AlphaGo: The biggest advance in Neural Networks and Artificial Intelligence in 2016

Yes, you should understand backpropagation!

By Andrej Karpathy

Backpropagation is a leaky abstraction; it is a credit assignment scheme with non-trivial consequences. If you try to ignore how it works under the hood because “TensorFlow automagically makes my networks learn”, you will not be ready to wrestle with the dangers it presents, and you will be much less effective at building and debugging neural networks.
The good news is that backpropagation is not that difficult to understand, if presented properly. I have relatively strong feelings on this topic because it seems to me that 95% of backpropagation materials out there present it all wrong, filling pages with mechanical math. Instead, I would recommend the CS231n lecture on backprop which emphasizes intuition (yay for shameless self-advertising). And if you can spare the time, as a bonus, work through the CS231n assignments, which get you to write backprop manually and help you solidify your understanding.
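
In that spirit, here is a tiny NumPy example of backprop done by hand through a single saturated sigmoid unit (my own toy example, not taken from the CS231n material); it shows exactly the kind of gradient behavior the post warns about:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass for a single sigmoid neuron with a squared-error loss.
w, b, x, y = 4.0, 2.0, 3.0, 0.0
z = w * x + b                 # z = 14 -> the sigmoid is saturated
a = sigmoid(z)
loss = (a - y) ** 2

# Backward pass, applying the chain rule by hand (what backprop automates).
dloss_da = 2.0 * (a - y)
da_dz = a * (1.0 - a)         # local gradient of the sigmoid; nearly 0 when |z| is large
dz_dw, dz_dx = x, w

dloss_dw = dloss_da * da_dz * dz_dw
dloss_dx = dloss_da * da_dz * dz_dx

print("a =", a, " local sigmoid grad =", da_dz)
print("dL/dw =", dloss_dw)    # nearly zero: the saturated unit barely learns
```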

Yes, you should understand backpropagation!

Simple Black-Box Adversarial Perturbations for Deep Networks

by Nina Narodytska and Shiva Prasad Kasiviswanathan

Deep neural networks are powerful and popular learning models that achieve state-of-the-art pattern recognition performance on many computer vision, speech, and language processing tasks. However, these networks have also been shown susceptible to carefully crafted adversarial perturbations which force misclassification of the inputs. Adversarial examples enable adversaries to subvert the expected system behavior leading to undesired consequences and could pose a security risk when these systems are deployed in the real world.
In this work, we focus on deep convolutional neural networks and demonstrate that adversaries can easily craft adversarial examples even without any internal knowledge of the target network. Our attacks treat the network as an oracle (black-box) and only assume that the output of the network can be observed on the probed inputs. Our first attack is based on a simple idea of adding perturbation to a randomly selected single pixel or a small set of them. We then improve the effectiveness of this attack by carefully constructing a small set of pixels to perturb by using the idea of greedy local-search. Our proposed attacks also naturally extend to a stronger notion of misclassification. Our extensive experimental results illustrate that even these elementary attacks can reveal a deep neural network’s vulnerabilities. The simplicity and effectiveness of our proposed schemes mean that they could serve as a litmus test for designing robust networks.
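
Below is a toy NumPy loop in the spirit of the first attack described (randomly perturbing one pixel at a time and querying the model only as a black box); the predict function here is a stand-in, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(image):
    """Placeholder black-box classifier: returns class probabilities. Replace with a real model."""
    logits = np.array([image.mean(), 1.0 - image.mean()])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def random_pixel_attack(image, true_label, trials=200, epsilon=1.0):
    """Perturb one randomly chosen pixel at a time; stop if the black box misclassifies."""
    h, w = image.shape
    for _ in range(trials):
        candidate = image.copy()
        i, j = rng.integers(h), rng.integers(w)
        candidate[i, j] = np.clip(candidate[i, j] + rng.choice([-epsilon, epsilon]), 0.0, 1.0)
        if predict(candidate).argmax() != true_label:
            return candidate                      # adversarial example found
    return None

image = rng.random((8, 8))
adv = random_pixel_attack(image, true_label=int(predict(image).argmax()))
print("attack succeeded:", adv is not None)
```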

Simple Black-Box Adversarial Perturbations for Deep Networks

Predictions for Deep Learning in 2017

By James Kobielus of IBM.

(…)The first hugely successful consumer application of deep learning will come to market: I predict that deep learning’s first avid embrace by the general public will come in 2017. And I predict that it will be to process the glut of photos that people are capturing with their smartphones and sharing on social media. In this regard, the golden deep-learning opportunities will be in apps that facilitate image search, auto-tagging, auto-correction, embellishment, photorealistic rendering, resolution enhancement, style transformation, and fanciful figure inception. (…)

(…)A dominant open-source deep-learning tool and library will take the developer community by storm: As 2016 draws to a close, we’re seeing more solution providers open-source their deep learning tools, libraries, and other intellectual property. This past year, Google open-sourced its DeepMind and TensorFlow code, Apple published its deep-learning research, and the OpenAI non-profit group has started to build its deep-learning benchmarking technology. Already, developers have a choice of open-source tools for development of deep-learning applications in Spark, Scala, Python, and Java, with support for other languages sure to follow. In addition to DeepMind and TensorFlow, open tools for deep-learning development currently include DeepLearning4J, Keras, Caffe, Theano, Torch, OpenBLAS and Mxnet.(…)

(…)A new generation of low-cost commercial off-the-shelf deep-learning chipsets will come to market: Deep learning relies on the application of multilevel neural-network algorithms to high-dimensional data objects. As such, it requires the execution of fast-matrix manipulations in highly parallel architectures in order to identify complex, elusive patterns—such as objects, faces, voices, threats, etc. For high-dimensional deep learning to become more practical and pervasive, the underlying pattern-crunching hardware needs to become faster, cheaper, more scalable, and more versatile. Also, the hardware needs to become capable of processing data sets that will continue to grow in dimensionality as new sources are added, merged with other data, and analyzed by deep learning algorithms of greater sophistication. (…)

(…)The algorithmic repertoire of deep learning will grow more diverse and sophisticated: Deep learning remains a fairly arcane, specialized, and daunting technology to most data professionals. The growing adoption of deep learning in 2017 will compel data scientists and other developers to grow their expertise in such cutting-edge techniques as recurrent neural networks, deep convolutional networks, deep belief networks, restricted Boltzmann machines, and stacked auto-encoders. (…)

Predictions for Deep Learning in 2017

How a farmer in Japan is using Deep Learning to sort cucumbers

Straight from the Google blog

[…]Here’s a systems diagram of the cucumber sorter that Makoto built. The system uses Raspberry Pi 3 as the main controller to take images of the cucumbers with a camera, and in a first phase, runs a small-scale neural network on TensorFlow to detect whether or not the image is of a cucumber. It then forwards the image to a larger TensorFlow neural network running on a Linux server to perform a more detailed classification.

cucumber-farmer-14

Makoto used the sample TensorFlow code Deep MNIST for Experts with minor modifications to the convolution, pooling and last layers, changing the network design to adapt to the pixel format of cucumber images and the number of cucumber classes.[…]

[…]One of the current challenges with deep learning is that you need to have a large number of training datasets. To train the model, Makoto spent about three months taking 7,000 pictures of cucumbers sorted by his mother, but it’s probably not enough.

“When I did a validation with the test images, the recognition accuracy exceeded 95%. But if you apply the system with real use cases, the accuracy drops down to about 70%. I suspect the neural network model has the issue of “overfitting” (the phenomenon in neural network where the model is trained to fit only to the small training dataset) because of the insufficient number of training images.”

The second challenge of deep learning is that it consumes a lot of computing power. The current sorter uses a typical Windows desktop PC to train the neural network model. Although it converts the cucumber image into 80 x 80 pixel low-resolution images, it still takes two to three days to complete training the model with 7,000 images.[…]
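
For a sense of the kind of adaptation described above (a small convolutional network sized for 80x80 inputs with a cucumber-grade output layer), here is a tf.keras sketch; it is not Makoto's actual network, and the number of classes is assumed:

```python
import tensorflow as tf

num_classes = 9   # assumed number of cucumber grades -- adjust to the real sorting scheme

# A small CNN in the spirit of "Deep MNIST for Experts", adapted to 80x80 colour images.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (5, 5), activation="relu", input_shape=(80, 80, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (5, 5), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),                      # helps with the overfitting mentioned above
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```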

How a farmer in Japan is using Deep Learning to sort cucumbers

Development and validation of a Deep Learning algorithm for detecting diabetic retinopathy in retinal fundus photographs

In 2008, when I began most of my studies and my interest in Data Mining (a sister discipline of what we now call Data Science), I already had a very strong conviction that data would be the engine of what we are living through today, the fourth industrial revolution; a revolution that has data as its main fuel across the most diverse areas of human knowledge.

That said, Google, together with researchers from the University of Texas, EyePACS LLC, the University of California, the Aravind Medical Research Foundation, Shri Bhagwan Mahavir Vitreoretinal Services, Verily Life Sciences and Harvard Medical School, achieved an advance in the application of Deep Learning that confirms the thesis above.

Diabetic retinopathy is a disease caused by the progression of diabetes, in which the blood vessels at the back of the retina suffer some kind of dilation or rupture; if left untreated, it can cause blindness. It is a disease with more than 150,000 cases in Brazil and has no cure. However, if it is diagnosed early, treatment can help minimize the loss of vision.

These researchers applied Deep Learning precisely to assist physicians in diagnosing this disease.

retinopatia

Putting it very briefly: this team used Deep Learning to detect diabetic retinopathy, training on a database of retinal fundus photographs, and obtained 90.3% and 87.0% sensitivity (i.e. the algorithm's ability to correctly identify who has the disease among the cases where the disease is present) and 98.1% and 98.5% specificity (i.e. the algorithm's ability to correctly identify people who do not have the disease among the cases where the disease is in fact absent) in detecting diabetic retinopathy.

This high specificity of 98.1% (i.e. the ability to correctly identify who does not have the disease within the group that really does not have it, in other words, to correctly identify the negative class) steers treatment very assertively, in the sense that there is greater certainty in the negative-class results (so efforts would not be directed at people who do not have the disease).
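
To make the two metrics used above concrete, here is a small NumPy sketch computing sensitivity and specificity from predicted and true labels (illustrative numbers only):

```python
import numpy as np

# Toy labels: 1 = disease present, 0 = disease absent (illustrative numbers only).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 0, 1])

tp = np.sum((y_true == 1) & (y_pred == 1))   # true positives
fn = np.sum((y_true == 1) & (y_pred == 0))   # false negatives
tn = np.sum((y_true == 0) & (y_pred == 0))   # true negatives
fp = np.sum((y_true == 0) & (y_pred == 1))   # false positives

sensitivity = tp / (tp + fn)   # of those who have the disease, the fraction correctly flagged
specificity = tn / (tn + fp)   # of those who do not, the fraction correctly cleared

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```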

One idea for a preventive health strategy, or even for ad hoc diagnostics, is that after the imaging exam the retinal fundus photograph would be run through the already-trained Deep Learning algorithm; if the test result were positive, the physician would start treatment right away and, at the same time, the hospital and pharmaceutical network would be notified of another case, with medication automatically provisioned and medical resources allocated for follow-up/treatment.

This undoubtedly shifts the entire axis of how we see medicine today, since it would bring an even greater efficiency gain to preventive medicine and avoid inefficient (and far more expensive) curative medicine.

Below are the main results of the study.

Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs

Varun Gulshan, PhD; Lily Peng, MD, PhD; Marc Coram, PhD; Martin C. Stumpe, PhD; Derek Wu, BS; Arunachalam Narayanaswamy, PhD; Subhashini Venugopalan, MS; Kasumi Widner, MS; Tom Madams, MEng; Jorge Cuadros, OD, PhD; Ramasamy Kim, OD, DNB; Rajiv Raman, MS, DNB; Philip C. Nelson, BS; Jessica L. Mega, MD, MPH; Dale R. Webster, PhD

Key Points
Question How does the performance of an automated deep learning algorithm compare with manual grading by ophthalmologists for identifying diabetic retinopathy in retinal fundus photographs?

Finding In 2 validation sets of 9963 images and 1748 images, at the operating point selected for high specificity, the algorithm had 90.3% and 87.0% sensitivity and 98.1% and 98.5% specificity for detecting referable diabetic retinopathy, defined as moderate or worse diabetic retinopathy or referable macular edema by the majority decision of a panel of at least 7 US board-certified ophthalmologists. At the operating point selected for high sensitivity, the algorithm had 97.5% and 96.1% sensitivity and 93.4% and 93.9% specificity in the 2 validation sets.

Meaning Deep learning algorithms had high sensitivity and specificity for detecting diabetic retinopathy and macular edema in retinal fundus photographs.

Abstract
Importance Deep learning is a family of computational methods that allow an algorithm to program itself by learning from a large set of examples that demonstrate the desired behavior, removing the need to specify rules explicitly. Application of these methods to medical imaging requires further assessment and validation.

Objective To apply deep learning to create an algorithm for automated detection of diabetic retinopathy and diabetic macular edema in retinal fundus photographs.

Design and Setting A specific type of neural network optimized for image classification called a deep convolutional neural network was trained using a retrospective development data set of 128 175 retinal images, which were graded 3 to 7 times for diabetic retinopathy, diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists and ophthalmology senior residents between May and December 2015. The resultant algorithm was validated in January and February 2016 using 2 separate data sets, both graded by at least 7 US board-certified ophthalmologists with high intragrader consistency.

Exposure Deep learning–trained algorithm.

Main Outcomes and Measures The sensitivity and specificity of the algorithm for detecting referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy, referable diabetic macular edema, or both, were generated based on the reference standard of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2 operating points selected from the development set, one selected for high specificity and another for high sensitivity.

Results The EyePACS-1 data set consisted of 9963 images from 4997 patients (mean age, 54.4 years; 62.2% women; prevalence of RDR, 683/8878 fully gradable images [7.8%]); the Messidor-2 data set had 1748 images from 874 patients (mean age, 57.6 years; 42.6% women; prevalence of RDR, 254/1745 fully gradable images [14.6%]). For detecting RDR, the algorithm had an area under the receiver operating curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and 0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI, 81.1%-91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and specificity was 93.4% and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%.

Conclusions and Relevance In this evaluation of retinal fundus photographs from adults with diabetes, an algorithm based on deep machine learning had high sensitivity and specificity for detecting referable diabetic retinopathy. Further research is necessary to determine the feasibility of applying this algorithm in the clinical setting and to determine whether use of the algorithm could lead to improved care and outcomes compared with current ophthalmologic assessment.

Development and validation of a Deep Learning algorithm for detecting diabetic retinopathy in retinal fundus photographs