Densely Connected Convolutional Networks – implementations

Abstract: Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections – one between each layer and its subsequent layer – our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance.
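As a quick illustration of the connectivity pattern described in the abstract, here is a minimal Keras sketch of a single dense block (my own toy version, with an assumed growth rate and depth; it is not one of the reference implementations linked in this post): every layer receives the concatenation of all preceding feature maps.

# Toy dense block: each layer sees the concatenation of all previous outputs.
from keras.layers import Input, Conv2D, BatchNormalization, Activation, Concatenate
from keras.models import Model

def dense_block(x, num_layers=4, growth_rate=12):
    for _ in range(num_layers):
        y = BatchNormalization()(x)
        y = Activation("relu")(y)
        y = Conv2D(growth_rate, (3, 3), padding="same")(y)
        x = Concatenate()([x, y])   # feature reuse: concatenate, don't add
    return x

inputs = Input(shape=(32, 32, 3))                  # CIFAR-sized input
stem = Conv2D(16, (3, 3), padding="same")(inputs)  # initial convolution
model = Model(inputs, dense_block(stem))
model.summary()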

Densely Connected Convolutional Networks – implementations

Forecasting extreme events at Uber with LSTM

We'll soon have some posts here on the blog about this subject, but it's an ML case study from an engineering team doing a great job, with quite advanced methods and scalable architectures.

ENGINEERING EXTREME EVENT FORECASTING AT UBER WITH RECURRENT NEURAL NETWORKS – BY NIKOLAY LAPTEV, SLAWEK SMYL, & SANTHOSH SHANMUGAM

We ultimately settled on conducting time series modeling based on the Long Short Term Memory (LSTM) architecture, a technique that features end-to-end modeling, ease of incorporating external variables, and automatic feature extraction abilities. By providing a large amount of data across numerous dimensions, an LSTM approach can model complex nonlinear feature interactions.

We decided to build a neural network architecture that provides single-model, heterogeneous forecasting through an automatic feature extraction module. As Figure 4 demonstrates, the model first primes the network by automatic, ensemble-based feature extraction. After feature vectors are extracted, they are averaged using a standard ensemble technique. The final vector is then concatenated with the input to produce the final forecast.

During testing, we were able to achieve a 14.09 percent symmetric mean absolute percentage error (SMAPE) improvement over the base LSTM architecture and over 25 percent improvement over the classical time series model used in Argos, Uber’s real-time monitoring and root cause exploration tool.
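A very rough sketch of the single-model idea described above, assuming a Keras LSTM autoencoder as the feature extractor and an arbitrary window length; this is not Uber's code, just an illustration of "extract features, then concatenate them with the input before the final forecast":

# Hypothetical sketch: LSTM autoencoder for feature extraction, plus a
# forecaster that concatenates the learned features with the raw window.
from keras.layers import Input, LSTM, RepeatVector, Dense, concatenate
from keras.models import Model

WINDOW, N_FEATURES = 48, 8   # assumed sliding-window length and input dimensions

inputs = Input(shape=(WINDOW, N_FEATURES))

# 1) Feature extractor: autoencoder trained to reconstruct the input window.
encoded = LSTM(32)(inputs)                          # compressed feature vector
decoded = RepeatVector(WINDOW)(encoded)
decoded = LSTM(N_FEATURES, return_sequences=True)(decoded)
autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")

# 2) Forecaster: learned features concatenated with a summary of the raw input.
raw_summary = LSTM(32)(inputs)
merged = concatenate([encoded, raw_summary])
forecast = Dense(1)(merged)                         # next-step prediction
forecaster = Model(inputs, forecast)
forecaster.compile(optimizer="adam", loss="mse")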

Forecasting extreme events at Uber with LSTM

Paid for multiple GPUs on Azure and can't use them for Deep Learning? This post is for you.

You went to the Azure portal and grabbed an NC24r, which has a wonderful 224 GB of memory, 24 cores, more than 1 TB of disk and, best of all, 4 Tesla K80 GPUs for your complete delight in Deep Learning training.

Everything perfect, right? Almost.

Right at the start, I tried running a training script and, with a simple htop to monitor the training, I saw that TensorFlow was dumping the entire workload onto the CPUs.

Even with those 24 wonderful cores hitting 100% utilization, that doesn't come anywhere near what our mastodon-sized GPUs can deliver. (Note: you wouldn't trade 4 2017 Ferraris for 24 1985 Fiat 147s, right?)

Logging into our wonderful machine to see what had happened, I first checked whether the GPUs were actually on the machine, which was indeed the case.

azure_teste@deep-learning:~$ nvidia-smi
Tue Jun 27 18:21:05 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | B2A3:00:00.0     Off |                    0 |
| N/A   47C    P0    71W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | C4D8:00:00.0     Off |                    0 |
| N/A   57C    P0    61W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | D908:00:00.0     Off |                    0 |
| N/A   52C    P0    56W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | EEAF:00:00.0     Off |                    0 |
| N/A   42C    P0    69W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
azure_teste@deep-learning:~/deep-learning-moderator-msft$ lspci | grep -i NVIDIA
b2a3:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
c4d8:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
d908:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
eeaf:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

OK, the Tesla K80s are ready for battle; however, Azure itself acknowledges that there are problems in the overall setup process, and fixing it only takes a few very simple procedures in two steps, according to the documentation.

STEP 1

1) Clone the Git repository (yes, I forked it, because this kind of thing routinely disappears for reasons we don't always know).

$ git clone https://github.com/leestott/Azure-GPU-Setup.git

2) Go into the cloned folder

$ cd Azure-GPU-Setup

3) Run the script, which will install some NVIDIA libraries and then reboot the server.

$ bash gpu-setup-part1.sh

STEP 2

1) Go to the git repository folder

$ cd Azure-GPU-Setup

2) Run the second script, which installs TensorFlow, the CUDA Toolkit, and cuDNN, and also sets a number of environment variables.

$ bash gpu-setup-part2.sh

3) Then test the installation

$ python gpu-test.py
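If you want an extra sanity check of your own (independent of the repo's gpu-test.py), a minimal TensorFlow 1.x snippet along these lines lists the visible devices and logs where an op actually runs:

# Quick GPU sanity check using the TensorFlow 1.x API current at the time.
import tensorflow as tf
from tensorflow.python.client import device_lib

# The four K80s should show up here as /gpu:0 .. /gpu:3.
print([d.name for d in device_lib.list_local_devices()])

# Pin a small op to a GPU and log where it actually ran.
with tf.device("/gpu:0"):
    a = tf.random_normal([1000, 1000])
    b = tf.matmul(a, a)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(b)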

After that, just enjoy your GPUs at full load and use them to train your networks.

Paid for multiple GPUs on Azure and can't use them for Deep Learning? This post is for you.

Professor Jürgen Schmidhuber's page

For those who don't know, professor Jürgen Schmidhuber is one of the pioneers of applied Deep Learning research.

He states in one of his talks that Deep Learning networks date back to 1971, when Aleksei Ivakhnenko and Valentin Lapa published the first work with an 8-layer network.

For those who want to know more about his ideas on the prospects of Deep Learning, this Ask Me Anything (AMA) on Reddit is pretty cool.

And here, available for download in this post, is his systematic review of Deep Learning.

cse597g-deep_learning

Professor Jürgen Schmidhuber's page

Cleverhans – a Python library for defending models against adversarial noise attacks

Attacks on sensors, and consequently on Machine Learning models, are a fairly recent topic (it will be covered here at some point in the future, but this article shows the damaging potential quite well).

Cleverhans is a Python library that artificially injects a bit of noise/perturbation into the network as a way of training it against this kind of attack.
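To make the idea of an adversarial example concrete, here is a minimal Fast Gradient Sign Method (FGSM) sketch in plain TensorFlow 1.x; this is not the cleverhans API, just the underlying trick of nudging the input in the direction that increases the model's loss, using a toy stand-in model:

# Toy FGSM sketch (not cleverhans): perturb the input along the sign of the
# gradient of the loss with respect to the input.
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])   # assumed flattened image input
y = tf.placeholder(tf.float32, [None, 10])    # one-hot labels

# Tiny stand-in model (logistic regression) just to have gradients to take.
w = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(x, w) + b
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))

eps = 0.1                                     # perturbation budget
grad = tf.gradients(loss, x)[0]
x_adv = tf.clip_by_value(x + eps * tf.sign(grad), 0.0, 1.0)  # adversarial input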

This repository contains the source code for cleverhans, a Python library to benchmark machine learning systems’ vulnerability to adversarial examples. You can learn more about such vulnerabilities on the accompanying blog.

The cleverhans library is under continual development, always welcoming contributions of the latest attacks and defenses. In particular, we always welcome help towards resolving the issues currently open.

About the name

The name cleverhans is a reference to a presentation by Bob Sturm titled “Clever Hans, Clever Algorithms: Are Your Machine Learnings Learning What You Think?” and the corresponding publication, “A Simple Method to Determine if a Music Information Retrieval System is a ‘Horse’.” Clever Hans was a horse that appeared to have learned to answer arithmetic questions, but had in fact only learned to read social cues that enabled him to give the correct answer. In controlled settings where he could not see people’s faces or receive other feedback, he was unable to answer the same questions. The story of Clever Hans is a metaphor for machine learning systems that may achieve very high accuracy on a test set drawn from the same distribution as the training data, but that do not actually understand the underlying task and perform poorly on other inputs.

Cleverhans – a Python library for defending models against adversarial noise attacks

Interpretability versus Performance: Skepticism and AI-Winter

In this post, Michael Elad, editor-in-chief of the SIAM Journal on Imaging Sciences, offers a series of well-considered reflections on how Deep Learning methods are solving real problems and reaching a high degree of visibility, even though the methods are not that elegant from a mathematical perspective.

His main point is that, when it comes to image processing, academia has always held a prominent position with an approach in which interpretability and understanding of the models always took precedence over the results achieved.

This becomes clear in the paragraph below:

A series of papers during the early 2000s suggested the successful application of this architecture, leading to state-of-the-art results in practically any assigned task. Key aspects in these contributions included the following: the use of many network layers, which explains the term “deep learning;” a huge amount of data on which to train; massive computations typically run on computer clusters or graphic processing units; and wise optimization algorithms that employ effective initializations and gradual stochastic gradient learning. Unfortunately, all of these great empirical achievements were obtained with hardly any theoretical understanding of the underlying paradigm. Moreover, the optimization employed in the learning process is highly non-convex and intractable from a theoretical viewpoint.

At the end, he offers a view on pragmatism and the academic agenda:

Should we be happy about this trend? Well, if we are in the business of solving practical problems such as noise removal, the answer must be positive. Right? Therefore, a company seeking such a solution should be satisfied. But what about us scientists? What is the true objective behind the vast effort that we invested in the image denoising problem? Yes, we do aim for effective noise-removal algorithms, but this constitutes a small fraction of our motivation, as we have a much wider and deeper agenda. Researchers in our field aim to understand the data on which we operate. This is done by modeling information in order to decipher its true dimensionality and manifested phenomena. Such models serve denoising and other problems in image processing, but far more than that, they allow identifying new ways to extract knowledge from the data and enable new horizons.

This reminds me of my time at RCB Investimentos, when I worked with the great Renato Toledo in the NPL market. He taught me that good models have a high degree of interpretability and simplicity, and that this factor should be the barometer for decision making, since a model whose uncertainty (or error) is known is better than a model where nobody knows what is going on. (Personal note: those who know me know I have a saying about this: if you don't understand the dynamics of the model when it works, you'll never know what went wrong when it fails.)

That said, it is undeniable that Deep Learning networks are, in my view, meeting a pent-up demand for problems that already existed and that computational methods could not solve easily, such as facial recognition, image classification, translation, and structured problems like fraud (Fast.AI is doing a great job of clarifying this).

Even granting that DL researchers have had nearly unlimited hardware at modest prices, the brutal fact is that for roughly 30 years this field of research swallowed a very bitter pill of skepticism coming from academia itself: the method was held under such heavy skepticism that it nearly went extinct, and some journals implicitly refused to accept DL papers, while mathematicians were winning prizes and enjoying a high level of visibility because of the accuracy of their methods, rather than because of some supposed idea that the world loved the interpretability of those methods.

Two big questions are now on the table: 1) Can the mathematicians and communities shocked by this phenomenon endure what the Neural Networks community endured for more than 30 years? And 2) in the event of a Math-Winter, can the mathematical community withstand a potential marginalization of its research?

We'll have to wait and see.

 

Interpretability versus Performance: Skepticism and AI-Winter

Self-Driving Cars in GTA 5 using Deep Learning

Anyone who follows Python Programming knows that when they post something, good things are on the way; and this time it was no different.

Harrison is doing a series of posts on how to play GTA V using Deep Learning with TensorFlow, using a CNN (convolutional neural network).

This is the first video of the series, in which he sets up the solution:

 

And this is the latest trained version:

For anyone interested, Harrison has put together a playlist with all the training stages, and a bot running on its own on a livestream (it's worth checking out how much fun it is to watch the bot trying to drive).

And the code is available on GitHub.

Self-Driving Cars in GTA 5 using Deep Learning

Best Deep Learning papers from 2012 to 2016

To study with pencil in hand and coffee in the mug.

Via KDnuggets

1. Understanding / Generalization / Transfer

Distilling the knowledge in a neural network (2015), G. Hinton et al. [pdf]

2. Optimization / Training Techniques

Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015), S. Ioffe and C. Szegedy [pdf]

3. Unsupervised / Generative Models

Unsupervised representation learning with deep convolutional generative adversarial networks (2015), A. Radford et al. [pdf]

4. Convolutional Neural Network Models

Deep residual learning for image recognition (2016), K. He et al. [pdf]

5. Image: Segmentation / Object Detection

Fast R-CNN (2015), R. Girshick [pdf]

6. Image / Video / Etc.

Show and tell: A neural image caption generator (2015), O. Vinyals et al. [pdf]

7. Natural Language Processing / RNNs

Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014), K. Cho et al. [pdf]

8. Speech / Other Domain

Speech recognition with deep recurrent neural networks (2013), A. Graves [pdf]

9. Reinforcement Learning / Robotics

Human-level control through deep reinforcement learning (2015), V. Mnih et al. [pdf]

10. More Papers from 2016

Domain-adversarial training of neural networks (2016), Y. Ganin et al. [pdf]

Best Deep Learning papers from 2012 to 2016

Vocal extraction from songs using a Convolutional Neural Network

This work by Ollin Boer Bohan is simply phenomenal. And on top of that, there's a repository on GitHub.

Vocal extraction from songs using a Convolutional Neural Network

Softmax GAN

Abstract: Softmax GAN is a novel variant of Generative Adversarial Network (GAN). The key idea of Softmax GAN is to replace the classification loss in the original GAN with a softmax cross-entropy loss in the sample space of one single batch. In the adversarial learning of N real training samples and M generated samples, the target of discriminator training is to distribute all the probability mass to the real samples, each with probability 1/M, and distribute zero probability to generated data. In the generator training phase, the target is to assign equal probability to all data points in the batch, each with probability 1/(M+N). While the original GAN is closely related to Noise Contrastive Estimation (NCE), we show that Softmax GAN is the Importance Sampling version of GAN. We further demonstrate with experiments that this simple change stabilizes GAN training.
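A sketch of my reading of that loss (not the paper's code): a single softmax over the discriminator scores of the whole batch, with cross-entropy against the two target distributions described in the abstract.

# Assumed Softmax GAN losses, following the abstract literally.
import tensorflow as tf

d_real = tf.placeholder(tf.float32, [None])   # discriminator scores, N real samples
d_fake = tf.placeholder(tf.float32, [None])   # discriminator scores, M generated samples

logits = tf.concat([d_real, d_fake], axis=0)
p = tf.nn.softmax(logits)                     # softmax over the whole batch

N = tf.cast(tf.size(d_real), tf.float32)
M = tf.cast(tf.size(d_fake), tf.float32)

# Discriminator target: all mass on the real samples; generator target: uniform.
t_d = tf.concat([tf.fill(tf.shape(d_real), 1.0) / M, tf.zeros_like(d_fake)], axis=0)
t_g = tf.fill(tf.shape(logits), 1.0) / (M + N)

d_loss = -tf.reduce_sum(t_d * tf.log(p + 1e-8))
g_loss = -tf.reduce_sum(t_g * tf.log(p + 1e-8))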

Softmax GAN

Deep Learning AMI Amazon Web Services

For those who want to scale Machine Learning processing and don't have the money to buy GPUs, Amazon's Deep Learning AMI is a great alternative in terms of cost.

The Deep Learning AMI is an Amazon Linux image supported and maintained by Amazon Web Services for use on Amazon Elastic Compute Cloud (Amazon EC2). It is designed to provide a stable, secure, and high performance execution environment for deep learning applications running on Amazon EC2. It includes popular deep learning frameworks, including MXNet, Caffe, Tensorflow, Theano, CNTK and Torch as well as packages that enable easy integration with AWS, including launch configuration tools and many popular AWS libraries and tools. It also includes the Anaconda Data Science Platform for Python2 and Python3. Amazon Web Services provides ongoing security and maintenance updates to all instances running the Amazon Linux AMI. The Deep Learning AMI is provided at no additional charge to Amazon EC2 users.

The AMI Ids for the Deep Learning Amazon Linux AMI are the following:
us-east-1 : ami-e7c96af1
us-west-2: ami-dfb13ebf
eu-west-1: ami-6e5d6808

Release tags/Branches used for the DL Frameworks:
MXNet : v0.9.3 tag
Tensorflow : v1.0.0 tag
Theano : rel-0.8.2 tag
Caffe : rc5 tag
CNTK : v2.0beta12.0 tag
Torch : master branch
Keras : 1.2.2 tag

Deep Learning AMI Amazon Web Services

Failures of the Deep Learning approach: Architectures and Meta-parameterization

The biggest current challenge the industry faces with regard to Deep Learning is, without a doubt, computational: the whole market is both consuming cloud services to perform ever more complex calculations and investing in GPU computing capacity.

However, even though hardware today is already a commodity, academia is working on a problem that could revolutionize the way Deep Learning is done: the architectural/parameterization aspect.

This comment from the thread says a lot about the problem, where the user states:

The main problem I see with Deep Learning: too many parameters.

When you have to find the best value for the parameters, that’s a gradient search by itself. The curse of meta-dimensionality.

In other words, even with all the available hardware, the question of what the best architectural arrangement of a deep neural network is remains unsolved.
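A toy illustration of this "curse of meta-dimensionality": even a small, entirely hypothetical architectural search space explodes combinatorially, and every point in it costs a full training run.

# Hypothetical hyperparameter space; the values are arbitrary placeholders.
import itertools, random

depths         = [2, 4, 8, 16]
widths         = [64, 128, 256, 512]
activations    = ["relu", "elu", "tanh"]
learning_rates = [1e-2, 1e-3, 1e-4]
dropouts       = [0.0, 0.25, 0.5]

space = list(itertools.product(depths, widths, activations, learning_rates, dropouts))
print(len(space), "candidate architectures, each costing a full training run")

# In practice the space is sampled (random search) rather than enumerated:
for depth, width, act, lr, drop in random.sample(space, 10):
    pass  # build the network, train it, record the validation score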

This paper by Shai Shalev-Shwartz, Ohad Shamir, and Shaked Shammah, called “Failures of Gradient-Based Deep Learning”, lays out this problem in a very rich way, including experiments (this is the Git repository).

The authors point out the failure modes of Deep Learning networks: a) failures of gradient-based methods for parameter optimization, b) structural problems in how Deep Learning algorithms decompose problems, c) architecture, and d) saturation of the activation functions.

In other words, what may be happening in a large share of Deep Learning applications is that convergence time could be much shorter still if these aspects were already resolved.

With that resolved, much of what we know today as the hardware industry for Deep Learning networks would either be extremely under-utilized (i.e., given the resulting improvements in architectural/algorithmic optimization) or could be put to use on more complex tasks (e.g., image recognition with a low number of pixels).

Thus, even adopting a hardware-based approach as the industry has been doing, there is still plenty of room to optimize Deep Learning networks from an architectural and algorithmic point of view.

Below is a list of references straight from Stack Exchange for anyone who wants to dig deeper into the subject:

Neuro-Evolutionary Algorithms

Reinforcement Learning:

Miscellaneous:

PS: WordPress removed the option to justify text, so apologies in advance for the blog's amateurish appearance over the next few days.

 

Failures of the Deep Learning approach: Architectures and Meta-parameterization

Modularization of Neural Network Morphism

Who said morphological changes can't happen in Neural Network architectures/topologies?

Modularized Morphing of Neural Networks – Tao Wei, Changhu Wang, Chang Wen Chen

Abstract: In this work we study the problem of network morphism, an effective learning scheme to morph a well-trained neural network to a new one with the network function completely preserved. Different from existing work where basic morphing types on the layer level were addressed, we target at the central problem of network morphism at a higher level, i.e., how a convolutional layer can be morphed into an arbitrary module of a neural network. To simplify the representation of a network, we abstract a module as a graph with blobs as vertices and convolutional layers as edges, based on which the morphing process is able to be formulated as a graph transformation problem. Two atomic morphing operations are introduced to compose the graphs, based on which modules are classified into two families, i.e., simple morphable modules and complex modules. We present practical morphing solutions for both of these two families, and prove that any reasonable module can be morphed from a single convolutional layer. Extensive experiments have been conducted based on the state-of-the-art ResNet on benchmark datasets, and the effectiveness of the proposed solution has been verified.

Conclusions: This paper presented a systematic study on the problem of network morphism at a higher level, and tried to answer the central question of such learning scheme, i.e., whether and how a convolutional layer can be morphed into an arbitrary module. To facilitate the study, we abstracted a modular network as a graph, and formulated the process of network morphism as a graph transformation process. Based on this formulation, both simple morphable modules and complex modules have been defined and corresponding morphing algorithms have been proposed. We have shown that a convolutional layer can be morphed into any module of a network. We have also carried out experiments to illustrate how to achieve a better performing model based on the state-of-the-art ResNet with minimal extra computational cost on benchmark datasets.

Modularization of Neural Network Morphism

Applying Deep Learning to relate Pins

It seems Pinterest is becoming the new powerhouse of Deep Learning applied to images.

Using deep learning to generate Related Pins

We built Pin2Vec to embed all the Pins in a 128-dimension space. First, we label a Pin with all the other Pins someone has saved in his/her activity session, each as a Pin tuple. Pin tuples are used in supervised training to train the embedding matrix for each of the tens of millions of Pins of the vocabulary. We use TensorFlow as the trainer. At serving time, a set of nearest neighbors are found as Related Pins in the space for each of the Pins.

Training data is collected from recent engagement, such as saving or clicking, and a sliding window is applied. Low quality Pins and those not engaged with are removed from training. Then, each Pin is assigned with a unique Pin ID. Within the sliding window, training pairs are extracted such that the first Pin is the example and each of the following Pins is its label. Figure 3 illustrates an example session and training pairs. In our case, you can imagine each user session is a sentence with Pins as words.

We used a feedforward neural network with a hidden layer of 128 dimensions. Figure 4 shows the architecture. The network is inspired by word2vec. The input vector is a one-hot vector with a size of vocabulary and, in our case, is tens of millions of Pins. The vector is reduced to the 128-dimension vector by multiplying with the hidden layer weight matrix. An eLu activation function is applied after hidden layer. At last the hidden layer output is multiplied with the softmax matrix and a cross-entropy is used to calculate the loss. We sampled 64 negative Pins in loss optimization in lieu of iterating on tens of millions of Pins. We trained the Pin2Vec embedding on machines with 32 cores and 244GB memory.
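A rough sketch of that word2vec-style setup (assumed sizes, not Pinterest's code): look up a 128-dimensional embedding for the input Pin, apply an ELU, and train against the co-saved Pin with sampled softmax instead of a full softmax over tens of millions of Pins.

# Assumed Pin2Vec-style trainer in TensorFlow 1.x with sampled softmax.
import tensorflow as tf

VOCAB = 100000   # tens of millions of Pins in the post; kept small here
DIM = 128        # embedding dimension from the post
NEG = 64         # negative samples per example, as described

example = tf.placeholder(tf.int64, [None])     # input Pin id
label = tf.placeholder(tf.int64, [None, 1])    # co-saved Pin id (the training label)

embeddings = tf.Variable(tf.random_uniform([VOCAB, DIM], -1.0, 1.0))
hidden = tf.nn.embedding_lookup(embeddings, example)  # equivalent to one-hot x W
hidden = tf.nn.elu(hidden)                            # ELU activation, as described

softmax_w = tf.Variable(tf.truncated_normal([VOCAB, DIM], stddev=0.01))
softmax_b = tf.Variable(tf.zeros([VOCAB]))

loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
    weights=softmax_w, biases=softmax_b, labels=label, inputs=hidden,
    num_sampled=NEG, num_classes=VOCAB))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)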

Applying Deep Learning to relate Pins

Akid: A Neural Network library for research and production

Finally, people have started thinking about closing this gap between science/academia and industry.

Akid: A Library for Neural Network Research and Production from a Dataism Approach – Shuai Li
Abstract: Neural networks are a revolutionary but immature technique that is fast evolving and heavily relies on data. To benefit from the newest development and newly available data, we want the gap between research and production as small as possible. On the other hand, differing from traditional machine learning models, neural network is not just yet another statistic model, but a model for the natural processing engine — the brain. In this work, we describe a neural network library named akid. It provides higher level of abstraction for entities (abstracted as blocks) in nature upon the abstraction done on signals (abstracted as tensors) by Tensorflow, characterizing the dataism observation that all entities in nature processes input and emit out in some ways. It includes a full stack of software that provides abstraction to let researchers focus on research instead of implementation, while at the same time the developed program can also be put into production seamlessly in a distributed environment, and be production ready. At the top application stack, it provides out-of-box tools for neural network applications. Lower down, akid provides a programming paradigm that lets user easily build customized models. The distributed computing stack handles the concurrency and communication, thus letting models be trained or deployed to a single GPU, multiple GPUs, or a distributed environment without affecting how a model is specified in the programming paradigm stack. Lastly, the distributed deployment stack handles how the distributed computing is deployed, thus decoupling the research prototype environment with the actual production environment, and is able to dynamically allocate computing resources, so development (Devs) and operations (Ops) could be separated.

Akid: A Neural Network library for research and production

Convolutional Neural Networks applied to the identification of Parkinson's Disease

Yet another case of Deep Learning applied to medical problems.

Convolutional Neural Networks Applied for Parkinson’s Disease Identification

Abstract: Parkinson’s Disease (PD) is a chronic and progressive illness that affects hundreds of thousands of people worldwide. Although it is quite easy to identify someone affected by PD when the illness shows itself (e.g. tremors, slowness of movement and freezing-of-gait), most works have focused on studying the working mechanism of the disease in its very early stages. In such cases, drugs can be administered in order to increase the quality of life of the patients. Since the beginning, it is well-known that PD patients feature micrographia, which is related to muscle rigidity and tremors. As such, most exams to detect Parkinson’s Disease make use of handwritten assessment tools, where the individual is asked to perform some predefined tasks, such as drawing spirals and meanders on a template paper. Later, an expert analyses the drawings in order to classify the progression of the disease. In this work, we are interested in aiding physicians in such a task by means of machine learning techniques, which can learn proper information from digitized versions of the exams, and then recommend a probability of a given individual being affected by PD depending on their handwriting skills. Particularly, we are interested in deep learning techniques (i.e. Convolutional Neural Networks) due to their ability to learn features without human interaction. Additionally, we propose to fine-tune hyper-parameters of such techniques by means of meta-heuristic-based techniques, such as Bat Algorithm, Firefly Algorithm and Particle Swarm Optimization.

Convolutional Neural Networks applied to the identification of Parkinson's Disease

For those who want to learn a bit more about progress in applying reinforcement learning and Deep Learning to autonomous systems, this paper is a good pick.

Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks

Abstract: We propose an inverse reinforcement learning (IRL) approach using Deep Q-Networks to extract the rewards in problems with large state spaces. We evaluate the performance of this approach in a simulation-based autonomous driving scenario. Our results resemble the intuitive relation between the reward function and readings of distance sensors mounted at different poses on the car. We also show that, after a few learning rounds, our simulated agent generates collision-free motions and performs human-like lane change behaviour.

Conclusions: In this paper we proposed using Deep Q-Networks as the refinement step in Inverse Reinforcement Learning approaches. This enabled us to extract the rewards in scenarios with large state spaces such as driving, given expert demonstrations. The aim of this work was to extend the general approach to IRL. Exploring more advanced methods like Maximum Entropy IRL and the support for nonlinear reward functions is currently under investigation.

DeepCancer: Detecting cancer through gene expressions via Deep Learning

This paper presents a Deep Learning implementation that, if confirmed, could be a major advance in the diagnostics industry for healthcare services, since algorithmic learning can identify several types of cancer-related genes, and this could bring two positive externalities: 1) cheaper and faster diagnosis, and 2) a complete reformulation of the strategy for fighting and preventing disease.

DeepCancer: Detecting Cancer through Gene Expressions via Deep Generative Learning

Abstract: Transcriptional profiling on microarrays to obtain gene expressions has been used to facilitate cancer diagnosis. We propose a deep generative machine learning architecture (called DeepCancer) that learns features from unlabeled microarray data. These models have been used in conjunction with conventional classifiers that perform classification of the tissue samples as either being cancerous or non-cancerous. The proposed model has been tested on two different clinical datasets. The evaluation demonstrates that the DeepCancer model achieves a very high precision score, while significantly controlling the false positive and false negative scores.

Conclusions: We presented a deep generative learning model DeepCancer for detection and classification of inflammatory breast cancer and prostate cancer samples. The features are learned through an adversarial feature learning process and then sent as input to a conventional classifier specific to the objective of interest. After modifications through specified hyperparameters, the model performs quite comparatively well on the task tested on two different datasets. The proposed model utilized cDNA microarray gene expressions to gauge its efficacy. Based on deep generative learning, the tuned discriminator and generator models, D and G respectively, learned to differentiate between the gene signatures without any intermediate manual feature handpicking, indicating that much bigger datasets can be experimented on the proposed model more seamlessly. The DeepCloud model will be a vital aid to the medical imaging community and, ultimately, reduce inflammatory breast cancer and prostate cancer mortality.

DeepCancer: Detecting cancer through gene expressions via Deep Learning

Hardware for Machine Learning: Challenges and opportunities

A great paper on how hardware will play a crucial role in the coming years for core Machine Learning, especially in embedded systems.

Hardware for Machine Learning: Challenges and Opportunities

Abstract—Machine learning plays a critical role in extracting meaningful information out of the zettabytes of sensor data collected every day. For some applications, the goal is to analyze and understand the data to identify trends (e.g., surveillance, portable/wearable electronics); in other applications, the goal is to take immediate action based on the data (e.g., robotics/drones, self-driving cars, smart Internet of Things). For many of these applications, local embedded processing near the sensor is preferred over the cloud due to privacy or latency concerns, or limitations in the communication bandwidth. However, at the sensor there are often stringent constraints on energy consumption and cost in addition to throughput and accuracy requirements. Furthermore, flexibility is often required such that the processing can be adapted for different applications or environments (e.g., update the weights and model in the classifier). In many applications, machine learning often involves transforming the input data into a higher dimensional space, which, along with programmable weights, increases data movement and consequently energy consumption. In this paper, we will discuss how these challenges can be addressed at various levels of hardware design ranging from architecture, hardware-friendly algorithms, mixed-signal circuits, and advanced technologies (including memories and sensors).

Conclusions: Machine learning is an important area of research with many promising applications and opportunities for innovation at various levels of hardware design. During the design process, it is important to balance the accuracy, energy, throughput and cost requirements. Since data movement dominates energy consumption, the primary focus of recent research has been to reduce the data movement while maintaining performance accuracy, throughput and cost. This means selecting architectures with favorable memory hierarchies like a spatial array, and developing dataflows that increase data reuse at the low-cost levels of the memory hierarchy. With joint design of algorithm and hardware, reduced bitwidth precision, increased sparsity and compression are used to minimize the data movement requirements. With mixed-signal circuit design and advanced technologies, computation is moved closer to the source by embedding computation near or within the sensor and the memories. One should also consider the interactions between these different levels. For instance, reducing the bitwidth through hardware-friendly algorithm design enables reduced precision processing with mixed-signal circuits and non-volatile memory. Reducing the cost of memory access with advanced technologies could result in more energy-efficient dataflows.

Hardware for Machine Learning: Challenges and opportunities