Interpreting the odds ratio

Matt Bogard of Econometric Sense gives a tip on how to interpret this number:

From the basic probabilities above, we know that the probability of event Y is greater for males than females. The odds of event Y are also greater for males than females. These relationships are also reflected in the odds ratios. The odds of event Y for males is 3 times the odds of females. The odds of event Y for females are only .33 times the odds of males. In other words, the odds of event Y for males are greater and the odds of event Y for females is less.

This can also be seen from the formula for odds ratios. If the OR M vs F  = odds(M)/odds(F), we can see that if the odds (M) > odds(F), the odds ratio will be greater than 1. Alternatively, for OR  F vs M = odds(F)/odds(M), we can see that if the odds(F) < odds(M) then the ratio will be less than 1.  If the odds for both groups are equal, the odds ratio will be 1 exactly.
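As a quick numeric check of the relationships in the quote, here is a minimal Python sketch. The probabilities 0.75 and 0.5 are hypothetical values, chosen only so that the ratios land on the 3 and .33 mentioned above; they are not taken from the original post.

```python
def odds(p):
    """Convert a probability into odds: p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """Odds of group A divided by odds of group B."""
    return odds(p_a) / odds(p_b)


# Hypothetical probabilities of event Y (assumption, not from the source).
p_male = 0.75    # odds = 3.0
p_female = 0.5   # odds = 1.0

or_m_vs_f = odds_ratio(p_male, p_female)   # greater than 1: males have higher odds
or_f_vs_m = odds_ratio(p_female, p_male)   # less than 1: females have lower odds

print(or_m_vs_f, round(or_f_vs_m, 2))
```

The two ratios are reciprocals of each other, and equal probabilities in both groups give a ratio of exactly 1.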

RELATION TO LOGISTIC REGRESSION

 Odds ratios can be obtained from logistic regression by exponentiating the coefficient or beta for a given explanatory variable.  For categorical variables, the odds ratios are interpreted as above. For continuous variables, odds ratios are in terms of changes in odds as a result of a one-unit change in the variable.
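To make the link with logistic regression concrete, here is a small sketch that starts from an assumed model (the intercept and the coefficient `beta_male` are made-up numbers, not fitted to any data) and checks that exponentiating the coefficient gives the same odds ratio as the one computed from the predicted probabilities.

```python
import math


def sigmoid(z):
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical fitted model: logit(p) = intercept + beta_male * is_male
intercept = -0.4
beta_male = math.log(3.0)  # assumed coefficient for the binary variable

p_female = sigmoid(intercept)              # is_male = 0
p_male = sigmoid(intercept + beta_male)    # is_male = 1

odds_female = p_female / (1.0 - p_female)
odds_male = p_male / (1.0 - p_male)

or_from_probabilities = odds_male / odds_female
or_from_coefficient = math.exp(beta_male)

print(or_from_probabilities, or_from_coefficient)
```

For a continuous variable the same `exp(beta)` reads as the multiplicative change in the odds for a one-unit increase in that variable.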

Failing to prepare is preparing to fail…

An old topic, but one worth recalling whenever possible:

Given this context, it is curious to note that so much of what is published (again, especially on-line; think of titles such as: “The 10 Learning Algorithms Every Data Scientist Must Know”) and so many job listings emphasize- almost to the point of exclusivity- learning algorithms, as opposed to practical questions of data sampling, data preparation and enhancement, variable reduction, solving the business problem (instead of the technical one) or ability to deploy the final product.

 

Deep Learning AMI Amazon Web Services

For anyone who wants to scale Machine Learning processing and cannot afford to buy GPUs, Amazon's Deep Learning AMI is a cost-effective alternative.

The Deep Learning AMI is an Amazon Linux image supported and maintained by Amazon Web Services for use on Amazon Elastic Compute Cloud (Amazon EC2). It is designed to provide a stable, secure, and high performance execution environment for deep learning applications running on Amazon EC2. It includes popular deep learning frameworks, including MXNet, Caffe, Tensorflow, Theano, CNTK and Torch as well as packages that enable easy integration with AWS, including launch configuration tools and many popular AWS libraries and tools. It also includes the Anaconda Data Science Platform for Python2 and Python3. Amazon Web Services provides ongoing security and maintenance updates to all instances running the Amazon Linux AMI. The Deep Learning AMI is provided at no additional charge to Amazon EC2 users.

The AMI Ids for the Deep Learning Amazon Linux AMI are the following:
us-east-1 : ami-e7c96af1
us-west-2: ami-dfb13ebf
eu-west-1: ami-6e5d6808

Release tags/Branches used for the DL Frameworks:
MXNet : v0.9.3 tag
Tensorflow : v1.0.0 tag
Theano : rel-0.8.2 tag
Caffe : rc5 tag
CNTK : v2.0beta12.0 tag
Torch : master branch
Keras : 1.2.2 tag

Machine Learning tool – MLJAR

For anyone looking for a paid alternative for running Machine Learning outside their own infrastructure, MLJAR may be the answer.

WHAT IS MLJAR?

MLJAR is a human-first platform for machine learning.
It provides a service for prototyping, development and deploying pattern recognition algorithms.
It makes algorithm search and tuning painless!

HOW IT WORKS?

You pay for computational time used for models training, predictions and data analysis. 1 credit is 1 computation hour on machine with 8 CPU and 15GB RAM. Computational time is aggregated per second basis.
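A back-of-the-envelope sketch of that billing rule, assuming credits are simply pro-rated per second (the quote says time is aggregated per second but does not spell out rounding, so the function below is an assumption, not MLJAR's actual billing code):

```python
SECONDS_PER_CREDIT = 3600  # 1 credit = 1 hour on the 8 CPU / 15GB RAM machine


def credits_used(seconds):
    """Credits consumed by a job, assuming linear per-second pro-rating."""
    return seconds / SECONDS_PER_CREDIT


# Hypothetical job: 90 minutes of model training.
print(credits_used(90 * 60))
```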

Failures of the Deep Learning approach: architectures and meta-parameterization

The biggest challenge the industry currently faces with Deep Learning is without doubt computational: the whole market is both adopting cloud services to run increasingly complex computations and investing in GPU computing capacity.

However, even with hardware now being a commodity, academia is working on a problem that could revolutionize how Deep Learning is done: the architecture/parameterization side.

This comment from the thread says a lot about the problem:

The main problem I see with Deep Learning: too many parameters.

When you have to find the best value for the parameters, that’s a gradient search by itself. The curse of meta-dimensionality.

In other words, even with all the hardware available, the question of what the best architectural arrangement for a deep neural network is remains unsolved.
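To get a feel for the "too many parameters" complaint, here is a small sketch that counts how many configurations a plain grid search would have to train. The hyper-parameter names and candidate values are hypothetical, picked only for illustration.

```python
from itertools import product
from math import prod

# Hypothetical search space for a deep network (assumption, not from the thread).
search_space = {
    "learning_rate": [0.1, 0.01, 0.001, 0.0001],
    "num_layers": [2, 4, 8, 16],
    "units_per_layer": [64, 128, 256, 512],
    "dropout": [0.0, 0.25, 0.5],
    "activation": ["relu", "elu", "tanh"],
}

# The grid grows multiplicatively with every hyper-parameter that is added.
n_configs = prod(len(values) for values in search_space.values())

# Enumerate the full grid; each tuple would be one complete training run.
configs = list(product(*search_space.values()))

print(n_configs, len(configs))
```

Five modest lists already produce hundreds of training runs, and each extra hyper-parameter multiplies that number again.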

This paper by Shai Shalev-Shwartz, Ohad Shamir, and Shaked Shammah, titled “Failures of Deep Learning”, lays out the problem in rich detail, including experiments (this is the Git repository).

The authors argue that the failure points of Deep Learning networks are a) shortcomings of gradient-based methods for parameter optimization, b) structural problems in how Deep Learning algorithms decompose problems, c) architecture, and d) saturation of the activation functions.
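Point d) is easy to see numerically. The sketch below uses the sigmoid as an example of a saturating activation (my choice for illustration; the paper discusses the issue more generally) and compares its gradient near zero with its gradient far from zero.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


# Gradient is largest at z = 0 and shrinks quickly as |z| grows.
for z in (0.0, 2.0, 5.0, 10.0):
    print(z, sigmoid_grad(z))
```

Once a unit sits in the flat region, the gradient signal reaching its weights is tiny, so gradient-based training makes almost no progress there.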

In other words, in a large share of Deep Learning applications, convergence time could be much shorter still if these issues were already solved.

With that solved, much of what we know today as the hardware industry for Deep Learning networks would either be heavily under-utilized (given the gains from architectural/algorithmic optimization) or could be put to work on more complex tasks (e.g. recognition of images with a low pixel count).

So even with the hardware-driven approach the industry has been taking, there is still plenty of room to optimize Deep Learning networks from the architectural and algorithmic standpoint.

Below is a list of references straight from Stack Exchange for anyone who wants to dig deeper:

Neuro-evolutionary algorithms

Reinforcement learning:

Miscellaneous:

PS: WordPress removed the option to justify text, so apologies in advance for the blog's amateur look over the next few days.

 

Beyond active learning in cross-domain Recommender Systems

One of the most common problems in Recommender Systems is the well-known Cold Start (i.e. when there is no prior knowledge about the tastes of someone who has just joined the platform).

This paper brings an interesting perspective on the subject.

Toward Active Learning in Cross-domain Recommender Systems – Roberto Pagano, Massimo Quadrana, Mehdi Elahi, Paolo Cremonesi

Abstract: One of the main challenges in Recommender Systems (RSs) is the New User problem which happens when the system has to generate personalised recommendations for a new user whom the system has no information about. Active Learning tries to solve this problem by acquiring user preference data with the maximum quality, and with the minimum acquisition cost. Although there are variety of works in active learning for RSs research area, almost all of them have focused only on the single-domain recommendation scenario. However, several real-world RSs operate in the cross-domain scenario, where the system generates recommendations in the target domain by exploiting user preferences in both the target and auxiliary domains. In such a scenario, the performance of active learning strategies can be significantly influenced and typical active learning strategies may fail to perform properly. In this paper, we address this limitation, by evaluating active learning strategies in a novel evaluation framework, explicitly suited for the cross-domain recommendation scenario. We show that having access to the preferences of the users in the auxiliary domain may have a huge impact on the performance of active learning strategies w.r.t. the classical, single-domain scenario.

Conclusions: In this paper, we have evaluated several widely used active learning strategies adopted to tackle the cold-start problem in a novel usage scenario, i.e., Cross-domain recommendation scenario. In such a case, the user preferences are available not only in the target domain, but also in additional auxiliary domain. Hence, the active learner can exploit such knowledge to better estimate which preferences are more valuable for the system to acquire. Our results have shown that the performance of the considered active learning strategies significantly change in the cross-domain recommendation scenario in comparison to the single-domain recommendation. Hence, the presence of the auxiliary domain may strongly influence the performance of the active learning strategies. Indeed, while a certain active learning strategy performs the best for MAE reduction in the single scenario (i.e., highest-predicted strategy), it actually performs poor in the cross-domain scenario. On the other hand, the strategy with the worst MAE in single-domain scenario (i.e., lowest-predicted strategy) can perform excellent in the cross-domain scenario. This is an interesting observation which indicates the importance of further analysis of these two scenarios in order to better design and develop active learning strategies for them. Our future work includes the further analysis of the AL strategies in other domains such as book, electronic products, tourism, etc. Moreover, we plan to investigate the potential impact of considering different rating prediction models (e.g., context-aware models) on the performance of different active learning strategies.
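The two strategies named in the conclusions are simple to sketch: given the ratings the system currently predicts for a new user, ask the user about the items with the highest (or lowest) predictions, then measure MAE on held-out ratings. The item names and numbers below are hypothetical, and this is only an outline of the idea, not the paper's evaluation framework.

```python
def highest_predicted(predicted, k):
    """Items to ask the user about: the k with the highest predicted rating."""
    return sorted(predicted, key=predicted.get, reverse=True)[:k]


def lowest_predicted(predicted, k):
    """Items to ask the user about: the k with the lowest predicted rating."""
    return sorted(predicted, key=predicted.get)[:k]


def mae(predictions, actuals):
    """Mean absolute error between predicted and true ratings."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


# Hypothetical predicted ratings for a new user.
predicted = {"item_a": 4.8, "item_b": 2.1, "item_c": 3.9, "item_d": 1.2}

print(highest_predicted(predicted, 2))
print(lowest_predicted(predicted, 2))
print(mae([4.0, 3.0], [5.0, 1.0]))
```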

Stoic ethics for artificial agents

Who would have thought that Stoic philosophy would be adapted to Artificial Intelligence?

Stoic Ethics for Artificial Agents – Gabriel Murray

Abstract: We present a position paper advocating the notion that Stoic philosophy and ethics can inform the development of ethical A.I. systems. This is in sharp contrast to most work on building ethical A.I., which has focused on Utilitarian or Deontological ethical theories. We relate ethical A.I. to several core Stoic notions, including the dichotomy of control, the four cardinal virtues, the ideal Sage, Stoic practices, and Stoic perspectives on emotion or affect. More generally, we put forward an ethical view of A.I. that focuses more on internal states of the artificial agent rather than on external actions of the agent. We provide examples relating to near-term A.I. systems as well as hypothetical superintelligent agents.

Conclusions: In this position paper, we have attempted to show how Stoic ethics could be applied to the development of ethical A.I. systems. We argued that internal states matter for ethical A.I. agents, and that internal states can be analyzed by describing the four cardinal Stoic virtues in terms of characteristics of an intelligent system. We also briefly described other Stoic practices and how they could be realized by an A.I. agent. We gave a brief sketch of how to start developing Stoic A.I. systems by creating approval-directed agents with Stoic overseers, and/or by employing a syncretic paramedic ethics algorithm with a step featuring Stoic constraints. While it can be beneficial to analyze the ethics of an A.I. agent from several different perspectives, including consequentialist perspectives, we have argued for the importance of also conducting a Stoic ethical analysis of A.I. agents, where the agent’s internal states are analyzed, and moral judgments are not based on consequences outside of the agent’s control.

MEBoost – A new method for variable selection

Variable selection is without doubt one of the least explored fields in academic terms. This paper sheds some light on this important subject, which drains part of a Data Scientist's productive time.

MEBoost: Variable Selection in the Presence of Measurement Error – Benjamin Brown, Timothy Weaver, Julian Wolfson

Abstract:  We present a novel method for variable selection in regression models when covariates are measured with error. The iterative algorithm we propose, MEBoost, follows a path defined by estimating equations that correct for covariate measurement error. Via simulation, we evaluated our method and compare its performance to the recently-proposed Convex Conditioned Lasso (CoCoLasso) and to the “naive” Lasso which does not correct for measurement error. Increasing the degree of measurement error increased prediction error and decreased the probability of accurate covariate selection, but this loss of accuracy was least pronounced when using MEBoost. We illustrate the use of MEBoost in practice by analyzing data from the Box Lunch Study, a clinical trial in nutrition where several variables are based on self-report and hence measured with error.

Conclusions: We examined the variable selection problem in regression when the number of potential covariates is large compared to the sample size and when these potential covariates are measured with measurement error. We proposed MEBoost, a computationally simple descent-based approach which follows a path determined by measurement error-corrected estimating equations. We compared MEBoost, via simulation and in a real data example, with the recently-proposed Convex Conditioned Lasso (CoCoLasso) as well as the naive Lasso which assumes that covariates are measured without error. In almost all simulation scenarios, MEBoost performed best in terms of prediction error and coefficient bias. The CoCoLasso is more conservative with the highest specificity in each case, but sensitivity and prediction are better with MEBoost. In the comparison of selection paths, we saw that MEBoost was more aggressive in identifying variables to be included in the model more quickly than the CoCoLasso. These differences were most apparent when the measurement error had a larger variance and a more complex correlation structure. In addition, MEBoost was 7 times faster than the CoCoLasso. One application of MEBoost took 0.04 seconds versus 0.28 seconds for the CoCoLasso. MEBoost, while a promising approach, has some limitations. One limitation–which is shared with many methods that correct for measurement error–is that we assume that the covariance matrix of the measurement error process is known, an assumption which in many settings may be unrealistic. In some cases, it may be possible to estimate these structures using external data sources, but absent such data one could perform a sensitivity analysis with different measurement error variances and correlation structures, as we demonstrate in the real data application. 
Another challenging aspect of model selection with error-prone covariates is that, even if the set of candidate models is generated via a technique which accounts for measurement error, the process of selecting a final model (e.g., via cross-validation) still uses covariates that are measured with error. However, we showed in our simulation study that MEBoost performs well in selecting a model which recovers the relationship between the true (error-free) covariates and the outcome, even when using error-prone covariates to select the final model. This finding suggests that the procedure for generating a “path” of candidate models has a greater influence on prediction error and variable selection accuracy than the procedure picking a final model from among those candidates. To conclude, we note that while we only considered linear and Poisson regression in this paper, MEBoost can easily be applied to other regression models by, e.g., using the estimating equations presented by Nakamura (1990) or others which correct for measurement error. In contrast, the approaches of Sørensen et al. (2012) and Datta and Zou (2017) exploit the structure of the linear regression model and it is not obvious how they could be extended to the broader family of generalized linear models. The robustness and simplicity of MEBoost, along with its strong performance against other methods in the linear model case suggests that this novel method is a reliable way to deal with variable selection in the presence of measurement error.
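This is not MEBoost itself, but a small simulation helps show why the "naive" approach suffers. In the one-covariate linear case, regressing on an error-prone covariate shrinks the slope toward zero, and a correction is possible only if the measurement error variance is known (the same assumption the authors list as a limitation). All numbers below are made up for the illustration.

```python
import random


def slope(xs, ys):
    """Ordinary least squares slope of ys on xs (single covariate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


rng = random.Random(0)
n = 20000
true_slope = 2.0
error_var = 1.0  # assumed known measurement error variance

x = [rng.gauss(0.0, 1.0) for _ in range(n)]                  # true covariate, variance 1
w = [xi + rng.gauss(0.0, error_var ** 0.5) for xi in x]      # observed with error
y = [true_slope * xi + rng.gauss(0.0, 0.5) for xi in x]      # outcome

naive_slope = slope(w, y)                 # attenuated toward zero
reliability = 1.0 / (1.0 + error_var)     # var(x) / (var(x) + var(error)), var(x) = 1 by design
corrected_slope = naive_slope / reliability

print(naive_slope, corrected_slope)
```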

Regression with corrupted instances: a robust approach and its applications

Interesting work.

Multivariate Regression with Grossly Corrupted Observations: A Robust Approach and its Applications – Xiaowei Zhang, Chi Xu, Yu Zhang, Tingshao Zhu, Li Cheng

Abstract: This paper studies the problem of multivariate linear regression where a portion of the observations is grossly corrupted or is missing, and the magnitudes and locations of such occurrences are unknown in priori. To deal with this problem, we propose a new approach by explicitly consider the error source as well as its sparseness nature. An interesting property of our approach lies in its ability of allowing individual regression output elements or tasks to possess their unique noise levels. Moreover, despite working with a non-smooth optimization problem, our approach still guarantees to converge to its optimal solution. Experiments on synthetic data demonstrate the competitiveness of our approach compared with existing multivariate regression models. In addition, empirically our approach has been validated with very promising results on two exemplar real-world applications: The first concerns the prediction of Big-Five personality based on user behaviors at social network sites (SNSs), while the second is 3D human hand pose estimation from depth images. The implementation of our approach and comparison methods as well as the involved datasets are made publicly available in support of the open-source and reproducible research initiatives.

Conclusions: We consider a new approach dedicating to the multivariate regression problem where some output labels are either corrupted or missing. The gross error is explicitly addressed in our model, while it allows the adaptation of distinct regression elements or tasks according to their own noise levels. We further propose and analyze the convergence and runtime properties of the proposed proximal ADMM algorithm which is globally convergent and efficient. The model combined with the specifically designed solver enable our approach to tackle a diverse range of applications. This is practically demonstrated on two distinct applications, that is, to predict personalities based on behaviors at SNSs, as well as to estimation 3D hand pose from single depth images. Empirical experiments on synthetic and real datasets have showcased the applicability of our approach in the presence of label noises. For future work, we plan to integrate with more advanced deep learning techniques to better address more practical problems, including 3D hand pose estimation and beyond.

Feature Screening in Large Scale Cluster Analysis

More work on clustering.

Feature Screening in Large Scale Cluster Analysis – Trambak Banerjee, Gourab Mukherjee, Peter Radchenko

Abstract: We propose a novel methodology for feature screening in clustering massive datasets, in which both the number of features and the number of observations can potentially be very large. Taking advantage of a fusion penalization based convex clustering criterion, we propose a very fast screening procedure that efficiently discards non-informative features by first computing a clustering score corresponding to the clustering tree constructed for each feature, and then thresholding the resulting values. We provide theoretical support for our approach by establishing uniform non-asymptotic bounds on the clustering scores of the “noise” features. These bounds imply perfect screening of non-informative features with high probability and are derived via careful analysis of the empirical processes corresponding to the clustering trees that are constructed for each of the features by the associated clustering procedure. Through extensive simulation experiments we compare the performance of our proposed method with other screening approaches, popularly used in cluster analysis, and obtain encouraging results. We demonstrate empirically that our method is applicable to cluster analysis of big datasets arising in single-cell gene expression studies.

Conclusions: We propose COSCI, a novel feature screening method for large scale cluster analysis problems that are characterized by both large sample sizes and high dimensionality of the observations. COSCI efficiently ranks the candidate features in a non-parametric fashion and, under mild regularity conditions, is robust to the distributional form of the true noise coordinates. We establish theoretical results supporting ideal feature screening properties of our proposed procedure and provide a data driven approach for selecting the screening threshold parameter. Extensive simulation experiments and real data studies demonstrate encouraging performance of our proposed approach. An interesting topic for future research is extending our marginal screening method by means of utilizing multivariate objective criteria, which are more potent in detecting multivariate cluster information among marginally unimodal features. Preliminary analysis of the corresponding ℓ2 fusion penalty based criterion, which, unlike the ℓ1 based approach used in this paper, is non-separable across dimensions, suggests that this criterion can provide a way to move beyond marginal screening.
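The overall shape of marginal screening (score each feature on its own, then threshold) can be sketched in a few lines. The score used here, the largest gap between consecutive sorted values relative to the feature's range, is a stand-in I chose for illustration; it is not the COSCI clustering score, and the threshold is arbitrary.

```python
def largest_gap_score(values):
    """Stand-in score: biggest gap between sorted neighbours, relative to the range."""
    ordered = sorted(values)
    span = ordered[-1] - ordered[0]
    if span == 0:
        return 0.0
    return max(b - a for a, b in zip(ordered, ordered[1:])) / span


def screen_features(columns, threshold):
    """Keep only the features whose marginal score reaches the threshold."""
    return [name for name, values in columns.items()
            if largest_gap_score(values) >= threshold]


# Hypothetical data: one clearly bimodal feature and one evenly spread feature.
columns = {
    "informative": [0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
    "noise": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
}

print(screen_features(columns, threshold=0.5))
```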

Deterministic quantum annealing expectation-maximization (DQAEM)

Despite the complicated name, the paper describes a modification of the Expectation-Maximization (EM) clustering algorithm that adds a meta-heuristic similar to Simulated Annealing to address two weaknesses of EM: its heavy dependence on starting values (initial assignments) and its tendency to sometimes get trapped in local optima.

Relaxation of the EM Algorithm via Quantum Annealing for Gaussian Mixture Models

Abstract: We propose a modified expectation-maximization algorithm by introducing the concept of quantum annealing, which we call the deterministic quantum annealing expectation-maximization (DQAEM) algorithm. The expectation-maximization (EM) algorithm is an established algorithm to compute maximum likelihood estimates and applied to many practical applications. However, it is known that EM heavily depends on initial values and its estimates are sometimes trapped by local optima. To solve such a problem, quantum annealing (QA) was proposed as a novel optimization approach motivated by quantum mechanics. By employing QA, we then formulate DQAEM and present a theorem that supports its stability. Finally, we demonstrate numerical simulations to confirm its efficiency.

Conclusion: In this paper, we have proposed the deterministic quantum annealing expectation-maximization (DQAEM) algorithm for Gaussian mixture models (GMMs) to relax the problem of local optima of the expectation-maximization (EM) algorithm by introducing the mechanism of quantum fluctuations into EM. Although we have limited our attention to GMMs in this paper to simplify the discussion, the derivation presented in this paper can be straightforwardly applied to any models which have discrete latent variables. After formulating DQAEM, we have presented the theorem that guarantees its convergence. We then have given numerical simulations to show its efficiency compared to EM and DSAEM. It is expect that the combination of DQAEM and DSAEM gives better performance than DQAEM. Finally, one of our future works is a Bayesian extension of this work. In other words, we are going to propose a deterministic quantum annealing variational Bayes inference.
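The quantum version is beyond a short snippet, but the classical annealing idea it builds on can be sketched. Below is a minimal 1-D, two-component example where the E-step responsibilities are raised to an inverse temperature `beta` that is increased toward 1 (plain EM). To keep it short the component variance is fixed and only means and weights are updated; the data, starting means and schedule are all hypothetical. This is an illustration of annealed EM in general, not the DQAEM algorithm.

```python
import math


def gaussian(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def annealed_em(data, means, var=1.0, betas=(0.2, 0.5, 1.0), iters=20):
    """EM for a Gaussian mixture with fixed variance and tempered responsibilities."""
    means = list(means)
    k = len(means)
    weights = [1.0 / k] * k
    for beta in betas:                      # beta = 1.0 recovers the usual E-step
        for _ in range(iters):
            # E-step: responsibilities proportional to (weight * density) ** beta
            resp = []
            for x in data:
                scores = [(weights[j] * gaussian(x, means[j], var)) ** beta
                          for j in range(k)]
                total = sum(scores)
                resp.append([s / total for s in scores])
            # M-step: update means and mixing weights
            for j in range(k):
                mass = sum(r[j] for r in resp)
                means[j] = sum(r[j] * x for r, x in zip(resp, data)) / mass
                weights[j] = mass / len(data)
    return means, weights


data = [-0.2, 0.0, 0.2, 9.8, 10.0, 10.2]
means, weights = annealed_em(data, means=[4.0, 6.0])
print(means, weights)
```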

Distributed K-Means over compressed binary data

And who said K-Means was dead?

Distributed K-means over Compressed Binary Data

Abstract—We consider a network of binary-valued sensors with a fusion center. The fusion center has to perform K-means clustering on the binary data transmitted by the sensors. In order to reduce the amount of data transmitted within the network, the sensors compress their data with a source coding scheme based on LDPC codes. We propose to apply the K-means algorithm directly over the compressed data without reconstructing the original sensors measurements, in order to avoid potentially complex decoding operations. We provide approximated expressions of the error probabilities of the K-means steps in the compressed domain. From these expressions, we show that applying the Kmeans algorithm in the compressed domain enables to recover the clusters of the original domain. Monte Carlo simulations illustrate the accuracy of the obtained approximated error probabilities, and show that the coding rate needed to perform K-means clustering in the compressed domain is lower than the rate needed to reconstruct all the measurements.

Conclusion: In this paper, we considered a network of sensors which transmit their compressed binary measurements to a fusion center. We proposed to apply the K-means algorithm directly over the compressed data, without reconstructing the sensor measurements. From a theoretical analysis and Monte Carlo simulations, we showed the efficiency of applying K-means in the compressed domain. We also showed that the rate needed to perform K-means on the compressed vectors is lower than the rate needed to reconstruct all the measurements.
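For reference, here is what K-means looks like on plain (uncompressed) binary vectors, which is the baseline the paper avoids having to reconstruct. The choices of Hamming distance and majority-vote centroids are my assumptions for a binary setting, and the LDPC-compressed domain from the paper is not modeled here.

```python
def hamming(a, b):
    """Number of positions where two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda j: hamming(point, centroids[j]))


def kmeans_binary(points, centroids, iters=10):
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        # Assignment step
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: bitwise majority vote (ties go to 1)
        updated = []
        for old, members in zip(centroids, clusters):
            if not members:
                updated.append(old)
                continue
            updated.append(tuple(1 if 2 * sum(bits) >= len(members) else 0
                                 for bits in zip(*members)))
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical sensor readings: two groups of 6-bit vectors.
points = [
    (1, 1, 1, 0, 0, 0), (1, 1, 0, 0, 0, 0), (1, 0, 1, 0, 0, 0),
    (0, 0, 0, 1, 1, 1), (0, 0, 0, 1, 1, 0), (0, 0, 0, 0, 1, 1),
]
centroids, labels = kmeans_binary(points, centroids=[points[0], points[3]])
print(centroids, labels)
```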

Modularized morphing of Neural Networks

Who said that neural network architectures/topologies cannot undergo morphological changes?

Modularized Morphing of Neural Networks – Tao Wei, Changhu Wang, Chang Wen Chen

Abstract: In this work we study the problem of network morphism, an effective learning scheme to morph a well-trained neural network to a new one with the network function completely preserved. Different from existing work where basic morphing types on the layer level were addressed, we target at the central problem of network morphism at a higher level, i.e., how a convolutional layer can be morphed into an arbitrary module of a neural network. To simplify the representation of a network, we abstract a module as a graph with blobs as vertices and convolutional layers as edges, based on which the morphing process is able to be formulated as a graph transformation problem. Two atomic morphing operations are introduced to compose the graphs, based on which modules are classified into two families, i.e., simple morphable modules and complex modules. We present practical morphing solutions for both of these two families, and prove that any reasonable module can be morphed from a single convolutional layer. Extensive experiments have been conducted based on the state-of-the-art ResNet on benchmark datasets, and the effectiveness of the proposed solution has been verified.

Conclusions: This paper presented a systematic study on the problem of network morphism at a higher level, and tried to answer the central question of such learning scheme, i.e., whether and how a convolutional layer can be morphed into an arbitrary module. To facilitate the study, we abstracted a modular network as a graph, and formulated the process of network morphism as a graph transformation process. Based on this formulation, both simple morphable modules and complex modules have been defined and corresponding morphing algorithms have been proposed. We have shown that a convolutional layer can be morphed into any module of a network. We have also carried out experiments to illustrate how to achieve a better performing model based on the state-of-the-art ResNet with minimal extra computational cost on benchmark datasets.
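The key property, "network function completely preserved", can be shown with a heavily simplified sketch: a single dense linear layer (no convolution, no activation) is morphed into a two-layer network by adding an identity-initialized layer, and both networks give the same output. The weights are arbitrary, and the paper's graph-based morphing of convolutional modules is much more general than this.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Parent network: one linear layer with arbitrary (hypothetical) weights.
parent_weights = [[1.0, -2.0], [0.5, 3.0]]


def parent(x):
    return matvec(parent_weights, x)


# Child network: the same layer followed by a new layer initialized to the identity,
# so the deeper network computes exactly the same function before further training.
extra_weights = identity(2)


def child(x):
    return matvec(extra_weights, matvec(parent_weights, x))


x = [0.3, -1.2]
print(parent(x), child(x))
```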

Applying Deep Learning to relate Pins

It looks like Pinterest is becoming the new powerhouse of Deep Learning applied to images.

Using deep learning to generate Related Pins

We built Pin2Vec to embed all the Pins in a 128-dimension space. First, we label a Pin with all the other Pins someone has saved in his/her activity session, each as a Pin tuple. Pin tuples are used in supervised training to train the embedding matrix for each of the tens of millions of Pins of the vocabulary. We use TensorFlow as the trainer. At serving time, a set of nearest neighbors are found as Related Pins in the space for each of the Pins.

Training data is collected from recent engagement, such as saving or clicking, and a sliding window is applied. Low quality Pins and those not engaged with are removed from training. Then, each Pin is assigned with a unique Pin ID. Within the sliding window, training pairs are extracted such that the first Pin is the example and each of the following Pins is its label. Figure 3 illustrates an example session and training pairs. In our case, you can imagine each user session is a sentence with Pins as words.
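The pair extraction described above can be sketched as follows. The exact window semantics are not given in the quote, so here I assume a window of size `window` covers the example Pin plus the `window - 1` Pins that follow it; the Pin IDs are hypothetical.

```python
def training_pairs(session, window):
    """Build (example, label) pairs: each Pin is paired with the Pins that
    follow it inside the sliding window."""
    pairs = []
    for i, example in enumerate(session):
        for label in session[i + 1:i + window]:
            pairs.append((example, label))
    return pairs


# Hypothetical session: Pins saved or clicked, in order.
session = ["pin_a", "pin_b", "pin_c", "pin_d"]

for pair in training_pairs(session, window=3):
    print(pair)
```

This mirrors the word2vec analogy from the post: the session plays the role of the sentence and each Pin the role of a word.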

We used a feedforward neural network with a hidden layer of 128 dimensions. Figure 4 shows the architecture. The network is inspired by word2vec. The input vector is a one-hot vector with a size of vocabulary and, in our case, is tens of millions of Pins. The vector is reduced to the 128-dimension vector by multiplying with the hidden layer weight matrix. An eLu activation function is applied after hidden layer. At last the hidden layer output is multiplied with the softmax matrix and a cross-entropy is used to calculate the loss. We sampled 64 negative Pins in loss optimization in lieu of iterating on tens of millions of Pins. We trained the Pin2Vec embedding on machines with 32 cores and 244GB memory.
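At serving time the trained embeddings are only used for a nearest-neighbor lookup. A toy version, with 3-dimensional vectors instead of 128 and cosine similarity as the assumed metric (the post does not say which distance is used), might look like this:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def related_pins(pin_id, embeddings, k):
    """Return the k Pins whose embeddings are most similar to the query Pin."""
    query = embeddings[pin_id]
    candidates = [p for p in embeddings if p != pin_id]
    return sorted(candidates, key=lambda p: cosine(query, embeddings[p]), reverse=True)[:k]


# Hypothetical embeddings (real ones would be 128-dimensional and learned).
embeddings = {
    "pin_1": [1.0, 0.0, 0.0],
    "pin_2": [0.9, 0.1, 0.0],
    "pin_3": [0.0, 1.0, 0.0],
    "pin_4": [0.7, 0.7, 0.0],
}

print(related_pins("pin_1", embeddings, k=2))
```

With tens of millions of Pins a brute-force scan like this would be too slow, so a production system would need some form of approximate nearest-neighbor index.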

Akid: A Neural Network library for research and production

At last, people are starting to think about closing the gap between science/academia and industry.

Akid: A Library for Neural Network Research and Production from a Dataism Approach – Shuai Li
Abstract: Neural networks are a revolutionary but immature technique that is fast evolving and heavily relies on data. To benefit from the newest development and newly available data, we want the gap between research and production as small as possibly. On the other hand, differing from traditional machine learning models, neural network is not just yet another statistic model, but a model for the natural processing engine — the brain. In this work, we describe a neural network library named akid. It provides higher level of abstraction for entities (abstracted as blocks) in nature upon the abstraction done on signals (abstracted as tensors) by Tensorflow, characterizing the dataism observation that all entities in nature processes input and emit out in some ways. It includes a full stack of software that provides abstraction to let researchers focus on research instead of implementation, while at the same time the developed program can also be put into production seamlessly in a distributed environment, and be production ready. At the top application stack, it provides out-of-box tools for neural network applications. Lower down, akid provides a programming paradigm that lets user easily build customized models. The distributed computing stack handles the concurrency and communication, thus letting models be trained or deployed to a single GPU, multiple GPUs, or a distributed environment without affecting how a model is specified in the programming paradigm stack. Lastly, the distributed deployment stack handles how the distributed computing is deployed, thus decoupling the research prototype environment with the actual production environment, and is able to dynamically allocate computing resources, so development (Devs) and operations (Ops) could be separated.


Hyper-parameter tuning for Support Vector Machines using Estimation of Distribution Algorithms

In the age of Deep Learning, it is always good to see a paper on good old Support Vector Machines. We will have a post about this technique here on the blog soon.

Hyper-Parameter Tuning for Support Vector Machines by Estimation of Distribution Algorithms

Abstract: Hyper-parameter tuning for support vector machines has been widely studied in the past decade. A variety of metaheuristics, such as Genetic Algorithms and Particle Swarm Optimization have been considered to accomplish this task. Notably, exhaustive strategies such as Grid Search or Random Search continue to be implemented for hyper-parameter tuning and have recently shown results comparable to sophisticated metaheuristics. The main reason for the success of exhaustive techniques is due to the fact that only two or three parameters need to be adjusted when working with support vector machines. In this chapter, we analyze two Estimation Distribution Algorithms, the Univariate Marginal Distribution Algorithm and the Boltzmann Univariate Marginal Distribution Algorithm, to verify if these algorithms preserve the effectiveness of Random Search and at the same time make more efficient the process of finding the optimal hyper-parameters without increasing the complexity of Random Search.
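
To make the idea in the abstract a bit more concrete, here is a minimal sketch of a univariate estimation-of-distribution search over two hyper-parameters. It is not the authors' implementation: it assumes a continuous Gaussian variant of the Univariate Marginal Distribution Algorithm, and the objective `toy_validation_error` is a hypothetical stand-in for the cross-validated error of an SVM as a function of log C and log gamma. The names `umda_search`, `pop_size` and `elite_frac` are also invented for this illustration.

```python
import random
import statistics


def toy_validation_error(log_c, log_gamma):
    # Hypothetical stand-in for a cross-validated SVM error surface.
    # By construction its minimum is at log_c = 1.0, log_gamma = -2.0.
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


def umda_search(objective, bounds, pop_size=40, elite_frac=0.25,
                generations=30, seed=0):
    """Gaussian univariate EDA: each parameter has its own independent
    normal distribution, refitted every generation to the best samples."""
    rng = random.Random(seed)
    # Start from a wide distribution centred in the search box.
    means = [(lo + hi) / 2 for lo, hi in bounds]
    stds = [(hi - lo) / 4 for lo, hi in bounds]
    n_elite = max(2, int(pop_size * elite_frac))
    best_point, best_score = None, float("inf")

    for _ in range(generations):
        # Sample a population, one parameter at a time, clipped to the bounds.
        population = []
        for _ in range(pop_size):
            point = [min(hi, max(lo, rng.gauss(m, s)))
                     for (lo, hi), m, s in zip(bounds, means, stds)]
            population.append((objective(*point), point))

        # Lower error is better; keep track of the best point seen so far.
        population.sort(key=lambda pair: pair[0])
        if population[0][0] < best_score:
            best_score, best_point = population[0]

        # Refit each marginal distribution to the elite samples only.
        elite = [point for _, point in population[:n_elite]]
        means = [statistics.fmean(column) for column in zip(*elite)]
        stds = [max(statistics.pstdev(column), 1e-6) for column in zip(*elite)]

    return best_point, best_score


if __name__ == "__main__":
    point, score = umda_search(toy_validation_error, [(-3.0, 3.0), (-5.0, 1.0)])
    print("best (log C, log gamma):", point, "error:", score)
```

The first generation behaves like plain Random Search over the box; the later generations differ only in that samples are drawn from distributions refitted to the best points, which is the efficiency gain the chapter sets out to test. In a real experiment the toy objective would be replaced by an actual cross-validation run.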


Convolutional Neural Networks applied to the identification of Parkinson's disease

One more case of Deep Learning applied to medical problems.

Convolutional Neural Networks Applied for Parkinson’s Disease Identification

Abstract: Parkinson’s Disease (PD) is a chronic and progressive illness that affects hundreds of thousands of people worldwide. Although it is quite easy to identify someone affected by PD when the illness shows itself (e.g. tremors, slowness of movement and freezing-of-gait), most works have focused on studying the working mechanism of the disease in its very early stages. In such cases, drugs can be administered in order to increase the quality of life of the patients. Since the beginning, it is well-known that PD patients feature the micrography, which is related to muscle rigidity and tremors. As such, most exams to detect Parkinson’s Disease make use of handwritten assessment tools, where the individual is asked to perform some predefined tasks, such as drawing spirals and meanders on a template paper. Later, an expert analyses the drawings in order to classify the progressive of the disease. In this work, we are interested into aiding physicians in such task by means of machine learning techniques, which can learn proper information from digitized versions of the exams, and then recommending a probability of a given individual being affected by PD depending on its handwritten skills. Particularly, we are interested in deep learning techniques (i.e. Convolutional Neural Networks) due to their ability into learning features without human interaction. Additionally, we propose to fine-tune hyper-parameters of such techniques by means of meta-heuristic-based techniques, such as Bat Algorithm, Firefly Algorithm and Particle Swarm Optimization.


For anyone who wants to know a bit more about advances in applying reinforcement learning and Deep Learning to autonomous systems, this paper is a good pick.

Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks

Abstract: We propose an inverse reinforcement learning (IRL) approach using Deep Q-Networks to extract the rewards in problems with large state spaces. We evaluate the performance of this approach in a simulation-based autonomous driving scenario. Our results resemble the intuitive relation between the reward function and readings of distance sensors mounted at different poses on the car. We also show that, after a few learning rounds, our simulated agent generates collision-free motions and performs human-like lane change behaviour.

Conclusions: In this paper we proposed using Deep Q-Networks as the refinement step in Inverse Reinforcement Learning approaches. This enabled us to extract the rewards in scenarios with large state spaces such as driving, given expert demonstrations. The aim of this work was to extend the general approach to IRL. Exploring more advanced methods like Maximum Entropy IRL and the support for nonlinear reward functions is currently under investigation.

DeepCancer: Detecting cancer through gene expressions via Deep Learning

This paper presents a Deep Learning implementation that, if confirmed, could be a major advance for the diagnostics industry in health services. Algorithmic learning could identify several types of cancer-related genes, which may bring two positive externalities: 1) cheaper and faster diagnosis, and 2) a complete rethinking of the strategy for fighting and preventing disease.

DeepCancer: Detecting Cancer through Gene Expressions via Deep Generative Learning

Abstract: Transcriptional profiling on microarrays to obtain gene expressions has been used to facilitate cancer diagnosis. We propose a deep generative machine learning architecture (called DeepCancer) that learn features from unlabeled microarray data. These models have been used in conjunction with conventional classifiers that perform classification of the tissue samples as either being cancerous or non-cancerous. The proposed model has been tested on two different clinical datasets. The evaluation demonstrates that DeepCancer model achieves a very high precision score, while significantly controlling the false positive and false negative scores.

Conclusions: We presented a deep generative learning model DeepCancer for detection and classification of inflammatory breast cancer and prostate cancer samples. The features are learned through an adversarial feature learning process and then sent as input to a conventional classifier specific to the objective of interest. After modifications through specified hyperparameters, the model performs quite comparatively well on the task tested on two different datasets. The proposed model utilized cDNA microarray gene expressions to gauge its efficacy. Based on deep generative learning, the tuned discriminator and generator models, D and G respectively, learned to differentiate between the gene signatures without any intermediate manual feature handpicking, indicating that much bigger datasets can be experimented on the proposed model more seamlessly. The DeepCancer model will be a vital aid to the medical imaging community and, ultimately, reduce inflammatory breast cancer and prostate cancer mortality.
