# Interpretando a razão de chances

Matt Bogard, of Econometric Sense, offers a tip on how to interpret this number:

*From the basic probabilities above, we know that the probability of event Y is greater for males than females. The odds of event Y are also greater for males than females. These relationships are also reflected in the odds ratios. The odds of event Y for males are 3 times the odds for females. The odds of event Y for females are only .33 times the odds for males. In other words, the odds of event Y for males are greater and the odds of event Y for females are less.*

*This can also be seen from the formula for odds ratios. If the OR M vs F = odds(M)/odds(F), we can see that if the odds (M) > odds(F), the odds ratio will be greater than 1. Alternatively, for OR F vs M = odds(F)/odds(M), we can see that if the odds(F) < odds(M) then the ratio will be less than 1. If the odds for both groups are equal, the odds ratio will be 1 exactly.*

**RELATION TO LOGISTIC REGRESSION**

Odds ratios can be obtained from logistic regression by exponentiating the coefficient or beta for a given explanatory variable. For categorical variables, the odds ratios are interpreted as above. For continuous variables, odds ratios are in terms of changes in odds as a result of a one-unit change in the variable.
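The relationship above can be checked numerically. A minimal sketch, with assumed group probabilities chosen to match the 3x odds ratio in the quote:

```python
import math

# Assumed probabilities of event Y per group (illustrative only)
p_male, p_female = 0.6, 1 / 3

odds_male = p_male / (1 - p_male)          # ≈ 1.5
odds_female = p_female / (1 - p_female)    # ≈ 0.5
or_m_vs_f = odds_male / odds_female        # ≈ 3.0

# In a logistic regression of Y on a male/female indicator, the fitted
# coefficient (beta) for the indicator is the log-odds difference, so
# exponentiating it recovers the odds ratio:
beta = math.log(odds_male) - math.log(odds_female)
assert abs(math.exp(beta) - or_m_vs_f) < 1e-12
```

The same reading applies in reverse: the odds ratio for females vs. males is `1 / or_m_vs_f ≈ 0.33`, i.e. a coefficient of `-beta`.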

# Failing to prepare is preparing to fail…

An old topic, but one worth remembering whenever possible:

*Given this context, it is curious to note that so much of what is published (again, especially on-line; think of titles such as: “The 10 Learning Algorithms Every Data Scientist Must Know”) and so many job listings emphasize, almost to the point of exclusivity, learning algorithms, as opposed to practical questions of data sampling, data preparation and enhancement, variable reduction, solving the business problem (instead of the technical one) or ability to deploy the final product.*

# Deep Learning AMI on Amazon Web Services

*The Deep Learning AMI is an Amazon Linux image supported and maintained by Amazon Web Services for use on Amazon Elastic Compute Cloud (Amazon EC2). It is designed to provide a stable, secure, and high performance execution environment for deep learning applications running on Amazon EC2. It includes popular deep learning frameworks, including MXNet, Caffe, Tensorflow, Theano, CNTK and Torch as well as packages that enable easy integration with AWS, including launch configuration tools and many popular AWS libraries and tools. It also includes the Anaconda Data Science Platform for Python2 and Python3. Amazon Web Services provides ongoing security and maintenance updates to all instances running the Amazon Linux AMI. The Deep Learning AMI is provided at no additional charge to Amazon EC2 users.*

*The AMI Ids for the Deep Learning Amazon Linux AMI are the following:*

*us-east-1: ami-e7c96af1*

*us-west-2: ami-dfb13ebf*

*eu-west-1: ami-6e5d6808*

*Release tags/branches used for the DL frameworks:*

*MXNet : v0.9.3 tag*

*Tensorflow : v1.0.0 tag*

*Theano : rel-0.8.2 tag*

*Caffe : rc5 tag*

*CNTK : v2.0beta12.0 tag*

*Torch : master branch*

*Keras : 1.2.2 tag*

# A Machine Learning tool – MLJAR

**WHAT IS MLJAR?**

*MLJAR is a human-first platform for machine learning.*

*It provides a service for prototyping, developing and deploying pattern recognition algorithms.*

*It makes algorithm search and tuning painless!*

**HOW DOES IT WORK?**

*You pay for computational time used for models training, predictions and data analysis. 1 credit is 1 computation hour on machine with 8 CPU and 15GB RAM. Computational time is aggregated per second basis.*
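The pricing rule quoted above reduces to simple arithmetic. A minimal sketch, assuming only the 1-credit-per-compute-hour, per-second-billing model described in the quote (the function name is hypothetical, not part of MLJAR's API):

```python
def credits_used(seconds: float) -> float:
    """Convert raw compute seconds into MLJAR credits.

    Assumes 1 credit = 1 hour on the 8-CPU / 15 GB machine,
    aggregated on a per-second basis, as stated in the quote.
    """
    return seconds / 3600.0

# e.g. a 90-minute training job costs 1.5 credits:
print(credits_used(90 * 60))  # 1.5
```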

# Failures of the Deep Learning approach: architectures and meta-parameterization

The biggest challenge currently facing industry with respect to Deep Learning is, without a doubt, the computational side: the entire market is both absorbing cloud services to perform increasingly complex calculations and investing in GPU computing capacity.

However, even though hardware is a *commodity* these days, academia is working on a problem that could revolutionize the way Deep Learning is done: the **architectural/parameterization aspect**.

This comment from the thread says a lot about the problem:

“*The main problem I see with Deep Learning: too many parameters.*

*When you have to find the best value for the parameters, that’s a gradient search by itself. The curse of meta-dimensionality.*“

In other words, even with all the available hardware, the question of *what is the best architectural arrangement for a deep neural network?* remains unsolved.

This paper by Shai Shalev-Shwartz, Ohad Shamir, and Shaked Shammah, called “*Failures of Deep Learning*”, lays out the problem in a very rich way, including experiments (this is the Git repository).

The authors identify the failure points of Deep Learning networks as a) *lack of gradient-based methods for parameter optimization*, b) *structural problems in Deep Learning algorithms when decomposing problems*, c) *architecture*, and d) *saturation of activation functions*.

In other words, what may be happening in a large share of Deep Learning applications is that convergence time could be much shorter still if these aspects were already resolved.

With that resolved, much of what we know today as the hardware industry for Deep Learning networks would either be extremely under-used (*i.e.* given the resulting architectural/algorithmic optimization gains) or could be redirected to more complex tasks (*e.g.* image recognition with a low pixel count).

So even adopting a hardware-driven approach, as the industry has been doing, there is still plenty of room to optimize Deep Learning networks from an architectural and algorithmic point of view.

Below is a list of references straight from Stack Exchange for anyone who wants to dig deeper into the subject:

Neuro-evolutionary algorithms:

- Zaremba, Wojciech, Ilya Sutskever, and Rafal Jozefowicz. “An empirical exploration of recurrent network architectures.” (2015): used evolutionary computation to find optimal RNN structures.
- Dernoncourt, Franck. “The medial Reticular Formation: a neural substrate for action selection? An evaluation via evolutionary computation.” Master’s Thesis. École Normale Supérieure Ulm, 2011.
- Bayer, Justin, Daan Wierstra, Julian Togelius, and Jürgen Schmidhuber. “Evolving memory cell structures for sequence learning.” In International Conference on Artificial Neural Networks, pp. 755-764. Springer Berlin Heidelberg, 2009: used evolutionary computation to find optimal RNN structures.

Reinforcement Learning:

- Jose M Alvarez, Mathieu Salzmann. Learning the Number of Neurons in Deep Networks. NIPS 2016. https://arxiv.org/abs/1611.06321
- Bowen Baker, Otkrist Gupta, Nikhil Naik, Ramesh Raskar. Designing Neural Network Architectures using Reinforcement Learning. https://arxiv.org/abs/1611.02167
- Barret Zoph, Quoc V. Le. Neural Architecture Search with Reinforcement Learning. https://arxiv.org/abs/1611.01578

Miscellaneous:

- Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, Nando de Freitas. Learning to learn by gradient descent by gradient descent. https://arxiv.org/abs/1606.04474
- Franck Dernoncourt, Ji Young Lee Optimizing Neural Network Hyperparameters with Gaussian Processes for Dialog Act Classification, IEEE SLT 2016.
- Cortes, Corinna, Xavi Gonzalvo, Vitaly Kuznetsov, Mehryar Mohri, and Scott Yang. “AdaNet: Adaptive Structural Learning of Artificial Neural Networks.” arXiv preprint arXiv:1607.01097 (2016). https://arxiv.org/abs/1607.01097 : Approach that learns both the structure of the network as well as its weights.

PS: WordPress removed the text-justify option, so apologies in advance for the blog's amateurish appearance over the next few days.

# Beyond active learning in cross-domain Recommender Systems

One of the most common problems in Recommender Systems is the famous *Cold Start* (*i.e.* when there is no prior knowledge about the tastes of someone who has just joined the platform).

This paper brings an interesting perspective on the subject.

**Conclusions**: *In this paper, we have evaluated several widely used active learning strategies adopted to tackle the cold-start problem in a novel usage scenario, i.e., the cross-domain recommendation scenario. In such a case, the user preferences are available not only in the target domain, but also in an additional auxiliary domain. Hence, the active learner can exploit such knowledge to better estimate which preferences are more valuable for the system to acquire. Our results have shown that the performance of the considered active learning strategies changes significantly in the cross-domain recommendation scenario in comparison to the single-domain recommendation. Hence, the presence of the auxiliary domain may strongly influence the performance of the active learning strategies. Indeed, while a certain active learning strategy performs the best for MAE reduction in the single scenario (i.e., highest-predicted strategy), it actually performs poorly in the cross-domain scenario. On the other hand, the strategy with the worst MAE in the single-domain scenario (i.e., lowest-predicted strategy) can perform excellently in the cross-domain scenario. This is an interesting observation which indicates the importance of further analysis of these two scenarios in order to better design and develop active learning strategies for them. Our future work includes the further analysis of the AL strategies in other domains such as books, electronic products, tourism, etc. Moreover, we plan to investigate the potential impact of considering different rating prediction models (e.g., context-aware models) on the performance of different active learning strategies.*

# Stoic ethics for artificial agents

So they really did adapt Stoic philosophy for Artificial Intelligence?

# MEBoost – A new method for variable selection

Variable selection is, without a doubt, one of the least explored fields academically. This paper sheds some light on a subject that is so important and that drains a good share of Data Scientists' productive time.

# Regression with corrupted instances: A robust approach and its applications

Interesting work.

**Conclusions**: *We consider a new approach dedicated to the multivariate regression problem where some output labels are either corrupted or missing. The gross error is explicitly addressed in our model, while it allows the adaptation of distinct regression elements or tasks according to their own noise levels. We further propose and analyze the convergence and runtime properties of the proposed proximal ADMM algorithm, which is globally convergent and efficient. The model combined with the specifically designed solver enables our approach to tackle a diverse range of applications. This is practically demonstrated on two distinct applications, that is, to predict personalities based on behaviors at SNSs, as well as to estimate 3D hand pose from single depth images. Empirical experiments on synthetic and real datasets have showcased the applicability of our approach in the presence of label noise. For future work, we plan to integrate with more advanced deep learning techniques to better address more practical problems, including 3D hand pose estimation and beyond.*

# Feature Screening in Large Scale Cluster Analysis

More work on clustering.

# Deterministic quantum annealing expectation-maximization (DQAEM)

Despite the complicated name, the paper describes a modification of the Expectation-Maximization (EM) clustering algorithm that adds a meta-heuristic similar to Simulated Annealing in order to eliminate two of EM's weaknesses: its heavy dependence on the starting data (initial assignments) and its occasional local-minima problems.

**Relaxation of the EM Algorithm via Quantum Annealing for Gaussian Mixture Models**
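The annealing idea can be sketched without the paper's quantum formalism. A minimal, assumed illustration (not the paper's actual algorithm, and the function name is hypothetical): responsibilities in the E-step are tempered by a temperature `T`, so a high `T` flattens them, reducing sensitivity to the initial assignments, and `T = 1` recovers standard EM.

```python
def annealed_responsibilities(likelihoods, T):
    """Tempered E-step posteriors: raise each component likelihood to 1/T
    and renormalize. High T -> near-uniform; T = 1 -> standard EM."""
    powered = [l ** (1.0 / T) for l in likelihoods]
    total = sum(powered)
    return [p / total for p in powered]

# Two-component example: at high temperature the assignment is softened,
# at T = 1 we get the ordinary EM posteriors back.
print(annealed_responsibilities([0.9, 0.1], T=4.0))  # flattened toward uniform
print(annealed_responsibilities([0.9, 0.1], T=1.0))  # standard EM posteriors
```

In a full annealed EM loop, `T` would be decreased toward 1 across iterations according to a cooling schedule, which is what helps the algorithm escape poor initializations.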

# Distributed K-Means over compressed binary data

And who said K-Means was dead, huh?

# Modularized morphing of Neural Networks

Who said morphological changes cannot happen in Neural Network architectures/topologies?

**Modularized Morphing of Neural Networks – Tao Wei, Changhu Wang, Chang Wen Chen**

# Applying Deep Learning to relate Pins

It seems Pinterest is becoming the new powerhouse of Deep Learning applied to images.

# Akid: A Neural Network library for research and production

Finally, someone is thinking about closing the gap between science/academia and industry.

**Akid: A Library for Neural Network Research and Production from a Dataism Approach** – Shuai Li

**Abstract**: Neural networks are a revolutionary but immature technique that is fast evolving and heavily relies on data. To benefit from the newest development and newly available data, we want the gap between research and production as small as possible. On the other hand, differing from traditional machine learning models, neural network is not just yet another statistic model, but a model for the natural processing engine — the brain. In this work, we describe a neural network library named akid. It provides higher level of abstraction for entities (abstracted as blocks) in nature upon the abstraction done on signals (abstracted as tensors) by Tensorflow, characterizing the dataism observation that all entities in nature processes input and emit out in some ways. It includes a full stack of software that provides abstraction to let researchers focus on research instead of implementation, while at the same time the developed program can also be put into production seamlessly in a distributed environment, and be production ready. At the top application stack, it provides out-of-box tools for neural network applications. Lower down, akid provides a programming paradigm that lets user easily build customized models. The distributed computing stack handles the concurrency and communication, thus letting models be trained or deployed to a single GPU, multiple GPUs, or a distributed environment without affecting how a model is specified in the programming paradigm stack. Lastly, the distributed deployment stack handles how the distributed computing is deployed, thus decoupling the research prototype environment with the actual production environment, and is able to dynamically allocate computing resources, so development (Devs) and operations (Ops) could be separated.

# Hyper-parameter tuning for Support Vector Machines by Estimation of Distribution Algorithms

In the age of Deep Learning, it is always nice to see a paper on the good old Support Vector Machines. We will soon have a post about this technique here on the blog.

**Hyper-Parameter Tuning for Support Vector Machines by Estimation of Distribution Algorithms**

# Convolutional Neural Networks applied to identifying Parkinson's Disease

Another case of Deep Learning applied to medical problems.

**Convolutional Neural Networks Applied for Parkinson’s Disease Identification**

For anyone who wants to learn a bit more about developments in applying reinforcement learning and Deep Learning to autonomous systems, this paper is a good pick.

**Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks**

# DeepCancer: Detecting cancer through gene expressions via Deep Learning

This paper brings a Deep Learning implementation that, if confirmed, could be a major advance for the diagnostics industry in healthcare services: through algorithmic learning, several types of cancer-related genes can be identified, which may carry two positive externalities: 1) cheaper, faster diagnostics, and 2) a complete rethinking of the strategy for fighting and preventing diseases.

**DeepCancer: Detecting Cancer through Gene Expressions via Deep Generative Learning**