Paid for multiple GPUs on Azure and can't get Deep Learning to use them? This post is for you.

You went to the Azure portal and grabbed an NC24r, which has a marvelous 224 GB of RAM, 24 cores, more than 1 TB of disk and, best of all, 4 Tesla K80 GPUs for your complete delight in Deep Learning training.

Everything perfect, right? Almost.

Right away I tried to run a training script and, with a simple htop to monitor the run, I saw that TensorFlow was dumping the entire training load onto the CPUs.

Even with those 24 marvelous cores hitting 100% utilization, that doesn't come anywhere near what our mastodon-sized GPUs can deliver. (Note: you wouldn't trade 4 2017 Ferraris for 24 1985 Fiat 147s, right?)

Logging into our wonderful machine to see what had happened, I first checked whether the GPUs were actually on the machine, which was indeed the case.

azure_teste@deep-learning:~$ nvidia-smi
Tue Jun 27 18:21:05 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | B2A3:00:00.0     Off |                    0 |
| N/A   47C    P0    71W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | C4D8:00:00.0     Off |                    0 |
| N/A   57C    P0    61W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | D908:00:00.0     Off |                    0 |
| N/A   52C    P0    56W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | EEAF:00:00.0     Off |                    0 |
| N/A   42C    P0    69W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
azure_teste@deep-learning:~/deep-learning-moderator-msft$ lspci | grep -i NVIDIA
b2a3:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
c4d8:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
d908:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
eeaf:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)

OK, the Tesla K80s are ready for combat. However, Azure itself acknowledges that there are problems in the provisioning process as a whole, and according to the documentation fixing it takes just a few simple procedures, in two steps.

STEP 1

1) Clone the Git repository (yes, I forked it, because these things routinely disappear for reasons we never quite know).

$ git clone https://github.com/leestott/Azure-GPU-Setup.git

2) Enter the cloned folder

$ cd Azure-GPU-Setup

3) Run the script, which will install some NVIDIA libraries and then reboot the server.

$ bash gpu-setup-part1.sh

STEP 2

1) Go back to the repository folder

$ cd Azure-GPU-Setup

2) Run the second script, which installs TensorFlow, the CUDA Toolkit and cuDNN, and sets a number of environment variables.

$ bash gpu-setup-part2.sh

3) Then test the installation

$ python gpu-test.py
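If you want to double-check by hand, a minimal sanity check (assuming the TensorFlow 1.x that the scripts install) is to list the devices TensorFlow can see; the four K80s should show up as GPU devices:

# Minimal sanity check: list the devices TensorFlow can see.
# Assumes the TensorFlow 1.x installed by the setup scripts above.
from tensorflow.python.client import device_lib

for device in device_lib.list_local_devices():
    print(device.name, device.device_type)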

After that, just enjoy your GPUs at full load and get on with training your networks.


How to create a Python virtualenv with no bullshit

Via Eiti Kimura.

Straight to the point:

1) Install virtualenv via pip

$ pip install virtualenv

2) Create the directory for your environment

$ mkdir deep-learning-virtual-env

3) Once created, enter the directory

$ cd deep-learning-virtual-env

4) Initialize your virtualenv

$ virtualenv .

5) And then activate it

$ source bin/activate

6) To make your life easier, we even created a requirements file with Theano, Keras, Jupyter Notebook and Scikit-Learn. Just run the following command:

$ pip install -r requirements.key
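The original requirements file is not reproduced here; purely as an illustration, a minimal version covering the packages mentioned above would look like the sketch below (in a real project you would pin exact versions):

# Illustrative requirements file with the packages mentioned in the post;
# pin exact versions in a real project.
theano
keras
jupyter
scikit-learn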



Stanford CoreNLP – Core natural language software

Stanford CoreNLP provides a set of natural language analysis tools. It can give the base forms of words, their parts of speech, whether they are names of companies, people, etc., normalize dates, times, and numeric quantities, mark up the structure of sentences in terms of phrases and word dependencies, indicate which noun phrases refer to the same entities, indicate sentiment, extract particular or open-class relations between entity mentions, get quotes people said, etc.

Choose Stanford CoreNLP if you need:

  • An integrated toolkit with a good range of grammatical analysis tools
  • Fast, reliable analysis of arbitrary texts
  • The overall highest quality text analytics
  • Support for a number of major (human) languages
  • Available interfaces for most major modern programming languages
  • Ability to run as a simple web service

Stanford CoreNLP’s goal is to make it very easy to apply a bunch of linguistic analysis tools to a piece of text. A tool pipeline can be run on a piece of plain text with just two lines of code. CoreNLP is designed to be highly flexible and extensible. With a single option you can change which tools should be enabled and which should be disabled. Stanford CoreNLP integrates many of Stanford’s NLP tools, including the part-of-speech (POS) tagger, the named entity recognizer (NER), the parser, the coreference resolution system, sentiment analysis, bootstrapped pattern learning, and the open information extraction tools. Moreover, an annotator pipeline can include additional custom or third-party annotators. CoreNLP’s analyses provide the foundational building blocks for higher-level and domain-specific text understanding applications.
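CoreNLP itself is a Java library, but as noted above it can also run as a simple web service. As a hedged sketch (assuming a server started locally on port 9000 with the stock edu.stanford.nlp.pipeline.StanfordCoreNLPServer class), you can call it from Python with nothing but requests:

# Sketch: query a locally running CoreNLP server, e.g. started with:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
import json
import requests

properties = {"annotators": "tokenize,ssplit,pos,ner", "outputFormat": "json"}
response = requests.post(
    "http://localhost:9000/",
    params={"properties": json.dumps(properties)},
    data="Stanford University is located in California.".encode("utf-8"),
)
annotation = response.json()
for token in annotation["sentences"][0]["tokens"]:
    print(token["word"], token["pos"], token["ner"])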


Professor Jürgen Schmidhuber's page

For those who don't know, professor Jürgen Schmidhuber is one of the pioneers of applied Deep Learning research.

He argues in one of his talks that Deep Learning networks date back to the mid-1960s, when Alexei Ivakhnenko and Valentin Lapa published the first working deep networks, with Ivakhnenko describing an 8-layer network in 1971.

For anyone who wants to know more about his views on the prospects of Deep Learning, this Ask Me Anything (AMA) on Reddit is well worth it.

And here, for download from this post, is his systematic review of Deep Learning.

cse597g-deep_learning


SLIM: Sparse Linear Methods for Top-N Recommender Systems

A great theory paper on generating Top-N recommendations in very sparse scenarios (e.g. a 0-5 rating system in which few people actually submit ratings).

Recently, this problem of recommending within a very sparse matrix is what led Netflix to replace its 1-to-5 star rating system with a simple thumbs up / thumbs down.

In any case, it is worth reading to see how the authors are tackling this kind of challenge.

Abstract: This paper focuses on developing effective and efficient algorithms for top-N recommender systems. A novel Sparse Linear Method (SLIM) is proposed, which generates top-N recommendations by aggregating from user purchase/rating profiles. A sparse aggregation coefficient matrix W is learned from SLIM by solving an ℓ1-norm and ℓ2-norm regularized optimization problem. W is demonstrated to produce high quality recommendations and its sparsity allows SLIM to generate recommendations very fast. A comprehensive set of experiments is conducted by comparing the SLIM method and other state-of-the-art top-N recommendation methods. The experiments show that SLIM achieves significant improvements both in run time performance and recommendation quality over the best existing methods.
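To make the idea concrete, here is a rough sketch of the SLIM approach (not the authors' optimized solver): each column of W is learned with an ℓ1+ℓ2 regularized regression, using scikit-learn's ElasticNet as a stand-in.

import numpy as np
from sklearn.linear_model import ElasticNet

def fit_slim(A, l1_reg=0.001, l2_reg=0.0001):
    # A: float user-item matrix, shape (n_users, n_items)
    n_items = A.shape[1]
    W = np.zeros((n_items, n_items))
    # Map (l1_reg, l2_reg) onto sklearn's (alpha, l1_ratio) parameterization
    alpha = l1_reg + l2_reg
    l1_ratio = l1_reg / alpha
    for j in range(n_items):
        model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                           positive=True, fit_intercept=False)
        target = A[:, j].copy()
        A_j = A.copy()
        A_j[:, j] = 0.0  # zero the item's own column to avoid the trivial solution
        model.fit(A_j, target)
        W[:, j] = model.coef_
    return W

# Recommendation scores: np.dot(A, W); rank each user's unseen items by score.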


Local Item-Item Models For Top-N Recommendation

This is one of the theoretical secrets behind Netflix: why treat every customer as computationally distinct when some of them have similar preferences?

Abstract: Item-based approaches based on SLIM (Sparse LInear Methods) have demonstrated very good performance for top-N recommendation; however they only estimate a single model for all the users. This work is based on the intuition that not all users behave in the same way — instead there exist subsets of like-minded users. By using different item-item models for these user subsets, we can capture differences in their preferences and this can lead to improved performance for top-N recommendations. In this work, we extend SLIM by combining global and local SLIM models. We present a method that computes the prediction scores as a user-specific combination of the predictions derived by a global and local item-item models. We present an approach in which the global model, the local models, their user-specific combination, and the assignment of users to the local models are jointly optimized to improve the top-N recommendation performance. Our experiments show that the proposed method improves upon the standard SLIM model and outperforms competing top-N recommendation approaches.
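The abstract describes the predictions as a user-specific combination of a global and a local item-item model; a minimal sketch of just that combination step (the models, user assignments and weights are assumed to have been learned already, e.g. SLIM-style as above) might look like this:

import numpy as np

def combined_scores(A, W_global, W_locals, assignment, g):
    # A: user-item matrix; W_global: global item-item model
    # W_locals: list of local item-item models, one per user subset
    # assignment[u]: index of the local model for user u
    # g[u]: user-specific weight between global and local predictions
    scores = np.zeros(A.shape, dtype=float)
    for u in range(A.shape[0]):
        local = W_locals[assignment[u]]
        scores[u] = (g[u] * np.dot(A[u], W_global)
                     + (1.0 - g[u]) * np.dot(A[u], local))
    return scores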


RLScore: Regularized Least-Squares Learners

A good alternative to ensembles when the dimensionality of the dataset is high, or when the Elastic Net, Lasso and Ridge alternatives don't converge as desired.


RLScore is a Python open source module for kernel based machine learning. The library provides implementations of several regularized least-squares (RLS) type of learners. RLS methods for regression and classification, ranking, greedy feature selection, multi-task and zero-shot learning, and unsupervised classification are included. Matrix algebra based computational short-cuts are used to ensure efficiency of both training and cross-validation. A simple API and extensive tutorials allow for easy use of RLScore.

Regularized least squares (RLS) is a family of methods for solving the least-squares problem while using regularization to further constrain the resulting solution.

RLS is used for two main reasons. The first comes up when the number of variables in the linear system exceeds the number of observations. In such settings, the ordinary least-squares problem is ill-posed and is therefore impossible to fit because the associated optimization problem has infinitely many solutions. RLS allows the introduction of further constraints that uniquely determine the solution.

The second reason that RLS is used occurs when the number of variables does not exceed the number of observations, but the learned model suffers from poor generalization. RLS can be used in such cases to improve the generalizability of the model by constraining it at training time. This constraint can either force the solution to be “sparse” in some way or to reflect other prior knowledge about the problem such as information about correlations between features. A Bayesian understanding of this can be reached by showing that RLS methods are often equivalent to priors on the solution to the least-squares problem.
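In its simplest ridge form, RLS solves min_w ||Xw - y||^2 + lam * ||w||^2, which has the closed-form solution w = (X'X + lam*I)^-1 X'y; a minimal numpy sketch of that closed form:

import numpy as np

def rls_fit(X, y, lam=1.0):
    # Closed-form ridge/RLS solution: w = (X'X + lam*I)^-1 X'y
    n_features = X.shape[1]
    return np.linalg.solve(np.dot(X.T, X) + lam * np.eye(n_features),
                           np.dot(X.T, y))

# Usage sketch: w = rls_fit(X_train, Y_train, lam=2.0**-7)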

To see in depth

Installation
1) $ pip install rlscore
2) $ export CFLAGS="-I /usr/local/lib/python2.7/site-packages/numpy/core/include $CFLAGS"

Original post

In [1]:
# Import libraries
import numpy as np
from rlscore.learner import RLS
from rlscore.measure import sqerror
from rlscore.learner import LeaveOneOutRLS
In [2]:
# Function to load dataset and split in train and test sets
def load_housing():
    np.random.seed(1)
    D = np.loadtxt("/Volumes/PANZER/Github/learning-space/Datasets/02 - Classification/housing_data.txt")
    np.random.shuffle(D)
    X = D[:,:-1] # Independent variables
    Y = D[:,-1]  # Dependent variable
    X_train = X[:250]
    Y_train = Y[:250]
    X_test = X[250:]
    Y_test = Y[250:]
    return X_train, Y_train, X_test, Y_test
In [3]:
def print_stats():
    X_train, Y_train, X_test, Y_test = load_housing()
    print("Housing data set characteristics")
    print("Training set: %d instances, %d features" %X_train.shape)
    print("Test set: %d instances, %d features" %X_test.shape)

if __name__ == "__main__":
    print_stats()
Housing data set characteristics
Training set: 250 instances, 13 features
Test set: 256 instances, 13 features

Linear regression with default parameters

In [4]:
# Function to train RLS method
def train_rls():
    #Trains RLS with default parameters (regparam=1.0, kernel='LinearKernel')
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = RLS(X_train, Y_train)
    
    #Leave-one-out cross-validation predictions, this is fast due to
    #computational short-cut
    P_loo = learner.leave_one_out()
    
    #Test set predictions
    P_test = learner.predict(X_test)
    
    # Stats
    print("leave-one-out error %f" %sqerror(Y_train, P_loo))
    print("test error %f" %sqerror(Y_test, P_test))
    
    #Sanity check, can we do better than predicting mean of training labels?
    print("mean predictor %f" %sqerror(Y_test, np.ones(Y_test.shape)*np.mean(Y_train)))

if __name__=="__main__":
    train_rls()
leave-one-out error 25.959399
test error 25.497222
mean predictor 81.458770

Choosing regularization parameter with leave-one-out

Grid search over an exponential grid of regularization parameter values to find the lowest LOO-CV error.

In [5]:
def train_rls():
    #Select regparam with leave-one-out cross-validation
    X_train, Y_train, X_test, Y_test = load_housing()
    learner = RLS(X_train, Y_train)
    best_regparam = None
    best_error = float("inf")
   
    #exponential grid of possible regparam values
    log_regparams = range(-15, 16)
    for log_regparam in log_regparams:
        regparam = 2.**log_regparam
        
        #RLS is re-trained with the new regparam, this
        #is very fast due to computational short-cut
        learner.solve(regparam)
        
        #Leave-one-out cross-validation predictions, this is fast due to
        #computational short-cut
        P_loo = learner.leave_one_out()
        e = sqerror(Y_train, P_loo)
        print("regparam 2**%d, loo-error %f" %(log_regparam, e))
        if e < best_error:
            best_error = e
            best_regparam = regparam
    learner.solve(best_regparam)
    P_test = learner.predict(X_test)
    print("best regparam %f with loo-error %f" %(best_regparam, best_error)) 
    print("test error %f" %sqerror(Y_test, P_test))

if __name__=="__main__":
    train_rls()
regparam 2**-15, loo-error 24.745479
regparam 2**-14, loo-error 24.745463
regparam 2**-13, loo-error 24.745431
regparam 2**-12, loo-error 24.745369
regparam 2**-11, loo-error 24.745246
regparam 2**-10, loo-error 24.745010
regparam 2**-9, loo-error 24.744576
regparam 2**-8, loo-error 24.743856
regparam 2**-7, loo-error 24.742982
regparam 2**-6, loo-error 24.743309
regparam 2**-5, loo-error 24.750966
regparam 2**-4, loo-error 24.786243
regparam 2**-3, loo-error 24.896991
regparam 2**-2, loo-error 25.146493
regparam 2**-1, loo-error 25.537315
regparam 2**0, loo-error 25.959399
regparam 2**1, loo-error 26.285436
regparam 2**2, loo-error 26.479254
regparam 2**3, loo-error 26.603001
regparam 2**4, loo-error 26.801196
regparam 2**5, loo-error 27.352322
regparam 2**6, loo-error 28.837002
regparam 2**7, loo-error 32.113350
regparam 2**8, loo-error 37.480625
regparam 2**9, loo-error 43.843555
regparam 2**10, loo-error 49.748687
regparam 2**11, loo-error 54.912297
regparam 2**12, loo-error 59.936226
regparam 2**13, loo-error 65.137825
regparam 2**14, loo-error 70.126118
regparam 2**15, loo-error 74.336978
best regparam 0.007812 with loo-error 24.742982
test error 24.509981

Training with RLS and simultaneously selecting the regularization parameter with leave-one-out using LeaveOneOutRLS

In [6]:
def train_rls():
    #Trains RLS with automatically selected regularization parameter
    X_train, Y_train, X_test, Y_test = load_housing()
    
    # Grid search
    regparams = [2.**i for i in range(-15, 16)]
    learner = LeaveOneOutRLS(X_train, Y_train, regparams = regparams)
    loo_errors = learner.cv_performances
    P_test = learner.predict(X_test)
    print("leave-one-out errors " +str(loo_errors))
    print("chosen regparam %f" %learner.regparam)
    print("test error %f" %sqerror(Y_test, P_test))

if __name__=="__main__":
    train_rls()
leave-one-out errors [ 24.74547881  24.74546295  24.74543138  24.74536884  24.74524616
  24.74501033  24.7445764   24.74385625  24.74298177  24.74330936
  24.75096639  24.78624255  24.89699067  25.14649266  25.53731465
  25.95939943  26.28543584  26.47925431  26.6030015   26.80119588
  27.35232186  28.83700156  32.11334986  37.48062503  43.84355496
  49.7486873   54.91229746  59.93622566  65.1378248   70.12611801
  74.33697809]
chosen regparam 0.007812
test error 24.509981

Learning nonlinear predictors using kernels

RLS using a non-linear kernel function.

In [7]:
def train_rls():
    #Selects both the gamma parameter for Gaussian kernel, and regparam with loocv
    X_train, Y_train, X_test, Y_test = load_housing()
    
    regparams = [2.**i for i in range(-15, 16)]
    gammas = regparams
    best_regparam = None
    best_gamma = None
    best_error = float("inf")
    
    for gamma in gammas:
        #New RLS is initialized for each kernel parameter
        learner = RLS(X_train, Y_train, kernel="GaussianKernel", gamma=gamma)
        for regparam in regparams:
            #RLS is re-trained with the new regparam, this
            #is very fast due to computational short-cut
            learner.solve(regparam)
            
            #Leave-one-out cross-validation predictions, this is fast due to
            #computational short-cut
            P_loo = learner.leave_one_out()
            e = sqerror(Y_train, P_loo)
            
            #print "regparam", regparam, "gamma", gamma, "loo-error", e
            if e < best_error:
                best_error = e
                best_regparam = regparam
                best_gamma = gamma
    learner = RLS(X_train, Y_train, regparam = best_regparam, kernel="GaussianKernel", gamma=best_gamma)
    P_test = learner.predict(X_test)
    print("best parameters gamma %f regparam %f" %(best_gamma, best_regparam))
    print("best leave-one-out error %f" %best_error)
    print("test error %f" %sqerror(Y_test, P_test))
    
    
if __name__=="__main__":
    train_rls()
best parameters gamma 0.000031 regparam 0.000244
best leave-one-out error 21.910837
test error 16.340877

Binary classification and Area under ROC curve

In [8]:
from rlscore.utilities.reader import read_svmlight

# Load dataset and stats
def print_stats():
    X_train, Y_train, foo = read_svmlight("/Volumes/PANZER/Github/learning-space/Datasets/02 - Classification/a1a.t")
    X_test, Y_test, foo = read_svmlight("/Volumes/PANZER/Github/learning-space/Datasets/02 - Classification/a1a")
    print("Adult data set characteristics")
    print("Training set: %d instances, %d features" %X_train.shape)
    print("Test set: %d instances, %d features" %X_test.shape)

if __name__=="__main__":
    print_stats()
Adult data set characteristics
Training set: 30956 instances, 123 features
Test set: 1605 instances, 119 features
In [ ]:
from rlscore.learner import RLS
from rlscore.measure import accuracy
from rlscore.utilities.reader import read_svmlight


def train_rls():
    # Train and test datasets
    X_train, Y_train, foo = read_svmlight("/Volumes/PANZER/Github/learning-space/Datasets/02 - Classification/a1a.t")
    X_test, Y_test, foo = read_svmlight("/Volumes/PANZER/Github/learning-space/Datasets/02 - Classification/a1a", X_train.shape[1])
    learner = RLS(X_train, Y_train)
    best_regparam = None
    best_accuracy = 0.
    
    #exponential grid of possible regparam values
    log_regparams = range(-15, 16)
    for log_regparam in log_regparams:
        regparam = 2.**log_regparam
        #RLS is re-trained with the new regparam, this
        #is very fast due to computational short-cut
        learner.solve(regparam)
        
        #Leave-one-out cross-validation predictions, this is fast due to
        #computational short-cut
        P_loo = learner.leave_one_out()
        acc = accuracy(Y_train, P_loo)
        
        print("regparam 2**%d, loo-accuracy %f" %(log_regparam, acc))
        if acc > best_accuracy:
            best_accuracy = acc
            best_regparam = regparam
    learner.solve(best_regparam)
    P_test = learner.predict(X_test)
    
    print("best regparam %f with loo-accuracy %f" %(best_regparam, best_accuracy)) 
    print("test set accuracy %f" %accuracy(Y_test, P_test))

if __name__=="__main__":
    train_rls()

Cleverhans – a Python lib for preventing noise attacks on models

Attacks on sensors, and consequently on Machine Learning models, are a very recent topic (it will be covered here at some point in the future, but this paper gives a good picture of how damaging they can be).

Cleverhans is a Python library that artificially injects small amounts of noise/perturbation into the network's inputs, both to benchmark models against this kind of attack and to train them to resist it.

This repository contains the source code for cleverhans, a Python library to benchmark machine learning systems’ vulnerability to adversarial examples. You can learn more about such vulnerabilities on the accompanying blog.

The cleverhans library is under continual development, always welcoming contributions of the latest attacks and defenses. In particular, we always welcome help towards resolving the issues currently open.
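The library ships with ready-made attacks and defenses; purely as an illustration of the core idea (this is not cleverhans' own API), the classic Fast Gradient Sign Method can be sketched in a few lines of TensorFlow 1.x:

import tensorflow as tf

def fgsm(x, loss, eps=0.1):
    # Fast Gradient Sign Method (Goodfellow et al., 2014): nudge the input
    # a small step in the direction that most increases the loss.
    grad, = tf.gradients(loss, x)
    adv_x = x + eps * tf.sign(grad)
    return tf.stop_gradient(adv_x)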

About the name

The name cleverhans is a reference to a presentation by Bob Sturm titled “Clever Hans, Clever Algorithms: Are Your Machine Learnings Learning What You Think?” and the corresponding publication, “A Simple Method to Determine if a Music Information Retrieval System is a ‘Horse’.” Clever Hans was a horse that appeared to have learned to answer arithmetic questions, but had in fact only learned to read social cues that enabled him to give the correct answer. In controlled settings where he could not see people’s faces or receive other feedback, he was unable to answer the same questions. The story of Clever Hans is a metaphor for machine learning systems that may achieve very high accuracy on a test set drawn from the same distribution as the training data, but that do not actually understand the underlying task and perform poorly on other inputs.


Interpretability versus Performance: Skepticism and AI-Winter

In this post, Michael Elad, editor-in-chief of SIAM's Journal on Imaging Sciences, makes a series of well-balanced reflections on how Deep Learning methods are solving real problems and reaching a high degree of visibility, even though the methods are not that elegant from a mathematical perspective.

His main point is that, when it comes to image processing, academia has always held a place of honor for an approach in which interpretability and understanding of the models took precedence over the results achieved.

This is clear in the paragraph below:

A series of papers during the early 2000s suggested the successful application of this architecture, leading to state-of-the-art results in practically any assigned task. Key aspects in these contributions included the following: the use of many network layers, which explains the term “deep learning;” a huge amount of data on which to train; massive computations typically run on computer clusters or graphic processing units; and wise optimization algorithms that employ effective initializations and gradual stochastic gradient learning. Unfortunately, all of these great empirical achievements were obtained with hardly any theoretical understanding of the underlying paradigm. Moreover, the optimization employed in the learning process is highly non-convex and intractable from a theoretical viewpoint.

At the end, he offers a view on pragmatism and the academic agenda:

Should we be happy about this trend? Well, if we are in the business of solving practical problems such as noise removal, the answer must be positive. Right? Therefore, a company seeking such a solution should be satisfied. But what about us scientists? What is the true objective behind the vast effort that we invested in the image denoising problem? Yes, we do aim for effective noise-removal algorithms, but this constitutes a small fraction of our motivation, as we have a much wider and deeper agenda. Researchers in our field aim to understand the data on which we operate. This is done by modeling information in order to decipher its true dimensionality and manifested phenomena. Such models serve denoising and other problems in image processing, but far more than that, they allow identifying new ways to extract knowledge from the data and enable new horizons.

This reminds me of my time at RCB Investimentos, working in the NPL market with the great Renato Toledo. He taught me that good models have a high degree of interpretability and simplicity, and that this should be the barometer for decision-making, since a model whose uncertainty (or error) is known beats a model where nobody knows what is going on. (Personal note: those who know me know I have a saying about this: if you don't understand the dynamics of a model when it works, you will never know what went wrong when it fails.)

Still, it is undeniable that Deep Learning networks are, in my view, serving a pent-up demand: problems that already existed but that computational methods could not solve easily, such as facial recognition, image classification, translation, and structured problems like fraud (Fast.AI is doing a great job of clarifying this).

Granting that DL researchers now have near-infinite hardware at modest prices, the brutal fact is that for roughly 30 years this research field swallowed a very bitter pill of skepticism from academia itself: the method was treated with such suspicion that it nearly went extinct, and some journals implicitly refused to accept DL papers, all while mathematicians were winning prizes and enjoying high visibility thanks to the accuracy of their methods, rather than some supposed worldly love for their interpretability.

Two big questions are on the table: 1) Can the mathematicians and communities shocked by this phenomenon endure what the neural networks community endured for more than 30 years? And 2) in the event of a Math-Winter, can the mathematical community withstand a potential marginalization of its research?

We shall wait and see.


Self-Driving Cars in GTA 5 using Deep Learning

Anyone who follows Python Programming knows that whenever they post something, good things are coming; and this time was no different.

Harrison is doing a series of posts on how to play GTA V using Deep Learning with TensorFlow and a CNN (convolutional neural network).

This is the first video of the series, in which he sets up the solution:

And this is the latest trained version:

For anyone interested, Harrison has posted a playlist with every stage of the training, and a bot driving on its own in a livestream (it is worth watching just for how much fun it is to see the bot trying to drive).

And the code is available on GitHub.


Natural Language Processing with FAIRSeq – Facebook AI Research Sequence-to-Sequence Toolkit

A recent post on Facebook Code introduced FAIRSeq, an acronym for Facebook AI Research Sequence-to-Sequence Toolkit, in which the researchers achieved good results by combining a CNN (convolutional neural network) approach with sequence-to-sequence learning; an approach that not only delivers higher accuracy than RNN (recurrent neural network) approaches, but also much higher processing throughput.

Beyond the results and the approach itself, the most interesting part is seeing how basic aspects of observational science have a big influence on innovation; in other words, how simple observation can lead to great results.

To understand this better, look at the inspiration behind the core mechanism of the translation architecture:

“A distinguishing component of our architecture is multi-hop attention. An attention mechanism is similar to the way a person would break down a sentence when translating it: Instead of looking at the sentence only once and then writing down the full translation without looking back, the network takes repeated “glimpses” at the sentence to choose which words it will translate next, much like a human occasionally looks back at specific keywords when writing down a translation. Multi-hop attention is an enhanced version of this mechanism, which allows the network to make multiple such glimpses to produce better translations. These glimpses also depend on each other. For example, the first glimpse could focus on a verb and the second glimpse on the associated auxiliary verb.”

For those interested, a version of the code is available on GitHub, and the original paper with the results is here.


Best Deep Learning papers from 2012 to 2016

To study with a pencil in hand and coffee in the mug.

Via KDnuggets

1. Understanding / Generalization / Transfer

Distilling the knowledge in a neural network (2015), G. Hinton et al. [pdf]

2. Optimization / Training Techniques

Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015), S. Ioffe and C. Szegedy [pdf]

3. Unsupervised / Generative Models

Unsupervised representation learning with deep convolutional generative adversarial networks (2015), A. Radford et al. [pdf]

4. Convolutional Neural Network Models

Deep residual learning for image recognition (2016), K. He et al. [pdf]

5. Image: Segmentation / Object Detection

Fast R-CNN (2015), R. Girshick [pdf]

6. Image / Video / Etc.

Show and tell: A neural image caption generator (2015), O. Vinyals et al. [pdf]

7. Natural Language Processing / RNNs

Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014), K. Cho et al. [pdf]

8. Speech / Other Domain

Speech recognition with deep recurrent neural networks (2013), A. Graves [pdf]

9. Reinforcement Learning / Robotics

Human-level control through deep reinforcement learning (2015), V. Mnih et al. [pdf]

10. More Papers from 2016

Domain-adversarial training of neural networks (2016), Y. Ganin et al. [pdf]


Generalized Additive Models for Time Series

Over at AlgoBeans you will probably find the best explanation of Generalized Additive Models on the internet. Simply and didactically, the post explains everything about the technique.

Therefore, google search trends for persimmons could well be modeled by adding a seasonal trend to an increasing growth trend, in what’s called a generalized additive model (GAM).

The principle behind GAMs is similar to that of regression, except that instead of summing effects of individual predictors, GAMs are a sum of smooth functions. Functions allow us to model more complex patterns, and they can be averaged to obtain smoothed curves that are more generalizable.

Because GAMs are based on functions rather than variables, they are not restricted by the linearity assumption in regression that requires predictor and outcome variables to move in a straight line. Furthermore, unlike in neural networks, we can isolate and study effects of individual functions in a GAM on resulting predictions.
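As a toy illustration of the additive idea (this is not the AlgoBeans implementation), a series can be modeled as the sum of a smooth growth trend and a seasonal component, each represented by basis functions and fitted jointly by least squares:

import numpy as np

# Toy GAM-style decomposition: y ~ f_trend(t) + f_seasonal(t)
t = np.arange(104.0)                      # e.g. two years of weekly data
y = 0.05 * t + np.sin(2 * np.pi * t / 52) + 0.1 * np.random.randn(104)

basis = np.column_stack([
    t, t ** 2,                            # smooth growth trend
    np.sin(2 * np.pi * t / 52),           # yearly seasonality
    np.cos(2 * np.pi * t / 52),
    np.ones_like(t),                      # intercept
])
coef, _, _, _ = np.linalg.lstsq(basis, y)
y_hat = np.dot(basis, coef)               # smoothed, additive fit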


Accelerating the XGBoost algorithm using GPU computing

The final frontier in the use of GPUs with one of the most powerful algorithms of all time is here.

Abstract: We present a CUDA based implementation of a decision tree construction algorithm within the gradient boosting library XGBoost. The tree construction algorithm is executed entirely on the GPU and shows high performance with a variety of datasets and settings, including sparse input matrices. Individual boosting iterations are parallelized, combining two approaches. An interleaved approach is used for shallow trees, switching to a more conventional radix sort based approach for larger depths. We show speedups of between 3-6x using a Titan X compared to a 4 core i7 CPU, and 1.2x using a Titan X compared to 2x Xeon CPUs (24 cores). We show that it is possible to process the Higgs dataset (10 million instances, 28 features) entirely within GPU memory. The algorithm is made available as a plug-in within the XGBoost library and fully supports all XGBoost features including classification, regression and ranking tasks. 
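For reference, a hedged usage sketch: in recent XGBoost builds compiled with GPU support, the GPU tree construction described in the paper is exposed through the tree_method parameter (the exact value, e.g. 'gpu_hist', may vary across versions):

import numpy as np
import xgboost as xgb

# Hypothetical data just for illustration; assumes an XGBoost build
# compiled with GPU support.
X = np.random.rand(10000, 28)
y = (np.random.rand(10000) > 0.5).astype(int)

dtrain = xgb.DMatrix(X, label=y)
params = {
    'objective': 'binary:logistic',
    'tree_method': 'gpu_hist',  # GPU-accelerated histogram tree construction
    'max_depth': 6,
}
model = xgb.train(params, dtrain, num_boost_round=100)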


Applying Natural Language Processing with Python to food reviews

A great notebook by Patrick Harrison of S&P Global Market Intelligence.

For anyone who wants to work with NLP, this is by far one of the best tutorials on the internet, especially for how richly it works with text, in particular topic modeling with LDA and semantic analysis with pyLDAvis.

For anyone who wants to work seriously with NLP, this post is mandatory.
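As a taste of that LDA + pyLDAvis combination, a minimal hedged sketch with gensim (toy documents stand in for the food reviews; note the pyLDAvis submodule has been renamed across versions, older releases use pyLDAvis.gensim, newer ones pyLDAvis.gensim_models):

from gensim import corpora, models
import pyLDAvis.gensim

# Toy tokenized documents standing in for the food-review corpus.
docs = [["great", "pizza", "crust", "cheese"],
        ["terrible", "service", "cold", "pizza"],
        ["cheese", "crust", "delicious", "service"]]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
vis = pyLDAvis.gensim.prepare(lda, corpus, dictionary)
# In a notebook: pyLDAvis.display(vis) renders the interactive topic map.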
