Deep learning sharpens views of cells and genes

The research relied on a convolutional neural network, a type of deep-learning algorithm that is transforming how biologists analyse images. Scientists are using the approach to find mutations in genomes and predict variations in the layout of single cells. Google’s method, described in a preprint in August (R. Poplin et al. Preprint at https://arxiv.org/abs/1708.09843; 2017), is part of a wave of new deep-learning applications that are making image processing easier and more versatile — and could even identify overlooked biological phenomena.
Cell biologists at the Allen Institute for Cell Science in Seattle, Washington, are using convolutional neural networks to convert flat, grey images of cells captured with light microscopes into 3D images in which some of a cell’s organelles are labelled in colour. The approach eliminates the need to stain cells — a process that requires more time and a sophisticated lab, and can damage the cell. Last month, the group published details of an advanced technique that can predict the shape and location of even more cell parts using just a few pieces of data — such as the cell’s outline (G. R. Johnson et al. Preprint at bioRxiv http://doi.org/chwv; 2017).
Other machine-learning connoisseurs in biology have set their sights on new frontiers, now that convolutional neural networks are taking flight for image processing. “Imaging is important, but so is chemistry and molecular data,” says Alex Wolf, a computational biologist at the German Research Center for Environmental Health in Neuherberg. Wolf hopes to tweak neural networks so that they can analyse gene expression. “I think there will be a very big breakthrough in the next few years,” he says, “that allows biologists to apply neural networks much more broadly.”

Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning

Abstract: Traditionally, medical discoveries are made by observing associations, making hypotheses from them and then designing and running experiments to test the hypotheses. However, with medical images, observing and quantifying associations can often be difficult because of the wide variety of features, patterns, colours, values and shapes that are present in real data. Here, we show that deep learning can extract new knowledge from retinal fundus images. Using deep-learning models trained on data from 284,335 patients and validated on two independent datasets of 12,026 and 999 patients, we predicted cardiovascular risk factors not previously thought to be present or quantifiable in retinal images, such as age (mean absolute error within 3.26 years), gender (area under the receiver operating characteristic curve (AUC) = 0.97), smoking status (AUC = 0.71), systolic blood pressure (mean absolute error within 11.23 mmHg) and major adverse cardiac events (AUC = 0.70). We also show that the trained deep-learning models used anatomical features, such as the optic disc or blood vessels, to generate each prediction.

Deep Reinforcement Learning Doesn’t Work Yet

That’s why I prefer “tear down” projects and papers (the kind that open our eyes with the right criticism) over “look-at-my-shiny-non-reproducible-paper” projects.

By Sorta Insightful

Optimization for Deep Learning Algorithms: A Review

ABSTRACT: In the past few years, deep learning has received attention in the field of artificial intelligence. This paper reviews three focus areas of learning methods in deep learning, namely supervised, unsupervised and reinforcement learning. These learning methods are used in implementing deep and convolutional neural networks. They offer a unified computational approach, flexibility and scalability. The computational model implemented by deep learning is used to understand data representations with multiple levels of abstraction. Furthermore, deep learning has enhanced state-of-the-art methods in domains like genomics, where it can be applied in pathway analysis for modelling biological networks; thus, the extraction of biochemical production can be improved by using deep learning. This review also covers the implementation of optimization in terms of meta-heuristic methods, which are used in machine learning as part of the modelling process.
CONCLUSION
This review discussed deep learning techniques, which implement multiple levels of abstraction in feature representation. Deep learning can be characterized as a rebranding of artificial neural networks. These learning methods have gained large interest among researchers because of better representations and easier learning of tasks. Even so, some issues have arisen: deep learning easily gets stuck at local optima and is computationally expensive. The DeepBind algorithm shows that deep learning can contribute to genomics studies, achieving a high level of prediction of protein binding affinity. The optimization methods discussed consist of several meta-heuristics that can be categorized under evolutionary algorithms. The application of techniques such as CRO shows the diversity of optimization algorithms that can improve the analysis of modelling techniques. Furthermore, these methods are able to solve problems that arise in conventional neural networks, as they provide high-quality solutions in a given search space, and they enable the extraction of biochemical production from metabolic pathways. Deep learning gives a good advantage in biochemical production, as it allows high-level abstraction in cellular biological networks. Thus, the use of CRO can mitigate the problems of deep learning (getting stuck at local optima and high computational cost), since CRO uses global search over the search space to identify the global minimum. This improves the training process of the network, refining the weights to reach minimum error.

A synthetic guide on Adversarial Attack

Fast and simple.

Machine learning algorithms accept inputs as numeric vectors. Designing an input in a specific way to get the wrong result from the model is called an adversarial attack.

How is this possible? No machine learning algorithm is perfect and they make mistakes — albeit very rarely. However, machine learning models consist of a series of specific transformations, and most of these transformations turn out to be very sensitive to slight changes in input. Harnessing this sensitivity and exploiting it to modify an algorithm’s behavior is an important problem in AI security.

In this article we will show practical examples of the main types of attacks, explain why it is so easy to perform them, and discuss the security implications that stem from this technology.

Here are the main types of hacks we will focus on:

1. Non-targeted adversarial attack: the most general type of attack, in which all you want to do is make the classifier give an incorrect result.
2. Targeted adversarial attack: a slightly more difficult attack, which aims to force a particular class for your input.
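To make the attack concrete, here is a stdlib-only toy sketch of the fast gradient sign method (FGSM) against a hand-written logistic-regression “model”. The weights, input and epsilon are all invented for illustration; a real attack would use the gradients of a trained network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability that input x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y_true, eps):
    """Non-targeted FGSM step: nudge every input feature in the direction
    that increases the loss for the true label y_true (0 or 1)."""
    p = predict(w, b, x)
    # d(cross-entropy loss)/d(x_i) = (p - y_true) * w_i for this model
    grad = [(p - y_true) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

w, b = [2.0, -1.0], 0.0          # toy model parameters (made up)
x, y = [1.0, 0.5], 1             # clean input, true class 1
x_adv = fgsm(w, b, x, y, eps=0.5)

print(predict(w, b, x))      # confident in class 1
print(predict(w, b, x_adv))  # confidence collapses after the attack
```

The targeted variant is the same idea with the objective flipped: instead of increasing the loss of the true class, you descend the loss of the class you want the model to output.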

Skin Cancer Detection using Deep Neural Networks

Abstract: Cancer is the most dangerous and stubborn disease known to mankind, and it accounts for the most deaths caused by any disease. However, if detected early, this medical condition is not very difficult to defeat. Cancerous tumors grow very rapidly and spread into different parts of the body, and this process continues until the tumor has spread through the entire body and, ultimately, our organs stop functioning. If a tumor develops in any part of the body, it requires immediate medical attention to verify whether it is malignant (cancerous) or benign (non-cancerous). Until now, to test a tumor for malignancy, a sample had to be extracted and then tested in a laboratory. But using the computational power of Deep Neural Networks, we can predict whether a tumor is malignant or benign from only a photograph of it. If cancer is detected at an early stage, the chances are very high that it can be cured completely. In this work, we detect melanoma (skin cancer) in tumors by processing images of those tumors.

Conclusion: We trained our model using the VGG16, Inception and ResNet50 neural network architectures. In training, we provided two categories of images, one with malignant (melanoma, skin cancer) tumors and the other with benign tumors. After training, we tested our model with random images of tumors, and an accuracy of 83.86%-86.02% was recorded in classifying whether a tumor is malignant or benign. Using neural networks, our model can classify malignant (cancerous) and benign (non-cancerous) tumors with an accuracy of 86.02%. Since cancer, if detected early, can be cured completely, this technology can be used to detect cancer when a tumor develops at an early stage, so that precautions can be taken accordingly.

Learning the Learning Rate

The learning rate is one of the most misunderstood concepts in the field, and a lot of cash is wasted on Machine Learning as a Service (MLaaS) due to a lack of tuning of this parameter, which is responsible for controlling convergence.

Abstract: Any gradient descent optimization requires choosing a learning rate. With deeper and deeper models, tuning the learning rate can easily become tedious and does not necessarily lead to ideal convergence. We propose a variation of the gradient descent algorithm in which the learning rate η is not fixed. Instead, we learn η itself, either by another gradient descent (a first-order method) or by Newton’s method (second-order). This way, gradient descent for any machine learning algorithm can be optimized.

Conclusion: In this paper, we have built a new way to learn the learning rate at each step using finite differences on the loss. We have tested it on a variety of convex and non-convex optimization tasks. Based on our results, we believe that our method would be able to adapt a good learning rate at every iteration on convex problems. In the case of non-convex problems, we repeatedly observed faster training in the first few epochs. However, our adaptive model seems more inclined to overfit the training data, even though its test accuracy is always comparable to standard SGD performance, if not slightly better. Hence we believe that in neural network architectures, our model can be used initially for pretraining for a few epochs, and then continue with any other standard optimization technique to lead to faster convergence and be computationally more efficient, and perhaps reach a new highest accuracy on the given problem. Moreover, the learning rate that our algorithm converges to suggests an ideal learning rate for the given training task. One could use our method to tune the learning rate of a standard neural network (using Adam for instance), giving a more precise value than with line-search or random search.
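The first-order idea can be sketched with a stdlib-only toy, assuming a simple 1-D quadratic loss (the paper’s full method is more general; this only shows the mechanics): estimate dL/dη by finite differences on the loss, take a gradient step on η itself, then take the usual gradient step on the parameter.

```python
def loss(x):
    return (x - 3.0) ** 2       # toy loss, minimum at x = 3

def grad(x):
    return 2.0 * (x - 3.0)

x, lr, meta_lr = 0.0, 0.01, 1e-4    # starting point and made-up meta rate
for _ in range(100):
    g = grad(x)
    # finite-difference estimate of d(loss after one step)/d(lr)
    delta = 1e-5
    d_loss_d_lr = (loss(x - (lr + delta) * g) - loss(x - lr * g)) / delta
    lr -= meta_lr * d_loss_d_lr      # gradient step on the learning rate itself
    x -= lr * g                      # usual gradient step with the learned lr

print(round(x, 4), round(lr, 4))     # x approaches 3 while lr adapts on its own
```

Note how the learning rate grows while far from the optimum and settles as the loss flattens, which is the “suggested ideal learning rate” behaviour the conclusion mentions.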

PlaidML: An open source portable deep learning engine

Via Vertex.ai

We’re pleased to announce the next step towards deep learning for every device and platform. Today Vertex.AI is releasing PlaidML, our open source portable deep learning engine. Our mission is to make deep learning accessible to every person on every device, and we’re building PlaidML to help make that a reality. We’re starting by supporting the most popular hardware and software already in the hands of developers, researchers, and students. The initial version of PlaidML runs on most existing PC hardware with OpenCL-capable GPUs from NVIDIA, AMD, or Intel. Additionally, we’re including support for running the widely popular Keras framework on top of Plaid to allow existing code and tutorials to run unchanged.

Baidu are bringing HPC Techniques to Deep Learning

The ring allreduce approach saves a lot of work when training deep neural networks. The way gradients are propagated and updated (controlling the convergence of the model) is well explained below:

The Ring Allreduce

The main issue with the simplistic communication strategy described above was that the communication cost grew linearly with the number of GPUs in the system. In contrast, a ring allreduce is an algorithm for which the communication cost is constant and independent of the number of GPUs in the system, and is determined solely by the slowest connection between GPUs in the system; in fact, if you only consider bandwidth as a factor in your communication cost (and ignore latency), the ring allreduce is an optimal communication algorithm [4]. (This is a good estimate for communication cost when your model is large, and you need to send large amounts of data a small number of times.)

The GPUs in a ring allreduce are arranged in a logical ring. Each GPU should have a left neighbor and a right neighbor; it will only ever send data to its right neighbor, and receive data from its left neighbor.

The algorithm proceeds in two steps: first, a scatter-reduce, and then, an allgather. In the scatter-reduce step, the GPUs will exchange data such that every GPU ends up with a chunk of the final result. In the allgather step, the GPUs will exchange those chunks such that all GPUs end up with the complete final result.
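The two steps above can be simulated in a few lines of plain Python. This single-process toy only models how chunks move around the ring; real implementations run these sends and receives concurrently across GPUs.

```python
def ring_allreduce(buffers):
    """buffers: one equal-length list per 'GPU'. For simplicity the length is
    assumed to equal the number of GPUs, so each chunk is a single element."""
    n = len(buffers)
    chunks = [list(b) for b in buffers]
    # scatter-reduce: after n-1 steps, GPU i holds the full sum of one chunk
    for step in range(n - 1):
        for i in range(n):
            src = (i - 1) % n                 # receive from the left neighbour
            c = (i - 1 - step) % n            # chunk accumulating into GPU i
            chunks[i][c] += chunks[src][c]
    # allgather: circulate the reduced chunks until every GPU has all of them
    for step in range(n - 1):
        for i in range(n):
            src = (i - 1) % n
            c = (i - step) % n                # reduced chunk arriving at GPU i
            chunks[i][c] = chunks[src][c]
    return chunks

gpus = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(ring_allreduce(gpus))   # every GPU ends with the element-wise sum
```

Each node sends and receives n-1 chunks in each phase, i.e. 2*(n-1) messages in total, and the amount of data each node moves does not grow with the number of GPUs, which is the constant-cost property described above.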

More can be found here. For implementation, there’s a GitHub project called Tensor All-reduce that can be used for distributed deep learning.

Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow

Yes, this is yet another tool for deep learning, but I think these guys hit the nail on the head by exposing and fixing one of the major concerns about TensorFlow: distributed training.

When Uber needed to use deep learning, they made some attempts with the conventional data-parallelism architecture. Using data parallelism, training is distributed over several instances in parallel; when the gradients for each batch have been calculated on an instance (node/worker), they are propagated to all nodes and averaged to update the model during the training phase. The following image explains it better than words.

But using this architecture, Uber faced two problems: a) finding the right ratio of workers to parameter servers (to avoid or deal with network and processing bottlenecks), and b) the complexity of the TensorFlow code (more details here).

To avoid these problems, they used an idea from a 2009 paper, “Bandwidth Optimal All-reduce Algorithms for Clusters of Workstations”, called ring allreduce.

They explain the workflow of this approach:

In the ring-allreduce algorithm, each of N nodes communicates with two of its peers 2*(N-1) times. During this communication, a node sends and receives chunks of the data buffer. In the first N-1 iterations, received values are added to the values in the node’s buffer. In the second N-1 iterations, received values replace the values held in the node’s buffer. Baidu’s paper suggests that this algorithm is bandwidth-optimal, meaning that if the buffer is large enough, it will optimally utilize the available network.

The implementation details can be found here.

Tensorflow sucks (?)

This post on Nico’s blog makes good points about why PyTorch, even without all of Google’s support and money, is taking users away from TensorFlow.

[…]The most interesting question to me is why Google chose a purely declarative paradigm for Tensorflow in spite of the obvious downsides of this approach. Did they feel that encapsulating all the computation in a single computation graph would simplify executing models on their TPU’s so they can cut Nvidia out of the millions of dollars to be made from cloud hosting of deep learning powered applications? It’s difficult to say. Overall, Tensorflow does not feel like a pure open source project for the common good. Which I would have no problem with, had their design been sound. In comparison with beautiful Google open source projects out there such as Protobuf, Golang, and Kubernetes, Tensorflow falls dramatically short.

While declarative paradigms are great for UI programming, there are many reasons why it is a problematic choice for deep learning.

Take the React Javascript library as an example, the standard choice today for interactive web applications. In React, the complexity of how data flows through the application makes sense to be hidden from the developer, since Javascript execution is generally orders of magnitudes faster than updates to the DOM. React developers don’t want to worry about the mechanics of how state is propagated, so long as the end user experience is “good enough”.

On the other hand, in deep learning, a single layer can literally execute billions of FLOP’s! And deep learning researchers care very much about the mechanics of how computation is done and want fine control because they are constantly pushing the edge of what’s possible (e.g. dynamic networks) and want easy access to intermediate results.[…]
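The contrast can be made concrete with a toy sketch (plain Python, not real TensorFlow or PyTorch code): in the “define, then run” style the intermediates live inside the graph and are invisible unless the runtime chooses to expose them, while in the eager style they are ordinary values you can inspect.

```python
# Declarative ("define, then run"), in the spirit of TensorFlow 1.x graphs:
def build_graph():
    def layer1(x):
        return x * 2.0
    def layer2(h):
        return h + 1.0
    def run(x):              # the only entry point the "runtime" exposes
        return layer2(layer1(x))
    return run

run = build_graph()
y_graph = run(3.0)           # result comes out, but layer1's output stayed hidden

# Imperative/eager, in the spirit of PyTorch: every step executes immediately,
# so intermediates are plain values you can print, debug, or branch on.
x = 3.0
h = x * 2.0                  # inspect h freely before going further
y_eager = h + 1.0
```

Both compute the same thing; the difference is entirely about when execution happens and how easily you can reach the intermediate results, which is the researchers’ complaint quoted above.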

Lack of transparency is the bottleneck in academia

One of my biggest mistakes was writing my whole master’s dissertation using private data (provided by my former employer) and closed tools (e.g. Viscovery Mine).

This was a huge blocker for me in sharing my research with the community and getting a second opinion about my work with regard to reproducibility. I am working on opening my data and producing a new version, or a book, about this kind of analysis using Non-Performing Loans data.

In Denny’s blog, he talks about how engineering is the bottleneck in deep learning research, making the following statements:

I will use the Deep Learning community as an example, because that’s what I’m familiar with, but this probably applies to other communities as well. As a community of researchers we all share a common goal: Move the field forward. Push the state of the art. There are various ways to do this, but the most common one is to publish research papers. The vast majority of published papers are incremental, and I don’t mean this in a degrading fashion. I believe that research is incremental by definition, which is just another way of saying that new work builds upon what others have done in the past. And that’s how it should be. To make this concrete, the majority of the papers I come across consist of more than 90% existing work, which includes datasets, preprocessing techniques, evaluation metrics, baseline model architectures, and so on. The authors then typically add a bit of novelty and show improvement over well-established baselines.

So far nothing is wrong with this. The problem is not the process itself, but how it is implemented. There are two issues that stand out to me, both of which can be solved with “just engineering.” 1. Waste of research time and 2. Lack of rigor and reproducibility. Let’s look at each of them.

And the final musing:

Personally, I do not trust paper results at all. I tend to read papers for inspiration – I look at the ideas, not at the results. This isn’t how it should be. What if all researchers published code? Wouldn’t that solve the problem? Actually, no. Putting your 10,000 lines of undocumented code on Github and saying “here, run this command to reproduce my number” is not the same as producing code that people will read, understand, verify, and build upon. It’s like Shinichi Mochizuki’s proof of the ABC Conjecture, producing something that nobody except you understands.

Personally, I think this approach of discarding the results and focusing on the novelty of the methods is better than trying to understand results that the researcher may be covering up with academic BS complexity.

Densely Connected Convolutional Networks – implementations

Abstract: Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections – one between each layer and its subsequent layer – our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance.
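The connectivity pattern is easy to sketch with stdlib-only code. The “layer” below is a made-up placeholder (a real DenseNet layer is batch normalization, ReLU and convolution), but the bookkeeping shows where the L(L+1)/2 direct connections come from.

```python
def dense_block(x, num_layers):
    """DenseNet-style block: each layer consumes the concatenation of ALL
    preceding feature maps (here, plain lists of numbers) and appends its own."""
    maps = [x]                      # the block input counts as the first map
    connections = 0
    for _ in range(num_layers):
        inputs = [v for m in maps for v in m]   # concatenate all earlier maps
        connections += len(maps)                # one direct connection per map
        new_map = [sum(inputs)]                 # stand-in for conv + activation
        maps.append(new_map)
    return maps, connections

maps, conns = dense_block([1.0, 2.0], num_layers=4)
print(conns)   # with L = 4 layers: 1 + 2 + 3 + 4 = L(L+1)/2 = 10 connections
```

The reuse is visible in the loop: every feature map produced stays live as an input for all subsequent layers, which is exactly the feature-reuse property the abstract credits for the parameter savings.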

Predicting extreme events at Uber with LSTMs

Soon we will have some posts here on the blog about this subject, but it is an ML-plus-engineering case study from people doing very well with quite advanced methods on scalable architectures.

ENGINEERING EXTREME EVENT FORECASTING AT UBER WITH RECURRENT NEURAL NETWORKS – BY NIKOLAY LAPTEV, SLAWEK SMYL, & SANTHOSH SHANMUGAM

We ultimately settled on conducting time series modeling based on the Long Short Term Memory (LSTM) architecture, a technique that features end-to-end modeling, ease of incorporating external variables, and automatic feature extraction abilities. By providing a large amount of data across numerous dimensions, an LSTM approach can model complex nonlinear feature interactions.

We decided to build a neural network architecture that provides single-model, heterogeneous forecasting through an automatic feature extraction module. As Figure 4 demonstrates, the model first primes the network by automatic, ensemble-based feature extraction. After feature vectors are extracted, they are averaged using a standard ensemble technique. The final vector is then concatenated with the input to produce the final forecast.
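The described pipeline (ensemble feature extraction, averaging, then concatenation with the input) can be sketched in plain Python. The extractors and the final forecaster below are invented stand-ins, not Uber’s actual modules.

```python
def extract_features(window, extractors):
    """Run an ensemble of feature extractors on the input window and
    average their feature vectors element-wise."""
    vectors = [f(window) for f in extractors]
    return [sum(vs) / len(vs) for vs in zip(*vectors)]

def forecast(window, extractors, head):
    """Concatenate the raw input with the averaged feature vector and
    hand the result to the final forecasting model."""
    features = extract_features(window, extractors)
    return head(window + features)

# toy extractors: mean of the window, and its last value
extractors = [lambda w: [sum(w) / len(w)], lambda w: [w[-1]]]
head = lambda v: sum(v) / len(v)    # stand-in for the LSTM forecaster
print(forecast([1.0, 2.0, 3.0], extractors, head))
```

The point of the structure is that the forecaster never sees the raw window alone: it always receives the window plus a learned (here, hand-coded) summary of it, which is what enables the single-model, heterogeneous forecasting described above.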

During testing, we were able to achieve a 14.09 percent symmetric mean absolute percentage error (SMAPE) improvement over the base LSTM architecture and over 25 percent improvement over the classical time series model used in Argos, Uber’s real-time monitoring and root cause-exploration tool.

Paid for multiple GPUs on Azure and can’t use them for deep learning? This post is for you.

You went to the Azure portal and grabbed an NC24R, which has a wonderful 224 GB of memory, 24 cores, more than 1 TB of disk and, best of all, 4 K80 boards for your complete delight in deep learning training.

Everything perfect, right? Almost.

Right at the start, I tried to run a training script and, with a simple htop to monitor the training, I saw that TensorFlow was dumping the whole training workload onto the CPUs.

Even with those 24 wonderful processors hitting 100% utilization, that doesn’t come close to what our mastodon-sized GPUs can deliver. (Note: you wouldn’t trade four 2017 Ferraris for twenty-four 1985 Fiat 147s, right?)

Accessing our wonderful machine to see what had happened, I first checked whether the GPUs were present on the machine, which indeed they were.

azure_teste@deep-learning:~$ nvidia-smi
Tue Jun 27 18:21:05 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | B2A3:00:00.0    Off  |                   0  |
|  N/A  47C    P0    71W / 149W |     0MiB / 11439MiB  |      0%     Default  |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | C4D8:00:00.0    Off  |                   0  |
|  N/A  57C    P0    61W / 149W |     0MiB / 11439MiB  |      0%     Default  |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | D908:00:00.0    Off  |                   0  |
|  N/A  52C    P0    56W / 149W |     0MiB / 11439MiB  |      0%     Default  |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | EEAF:00:00.0    Off  |                   0  |
|  N/A  42C    P0    69W / 149W |     0MiB / 11439MiB  |      0%     Default  |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                                    Usage |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

azure_teste@deep-learning:~/deep-learning-moderator-msft$ lspci | grep -i NVIDIA

b2a3:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
c4d8:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
d908:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
eeaf:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)


OK, the Tesla K80s are ready for combat. However, Azure itself acknowledges that there are problems in the overall process, and fixing it only takes a few very simple procedures, in two steps according to the documentation.

STEP 1

1) Clone the Git repository (yes, I forked it, because these things routinely disappear for reasons we don’t always know).

$ git clone https://github.com/leestott/Azure-GPU-Setup.git

2) Enter the cloned folder:

$ cd azure-gpu-setup


3) Run the script, which will install some NVIDIA libraries and then reboot the server.

$ bash gpu-setup-part1.sh

STEP 2

1) Go to the Git repository folder:

$ cd azure-gpu-setup


2) Run the second script, which will install TensorFlow, the CUDA Toolkit and cuDNN, as well as set a number of environment variables.

$ bash gpu-setup-part2.sh

3) Then test the installation:

$ python gpu-test.py


After that, just enjoy your GPUs at full load and use them to train your models.

Professor Jürgen Schmidhuber’s page

In one of his talks, he asserts that deep learning networks date back to 1971, when Aleksei Ivakhnenko and Valentin Lapa published the first work with an 8-layer network.

And here, in the post, you can download his systematic review of deep learning.

cse597g-deep_learning

Cleverhans – a Python lib for defending models against adversarial-noise attacks

Attacks on sensors, and consequently on machine learning models, are a very recent topic (it will be covered here at some point in the future, but this article shows its damaging potential well).

Cleverhans is a Python lib that artificially injects a bit of noise/disturbance into the network as a form of training against this kind of attack.

This repository contains the source code for cleverhans, a Python library to benchmark machine learning systems’ vulnerability to adversarial examples. You can learn more about such vulnerabilities on the accompanying blog.

The cleverhans library is under continual development, always welcoming contributions of the latest attacks and defenses. In particular, we always welcome help towards resolving the issues currently open.