Regression with corrupted instances: a robust approach and its applications

Interesting work.

Multivariate Regression with Grossly Corrupted Observations: A Robust Approach and its Applications – Xiaowei Zhang, Chi Xu, Yu Zhang, Tingshao Zhu, Li Cheng

Abstract: This paper studies the problem of multivariate linear regression where a portion of the observations is grossly corrupted or missing, and the magnitudes and locations of such occurrences are unknown a priori. To deal with this problem, we propose a new approach that explicitly considers the error source as well as its sparse nature. An interesting property of our approach lies in its ability to allow individual regression output elements or tasks to possess their own unique noise levels. Moreover, despite working with a non-smooth optimization problem, our approach is still guaranteed to converge to its optimal solution. Experiments on synthetic data demonstrate the competitiveness of our approach compared with existing multivariate regression models. In addition, our approach has been empirically validated with very promising results on two exemplar real-world applications: the first concerns the prediction of Big-Five personality based on user behaviors at social network sites (SNSs), while the second is 3D human hand pose estimation from depth images. The implementation of our approach and the comparison methods, as well as the involved datasets, are made publicly available in support of the open-source and reproducible research initiatives.

Conclusions: We consider a new approach dedicated to the multivariate regression problem where some output labels are either corrupted or missing. The gross error is explicitly addressed in our model, which also allows distinct regression elements or tasks to adapt to their own noise levels. We further propose and analyze a proximal ADMM algorithm, studying its convergence and runtime properties and showing that it is globally convergent and efficient. The model combined with the specifically designed solver enables our approach to tackle a diverse range of applications. This is practically demonstrated on two distinct applications, namely predicting personality based on behaviors at SNSs and estimating 3D hand pose from single depth images. Empirical experiments on synthetic and real datasets have showcased the applicability of our approach in the presence of label noise. For future work, we plan to integrate more advanced deep learning techniques to better address practical problems, including 3D hand pose estimation and beyond.
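
To make the setup concrete, here is a minimal numpy sketch of the general idea behind this family of models: fit Y ≈ XW + G, where G is a sparse matrix of gross errors, by alternating a ridge update of W with soft-thresholding of the residual. This is an illustrative alternating scheme, not the paper's proximal ADMM solver.

```python
import numpy as np

def soft_threshold(A, tau):
    """Element-wise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def robust_regression(X, Y, lam_w=1e-2, lam_g=0.5, n_iter=100):
    """Fit Y ~ X @ W + G, where G is a sparse matrix of gross errors.

    Alternates a ridge update of W with soft-thresholding of the residual
    to update G (illustrative only, not the paper's proximal ADMM).
    """
    n, d = X.shape
    k = Y.shape[1]
    W = np.zeros((d, k))
    G = np.zeros((n, k))
    ridge = X.T @ X + lam_w * np.eye(d)
    for _ in range(n_iter):
        W = np.linalg.solve(ridge, X.T @ (Y - G))   # ridge step given G
        G = soft_threshold(Y - X @ W, lam_g)        # sparse gross-error step
    return W, G

# Toy data: a few rows of Y are grossly corrupted.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
W_true = rng.normal(size=(10, 3))
Y = X @ W_true + 0.1 * rng.normal(size=(200, 3))
Y[:5] += 10.0                                       # gross corruption
W_hat, G_hat = robust_regression(X, Y)
print(np.round(np.abs(G_hat[:8]).max(axis=1), 2))   # large only on the corrupted rows
```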

Regression with corrupted instances: a robust approach and its applications

Feature Screening in Large Scale Cluster Analysis

More work on clustering.

Feature Screening in Large Scale Cluster Analysis – Trambak Banerjee, Gourab Mukherjee, Peter Radchenko

Abstract: We propose a novel methodology for feature screening in clustering massive datasets, in which both the number of features and the number of observations can potentially be very large. Taking advantage of a fusion penalization based convex clustering criterion, we propose a very fast screening procedure that efficiently discards non-informative features by first computing a clustering score corresponding to the clustering tree constructed for each feature, and then thresholding the resulting values. We provide theoretical support for our approach by establishing uniform non-asymptotic bounds on the clustering scores of the “noise” features. These bounds imply perfect screening of non-informative features with high probability and are derived via careful analysis of the empirical processes corresponding to the clustering trees that are constructed for each of the features by the associated clustering procedure. Through extensive simulation experiments we compare the performance of our proposed method with other screening approaches, popularly used in cluster analysis, and obtain encouraging results. We demonstrate empirically that our method is applicable to cluster analysis of big datasets arising in single-cell gene expression studies.

Conclusions: We propose COSCI, a novel feature screening method for large scale cluster analysis problems that are characterized by both large sample sizes and high dimensionality of the observations. COSCI efficiently ranks the candidate features in a non-parametric fashion and, under mild regularity conditions, is robust to the distributional form of the true noise coordinates. We establish theoretical results supporting ideal feature screening properties of our proposed procedure and provide a data driven approach for selecting the screening threshold parameter. Extensive simulation experiments and real data studies demonstrate encouraging performance of our proposed approach. An interesting topic for future research is extending our marginal screening method by means of utilizing multivariate objective criteria, which are more potent in detecting multivariate cluster information among marginally unimodal features. Preliminary analysis of the corresponding ℓ2 fusion penalty based criterion, which, unlike the ℓ1 based approach used in this paper, is non-separable across dimensions, suggests that this criterion can provide a way to move beyond marginal screening.
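
As a toy illustration of marginal feature screening, the sketch below scores each feature separately and thresholds the scores. The per-feature score used here is a simple k-means variance-reduction ratio, which is only a stand-in: COSCI's actual score is computed from the fusion-penalty (convex clustering) path.

```python
import numpy as np
from sklearn.cluster import KMeans

def marginal_cluster_scores(X, k=2, random_state=0):
    """Score each feature by how much a 1-D k-means split reduces its variance.

    Illustrative stand-in for a per-feature clustering score; COSCI's score
    comes from a fusion-penalty (convex clustering) path, not from k-means.
    """
    n, p = X.shape
    scores = np.empty(p)
    for j in range(p):
        col = X[:, [j]]
        tss = ((col - col.mean()) ** 2).sum()
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(col)
        scores[j] = 1.0 - km.inertia_ / tss
    return scores

rng = np.random.default_rng(1)
noise = rng.normal(size=(500, 8))                        # non-informative features
signal = np.concatenate([rng.normal(-3, 1, (250, 2)),    # two well-separated groups
                         rng.normal(+3, 1, (250, 2))])
X = np.hstack([signal, noise])
scores = marginal_cluster_scores(X)
# A pure Gaussian feature scores around 0.64 with this criterion (half-normal
# variance), while bimodal features score much higher, so threshold above that.
keep = scores > 0.75
print(np.round(scores, 2), keep)
```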

Feature Screening in Large Scale Cluster Analysis

Deterministic quantum annealing expectation-maximization (DQAEM)

Despite the rather complicated name, this paper describes a modification of the Expectation-Maximization (EM) clustering algorithm that adds a meta-heuristic similar to Simulated Annealing in order to address two of EM's weaknesses: its strong dependence on the initial values (initial assignments) and the fact that it sometimes gets trapped in local optima.

Relaxation of the EM Algorithm via Quantum Annealing for Gaussian Mixture Models

Abstract: We propose a modified expectation-maximization algorithm by introducing the concept of quantum annealing, which we call the deterministic quantum annealing expectation-maximization (DQAEM) algorithm. The expectation-maximization (EM) algorithm is an established algorithm for computing maximum likelihood estimates and is applied in many practical applications. However, it is known that EM heavily depends on initial values and its estimates are sometimes trapped by local optima. To solve such a problem, quantum annealing (QA) was proposed as a novel optimization approach motivated by quantum mechanics. By employing QA, we formulate DQAEM and present a theorem that supports its stability. Finally, we demonstrate numerical simulations to confirm its efficiency.

Conclusion: In this paper, we have proposed the deterministic quantum annealing expectation-maximization (DQAEM) algorithm for Gaussian mixture models (GMMs) to relax the problem of local optima of the expectation-maximization (EM) algorithm by introducing the mechanism of quantum fluctuations into EM. Although we have limited our attention to GMMs in this paper to simplify the discussion, the derivation presented here can be straightforwardly applied to any model with discrete latent variables. After formulating DQAEM, we have presented the theorem that guarantees its convergence. We have then given numerical simulations to show its efficiency compared to EM and DSAEM. It is expected that the combination of DQAEM and DSAEM gives better performance than DQAEM alone. Finally, one of our future works is a Bayesian extension of this work; in other words, we are going to propose a deterministic quantum annealing variational Bayes inference.
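
To illustrate the annealing idea described above, here is a minimal sketch of classic deterministic annealing EM for a 1-D Gaussian mixture, where a temperature parameter flattens the responsibilities early on and is gradually removed. DQAEM replaces these thermal fluctuations with quantum ones, so this is an analogy rather than the paper's algorithm.

```python
import numpy as np
from scipy.stats import norm

def annealed_em_1d(x, k=2, betas=(0.2, 0.5, 1.0), iters_per_beta=30, seed=0):
    """EM for a 1-D Gaussian mixture with an annealing schedule on beta = 1/T.

    At small beta the responsibilities are flattened, which reduces the
    dependence on initialization; beta -> 1 recovers standard EM.
    (Classic deterministic annealing EM, not the quantum variant.)
    """
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k)
    sigma = np.full(k, x.std())
    pi = np.full(k, 1.0 / k)
    for beta in betas:
        for _ in range(iters_per_beta):
            # E-step: tempered responsibilities
            log_p = beta * (np.log(pi) + norm.logpdf(x[:, None], mu, sigma))
            log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
            w = np.exp(log_p)
            w /= w.sum(axis=1, keepdims=True)
            # M-step: weighted parameter updates
            nk = w.sum(axis=0)
            mu = (w * x[:, None]).sum(axis=0) / nk
            sigma = np.sqrt((w * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
            pi = nk / len(x)
    return pi, mu, sigma

x = np.concatenate([np.random.default_rng(1).normal(-2, 0.5, 300),
                    np.random.default_rng(2).normal(3, 1.0, 300)])
print(annealed_em_1d(x))
```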

Deterministic quantum annealing expectation-maximization (DQAEM)

Modularized Morphing of Neural Networks

Who said that morphological changes cannot happen in neural network architectures/topologies?

Modularized Morphing of Neural Networks – Tao Wei, Changhu Wang, Chang Wen Chen

Abstract: In this work we study the problem of network morphism, an effective learning scheme to morph a well-trained neural network into a new one with the network function completely preserved. Different from existing work, where basic morphing types at the layer level were addressed, we target the central problem of network morphism at a higher level, i.e., how a convolutional layer can be morphed into an arbitrary module of a neural network. To simplify the representation of a network, we abstract a module as a graph with blobs as vertices and convolutional layers as edges, based on which the morphing process can be formulated as a graph transformation problem. Two atomic morphing operations are introduced to compose the graphs, based on which modules are classified into two families, i.e., simple morphable modules and complex modules. We present practical morphing solutions for both of these families, and prove that any reasonable module can be morphed from a single convolutional layer. Extensive experiments have been conducted based on the state-of-the-art ResNet on benchmark datasets, and the effectiveness of the proposed solution has been verified.

Conclusions: This paper presented a systematic study on the problem of network morphism at a higher level, and tried to answer the central question of such learning scheme, i.e., whether and how a convolutional layer can be morphed into an arbitrary module. To facilitate the study, we abstracted a modular network as a graph, and formulated the process of network morphism as a graph transformation process. Based on this formulation, both simple morphable modules and complex modules have been defined and corresponding morphing algorithms have been proposed. We have shown that a convolutional layer can be morphed into any module of a network. We have also carried out experiments to illustrate how to achieve a better performing model based on the state-of-the-art ResNet with minimal extra computational cost on benchmark datasets.
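
A quick way to see the invariant that network morphism relies on is to insert a new convolutional layer initialized as an identity (Dirac) kernel and check that the network function is unchanged. The numpy sketch below does exactly that for a single-channel feature map; the paper's contribution is at the module/graph level, with its own morphing algorithms, so this is only the base case.

```python
import numpy as np
from scipy.signal import convolve2d

def conv(x, k):
    """'Same'-padded 2-D convolution of a single-channel feature map."""
    return convolve2d(x, k, mode="same")

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 16))          # input feature map
k1 = rng.normal(size=(3, 3))           # existing, trained convolutional layer

# Morph: insert a new 3x3 layer initialized as the identity (Dirac) kernel.
identity = np.zeros((3, 3))
identity[1, 1] = 1.0

before = conv(x, k1)
after = conv(conv(x, k1), identity)    # parent layer followed by the new layer
print(np.allclose(before, after))      # True: the network function is preserved
```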

Modularized Morphing of Neural Networks

Akid: a neural network library for research and production

Finally, someone thinking about closing the gap between science/academia and industry.

Akid: A Library for Neural Network Research and Production from a Dataism Approach – Shuai Li

Abstract: Neural networks are a revolutionary but immature technique that is fast evolving and heavily relies on data. To benefit from the newest developments and newly available data, we want the gap between research and production to be as small as possible. On the other hand, differing from traditional machine learning models, a neural network is not just another statistical model, but a model for the natural processing engine — the brain. In this work, we describe a neural network library named akid. It provides a higher level of abstraction for entities (abstracted as blocks) in nature, building upon the abstraction of signals (as tensors) done by TensorFlow, and characterizing the dataism observation that all entities in nature process inputs and emit outputs in some way. It includes a full software stack that provides abstractions to let researchers focus on research instead of implementation, while at the same time the developed program can be put into production seamlessly in a distributed environment and be production ready. At the top of the application stack, it provides out-of-the-box tools for neural network applications. Lower down, akid provides a programming paradigm that lets users easily build customized models. The distributed computing stack handles concurrency and communication, thus letting models be trained or deployed on a single GPU, multiple GPUs, or a distributed environment without affecting how a model is specified in the programming paradigm stack. Lastly, the distributed deployment stack handles how the distributed computing is deployed, thus decoupling the research prototype environment from the actual production environment, and is able to dynamically allocate computing resources, so development (Devs) and operations (Ops) can be separated.

Akid: a neural network library for research and production

For anyone who wants to learn a bit more about the progress in applying reinforcement learning and Deep Learning to autonomous systems, this paper is a good pick.

Learning to Drive using Inverse Reinforcement Learning and Deep Q-Networks

Abstract: We propose an inverse reinforcement learning (IRL) approach using Deep Q-Networks to extract the rewards in problems with large state spaces. We evaluate the performance of this approach in a simulation-based autonomous driving scenario. Our results resemble the intuitive relation between the reward function and readings of distance sensors mounted at different poses on the car. We also show that, after a few learning rounds, our simulated agent generates collision-free motions and performs human-like lane change behaviour.

Conclusions: In this paper we proposed using Deep Q-Networks as the refinement step in Inverse Reinforcement Learning approaches. This enabled us to extract the rewards in scenarios with large state spaces such as driving, given expert demonstrations. The aim of this work was to extend the general approach to IRL. Exploring more advanced methods like Maximum Entropy IRL and the support for nonlinear reward functions is currently under investigation.

Hardware for Machine Learning: challenges and opportunities

A great paper on how hardware will play a crucial role in core machine learning in the coming years, especially in embedded systems.

Hardware for Machine Learning: Challenges and Opportunities

Abstract—Machine learning plays a critical role in extracting meaningful information out of the zettabytes of sensor data collected every day. For some applications, the goal is to analyze and understand the data to identify trends (e.g., surveillance, portable/wearable electronics); in other applications, the goal is to take immediate action based on the data (e.g., robotics/drones, self-driving cars, smart Internet of Things). For many of these applications, local embedded processing near the sensor is preferred over the cloud due to privacy or latency concerns, or limitations in the communication bandwidth. However, at the sensor there are often stringent constraints on energy consumption and cost in addition to throughput and accuracy requirements. Furthermore, flexibility is often required such that the processing can be adapted for different applications or environments (e.g., update the weights and model in the classifier). In many applications, machine learning often involves transforming the input data into a higher dimensional space, which, along with programmable weights, increases data movement and consequently energy consumption. In this paper, we will discuss how these challenges can be addressed at various levels of hardware design ranging from architecture, hardware-friendly algorithms, mixed-signal circuits, and advanced technologies (including memories and sensors).

Conclusions: Machine learning is an important area of research with many promising applications and opportunities for innovation at various levels of hardware design. During the design process, it is important to balance the accuracy, energy, throughput and cost requirements. Since data movement dominates energy consumption, the primary focus of recent research has been to reduce the data movement while maintaining performance accuracy, throughput and cost. This means selecting architectures with favorable memory hierarchies like a spatial array, and developing dataflows that increase data reuse at the low-cost levels of the memory hierarchy. With joint design of algorithm and hardware, reduced bitwidth precision, increased sparsity and compression are used to minimize the data movement requirements. With mixed-signal circuit design and advanced technologies, computation is moved closer to the source by embedding computation near or within the sensor and the memories. One should also consider the interactions between these different levels. For instance, reducing the bitwidth through hardware-friendly algorithm design enables reduced precision processing with mixed-signal circuits and non-volatile memory. Reducing the cost of memory access with advanced technologies could result in more energy-efficient dataflows.

Hardware for Machine Learning: challenges and opportunities

A hybrid approach to supervised machine learning for algorithmic melody composition

A hybrid approach to supervised machine learning for algorithmic melody composition

Abstract: In this work we present an algorithm for composing monophonic melodies similar in style to those of a given, phrase-annotated, sample of melodies. For the implementation, a hybrid approach incorporating parametric Markov models of higher order and a contour concept of phrases is used. This work is based on the master thesis of Thayabaran Kathiresan (2015). An online listening test shows that enhancing a pure Markov model with musically relevant context, like count and planned melody contour, improves the result significantly.

Conclusions: Even though Markov models alone are not seen as a proper method for algorithmic composition, we successfully showed that, when combined with further methods, they can yield much better results in terms of being closer to human-composed melodies. This can be seen when comparing our results with the ones of Kathiresan [Kat15], whose basic algorithm relies solely on Markov models. Unlike the previous works, our algorithm outperforms a random-guessing baseline, meaning that humans are no longer able to clearly distinguish its compositions from human ones.
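
For intuition, here is a minimal sketch of the Markov-chain backbone of such a system: estimate order-2 note transitions from example melodies and sample new ones. The note names and toy melodies are made up, and the paper's hybrid adds phrase-contour planning on top, which is not reproduced here.

```python
import random
from collections import defaultdict, Counter

def train_markov(melodies, order=2):
    """Count order-n transitions between notes across the training melodies."""
    table = defaultdict(Counter)
    for mel in melodies:
        for i in range(len(mel) - order):
            table[tuple(mel[i:i + order])][mel[i + order]] += 1
    return table

def sample_melody(table, seed, length=16, rng=None):
    """Grow a melody by repeatedly drawing the next note given the previous notes."""
    rng = rng or random.Random(0)
    order = len(seed)
    mel = list(seed)
    while len(mel) < length:
        counts = table.get(tuple(mel[-order:]))
        if not counts:                       # unseen context: stop early
            break
        notes, weights = zip(*counts.items())
        mel.append(rng.choices(notes, weights=weights)[0])
    return mel

# Made-up toy melodies; a real corpus would be the phrase-annotated sample from the paper.
melodies = [["C4", "D4", "E4", "G4", "E4", "D4", "C4"],
            ["E4", "G4", "A4", "G4", "E4", "D4", "C4"]]
table = train_markov(melodies, order=2)
print(sample_melody(table, seed=("C4", "D4")))
```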

A hybrid approach to supervised machine learning for algorithmic melody composition

Deep Learning for time-series analysis

As much as image recognition or even audio segmentation problems are the hot topics in Deep Learning, the vast majority of real-world data problems involve structured data, especially time series. This paper presents a somewhat unconventional methodology (transforming a time series into an 'image' to be fed to a convolutional network), showing that the sky is the limit when it comes to arrangements for solving prediction problems over structured data.
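
One simple recipe for the series-to-image idea mentioned above is to stack overlapping windows of the series into a 2-D array that a convolutional network can then consume. The sketch below shows only that step, with made-up data; the paper surveys several other representations.

```python
import numpy as np

def series_to_image(x, window=32, stride=4):
    """Stack overlapping windows of a 1-D series into a 2-D array.

    Each row is one window, so local temporal patterns become 2-D texture
    that a convolutional network can consume (one simple recipe among many).
    """
    starts = range(0, len(x) - window + 1, stride)
    return np.stack([x[s:s + window] for s in starts])

t = np.linspace(0, 20, 1000)
x = np.sin(2 * np.pi * t) + 0.1 * np.random.default_rng(0).normal(size=t.size)
img = series_to_image(x)
print(img.shape)   # (243, 32): ready to be treated as a single-channel image
```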

Deep Learning for Time-Series Analysis – John Cristian Borges Gamboa

Abstract: In many real-world applications, e.g., speech recognition or sleep stage classification, data are captured over the course of time, constituting a time series. Time series often contain temporal dependencies that cause two otherwise identical points in time to belong to different classes or predict different behavior. This characteristic generally increases the difficulty of analysing them. Existing techniques often depended on hand-crafted features that were expensive to create and required expert knowledge of the field. With the advent of Deep Learning, new models for unsupervised feature learning in time-series analysis and forecasting have been developed. Such new developments are the topic of this paper: a review of the main Deep Learning techniques is presented, and some applications to time-series analysis are summarized. The results make it clear that Deep Learning has a lot to contribute to the field.

Conclusions: When applying Deep Learning, one seeks to stack several independent neural network layers that, working together, produce better results than the already existing shallow structures. In this paper, we have reviewed some of these modules, as well as the recent work found in the literature that uses them. Additionally, we have discussed some of the main tasks normally performed when manipulating time-series data with deep neural network structures. Finally, a more specific focus was given to one work performing each of these tasks. Employing Deep Learning for time-series analysis has yielded results in these cases that are better than the previously existing techniques, which is evidence that this is a promising field for improvement.


Deep Learning for time-series analysis

Learning Pulse: a machine learning approach for predicting performance in self-regulated learning using multimodal data

Everyone knows that education is a hot topic these days, and the key question is: how can smartphones go from being attention villains to becoming tools for monitoring and tracking academic performance?

This paper brings an interesting answer to that question.

Learning Pulse: a machine learning approach for predicting performance in self-regulated learning using multimodal data


Abstract: Learning Pulse explores whether a machine learning approach on multimodal data such as heart rate, step count, weather conditions and learning activity can be used to predict learning performance in self-regulated learning settings. An experiment lasting eight weeks was carried out with PhD students as participants, each of them wearing a Fitbit HR wristband and having the applications used on their computers recorded during their learning and working activities throughout the day. A software infrastructure for collecting multimodal learning experiences was implemented. As part of this infrastructure, a Data Processing Application was developed to pre-process and analyse the data and generate predictions to provide feedback to the users about their learning performance. Data from different sources were stored using the xAPI standard in a cloud-based Learning Record Store. The participants of the experiment were asked to rate their learning experience through an Activity Rating Tool, indicating their perceived level of productivity, stress, challenge and abilities. These self-reported performance indicators were used as markers to train a Linear Mixed Effect Model to generate learner-specific predictions of learning performance. We discuss the advantages and the limitations of the approach, highlighting further development points.


Conclusions: This paper described Learning Pulse, an exploratory study whose aim was to use predictive modelling to generate timely predictions about learners' performance during self-regulated learning by collecting multimodal data about their body, activity and context. Although the prediction accuracy with the data sources and experimental setup chosen in Learning Pulse led to modest results, all the research questions have been answered positively and have led to new insights on storing, modelling and processing multimodal data. We raise some of the unsolved challenges that can be considered a research agenda for future work in the field of Predictive Learning Analytics with “beyond-LMS” multimodal data. The ones identified are: 1) the number of self-reports vs. unobtrusiveness; 2) the homogeneity of the learning task specifications; 3) the approach to modelling random effects; 4) alternative machine learning techniques.

There is a clear trade-off between the frequency of self-reports and the seamlessness of the data collection. The number of self-reports cannot be increased without worsening the quality of the learning process being observed. On the other hand, having a high number of labels is essential to make supervised machine learning work correctly. In addition, a more robust way of modelling random effects must be found. The solution we found, grouping them manually into categories, is not scalable. Learning is inevitably made up of random effects, i.e., voluntary and unpredictable actions taken by the learners. The sequence of such events is also important and must be taken into account with appropriate models. As an alternative to supervised learning techniques, unsupervised methods can also be investigated, since with those methods splitting the data into fine-grained intervals does not create problems in matching the corresponding labels, and a large number of labels is no longer needed.

Regarding the experimental setup, it would be best to have a set of coherent learning tasks that the participants of the experiment need to accomplish, contrary to what was done in Learning Pulse, where the participants had completely different tasks, topics and working rhythms. It would also be useful to have a baseline group of participants without access to the visualisations while another group does have access; that would make it possible to see whether there is an actual increase in performance. To conclude, Learning Pulse set the first steps towards a new and exciting research direction: the design and development of predictive learning analytics systems exploiting multimodal data about the learners, their contexts and their activities, with the aim of predicting their current learning state and thus being able to generate timely feedback for learning support.
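
Since the prediction model is a linear mixed-effects model with learner-specific effects, here is a minimal sketch using statsmodels with a random intercept per participant. The column names and synthetic data are hypothetical, not the Learning Pulse schema.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical multimodal data: column names are illustrative, not Learning Pulse's schema.
rng = np.random.default_rng(0)
n, participants = 400, 8
df = pd.DataFrame({
    "participant": rng.integers(0, participants, n).astype(str),
    "heart_rate": rng.normal(70, 8, n),
    "step_count": rng.poisson(300, n),
})
participant_effect = dict(zip(map(str, range(participants)), rng.normal(0, 1, participants)))
df["productivity"] = (0.03 * df["heart_rate"] + 0.001 * df["step_count"]
                      + df["participant"].map(participant_effect)
                      + rng.normal(0, 0.5, n))

# A random intercept per participant captures learner-specific baselines.
model = smf.mixedlm("productivity ~ heart_rate + step_count", df, groups=df["participant"])
result = model.fit()
print(result.summary())
```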



Learning Pulse: a machine learning approach for predicting performance in self-regulated learning using multimodal data

A machine learning approach to discovering rules for expressive jazz guitar performance

A very interesting study of jazz guitar patterns.

A Machine Learning Approach to Discover Rules for Expressive Performance Actions in Jazz Guitar Music

Abstract: Expert musicians introduce expression in their performances by manipulating sound properties such as timing, energy, pitch, and timbre. Here, we present a data-driven computational approach to induce expressive performance rule models for note duration, onset, energy, and ornamentation transformations in jazz guitar music. We extract high-level features from a set of 16 commercial audio recordings (and corresponding music scores) of jazz guitarist Grant Green in order to characterize the expression in the pieces. We apply machine learning techniques to the resulting features to learn expressive performance rule models. We (1) quantitatively evaluate the accuracy of the induced models, (2) analyse the relative importance of the considered musical features, (3) discuss some of the learnt expressive performance rules in the context of previous work, and (4) assess their generality. The accuracies of the induced predictive models are significantly above baseline levels, indicating that the audio performances and the musical features extracted contain sufficient information to automatically learn informative expressive performance patterns. Feature analysis shows that the most important musical features for predicting expressive transformations are note duration, pitch, metrical strength, phrase position, Narmour structure, and the tempo and key of the piece. Similarities and differences between the induced expressive rules and the rules reported in the literature were found. Differences may be due to the fact that most previously studied performance data has consisted of classical music recordings. Finally, the rules' performer specificity/generality is assessed by applying the induced rules to performances of the same pieces by two other professional jazz guitar players. Results show a consistency in the ornamentation patterns between Grant Green and the other two musicians, which may be interpreted as a good indicator of the generality of the ornamentation rules.

Some of the rules that were found (a small code sketch follows the lists below)

3.1.2. Duration Rules

• D1: IF note is the final note of a phrase AND the note appears in the third position of an IP (Narmour) structure THEN shorten note
• D2: IF note duration is longer than a dotted half note AND tempo is Medium (90–160 BPM) THEN shorten note
• D3: IF note duration is less than an eighth note AND note is in a very strong metrical position THEN lengthen note.
3.1.3. Onset Deviation Rules

• T1: IF the note duration is short AND piece is up-tempo (≥ 180 BPM) THEN advance note
• T2: IF the duration of the previous note is nominal AND the note’s metrical strength is very strong THEN advance note
• T3: IF the duration of the previous note is short AND piece is up-tempo (≥ 180 BPM) THEN advance note
• T4: IF the tempo is medium (90–160 BPM) AND the note is played within a tonic chord AND the next note’s duration is not short nor long THEN delay note
3.1.4. Energy Deviation Rules

• E1: IF the interval with the next note is ascending AND the note pitch is not high (lower than B3) THEN play piano
• E2: IF the interval with the next note is descending AND the note pitch is very high (higher than C5) THEN play forte
• E3: IF the note is an eighth note AND the note is the initial note of a phrase THEN play forte.
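
Because these rules are plain IF-THEN conditions over note-level features, they can be encoded directly. Below is a small sketch for the duration rules D1-D3, using a hypothetical dictionary representation of a note (the field names are illustrative, not the paper's feature set).

```python
# Hypothetical note representation; field names are illustrative only.
# Durations are in beats: a dotted half note is 3 beats, an eighth note is 0.5.

def apply_duration_rules(note, tempo_bpm):
    """Return the expressive action suggested by rules D1-D3, if any."""
    if note["is_phrase_final"] and note["narmour_position"] == 3:                     # D1
        return "shorten"
    if note["duration_beats"] > 3 and 90 <= tempo_bpm <= 160:                         # D2
        return "shorten"
    if note["duration_beats"] < 0.5 and note["metrical_strength"] == "very strong":   # D3
        return "lengthen"
    return None

note = {"is_phrase_final": False, "duration_beats": 4.0,
        "narmour_position": 1, "metrical_strength": "weak"}
print(apply_duration_rules(note, tempo_bpm=120))   # 'shorten' via rule D2
```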

Conclusions of the study

Concretely, the obtained accuracies (baseline accuracies in parentheses) for the ornamentation, duration, onset, and energy models were 70% (67%), 56% (50%), 63% (54%), and 52% (43%), respectively. Both the selected features and the model rules showed musical significance. Similarities and differences between the obtained rules and the ones reported in the literature were discussed. Pattern similarities between classical and jazz music expressive rules were identified, as well as dissimilarities expected from the particular musical aspects inherent to each tradition. The induced rules' specificity/generality was assessed by applying them to performances of the same pieces by two other professional jazz guitar players. Results show a consistency in the ornamentation patterns between Grant Green and the other two musicians, which may be interpreted as a good indicator of the generality of the ornamentation rules.


A machine learning approach to discovering rules for expressive jazz guitar performance

Hybridizing personal and impersonal machine learning models for activity recognition on mobile devices

For anyone who still doubts that we will soon have machine learning models on our mobile devices to identify behaviours such as walking, riding in a motor vehicle, or even "buffer" situations (i.e., queues or other moments when we are standing still), this paper shows an excellent implementation path.

Hybridizing Personal and Impersonal Machine Learning Models for Activity Recognition on Mobile Devices

Abstract: Recognition of human activities, using smart phones and wearable devices, has attracted much attention recently. The machine learning (ML) approach to human activity recognition can broadly be classified into two categories: training an ML model on (i) an impersonal dataset or (ii) a personal dataset. Previous research shows that models learned from personal datasets can provide better activity recognition accuracy compared to models trained on impersonal datasets. In this paper, we develop a hybrid incremental (HI) method with logistic regression models. This method uses incremental learning of logistic regression to combine the advantages of the impersonal and personal approaches. We investigate two essential issues for this method, which are the selection of the learning rate schedule and the class imbalance problem. Our experiments show that the models learned using our HI method give better accuracy than the models learned from personal or impersonal data only. Besides, the techniques of adaptive learning rate and cost-sensitive learning generally give faster updates and more robust ML models in incremental learning. Our method also has potential benefits in the area of privacy preservation.

Conclusions: In this paper, we propose a novel hybrid incremental (HI) method for activity recognition. Traditionally, activity recognition models have been trained on either impersonal or personal datasets. Our HI method effectively combines the advantages of these two approaches. After learning a model on an impersonal dataset in servers, the mobile devices can apply incremental learning on the model using personal data. We focus on logistic regression due to its several benefits, including its small model size that saves bandwidth, good performance in activity recognition, and easy incremental update. We address two important problems that are likely to arise in practical implementations of this incremental learning task. The first problem is associated with user diversity, making it very difficult to tune the learning-rate for each user. The second issue is related to personal data being so imbalanced at times that it may spoil the impersonal model. To overcome those problems, we applied an adaptive learning rate and a cost-sensitive technique. Finally, experimental results are used to validate our solutions.
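
The recipe lends itself to a short sketch: start from the impersonal weights and update them with SGD on personal data, using a decaying learning rate and inverse-frequency class weights for the imbalance problem. This illustrates the general idea, not the paper's exact update scheme.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def incremental_update(w, X_personal, y_personal, epochs=5, lr0=0.1):
    """Incrementally adapt an impersonal logistic-regression model to personal data.

    Uses SGD with a decaying (adaptive) learning rate and inverse-frequency
    class weights to guard against imbalanced personal data (illustrative,
    not the paper's exact scheme).
    """
    n = len(y_personal)
    pos = max(y_personal.sum(), 1)
    class_weight = {1: n / (2.0 * pos), 0: n / (2.0 * max(n - pos, 1))}
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X_personal, y_personal):
            t += 1
            lr = lr0 / (1.0 + 0.01 * t)               # decaying learning rate
            grad = (sigmoid(xi @ w) - yi) * xi        # logistic-loss gradient
            w = w - lr * class_weight[yi] * grad
    return w

rng = np.random.default_rng(0)
w_impersonal = rng.normal(size=5)                     # model trained on server data
X_personal = rng.normal(size=(50, 5))
y_personal = (X_personal @ np.array([1, -1, 0.5, 0, 0]) > 0).astype(int)
w_personal = incremental_update(w_impersonal, X_personal, y_personal)
print(np.round(w_personal, 2))
```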

Hybridizing personal and impersonal machine learning models for activity recognition on mobile devices

The Predictron: End-To-End Learning and Planning

Via arXiv.

By David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris

One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning. In this document we introduce the predictron architecture. The predictron consists of a fully abstract model, represented by a Markov reward process, that can be rolled forward multiple “imagined” planning steps. Each forward pass of the predictron accumulates internal rewards and values over multiple planning depths. The predictron is trained end-to-end so as to make these accumulated values accurately approximate the true value function. We applied the predictron to procedurally generated random mazes and a simulator for the game of pool. The predictron yielded significantly more accurate predictions than conventional deep neural network architectures.

A review of the results regarding this architecture:

The predictron is a single differentiable architecture that rolls forward an internal model to estimate values. This internal model may be given both the structure and the semantics of traditional reinforcement learning models. But unlike most approaches to model-based reinforcement learning, the model is fully abstract: it need not correspond to the real environment in any human understandable fashion, so long as its rolled-forward “plans” accurately predict outcomes in the true environment.
The predictron may be viewed as a novel network architecture that incorporates several separable ideas. First, the predictron outputs a value by accumulating rewards over a series of internal planning steps. Second, each forward pass of the predictron outputs values at multiple planning depths. Third, these values may be combined together, also within a single forward pass, to output an overall ensemble value. Finally, the different values output by the predictron may be encouraged to be self-consistent with each other, to provide an additional signal during learning. Our experiments demonstrate that these differences result in more accurate predictions of value, in reinforcement learning environments, than more conventional network architectures.
We have focused on value prediction tasks in uncontrolled environments. However, these ideas may transfer to the control setting, for example by using the predictron as a Q-network (Mnih et al., 2015). Even more intriguing is the possibility of learning an internal MDP with abstract internal actions, rather than the MRP model considered in this paper. We aim to explore these ideas in future work.
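
The core accumulation described above can be sketched in a few lines: roll an abstract model forward K imagined steps, accumulate internal rewards, and combine the K preview values. Random matrices stand in for the learned state, reward and value networks, and a plain average stands in for the learned weighting of the previews.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                                # size of the abstract internal state
A = 0.8 * rng.normal(size=(d, d)) / np.sqrt(d)       # placeholder "model" dynamics
r_w, v_w = rng.normal(size=d), rng.normal(size=d)    # placeholder reward and value heads
gamma, K = 0.9, 4

def rollout_values(s0):
    """Accumulate internal rewards over K imagined steps and return the K preview
    values g_k = r_1 + ... + gamma^(k-1) r_k + gamma^k v(s_k)."""
    s, ret, discount, previews = s0, 0.0, 1.0, []
    for _ in range(K):
        s = np.tanh(A @ s)                           # one imagined planning step
        ret += discount * float(r_w @ s)             # internal reward
        discount *= gamma
        previews.append(ret + discount * float(v_w @ s))  # bootstrap with v(s_k)
    return previews

previews = rollout_values(rng.normal(size=d))
print(np.round(previews, 3), "combined:", round(float(np.mean(previews)), 3))
```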

The Predictron: End-To-End Learning and Planning

Learning Planar Ising Models

Jason K. Johnson, Diane Oyen, Michael Chertkov, Praneeth Netrapalli; 17(215):1−26, 2016.

Inference and learning of graphical models are both well-studied problems in statistics and machine learning that have found many applications in science and engineering. However, exact inference is intractable in general graphical models, which suggests the problem of seeking the best approximation to a collection of random variables within some tractable family of graphical models. In this paper, we focus on the class of planar Ising models, for which exact inference is tractable using techniques of statistical physics. Based on these techniques and recent methods for planarity testing and planar embedding, we propose a greedy algorithm for learning the best planar Ising model to approximate an arbitrary collection of binary random variables (possibly from sample data). Given the set of all pairwise correlations among variables, we select a planar graph and optimal planar Ising model defined on this graph to best approximate that set of correlations. We demonstrate our method in simulations and for two applications: modeling senate voting records and identifying geo-chemical depth trends from Mars rover data.

Learning Planar Ising Models

Automatic time-series phenotyping using massive feature extraction

By Ben D. Fulcher and Nick S. Jones

Across a far-reaching diversity of scientific and industrial applications, a general key problem involves relating the structure of time-series data to a meaningful outcome, such as detecting anomalous events from sensor recordings, or diagnosing patients from physiological time-series measurements like heart rate or brain activity. Currently, researchers must devote considerable effort manually devising, or searching for, properties of their time series that are suitable for the particular analysis problem at hand. Addressing this non-systematic and time-consuming procedure, here we introduce a new tool, hctsa, that selects interpretable and useful properties of time series automatically, by comparing implementations over 7700 time-series features drawn from diverse scientific literatures. Using two exemplar biological applications, we show how hctsa allows researchers to leverage decades of time-series research to quantify and understand informative structure in their time-series data.

Automatic time-series phenotyping using massive feature extraction

Simple Black-Box Adversarial Perturbations for Deep Networks

By Nina Narodytska and Shiva Prasad Kasiviswanathan

Deep neural networks are powerful and popular learning models that achieve state-of-the-art pattern recognition performance on many computer vision, speech, and language processing tasks. However, these networks have also been shown to be susceptible to carefully crafted adversarial perturbations which force misclassification of the inputs. Adversarial examples enable adversaries to subvert the expected system behavior leading to undesired consequences and could pose a security risk when these systems are deployed in the real world.
In this work, we focus on deep convolutional neural networks and demonstrate that adversaries can easily craft adversarial examples even without any internal knowledge of the target network. Our attacks treat the network as an oracle (black-box) and only assume that the output of the network can be observed on the probed inputs. Our first attack is based on a simple idea of adding perturbation to a randomly selected single pixel or a small set of them. We then improve the effectiveness of this attack by carefully constructing a small set of pixels to perturb by using the idea of greedy local-search. Our proposed attacks also naturally extend to a stronger notion of misclassification. Our extensive experimental results illustrate that even these elementary attacks can reveal a deep neural network’s vulnerabilities. The simplicity and effectiveness of our proposed schemes mean that they could serve as a litmus test for designing robust networks.
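
A minimal sketch of the first (random) attack: repeatedly perturb a randomly chosen pixel of the input and keep the perturbation once the black-box prediction flips. The "network" here is a stand-in linear scorer whose internals the attack never touches; the paper's stronger variant chooses the pixels by greedy local search.

```python
import numpy as np

def random_pixel_attack(image, predict, true_label, budget=200, n_pixels=1, rng=None):
    """Perturb a few randomly chosen pixels (set to 0 or 1) until the black-box
    prediction changes. Illustrative only; the paper's stronger attack picks the
    pixels by greedy local search rather than at random."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape
    for _ in range(budget):
        candidate = image.copy()
        ys, xs = rng.integers(0, h, n_pixels), rng.integers(0, w, n_pixels)
        candidate[ys, xs] = rng.integers(0, 2, n_pixels).astype(float)
        if predict(candidate) != true_label:
            return candidate                         # adversarial example found
    return None

# Stand-in "black box": a fixed linear scorer playing the role of the network;
# only its predictions are observable to the attack.
rng = np.random.default_rng(1)
w_blackbox = rng.normal(size=(8, 8))
predict = lambda img: int((w_blackbox * img).sum() > 0)

# Pick a weakly classified image so this toy demo has a realistic chance to flip.
images = rng.uniform(0, 1, size=(100, 8, 8))
image = images[np.argmin([abs((w_blackbox * im).sum()) for im in images])]
adv = random_pixel_attack(image, predict, true_label=predict(image))
print("attack succeeded:", adv is not None)
```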

Simple Black-Box Adversarial Perturbations for Deep Networks

Development and validation of a Deep Learning algorithm for detection of diabetic retinopathy in retinal fundus photographs

Back in 2008, when I began most of my studies and developed my interest in Data Mining (a sister discipline of what we now call Data Science), I already held a strong conviction that data would be the engine of what we are living through today: the fourth industrial revolution, a revolution that has data as its main fuel across the most diverse areas of human knowledge.

That said, Google, together with researchers from the University of Texas, EyePACS LLC, the University of California, the Aravind Medical Research Foundation, Shri Bhagwan Mahavir Vitreoretinal Services, Verily Life Sciences and Harvard Medical School, achieved an advance in the application of Deep Learning that confirms the thesis above.

Diabetic retinopathy is a disease caused by the progression of diabetes, in which the blood vessels at the back of the retina dilate or rupture; if left untreated, it can cause blindness. It is a disease with more than 150 thousand cases in Brazil, and it has no cure. However, if the diagnosis is made early, treatment can help minimize the resulting blindness.

These researchers applied Deep Learning to help physicians diagnose this disease.


Putting it very briefly: this team used Deep Learning to detect diabetic retinopathy from a training database of retinal fundus photographs and obtained 90.3% and 87.0% sensitivity (i.e., the algorithm's ability to correctly identify the disease in the cases where it is actually present) and 98.1% and 98.5% specificity (i.e., the algorithm's ability to correctly identify the people who do not have the disease in the cases where it is actually absent) in detecting diabetic retinopathy.

This high specificity of 98.1% (i.e., the ability to correctly identify the negative class: the people, within the group that truly does not have the disease, who do not have it) steers treatment very assertively, since there is greater confidence in the negative results (and so effort is not directed at people who do not have the disease).
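
For readers less used to these metrics, here is a tiny worked example of how sensitivity and specificity fall out of a confusion matrix. The counts are made up purely to make the definitions concrete; they are not the study's data.

```python
# Made-up confusion-matrix counts, only to make the definitions concrete
# (these are NOT the study's numbers).
tp, fn = 90, 10      # diseased images: correctly flagged vs. missed
tn, fp = 981, 19     # healthy images: correctly cleared vs. false alarms

sensitivity = tp / (tp + fn)   # ability to catch the disease when it is present
specificity = tn / (tn + fp)   # ability to clear patients who are disease-free

print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
# sensitivity = 90.0%, specificity = 98.1%
```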

One idea for a preventive-health strategy, or even for one-off diagnoses, is that after the imaging exam the retinal fundus photograph would be run through the already trained Deep Learning algorithm; if the result came back positive, the physician would start treatment right away, the hospital and pharmacy network would be notified of one more case, and medication and medical resources for follow-up and treatment would be provisioned automatically.

This would undoubtedly change the whole axis of how we see medicine today, since it would bring an even greater efficiency gain to preventive medicine and avoid inefficient (and far more expensive) curative medicine.

Below are the main results of the study.

Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs

Varun Gulshan, PhD; Lily Peng, MD, PhD; Marc Coram, PhD; Martin C. Stumpe, PhD; Derek Wu, BS; Arunachalam Narayanaswamy, PhD; Subhashini Venugopalan, MS; Kasumi Widner, MS; Tom Madams, MEng; Jorge Cuadros, OD, PhD; Ramasamy Kim, OD, DNB; Rajiv Raman, MS, DNB; Philip C. Nelson, BS; Jessica L. Mega, MD, MPH; Dale R. Webster, PhD

Key Points
Question How does the performance of an automated deep learning algorithm compare with manual grading by ophthalmologists for identifying diabetic retinopathy in retinal fundus photographs?

Finding In 2 validation sets of 9963 images and 1748 images, at the operating point selected for high specificity, the algorithm had 90.3% and 87.0% sensitivity and 98.1% and 98.5% specificity for detecting referable diabetic retinopathy, defined as moderate or worse diabetic retinopathy or referable macular edema by the majority decision of a panel of at least 7 US board-certified ophthalmologists. At the operating point selected for high sensitivity, the algorithm had 97.5% and 96.1% sensitivity and 93.4% and 93.9% specificity in the 2 validation sets.

Meaning Deep learning algorithms had high sensitivity and specificity for detecting diabetic retinopathy and macular edema in retinal fundus photographs.

Abstract
Importance Deep learning is a family of computational methods that allow an algorithm to program itself by learning from a large set of examples that demonstrate the desired behavior, removing the need to specify rules explicitly. Application of these methods to medical imaging requires further assessment and validation.

Objective To apply deep learning to create an algorithm for automated detection of diabetic retinopathy and diabetic macular edema in retinal fundus photographs.

Design and Setting A specific type of neural network optimized for image classification called a deep convolutional neural network was trained using a retrospective development data set of 128 175 retinal images, which were graded 3 to 7 times for diabetic retinopathy, diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists and ophthalmology senior residents between May and December 2015. The resultant algorithm was validated in January and February 2016 using 2 separate data sets, both graded by at least 7 US board-certified ophthalmologists with high intragrader consistency.

Exposure Deep learning–trained algorithm.

Main Outcomes and Measures The sensitivity and specificity of the algorithm for detecting referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy, referable diabetic macular edema, or both, were generated based on the reference standard of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2 operating points selected from the development set, one selected for high specificity and another for high sensitivity.

Results The EyePACS-1 data set consisted of 9963 images from 4997 patients (mean age, 54.4 years; 62.2% women; prevalence of RDR, 683/8878 fully gradable images [7.8%]); the Messidor-2 data set had 1748 images from 874 patients (mean age, 57.6 years; 42.6% women; prevalence of RDR, 254/1745 fully gradable images [14.6%]). For detecting RDR, the algorithm had an area under the receiver operating curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and 0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI, 81.1%-91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and specificity was 93.4% and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%.

Conclusions and Relevance In this evaluation of retinal fundus photographs from adults with diabetes, an algorithm based on deep machine learning had high sensitivity and specificity for detecting referable diabetic retinopathy. Further research is necessary to determine the feasibility of applying this algorithm in the clinical setting and to determine whether use of the algorithm could lead to improved care and outcomes compared with current ophthalmologic assessment.

Development and validation of a Deep Learning algorithm for detection of diabetic retinopathy in retinal fundus photographs

Reproducible Research with R and RStudio – a book on reproducible research

Still on the subject of research reproducibility, a book on the topic called Reproducible Research with R and RStudio, written by Christopher Gandrud, is about to be released.

In an excerpt from the book, the author provides five practical tips for creating reproducible research:

  1. Document everything!,
  2. Everything is a (text) file,
  3. All files should be human readable,
  4. Explicitly tie your files together,
  5. Have a plan to organize, store, and make your files available.


Reproducible Research with R and RStudio – a book on reproducible research

Replication in academic Data Mining research

Reading this post by John Taylor about the replication of economics research published even in high-impact journals reminded me of a rather common trait of academic journals in Production Engineering and Data Mining: the irreproducibility of the published papers.

This irreproducibility lies in how the results are obtained, especially with techniques such as clustering, association rules, and above all neural networks.

An academic/technical/experimental work that cannot be reproduced is, a priori, 1) methodologically weak and 2) poorly reviewed. Work with these characteristics supports knowledge about as well as so-called anecdotal evidence does.

After reading more than 150 papers in 2012 (and heading towards 300 in 2013), the structure never changes:

  • Introduction;
  • Literature review;
  • Application of the technique;
  • Results; and
  • A discussion claiming something like a 90% gain with neural networks.

Here is a rather useful checklist for examining an academic paper with a poor DOE (design of experiments) and weak methodological grounding:

Clustering papers

  • What was the sample size?
  • What is the minimum sample size given the estimated population?
  • Were statistical tests, such as a Z-test or ANOVA, performed on the population?
  • What is the p-value?
  • What technique was used to determine the separation of the clusters?
  • Which parameters were used for the clustering?
  • Why was algorithm Z chosen?

Association rule papers (a small worked example follows the list)

  • What was the minimum support?
  • What is the sample size, and how statistically representative is it of the population?
  • How well does the SUPPORT represent the POPULATION in the study?
  • How was the pruning down to actionable rules performed?
  • Is the sample generalizable? Why was the experiment not run on the ENTIRE population?
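
To make the quantities in this checklist concrete, here is a tiny worked example of support, confidence and lift on a made-up transaction database.

```python
# Toy transaction database, just to make support/confidence/lift concrete.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

antecedent, consequent = {"bread"}, {"milk"}
supp = support(antecedent | consequent)             # 3/5 = 0.60
conf = supp / support(antecedent)                   # 0.60 / 0.80 = 0.75
lift = conf / support(consequent)                   # 0.75 / 0.80 ≈ 0.94
print(f"support={supp:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```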

Neural network papers

  • What is the network architecture?
  • Why was that particular activation function (e.g., logistic rather than hyperbolic tangent, or vice versa) chosen?
  • Is the activation function appropriate for the data being studied? How were the data pre-processed and discretized?
  • Why was that number of hidden layers chosen?
  • Is there a learning rate? What was it, and why was that rate chosen?
  • Is there decay? Why?
  • And momentum? Was it used? With which parameters?
  • What cost structure is attached to the results? How many type I and type II errors did the network make?
  • And the number of epochs? How was it determined, and at what point did the network stop converging? Do you think it reached a global or a local minimum? How do you explain this in the paper's results?

It may look like academic deconstructionism dressed up as critical examination at first, but for anyone who lives in an environment where outright fraudulent studies are painted as revolutionary, it works as a shield against nonsense (a bullshit shield).

In short, with answers to 50% of the questions above, the risk of it being a bad paper with "black-box" results already drops to 10%, and that is where the real work of analysing the paper for reproduction begins.

Below is a very interesting video about papers that amount to nothing more than anecdotal evidence.

Replication in academic Data Mining research