Em épocas de Deep Learning, é sempre bom ver um paper com as boas e velhas Máquinas de Vetor de Suporte (Support Vector Machines). Em breve teremos um post sobre essa técnica aqui no blog.
Abstract: Hyper-parameter tuning for support vector machines has been widely studied in the past decade. A variety of metaheuristics, such as Genetic Algorithms and Particle Swarm Optimization have been considered to accomplish this task. Notably, exhaustive strategies such as Grid Search or Random Search continue to be implemented for hyper-parameter tuning and have recently shown results comparable to sophisticated metaheuristics. The main reason for the success of exhaustive techniques is due to the fact that only two or three parameters need to be adjusted when working with support vector machines. In this chapter, we analyze two Estimation Distribution Algorithms, the Univariate Marginal Distribution Algorithm and the Boltzmann Univariate Marginal Distribution Algorithm, to verify if these algorithms preserve the effectiveness of Random Search and at the same time make more efficient the process of finding the optimal hyper-parameters without increasing the complexity of Random Search.
Um ótimo paper de como o hardware vai exercer função crucial em alguns anos em relação à Core Machine Learning, em especial em sistemas embarcados.
Hardware for Machine Learning: Challenges and Opportunities
Abstract—Machine learning plays a critical role in extracting meaningful information out of the zetabytes of sensor data collected every day. For some applications, the goal is to analyze and understand the data to identify trends (e.g., surveillance, portable/wearable electronics); in other applications, the goal is to take immediate action based the data (e.g., robotics/drones, self-driving cars, smart Internet of Things). For many of these applications, local embedded processing near the sensor is preferred over the cloud due to privacy or latency concerns, or limitations in the communication bandwidth. However, at the sensor there are often stringent constraints on energy consumption and cost in addition to throughput and accuracy requirements. Furthermore, flexibility is often required such that the processing can be adapted for different applications or environments (e.g., update the weights and model in the classifier). In many applications, machine learning often involves transforming the input data into a higher dimensional space, which, along with programmable weights, increases data movement and consequently energy consumption. In this paper, we will discuss how these challenges can be addressed at various levels of hardware design ranging from architecture, hardware-friendly algorithms, mixed-signal circuits, and advanced technologies (including memories and sensors).
Conclusions: Machine learning is an important area of research with many promising applications and opportunities for innovation at various levels of hardware design. During the design process, it is important to balance the accuracy, energy, throughput and cost requirements. Since data movement dominates energy consumption, the primary focus of recent research has been to reduce the data movement while maintaining performance accuracy, throughput and cost. This means selecting architectures with favorable memory hierarchies like a spatial array, and developing dataflows that increase data reuse at the low-cost levels of the memory hierarchy. With joint design of algorithm and hardware, reduced bitwidth precision, increased sparsity and compression are used to minimize the data movement requirements. With mixed-signal circuit design and advanced technologies, computation is moved closer to the source by embedding computation near or within the sensor and the memories. One should also consider the interactions between these different levels. For instance, reducing the bitwidth through hardware-friendly algorithm design enables reduced precision processing with mixed-signal circuits and non-volatile memory. Reducing the cost of memory access with advanced technologies could result in more energy-efficient dataflows.
Mais um estudo colocando alguns algoritmos de Machine Learning contra métodos tradicionais de scoring, e levando a melhor.
Abstract: The benefits of cardiac surgery are sometimes difficult to predict and the decision to operate on a given individual is complex. Machine Learning and Decision Curve Analysis (DCA) are recent methods developed to create and evaluate prediction models.
Methods and finding: We conducted a retrospective cohort study using a prospective collected database from December 2005 to December 2012, from a cardiac surgical center at University Hospital. The different models of prediction of mortality in-hospital after elective cardiac surgery, including EuroSCORE II, a logistic regression model and a machine learning model, were compared by ROC and DCA. Of the 6,520 patients having elective cardiac surgery with cardiopulmonary bypass, 6.3% died. Mean age was 63.4 years old (standard deviation 14.4), and mean EuroSCORE II was 3.7 (4.8) %. The area under ROC curve (IC95%) for the machine learning model (0.795 (0.755–0.834)) was significantly higher than EuroSCORE II or the logistic regression model (respectively, 0.737 (0.691–0.783) and 0.742 (0.698–0.785), p < 0.0001). Decision Curve Analysis showed that the machine learning model, in this monocentric study, has a greater benefit whatever the probability threshold.
Conclusions: According to ROC and DCA, machine learning model is more accurate in predicting mortality after elective cardiac surgery than EuroSCORE II. These results confirm the use of machine learning methods in the field of medical prediction.
Um tutorial para quem quiser aplicar no dia-a-dia.
O paper é seminal (ou seja precisa ser revisado com um pouco mais de cautela), mas representa um bom avanço na utilização das RNAs, tendo em vista que as Random Forests (Florestas Aleatórias) e as Support Vector Machines (Máquinas de Vetor de Suporte) estão apresentando resultados bem melhores, academicamente falando.
Abaixo o resumo do artigo:
Artificial neural networks are powerful pattern classifiers; however, they have been surpassed in accuracy by methods such as support vector machines and random forests that are also easier to use and faster to train. Backpropagation, which is used to train artificial neural networks, suffers from the herd effect problem which leads to long training times and limit classification accuracy. We use the disjunctive normal form and approximate the boolean conjunction operations with products to construct a novel network architecture. The proposed model can be trained by minimizing an error function and it allows an effective and intuitive initialization which solves the herd-effect problem associated with backpropagation. This leads to state-of-the art classification accuracy and fast training times. In addition, our model can be jointly optimized with convolutional features in an unified structure leading to state-of-the-art results on computer vision problems with fast convergence rates. A GPU implementation of LDNN with optional convolutional features is also available.
Neste paper de Rosillo, Giner, De la fuente e Pino é realizado um estudo experimental da aplicação de SVM para um sistema de trading. Em linhas gerais o sistema teve um comportamento satisfatório em períodos de retração do mercado. Vale a pena a leitura para quem quiser realizar adaptações em relação à metodologia aplicada.
Esse artigo apresenta um framework muito elaborado no qual Yang Liu passa pelos aspectos básicos da mineração de dados. O artigo conta com uma ótima bibliografia de apoio. De maneira geral o artigo coloca a mineração de dados como um meio de obter análises de portfólios através de métodos indutivos paramétricos e/ou não paramétricos. A diagramação é ótima na qual dá apoio significativo ao que está sendo explicado. Obrigatório para quem trabalha com scoring de crédito em geral.
Nesse post do Matt Bogard, há um link para um artigo bem interessante para quem deseja conhecer mais sobre essa técnica de mineração de dados, na qual há uma explicação bem estruturada e didática sobre essa técnica.
Um ótimo estudo do BioDataMining que poderia ser reproduzido aqui em terra brasilis. Uma crítica que eu vejo nesse trabalho foi que a seleção de atributos como diria o Daniel Larose foi um pouco black-box e particularmente a abordagem em Algoritmos Genéticos não deve ser tão performática em relação a SVM (o ponto dos autores é que os dados tinha uma dimensionalidade razoável).
Esse artigo de 2007 apresenta os 10 algoritmos de Mineração de Dados mais utilizados dentro dos mais diversos tipos de domínios. O processo de determinação desses algoritmos deram-se através de uma pesquisa da ACM KDD no qual diversos pesquisadores deram seus respectivos pareceres. Os algoritmos apresentados nessa pesquisa estão descritos de maneira bem sucinta e objetiva e vale a pena a leitura.