Interpreting the odds ratio

Matt Bogard, of Econometric Sense, offers a tip on how to interpret this number:

From the basic probabilities above, we know that the probability of event Y is greater for males than females. The odds of event Y are also greater for males than females. These relationships are also reflected in the odds ratios. The odds of event Y for males is 3 times the odds of females. The odds of event Y for females are only .33 times the odds of males. In other words, the odds of event Y for males are greater and the odds of event Y for females is less.

This can also be seen from the formula for odds ratios. If the OR M vs F  = odds(M)/odds(F), we can see that if the odds (M) > odds(F), the odds ratio will be greater than 1. Alternatively, for OR  F vs M = odds(F)/odds(M), we can see that if the odds(F) < odds(M) then the ratio will be less than 1.  If the odds for both groups are equal, the odds ratio will be 1 exactly.
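As a quick illustration of the formula in the quote, here is a minimal Python sketch. The counts are hypothetical (the original probabilities are not reproduced here); they were chosen only so that the ratios match the 3 and .33 mentioned in the text. The names `odds`, `or_m_vs_f` and `or_f_vs_m` are illustrative.

```python
def odds(events, non_events):
    """Odds of an event: number of occurrences divided by non-occurrences."""
    return events / non_events

# Hypothetical counts, chosen only so the ratios match the quoted text.
odds_m = odds(60, 40)  # males: 60 with event Y, 40 without
odds_f = odds(20, 40)  # females: 20 with event Y, 40 without

or_m_vs_f = odds_m / odds_f  # greater than 1 because odds(M) > odds(F)
or_f_vs_m = odds_f / odds_m  # less than 1 for the same reason

print(f"odds(M) = {odds_m:.2f}, odds(F) = {odds_f:.2f}")
print(f"OR M vs F = {or_m_vs_f:.2f}")
print(f"OR F vs M = {or_f_vs_m:.2f}")
```

With these counts the two odds ratios are reciprocals of each other (3 and roughly .33), which is why the two statements in the quote describe the same relationship from opposite directions.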


Odds ratios can be obtained from logistic regression by exponentiating the coefficient or beta for a given explanatory variable. For categorical variables, the odds ratios are interpreted as above. For continuous variables, odds ratios are in terms of changes in odds as a result of a one-unit change in the variable.
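To make the link between the coefficient and the odds ratio concrete, the sketch below fits a one-variable logistic regression using only the Python standard library and then exponentiates the slope. The data are hypothetical (the same illustrative counts a 2x2 table with an odds ratio of 3 would give), and `fit_logistic` is a small Newton-Raphson helper written for this example, not a function from any library.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit y ~ intercept + slope * x by Newton-Raphson; returns (intercept, slope)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood (intercept)
            g1 += (y - p) * x      # gradient of the log-likelihood (slope)
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: x = 1 for males, x = 0 for females; y = 1 when event Y occurs.
xs = [1] * 100 + [0] * 60
ys = [1] * 60 + [0] * 40 + [1] * 20 + [0] * 40

intercept, slope = fit_logistic(xs, ys)
odds_ratio = math.exp(slope)
print(f"beta = {slope:.4f}, odds ratio = exp(beta) = {odds_ratio:.2f}")
```

For a binary explanatory variable like this one, exp(beta) should come out very close to the odds ratio computed directly from the counts (3 for these hypothetical data). For a continuous variable the same exp(beta) would be read as the multiplicative change in the odds per one-unit increase.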

Regression with corrupted instances: a robust approach and its applications

Interesting work.

Multivariate Regression with Grossly Corrupted Observations: A Robust Approach and its Applications – Xiaowei Zhang, Chi Xu, Yu Zhang, Tingshao Zhu, Li Cheng

Abstract: This paper studies the problem of multivariate linear regression where a portion of the observations is grossly corrupted or is missing, and the magnitudes and locations of such occurrences are unknown in priori. To deal with this problem, we propose a new approach by explicitly consider the error source as well as its sparseness nature. An interesting property of our approach lies in its ability of allowing individual regression output elements or tasks to possess their unique noise levels. Moreover, despite working with a non-smooth optimization problem, our approach still guarantees to converge to its optimal solution. Experiments on synthetic data demonstrate the competitiveness of our approach compared with existing multivariate regression models. In addition, empirically our approach has been validated with very promising results on two exemplar real-world applications: The first concerns the prediction of Big-Five personality based on user behaviors at social network sites (SNSs), while the second is 3D human hand pose estimation from depth images. The implementation of our approach and comparison methods as well as the involved datasets are made publicly available in support of the open-source and reproducible research initiatives.

Conclusions: We consider a new approach dedicating to the multivariate regression problem where some output labels are either corrupted or missing. The gross error is explicitly addressed in our model, while it allows the adaptation of distinct regression elements or tasks according to their own noise levels. We further propose and analyze the convergence and runtime properties of the proposed proximal ADMM algorithm which is globally convergent and efficient. The model combined with the specifically designed solver enable our approach to tackle a diverse range of applications. This is practically demonstrated on two distinct applications, that is, to predict personalities based on behaviors at SNSs, as well as to estimation 3D hand pose from single depth images. Empirical experiments on synthetic and real datasets have showcased the applicability of our approach in the presence of label noises. For future work, we plan to integrate with more advanced deep learning techniques to better address more practical problems, including 3D hand pose estimation and beyond.
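To give a feel for what "explicitly modelling a sparse gross error" can look like, here is a deliberately simplified sketch. It is not the proximal ADMM algorithm from the paper, and it handles a single output rather than multiple tasks with their own noise levels. It swaps in a much simpler technique: alternating ordinary least squares with soft-thresholding of the residuals, for a one-dimensional line fit. All names (`soft_threshold`, `ols_line`, `robust_line`, `lam`) and the data are assumptions made for this example.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; small residuals become exactly zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def ols_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def robust_line(xs, ys, lam=1.0, iterations=50):
    """Alternate between fitting the line and estimating a sparse error vector."""
    errors = [0.0] * len(xs)
    for _ in range(iterations):
        cleaned = [y - e for y, e in zip(ys, errors)]
        slope, intercept = ols_line(xs, cleaned)
        errors = [soft_threshold(y - (slope * x + intercept), lam)
                  for x, y in zip(xs, ys)]
    return slope, intercept, errors

# Hypothetical data: y = 2x + 1, with one grossly corrupted observation.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[7] += 50.0

ols_slope, _ = ols_line(xs, ys)
robust_slope, robust_intercept, errors = robust_line(xs, ys)
print(f"OLS slope: {ols_slope:.2f}, robust slope: {robust_slope:.2f}")
print("Observations flagged as corrupted:", [i for i, e in enumerate(errors) if e != 0.0])
```

In this toy setting the plain least-squares slope is pulled well away from 2 by the single corrupted point, while the alternating scheme absorbs most of the corruption into the error term and stays close to the true line.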

Comparison of a Machine Learning model with EuroSCORE II in predicting mortality after elective cardiac surgery

One more study pitting a few Machine Learning algorithms against traditional scoring methods, with the Machine Learning side coming out ahead.

A Comparison of a Machine Learning Model with EuroSCORE II in Predicting Mortality after Elective Cardiac Surgery: A Decision Curve Analysis

Abstract: The benefits of cardiac surgery are sometimes difficult to predict and the decision to operate on a given individual is complex. Machine Learning and Decision Curve Analysis (DCA) are recent methods developed to create and evaluate prediction models.

Methods and finding: We conducted a retrospective cohort study using a prospective collected database from December 2005 to December 2012, from a cardiac surgical center at University Hospital. The different models of prediction of mortality in-hospital after elective cardiac surgery, including EuroSCORE II, a logistic regression model and a machine learning model, were compared by ROC and DCA. Of the 6,520 patients having elective cardiac surgery with cardiopulmonary bypass, 6.3% died. Mean age was 63.4 years old (standard deviation 14.4), and mean EuroSCORE II was 3.7 (4.8) %. The area under ROC curve (IC95%) for the machine learning model (0.795 (0.755–0.834)) was significantly higher than EuroSCORE II or the logistic regression model (respectively, 0.737 (0.691–0.783) and 0.742 (0.698–0.785), p < 0.0001). Decision Curve Analysis showed that the machine learning model, in this monocentric study, has a greater benefit whatever the probability threshold.

Conclusions: According to ROC and DCA, machine learning model is more accurate in predicting mortality after elective cardiac surgery than EuroSCORE II. These results confirm the use of machine learning methods in the field of medical prediction.
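For readers unfamiliar with Decision Curve Analysis, the quantity plotted at each probability threshold is the net benefit: true positives per patient minus false positives per patient, the latter weighted by the odds of the threshold. The sketch below computes it in plain Python. The outcomes and predicted risks are hypothetical and have nothing to do with the study's data; `net_benefit` and `net_benefit_treat_all` are illustrative names.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every patient whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical outcomes (1 = died) and predicted risks for ten patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.80, 0.30, 0.40, 0.10, 0.05, 0.05, 0.10, 0.20, 0.02, 0.15]

for threshold in (0.10, 0.25, 0.50):
    model = net_benefit(y_true, y_prob, threshold)
    treat_all = net_benefit_treat_all(y_true, threshold)
    print(f"threshold {threshold:.2f}: model {model:.3f}, treat all {treat_all:.3f}")
```

A decision curve is simply this calculation repeated over a range of thresholds for each model, alongside the "treat all" and "treat none" (net benefit 0) reference strategies; a model is preferable at a threshold when its curve is the highest there.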

Hybridizing personal and impersonal Machine Learning models for activity recognition on mobile devices

For anyone who still doubts that we will soon have Machine Learning models on our mobile devices identifying behaviors such as walking, being in motion in a motor vehicle, or even buffer situations (i.e., queues and other situations in which we are standing still), this paper shows a great path to implementation.

Hybridizing Personal and Impersonal Machine Learning Models for Activity Recognition on Mobile Devices

Abstract: Recognition of human activities, using smart phones and wearable devices, has attracted much attention recently. The machine learning (ML) approach to human activity recognition can broadly be classified into two categories: training an ML model on (i) an impersonal dataset or (ii) a personal dataset. Previous research shows that models learned from personal datasets can provide better activity recognition accuracy compared to models trained on impersonal datasets. In this paper, we develop a hybrid incremental (HI) method with logistic regression models. This method uses incremental learning of logistic regression to combine the advantages of the impersonal and personal approaches. We investigate two essential issues for this method, which are the selection of the learning rate schedule and the class imbalance problem. Our experiments show that the models learned using our HI method give better accuracy than the models learned from personal or impersonal data only. Besides, the techniques of adaptive learning rate and cost-sensitive learning generally give faster updates and more robust ML models in incremental learning. Our method also has potential benefits in the area of privacy preservation.

Conclusions: In this paper, we propose a novel hybrid incremental (HI) method for activity recognition. Traditionally, activity recognition models have been trained on either impersonal or personal datasets. Our HI method effectively combines the advantages of these two approaches. After learning a model on an impersonal dataset in servers, the mobile devices can apply incremental learning on the model using personal data. We focus on logistic regression due to its several benefits, including its small model size that saves bandwidth, good performance in activity recognition, and easy incremental update. We address two important problems that are likely to arise in practical implementations of this incremental learning task. The first problem is associated with user diversity, making it very difficult to tune the learning-rate for each user. The second issue is related to personal data being so imbalanced at times that it may spoil the impersonal model. To overcome those problems, we applied an adaptive learning rate and a cost-sensitive technique. Finally, experimental results are used to validate our solutions.
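The workflow described in the abstract (train on impersonal data on a server, then keep updating on the device with personal data) can be sketched in a few lines of plain Python. This is only an illustration under several assumptions: the data are made up, the learning rate is constant rather than the adaptive schedule the authors study, and the cost-sensitive part is approximated with inverse class-frequency weights. The names `sgd_step`, `train`, `server_model` and `hybrid_model` are hypothetical.

```python
import math

def predict(weights, x):
    """Predicted probability of the positive activity under logistic regression."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(weights, x, y, learning_rate, class_weight=None):
    """One incremental update on a single example (returns a new weight list)."""
    weight = 1.0 if class_weight is None else class_weight[y]
    step = learning_rate * weight * (y - predict(weights, x))
    return [w + step * xi for w, xi in zip(weights, x)]

def train(weights, data, epochs, learning_rate, class_weight=None):
    for _ in range(epochs):
        for x, y in data:
            weights = sgd_step(weights, x, y, learning_rate, class_weight)
    return weights

def log_loss(weights, data):
    total = 0.0
    for x, y in data:
        p = min(max(predict(weights, x), 1e-12), 1.0 - 1e-12)
        total -= math.log(p if y == 1 else 1.0 - p)
    return total / len(data)

# Hypothetical examples: features are [bias, one sensor feature]; label 1 = "walking".
impersonal = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1), ([1.0, 2.0], 1)]
# This user's data are imbalanced and follow a shifted decision boundary.
personal = [([1.0, 0.5], 0), ([1.0, 1.0], 0), ([1.0, 1.2], 0),
            ([1.0, 2.0], 1), ([1.0, 2.5], 1)]

# Assumed cost-sensitive scheme: weight each class by inverse frequency.
counts = {0: 3, 1: 2}
class_weight = {label: len(personal) / (2 * count) for label, count in counts.items()}

server_model = train([0.0, 0.0], impersonal, epochs=200, learning_rate=0.1)
hybrid_model = train(server_model, personal, epochs=50, learning_rate=0.1,
                     class_weight=class_weight)

print(f"loss on personal data, server model: {log_loss(server_model, personal):.3f}")
print(f"loss on personal data, hybrid model: {log_loss(hybrid_model, personal):.3f}")
```

Because each update only touches a small weight vector, the incremental step is cheap enough to run on the device, and the personal examples never need to leave it, which is presumably the privacy benefit the abstract alludes to.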
