Machine Learning Methods to Predict Diabetes Complications
Abstract: One of the areas where Artificial Intelligence is having more impact is machine learning, which develops algorithms able to learn patterns and decision rules from data. Machine learning algorithms have been embedded into data mining pipelines, which can combine them with classical statistical strategies to extract knowledge from data. Within the EU-funded MOSAIC project, a data mining pipeline has been used to derive a set of predictive models of type 2 diabetes mellitus (T2DM) complications based on electronic health record data of nearly one thousand patients. The pipeline comprises clinical center profiling, predictive model targeting, predictive model construction, and model validation. After dealing with missing data by means of random forest (RF) and applying suitable strategies to handle class imbalance, we used Logistic Regression with stepwise feature selection to predict the onset of retinopathy, neuropathy, or nephropathy at three time scenarios: 3, 5, and 7 years from the first visit at the Hospital Center for Diabetes (not from the diagnosis). The variables considered are gender, age, time from diagnosis, body mass index (BMI), glycated hemoglobin (HbA1c), hypertension, and smoking habit. The final models, tailored to each complication, provided an accuracy of up to 0.838. Different variables were selected for each complication and time scenario, leading to specialized models that are easy to translate to clinical practice.
Conclusions: This work shows how data mining and computational methods can be effectively adopted in clinical medicine to derive models that use patient-specific information to predict an outcome of interest. Predictive data mining methods may be applied to the construction of decision models for procedures such as prognosis, diagnosis and treatment planning, which—once evaluated and verified—may be embedded within clinical information systems. Developing predictive models for the onset of chronic microvascular complications in patients suffering from T2DM could contribute to evaluating the relation between exposure to individual factors and the risk of onset of a specific complication, to stratifying the patients’ population in a medical center with respect to this risk, and to developing tools for the support of clinical informed decisions in patients’ treatment.
Reliability Estimation of Individual Multi-target Regression Predictions
Abstract: To estimate the quality of an induced predictive model, we generally use measures of averaged prediction accuracy, such as the relative mean squared error on test data. Such evaluation fails to provide local information about the reliability of individual predictions, which can be important in risk-sensitive fields (medicine, finance, industry, etc.). Related work presented several ways of computing individual prediction reliability estimates for single-target regression models, but has not considered their use with multi-target regression models that predict a vector of independent target variables. In this paper we adapt the existing single-target reliability estimates to multi-target models. In this way we try to design reliability estimates for multi-target regression algorithms that can estimate prediction errors without knowing the true prediction errors. We approach this in two ways: by aggregating reliability estimates for individual target components, and by generalizing the existing reliability estimates to a higher number of dimensions. The results revealed favorable performance of the reliability estimates that are based on bagging variance and local cross-validation approaches. The results are consistent with the related work on single-target reliability estimates and provide support for multi-target decision making.
Conclusion: In the paper we proposed several approaches for estimating the reliabilities of individual multi-target regression predictions. The aggregated variants (AM, l2 and +) produce a single-valued estimate, which is preferable for interpretation and comparison. The last variant (+) is a direct generalization of the single-target estimators from the related work. Our evaluation showed that the best results were achieved using the BAGV and the LCV reliability estimates, regardless of the estimate variant. This agrees with the related work on single-target predictions, where these two estimates also performed well. Although all of the proposed variants achieve comparable results, our proposed generalization of existing methods (+) is still the preferred variant due to its lower computational complexity (as estimates are only calculated once for all of the target attributes) and its solid theoretical background. In our further work we intend to evaluate other reliability estimates in combination with several other regression models. We also plan to test the adaptation of the proposed methods to multi-target classification. Reliability estimation of individual predictions offers many advantages, especially when making decisions in highly sensitive environments. Our work provides effective support for model-independent multi-target regression.
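To make the idea of a bagging-variance estimate concrete, here is a minimal sketch. It assumes that BAGV is the variance of the predictions made by models trained on bootstrap resamples, and that the AM variant is the arithmetic mean of the per-target estimates. The base learner (a one-feature least-squares line), the data, and all function names are illustrative choices and are not taken from the paper.

```python
import random

def fit_line(points):
    """Least-squares fit y = a + b*x for one target; points is a list of (x, y)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:  # degenerate resample (all x equal): fall back to the mean
        return mean_y, 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def bagging_variance(xs, targets, query, n_models=50, seed=0):
    """Per-target variance of bagged predictions at `query`.

    xs is a list of floats; targets is a list of tuples, one value per target.
    """
    rng = random.Random(seed)
    n, n_targets = len(xs), len(targets[0])
    preds = [[] for _ in range(n_targets)]
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        for t in range(n_targets):
            a, b = fit_line([(xs[i], targets[i][t]) for i in idx])
            preds[t].append(a + b * query)
    variances = []
    for p in preds:
        mean_p = sum(p) / len(p)
        variances.append(sum((v - mean_p) ** 2 for v in p) / len(p))
    return variances

def aggregate_am(variances):
    """Arithmetic-mean aggregation into a single-valued estimate (assumed AM variant)."""
    return sum(variances) / len(variances)

# Hypothetical data: two targets that depend linearly on x, plus Gaussian noise.
noise = random.Random(1)
xs = [float(i) for i in range(20)]
targets = [(2.0 * x + noise.gauss(0, 1), -x + noise.gauss(0, 1)) for x in xs]

near = aggregate_am(bagging_variance(xs, targets, query=10.0))   # inside the training range
far = aggregate_am(bagging_variance(xs, targets, query=100.0))   # far outside it
print(near, far)
```

A larger value means the bagged models disagree more at that point, which is read as lower reliability. In this toy setup the query far from the training data should receive the larger estimate.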
Objective: To explore the value of machine learning methods for predicting multiple sclerosis disease course.
Methods: 1693 CLIMB study patients were classified as increased EDSS≥1.5 (worsening) or not (non-worsening) at up to five years after baseline visit. Support vector machines (SVM) were used to build the classifier, and compared to logistic regression (LR) using demographic, clinical and MRI data obtained at years one and two to predict EDSS at five years follow-up.
Results: Baseline data alone provided little predictive value. Clinical observation for one year improved overall SVM sensitivity to 62% and specificity to 65% in predicting worsening cases. The addition of one year MRI data improved sensitivity to 71% and specificity to 68%. Use of non-uniform misclassification costs in the SVM model, weighting towards increased sensitivity, improved predictions (up to 86%). Sensitivity, specificity, and overall accuracy improved minimally with additional follow-up data. Predictions improved within specific groups defined by baseline EDSS. LR performed more poorly than SVM in most cases. Race, family history of MS, and brain parenchymal fraction, ranked highly as predictors of the non-worsening group. Brain T2 lesion volume ranked highly as predictive of the worsening group.
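The trade-off between sensitivity and specificity described in the results can be illustrated with a small sketch. Note the substitution: the study weights misclassification costs inside the SVM, whereas this example applies unequal costs as a decision threshold on predicted probabilities, which is a simpler and different technique. The scores, labels, and cost values below are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = worsening."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def cost_threshold(cost_fn, cost_fp):
    """Probability threshold that minimizes expected cost.

    Predicting positive costs (1 - p) * cost_fp in expectation and predicting
    negative costs p * cost_fn, so positive is cheaper when
    p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def classify(scores, threshold):
    return [1 if s >= threshold else 0 for s in scores]

# Hypothetical predicted probabilities of worsening and the true outcomes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.3, 0.6, 0.35, 0.2, 0.2, 0.1, 0.05]

# Equal costs give a threshold of 0.5; a missed worsening case that costs
# three times as much as a false alarm lowers the threshold to 0.25.
sens_u, spec_u = sensitivity_specificity(y_true, classify(scores, cost_threshold(1.0, 1.0)))
sens_w, spec_w = sensitivity_specificity(y_true, classify(scores, cost_threshold(3.0, 1.0)))
print(sens_u, spec_u)
print(sens_w, spec_w)
```

Weighting towards missed worsening cases raises sensitivity at the expense of specificity, which is the direction of the effect reported above.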
[...] apply the Logistics algorithm, BP neural network and the AdaBoost algorithm to build the model (Logistic-BP-AdaBoost model) which can estimate the credit score of the applicant with their multidimensional personal data. Compared with other [...] the possibility of loan default of the applicant and provide a score for each applicant. We apply this model to a website and establish an online loan platform which is expected to improve the efficiency and reduce the costs of traditional lending [...]
Now Matt Bogard of Econometric Sense offers a tip on how to interpret this number:
From the basic probabilities above, we know that the probability of event Y is greater for males than females. The odds of event Y are also greater for males than females. These relationships are also reflected in the odds ratios. The odds of event Y for males are 3 times the odds for females. The odds of event Y for females are only .33 times the odds for males. In other words, the odds of event Y for males are greater and the odds of event Y for females are lower.
This can also be seen from the formula for odds ratios. If the OR M vs F = odds(M)/odds(F), we can see that if odds(M) > odds(F), the odds ratio will be greater than 1. Alternatively, for OR F vs M = odds(F)/odds(M), we can see that if odds(F) < odds(M), the ratio will be less than 1. If the odds for both groups are equal, the odds ratio will be exactly 1.
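The arithmetic behind these ratios can be checked with a short sketch. The probabilities used here (0.5 for males, 0.25 for females) are hypothetical values chosen only because they reproduce the ratios 3 and .33 quoted above. They are not taken from the original post.

```python
def odds(p):
    """Convert a probability into odds: p / (1 - p)."""
    return p / (1.0 - p)

def odds_ratio(p_a, p_b):
    """Odds ratio of group A versus group B."""
    return odds(p_a) / odds(p_b)

# Hypothetical probabilities of event Y in each group.
p_male = 0.5     # odds = 0.5 / 0.5 = 1
p_female = 0.25  # odds = 0.25 / 0.75 = 1/3

or_m_vs_f = odds_ratio(p_male, p_female)  # greater than 1: odds(M) > odds(F)
or_f_vs_m = odds_ratio(p_female, p_male)  # less than 1: the reciprocal

print(round(or_m_vs_f, 2), round(or_f_vs_m, 2))
```

The two ratios are reciprocals of each other, so they always carry the same information. Only the choice of reference group changes.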
RELATION TO LOGISTIC REGRESSION
Odds ratios can be obtained from logistic regression by exponentiating the coefficient or beta for a given explanatory variable. For categorical variables, the odds ratios are interpreted as above. For continuous variables, odds ratios are in terms of changes in odds as a result of a one-unit change in the variable.
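As a rough illustration of exponentiating a coefficient, the sketch below fits a logistic regression with a single binary predictor using a hand-written Newton-Raphson loop and then takes exp of the fitted slope. The data set (100 males with 50 events, 100 females with 25 events) and the function name fit_logistic are hypothetical. In practice a statistics library would do the fitting.

```python
import math

def fit_logistic(xs, ys, iters=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson (pure Python)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood w.r.t. b0
            g1 += (y - p) * x    # gradient w.r.t. b1
            h00 += w             # entries of the (negated) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: x = 1 for males, x = 0 for females.
xs = [1] * 100 + [0] * 100
ys = [1] * 50 + [0] * 50 + [1] * 25 + [0] * 75

b0, b1 = fit_logistic(xs, ys)
print(math.exp(b1))  # odds ratio, males vs females
print(math.exp(b0))  # odds of the event in the reference group (females)
```

For a continuous predictor the same exp(b1) would be read as the multiplicative change in the odds per one-unit increase, and exp(k * b1) as the change for a k-unit increase.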
For anyone who still doubts that we will soon have Machine Learning models on our mobile devices to identify behaviors such as walking, riding in a motor vehicle, or even buffer situations (i.e., queues or other situations in which we are standing still), this paper shows a great path to implementation.
Hybridizing Personal and Impersonal Machine Learning Models for Activity Recognition on Mobile Devices
Abstract: Recognition of human activities, using smart phones and wearable devices, has attracted much attention recently. The machine learning (ML) approach to human activity recognition can broadly be classified into two categories: training an ML model on (i) an impersonal dataset or (ii) a personal dataset. Previous research shows that models learned from personal datasets can provide better activity recognition accuracy compared to models trained on impersonal datasets. In this paper, we develop a hybrid incremental (HI) method with logistic regression models. This method uses incremental learning of logistic regression to combine the advantages of the impersonal and personal approaches. We investigate two essential issues for this method, which are the selection of the learning rate schedule and the class imbalance problem. Our experiments show that the models learned using our HI method give better accuracy than the models learned from personal or impersonal data only. Besides, the techniques of adaptive learning rate and cost-sensitive learning generally give faster updates and more robust ML models in incremental learning. Our method also has potential benefits in the area of privacy preservation.
Conclusions: In this paper, we propose a novel hybrid incremental (HI) method for activity recognition. Traditionally, activity recognition models have been trained on either impersonal or personal datasets. Our HI method effectively combines the advantages of these two approaches. After learning a model on an impersonal dataset in servers, the mobile devices can apply incremental learning on the model using personal data. We focus on logistic regression due to its several benefits, including its small model size that saves bandwidth, good performance in activity recognition, and easy incremental update. We address two important problems that are likely to arise in practical implementations of this incremental learning task. The first problem is associated with user diversity, making it very difficult to tune the learning-rate for each user. The second issue is related to personal data being so imbalanced at times that it may spoil the impersonal model. To overcome those problems, we applied an adaptive learning rate and a cost-sensitive technique. Finally, experimental results are used to validate our solutions.
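The general shape of such an update can be sketched as follows. This is not the authors' algorithm: it assumes an AdaGrad-style per-weight learning rate as the adaptive schedule and a simple per-class cost multiplier as the cost-sensitive technique, and the starting weights, features, and costs are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, features):
    """Probability of class 1; weights[0] is the bias term."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)

def incremental_update(weights, grad_sq, features, label, class_cost, base_rate=0.5):
    """One cost-sensitive gradient step with AdaGrad-style scaling (in place).

    grad_sq accumulates squared gradients per weight, so each weight gets its
    own shrinking learning rate and no per-user tuning is needed (assumption).
    """
    error = class_cost[label] * (label - predict_proba(weights, features))
    inputs = [1.0] + list(features)
    for j, x in enumerate(inputs):
        g = error * x
        grad_sq[j] += g * g
        weights[j] += base_rate * g / (math.sqrt(grad_sq[j]) + 1e-8)

# Hypothetical impersonal model, as if trained on a server and shipped to the device.
weights = [0.0, 1.0, -1.0]
grad_sq = [0.0] * len(weights)

# Hypothetical imbalanced personal stream: class 1 is rare for this user,
# so its errors are weighted more heavily than those of class 0.
class_cost = {0: 1.0, 1: 3.0}
personal_stream = [
    ([0.8, 0.1], 0), ([0.7, 0.2], 0), ([0.9, 0.1], 0),
    ([0.2, 0.9], 1), ([0.8, 0.2], 0), ([0.3, 0.8], 1),
]

before = predict_proba(weights, [0.2, 0.9])
for features, label in personal_stream:
    incremental_update(weights, grad_sq, features, label, class_cost)
after = predict_proba(weights, [0.2, 0.9])
print(before, after)
```

Only the weight vector and the gradient accumulator need to live on the device, which fits the small-model argument made above for logistic regression.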