Driver behavior profiling: An investigation with different smartphone sensors and machine learning

Abstract: Driver behavior impacts traffic safety, fuel/energy consumption and gas emissions. Driver behavior profiling tries to understand and positively impact driver behavior. Usually, driver behavior profiling tasks involve the automated collection of driving data and the application of computer models to generate a classification that characterizes the driver aggressiveness profile. Different sensors and classification methods have been employed in this task; however, low-cost solutions and high performance are still research targets. This paper presents an investigation with different Android smartphone sensors and classification algorithms in order to assess which sensor/method assembly enables classification with higher performance. The results show that specific combinations of sensors and intelligent methods allow classification performance improvement.
Results: We executed all combinations of the 4 MLAs and their configurations described in Table 1 over the 15 data sets described in Section 4.3 using 5 different nf values. We trained, tested, and assessed every evaluation assembly with 15 different random seeds. Finally, we calculated the mean AUC for these executions, grouped them by driving event type, and ranked the 5 best performing assemblies in the boxplot displayed in Fig 6. This figure shows the driving events on the left-hand side and the 5 best evaluation assemblies for each event on the right-hand side, with the best ones at the bottom. The assembly text identification in Fig 6 encodes, in this order: (i) the nf value; (ii) the sensor and its axis (if there is no axis indication, then all sensor axes are used); and (iii) the MLA and its configuration identifier.
Conclusions and future work: In this work we presented a quantitative evaluation of the performances of 4 MLAs (BN, MLP, RF, and SVM) with different configurations applied to the detection of 7 driving event types using data collected from 4 Android smartphone sensors (accelerometer, linear acceleration, magnetometer, and gyroscope). We collected 69 samples of these event types in a real-world experiment with 2 drivers. The start and end times of these events were recorded to serve as the experiment's ground truth. We also compared the performances when applying different sliding time window sizes.
We performed 15 executions with different random seeds of 3865 evaluation assemblies of the form EA = {1:sensor, 2:sensor axis(es), 3:MLA, 4:MLA configuration, 5:number of frames in sliding window}. As a result, we found the top 5 performing assemblies for each driving event type. In the context of our experiment, these results show that (i) bigger window sizes perform better; (ii) the gyroscope and the accelerometer are the best sensors to detect our driving events; (iii) as a general rule, using all sensor axes performs better than using a single one, except for aggressive left turn events; (iv) RF is by far the best performing MLA, followed by MLP; and (v) the performance of the top 35 combinations is both satisfactory and equivalent, varying from 0.980 to 0.999 in mean AUC.
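For readers who want to reproduce this kind of grid search, the sketch below enumerates hypothetical evaluation assemblies (sensor, window size nf, classifier) and ranks them by mean AUC over repeated random seeds, mirroring the procedure described above. The sensor streams, the window-feature function, and the restriction to a random forest classifier are illustrative assumptions, not the paper's code or data.

```python
# Hypothetical sketch: rank "evaluation assemblies" (sensor, window size, model)
# by mean AUC over several random seeds. Data and feature extraction are
# placeholders; only the ranking logic mirrors the procedure described above.
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def window_features(signal, labels, nf):
    """Aggregate a (n_samples, n_axes) signal into sliding windows of nf frames."""
    feats, ys = [], []
    for start in range(0, len(signal) - nf, nf):
        w = signal[start:start + nf]
        feats.append(np.concatenate([w.mean(0), w.std(0), w.min(0), w.max(0)]))
        ys.append(labels[start + nf - 1])          # label of the window's last frame
    return np.array(feats), np.array(ys)

rng = np.random.default_rng(0)
sensors = {                                        # placeholder streams, one per sensor
    "gyroscope": rng.normal(size=(3000, 3)),
    "accelerometer": rng.normal(size=(3000, 3)),
}
event_labels = rng.integers(0, 2, size=3000)       # 1 = aggressive-event frame

results = []
for (name, stream), nf in product(sensors.items(), [4, 8, 16]):
    X, y = window_features(stream, event_labels, nf)
    aucs = []
    for seed in range(15):                          # 15 random seeds, as in the study
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
        clf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
        aucs.append(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
    results.append((np.mean(aucs), nf, name))

for mean_auc, nf, name in sorted(results, reverse=True)[:5]:
    print(f"nf={nf:2d}  sensor={name:13s}  mean AUC={mean_auc:.3f}")
```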
As future work, we expect to collect a greater number of driving event samples using different vehicles, Android smartphone models, road conditions, weather, and temperatures. We also expect to add more MLAs to our evaluation, including those based on fuzzy logic and DTW. Finally, we intend to use the best evaluation assemblies observed in this work to develop an Android smartphone application which can detect driving events in real time and calculate the driver behavior profile.
Driver behavior profiling: An investigation with different smartphone sensors and machine learning

Study of Engineered Features and Learning Features in Machine Learning – A Case Study in Document Classification

Abstract: Document classification is challenging due to the handling of voluminous and highly non-linear data, generated exponentially in the era of digitization. Proper representation of documents increases the efficiency and performance of classification, the ultimate goal being the retrieval of information from a large corpus. Deep neural network models learn features for document classification, unlike engineered-feature-based approaches where features are extracted or selected from the data. In this paper we investigate the performance of different classifiers based on the features obtained using the two approaches. We apply a deep autoencoder for learning features, while engineered features are extracted by exploiting semantic association within the terms of the documents. Experimentally it has been observed that learned-feature-based classification always performs better than the proposed engineered-feature-based classifiers.

Conclusion and Future Work: In this paper we emphasize the importance of feature representation for classification. The potential of deep learning in the feature extraction process for efficient compression and representation of raw features is explored. By conducting multiple experiments we deduce that a DBN – Deep AE feature extractor combined with a DNNC outperforms most other techniques, providing a trade-off between accuracy and execution time. In this paper we have dealt with the most significant feature extraction and classification techniques for text documents where each text document belongs to a single class label. With the explosion of digital information, a large number of documents may belong to multiple class labels, the handling of which is a new challenge and a scope for future work. Word2vec models [18] in association with Recurrent Neural Networks (RNN) [4,14] have recently started gaining popularity in the feature representation domain. We would like to compare their performance with our deep learning method in the future. Similar feature extraction techniques can also be applied to image data to generate compressed features which can facilitate efficient classification. We would also like to explore such possibilities in our future work.
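As a rough illustration of the learned-feature approach discussed above, the sketch below trains a small dense autoencoder on bag-of-words document vectors and uses its encoder output as features for a downstream classifier. It is a minimal stand-in written in PyTorch; the paper's DBN-pretrained deep autoencoder and DNNC, and the layer sizes used here, are not taken from the text.

```python
# Minimal dense autoencoder for learning compressed document features (a stand-in
# sketch; the paper's DBN-pretrained deep autoencoder and DNNC are not reproduced here).
import torch
import torch.nn as nn

class DocAutoencoder(nn.Module):
    def __init__(self, n_terms=5000, n_features=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_terms, 512), nn.ReLU(),
            nn.Linear(512, n_features), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_features, 512), nn.ReLU(),
            nn.Linear(512, n_terms),
        )

    def forward(self, x):
        z = self.encoder(x)          # learned features used later by a classifier
        return self.decoder(z)

model = DocAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

X = torch.rand(256, 5000)            # placeholder tf-idf document vectors
for epoch in range(10):
    opt.zero_grad()
    loss = loss_fn(model(X), X)      # reconstruct the input
    loss.backward()
    opt.step()

features = model.encoder(X).detach() # compressed representation fed to a classifier
```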

Study of Engineered Features and Learning Features in Machine Learning – A Case Study in Document Classification

Machine Learning Methods to Predict Diabetes Complications

Abstract: One of the areas where Artificial Intelligence is having the greatest impact is machine learning, which develops algorithms able to learn patterns and decision rules from data. Machine learning algorithms have been embedded into data mining pipelines, which can combine them with classical statistical strategies, to extract knowledge from data. Within the EU-funded MOSAIC project, a data mining pipeline has been used to derive a set of predictive models of type 2 diabetes mellitus (T2DM) complications based on electronic health record data of nearly one thousand patients. Such a pipeline comprises clinical center profiling, predictive model targeting, predictive model construction and model validation. After having dealt with missing data by means of random forest (RF) and having applied suitable strategies to handle class imbalance, we used logistic regression with stepwise feature selection to predict the onset of retinopathy, neuropathy, or nephropathy, at different time scenarios, at 3, 5, and 7 years from the first visit at the Hospital Center for Diabetes (not from the diagnosis). Considered variables are gender, age, time from diagnosis, body mass index (BMI), glycated hemoglobin (HbA1c), hypertension, and smoking habit. Final models, tailored in accordance with the complications, provided an accuracy of up to 0.838. Different variables were selected for each complication and time scenario, leading to specialized models that are easy to translate to clinical practice.
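The pipeline stages named in the abstract can be sketched with scikit-learn as below: random-forest-based imputation, class weighting for imbalance, and a greedy forward selector standing in for classical stepwise feature selection over a logistic regression model. The synthetic data, feature count, and specific estimator settings are assumptions for illustration only.

```python
# Sketch of the pipeline stages named above: random-forest imputation, class
# weighting for imbalance, and greedy (stepwise-like) feature selection for
# logistic regression. Data and column meanings are placeholders, not MOSAIC data.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 7))                 # gender, age, time from diagnosis, BMI, HbA1c, ...
X[rng.random(X.shape) < 0.1] = np.nan          # simulate missing values
y = rng.integers(0, 2, size=1000)              # onset of a complication at a given horizon

model = Pipeline([
    ("impute", IterativeImputer(estimator=RandomForestRegressor(n_estimators=50,
                                                                random_state=0))),
    ("select", SequentialFeatureSelector(
        LogisticRegression(max_iter=1000, class_weight="balanced"),
        n_features_to_select=4, direction="forward")),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
model.fit(X, y)
print("accuracy:", model.score(X, y))
```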

Conclusions: This work shows how data mining and computational methods can be effectively adopted in clinical medicine to derive models that use patient-specific information to predict an outcome of interest. Predictive data mining methods may be applied to the construction of decision models for procedures such as prognosis, diagnosis and treatment planning, which—once evaluated and verified—may be embedded within clinical information systems. Developing predictive models for the onset of chronic microvascular complications in patients suffering from T2DM could contribute to evaluating the relation between exposure to individual factors and the risk of onset of a specific complication, to stratifying the patients’ population in a medical center with respect to this risk, and to developing tools for the support of clinical informed decisions in patients’ treatment.

Machine Learning Methods to Predict Diabetes Complications

Reliability Estimation of Individual Multi-target Regression Predictions

Abstract: To estimate the quality of an induced predictive model we generally use measures of averaged prediction accuracy, such as the relative mean squared error on test data. Such evaluation fails to provide local information about the reliability of individual predictions, which can be important in risk-sensitive fields (medicine, finance, industry, etc.). Related work has presented several ways of computing individual prediction reliability estimates for single-target regression models, but has not considered their use with multi-target regression models that predict a vector of independent target variables. In this paper we adapt the existing single-target reliability estimates to multi-target models, aiming to design reliability estimates that can estimate the prediction errors without knowing the true prediction errors for multi-target regression algorithms as well. We approach this in two ways: by aggregating reliability estimates for individual target components, and by generalizing the existing reliability estimates to a higher number of dimensions. The results revealed favorable performance of the reliability estimates that are based on the bagging variance and local cross-validation approaches. The results are consistent with related work on single-target reliability estimates and provide support for multi-target decision making.

Conclusion: In this paper we proposed several approaches for estimating the reliabilities of individual multi-target regression predictions. The aggregated variants (AM, l2 and +) produce a single-valued estimate, which is preferable for interpretation and comparison. The last variant (+) is a direct generalization of the single-target estimators from the related work. Our evaluation showed that the best results were achieved using the BAGV and the LCV reliability estimates regardless of the estimate variant. This complies with the related work on single-target predictions, where these two estimates also performed well. Although all of the proposed variants achieve comparable results, our proposed generalization of existing methods (+) is still the preferred variant due to its lower computational complexity (as estimates are only calculated once for all of the target attributes) and its solid theoretical background. In further work we intend to additionally evaluate other reliability estimates in combination with several other regression models. We also plan to test the adaptation of the proposed methods to multi-target classification. Reliability estimation of individual predictions offers many advantages, especially when making decisions in highly sensitive environments. Our work provides effective support for model-independent multi-target regression.
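A minimal sketch of the bagging-variance (BAGV-style) reliability estimate discussed above is given below: the variance of bootstrap-model predictions is computed per target and then aggregated by averaging (an AM-style variant). The data, the choice of a decision tree as base learner, and the number of bags are illustrative assumptions.

```python
# Sketch of a BAGV-style reliability estimate: variance of bootstrap-model
# predictions, aggregated over targets. Data and the base learner are placeholders.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagv_reliability(X_train, Y_train, X_new, n_bags=50, seed=0):
    """Return one reliability score per example in X_new (higher = less reliable)."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = []                                      # shape: (n_bags, n_new, n_targets)
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)            # bootstrap sample
        tree = DecisionTreeRegressor(random_state=0).fit(X_train[idx], Y_train[idx])
        preds.append(tree.predict(X_new))
    preds = np.stack(preds)
    per_target_var = preds.var(axis=0)              # variance across bagged models
    return per_target_var.mean(axis=1)              # aggregate over targets (AM variant)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
Y = np.column_stack([X[:, 0] + rng.normal(0, 0.1, 300),     # two synthetic targets
                     X[:, 1] - X[:, 2] + rng.normal(0, 0.1, 300)])
scores = bagv_reliability(X[:250], Y[:250], X[250:])
print("least reliable test examples:", np.argsort(scores)[-5:])
```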

Reliability Estimation of Individual Multi-target Regression Predictions

The Impact of Random Models on Clustering Similarity

Abstract: Clustering is a central approach for unsupervised learning. After clustering is applied, the most fundamental analysis is to quantitatively compare clusterings. Such comparisons are crucial for the evaluation of clustering methods as well as other tasks such as consensus clustering. It is often argued that, in order to establish a baseline, clustering similarity should be assessed in the context of a random ensemble of clusterings. The prevailing assumption for the random clustering ensemble is the permutation model in which the number and sizes of clusters are fixed. However, this assumption does not necessarily hold in practice; for example, multiple runs of K-means clustering return clusterings with a fixed number of clusters, while the cluster size distribution varies greatly. Here, we derive corrected variants of two clustering similarity measures (the Rand index and Mutual Information) in the context of two random clustering ensembles in which the number and sizes of clusters vary. In addition, we study the impact of one-sided comparisons in the scenario with a reference clustering. The consequences of different random models are illustrated using synthetic examples, handwriting recognition, and gene expression data. We demonstrate that the choice of random model can have a drastic impact on the ranking of similar clustering pairs, and the evaluation of a clustering method with respect to a random baseline; thus, the choice of random clustering model should be carefully justified.
Discussion: Given the prevalence of clustering methods for analyzing data, clustering comparison is a fundamental problem that is pertinent to numerous areas of science. In particular, the correction of clustering similarity for chance serves to establish a baseline that facilitates comparisons between different clustering solutions. Expanding previous studies on the selection of an appropriate model for random clusterings (Meila, 2005; Vinh et al., 2009; Romano et al., 2016), our work provides an extensive summary of random models and clearly demonstrates the strong impact of the random model on the interpretation of clustering results.
Our results underpin the importance of selecting the appropriate random model for a given context. To that end, we offer the following guidelines: 1. Consider what is fixed by the clustering method: do all clusterings have a user-specified number of clusters (use Mnum), or is the cluster size sequence fixed (use Mperm)? 2. Is the comparison against a reference clustering (use a one-sided comparison), or are you comparing two derived clusterings (then use a two-sided comparison)? The specific comparisons studied here are not meant to establish the superiority of a particular clustering identification technique or a specific random clustering model; rather, they illustrate the importance of the choice of the random model. Crucially, conclusions based on corrected similarity measures can change depending on the random model for clusterings. Therefore, previous studies which promoted methods based on evidence from corrected similarity measures should be re-evaluated in the context of the appropriate random model for clusterings (Yeung et al., 2001; de Souto et al., 2008; Yeung and Ruzzo, 2001; Thalamuthu et al., 2006; McNicholas and Murphy, 2010).
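To make the idea of correcting for chance concrete, the sketch below applies the generic correction (S - E[S]) / (S_max - E[S]) to the Rand index, estimating E[S] by Monte Carlo under a one-sided random model that fixes only the number of clusters (an Mnum-like baseline). This only illustrates the concept; the paper derives closed-form corrections rather than simulating them.

```python
# Sketch: correct the Rand index for chance, (S - E[S]) / (1 - E[S]), where E[S]
# is estimated by Monte Carlo under a random model that fixes only the number of
# clusters (a one-sided, Mnum-like baseline). Illustrative only.
import numpy as np
from sklearn.metrics import rand_score

def corrected_rand_one_sided(reference, candidate, n_clusters, n_samples=200, seed=0):
    rng = np.random.default_rng(seed)
    s = rand_score(reference, candidate)
    # expected Rand index against random clusterings with n_clusters labels
    expected = np.mean([
        rand_score(reference, rng.integers(0, n_clusters, size=len(reference)))
        for _ in range(n_samples)
    ])
    return (s - expected) / (1.0 - expected)

reference = np.repeat([0, 1, 2], 50)               # "true" clustering of 150 points
candidate = reference.copy()
candidate[:10] = 1                                 # a slightly perturbed clustering
print(corrected_rand_one_sided(reference, candidate, n_clusters=3))
```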
The Impact of Random Models on Clustering Similarity

Learning Scalable Deep Kernels with Recurrent Structure

Abstract: Many applications in speech, robotics, finance, and biology deal with sequential data, where ordering matters and recurrent structures are common. However, this structure cannot be easily captured by standard kernel functions. To model such structure, we propose expressive closed-form kernel functions for Gaussian processes. The resulting model, GP-LSTM, fully encapsulates the inductive biases of long short-term memory (LSTM) recurrent networks, while retaining the non-parametric probabilistic advantages of Gaussian processes. We learn the properties of the proposed kernels by optimizing the Gaussian process marginal likelihood using a new provably convergent semi-stochastic gradient procedure, and exploit the structure of these kernels for scalable training and prediction. This approach provides a practical representation for Bayesian LSTMs. We demonstrate state-of-the-art performance on several benchmarks, and thoroughly investigate a consequential autonomous driving application, where the predictive uncertainties provided by GP-LSTM are uniquely valuable.
Discussion: We proposed a method for learning kernels with recurrent long short-term memory structure on sequences. Gaussian processes with such kernels, termed the GP-LSTM, have the structure and learning biases of LSTMs, while retaining a probabilistic Bayesian nonparametric representation. The GP-LSTM outperforms a range of alternatives on several sequence-to-reals regression tasks. The GP-LSTM also works on data with low and high signal-to-noise ratios, and can be scaled to very large datasets, all with a straightforward, practical, and generally applicable model specification. Moreover, the semi-stochastic scheme proposed in our paper is provably convergent and efficient in practical settings, in conjunction with structure-exploiting algebra. In short, the GP-LSTM provides a natural mechanism for Bayesian LSTMs, quantifying predictive uncertainty while harmonizing with the standard deep learning toolbox. Predictive uncertainty is of high value in robotics applications, such as autonomous driving, and could also be applied to other areas such as financial modeling and computational biology.
Learning Scalable Deep Kernels with Recurrent Structure

Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers

Abstract: There is a large literature explaining why AdaBoost is a successful classifier. The literature on AdaBoost focuses on classifier margins and boosting's interpretation as the optimization of an exponential likelihood function. These existing explanations, however, have been pointed out to be incomplete. A random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, random forests is a self-averaging, interpolating algorithm which creates what we denote as a spiked-smooth classifier, and we view AdaBoost in the same light. We conjecture that both AdaBoost and random forests succeed because of this mechanism. We provide a number of examples to support this explanation. In the process, we question the conventional wisdom that suggests that boosting algorithms for classification require regularization or early stopping and should be limited to low complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees, without regularization or early stopping.
Concluding Remarks: AdaBoost is an undeniably successful algorithm and random forests is at least as good, if not better. But AdaBoost is as puzzling as it is successful; it broke the basic rules of statistics by iteratively fitting even noisy data sets until every training set data point was fit without error. Even more puzzling, to statisticians at least, continuing to iterate an already perfectly fit model further lowers generalization error. The statistical view of boosting understands AdaBoost to be a stage-wise optimization of an exponential loss, which suggests (demands!) regularization of tree size and control of the number of iterations.
In contrast, a random forest is not an optimization; it appears to work best with large trees and as many iterations as possible. It is widely believed that AdaBoost is effective because it is an optimization, while random forests works simply because it works. Breiman conjectured that "it is my belief that in its later stages AdaBoost is emulating a random forest" (Breiman, 2001). This paper sheds some light on this conjecture by providing a novel intuition, supported by examples, to show how AdaBoost and random forests are successful for the same reason.
A random forests model is a weighted ensemble of interpolating classifiers by construction. Although it is much less evident, we have shown that AdaBoost is also a weighted ensemble of interpolating classifiers. Viewed in this way, AdaBoost is actually a "random" forest of forests. The trees in random forests and the forests in AdaBoost each interpolate the data without error. As the number of iterations increases, the averaged decision surface becomes smoother but nevertheless still interpolates. This is accomplished by whittling down the decision boundary around error points. We hope to have cast doubt on the commonly held belief that the later iterations of AdaBoost only serve to overfit the data. Instead, we argue that these later iterations lead to an "averaging effect", which causes AdaBoost to behave like a random forest.
A central part of our discussion also focused on the merits of interpolation of the training data, when coupled with averaging. Again, we hope to dispel the commonly held belief that interpolation always leads to overfitting. We have argued instead that fitting the training data in extremely local neighborhoods actually serves to prevent overfitting in the presence of averaging. The local fits serve to prevent noise points from having undue influence over the fit in other areas. Random forests and AdaBoost both achieve this desirable level of local interpolation by fitting deep trees. It is our hope that our emphasis on the "self-averaging" and interpolating aspects of AdaBoost will lead to a broader discussion of this classifier's success that extends beyond the more traditional emphasis on margins and exponential loss minimization.
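The recommendation above is easy to try: the sketch below contrasts AdaBoost over decision stumps with AdaBoost over deep, interpolating trees on a noisy synthetic problem. The dataset and hyperparameters are illustrative, not taken from the paper's experiments.

```python
# Sketch: AdaBoost with stumps vs. AdaBoost with deep, interpolating trees,
# echoing the recommendation above. Dataset and settings are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)

# NOTE: the keyword is `base_estimator` in scikit-learn versions before 1.2.
stumps = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                            n_estimators=500, random_state=0)
deep = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=None),
                          n_estimators=500, random_state=0)

print("stumps:    ", cross_val_score(stumps, X, y, cv=5).mean())
print("deep trees:", cross_val_score(deep, X, y, cv=5).mean())
```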
Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers

Time Series Prediction with the Self-Organizing Map: A Review

Summary: We provide a comprehensive and updated survey on applications of Kohonen's self-organizing map (SOM) to time series prediction (TSP). The main goal of the paper is to show that, despite being originally designed as an unsupervised learning algorithm, the SOM is flexible enough to give rise to a number of efficient supervised neural architectures devoted to TSP tasks. For each SOM-based architecture presented, we report its algorithm implementation in detail. Similarities and differences of such SOM-based TSP models with respect to standard linear and nonlinear TSP techniques are also highlighted. We conclude the paper with indications of possible directions for further research in this field.
Conclusion: In this paper we reviewed several applications of Kohonen's SOM-based models to time series prediction. Our main goal was to show that the SOM can perform efficiently in this task and can compete equally with well-known neural architectures, such as MLP and RBF networks, which are more commonly used. In this sense, the main advantages of SOM-based models over MLP- or RBF-based models are the inherent local modeling property, which favors the interpretability of the results, and the ease of developing growing architectures, which alleviates the burden of specifying an adequate number of neurons (prototype vectors).
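A simplified VQTAM-style sketch of SOM-based time-series prediction is given below: each neuron stores a lag-vector prototype together with an associated output value, the winner is selected on the input part only, and both parts are updated with the usual neighborhood rule. The series, lag order, and map size are illustrative assumptions, and this is only one of the family of architectures the survey covers.

```python
# Simplified VQTAM-style sketch: a 1-D SOM whose neurons store a lag-vector
# prototype plus an associated output value. The winner is chosen on the input
# part only; both parts are updated with the usual neighborhood rule.
import numpy as np

def train_vqtam(X, y, n_neurons=20, n_epochs=30, lr0=0.5, sigma0=3.0, seed=0):
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(n_neurons, X.shape[1]))   # input prototypes
    W_out = rng.normal(size=n_neurons)                # associated outputs
    pos = np.arange(n_neurons)
    T = n_epochs * len(X)
    t = 0
    for _ in range(n_epochs):
        for x, target in zip(X, y):
            lr = lr0 * (1 - t / T)
            sigma = max(sigma0 * (1 - t / T), 0.5)
            bmu = np.argmin(np.linalg.norm(W_in - x, axis=1))    # match on input only
            h = np.exp(-((pos - bmu) ** 2) / (2 * sigma ** 2))   # neighborhood weights
            W_in += lr * h[:, None] * (x - W_in)
            W_out += lr * h * (target - W_out)
            t += 1
    return W_in, W_out

def predict_vqtam(W_in, W_out, X):
    return np.array([W_out[np.argmin(np.linalg.norm(W_in - x, axis=1))] for x in X])

# one-step-ahead prediction of a noisy sine from its 3 most recent values
series = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.05 * np.random.default_rng(1).normal(size=2000)
lags = 3
X = np.array([series[i:i + lags] for i in range(len(series) - lags)])
y = series[lags:]
W_in, W_out = train_vqtam(X[:1500], y[:1500])
pred = predict_vqtam(W_in, W_out, X[1500:])
print("test MSE:", np.mean((pred - y[1500:]) ** 2))
```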
Time Series Prediction with the Self-Organizing Map: A Review

Applying deep learning to classify pornographic images and videos

Abstract: It is no secret that pornographic material is now one click away from everyone, including children and minors. General social media networks are striving to isolate adult images and videos from normal ones. Intelligent image analysis methods can help to automatically detect and isolate questionable images in media. Unfortunately, these methods require vast experience to design the classifier, including one or more of the popular computer vision feature descriptors. We propose to build a classifier based on one of the recently flourishing deep learning techniques. Convolutional neural networks contain many layers for both automatic feature extraction and classification. The benefit is an easier system to build (no need for hand-crafting features and classifiers). Additionally, our experiments show that it is even more accurate than the state-of-the-art methods on the most recent benchmark dataset.
Conclusions: We proposed applying convolutional neural networks to automatically classify pornographic images and videos. We showed that our proposed fully automated solution outperformed the accuracy of hand-crafted feature descriptor solutions. We are continuing our research to find an even better network architecture for this problem. Nevertheless, all the successful applications so far rely on supervised training methods. We expect a new wave of deep learning networks would emerge by combining supervised and unsupervised methods, where a network can learn from its mistakes while in actual deployment. We believe further research can also be directed toward allowing machines to consider the context and overall rhetorical meaning of a video clip while relating it to the images involved.
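For orientation, the sketch below shows a small convolutional network of the general kind described above, with stacked convolution/pooling layers for feature extraction followed by fully connected layers for binary classification. The architecture, input resolution, and data are placeholders, not the network used in the paper.

```python
# Compact sketch of a CNN binary image classifier of the kind described above.
# The layer sizes and 64x64 RGB input are illustrative, not the paper's network.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16 -> 8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, 2),                       # questionable vs. benign
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SmallCNN()
images = torch.rand(8, 3, 64, 64)                    # placeholder batch
labels = torch.randint(0, 2, (8,))
loss = nn.CrossEntropyLoss()(model(images), labels)
loss.backward()
print(loss.item())
```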
Applying deep learning to classify pornographic images and videos

Exploration of machine learning techniques in predicting multiple sclerosis disease course

Abstract
Objective: To explore the value of machine learning methods for predicting multiple sclerosis disease course.
Methods: 1693 CLIMB study patients were classified as increased EDSS≥1.5 (worsening) or not (non-worsening) at up to five years after baseline visit. Support vector machines (SVM) were used to build the classifier, and compared to logistic regression (LR) using demographic, clinical and MRI data obtained at years one and two to predict EDSS at five years follow-up.
Results: Baseline data alone provided little predictive value. Clinical observation for one year improved overall SVM sensitivity to 62% and specificity to 65% in predicting worsening cases. The addition of one year of MRI data improved sensitivity to 71% and specificity to 68%. Use of non-uniform misclassification costs in the SVM model, weighting towards increased sensitivity, improved predictions (up to 86%). Sensitivity, specificity, and overall accuracy improved minimally with additional follow-up data. Predictions improved within specific groups defined by baseline EDSS. LR performed more poorly than SVM in most cases. Race, family history of MS, and brain parenchymal fraction ranked highly as predictors of the non-worsening group. Brain T2 lesion volume ranked highly as predictive of the worsening group.
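The cost-weighted SVM idea from the Results can be sketched as below: scikit-learn's class_weight parameter penalizes missed worsening cases more heavily, and sensitivity/specificity are read off the confusion matrix. The synthetic data and the 3:1 weight are illustrative assumptions, not the study's settings.

```python
# Sketch: SVM with non-uniform misclassification costs (heavier penalty on missing
# "worsening" cases) and sensitivity/specificity computation. Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, n_features=25, weights=[0.7, 0.3],
                           random_state=0)          # 1 = worsening (minority class)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# penalize missed worsening cases 3x more than false alarms (illustrative weight)
svm = SVC(kernel="rbf", class_weight={0: 1, 1: 3}).fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, svm.predict(X_te)).ravel()
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
```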
Exploration of machine learning techniques in predicting multiple sclerosis disease course

When do traumatic experiences alter risk-taking behavior? A machine learning analysis of reports from refugees

Abstract: Exposure to traumatic stressors and subsequent trauma-related mental changes may alter a person’s risk-taking behavior. It is unclear whether this relationship depends on the specific types of traumatic experiences. Moreover, the association has never been tested in displaced individuals with substantial levels of traumatic experiences. The present study assessed risk-taking behavior in 56 displaced individuals by means of the balloon analogue risk task (BART). Exposure to traumatic events, symptoms of posttraumatic stress disorder and depression were assessed by means of semi-structured interviews. Using a novel statistical approach (stochastic gradient boosting machines), we analyzed predictors of risk-taking behavior. Exposure to organized violence was associated with less risk-taking, as indicated by fewer adjusted pumps in the BART, as was the reported experience of physical abuse and neglect, emotional abuse, and peer violence in childhood. However, civil traumatic stressors, as well as other events during childhood were associated with lower risk taking. This suggests that the association between global risk-taking behavior and exposure to traumatic stress depends on the particular type of the stressors that have been experienced.
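A rough analogue of the analysis described above is sketched below: a stochastic gradient boosting regressor (subsample < 1) predicts an adjusted-pumps outcome from trauma-exposure predictors, and permutation importance ranks the predictors. The predictor names, data, and hyperparameters are placeholders, not the study's variables.

```python
# Sketch: stochastic gradient boosting (subsample < 1) predicting adjusted pumps in
# the BART from trauma-exposure predictors, plus permutation importances. The
# predictor names and data are placeholders, not the study's data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
predictors = ["organized_violence", "physical_abuse", "emotional_abuse",
              "peer_violence", "civil_stressors", "pss_severity"]
X = rng.normal(size=(56, len(predictors)))
y = 33 - 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 5, 56)   # adjusted pumps (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
gbm = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                subsample=0.7, random_state=0).fit(X_tr, y_tr)

imp = permutation_importance(gbm, X_te, y_te, n_repeats=30, random_state=0)
for name, score in sorted(zip(predictors, imp.importances_mean), key=lambda p: -p[1]):
    print(f"{name:20s} {score:.3f}")
```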
Results: All participants had experienced a minimum of one traumatic event, and the overall majority of 93 percent had been exposed to various forms and frequencies of organized violence. The mean exposure to types of torture and war events (vivo checklist) was 7.9 (SD = 6.5, median = 5); the mean exposure in the PSS-I event checklist was 3.3 (SD = 1.4, median = 3).
Childhood maltreatment measured by the KERF was generally high and had been experienced by 94% of participants, but the types presented a very heterogeneous pattern. Physical abuse was most common (85%; mean = 7.9, SD = 5.9, median = 6.6), followed by emotional abuse (65%; mean = 4.8, SD = 4.8, median = 3.3). Peer violence, emotional neglect, and physical neglect were experienced by half of the participants (54%, 52%, and 50%, respectively; mean peer violence = 3.8, SD = 3.9, median = 3.3; mean emotional neglect = 3.1, SD = 3.5, median = 3.3; mean physical neglect = 2.3, SD = 2.7, median = 1.7). The least frequent adverse experiences during childhood were witnessing an event (37%, mean = 2.5, SD = 3.7, median = 0) and sexual abuse (17%, mean = .4, SD = 1.4, median = 0).
Regarding PTSD diagnosis, 55% fulfilled criteria according to DSM-IV (PSS-I mean = 16.4, SD = 13.3, median = 18). The mean score in the PHQ-9 was 10.9 (SD = 7.8, median = 10), indicating a mild to intermediate severity of depression symptoms.
Risk behavior as measured by the BART had a large range between 3.1 and 77.5 adjusted pumps. The mean number of adjusted pumps was 33.0 (SD = 18.9, median = 28.2).
Conclusions: Altogether, the current study suggests that the experience of organized violence versus domestic violence differentially impacts subsequent performance on a laboratory test for risk-taking behavior such as the BART. Further research with larger sample sizes is needed in order to clarify the specific associations between types of exposure to traumatic events and risk-taking behavior.
When do traumatic experiences alter risk-taking behavior? A machine learning analysis of reports from refugees

Ensemble machine learning and forecasting can achieve 99% uptime for rural handpumps

Abstract: Broken water pumps continue to impede efforts to deliver clean and economically-viable water to the global poor. The literature has demonstrated that customers’ health benefits and willingness to pay for clean water are best realized when clean water infrastructure performs extremely well (>99% uptime). In this paper, we used sensor data from 42 Afridev-brand handpumps observed for 14 months in western Kenya to demonstrate how sensors and supervised ensemble machine learning could be used to increase total fleet uptime from a best-practices baseline of about 70% to >99%. We accomplish this increase in uptime by forecasting pump failures and identifying existing failures very quickly. Comparing the costs of operating the pump per functional year over a lifetime of 10 years, we estimate that implementing this algorithm would save 7% on the levelized cost of water relative to a sensor-less scheduled maintenance program. Combined with a rigorous system for dispatching maintenance personnel, implementing this algorithm in a real-world program could significantly improve health outcomes and customers’ willingness to pay for water services.
Discussion: From a program viewpoint, implementers are primarily interested in increasing reliable and cost-effective water services. Fig 4 illustrates the trade-off between fleet uptime and dispatch responsiveness as a function of the number of model-initiated dispatches per pump-year. The figure is faceted by dispatch delays ranging from 1 to 21 days. There are two important insights visible in this figure. First, on a per-dispatch perspective, there is very little difference between current, forecast, and combined models. The current failure model typically performs slightly better on a per-dispatch basis (as a result of its higher positive predictive value). However, the most important difference in fleet uptime results from the implementing agency’s dispatch delay, and, to a lesser extent, the implementing agency’s capacity to perform many dispatches in a pump-year. The goal of 99% fleet uptime could be achieved with our machine learning model using just 2 dispatches per pump-year paired with a 1-day dispatch delay, or 22 dispatches per pump-year with a 7-day dispatch delay.
The marginal costs of implementing sensors, machine learning, and preventative maintenance activity are spread over the total utility that the equipment (a handpump in this case) delivers to customers over its lifetime. For this reason, there would be an even greater per-dollar benefit from implementing a sensor- and machine-learning-enabled preventative maintenance program on larger commercial assets such as motorized borehole pumping stations. While the cost of sensors and algorithms would not change significantly, the total benefit delivered to customers per functional pump-year would be greatly increased because of the larger pumping capacity of these stations.
In conclusion, the highly non-linear relationship between pump performance and health & economic outcomes illustrates that pumps need to perform extremely well before their benefits to society can be realized. This non-linear relationship also suggests that there is more consumer surplus to be gained by improving the function of existing pumps rather than building ever more new pumps that function only marginally well. This study has demonstrated that a machine-learning-enabled preventative maintenance model has the potential to enable fleets of handpumps that function extremely well by driving total fleet uptime to >99%, thus providing a realistic path forward towards reliable and sustained clean water delivery.
Ensemble machine learning and forecasting can achieve 99% uptime for rural handpumps

Nonparametric Risk Bounds for Time-Series Forecasting

Abstract: We derive generalization error bounds for traditional time-series forecasting models. Our results hold for many standard forecasting tools including autoregressive models, moving average models, and, more generally, linear state-space models. These non-asymptotic bounds need only weak assumptions on the data-generating process, yet allow forecasters to select among competing models and to guarantee, with high probability, that their chosen model will perform well. We motivate our techniques with and apply them to standard economic and financial forecasting tools—a GARCH model for predicting equity volatility and a dynamic stochastic general equilibrium model (DSGE), the standard tool in macroeconomic forecasting. We demonstrate in particular how our techniques can aid forecasters and policy makers in choosing models which behave well under uncertainty and mis-specification.
Conclusion: This paper demonstrates how to control the generalization error of common time-series forecasting models, especially those used in economics and engineering—ARMA models, vector autoregressions (Bayesian or otherwise), linearized dynamic stochastic general equilibrium models, and linear state-space models. We derive upper bounds on the risk, which hold with high probability while requiring only weak assumptions on the data-generating process. These bounds are finite sample in nature, unlike standard model selection penalties such as AIC or BIC. Furthermore, they do not suffer the biases inherent in other risk estimation techniques such as the pseudo-cross validation approach often used in the economic forecasting literature.
While we have stated these results in terms of standard economic forecasting models, they have very wide applicability. Theorem 12 applies to any forecasting procedure with fixed memory length, linear or non-linear. Theorem 17 applies only to methods whose forecasts are linear in the observations, but a similar result for nonlinear methods would just need to ensure that the dependence of the forecast on the past decays in some suitable way. Rather than deriving bounds theoretically, one could attempt to estimate bounds on the risk. While cross-validation is tricky (Racine, 2000), nonparametric bootstrap procedures may do better. A fully nonparametric version is possible, using the circular bootstrap (reviewed in Lahiri, 1999). Bootstrapping lengthy out-of-sample sequences for testing fitted model predictions yields intuitively sensible estimates of Rn(f), but there is currently no theory about the coverage level. Also, while models like VARs can be fit quickly to simulated data, general state-space models, let alone DSGEs, require large amounts of computational power, which is an obstacle to any resampling method.
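For orientation, a generic high-probability risk bound of the kind referred to above has the following shape; this is only the standard template of such results, not the statement of the paper's Theorem 12 or 17.

```latex
% Generic shape of a high-probability generalization bound (illustration only,
% not the paper's theorems): with probability at least 1 - \eta over the sample,
\[
  R(f) \;\le\; \widehat{R}_n(f) \;+\; 2\,\mathfrak{R}_n(\mathcal{F})
        \;+\; C\sqrt{\frac{\log(1/\eta)}{n_{\mathrm{eff}}}} ,
\]
% where \widehat{R}_n(f) is the empirical (training) risk, \mathfrak{R}_n(\mathcal{F})
% measures the complexity of the forecaster class, and n_{\mathrm{eff}} \le n is an
% effective sample size that shrinks as the temporal dependence in the data grows.
```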
Nonparametric Risk Bounds for Time-Series Forecasting

Predicting Risk of Suicide Attempts Over Time Through Machine Learning

Abstract: Traditional approaches to the prediction of suicide attempts have limited the accuracy and scale of risk detection for these dangerous behaviors. We sought to overcome these limitations by applying machine learning to electronic health records within a large medical database. Participants were 5,167 adult patients with a claim code for self-injury (i.e., ICD-9, E95x); expert review of records determined that 3,250 patients made a suicide attempt (i.e., cases), and 1,917 patients engaged in self-injury that was nonsuicidal, accidental, or nonverifiable (i.e., controls). We developed machine learning algorithms that accurately predicted future suicide attempts (AUC = 0.84, precision = 0.79, recall = 0.95, Brier score = 0.14). Moreover, accuracy improved from 720 days to 7 days before the suicide attempt, and predictor importance shifted across time. These findings represent a step toward accurate and scalable risk detection and provide insight into how suicide attempt risk shifts over time.
Discussion: Accurate and scalable methods of suicide attempt risk detection are an important part of efforts to reduce these behaviors on a large scale. In an effort to contribute to the development of one such method, we applied ML to EHR data. Our major findings included the following: (a) this method produced more accurate prediction of suicide attempts than traditional methods (e.g., ML produced AUCs in the 0.80s, traditional regression in the 0.50s and 0.60s, which also demonstrated wider confidence intervals/greater variance than the ML approach), with notable lead time (up to 2 years) prior to attempts; (b) model performance steadily improved as the suicide attempt became more imminent; (c) model performance was similar for single and repeat attempters; and (d) predictor importance within algorithms shifted over time. Here, we discuss each of these findings in more detail.

ML models performed with acceptable accuracy using structured EHR data mapped to known clinical terminologies like CMS-HCC and ATC, Level 5. Recent meta-analyses indicate that traditional suicide risk detection approaches produce near-chance accuracy (Franklin et al., 2017), and a traditional method—multiple logistic regression—produced similarly poor accuracy in the present study. ML to predict suicide attempts obtained greater discriminative accuracy than typically obtained with traditional approaches like logistic regression (i.e., AUC = 0.76; Kessler, Stein, et al., 2016). The present study extends this pioneering work with its use of a larger comparison group of self-injurers without suicidal intent, ability to display a temporally variant risk profile over time, scalability of this approach to any EHR data adhering to accepted clinical data standards, and performance in terms of discriminative accuracy (AUC = 0.84, 95% CI [0.83, 0.85]), precision, recall, and calibration (see Table 1). This approach can be readily applied within large medical databases to provide constantly updating risk assessments for millions of patients based on an outcome derived from expert review.

Although short-term risk and shifts in risk over time are often noted in clinical lore, risk guidelines, and suicide theories (e.g., O'Connor, 2011; Rudd et al., 2006; Wenzel & Beck, 2008), few studies have directly investigated these issues. The present study examined risk at several intervals from 720 to 7 days and found that model performance improved as suicide attempts became more imminent. This finding was consistent with hypotheses; however, two aspects of the present study should be considered when interpreting this finding. First, this pattern was confounded by the fact that more data were available naturally over time; predictive modeling efforts at the point of care should take advantage of this fact to improve model performance as additional data are collected. Second, due to the limitations of EHR data, we were unable to directly integrate information about potential precipitating events (e.g., job loss) or data not recorded in routine clinical care into the present models. Such information may have further improved short-term prediction of suicide attempts. Future studies should build on the present findings to further elucidate how risk changes as suicide attempts become more imminent.
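The discrimination and calibration metrics reported above (AUC, precision, recall, Brier score) can be computed as in the sketch below, here for a random forest trained on placeholder features with roughly the class balance described for the cohort. Nothing else about the study's models or data is assumed.

```python
# Sketch: compute the metrics reported above (AUC, precision, recall, Brier score)
# for a classifier trained on placeholder EHR-style features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5167, n_features=100, weights=[0.37, 0.63],
                           random_state=0)          # 1 = verified suicide attempt
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]
pred = (proba >= 0.5).astype(int)

print("AUC:      ", roc_auc_score(y_te, proba))
print("precision:", precision_score(y_te, pred))
print("recall:   ", recall_score(y_te, pred))
print("Brier:    ", brier_score_loss(y_te, proba))
```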
Predicting Risk of Suicide Attempts Over Time Through Machine Learning

Machine Learning Principles Can Improve Hip Fracture Prediction

Abstract: We apply machine learning principles to predict hip fractures and estimate predictor importance in dual-energy X-ray absorptiometry (DXA)-scanned men and women. DXA data from two Danish regions between 1996 and 2006 were combined with national Danish patient data to comprise 4722 women and 717 men with 5 years of follow-up time (original cohort n = 6606 men and women). Twenty-four statistical models were built on 75% of the data points through k = 5, 5-repeat cross-validation, and then validated on the remaining 25% of the data points to calculate the area under the curve (AUC) and calibrate probability estimates. The best models were retrained with restricted predictor subsets to estimate the best subsets. For women, bootstrap aggregated flexible discriminant analysis ("bagFDA") performed best with a test AUC of 0.92 [0.89; 0.94] and well-calibrated probabilities following Naïve Bayes adjustments. A "bagFDA" model limited to 11 predictors (among them bone mineral densities (BMD), biochemical glucose measurements, and general practitioner and dentist use) achieved a test AUC of 0.91 [0.88; 0.93]. For men, eXtreme Gradient Boosting ("xgbTree") performed best with a test AUC of 0.89 [0.82; 0.95], but with poor calibration at higher probabilities. A ten-predictor subset (BMD, biochemical cholesterol and liver function tests, penicillin use and osteoarthritis diagnoses) achieved a test AUC of 0.86 [0.78; 0.94] using an "xgbTree" model. Machine learning can improve hip fracture prediction beyond logistic regression using ensemble models. Compiling data from international cohorts with longer follow-up and performing similar machine learning procedures has the potential to further improve discrimination and calibration.
Conclusion: We conclude that hip fracture risk can be modelled with high discriminative performance for men (test AUC of 0.89 [0.82; 0.95], sensitivity 100%, specificity 69% at the Youden probability cut-off) and particularly for women (test AUC 0.91 [0.88; 0.94], sensitivity 88%, specificity 81% at the Youden probability cut-off) using advanced predictive models. Ensemble models using bootstrap aggregation and boosting performed best in both cohorts, and probabilities can generally be calibrated well with a Naïve Bayes approach, although calibration was poor for high probability estimates in men. Models of 11 predictors for women and 9 for men, with combinations of DXA BMD measurements and primary sector use, achieved the highest numerical AUC values. Further improvements in predictive capability are likely possible with compilations of more data points and longer observation periods. We strongly suggest the use of machine learning principles to model hip fracture risk, and we welcome an effort to compile existing datasets and perform advanced predictive modelling.
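The evaluation scheme described above (75/25 split, 5-fold 5-repeat cross-validation, held-out AUC, and probability calibration) can be sketched as below. Gradient boosting stands in for the caret models ("bagFDA", "xgbTree"), and sigmoid calibration stands in for the Naïve Bayes probability adjustment; the data are synthetic.

```python
# Sketch of the evaluation scheme described above: 75/25 split, 5-fold 5-repeat CV on
# the training portion, held-out test AUC, and probability calibration. Gradient
# boosting stands in for "bagFDA"/"xgbTree"; sigmoid calibration stands in for the
# Naive Bayes adjustment. Data is synthetic.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score, train_test_split

X, y = make_classification(n_samples=4722, n_features=30, weights=[0.95, 0.05],
                           random_state=0)          # 1 = hip fracture within 5 years
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
gbm = GradientBoostingClassifier(random_state=0)
print("CV AUC:", cross_val_score(gbm, X_tr, y_tr, cv=cv, scoring="roc_auc").mean())

calibrated = CalibratedClassifierCV(gbm, method="sigmoid", cv=5).fit(X_tr, y_tr)
print("test AUC:", roc_auc_score(y_te, calibrated.predict_proba(X_te)[:, 1]))
```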
Machine Learning Principles Can Improve Hip Fracture Prediction

Intelligent Movie Recommender System Using Machine Learning

Abstract: A recommender system serves the purpose of suggesting items to view or purchase. The proposed intelligent movie recommender system combines the concepts of Human-Computer Interaction and Machine Learning. The proposed system is a subclass of information filtering system that captures the facial feature points as well as the emotions of a viewer and suggests movies accordingly. It recommends movies best suited to users according to their age and gender, and also according to the genres they prefer to watch. The recommended movie list is created by the cumulative effect of ratings and reviews given by previous users. A neural network is trained to detect genres of movies, such as horror or comedy, based on the emotions of the user watching the trailer. Thus, the proposed system is intelligent as well as secure, as a user is verified by comparing his face at the time of login with the one stored at the time of registration. The system is implemented as a fully dynamic interface, i.e. a website that recommends movies to the user [22].
Conclusion and Future Work: The proposed system uses a machine learning method for training data as well as sentiment analysis on reviews. The system provides a web-based user interface, i.e. a website that has a user database and a learning model tailored to each user. This interface is dynamic and updates regularly. Afterward, it tags a movie with the genres to which it belongs based on the expressions of users watching the trailer. The major problem with this technique arises when the viewer shows neutral facial expressions while watching a movie; in this case the system is unable to determine the genre of the movie accurately. The recommendations are refined with the help of reviews and ratings given by users who have watched the movie. A user is allowed to create a single account, and only he can log in from his account, as we verify the face every time. The accuracy of the proposed recommendation system can be improved by adding more analysis factors to user behavior. The location or mood of the user, and special occasions in the year such as festivals, can also be taken into consideration to recommend movies. In further updates, text summarization on reviews can be implemented, which summarizes user comments into a single line. Review authenticity checks can be applied to the system to prevent fake and misleading reviews, so that only genuine reviews would be considered for the evaluation of movie ratings. In the future, the system can be used with nearby cinema halls to book movie tickets online through our website [22]. Our approach can be extended to various application domains to recommend music, books, etc.
Intelligent Movie Recommender System Using Machine Learning

The Credit Scoring Model Based on Logistic-BP-AdaBoost Algorithm and its Application in P2P Credit Platform

Abstract: We apply the logistic regression algorithm, BP neural network and the AdaBoost algorithm to build a model (the Logistic-BP-AdaBoost model) which can estimate the credit score of an applicant from their multidimensional personal data. Compared with other models, it can estimate the possibility of loan default of the applicant and provide a score for each applicant. We apply this model to a website and establish an online loan platform which is expected to improve the efficiency and reduce the costs of the traditional lending business.
Conclusion: Based on data mining technology and on other researchers' achievements, we studied the methods of logistic regression, BP neural network and AdaBoost to simplify the complex approval work and reduce the prediction error of traditional lending. In this paper we combine logistic regression with a BP neural network and then use AdaBoost to strengthen the model. For the traditional loan approval problem, we fully consider the user registration information and user sources to more accurately predict the user's success rate for the loan. From the users' multidimensional information we can understand the users clearly; furthermore, by analyzing the sources of users as well as the user fraud score, we can make accurate judgments about each user. Finally, the L-B-A model was applied to the P2P loan platform, and practice proved that the model has high practicability and can achieve the purpose of simplifying the loan approval process.
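Since scikit-learn's AdaBoostClassifier cannot boost an MLP directly (MLPClassifier does not accept sample weights), the sketch below implements boosting by weighted resampling with a small BP-style network as the weak learner. It is a rough illustration of combining a BP network with AdaBoost, not the paper's exact Logistic-BP-AdaBoost model; the data and all settings are placeholders.

```python
# Rough illustration of boosting a small BP (MLP) network by weighted resampling.
# This is not the paper's exact L-B-A model; data and settings are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

def adaboost_mlp(X, y, n_rounds=10, seed=0):
    """y must be in {-1, +1}. Returns the fitted weak learners and their weights."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n, replace=True, p=w)       # resample by weights
        clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=500,
                            random_state=0).fit(X[idx], y[idx])
        pred = clf.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)                        # upweight mistakes
        w /= w.sum()
        learners.append(clf)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X):
    return np.sign(sum(a * clf.predict(X) for clf, a in zip(learners, alphas)))

X, y = make_classification(n_samples=1000, n_features=15, random_state=0)
y = 2 * y - 1                                                 # map {0,1} -> {-1,+1}
learners, alphas = adaboost_mlp(X[:800], y[:800])
print("test accuracy:", (adaboost_predict(learners, alphas, X[800:]) == y[800:]).mean())
```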
The Credit Scoring Model Based on Logistic-BP-AdaBoost Algorithm and its Application in P2P Credit Platform

Classification using deep learning neural networks for brain tumors

Abstract: Deep Learning is a new machine learning field that has gained a lot of interest over the past few years. It has been widely applied to several applications and proven to be a powerful machine learning tool for many complex problems. In this paper we used a Deep Neural Network (DNN) classifier, one of the DL architectures, for classifying a dataset of 66 brain MRIs into 4 classes: normal, glioblastoma, sarcoma, and metastatic bronchogenic carcinoma tumors. The classifier was combined with the discrete wavelet transform (DWT), a powerful feature extraction tool, and principal component analysis (PCA), and the evaluation of the performance was quite good across all the performance measures.

Conclusion and future work: In this paper we proposed an efficient methodology which combines the discrete wavelet transform (DWT) with a Deep Neural Network (DNN) to classify brain MRIs into normal and 3 types of malignant brain tumors: glioblastoma, sarcoma and metastatic bronchogenic carcinoma. The new methodology's architecture resembles the convolutional neural network (CNN) architecture but requires lower hardware specifications and takes a convenient processing time for large images (256 × 256). In addition, the DNN classifier shows high accuracy compared to traditional classifiers. The good results achieved using the DWT could be combined with a CNN in future work and the results compared.
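The DWT → PCA → DNN pipeline described above can be sketched as below: PyWavelets extracts wavelet coefficients from each slice, PCA compresses them, and a small fully connected network classifies the result. The placeholder images, the Haar wavelet, the use of only the approximation sub-band, and the layer sizes are assumptions for illustration.

```python
# Sketch of the DWT -> PCA -> DNN pipeline described above: wavelet coefficients are
# extracted from each 256x256 slice, compressed with PCA, and classified by a small
# fully connected network. Images, wavelet choice and layer sizes are placeholders.
import numpy as np
import pywt
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def dwt_features(image, wavelet="haar", level=3):
    """Flatten the level-3 approximation coefficients of a 2-D DWT."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    return coeffs[0].ravel()                       # approximation sub-band only

rng = np.random.default_rng(0)
images = rng.random((66, 256, 256))                # placeholder MRI slices
labels = rng.integers(0, 4, size=66)               # normal / glioblastoma / sarcoma / metastatic

X = np.array([dwt_features(img) for img in images])
model = make_pipeline(StandardScaler(),
                      PCA(n_components=20),
                      MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000,
                                    random_state=0))
model.fit(X, labels)
print("training accuracy:", model.score(X, labels))
```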

Classification using deep learning neural networks for brain tumors