Abstract: We present a CUDA based implementation of a decision tree construction algorithm within the gradient boosting library XGBoost. The tree construction algorithm is executed entirely on the GPU and shows high performance with a variety of datasets and settings, including sparse input matrices. Individual boosting iterations are parallelized, combining two approaches. An interleaved approach is used for shallow trees, switching to a more conventional radix sort based approach for larger depths. We show speedups of between 3-6x using a Titan X compared to a 4 core i7 CPU, and 1.2x using a Titan X compared to 2x Xeon CPUs (24 cores). We show that it is possible to process the Higgs dataset (10 million instances, 28 features) entirely within GPU memory. The algorithm is made available as a plug-in within the XGBoost library and fully supports all XGBoost features including classification, regression and ranking tasks.
Uma (longa e) boa resposta está nesta tese de Didrik Nielsen.
Abstract: Tree boosting has empirically proven to be a highly effective approach to predictive modeling.
It has shown remarkable results for a vast array of problems.
For many years, MART has been the tree boosting method of choice.
More recently, a tree boosting method known as XGBoost has gained popularity by winning numerous machine learning competitions.
In this thesis, we will investigate how XGBoost differs from the more traditional MART.
We will show that XGBoost employs a boosting algorithm which we will term Newton boosting. This boosting algorithm will further be compared with the gradient boosting algorithm that MART employs.
Moreover, we will discuss the regularization techniques that these methods offer and the effect these have on the models.
In addition to this, we will attempt to answer the question of why XGBoost seems to win so many competitions.
To do this, we will provide some arguments for why tree boosting, and in particular XGBoost, seems to be such a highly effective and versatile approach to predictive modeling.
The core argument is that tree boosting can be seen to adaptively determine the local neighbourhoods of the model. Tree boosting can thus be seen to take the bias-variance tradeoff into consideration during model fitting. XGBoost further introduces some subtle improvements which allows it to deal with the bias-variance tradeoff even more carefully.
Conclusion: After determining the different boosting algorithms and regularization techniques these methods utilize and exploring the effects of these, we turned to providing arguments for why XGBoost seems to win “every” competition. To provide possible answers to this question, we first gave reasons for why tree boosting in general can be an effective approach. We provided two main arguments for this. First off, additive tree models can be seen to have rich representational abilities. Provided that enough trees of sufficient depth is combined, they are capable of closely approximating complex functional relationships, including high-order interactions. The most important argument provided for the versatility of tree boosting however, was that tree boosting methods are adaptive. Determining neighbourhoods adaptively allows tree boosting methods to use varying degrees of flexibility in different parts of the input space. They will consequently also automatically perform feature selection. This also makes tree boosting methods robust to the curse of dimensionality. Tree boosting can thus be seen actively take the bias-variance tradeoff into account when fitting models. They start out with a low variance, high bias model and gradually reduce bias by decreasing the size of neighbourhoods where it seems most necessary. Both MART and XGBoost have these properties in common. However, compared to MART, XGBoost uses a higher-order approximation at each iteration, and can thus be expected to learn “better” tree structures. Moreover, it provides clever penalization of individual trees. As discussed earlier, this can be seen to make the method even more adaptive. It will allow the method to adaptively determine the appropriate number of terminal nodes, which might vary among trees. It will further alter the learnt tree structures and leaf weights in order to reduce variance in estimation of the individual trees. Ultimately, this makes XGBoost a highly adaptive method which carefully takes the bias-variance tradeoff into account in nearly every aspect of the learning process.
Para quem já teve a oportunidade de trabalhar com previsão de variáveis categóricas em Machine Learning sabe que o XGBoost é um dos melhores pacotes do mercado, sendo largamente utilizado em inúmeras competições no Kaggle.
A grande diferença feita pelo Peter Ellis foi realizar algumas adaptações para incorporar algumas variáveis independentes através do parâmetro xreg ao modelo preditivo de séries temporais.
Para quem trabalha com análise de séries temporais, esse trabalho é muito importante até porque o forecastxgb tríade Média-Móvel/ARIMA (ARMA)/(S)ARIMA em que tanto estatísticos/Data Miners/Data Scientists ficam presos por comodidade ou falta de meios.
Um exemplo da utilização do pacote está abaixo:
# Install devtools to install packages that aren't in CRAN install.packages("devtools") # Installing package from github devtools::install_github("ellisp/forecastxgb-r-package/pkg") # Load the libary library(forecastxgb) # Time Series Example gas # Model model <- xgbts(gas) # Summary of the model summary(model) # Forecasting 12 periods fc <- forecast(model, h = 12) # Plot plot(fc)