Smoothing Time Series Data

From R-bloggers: simple and straight to the point.

Kernel smoothers

An alternative approach to specifying a neighborhood is to decrease weights further away from the target value. In the figure below, we see that the continuous Gaussian kernel gives a smoother trend than a moving average or running-line smoother.
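To make the contrast concrete, here is a minimal sketch of both smoothers in plain Python. The function names, the window size and the bandwidth are illustrative assumptions, not taken from the original post; the Gaussian version is a simple Nadaraya-Watson style weighted average over the time index.

```python
import math

def moving_average(y, window):
    """Centered moving average: every point inside the window gets equal weight."""
    half = window // 2
    out = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out

def gaussian_kernel_smooth(y, bandwidth):
    """Weighted average whose weights decay smoothly with distance from the target index."""
    out = []
    for i in range(len(y)):
        weights = [math.exp(-0.5 * ((i - j) / bandwidth) ** 2) for j in range(len(y))]
        out.append(sum(w * v for w, v in zip(weights, y)) / sum(weights))
    return out

# Illustrative noisy series: a slow sine wave plus an alternating disturbance.
series = [math.sin(t / 5.0) + (0.3 if t % 2 else -0.3) for t in range(40)]
print(moving_average(series, 5)[:3])
print(gaussian_kernel_smooth(series, 2.0)[:3])
```

Because the Gaussian weights never drop abruptly to zero, points enter and leave the neighborhood gradually, which is why the resulting trend tends to look smoother than the moving average.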


Time Series Prediction with the Self-Organizing Map: A Review

Summary. We provide a comprehensive and updated survey on applications of Kohonen’s self-organizing map (SOM) to time series prediction (TSP). The main goal of the paper is to show that, despite being originally designed as an unsupervised learning algorithm, the SOM is flexible enough to give rise to a number of efficient supervised neural architectures devoted to TSP tasks. For each SOM-based architecture to be presented, we report its algorithm implementation in detail. Similarities and differences of such SOM-based TSP models with respect to standard linear and nonlinear TSP techniques are also highlighted. We conclude the paper with indications of possible directions for further research on this field.
Conclusion: In this paper we reviewed several applications of Kohonen’s SOM-based models to time series prediction. Our main goal was to show that the SOM can perform efficiently in this task and can compete equally with well-known neural architectures, such as MLP and RBF networks, which are more commonly used. In this sense, the main advantages of SOM-based models over MLP- or RBF-based models are the inherent local modeling property, which favors the interpretability of the results, and the facility in developing growing architectures, which alleviates the burden of specifying an adequate number of neurons (prototype vectors).
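As a rough illustration of how an unsupervised map can be turned into a predictor, the sketch below gives every prototype an input part (a window of past values) and an output part (the value that followed). The winner is chosen from the input part only and its output part is returned as the forecast. This loosely follows the associative-memory idea found in the SOM literature; the 1-D lattice, the decay schedules and all names and parameter values are assumptions made for the example, not the paper's algorithms.

```python
import math
import random

def make_pairs(series, order):
    """Build (past window, next value) training pairs from a series."""
    return [(series[t - order:t], series[t]) for t in range(order, len(series))]

def nearest_unit(w_in, x):
    """Index of the prototype whose input part is closest to window x."""
    return min(range(len(w_in)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(w_in[i], x)))

def train(pairs, n_units=12, epochs=40, lr0=0.5, sigma0=3.0, seed=0):
    rng = random.Random(seed)
    # Initialise each unit from a random training pair (input part + output part).
    init = [rng.choice(pairs) for _ in range(n_units)]
    w_in = [list(x) for x, _ in init]
    w_out = [y for _, y in init]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays towards 0
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # neighbourhood shrinks towards 0.5
        for x, y in rng.sample(pairs, len(pairs)):
            win = nearest_unit(w_in, x)              # winner uses the input part only
            for i in range(n_units):
                h = lr * math.exp(-((i - win) ** 2) / (2.0 * sigma ** 2))
                w_in[i] = [w + h * (a - w) for w, a in zip(w_in[i], x)]
                w_out[i] += h * (y - w_out[i])
    return w_in, w_out

def predict(model, window):
    """Forecast = output part of the unit that best matches the recent window."""
    w_in, w_out = model
    return w_out[nearest_unit(w_in, window)]

# Illustrative data: a sine wave, predicted one step ahead from 4 past values.
series = [math.sin(0.3 * t) for t in range(200)]
model = train(make_pairs(series, order=4))
forecast = predict(model, series[-4:])
print(round(forecast, 3), "vs true next value", round(math.sin(0.3 * 200), 3))
```

Each prototype covers only a small region of the input space, which is the "local modeling" property mentioned above: a forecast can be traced back to a single prototype and the windows it represents.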

Nonparametric Risk Bounds for Time-Series Forecasting

Abstract: We derive generalization error bounds for traditional time-series forecasting models. Our results hold for many standard forecasting tools including autoregressive models, moving average models, and, more generally, linear state-space models. These non-asymptotic bounds need only weak assumptions on the data-generating process, yet allow forecasters to select among competing models and to guarantee, with high probability, that their chosen model will perform well. We motivate our techniques with and apply them to standard economic and financial forecasting tools: a GARCH model for predicting equity volatility and a dynamic stochastic general equilibrium model (DSGE), the standard tool in macroeconomic forecasting. We demonstrate in particular how our techniques can aid forecasters and policy makers in choosing models which behave well under uncertainty and mis-specification.
Conclusion: This paper demonstrates how to control the generalization error of common time-series forecasting models, especially those used in economics and engineering—ARMA models, vector autoregressions (Bayesian or otherwise), linearized dynamic stochastic general equilibrium models, and linear state-space models. We derive upper bounds on the risk, which hold with high probability while requiring only weak assumptions on the data-generating process. These bounds are finite sample in nature, unlike standard model selection penalties such as AIC or BIC. Furthermore, they do not suffer the biases inherent in other risk estimation techniques such as the pseudo-cross validation approach often used in the economic forecasting literature.
While we have stated these results in terms of standard economic forecasting models, they have very wide applicability. Theorem 12 applies to any forecasting procedure with fixed memory length, linear or non-linear. Theorem 17 applies only to methods whose forecasts are linear in the observations, but a similar result for nonlinear methods would just need to ensure that the dependence of the forecast on the past decays in some suitable way. Rather than deriving bounds theoretically, one could attempt to estimate bounds on the risk. While cross-validation is tricky (Racine, 2000), nonparametric bootstrap procedures may do better. A fully nonparametric version is possible, using the circular bootstrap (reviewed in Lahiri, 1999). Bootstrapping lengthy out-of-sample sequences for testing fitted model predictions yields intuitively sensible estimates of Rn(f), but there is currently no theory about the coverage level. Also, while models like VARs can be fit quickly to simulated data, general state-space models, let alone DSGEs, require large amounts of computational power, which is an obstacle to any resampling method.
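The resampling alternative mentioned at the end of the conclusion can be sketched as follows. This is only an illustration of the circular block bootstrap idea, assuming a simple AR(1) model without intercept, a synthetic series, and an arbitrary block length of 20; none of these choices come from the paper, and, as the authors note, there is no coverage theory behind the resulting numbers.

```python
import random

def circular_block_bootstrap(series, block_len, rng):
    """One resample: blocks start at random positions and wrap around the end."""
    n = len(series)
    out = []
    while len(out) < n:
        start = rng.randrange(n)
        out.extend(series[(start + k) % n] for k in range(block_len))
    return out[:n]

def fit_ar1(train):
    """Least-squares AR(1) coefficient (no intercept)."""
    num = sum(a * b for a, b in zip(train[:-1], train[1:]))
    den = sum(a * a for a in train[:-1])
    return num / den

def one_step_mse(phi, seq):
    """Mean squared one-step-ahead forecast error of a fixed AR(1) model."""
    errs = [(seq[t] - phi * seq[t - 1]) ** 2 for t in range(1, len(seq))]
    return sum(errs) / len(errs)

rng = random.Random(42)
# Synthetic AR(1) data, purely illustrative.
x = [0.0]
for _ in range(299):
    x.append(0.7 * x[-1] + rng.gauss(0.0, 1.0))

phi = fit_ar1(x)
risks = sorted(one_step_mse(phi, circular_block_bootstrap(x, 20, rng))
               for _ in range(200))
print("fitted phi:", round(phi, 3))
print("median bootstrapped risk:", round(risks[len(risks) // 2], 3))
print("95th percentile:", round(risks[int(0.95 * len(risks))], 3))
```

Resampling whole blocks rather than single points keeps the short-range dependence of the series intact inside each block, which is what makes the bootstrapped sequences usable as stand-ins for out-of-sample data.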

Anomaly Detection in Multivariate Non-stationary Time Series for Automatic DBMS Diagnosis


Abstract: Anomaly detection in database management systems (DBMSs) is difficult because of the increasing number of statistics (stat) and event metrics in big data systems. In this paper, I propose an automatic DBMS diagnosis system that detects anomaly periods with abnormal DB stat metrics and finds causal events in those periods. Reconstruction error from a deep autoencoder and a statistical process control (SPC) approach are applied to detect time periods with anomalies. Related events are found using time series similarity measures between events and abnormal stat metrics. After training the deep autoencoder with DBMS metric data, the efficacy of anomaly detection is investigated on other DBMSs containing anomalies. Experimental results show the effectiveness of the proposed model, especially the batch temporal normalization layer. The proposed model is used for publishing automatic DBMS diagnosis reports in order to determine DBMS configuration and SQL tuning.

Conclusion and future work: I proposed a machine learning model for automatic DBMS diagnosis. The proposed model detects anomaly periods from the reconstruction error of a deep autoencoder. I also verified empirically that temporal normalization is essential when the input data is a non-stationary multivariate time series. With the SPC approach, a time period is considered an anomaly period when the reconstruction error is outside the control limit. Depending on the type or users of a DBMS, further SPC decision rules can be added. For example, a warning line at 2 sigma can be used to decide whether a period is anomalous or not [12, 13]. In this paper, the anomaly detection test is carried out on other DBMSs whose data is not used in training, because the performance of the basic pre-trained model is important from the service provider's perspective. Detection performance is validated with a blind test and DBAs' opinions. The result of automatic anomaly diagnosis would help DB consultants save time in finding anomaly periods and main wait events. Thus, they can concentrate on devising solutions when DB disorders occur. For better anomaly detection performance, additional training can be carried out after the pre-trained model is adopted. In addition, recurrent and convolutional neural networks can be used in the reconstruction part to capture hidden representations of sequential and local relationships. If anomaly-labeled data is generated, the detection result can be analyzed with numerical performance measures. However, in practice, it is hard to secure a labeled anomaly dataset for each DBMS. The proposed model is meaningful as an unsupervised anomaly detection model that does not need labeled data and can be generalized to other DBMSs with a pre-trained model.
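The SPC step described above can be sketched independently of the autoencoder. The example assumes reconstruction errors are already available as plain lists, and it uses a limit of mean plus k standard deviations estimated on normal-period errors; k = 3 is the conventional control-limit choice and k = 2 the warning line mentioned in the text. The function names and the numbers are hypothetical.

```python
import statistics

def control_limit(normal_errors, k=3.0):
    """Upper control limit: mean + k standard deviations of errors from normal operation."""
    return statistics.mean(normal_errors) + k * statistics.stdev(normal_errors)

def anomaly_periods(errors, limit):
    """Return (start, end) index pairs, end exclusive, where the error stays above the limit."""
    periods, start = [], None
    for i, e in enumerate(errors):
        if e > limit and start is None:
            start = i
        elif e <= limit and start is not None:
            periods.append((start, i))
            start = None
    if start is not None:
        periods.append((start, len(errors)))
    return periods

# Hypothetical reconstruction errors: a normal baseline and a monitored window.
baseline = [1.0, 1.2, 0.8, 1.1, 0.9]
monitored = [1.0, 1.6, 1.7, 1.0, 2.0]

limit = control_limit(baseline, k=3.0)
print("control limit:", round(limit, 3))
print("anomaly periods:", anomaly_periods(monitored, limit))
```

Lowering k to 2 turns the same function into the warning line: more periods are flagged, at the cost of more false alarms.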


Forecasting Extreme Events at Uber with LSTM

We will have a few posts about this here on the blog soon, but this is an ML-plus-engineering case study from people who are doing great work, using quite advanced methods on scalable architectures.


We ultimately settled on conducting time series modeling based on the Long Short Term Memory (LSTM) architecture, a technique that features end-to-end modeling, ease of incorporating external variables, and automatic feature extraction abilities. By providing a large amount of data across numerous dimensions, an LSTM approach can model complex nonlinear feature interactions.

We decided to build a neural network architecture that provides single-model, heterogeneous forecasting through an automatic feature extraction module. As Figure 4 demonstrates, the model first primes the network by automatic, ensemble-based feature extraction. After feature vectors are extracted, they are averaged using a standard ensemble technique. The final vector is then concatenated with the input to produce the final forecast.

During testing, we were able to achieve a 14.09 percent symmetric mean absolute percentage error (SMAPE) improvement over the base LSTM architecture and over 25 percent improvement over the classical time series model used in Argos, Uber’s real-time monitoring and root cause-exploration tool.
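For readers unfamiliar with the metric, here is a small sketch of SMAPE. Several variants exist, and the quoted text does not say which one was used, so the formula below (absolute error divided by the average of the absolute actual and forecast values, in percent) is an assumption.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # Convention: a pair where both values are zero contributes no error.
        terms.append(0.0 if denom == 0 else abs(f - a) / denom)
    return 100.0 * sum(terms) / len(terms)

print(smape([100.0, 200.0], [110.0, 180.0]))
```

Unlike plain MAPE, the denominator includes the forecast, so the score stays bounded when the actual value is close to zero.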


Generalized Additive Models for Time Series

Here at AlgoBeans you will probably find the best explanation of generalized additive models (GAMs) on the internet. The post explains the technique in a simple, didactic way.

Therefore, google search trends for persimmons could well be modeled by adding a seasonal trend to an increasing growth trend, in what’s called a generalized additive model (GAM).

The principle behind GAMs is similar to that of regression, except that instead of summing effects of individual predictors, GAMs are a sum of smooth functions. Functions allow us to model more complex patterns, and they can be averaged to obtain smoothed curves that are more generalizable.

Because GAMs are based on functions rather than variables, they are not restricted by the linearity assumption in regression that requires predictor and outcome variables to move in a straight line. Furthermore, unlike in neural networks, we can isolate and study effects of individual functions in a GAM on resulting predictions.
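The "sum of smooth functions" idea can be sketched with the classic backfitting loop: each component is repeatedly refitted to whatever the other components leave unexplained. The example below assumes just two components, a trend smoothed with a Gaussian kernel and a seasonal effect estimated as a per-phase mean. The synthetic data, the bandwidth and the function names are illustrative, and this is a simplified additive fit rather than a full GAM with penalized splines.

```python
import math

def kernel_smooth(values, bandwidth):
    """Gaussian-weighted average over the time index."""
    n = len(values)
    out = []
    for i in range(n):
        w = [math.exp(-0.5 * ((i - j) / bandwidth) ** 2) for j in range(n)]
        out.append(sum(wj * vj for wj, vj in zip(w, values)) / sum(w))
    return out

def fit_additive(y, period, bandwidth=6.0, iterations=20):
    """Backfitting for y[t] ~ mean + trend(t) + seasonal(t mod period)."""
    n = len(y)
    mean = sum(y) / n
    trend = [0.0] * n
    seasonal = [0.0] * period
    for _ in range(iterations):
        # Refit the trend to what the seasonal term leaves unexplained.
        partial = [y[t] - mean - seasonal[t % period] for t in range(n)]
        trend = kernel_smooth(partial, bandwidth)
        shift = sum(trend) / n
        trend = [v - shift for v in trend]          # centre for identifiability
        # Refit the seasonal term as the per-phase mean of the remaining residual.
        for p in range(period):
            res = [y[t] - mean - trend[t] for t in range(p, n, period)]
            seasonal[p] = sum(res) / len(res)
        s_shift = sum(seasonal) / period
        seasonal = [s - s_shift for s in seasonal]
    return mean, trend, seasonal

# Synthetic "search interest": steady growth plus a spike every 12th month.
y = [0.1 * t + (3.0 if t % 12 == 0 else 0.0) for t in range(48)]
mean, trend, seasonal = fit_additive(y, period=12)
print("peak month:", max(range(12), key=lambda p: seasonal[p]))
print("trend start/end:", round(trend[0], 2), round(trend[-1], 2))
```

Because the fitted model is a plain sum, each component can be plotted and inspected on its own, which is the interpretability advantage the text contrasts with neural networks.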


Deep Learning for Time Series Analysis

Even though image recognition and audio segmentation problems are the hot topics in Deep Learning, 90% of real-world data problems involve structured data, time series in particular. This paper presents an unconventional methodology (transforming a time series into an 'image' so that a convolutional network can be applied), which suggests that the sky is the limit when it comes to arrangements for solving prediction problems with structured data.

Deep Learning for Time-Series Analysis – John Cristian Borges Gamboa

Abstract: In many real-world applications, e.g., speech recognition or sleep stage classification, data are captured over the course of time, constituting a Time-Series. Time-Series often contain temporal dependencies that cause two otherwise identical points of time to belong to different classes or predict different behavior. This characteristic generally increases the difficulty of analysing them. Existing techniques often depended on hand-crafted features that were expensive to create and required expert knowledge of the field. With the advent of Deep Learning, new models of unsupervised learning of features for Time-series analysis and forecast have been developed. Such new developments are the topic of this paper: a review of the main Deep Learning techniques is presented, and some applications on Time-Series analysis are summarized. The results make it clear that Deep Learning has a lot to contribute to the field.

Conclusions: When applying Deep Learning, one seeks to stack several independent neural network layers that, working together, produce better results than the already existing shallow structures. In this paper, we have reviewed some of these modules, as well as the recent work that has been done by using them, found in the literature. Additionally, we have discussed some of the main tasks normally performed when manipulating Time-Series data using deep neural network structures. Finally, a more specific focus was given on one work performing each one of these tasks. Employing Deep Learning to Time-Series analysis has yielded results in these cases that are better than the previously existing techniques, which is evidence that this is a promising field for improvement.



Automatic time-series phenotyping using massive feature extraction

by Ben D Fulcher, Nick S Jones

Across a far-reaching diversity of scientific and industrial applications, a general key problem involves relating the structure of time-series data to a meaningful outcome, such as detecting anomalous events from sensor recordings, or diagnosing patients from physiological time-series measurements like heart rate or brain activity. Currently, researchers must devote considerable effort manually devising, or searching for, properties of their time series that are suitable for the particular analysis problem at hand. Addressing this non-systematic and time-consuming procedure, here we introduce a new tool, hctsa, that selects interpretable and useful properties of time series automatically, by comparing implementations over 7700 time-series features drawn from diverse scientific literatures. Using two exemplar biological applications, we show how hctsa allows researchers to leverage decades of time-series research to quantify and understand informative structure in their time-series data.
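To give a feel for what "extracting features from a time series" means, the sketch below reduces a series to a handful of summary numbers. It is not hctsa and does not reproduce any of its features; the five features and their names are illustrative choices.

```python
import statistics

def lag_autocorr(x, lag):
    """Sample autocorrelation at the given lag."""
    m = statistics.mean(x)
    num = sum((x[t] - m) * (x[t + lag] - m) for t in range(len(x) - lag))
    den = sum((v - m) ** 2 for v in x)
    return num / den if den else 0.0

def extract_features(x):
    """Map a series to a small dictionary of interpretable summary statistics."""
    m = statistics.mean(x)
    diffs = [b - a for a, b in zip(x, x[1:])]
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a - m) * (b - m) < 0)
    return {
        "mean": m,
        "std": statistics.pstdev(x),
        "acf_lag1": lag_autocorr(x, 1),
        "mean_abs_change": statistics.mean(abs(d) for d in diffs),
        "mean_crossing_rate": crossings / (len(x) - 1),
    }

noisy = [1.0, -1.0] * 10           # flips sign at every step
ramp = [float(t) for t in range(20)]  # smooth, steadily increasing
print(extract_features(noisy))
print(extract_features(ramp))
```

Once every series is represented by the same feature vector, ordinary classifiers or feature-ranking methods can be used to find which properties separate, say, healthy from pathological recordings.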


Why Aren't Confidence Intervals in Time Series Forecasts as Good as We Would Like?

Straight from Peter's Stats Stuff.

For anyone who builds forecasting models with ARIMA and auto-ARIMA, one aspect that is very hard to estimate is the uncertainty of the autoregressive terms (that is, their error intervals).

In practice, the autoregressive coefficients absorb part of the uncertainty, whether because of the meta-parameter choices (or the lack of them) or because of the nature of the data. Time series forecasting models cannot capture this kind of uncertainty, and so the resulting confidence intervals do not represent an acceptable or feasible range.

The problem is that for all but the most trivial time series forecasting method there is no simple way of estimating the uncertainty that comes from having estimated the parameters from the data, and much less so the values of meta-parameters like the amount of differencing needed, how many autoregressive terms, how many moving average terms, etc (those example meta-parameters come from the Box-Jenkins ARIMA approach, but other forecasting methods have their own meta-parameters to estimate too).


Why Are Decision Tree Methods Not Ideal for Extrapolation Problems?

In this article, Peter's Stats Stuff shows that tree-based methods such as Random Forests and decision trees do not perform well on data far outside the range of their training set; in statistical terms, they do not extrapolate well.

In practice this happens because of a few factors that have more to do with the nature of decision tree algorithms than with a limitation as such:

(a) the recursive partitioning of a decision tree, as soon as it sees the training data, implicitly establishes boundaries through its splitting criteria;

(b) the most common split criteria (information gain, entropy, CHAID, et cetera) take into account the full set of attribute values before partitioning, which further contributes to these limits not accounting for data outside a specific range; and

(c) some tools, such as Spark MLlib, have parameters such as Max Depth, which controls how specific the tree and its leaf nodes are, and Max Bins (the maximum number of bins used to group the values of each column), which fixes a range by parameterization (and, with it, establishes more boundaries).

So, for this kind of time series problem, algorithms such as XGBoost, or even plain decision trees, perform well when the series behaves predictably within an established range (classic interpolation, no more than 1 or 2 standard deviations), or for threshold-forecasting problems such as the ones we work on at Movile.
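The extrapolation limit is easy to reproduce with a toy regression tree. The sketch below is a minimal 1-D tree written from scratch (it assumes distinct x values and is not how XGBoost or Spark MLlib are implemented); the data, depth and names are illustrative. Every prediction is the mean of some training targets, so no prediction can exceed the largest target seen in training.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(xs, ys, depth=3, min_leaf=2):
    """Tiny 1-D regression tree: a leaf is a float, a node is (threshold, left, right)."""
    if depth == 0 or len(xs) < 2 * min_leaf:
        return sum(ys) / len(ys)
    pairs = sorted(zip(xs, ys))
    # Choose the cut position that minimises the total squared error of both sides.
    cut = min(range(min_leaf, len(pairs) - min_leaf + 1),
              key=lambda i: sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]]))
    threshold = (pairs[cut - 1][0] + pairs[cut][0]) / 2.0
    left, right = pairs[:cut], pairs[cut:]
    return (threshold,
            fit_tree([x for x, _ in left], [y for _, y in left], depth - 1, min_leaf),
            fit_tree([x for x, _ in right], [y for _, y in right], depth - 1, min_leaf))

def predict(tree, x):
    """Walk down the tree until a leaf value is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

# Training data follows y = 2x on x = 0..19, so the true value at x = 100 is 200.
train_x = list(range(20))
train_y = [2.0 * x for x in train_x]
tree = fit_tree(train_x, train_y)
print(predict(tree, 10), predict(tree, 19), predict(tree, 100))
```

Any input beyond the last split falls into the rightmost leaf, so x = 100 receives exactly the same prediction as x = 19, no matter how strong the upward trend in the training data is.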


Time Series Forecasting with XGBoost: the forecastxgb Package

Anyone who has worked on predicting categorical variables in Machine Learning knows that XGBoost is one of the best packages available and is widely used in countless Kaggle competitions.

Peter Ellis's key contribution was to adapt it so that independent variables can be incorporated into the time series forecasting model through the xreg parameter.

For those who work with time series analysis this work matters a great deal, not least because forecastxgb breaks away from the Moving Average / ARIMA (ARMA) / (S)ARIMA triad in which statisticians, data miners and data scientists alike get stuck, out of convenience or for lack of other tools.

An example of how to use the package is shown below:

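# Install devtools to install packages that aren't on CRAN
install.packages("devtools")

# Install the package from GitHub
# (repository path assumed from the package's README; check it before running)
devtools::install_github("ellisp/forecastxgb-r-package/pkg")

# Load the library
library(forecastxgb)

# Time series example: the `gas` dataset
# (assumed to be available via the forecast package that forecastxgb loads)
gas

# Model
model <- xgbts(gas)

# Summary of the model
summary(model)

# Forecasting 12 periods
fc <- forecast(model, h = 12)

# Plot
plot(fc)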

STR: A Seasonal-Trend Decomposition Procedure Based on Regression

One of the biggest challenges in time series forecasting and decomposition (on the machine learning side) is including several seasonal effects, or even knowing which cyclic effects the series contains.

This paper by Dokumentov and Rob J Hyndman tackles the problem with STR, a regression-based procedure for seasonal and trend decomposition.

We propose new generic methods for decomposing seasonal data: STR (a Seasonal-Trend decomposition procedure based on Regression) and Robust STR. In some ways, STR is similar to Ridge Regression and Robust STR can be related to LASSO. Our new methods are much more general than any alternative time series decomposition methods. They allow for multiple seasonal and cyclic components, and multiple linear regressors with constant, flexible, seasonal and cyclic influence. Seasonal patterns (for both seasonal components and seasonal regressors) can be fractional and flexible over time; moreover they can be either strictly periodic or have a more complex topology. We also provide confidence intervals for the estimated components, and discuss how STR can be used for forecasting.



Predicting the 100 Metres Sprint Time at London 2012

August 5 of this year will be a historic day no matter who wins the most important event of the London Olympics, the 100 metres sprint.

In terms of the complexity of training and development, this sport is probably second only to tennis and golf. What draws the most attention to this event, however, is the expectation over whether Usain Bolt's Olympic record of 9.69s will be broken.

Moving on to what matters here, data analysis and data mining: there is a very interesting post by Markus Gesmann in which he presents the results of a logistic regression and a linear regression on the historical series of 100 metres times. The results are quite plausible and the model is well estimated.


Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping

In this paper the researchers argue that the performance bottleneck in time series data mining is the time taken to compute the distance measures involved. Their proposal is to use the Dynamic Time Warping algorithm, which compares two instances (or sequences) over a given period of time. It is very interesting and steps away from the usual choices of distance measure.

Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping – Thanawin Rakthanmanon, Bilson Campana, Abdullah Mueen, Gustavo Batista, Brandon Westover, Qiang Zhu, Jesin Zakaria, Eamonn Keogh

Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few millions of time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact: in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower powered devices than are currently possible.
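For reference, the distance measure itself can be written in a few lines. This is the textbook dynamic-programming formulation of DTW with an optional warping band, not the paper's optimized search; the squared point cost, the band handling and the function name are choices made for this sketch.

```python
import math

def dtw_distance(a, b, window=None):
    """DTW distance with squared point cost and an optional band of half-width `window`."""
    n, m = len(a), len(b)
    # The band must be at least as wide as the length difference to reach the corner.
    w = max(window, abs(n - m)) if window is not None else max(n, m)
    inf = math.inf
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: insertion, deletion, or match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[m])

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))   # same shape, one value held longer
print(dtw_distance([0, 0], [1, 1]))
```

The nested loop makes the cost quadratic in the sequence length, which is why the naive version becomes the bottleneck the abstract describes; restricting the band with `window` is one common way to cut the work per comparison.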
