Além do aprendizado ativo em Sistemas de Recomendação de domínio cruzado

Um dos problemas mais comuns em Sistemas de Recomendação é o famoso Cold Start (i.e. quando não há conhecimento prévio sobre os gostos de alguém que acaba de entrar na plataforma).

Esse paper trás uma perspectiva interessante sobre o assunto.

Toward Active Learning in Cross-domain Recommender Systems – Roberto Pagano, Massimo Quadrana, Mehdi Elahi, Paolo Cremonesi

Abstract: One of the main challenges in Recommender Systems (RSs) is the New User problem which happens when the system has to generate personalised recommendations for a new user whom the system has no information about. Active Learning tries to solve this problem by acquiring user preference data with the maximum quality, and with the minimum acquisition cost. Although there are variety of works in active learning for RSs research area, almost all of them have focused only on the single-domain recommendation scenario. However, several real-world RSs operate in the cross-domain scenario, where the system generates recommendations in the target domain by exploiting user preferences in both the target and auxiliary domains. In such a scenario, the performance of active learning strategies can be significantly influenced and typical active learning strategies may fail to perform properly. In this paper, we address this limitation, by evaluating active learning strategies in a novel evaluation framework, explicitly suited for the cross-domain recommendation scenario. We show that having access to the preferences of the users in the auxiliary domain may have a huge impact on the performance of active learning strategies w.r.t. the classical, single-domain scenario.

Conclusions: In this paper, we have evaluated several widely used active learning strategies adopted to tackle the cold-start problem in a novel usage scenario, i.e., Cross-domain recommendation scenario. In such a case, the user preferences are available not only in the target domain, but also in additional auxiliary domain. Hence, the active learner can exploit such knowledge to better estimate which preferences are more valuable for the system to acquire. Our results have shown that the performance of the considered active learning strategies significantly change in the cross-domain recommendation scenario in comparison to the single-domain recommendation. Hence, the presence of the auxiliary domain may strongly influence the performance of the active learning strategies. Indeed, while a certain active learning strategy performs the best for MAE reduction in the single scenario (i.e., highest-predicted strategy), it actually performs poor in the cross-domain scenario. On the other hand, the strategy with the worst MAE in single-domain scenario (i.e., lowest-predicted strategy) can perform excellent in the cross-domain scenario. This is an interesting observation which indicates the importance of further analysis of these two scenarios in order to better design and develop active learning strategies for them. Our future work includes the further analysis of the AL strategies in other domains such as book, electronic products, tourism, etc. Moreover, we plan to investigate the potential impact of considering different rating prediction models (e.g., context-aware models) on the performance of different active learning strategies.

Além do aprendizado ativo em Sistemas de Recomendação de domínio cruzado

Ética Estóica para agentes artificiais

E não é que adaptaram a filosofia estóica para a Inteligência Artificial?

Stoic Ethics for Artificial Agents – Gabriel Murray

Abstract: We present a position paper advocating the notion that Stoic philosophy and ethics can inform the development of ethical A.I. systems. This is in sharp contrast to most work on building ethical A.I., which has focused on Utilitarian or Deontological ethical theories. We relate ethical A.I. to several core Stoic notions, including the dichotomy of control, the four cardinal virtues, the ideal Sage, Stoic practices, and Stoic perspectives on emotion or affect. More generally, we put forward an ethical view of A.I. that focuses more on internal states of the artificial agent rather than on external actions of the agent. We provide examples relating to near-term A.I. systems as well as hypothetical superintelligent agents.

Conclusions: In this position paper, we have attempted to show how Stoic ethics could be applied to the development of ethical A.I. systems. We argued that internal states matter for ethical A.I. agents, and that internal states can be analyzed by describing the four cardinal Stoic virtues in terms of characteristics of an intelligent system. We also briefly described other Stoic practices and how they could be realized by an A.I. agent. We gave a brief sketch of how to start developing Stoic A.I. systems by creating approval-directed agents with Stoic overseers, and/or by employing a syncretic paramedic ethics algorithm with a step featuring Stoic constraints. While it can be beneficial to analyze the ethics of an A.I. agent from several different perspectives, including consequentialist perspectives, we have argued for the importance of also conducting a Stoic ethical analysis of A.I. agents, where the agent’s internal states are analyzed, and moral judgments are not based on consequences outside of the agent’s control.

Ética Estóica para agentes artificiais

MEBoost – Novo método para seleção de variáveis

Um dos campos bem pouco explorados em termos acadêmicos é sem sombra de dúvidas a parte de seleção de variáveis. Esse paper trás um pouco de luz sobre esse assunto tão importante e que drena parte do tempo produtivo de Data Scientists.

MEBoost: Variable Selection in the Presence of Measurement Error – Benjamin Brown, Timothy Weaver, Julian Wolfson

Abstract:  We present a novel method for variable selection in regression models when covariates are measured with error. The iterative algorithm we propose, MEBoost, follows a path defined by estimating equations that correct for covariate measurement error. Via simulation, we evaluated our method and compare its performance to the recently-proposed Convex Conditioned Lasso (CoCoLasso) and to the “naive” Lasso which does not correct for measurement error. Increasing the degree of measurement error increased prediction error and decreased the probability of accurate covariate selection, but this loss of accuracy was least pronounced when using MEBoost. We illustrate the use of MEBoost in practice by analyzing data from the Box Lunch Study, a clinical trial in nutrition where several variables are based on self-report and hence measured with error.

Conclusions: We examined the variable selection problem in regression when the number of potential covariates is large compared to the sample size and when these potential covariates are measured with measurement error. We proposed MEBoost, a computationally simple descent-based approach which follows a path determined by measurement error-corrected estimating equations. We compared MEBoost, via simulation and in a real data example, with the recently-proposed Convex Conditioned Lasso (CoCoLasso) as well as the naive Lasso which assumes that covariates are measured without error. In almost all simulation scenarios, MEBoost performed best in terms of prediction error and coefficient bias. The CoCoLasso is more conservative with the highest specificity in each case, but sensitivity and prediction are better with MEBoost. In the comparison of selection paths, we saw that MEBoost was more aggressive in identifying variables to be included in the model more quickly than the CoCoLasso. These differences were most apparent when the measurement error had a larger variance and a more complex correlation structure. In addition, MEBoost was 7 times faster than the CoCoLasso. One application of MEBoost took 0.04 seconds versus 0.28 seconds for the CoCoLasso. MEBoost, while a promising approach, has some limitations. One limitation–which is shared with many methods that correct for measurement error–is that we assume that the covariance matrix of the measurement error process is known, an assumption which in many settings may be unrealistic. In some cases, it may be possible to estimate these structures using external data sources, but absent such data one could perform a sensitivity analysis with different measurement error variances and correlation structures, as we demonstrate in the real data application. Another challenging aspect of model selection with error-prone covariates is that, even if the set of candidate models is generated via a technique which accounts for measurement error, the process of selecting a final model (e.g., via cross-validation) still uses covariates that are measured with error. However, we showed in our simulation study that MEBoost performs well in selecting a model which recovers the relationship between the true (error-free) covariates and the outcome, even when using error-prone covariates to select the final model. This finding suggests that the procedure for generating a “path” of candidate models has a greater influence on prediction error and variable selection accuracy than the procedure picking a final model from among those candidates. To conclude, we note that while we only considered linear and Poisson regression in this paper, MEBoost can easily be applied to other regression models by, e.g., using the estimating equations presented by Nakamura (1990) or others which correct for measurement error. In contrast, the approaches of Sørensen et al. (2012) and Datta and Zou (2017) exploit the structure of the linear regression model and it is not obvious how they could be extended to the broader family of generalized linear models. The robustness and simplicity of MEBoost, along with its strong performance against other methods in the linear model case suggests that this novel method is a reliable way to deal with variable selection in the presence of measurement error.

MEBoost – Novo método para seleção de variáveis

Regressão com instâncias corrompidas: Uma abordagem robusta e suas aplicações

Trabalho interessante.

Multivariate Regression with Grossly Corrupted Observations: A Robust Approach and its Applications – Xiaowei Zhang, Chi Xu, Yu Zhang, Tingshao Zhu, Li Cheng

Abstract: This paper studies the problem of multivariate linear regression where a portion of the observations is grossly corrupted or is missing, and the magnitudes and locations of such occurrences are unknown in priori. To deal with this problem, we propose a new approach by explicitly consider the error source as well as its sparseness nature. An interesting property of our approach lies in its ability of allowing individual regression output elements or tasks to possess their unique noise levels. Moreover, despite working with a non-smooth optimization problem, our approach still guarantees to converge to its optimal solution. Experiments on synthetic data demonstrate the competitiveness of our approach compared with existing multivariate regression models. In addition, empirically our approach has been validated with very promising results on two exemplar real-world applications: The first concerns the prediction of \textit{Big-Five} personality based on user behaviors at social network sites (SNSs), while the second is 3D human hand pose estimation from depth images. The implementation of our approach and comparison methods as well as the involved datasets are made publicly available in support of the open-source and reproducible research initiatives.

Conclusions: We consider a new approach dedicating to the multivariate regression problem where some output labels are either corrupted or missing. The gross error is explicitly addressed in our model, while it allows the adaptation of distinct regression elements or tasks according to their own noise levels. We further propose and analyze the convergence and runtime properties of the proposed proximal ADMM algorithm which is globally convergent and efficient. The model combined with the specifically designed solver enable our approach to tackle a diverse range of applications. This is practically demonstrated on two distinct applications, that is, to predict personalities based on behaviors at SNSs, as well as to estimation 3D hand pose from single depth images. Empirical experiments on synthetic and real datasets have showcased the applicability of our approach in the presence of label noises. For future work, we plan to integrate with more advanced deep learning techniques to better address more practical problems, including 3D hand pose estimation and beyond.

Regressão com instâncias corrompidas: Uma abordagem robusta e suas aplicações

Feature Screening in Large Scale Cluster Analysis

Mais trabalhos sobre clustering.

Feature Screening in Large Scale Cluster Analysis – Trambak Banerjee, Gourab Mukherjee, Peter Radchenko

Abstract: We propose a novel methodology for feature screening in clustering massive datasets, in which both the number of features and the number of observations can potentially be very large. Taking advantage of a fusion penalization based convex clustering criterion, we propose a very fast screening procedure that efficiently discards non-informative features by first computing a clustering score corresponding to the clustering tree constructed for each feature, and then thresholding the resulting values. We provide theoretical support for our approach by establishing uniform non-asymptotic bounds on the clustering scores of the “noise” features. These bounds imply perfect screening of non-informative features with high probability and are derived via careful analysis of the empirical processes corresponding to the clustering trees that are constructed for each of the features by the associated clustering procedure. Through extensive simulation experiments we compare the performance of our proposed method with other screening approaches, popularly used in cluster analysis, and obtain encouraging results. We demonstrate empirically that our method is applicable to cluster analysis of big datasets arising in single-cell gene expression studies.

Conclusions: We propose COSCI, a novel feature screening method for large scale cluster analysis problems that are characterized by both large sample sizes and high dimensionality of the observations. COSCI efficiently ranks the candidate features in a non-parametric fashion and, under mild regularity conditions, is robust to the distributional form of the true noise coordinates. We establish theoretical results supporting ideal feature screening properties of our proposed procedure and provide a data driven approach for selecting the screening threshold parameter. Extensive simulation experiments and real data studies demonstrate encouraging performance of our proposed approach. An interesting topic for future research is extending our marginal screening method by means of utilizing multivariate objective criteria, which are more potent in detecting multivariate cluster information among marginally unimodal features. Preliminary analysis of the corresponding `2 fusion penalty based criterion, which, unlike the `1 based approach used in this paper, is non-separable across dimensions, suggests that this criterion can provide a way to move beyond marginal screening.

Feature Screening in Large Scale Cluster Analysis

Deterministic quantum annealing expectation-maximization (DQAEM)

Apesar do nome bem complicado o paper fala de uma modificação do mecanismo do algoritmo de cluster Expectation-Maximization (EM) em que o mesmo tem o incremento de uma meta-heurísica similar ao Simulated Annealing (arrefecimento simulado) para eliminar duas deficiências do EM que é de depender muito dos dados de início (atribuições iniciais) e o fato de que as vezes há problemas de mínimos locais.

Relaxation of the EM Algorithm via Quantum Annealing for Gaussian Mixture Models

Abstract: We propose a modified expectation-maximization algorithm by introducing the concept of quantum annealing, which we call the deterministic quantum annealing expectation-maximization (DQAEM) algorithm. The expectation-maximization (EM) algorithm is an established algorithm to compute maximum likelihood estimates and applied to many practical applications. However, it is known that EM heavily depends on initial values and its estimates are sometimes trapped by local optima. To solve such a problem, quantum annealing (QA) was proposed as a novel optimization approach motivated by quantum mechanics. By employing QA, we then formulate DQAEM and present a theorem that supports its stability. Finally, we demonstrate numerical simulations to confirm its efficiency.

Conclusion: In this paper, we have proposed the deterministic quantum annealing expectation-maximization (DQAEM) algorithm for Gaussian mixture models (GMMs) to relax the problem of local optima of the expectation-maximization (EM) algorithm by introducing the mechanism of quantum fluctuations into EM. Although we have limited our attention to GMMs in this paper to simplify the discussion, the derivation presented in this paper can be straightforwardly applied to any models which have discrete latent variables. After formulating DQAEM, we have presented the theorem that guarantees its convergence. We then have given numerical simulations to show its efficiency compared to EM and DSAEM. It is expect that the combination of DQAEM and DSAEM gives better performance than DQAEM. Finally, one of our future works is a Bayesian extension of this work. In other words, we are going to propose a deterministic quantum annealing variational Bayes inference.

Deterministic quantum annealing expectation-maximization (DQAEM)

K-Means distribuído sobre dados binários comprimidos

E quem disse que o K-Means estava morto hein?

Distributed K-means over Compressed Binary Data

Abstract—We consider a network of binary-valued sensors with a fusion center. The fusion center has to perform K-means clustering on the binary data transmitted by the sensors. In order to reduce the amount of data transmitted within the network, the sensors compress their data with a source coding scheme based on LDPC codes. We propose to apply the K-means algorithm directly over the compressed data without reconstructing the original sensors measurements, in order to avoid potentially complex decoding operations. We provide approximated expressions of the error probabilities of the K-means steps in the compressed domain. From these expressions, we show that applying the Kmeans algorithm in the compressed domain enables to recover the clusters of the original domain. Monte Carlo simulations illustrate the accuracy of the obtained approximated error probabilities, and show that the coding rate needed to perform K-means clustering in the compressed domain is lower than the rate needed to reconstruct all the measurements.

Conclusion: In this paper, we considered a network of sensors which transmit their compressed binary measurements to a fusion center. We proposed to apply the K-means algorithm directly over the compressed data, without reconstructing the sensor measurements. From a theoretical analysis and Monte Carlo simulations, we showed the efficiency of applying K-means in the compressed domain. We also showed that the rate needed to perform K-means on the compressed vectors is lower than the rate needed to reconstruct all the measurements.

K-Means distribuído sobre dados binários comprimidos

Modularização do Morfismo de Redes Neurais

Quem foi que disse que não podem ocorrer alterações morfológicas nas arquiteturas/topologias de Redes Neurais?

Modularized Morphing of Neural Networks – Tao Wei, Changhu Wang, Chang Wen Chen

Abstract: In this work we study the problem of network morphism, an effective learning scheme to morph a well-trained neural network to a new one with the network function completely preserved. Different from existing work where basic morphing types on the layer level were addressed, we target at the central problem of network morphism at a higher level, i.e., how a convolutional layer can be morphed into an arbitrary module of a neural network. To simplify the representation of a network, we abstract a module as a graph with blobs as vertices and convolutional layers as edges, based on which the morphing process is able to be formulated as a graph transformation problem. Two atomic morphing operations are introduced to compose the graphs, based on which modules are classified into two families, i.e., simple morphable modules and complex modules. We present practical morphing solutions for both of these two families, and prove that any reasonable module can be morphed from a single convolutional layer. Extensive experiments have been conducted based on the state-of-the-art ResNet on benchmark datasets, and the effectiveness of the proposed solution has been verified.

Conclusions: This paper presented a systematic study on the problem of network morphism at a higher level, and tried to answer the central question of such learning scheme, i.e., whether and how a convolutional layer can be morphed into an arbitrary module. To facilitate the study, we abstracted a modular network as a graph, and formulated the process of network morphism as a graph transformation process. Based on this formulation, both simple morphable modules and complex modules have been defined and corresponding morphing algorithms have been proposed. We have shown that a convolutional layer can be morphed into any module of a network. We have also carried out experiments to illustrate how to achieve a better performing model based on the state-of-the-art ResNet with minimal extra computational cost on benchmark datasets.

Modularização do Morfismo de Redes Neurais

Aplicação de Deep Learning para relacionar Pins

Ao que parece, o Pinterest está virando a nova casa de força de Deep Learning aplicada à imagens.

Using deep learning to generate Related Pins

We built Pin2Vec to embed all the Pins in a 128-dimension space. First, we label a Pin with all the other Pins someone has saved in his/her activity session, each as a Pin tuple. Pin tuples are used in supervised training to train the embedding matrix for each of the tens of millions of Pins of the vocabulary. We use TensorFlow as the trainer. At serving time, a set of nearest neighbors are found as Related Pins in the space for each of the Pins.

Training data is collected from recent engagement, such as saving or clicking, and a sliding window is applied. Low quality Pins and those not engaged with are removed from training. Then, each Pin is assigned with a unique Pin ID. Within the sliding window, training pairs are extracted such that the first Pin is the example and each of the following Pins is its label. Figure 3 illustrates an example session and training pairs. In our case, you can imagine each user session is a sentence with Pins as words.

We used a feedforward neural network with a hidden layer of 128 dimensions. Figure 4 shows the architecture. The network is inspired by word2vec. The input vector is a one-hot vector with a size of vocabulary and, in our case, is tens of millions of Pins. The vector is reduced to the 128-dimension vector by multiplying with the hidden layer weight matrix. An eLu activation function is applied after hidden layer. At last the hidden layer output is multiplied with the softmax matrix and a cross-entropy is used to calculate the loss. We sampled 64 negative Pins in loss optimization in lieu of iterating on tens of millions of Pins. We trained the Pin2Vec embedding on machines with 32 cores and 244GB memory.

Aplicação de Deep Learning para relacionar Pins