Is your company Engineering-First or Product-First?

One of the biggest advantages I had throughout my career was having worked in different kinds of companies, either in consulting or in holding-style businesses. I think these kinds of companies give you a type of exposure that is very hard to find in conventional companies (e.g. a business with a single vertical).

However, even though these businesses had different dynamics, after some time working in each of them a very clear distinction in how they used engineering/technology started to appear.

The distinctions I saw were not only within the strictly technical field, but within the cultural aspect of the organization.

And when I say culture here, I refer to the Wikipedia definition, which uses Edward B. Tylor's direct quote stating that culture is “that complex whole which includes knowledge, belief, art, morals, law, custom, and any other capabilities and habits acquired by man as a member of society”.

Transposing this definition to the corporate context, a company's culture is decisive in how people are hired, how expected behaviors are evaluated, how incentives are structured, and above all in how decisions are made and strategic directions are set.

These approaches, which at first may seem of little relevance, carry a lot of weight both for organizations and especially for people.

By the end of this post I intend to make the idea of companies with Engineering-First and Product-First approaches clear, discuss some of their advantages and disadvantages, and explain why this is relevant in terms of career management [1], [2].

The idea here is not to say that approach A is better than B, but to draw a parallel and show some of the distinctions between these approaches, along with their advantages and potential challenges.

Engineering-First and Product-First: Two approaches

Disclaimer: Contrary to what it may seem, these approaches are not mutually exclusive. That is, adopting one or the other does not mean that the organization only does A or B. In fact, the best organizations are those that manage to switch between these two approaches almost adaptively, depending on the strategy and on what needs to be achieved.

I write this because one common thing I have been noticing in discussions with colleagues is that a lack of knowledge about the kind of company one is joining, combined with a lack of self-knowledge, ends up leading to a spiral of frustrations over time in which both sides are always dissatisfied, inevitably leading to boreout, resignations, corporate stress, and the like.

In one case the product side gets frustrated by constant delays and complaints; in another, the engineering side gets frustrated by the lack of proper conceptualization of the project or its scope. In other words, a situation of perpetual friction.

So understanding the types of companies is the first step to get out of this spiral of frustration and friction. But first, let's look at some characteristics of each type of approach.

Paradigms and Characteristics

Engineering-First companies

Engineering-First companies are companies whose engineering and capacity for technological innovation are their main comparative advantages. Because of these factors, some of these companies become very dominant, sometimes even building near-invincible monopolies (e.g. Siemens, Airbus, Google, Facebook, Amazon), since they use new (or classic) engineering and technology methods not merely to leverage their core business but as a means of survival.

They are usually companies that offer platforms/infrastructure as their main services, or companies that place engineering at the center of the product innovation and conceptualization process. In other words, the business is geared toward incorporating cutting-edge technologies into products, or toward finding more efficient ways of doing something through optimization.

In general terms, the approach here is: engineering will build it, and product will wrap it in a more attractive package and make it more palatable. Sometimes these Engineering-First companies offer mission-critical platforms/infrastructure and products of high technological complexity that have no direct interface with the ordinary end customer.

These companies have engineering/technology at their core business, and technological stagnation or lag can be the difference between glory and bankruptcy.

Some examples of Engineering-First companies would be Honda and Mercedes-AMG in Formula 1 engines, Huawei in telecommunications platforms, Amazon Web Services in infrastructure, and Airbus in aerospace.

These companies have an extreme engineering culture in which optimization is taken to the limit and small gains make all the difference in the final result, which in a way puts a lot of pressure on the degree of specialization needed to make that happen.

Product-First companies

Product-First companies, on the other hand, are organizations in which technology comes in more as an element of leverage, optimization, and scalability, facilitating business transactions that do not directly demand technology.

Most of the time these companies are extremely pragmatic about technology, because they have a higher-order problem to solve and engineering is just a tool/means to solve it.

These companies usually work centered on the end consumer, from conceptualization through to the final user experience, and unlike the Engineering-First approach, here human factors are included in the final result.

In this case, engineering enters Product-First companies only to materialize the concept defined in the product vision, in order to either leverage or scale their business processes.

In other words, technology can be used to leverage or optimize a process, but in the end it has to be part of a specific product that will satisfy an end user, and that is the company's real core business.

Some examples of Product-First companies would be Apple in personal computers, Microsoft in its video game line, Netflix in content distribution, and Spotify as a marketplace between artists and music/podcast listeners.

Roughly speaking, Engineering-First companies are built on ultra-optimization, scalability, redundancy, and a focus on their engineering/technology components as the core business; Product-First companies, in contrast, generally work on putting technology to use solving some human need through products that people like and use. In other words, these companies use engineering/technology to support their operations, not as the main focus.

Now that we have a definition of these companies, I will try to lay out some points I have seen over time regarding the advantages and disadvantages of working in each type of company, in a non-exhaustive way.

Engineering-First: Advantages and Disadvantages

·  Advantages

  • These companies usually have a great deal of openness to Research/PoCs for incorporating new technologies, whether as new ways of solving latent problems or to gain performance and scalability (e.g. Bell Labs).
  • Fine-tuning/optimization in some companies is used as a strategy that, when well defined, can become a competitive advantage. There is no secret here: every small optimization can yield very large gains at scale (e.g. increasing the throughput of a platform, the efficiency of a new compression algorithm that reduces storage needs, networks with reduced latency, etc.)
  • They tend to attract engineers with a greater degree of specialization in fewer aspects, but with much greater depth (e.g. the equivalent of the engineer responsible for the titanium alloy used in the construction of the blades of an Airbus A320 engine).

·  Disadvantages

  • Part of the work will always be invisible, and worse: this same work tends to move with a much greater degree of uncertainty and at a pace of delivery that can be much slower;
  • Working in environments geared toward optimization gains can be frustrating for those who want to work on end-user-facing innovations, or for those who are burned out from working on the same subject for months with little or no result (e.g. spending 30,000% on research to gain the final 3%).
  • Too much optimization can lead to technological stagnation, since while you search for the right parameters for something, a competitor may be working on something completely new (e.g. Airbus overtaking Boeing with a totally different engineering paradigm)
  • It can be a nightmare for Product Engineers, since the work can sometimes be about making something run better, and the impact on the end customer may not be very noticeable
  • It may deliver something technically interesting and well executed, yet very unattractive to the end customer (e.g. Concorde, Windows Vista)
  • Over-engineering can be a very big problem here, especially when there is no communication with the product teams; it usually leads to waste and increases the cost of maintenance down the road.

Product-First: Advantages and Disadvantages

·  Advantages

  • It is hugely attractive for Product Engineers. For those who like to see the result of their work impacting the end user at the end of the day, and to have real visibility into the impact of what is being built, a Product-First company is the best place to be. I have been in a few and I can say that seeing something working, with people using your application, is something worthwhile.
  • A Product-First approach can capture market timing to launch new features and/or defend part of the market share, sometimes even at the same time. In some engineering scenarios this is not always possible, since a Product-First company must be able to adapt faster to market trends, or keep up with competitors when the competition for gaining or maintaining an advantage is fiercer (e.g. using Swift instead of Objective-C in iOS projects; using Redis instead of a traditional RDBMS so data solutions can be accessed in real time, etc.)
  • Here, competition raises everyone's game. This is a subject that in my view is rarely discussed: how dynamic competition can lift everyone, and for a Product-First company, competition (and raising its level of operational excellence) is almost a matter of who survives in the market and who goes bankrupt.

·  Disadvantages

  • Very often, engineering optimization takes a back seat due to overloaded product pipelines or a lack of visibility into what needs to be done. As a simple analogy, it would be like a chef asking for a kitchen renovation because of the working conditions while the restaurant owner is satisfied that the business is generating profit despite the problems.
  • If you like to go deep into a technical topic (e.g. solving world-class technical scenarios), a Product-First company may not be ideal, because while you search for the ideal solution your company may already be thinking about another product or another implementation, and very often the technically better solutions lose priority because new things need to be built;
  • A continuation of the previous point: if things are not prioritized from an engineering point of view in the initial design, Technical Debt will be the law. This means every project starts with a technical debt to be paid, and over time the interest becomes a real problem for potential improvements and maintenance. This is one of the most common blind spots in projects, since resolving technical debt often does not appear in the product backlog, yet doing so can eliminate latent product problems that could cause greater harm (e.g. outages, security vulnerabilities).
  • And since Product-First companies deal with end users most of the time, when things go wrong they will go wrong in the most tragic way possible, and the pressure both to deliver and to fix what is broken is almost permanent when working in a Product-First company.

But does this really matter?

Well, based on what I have experienced in the corporate world, it matters a lot. A large part of the mistakes I made in my career are closely related to my failure to identify what kind of company I was joining. This often led me down a road of frustration and boredom/boreout, with extreme friction within the corporate environment (all within a respectful scope).

Two situations to illustrate my point.

The first was when I was at an Engineering-First company with a Product-Engineer mindset. At the time I was a BI analyst responsible for tuning the database and the DWH; but what I really wanted was to participate in the company's asset pricing process and do work less focused on operational optimization and monitoring routines, and instead do mathematical modeling to predict the prices of the derivatives I was loading into my database.

The second was when I arrived at a Product-First company with the mindset of someone at an Engineering-First company. While I was doing more complex business analyses, consulting technical benchmarks, and even implementing algorithms from papers on our platform, the company simply wanted to leave behind a very bad solution (in terms of technical debt) and move on to the next project (while being totally satisfied with the result of that technically terrible implementation).

And why am I bringing this up? For the simple reason that being at the wrong company at the wrong moment of your career is something that will not just waste your career potential: it can literally escalate into mental illness with almost unpredictable consequences.

Obviously, most of the time we cannot choose our job, but if I had to leave one recommendation I would say: know what kind of engineer you are right now and look for a company whose approach has the maximum intersection with your interests.

Final Thoughts

Both paradigms have their advantages and disadvantages, and these characteristics are not completely mutually exclusive.

However, it is very difficult to find organizations that operate well using both approaches in parallel. So if I had to leave one tip, it would be something like the Greek aphorism: know thyself as an engineer. That alone will lead to better career decisions over time and, with luck, avoid all sorts of frustration.

Specially dedicated to Daniel “Oz” Santos, who went through a great part of the hardships described in these two approaches with me (i.e. we lived the disadvantages of both approaches for more than 3 years), and to Rodolfo Zahn for the conversations that gave rise to this post.

Useful links

How to cultivate an engineering first culture — from a coders perspective

Product engineers

The Over-Engineering Problem (and How to Avoid It)

Notes

[1] – There is a third kind of approach I have been noticing, Research/Experimentation-First companies, but that is a topic for another time.

[2] – The main point of this post is not to talk about projects themselves or specific businesses, but to discuss these two approaches, which in my view describe the market well for engineers, data scientists, and other data professionals. This argument can be extended to other fields, but here I will speak specifically to the data and IT audience in general.

[3] – And as you already know, I was taught to read and write in the worst educational system in the world, my mind speaks much faster than my fingers, and my will to proofread this is about as high as a sheet of paper lying flat, so please forgive any mistakes.

RecSys 2019 – Recommendation in Multi-Stakeholder Environments (RMSE) and 7th International Workshop on News Recommendation and Analytics (INRA 2019)

Once you’re at a conference, the first things you do are certainly going to the main talks, seeing the presentations from big companies, looking for the big cases, hanging out with authors of great papers presenting the SOTA, and so on.

This is the safest path, and probably most of those cases will have press releases discussed in the media or be the subject of blog posts, so you can get access to them before almost everyone else.

However, one thing that I think is very underestimated at conferences is the workshops.

My favorite definition of a workshop comes from the Oxford Dictionary: “a meeting of people to discuss and/or perform practical work in a subject or activity”.

For me, workshops are the best blend of the conference format – with its peer review and the oversight of the chairs – in a smaller setting where you can dive into a specific but adjacent subject and have direct contact with the authors. In those venues there is information that is not available to the general public.

Today I’ll talk about two workshops that I attended at RecSys 2019: Recommendation in Multi-Stakeholder Environments (RMSE) and the 7th International Workshop on News Recommendation and Analytics (INRA 2019).

But first I’ll explain why I attended these two workshops at RecSys.

First of all, I decided to attend RMSE because here at MyHammer we’re dealing with several challenges regarding recommendations in a marketplace. We need to take into consideration not just a single platform user, but several different users who not only interact with each other but also carry all the dynamics of a job marketplace. We have complex competition dynamics, and browsing the proceedings I saw that I could learn tons of ways to apply this knowledge in the company.

As for INRA, even though this specific topic doesn’t have much in common with job recommendation, I decided to attend because some papers address very relevant aspects with a big intersection with our use case, like giving recommendations to non-logged-in users, contextual multi-armed bandits, content representation, and strategies for using word embeddings.
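To make one of those aspects concrete: at their core, multi-armed bandits are about balancing exploration (trying arms you know little about) with exploitation (showing what has worked so far). As a purely illustrative sketch of the context-free core of the idea — my own minimal example, not the method of any paper mentioned here, and all names are made up:

```python
import random

class EpsilonGreedyBandit:
    """Minimal epsilon-greedy bandit: with probability `eps` explore a
    random arm; otherwise exploit the arm with the best observed mean."""

    def __init__(self, n_arms, eps=0.1, seed=42):
        self.eps = eps
        self.counts = [0] * n_arms     # pulls per arm
        self.values = [0.0] * n_arms   # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.counts))  # explore
        # exploit: arm with the highest estimated mean reward
        return max(range(len(self.counts)), key=self.values.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        # incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

A contextual version would keep one estimator per context (e.g. per user segment), which is exactly where robust user segmentation becomes important for online recommendation.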

For the sake of clarity, I’ll include the official descriptions of those workshops:

7th International Workshop on News Recommendation and Analytics (INRA 2019)

This workshop primarily addresses news recommender systems and analytics. The news ecosystem engulfs a variety of actors including publishers, journalists, and readers. The news may originate in large media companies or digital social networks. INRA aims to connect researchers, media companies, and practitioners to exchange ideas about creating and maintaining a reliable and sustainable environment for digital news production and consumption.

Topics of interests for this workshop include but are not limited to:

  • News Recommendation
  • News Analytics
  • Ethical Aspects of News Recommendation

For the RMSE: Recommendation in Multi-Stakeholder Environments we have the following description:

One of the most essential aspects of any recommender system is personalization — how well the recommendations delivered suit the user’s interests. However, in many real world applications, there are other stakeholders whose needs and interests should be taken into account. In multisided e-commerce platforms, such as auction sites, there are parties on both sides of the recommendation transaction whose perspectives should be considered. There are also contexts in which the recommender system itself also has certain objectives that should be incorporated into the recommendation generation. Problems like long-tail promotion, fairness-aware recommendation, and profit maximization are all examples of objectives that may be important in different applications. In such multistakeholder environments, the recommender system will need to balance the (possibly conflicting) interests of different parties.

This workshop will encourage submissions that address the challenges of producing recommendations in multistakeholder settings, including but not limited to the following topics:

  • The requirements of different multistakeholder applications such as:
    • Recommendation in multisided platforms
    • Fairness-aware recommendation
    • Multi-objective optimization in Recommendation
    • Value-aware recommendation in commercial settings
    • Reciprocal recommendation
  • Algorithms for multistakeholder recommendation including multi-objective optimization, re-ranking and others
  • Evaluation of multistakeholder recommendation systems
  • User experience considerations in multistakeholder recommendation including ethics, transparency, and interfaces for different stakeholders.

RecSys is one of the best conferences on Recommender Systems because it’s an excellent blend of industry and academia: on one side, academia keeps a fast-paced rhythm of research in scenarios more complex than ever before; on the other, most companies are moving these new methods forward in battle-tested environments, where we have not only sterile benchmarks but recommender systems applied to real, live data.

Below I’ll highlight some interesting papers with some quick notes about them. I strongly suggest reading the full papers, because this is the SOTA in terms of research and industrial applications in recommender systems.

7th International Workshop on News Recommendation and Analytics (INRA 2019)

  • On the Importance of News Content Representation in Hybrid Neural Session-based Recommender Systems, Gabriel De Souza P. Moreira, Dietmar Jannach and Adilson Marques Da Cunha. Abstract: News recommender systems are designed to surface relevant information for online readers by personalizing their user experiences. A particular problem in that context is that online readers are often anonymous, which means that this personalization can only be based on the last few recorded interactions with the user, a setting named session-based recommendation. Another particularity of the news domain is that constantly fresh articles are published, which should be immediately considered for recommendation. To deal with this item cold-start problem, it is important to consider the actual content of items when recommending. Hybrid approaches are therefore often considered as the method of choice in such settings. In this work, we analyze the importance of considering content information in a hybrid neural news recommender system. We contrast content-aware and content-agnostic techniques and also explore the effects of using different content encodings. Experiments on two public datasets confirm the importance of adopting a hybrid approach. Furthermore, we show that the choice of the content encoding can have an impact on the resulting performance.
  • Defining a Meaningful Baseline for News Recommender Systems, Benjamin Kille and Andreas Lommatzsch. Abstract: The analysis of images in the context of recommender systems is a challenging research topic. NewsREEL Multimedia enables researchers to study new algorithms with a large dataset. The dataset comprises news items and the number of impressions as a proxy for interestingness. Each news article comes with textual and image features. This paper presents data characteristics and baseline prediction models. We discuss the performance of these predictors and explain the detected patterns.
  • Trend-responsive user segmentation enabling traceable publishing insights. A case study of a real-world large-scale news recommendation system, Joanna Misztal-Radecka, Dominik Rusiecki, Michał Żmuda and Artur Bujak. Abstract: The traditional offline approaches are no longer sufficient for building modern recommender systems in domains such as online news services, mainly due to the high dynamics of environment changes and necessity to operate on a large scale with high data sparsity. The ability to balance exploration with exploitation makes the multi-armed bandits an efficient alternative to the conventional methods, and a robust user segmentation plays a crucial role in providing the context for such online recommendation algorithms. In this work, we present an unsupervised and trend-responsive method for segmenting users according to their semantic interests, which has been integrated with a real-world system for large-scale news recommendations. The results of an online A/B test show significant improvements compared to a global-optimization algorithm on several services with different characteristics. Based on the experimental results as well as the exploration of segments descriptions and trend dynamics, we propose extensions to this approach that address particular real-world challenges for different use-cases. Moreover, we describe a method of generating traceable publishing insights facilitating the creation of content that serves the diversity of all users needs.

RMSE: Recommendation in Multi-Stakeholder Environments

  • Multi-stakeholder Recommendation and its Connection to Multi-sided Fairness (Himan Abdollahpouri and Robin Burke). Abstract: There is growing research interest in recommendation as a multistakeholder problem, one where the interests of multiple parties should be taken into account. This category subsumes some existing well-established areas of recommendation research including reciprocal and group recommendation, but a detailed taxonomy of different classes of multi-stakeholder recommender systems is still lacking. Fairness-aware recommendation has also grown as a research area, but its close connection with multi-stakeholder recommendation is not always recognized. In this paper, we define the most commonly observed classes of multi-stakeholder recommender systems and discuss how different fairness concerns may come into play in such systems.
  • Simple Objectives Work Better (Joaquin Delgado, Samuel Lind, Carl Radecke and Satish Konijeti). Abstract: Groupon is a dynamic two-sided marketplace where millions of deals organized in three different lines of businesses or verticals: Local, Goods and Getaways, using various taxonomies, are matched with customers’ demand across 15 countries around the world. Customers discover deals by directly entering the search query or browsing on the mobile or desktop devices. Relevance is Groupon’s homegrown search and recommendation engine, tasked to find the best deals for its users while ensuring the business objectives are also met at the same time. Hence the objective function is designed to calibrate the score to meet the needs of multiple stakeholders. Currently, the function is comprised of multiple weighted factors that are combined to satisfy the needs of the respective stakeholders in the multi-objective scorer, a key component of Groupon’s ranking pipeline. The purpose of this paper is to describe various techniques explored by Groupon’s Relevance team to improve various parts of Search and Ranking algorithms specifically related to the multi-objective scorer. It is for research only, and it does not reflect the views, plans, policy or practices of Groupon. The main contributions of this paper are in the areas of factorization of the different abstract objectives and the simplification of the objective function to capture the essence of short, mid and long term benefits while preserv
  • Recommender Systems Fairness Evaluation via Generalized Cross Entropy (Yashar Deldjoo, Vito Walter Anelli, Hamed Zamani, Alejandro Bellogin Kouki and Tommaso Di Noia). Abstract: Fairness in recommender systems has been considered with respect to sensitive attributes of users (e.g., gender, race) or items (e.g., revenue in a multistakeholder setting). Regardless, the concept has been commonly interpreted as some form of equality — i.e., the degree to which the system is meeting the information needs of all its users in an equal sense. In this paper, we argue that fairness in recommender systems does not necessarily imply equality, but instead it should consider a distribution of resources based on merits and needs.
    We present a probabilistic framework based on generalized cross entropy to evaluate fairness of recommender systems under this perspective, where we show that the proposed framework is flexible and explanatory by allowing to incorporate domain knowledge (through an ideal fair distribution) that can help to understand which item or user aspects a recommendation algorithm is over- or under-representing. Results on two real-world datasets show the merits of the proposed evaluation framework both in terms of user and item fairness.
  • The Unfairness of Popularity Bias in Recommendation (Himan Abdollahpouri, Masoud Mansoury, Robin Burke and Bamshad Mobasher). Abstract: Recommender systems are known to suffer from the popularity bias problem: popular (i.e. frequently rated) items get a lot of exposure while less popular ones are under-represented in the recommendations. Research in this area has been mainly focusing on finding ways to tackle this issue by increasing the number of recommended long-tail items or otherwise the overall catalog coverage. In this paper, however, we look at this problem from the users’ perspective: we want to see how popularity bias causes the recommendations to deviate from what the user expects to get from the recommender system. We define three different groups of users according to their interest in popular items (Niche, Diverse and Blockbuster-focused) and show the impact of popularity bias on the users in each group. Our experimental results on a movie dataset show that in many recommendation algorithms the recommendations the users get are extremely concentrated on popular items even if a user is interested in long-tail and non-popular items showing an extreme bias disparity.
  • Bias Disparity in Recommendation Systems (Virginia Tsintzou, Evaggelia Pitoura and Panayiotis Tsaparas). Abstract: Recommender systems have been applied successfully in a number of different domains, such as, entertainment, commerce, and employment. Their success lies in their ability to exploit the collective behavior of users in order to deliver highly targeted, personalized recommendations. Given that recommenders learn from user preferences, they incorporate different biases that users exhibit in the input data. More importantly, there are cases where recommenders may amplify such biases, leading to the phenomenon of bias disparity. In this short paper, we present a preliminary experimental study on synthetic data, where we investigate different conditions under which a recommender exhibits bias disparity, and the long-term effect of recommendations on data bias. We also consider a simple re-ranking algorithm for reducing bias disparity, and present some observations for data disparity on real data.
  • Joint Optimization of Profit and Relevance for Recommendation Systems in E-commerce (Raphael Louca, Moumita Bhattacharya, Diane Hu and Liangjie Hong). Abstract: Traditionally, recommender systems for e-commerce platforms are designed to optimize for relevance (e.g., purchase or click probability). Although such recommendations typically align with users’ interests, they may not necessarily generate the highest profit for the platform. In this paper, we propose a novel revenue model which jointly optimizes both for probability of purchase and profit. The model is tested on a recommendation module at Etsy.com, a two-sided marketplace for buyers and sellers. Notably, optimizing for profit, in addition to purchase probability, benefits not only the platform but also the sellers. We show that the proposed model outperforms several baselines by increasing offline metrics associated with both relevance and profit.
  • A Multistakeholder Recommender Systems Algorithm for Allocating Sponsored Recommendations (Edward Malthouse, Khadija Ali Vakeel, Yasaman Kamyab Hessary, Robin Burke and Morana Fuduric). Abstract: Retailing and social media platforms recommend two types of items to their users: sponsored items that generate ad revenue and non-sponsored ones that do not. The platform selects sponsored items to maximize ad revenue, often through some form of programmatic auction, and non-sponsored items to maximize user utility with a recommender system (RS). We develop a multiobjective binary integer programming model to allocate sponsored recommendations considering a dual objective of maximizing ad revenue and user utility. We propose an algorithm to solve it in a computationally efficient way. Our method can be applied as a form of post-processing to an existing RS, making it widely applicable. We apply the model to data from an online grocery retailer and show that user utility for the recommended items can be improved while reducing ad revenue by a small amount. This multiobjective approach, which unifies programmatic advertising and RS, opens a new frontier for advertising and RS research and we therefore provide an extended discussion of future research topics.

Final Remarks

I think RecSys is a great conference to attend for every practitioner or research engineer involved in recommendation systems. There's a healthy overlap between academia and industry: the former pushes forward with new methods, algorithms, and a reflective attitude toward important themes like bias and fairness; the latter applies those methods through engineering and presents results from battle-tested applications using contextual bandits, click prediction, and combinations of domain heuristics with optimization methods in machine learning.

RecSys 2019 – Recommendation in Multi-Stakeholder Environments (RMSE) and 7th International Workshop on News Recommendation and Analytics (INRA 2019)

Security in Machine Learning Engineering: A white-box attack and simple countermeasures

Some weeks ago, during a security training for developers provided by Marcus from Hackmanit (by the way, a very good course that covers topics from web development to NoSQL vulnerabilities and defensive coding), we discussed some white-box attacks on web applications (i.e., attacks where the offender has internal access to the object), and I got curious to check whether there are similar vulnerabilities in ML models.

After running a simple script based on [1], [2], [3] using Scikit-Learn, I noticed that there are some latent vulnerabilities, not only in the objects themselves but also regarding the lack of a proper security mindset when we develop ML models.

But first let’s check a simple example.

A white-box attack in a Scikit-Learn random forest object

I have a dataset called Layman Brothers, a database of loans that I grabbed from the internet (if someone knows the authors, let me know so I can give credit). It contains records of a bank's consumers and, according to some variables, indicates whether each consumer defaulted or not. This is a plain-vanilla classification case, and for simplicity I used a Random Forest to generate the classification model.

The main point of this post is to check what kind of information the Scikit-Learn object (model) reveals in a white-box attack, and to raise some simple countermeasures to reduce the attack surface of those models.

After training the classifier, I serialized the Random Forest model using Pickle. The model has the following performance against the test set:

# Accuracy: 0.81
# status
# 0    8071
# 1     929

Keep those numbers in mind, because we're going to come back to them later in this post.

From a quick search on the internet, the majority of applications that use Scikit-Learn in production deal with a pickled (or Joblib-serialized) object hosted on a machine or in an S3 bucket, with a REST API taking care of serving the model. The API receives some parameters in the request and returns a response (the prediction result).
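As a rough sketch of that serving pattern (a pure-Python stand-in for the real scikit-learn estimator and a hypothetical model.pkl path, since the post doesn't show the serving code), the training side pickles the artifact and the serving side deserializes it to answer requests:

```python
import pickle

# Stand-in for a trained estimator; hypothetical, since the post's real
# model is a scikit-learn RandomForestClassifier.
class LoanModel:
    classes_ = [0, 1]

    def predict(self, features):
        # Toy rule: default {1} if the loan amount (feature 0) is high
        return 1 if features[0] > 100_000 else 0

# "Training side": serialize the model artifact
with open("model.pkl", "wb") as f:
    pickle.dump(LoanModel(), f)

# "Serving side": the API process deserializes and answers requests
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

print(model.predict([150_000]))  # -> 1 (defaulted)
print(model.predict([20_000]))   # -> 0 (not defaulted)
```

The key property to notice is that pickle.load reconstructs a fully mutable Python object: whoever can read and write that file controls everything the model will do.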

In our case, the response, based on the independent variables (loan features), will be defaulted {1} or not {0}. Quite straightforward.

Having access to the Scikit-Learn object, I noticed that it discloses valuable pieces of information that, in the hands of a white-box attacker, could be very damaging for a company.

Loading the pickled object, we can check all classes contained in the model:

So we have a model with 2 possible outcomes, {0} and {1}. From the perspective of an attacker, we can infer that this model makes some binary decision with a yes {1} or no {0} outcome.

I need to confess that I expected to have only read access to this object (because the Scikit-Learn documentation takes that for granted), but I was surprised to discover that I can also write to the object, i.e., completely override the trained attributes. I did that using the following snippet:

import pickle
import numpy as np

# Load model from Pickle
model_rf_reload_pkl = pickle.load(open('model_rf.pkl', 'rb'))

# Display prediction classes
model_rf_reload_pkl.classes_
# >>> array([0, 1])

# Override the classes attribute: from now on every prediction
# maps to the outcome {1}
model_rf_reload_pkl.classes_ = np.array([1, 1])

One can notice that with this command I changed all the possible classes of the model using a single numpy array, and hereafter this model will only produce the outcome {1}.

Just for the sake of exemplification, I ran the same function against the test dataset to get the results, and I got the following surprise:

# Actual against test set
# Accuracy: 0.2238888888888889
# status
# 1    9000

# Previous against test set
# Accuracy: 0.8153333333333334
# status
# 0    8071
# 1     929

In this simple example we moved more than 8k records to the wrong class. It's unnecessary to say how damaging this could be in production in a critical domain.

If we do a simple mental exercise where this object is a credit score application, a classifier in a medical domain, or a pre-order system for market stocks, we face the cold reality that we are nowhere near safe doing traditional ML with the popular tools.

The moment we Machine Learning Engineers or Data Scientists just run scripts without even thinking in terms of vulnerabilities and security, we expose our employers, our business, and ourselves to a liability risk that can cause high and unnecessary damage, all because of the lack of security thinking around ML models/objects.

After that, I opened an issue/question in the Scikit-Learn project to check the main reason why this type of modification is possible; maybe I was missing something the developers considered during the implementation phase. My issue in the project can be seen below:

And I got the following response:

Until the day this post was published, there was no answer to my last question about this potential vulnerability in a parameter that should not be changed after model training.

This is a simple white-box attack that interferes directly with the model object itself. Now let's pretend we don't want to tamper with the object, but instead want to explore other attack surfaces and check what valuable information those models can give us.

Models revealing more than expected

Using the same object, I'll explore the attributes described in the docs to check whether we can fetch more information from the model and how that information can be potentially useful.

First, I’ll try to see the number of estimators:

print(f'Number of Estimators: {len(model_rf_reload_pkl.estimators_)}')
# >>> Number of Estimators: 10

This Random Forest training should not be very complex: we have only 10 estimators (i.e., 10 different trees), so grasping all the complexity of this Random Forest won't be hard for a mildly motivated attacker.

I'll grab a single estimator to perform a quick assessment of the tree's (estimator's) complexity:

model_rf_reload_pkl.estimators_[5]
# >>> DecisionTreeClassifier(class_weight=None, criterion='gini',
#                            max_depth=5, max_features='auto',
#                            max_leaf_nodes=5, min_impurity_decrease=0.0,
#                            min_impurity_split=None, min_samples_leaf=100,
#                            min_samples_split=2, min_weight_fraction_leaf=0.0,
#                            presort=False, random_state=1201263687,
#                            splitter='best')

So this tree is not using class_weight to adjust training for any imbalance in the dataset. As an attacker, with this piece of information I know that if I want to attack this model, I need to remember to alternate the classes during my requests.

It means that if I get a single positive result, I can keep exploring in alternate ways without being detected, since my requests follow a non-weighted distribution.

Moving forward, we can see that this tree has only 5 levels of depth (max_depth), a maximum of 5 leaf nodes (max_leaf_nodes), and a minimum of 100 records per leaf (min_samples_leaf).

It means that even with such depth, this model can concentrate a huge number of cases in a few leaf nodes (i.e., low depth with a limited number of leaf nodes). As an attacker, I may not have access to the number of transactions Layman Brothers used in training, but I know the tree is simple and not very deep.

In other words, my search over the parameter space won't be hard: with a small number of combinations I can easily get a single path from the root to a leaf node and exploit it.

As an attacker, I would like to know how many features an estimator contains. The point here is that if I get the features and their importance, I can prune my search space and concentrate attack efforts only on the meaningful features. To find out how many features an estimator contains, I run the following snippet:

# Extract a single tree
estimator = model_rf_reload_pkl.estimators_[5]
print(f'Number of Features: {estimator.n_features_}')
# >>> Number of Features: 9

As we can see, only 9 features were used in this model. My job as an attacker could not be easier; this is a dream model for an attacker.

But 9 features can still be a big search space. To prune non-fruitful attacks out of it, I'll check which variables are relevant to the model. With this information I can reduce my search space and go directly to the relevant part of the attack surface. For that, let's run the following snippet:

features_list = [str(x + 0) for x in range(estimator.n_features_)]
features_list
# >>> ['0', '1', '2', '3', '4', '5', '6', '7', '8']

importances = estimator.feature_importances_
indices = np.argsort(importances)

plt.figure(1)
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='b', align='center')
plt.yticks(range(len(indices)), indices)
plt.xlabel('Relative Importance')

Let me explain: I grabbed the number of features, assigned an id to each one, and then used np.argsort() to assess the relative importance of those variables.

As we can see, I only need to concentrate my attack on the features at positions [4], [3], and [5]. This reduces my work by tons, because I can discard the other 6 variables; the only thing I need to do is tweak 3 knobs in the model.
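To put rough numbers on that claim (the grid of 10 candidate values per feature is an illustrative assumption, not something taken from the model), compare the search space of brute-forcing all features against tweaking only the 3 relevant ones:

```python
GRID = 10  # hypothetical number of candidate values probed per feature

all_features = 9   # features contained in the estimator
important = 3      # features [4], [3] and [5] from the importance plot

full_space = GRID ** all_features   # brute-forcing every feature
pruned_space = GRID ** important    # tweaking only the 3 relevant knobs

print(full_space)    # -> 1000000000 candidate requests
print(pruned_space)  # -> 1000 candidate requests
```

Six orders of magnitude fewer requests makes the attack feasible before anyone notices the traffic.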

But trying something without a proper warm-up could be costly for me as an attacker, and I could lose the momentum to perform the attack (e.g., the ML team can update the model, or someone can identify the threat, remove the model, and roll back to an old artifact, etc.).

To solve my problem, I'll check the values of one of the trees and use them as a warm-up before the attack. To check those values, I'll run the following code, which generates the complete tree:

from sklearn.tree import export_graphviz

# Export as dot file
export_graphviz(estimator, out_file='tree.dot',
                feature_names=features_list,
                rounded=True, proportion=False,
                precision=2, filled=True)

# Convert to png using a system command (requires Graphviz)
from subprocess import call
call(['dot', '-Tpng', 'tree.dot', '-o', 'tree.png', '-Gdpi=600'])

# Display in Jupyter notebook
from IPython.display import Image
Image(filename='tree.png')

Looking at the tree graph, it's clear that to keep our Layman Brothers loan always at false {0} in the default variable, we need to tweak the values so that {Feature 3 <= 1.5 && Feature 5 <= -0.5 && Feature 4 <= 1.5}.
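Following that path mechanically, an attacker can craft a request that always lands on the approved leaf. Here is a toy sketch of that crafting step (the thresholds come from the tree above; the predicate function and the zero-filled defaults are illustrative stand-ins for the real Random Forest scoring):

```python
# Decision path read off the plotted tree (thresholds from the post)
def follows_approved_path(req):
    return req["3"] <= 1.5 and req["5"] <= -0.5 and req["4"] <= 1.5

# Crafted request: the three constrained features set just inside their
# thresholds, all other features set to arbitrary placeholder values
crafted = {str(i): 0 for i in range(9)}
crafted.update({"3": 1.0, "5": -1.0, "4": 1.0})

print(follows_approved_path(crafted))  # -> True: loan never flagged as default
```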

A small recap of what we discovered: (i) we have the complete model structure, (ii) we know which features are important, (iii) we know the complexity of the tree, (iv) we know which features to tweak, and (v) we have the complete path to always get our loan at Layman Brothers bank approved and game the system.

With this information, until the moment the ML team changes the model, as an attacker I can exploit it using only the information contained in a single object.

As discussed before, this is a simple white-box approach that assumes access to the object.

The point I want to make here is how much a single object can disclose to a potential attacker, and ML Engineers must be aware of it.

Some practical model security countermeasures

There are some practical countermeasures that can reduce the attack surface, or at least make it harder for an attacker. This list is not exhaustive; the idea here is to give ML Engineers practical advice on what can be done in terms of security and how to incorporate it into ML production pipelines. Some of these countermeasures are:

  • Consistency checks on the object/model: If using a model/object is unavoidable, one thing that can be done is to load the object and, with some routine, check (i) the values of some attributes (e.g., number of classes in the model, specific values of the features, tree structure, etc.), (ii) the model accuracy against some holdout dataset (e.g., with a holdout dataset at 85% accuracy, raise an error on any different value), and (iii) the object size and last modification time.

  • Use “false” features: False features are pieces of information solicited in the request but not actually used by the model. The objective is to increase the complexity for an attacker in terms of search space (e.g., a model may use only 9 features, but the API requests 25 features, i.e., 16 false features).

  • Model request monitoring: Tactics can range from monitoring the IPs behind API requests to cross-checking requests based on patterns in values and time intervals between requests.

  • Incorporate consistency checks in CI/CD/CT/CC: CI/CD is a common term in ML Engineering, but I would like to briefly sketch two further concepts: Continuous Training (CT) and Continuous Consistency (CC). In Continuous Training, the model is retrained on a constant schedule in such a way that, given the same data and model-building parameters, it always produces the same results. Continuous Consistency is an additional checking layer on top of CI/CD that assesses the consistency of all parameters and all data contained in ML objects/models. In CC, if any attribute value differs from the values provided by CT, the pipeline breaks, and someone needs to check which attribute is inconsistent and investigate the root cause of the difference.

  • Avoid exposing pickled models in any filesystem (e.g., S3) where someone can have access: As we saw before, if someone gets access to the ML model/object, it's quite easy to perform white-box attacks; i.e., no object access reduces the exposure/attack surface.

  • If possible, encapsulate the coefficients in the application code and make them private: The heart of these vulnerabilities lies in access to the ML object. Removing those objects and incorporating only the model coefficients and/or rules in the code (using private classes) can be a good way to disclose less information about the model.

  • Incorporate the concept of Continuous Training in the ML pipeline: The trick here is to change the model frequently to confuse potential attackers (e.g., different positions of model features between the API and the ML object), while checking the reproducibility of results (e.g., accuracy, recall, F1 score, etc.) in the pipeline.

  • Use heuristics and rules to prune edge cases: Attackers like to start probing their search space with edge (absurd) cases, see if the model gives exploitable results, and fine-tune on top of that. Some heuristics and/or rules on the API side can catch those cases and throw a cryptic error, making the attacker's job much harder.

  • Talk is silver, silence is golden: One of the things I learned in security is that the less you talk about what you're doing in production, the less free information you give to attackers. This may sound harsh, but it's a basic countermeasure against social engineering. I've seen people at several conferences giving details about their image training sets, such as image sizing, augmentation strategies, and pre-checks on the API side, and even disclosing the lack of strategies to deal with adversarial examples. This information can be very useful to attackers, as it gives a clear perspective on the model's limitations. If you want to talk about your solution, talk more about the reasons (why) and less about the implementation (how). Telling social media that I keep all my money underneath my pillow, that my door has only a single lock, and that I'll arrive home from work only after 8 PM does not make my house safer. Remember: less information = less exposure.
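The first countermeasure (consistency checks) can be sketched in a few lines at model-load time. This is a minimal illustration with a stand-in model class and hypothetical expected values; in practice, the expected hash and attributes would come from the Continuous Training pipeline:

```python
import hashlib
import pickle

class TamperedModelError(Exception):
    pass

def load_checked(path, expected_sha256, expected_classes):
    """Refuse to serve a model whose artifact or attributes changed."""
    raw = open(path, "rb").read()
    # (iii) object integrity: hash of the serialized artifact
    if hashlib.sha256(raw).hexdigest() != expected_sha256:
        raise TamperedModelError("artifact hash mismatch")
    model = pickle.loads(raw)
    # (i) attribute check: the classes the model was trained with
    if list(model.classes_) != expected_classes:
        raise TamperedModelError("unexpected classes_ attribute")
    return model

# --- demo with a hypothetical stand-in model object ---
class DummyModel:
    classes_ = [0, 1]

with open("model_rf.pkl", "wb") as f:
    pickle.dump(DummyModel(), f)

good_hash = hashlib.sha256(open("model_rf.pkl", "rb").read()).hexdigest()
model = load_checked("model_rf.pkl", good_hash, [0, 1])  # passes

try:
    load_checked("model_rf.pkl", good_hash, [1])  # attacker-style classes
except TamperedModelError as e:
    print(e)  # -> unexpected classes_ attribute
```

The same pattern extends to the accuracy check (ii): score a holdout set right after loading and break the deployment if the metric drifts from the pipeline's recorded value.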

Conclusion

I'll try to post more countermeasures in a future post, but I hope that at least this one can shed some light on ML security.

There's a saying in aviation culture (a good example of a security-oriented industry): "the price of safety is eternal vigilance". I hope that hereafter more ML Engineers and Data Scientists will be more vigilant about ML and security.

As usual, all codes and notebooks are in Github.
