Do you have a co-worker who wants to leave your company because he’s not working with bleeding-edge Deep Learning tools/algorithms?

If so, show them this post from Ben Lorica’s podcast:

Adoption of machine learning and deep learning in large companies

Everything in the enterprise space is ROI driven. They don’t know that the newest deep learning paper just came out from Google. They’re not going to clone some random GitHub repository and try it out, and just try to put it in production. They don’t do that. They want to understand ROI. They work a job, they have a goal, and they have a budget. They need to figure out what to do with that budget as it relates to their job at their company. Their company is usually a for-profit corporation trying to make money, or trying to increase margins for shareholders.

… Frankly, they don’t care if it’s linear regression, or random forest, either. … Machine learning has barely penetrated the Fortune 2000. Despite all these tools existing, most of them don’t have it in production because they don’t see a point in adopting it. I think Intel said this right: as far as enterprise adoption is concerned, it’s still fairly early for machine learning.

Do you have a co-worker who wants to leave your company because he’s not working with bleeding-edge Deep Learning tools/algorithms?

The worst practices in deploying a prediction model

This post by ZSL Services lays out, in very didactic steps, the worst practices when deploying a predictive model, which they enumerate as:

  1. Lack of a specific business focus;
  2. Skipping the initial steps;
  3. Wasting time on model evaluation;
  4. Heavy investment in tools that deliver a low or zero Return on Investment (ROI); and
  5. Failure to operationalize.

Although these steps are simple, they demand great care in the analysis; and it is mainly for these reasons that most projects dealing with data prediction fail.

The CRISP-DM methodology is excellent in this regard: it does not make the project rigid, yet it strikes a balance between these practices and the deployment of the project.

PS: The post was so successful that the authors wrote a follow-up with some recommendations.

The worst practices in deploying a prediction model

2011 Data Miner Survey – Report on Data Mining practices

In 2011, Rexer Analytics conducted a survey on the main data mining practices and trends. Overall, the highlights show that despite the evolution of techniques, much of what is in ‘production’ today comes down to the basics: Decision Trees, Cluster Analysis, and Regression.

This shows that strong groundwork is still needed to consolidate data mining on the national scene, especially in academia, where advanced concepts are presented while little is actually done in practice.

Some of the main points highlighted by the survey:

 FIELDS & GOALS:  Data miners work in a diverse set of fields.  CRM / Marketing has been the #1 field in each of the past five years.  Fittingly, “improving the understanding of customers”, “retaining customers” and other CRM goals continue to be the goals identified by the most data miners.

 ALGORITHMS:  Decision trees, regression, and cluster analysis continue to form a triad of core algorithms for most data miners.  However, a wide variety of algorithms are being used.   A third of data miners currently use text mining and another third plan to in the future.  Text mining is most often used to analyze customer surveys and blogs/social media.
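As a minimal illustration of that triad (not part of the survey itself), the three core techniques map directly onto scikit-learn estimators. The dataset below is synthetic and purely illustrative:

```python
# Sketch of the survey's "triad" of core algorithms using
# scikit-learn on synthetic data (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                  # 100 samples, 2 features
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)  # binary target
y_reg = 3.0 * X[:, 0] - 2.0 * X[:, 1]          # continuous target

# 1. Decision tree: interpretable classification rules
tree = DecisionTreeClassifier(max_depth=3).fit(X, y_class)

# 2. Regression: linear relationship between features and target
reg = LinearRegression().fit(X, y_reg)

# 3. Cluster analysis: unsupervised grouping into k clusters
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)

print(tree.score(X, y_class))  # training accuracy of the tree
print(reg.coef_)               # recovered coefficients, ~[3, -2]
print(len(set(clusters)))      # number of clusters found
```

The point of the sketch is how small the gap is between these "basic" techniques and working code, which helps explain why they dominate production use.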

TOOLS:  R continued its rise this year and is now being used by close to half of all data miners (47%).  R users report preferring it for being free, open source, and having a wide variety of algorithms.  Many people also cited R’s flexibility and the strength of the user community.  In the 2011 survey we asked R users to tell us more about their use of R.  Read the R user comments about why they use R (pros), the cons of using R, why they select their R interface, and how they use R in conjunction with other tools.  STATISTICA is selected as the primary data mining tool by the most data miners (17%).  Data miners report using an average of 4 software tools overall.  STATISTICA, KNIME, Rapid Miner and Salford Systems received the strongest satisfaction ratings in 2011.

TECHNOLOGY:  Data Mining most often occurs on a desktop or laptop computer, and frequently the data is stored locally.  Model scoring typically happens using the same software used to develop models.

VISUALIZATION:  Data miners frequently use data visualization techniques.  More than four in five use them to explain results to others.  MS Office is the most often used tool  for data visualization.  Extensive use of data visualization is less prevalent in the Asia-Pacific region than other parts of the world.

ANALYTIC CAPABILITY & SUCCESS:  Only 12% of corporate respondents rate their company as having very high analytic sophistication.  However, companies with better analytic capabilities are outperforming their peers.  Respondents report analyzing analytic success via Return on Investment (ROI), and analyzing the predictive validity or accuracy of their models.  Challenges to measuring analytic success include client or user cooperation and data availability / quality.  
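Measuring analytic success via ROI, as the respondents describe, comes down to simple arithmetic. A hedged sketch, where the project name and figures are hypothetical rather than taken from the survey:

```python
def roi(gain: float, cost: float) -> float:
    """Return on Investment: net gain relative to cost."""
    return (gain - cost) / cost

# Hypothetical analytics project: a churn model that retains
# $150k of revenue against $100k of tooling and staff cost.
print(roi(150_000, 100_000))  # 0.5, i.e. a 50% return
```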

FUTURE:  Data miners are optimistic about continued growth in data mining adoption and the positive impact data mining will have.  As in previous years, data miners see growth in the number of projects they will be conducting.  And growth in data mining adoption is the number one “future trend” identified.  Participants pointed out that care must be taken to protect privacy when conducting data mining.  Data miners also shared many examples of the positive impact they feel data mining can have to benefit society.  Health / medical advances was the area of positive impact identified by the most data miners. 

2011 Data Miner Survey – Report on Data Mining practices

Investing in Analytics in difficult times

This article has an excellent focus on investing in times of crisis and tight budgets. It is all too common for a company in a moment of crisis to make cuts across several departments, and not rarely we see analytics teams suffering reductions or even total elimination. It is essential to understand that in times of crisis the most important thing is not simply to act, but to know what must be done to get out of the situation, and only with an analytics team can that situation be reversed.

Expertise: Advanced (e.g. Predictive) Analytics is a very specific domain requiring very specific skills. Experts have usually grown into their role by combining advanced and detailed training with professional experience on real-life projects. Today, both service suppliers and vendors focus highly on R&D activities and the creation of relevant new business applications.

Focus: While some organizations are purely focused on Advanced Analytics, other companies may offer Analytics as a part of their broader services offering. For some situations, a niche player will prove most valuable, while in other situations the broader range of services might prove most useful. Choose carefully.

Partnership potential: Engaging in longer term analytical partnerships usually requires a more intense form of commitment. It may make sense for all parties to work transparently and to share more strategic insights in return for agreements of confidentiality, knowledge transfer and perhaps even exclusivity.

Budget: Obviously, the budget may play an important role. However, to allow comparisons, it may make sense to take into account daily rates, speed (time to execute standard projects), and expertise when comparing budgets.

Investing in Analytics in difficult times