Uma pequena lição de Cross-Validation (Validação Cruzada)

Aqui com o Eric Cai. 

Anúncios
Uma pequena lição de Cross-Validation (Validação Cruzada)

Teorema de que “Não há almoço grátis” na construção de modelos.

Um bom artigo do Eric Cai.

Porque sempre a representação de um modelo seguirá a fórmula abaixo:

Modelo = Realidade – Erro

Teorema de que “Não há almoço grátis” na construção de modelos.

Lição de Aprendizado de Máquina do Dia – Classificação e Regressão

Uma explicação que cabe uma reflexão:

“[…]Thus, regression in statistics is different from regression in supervised learning.
In statistics,
• regression is used to model relationships between predictors and targets, and the targets could be continuous or categorical.
• a regression model usually includes 2 components to describe such relationships:
o a systematic component
o a random component. The random component of this relationship is mathematically described by some probability distribution.
• most regression models in statistics also have assumptions about thestatistical independence or dependence between the predictors and/or between the observations.
• many statistical models also aim to provide interpretable relationships between the predictors and targets.
o For example, in simple linear regression, the slope parameter, , predicts the change in the target, , for every unit increase in the predictor, .
In supervised learning,
• target variables in regression must be continuous
• regression has less or even no emphasis on using probability to describe the random variation between the predictor and the target
o Random forests are powerful tools for both classification and regression, but they do not use probability to describe the relationship between the predictors and the target.
• regression has less or even no emphasis on providing interpretable relationships between the predictors and targets.
o Neural networks are powerful tools for both classification and regression, but they do not provide interpretable relationships between the predictors and the target.
[…]”

Lição de Aprendizado de Máquina do Dia – Classificação e Regressão

Análise de Whiskies usando K-Means

Uma ótima análise usando K-Means com o R. Mais do que a análise, esse post é uma aula de como proceder com uma análise de cluster usando a determinação arbitrária de clusters como o K-means exige.

Com isso a geração dos resultados e da análise ficam muito mais ‘walk-thru’ e muito menos black-box.

O resultado final?

“[…]The results indicate that there is a lot of variation in flavor profiles within the different scotch whisky regions. Note that initial cluster centers are chosen at random. In order to replicate the results, you will need to run the following code before your analysis.
set.seed(1) Further data analysis would be required to determine whether proximity to types of water sources or terrain types drive common flavor profiles. This could be done by obtaining shape files and adding them as an additional layer to the ggmap plot.
For me, I have identified my next to-try single malt. Talisker is still within the familiar realm of cluster 4 but a little more malty, fruity and spicy. Sounds like the perfect holiday mix. […]”

Análise de Whiskies usando K-Means