A Short Lesson on Cross-Validation

Here, from Eric Cai.
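As a quick illustration of the idea behind cross-validation, here is a minimal k-fold sketch in plain Python. It is not taken from Eric Cai's post: the helper names (`k_fold_indices`, `cross_validate`, `fit_mean`, `mse`) and the toy data are assumptions made up for this example, and the "model" simply predicts the mean of the training targets.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled folds whose sizes differ by at most 1."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, k, fit, score, seed=0):
    """Average the held-out score over k folds."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the k-1 training folds only, then score on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            score(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return sum(scores) / k


# Hypothetical toy model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def mse(model, xs, ys):
    return sum((y - model) ** 2 for y in ys) / len(ys)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
average_mse = cross_validate(xs, ys, k=3, fit=fit_mean, score=mse)
print(f"3-fold average MSE: {average_mse:.3f}")
```

Every observation is held out exactly once, so the averaged score estimates how the model behaves on data it was not fitted to.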

The “No Free Lunch” Theorem in Model Building

A good article by Eric Cai.

Because the representation of a model will always follow the formula below:

Machine Learning Lesson of the Day – Classification and Regression

An explanation worth reflecting on:

“[…]Thus, regression in statistics is different from regression in supervised learning.
In statistics,
• regression is used to model relationships between predictors and targets, and the targets could be continuous or categorical.
• a regression model usually includes 2 components to describe such relationships:
o a systematic component
o a random component. The random component of this relationship is mathematically described by some probability distribution.
• most regression models in statistics also have assumptions about the statistical independence or dependence between the predictors and/or between the observations.
• many statistical models also aim to provide interpretable relationships between the predictors and targets.
o For example, in simple linear regression, the slope parameter, $\beta_1$, predicts the change in the target, $Y$, for every unit increase in the predictor, $X$.
In supervised learning,
• target variables in regression must be continuous
• regression has less or even no emphasis on using probability to describe the random variation between the predictor and the target
o Random forests are powerful tools for both classification and regression, but they do not use probability to describe the relationship between the predictors and the target.
• regression has less or even no emphasis on providing interpretable relationships between the predictors and targets.
o Neural networks are powerful tools for both classification and regression, but they do not provide interpretable relationships between the predictors and the target.
[…]”
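The simple linear regression example in the quote can be made concrete with a small sketch. Assuming the usual form, target = intercept + slope × predictor + random error, the code below estimates the systematic component by ordinary least squares and treats the residuals as the observed random component. The function name and the five data points are invented for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up observations: the target grows by roughly 2 per unit of the predictor.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_linear_regression(xs, ys)

# Systematic component: the fitted line. Random component: what is left over.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"estimated slope: {slope:.2f}")
print(f"estimated intercept: {intercept:.2f}")
```

The estimated slope is the interpretable quantity the quote refers to: the predicted change in the target for a one-unit increase in the predictor. A random forest or neural network fitted to the same data would give predictions but no such single parameter.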

Whisky Analysis Using K-Means

A great analysis using K-Means in R. More than the analysis itself, this post is a lesson in how to carry out a cluster analysis when the number of clusters has to be chosen arbitrarily up front, as K-means requires.

As a result, generating the results and the analysis becomes much more of a walk-through and much less of a black box.
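To show why the number of clusters is a choice the analyst has to make, here is a minimal K-means sketch with an "elbow" scan over several values of k. It is written in plain Python rather than the R used in the linked post, it does not reproduce that post's code or its whisky data, and the function names, the restart scheme, and the nine toy points are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def k_means_once(points, k, rng, max_iter=100):
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                # Move the centroid to the mean of its members.
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # leave an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    wss = sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))
    return labels, centroids, wss


def k_means(points, k, n_init=10, seed=0):
    """Run several random restarts and keep the lowest within-cluster sum of squares."""
    runs = [k_means_once(points, k, random.Random(seed + r)) for r in range(n_init)]
    return min(runs, key=lambda run: run[2])


# Made-up 2-D data with three visually separate groups.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
]

# Elbow scan: K-means never picks k itself, so try several and compare.
wss_by_k = {k: k_means(points, k)[2] for k in range(1, 6)}
for k, wss in wss_by_k.items():
    print(f"k={k}: within-cluster sum of squares = {wss:.3f}")
```

Printing the within-cluster sum of squares for each k is what turns the choice into a walk-through: the reader can see where adding another cluster stops paying off, instead of accepting a single k on faith.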