Because the representation of a model will always follow the formula below:
Model = Reality – Error
“[…] Thus, regression in statistics is different from regression in supervised learning.
• regression is used to model relationships between predictors and targets, and the targets could be continuous or categorical.
• a regression model usually includes 2 components to describe such relationships:
o a systematic component
o a random component. The random component of this relationship is mathematically described by some probability distribution.
• most regression models in statistics also have assumptions about the statistical independence or dependence between the predictors and/or between the observations.
• many statistical models also aim to provide interpretable relationships between the predictors and targets.
o For example, in simple linear regression, the slope parameter, β1, predicts the change in the target, Y, for every unit increase in the predictor, X.
In supervised learning,
• target variables in regression must be continuous
• regression has less or even no emphasis on using probability to describe the random variation between the predictor and the target
o Random forests are powerful tools for both classification and regression, but they do not use probability to describe the relationship between the predictors and the target.
• regression has less or even no emphasis on providing interpretable relationships between the predictors and targets.
o Neural networks are powerful tools for both classification and regression, but they do not provide interpretable relationships between the predictors and the target.”
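The statistical view quoted above (a systematic component plus a random component, with an interpretable slope) can be made concrete with a minimal sketch in base R. Everything in it is an assumption made for illustration: the data are simulated, the true intercept (2) and slope (0.5) are made up, and the variable names are hypothetical.

```r
# Simulated data: every value here is made up for illustration.
set.seed(42)
n <- 200
x <- runif(n, min = 0, max = 10)        # predictor
epsilon <- rnorm(n, mean = 0, sd = 1)   # random component: Normal noise
y <- 2 + 0.5 * x + epsilon              # systematic component: 2 + 0.5 * x

# Fit a simple linear regression of y on x.
fit <- lm(y ~ x)

# Estimated intercept and slope; the slope is the expected change
# in y for each one-unit increase in x.
coef(fit)

# 95% confidence intervals, which rely on the assumed error distribution.
confint(fit)
```

The estimated slope should land close to the 0.5 used in the simulation, and the confidence interval is exactly the kind of probability statement that, according to the quoted passage, supervised learning methods tend not to emphasize.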
As a result, generating the results and the analysis becomes much more of a ‘walk-through’ and much less of a black box.
The end result?
“[…] The results indicate that there is a lot of variation in flavor profiles within the different scotch whisky regions. Note that initial cluster centers are chosen at random. In order to replicate the results, you will need to run the following code before your analysis.
set.seed(1)
Further data analysis would be required to determine whether proximity to types of water sources or terrain types drive common flavor profiles. This could be done by obtaining shape files and adding them as an additional layer to the ggmap plot.
For me, I have identified my next to-try single malt. Talisker is still within the familiar realm of cluster 4 but a little more malty, fruity and spicy. Sounds like the perfect holiday mix. […]”
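The reproducibility note in the quoted passage (initial cluster centers are chosen at random, hence the call to set.seed(1)) can be illustrated with a minimal sketch in base R. The toy matrix below is a made-up stand-in, not the whisky data, and the object names toy, fit_a and fit_b are hypothetical.

```r
# Toy data standing in for the flavor profiles (hypothetical, not the whisky data):
# two groups of 50 points each, in two dimensions.
set.seed(123)
toy <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 4), ncol = 2)
)

# Setting the same seed before each call gives the same random
# initial centers, and therefore the same clustering.
set.seed(1)
fit_a <- kmeans(toy, centers = 2)
set.seed(1)
fit_b <- kmeans(toy, centers = 2)

# TRUE: both runs assign every row to the same cluster.
identical(fit_a$cluster, fit_b$cluster)
```

Without the seed, the cluster numbering (and, on less well-separated data, the clusters themselves) can change from run to run, which is why the quoted author asks readers to set it before replicating the analysis.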