GLM vs regression

less than 1 minute read

Published:

In normal regression the outcome is a continuous variable, and we assume that it is normally distributed.

However, not in all the cases the outcomes are normally distributed continuous variables. Here are a few examples:

1) Outcome variable is binary (i.e. two classes; e.g. survived/not-survived) 2) Outcome variable is categorical (i.e. multiple classes; e.g. 10 land cover classes) 3) Outcome variable is count data (e.g. number of accidents a year) 4) Outcome variable is continuous but not normally distributed, rather skewed (e.g. US income distribution, right skewed) 5) Other cases

Linear regression is not good in those instances. So we choose other models as follows:

In #1 and #2 cases above where outcome variables are binary or categorical, we could employ any of the machine learning classification models (e.g. logistic regression for #1, random forest for #2).

So that leaves two more cases to deal with: for count data and continuous but skewed data. The way we deal with them is by fitting a GLM function, but specifying a “link” function in the glm model as appropriate.