GLM vs regression

less than 1 minute read

Published: December 23, 2019

In ordinary linear regression the outcome is a continuous variable, and we assume that its errors (the outcome given the predictors) are normally distributed.

However, outcomes are not always normally distributed continuous variables. Here are a few examples:

1) Outcome variable is binary (i.e. two classes; e.g. survived/not-survived)
2) Outcome variable is categorical (i.e. multiple classes; e.g. 10 land cover classes)
3) Outcome variable is count data (e.g. number of accidents a year)
4) Outcome variable is continuous but not normally distributed, rather skewed (e.g. US income distribution, right skewed)
5) Other cases
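As a rough illustration of why a straight-line fit struggles with case 3 (count data), the sketch below fits ordinary least squares to a small set of made-up yearly counts and evaluates the fitted line at the lowest predictor value. The numbers and the helper name `fit_line` are assumptions for illustration only.

```python
# Made-up count data for illustration (e.g. accidents per year).
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 2, 5, 12]

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(x, y)

# The fitted line at x = 0 is just the intercept, which is negative
# for this data, even though a count can never be below zero.
predicted_at_zero = intercept + slope * 0
print("predicted count at x = 0:", predicted_at_zero)
```

A negative predicted count is impossible for this kind of outcome, which is one sign that the normal-errors, straight-line model does not match the data.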

Linear regression is a poor choice in those instances, so we choose other models as follows:

In cases #1 and #2 above, where the outcome variable is binary or categorical, we could employ any of the machine learning classification models (e.g. logistic regression for #1, random forest for #2). Note that logistic regression is itself a GLM with a logit link.

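That leaves two more cases to deal with: count data and continuous but skewed data. We deal with them by fitting a GLM and specifying an error distribution (family) and a “link” function in the glm model as appropriate, for example a Poisson family with a log link for counts, or a Gamma family with a log link for right-skewed positive data.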
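To make the idea of a link function concrete, here is a minimal sketch of a Poisson GLM with a log link, fitted by iteratively reweighted least squares in plain Python. The data are made up and `fit_poisson_glm` is a hypothetical helper, not a library function. In practice one would normally call a library routine instead, such as R's `glm()` with `family = poisson(link = "log")`.

```python
import math

# Made-up count data for illustration (e.g. accidents per year).
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 1, 2, 5, 12]

def fit_poisson_glm(x, y, n_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by iteratively reweighted least squares."""
    # Start from the observed counts, shifted to avoid log(0).
    mu = [yi + 0.5 for yi in y]
    eta = [math.log(m) for m in mu]
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Working response and weights for the Poisson family with log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Solve the 2x2 weighted least-squares normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        new_b0 = (swxx * swz - swx * swxz) / det
        new_b1 = (sw * swxz - swx * swz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        # Inverse link: map the linear predictor back to the mean with exp().
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        if converged:
            break
    return b0, b1

b0, b1 = fit_poisson_glm(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
print("intercept:", b0, "slope:", b1)
print("fitted means:", fitted)
```

Because the model is linear on the log scale, the fitted means `exp(b0 + b1 * x)` are always positive, which suits count outcomes in a way a straight line on the original scale cannot guarantee.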



Mahbubul Alam, PhD
