Blog posts

2019

GLM vs regression

less than 1 minute read

In normal regression the outcome is a continuous variable, and we assume that it is normally distributed given the predictors (that is, the errors are normally distributed).
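A minimal sketch of the contrast, not taken from the post and assuming a count outcome: normal regression fits a straight line (identity link, normal errors), while a Poisson GLM models the log of the expected count. The data are made up, and the hand-rolled Newton-Raphson fitter is only an illustration, not a library API.

```python
import math

# Made-up count data that grow exponentially with x (y = 2**x).
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 8, 16]

def ols(x, y):
    """Ordinary least squares for y = b0 + b1*x: identity link, normal errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def poisson_glm(x, y, iters=25):
    """Poisson GLM with log link, log(mu) = b0 + b1*x, fitted by Newton-Raphson."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix [[h00, h01], [h01, h11]] (negative Hessian).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + inverse(information) @ score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

print(ols(x, y))          # roughly (-1.0, 3.6): a straight line through curved counts
print(poisson_glm(x, y))  # close to (0, log 2), i.e. mu = 2**x
```

On these made-up counts the straight line predicts a negative count (about -1) at x = 0, something a log-link GLM cannot do. In practice one would reach for a library routine such as R's `glm(..., family = poisson)` rather than a hand-written fitter.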

My journey as a quantitative scientist

6 minute read

This is a not-so-brief history of my research journey, which took off when I was an undergrad, continued through grad school and a postdoc, and brought me to where I am today, 8+ years into my post-PhD career.

Spatial data visualization with tidycensus

3 minute read

Location! Location!! Location!! The place where people live tells us a lot about the space itself as well as about the people who live there. This demo shows spatial data visualization with the tidycensus R package for two variables of interest: population and racial distribution. First we will get the big picture at the scale of Virginia state, then zoom in on northern Virginia in the Washington, DC metro area.

Optimizing price, maximizing revenue

4 minute read

Setting the right price for products or services is one of the most important decisions a business can make. Under-pricing (and over-pricing) can hurt a company’s bottom line. Two determinants of business revenue are product price and quantity sold. If the quantity sold stays constant, a higher price means higher revenue. However, we know from everyday experience that price and quantity are inversely related: as the price of something goes up, people are less inclined to buy it.
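A minimal sketch of that trade-off, assuming a hypothetical linear demand curve Q = a - bP (the excerpt does not say which demand model the post uses). Revenue R = P(a - bP) peaks where dR/dP = a - 2bP = 0, i.e. at P* = a/(2b); a brute-force grid search double-checks the calculus. The values a = 100 and b = 2 are invented for illustration.

```python
def revenue(price, a, b):
    """Revenue under a hypothetical linear demand curve Q = a - b*P."""
    quantity = max(a - b * price, 0.0)  # quantity sold cannot be negative
    return price * quantity

def optimal_price(a, b):
    """Analytic optimum: dR/dP = a - 2*b*P = 0 gives P* = a / (2*b)."""
    return a / (2 * b)

a, b = 100.0, 2.0  # made-up demand parameters
p_star = optimal_price(a, b)

# Brute-force check over prices 0.00, 0.01, ..., 50.00.
grid = [i / 100 for i in range(5001)]
p_grid = max(grid, key=lambda p: revenue(p, a, b))

print(p_star, revenue(p_star, a, b), p_grid)  # 25.0 1250.0 25.0
```

Below P*, raising the price adds more revenue than the lost sales take away; above it, the reverse holds. For a linear demand curve, P* is also where the price elasticity of demand equals -1 (here -b·P*/Q* = -2·25/50).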

2018

Time series forecasting resources in Python and R

1 minute read

Whether you are a beginner or an advanced learner, if you are interested in time series analysis and forecasting, only a few materials and blog posts should meet 95% of your needs.

Time Series Analysis & Forecasting of New Home Sales

9 minute read

Table of Contents

  1. Introduction
  2. Objectives
  3. Data & Methods
  4. Results
    4.1 Exploratory Data Analysis (EDA)
    4.2 Forecasting
      4.2.1 Input data & decomposition
      4.2.2 Forecasting with HW Exponential Smoothing
      4.2.3 Forecasting with ETS
      4.2.4 Forecasting with ARIMA
  5. Discussion & Conclusion
    5.1 Model evaluation
    5.2 General conclusions
    5.3 Discussion

Benchmark time series forecasting exercise using the wbstats package

5 minute read

I’m deliberately avoiding forecasting theory here. If you are interested in theory, plenty of material is out there (see Rob Hyndman’s extensive work, for example). Instead, in this series I’ll do lots of forecasting with many different types and shapes of real-world data. I’ll pick a dataset and do some analysis. Along the way I may explain why I’m doing what I’m doing, but no theory.

Learning Data Science: theories or examples?

1 minute read

What makes a great learning process in data science? Traditionally it starts with the theory behind an algorithm, then its mechanics, and finally practice with one or two examples. Unfortunately, this approach quickly kills learners’ excitement, and they lose interest in the algorithm fast. They never get to the thrill of solving problems. The question is: is this the most efficient way to learn something in data science? Probably not.

Basic finance that all data scientists should know

2 minute read

I picked up a little book called “Finance Basics”, published by Harvard Business Review Press, for some short in-flight reading. This tiny book is certainly not going to make anyone a finance expert, but I did find a few things useful for data scientists and business analysts whose background is not in finance or economics. Data science is truly a multidisciplinary field, with people coming from many different backgrounds and areas of expertise, often with little to no exposure to finance and economics. So I am highlighting a few things that could be valuable for the data science community.

Apartment Rents 15% Higher Near Metro Stations

1 minute read

Introduction

This is a quick peek into my #MicroDataScience work on housing/real estate topics. Overall, I am interested in the extent to which metro stations affect housing rents. For this I chose two metro stations, both in northern Virginia: Franconia-Springfield, south of DC, and Vienna-GMU, west of DC. The stations serve 4 cities in the dataset: Franconia and Mt Vernon (Franconia-Springfield station), and Vienna and Oakton (Vienna-GMU station). Apartments within a 20-minute walk of a metro station are assumed to benefit from any metro-related rent premium.

Time series data mining: 6 applications

1 minute read

Businesses and organizations generate a high volume of data every single day, be it sales figures, revenue, traffic, or operating costs. This information is valuable for everyday business decisions and long-term policy development. Despite their tremendous potential, time series data are often under-utilized. Here are 6 ways temporal data can be used in business analytics.

Data science meets system dynamics (part 1)

1 minute read

Developed at MIT’s Sloan School of Management in the 1950s, system dynamics is a methodological approach to modeling the behavior of complex systems, where a change in one component leads to changes in others (like the domino effect, but with feedback loops added). This approach is widely applied in fields such as healthcare, disease research, public transportation, business management and revenue forecasting. Probably the most famous application of system dynamics is “The Limits to Growth”.
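A minimal stock-and-flow sketch of the feedback idea, not taken from the post: a single stock whose inflow grows with the stock itself (a reinforcing loop) while crowding pushes back as it nears a hypothetical capacity (a balancing loop). Together the two loops produce the classic S-shaped, logistic growth curve. All parameter values are made up, and plain Euler time-stepping is used for simplicity.

```python
def simulate(stock0, rate, capacity, dt=0.25, steps=200):
    """Euler-integrate one stock driven by a reinforcing and a balancing loop.

    Net flow = rate*S (reinforcing) - rate*S*S/capacity (balancing), which is
    the logistic model dS/dt = rate * S * (1 - S / capacity).
    """
    stock = stock0
    history = [stock]
    for _ in range(steps):
        inflow = rate * stock                      # more stock -> more inflow
        outflow = rate * stock * stock / capacity  # crowding slows net growth
        stock += (inflow - outflow) * dt           # update the stock over one step
        history.append(stock)
    return history

# Hypothetical run: start at 10, capacity 1000, simulate 50 time units.
path = simulate(stock0=10.0, rate=0.5, capacity=1000.0)
print([round(s) for s in path[::40]])  # slow start, rapid rise, then leveling off
```

With the reinforcing loop alone the stock would grow exponentially; the balancing loop is what bends the curve toward the capacity.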

Protected areas and “deforestation leakage”: There is smoke, but is there a fire?

1 minute read

Do protected areas create spillover deforestation by shifting pressure outside the restricted areas? If so, this would undermine conservation efforts. I haven’t seen much work on this issue, though a few studies hint at deforestation leakage (see Renwick et al. 2015; Ewers and Rodrigues 2008).