GLM vs regression
Published:
In normal regression, the outcome is a continuous variable, and we assume that, conditional on the predictors, it is normally distributed.
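As a minimal sketch of what the normality assumption buys us (a standard textbook derivation, not taken from the post itself): with one predictor x and n independent observations, the log-likelihood of normal regression is shown below. Maximizing it over the coefficients is the same as minimizing the sum of squared residuals, i.e. ordinary least squares. A GLM relaxes the assumption by letting y follow another exponential-family distribution and tying its mean to the linear predictor through a link function g. Normal regression is the special case with a Gaussian outcome and the identity link, while logistic regression, for example, uses a binary outcome with the logit link.

```latex
y_i \mid x_i \sim \mathcal{N}(\mu_i, \sigma^2), \qquad \mu_i = \beta_0 + \beta_1 x_i

\ell(\beta_0, \beta_1, \sigma^2) = -\frac{n}{2}\log\left(2\pi\sigma^2\right)
  - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_i\right)^2

\text{GLM: } g(\mu_i) = \beta_0 + \beta_1 x_i, \qquad
g(\mu) = \mu \ (\text{normal}), \quad g(\mu) = \log\frac{\mu}{1-\mu} \ (\text{logistic})
```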
Published:
This is a not-so-brief history of my research journey, which took off when I was an undergrad, continued through grad school and a postdoc, and has brought me to where I am today, 8+ years into my post-PhD career.
tidycensus
Published:
Location! Location!! Location!! Where people live tells us a lot about the place itself as well as the people who live there. This demo is about spatial data visualization with the tidycensus R package, using two variables of interest: population and race distribution. First we will get the big picture at the Virginia state scale, then zoom in on northern Virginia in the Washington DC metro area.
Published:
Setting the right price for products/services is one of the most important decisions a business can make. Under-pricing (and over-pricing) can hurt a company’s bottom line. Two determinants of business revenue are product price and quantity sold. At a higher price, revenue is expected to be higher if the quantity sold stays constant. However, we know from everyday experience that price and quantity are inversely related: as the price of something goes up, people are less inclined to buy it.
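To make the price/quantity trade-off concrete, here is a minimal sketch assuming a hypothetical linear demand curve q = a - b*p. The parameters a = 1000 and b = 20 are made up for illustration, and real demand curves are rarely this tidy. Revenue is then R = p*q = a*p - b*p^2, which peaks where dR/dp = a - 2*b*p = 0, i.e. at p* = a/(2b).

```python
# Hypothetical linear demand curve: quantity falls as price rises.
# a (demand at zero price) and b (units lost per $1 increase) are
# illustrative assumptions, not estimates from any real data.


def quantity(price: float, a: float = 1000.0, b: float = 20.0) -> float:
    """Units sold at a given price under the assumed demand q = a - b*p (never negative)."""
    return max(a - b * price, 0.0)


def revenue(price: float, a: float = 1000.0, b: float = 20.0) -> float:
    """Revenue is price times quantity sold."""
    return price * quantity(price, a, b)


def best_price_analytic(a: float = 1000.0, b: float = 20.0) -> float:
    """Setting dR/dp = a - 2*b*p to zero gives p* = a / (2b)."""
    return a / (2 * b)


def best_price_grid(a: float = 1000.0, b: float = 20.0, step: float = 0.5) -> float:
    """Brute-force search over prices from 0 up to the choke price a/b."""
    n_steps = int((a / b) / step)
    prices = [i * step for i in range(n_steps + 1)]
    return max(prices, key=lambda p: revenue(p, a, b))


if __name__ == "__main__":
    for p in (10, 25, 40):
        print(f"price={p:>3}  quantity={quantity(p):>6.0f}  revenue={revenue(p):>8.0f}")
    print("revenue-maximizing price (analytic):", best_price_analytic())
    print("revenue-maximizing price (grid):", best_price_grid())
```

With these made-up numbers, raising the price from 10 to 25 lifts revenue from 8,000 to 12,500, but pushing it to 40 drops revenue back to 8,000 because the lost volume outweighs the higher price. The grid search is there as a sanity check; with a non-linear or estimated demand curve it still works when no closed form exists.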
Published:
“The internet is killing retail. Bookstores are just the first to go.” (quoted in the NYT article). Signs are everywhere. Book World is closing its stores, Barnes & Noble has closed 10% of its stores in just the last five or six years, and this February it shed 1,800 jobs.
Published:
Whether you are a beginner or an advanced learner, if you are interested in time series analysis and forecasting, there are only a few materials and blog posts that should meet 95% of your needs.
Published:
wbstats package
Published:
I’m deliberately avoiding forecasting theory here. If you are interested in theory, plenty of material is out there (see Rob Hyndman’s extensive work, for example). Instead, in this series I’ll do lots of forecasting with many different types and shapes of real-world data. I’ll pick a dataset and do some analysis. Along the way I may explain why I’m doing what I’m doing, but no theory.
Published:
What is a great learning process in data science? Traditionally it starts with the theory behind an algorithm, then its mechanics, and finally an exercise with one or two examples. Unfortunately, this approach rapidly kills learners’ excitement, and they lose interest in the algorithm fast. They never get to the excitement of solving problems. The question is: is this the most efficient way to learn something in data science? Probably not.
Published:
I picked up a little book called “Finance Basics”, published by Harvard Business Review Press, for some short in-flight reading. This tiny book is certainly not going to make anyone a finance expert, but I did find a few things useful for data scientists and business analysts whose background is not in finance or economics. Data science is a truly multidisciplinary field, with people coming from many different backgrounds and areas of expertise, often with little to no exposure to finance and economics. So I am highlighting a few things that could be valuable for the data science community.
Published:
In a previous blog post I wrote about 6 potential applications of time series data. To recap, they are the following:
Published:
This is a quick peek into my #MicroDataScience work on housing/real estate topics. Overall, I am interested in the extent to which metro stations affect housing rents. For this I chose two metro stations, both in northern Virginia: Franconia-Springfield, to the south of DC, and Vienna-GMU, to the west of DC. The metro stations serve 4 cities in the dataset: Franconia/Mt Vernon (Franconia-Springfield metro) and Vienna/Oakton (Vienna-GMU metro). Apartments within a 20-minute walk of a metro station are assumed to capture the rent premium from metro access (if any).
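One way to operationalize the 20-minute walking rule is sketched below. It assumes an average walking speed of 5 km/h (an assumption, not something stated in the post), so 20 minutes corresponds to roughly 1.67 km. It also uses straight-line (haversine) distance, which understates the actual street-network walking distance; a routing-based distance would be stricter. The station and apartment coordinates in the usage block are hypothetical placeholders, not the real locations.

```python
import math

# Assumptions (not from the original analysis): mean Earth radius and an
# average walking speed of 5 km/h. The 20-minute threshold is from the post.
EARTH_RADIUS_KM = 6371.0
WALK_SPEED_KMH = 5.0
WALK_MINUTES = 20


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (straight-line) distance in km between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def max_walk_km(minutes: float = WALK_MINUTES, speed_kmh: float = WALK_SPEED_KMH) -> float:
    """Distance covered in `minutes` at `speed_kmh` (20 min at 5 km/h is about 1.67 km)."""
    return speed_kmh * minutes / 60.0


def near_metro(apartment: tuple, station: tuple, minutes: float = WALK_MINUTES) -> bool:
    """True if the apartment lies within the walking-time radius of the station."""
    return haversine_km(*apartment, *station) <= max_walk_km(minutes)


if __name__ == "__main__":
    # Hypothetical placeholder coordinates, NOT the real station or apartment locations.
    station = (38.80, -77.20)
    apartments = {"apt_A": (38.81, -77.20), "apt_B": (38.82, -77.20)}
    for name, coords in apartments.items():
        dist = haversine_km(*coords, *station)
        print(name, round(dist, 2), near_metro(coords, station))
```

The resulting boolean can then serve as a "near metro" indicator when comparing rents across the two groups of apartments.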
Published:
Businesses and organizations generate a high volume of data every single day, be it sales figures, revenue, traffic, or operating costs. This is valuable information for everyday business decisions and long-term policy development. Despite their tremendous potential, time series data are often under-utilized. Here are 6 ways temporal data can be used in business analytics.
Published:
Developed at MIT’s Sloan School of Management in the 1950s, system dynamics is a methodological approach to modeling the behavior of complex systems in which a change in one component leads to changes in others (like the domino effect, but with feedback loops added). The approach is widely applied in areas such as healthcare, disease research, public transportation, business management, and revenue forecasting. Its most famous application is probably Limits to Growth.
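A minimal stock-and-flow sketch of the feedback idea follows. A single stock (say, a hypothetical customer base) grows through an inflow that depends on the stock itself, which is a reinforcing loop. A hypothetical carrying capacity throttles that inflow as the stock fills up, which is a balancing loop. Mathematically this is plain logistic growth integrated with Euler steps. The parameter values are illustrative assumptions, and real system dynamics models usually have many interacting stocks and flows.

```python
def simulate_stock(stock0: float = 10.0, growth_rate: float = 0.1,
                   capacity: float = 1000.0, dt: float = 1.0, steps: int = 200) -> list:
    """Euler-integrate one stock driven by a reinforcing and a balancing feedback loop."""
    stock = stock0
    history = [stock]
    for _ in range(steps):
        reinforcing = growth_rate * stock      # more stock -> larger inflow
        balancing = 1.0 - stock / capacity     # inflow shrinks as the stock nears capacity
        inflow = reinforcing * balancing       # net flow into the stock per unit time
        stock += inflow * dt                   # Euler step: the stock accumulates its flow
        history.append(stock)
    return history


if __name__ == "__main__":
    path = simulate_stock()
    for t in (0, 25, 50, 100, 200):
        print(f"t={t:>3}  stock={path[t]:7.1f}")
```

Early on the reinforcing loop dominates and growth looks exponential; later the balancing loop takes over and the stock levels off near capacity. Because growth_rate * dt is below 1 here, the Euler steps rise smoothly without overshooting; larger steps or rates can produce overshoot and oscillations that are artifacts of the integration rather than of the system.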
Published:
Do protected areas create spillover deforestation by shifting pressure outside restricted areas? If so, this would undermine conservation efforts. I haven’t seen much work on this issue. There are a few studies, though, that hint at deforestation leakage (see Renwick et al. 2015; Ewers and Rodrigues 2008).