Benchmark forecasting: Japanese population by 2030

I’m deliberately avoiding forecasting theory here. If you are interested in theory, plenty of material is out there (see Rob Hyndman’s extensive work, for example). Instead, in this series I’ll do lots of forecasting with many different types and shapes of real-world data. I’ll pick a dataset and do some analysis. Along the way I may explain why I’m doing what I’m doing, but no theory.

The dataset

In today’s example I picked a population dataset for Japan. I know that the population of Japan is going down, so out of curiosity I wanted to look at the historical trend and see what the future looks like in a business-as-usual situation. We will talk about some alternative scenarios at the end.

First we need to get the data. There are several sources, but the World Bank has the richest country-scale datasets on numerous indicators. We could download the data from the World Bank website as a CSV file, then clean it up and import it here. Fortunately, someone has already done all of this in an R package called wbstats, so we don’t have to go through the trouble.

Below I’m going step by step, from data preparation to forecasting and all the way to interpretation of the results.

Preparing the data

# load `wbstats` library
library(wbstats)
# we also need data wrangling package `dplyr`
library(dplyr)

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

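# import data (total population, Japan, 1960-2017)
# note: newer wbstats releases (1.0+) deprecate wb() in favor of wb_data()
jppop = wb(indicator = "SP.POP.TOTL", country = "JP", startdate = 1960, enddate = 2017)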

# view just the first 2 rows
head(jppop, 2)

iso3c	date	value	indicatorID	indicator	iso2c	country
JPN	2017	126785797	SP.POP.TOTL	Population, total	JP	Japan
JPN	2016	126994511	SP.POP.TOTL	Population, total	JP	Japan

# from the dataframe we'll keep only 2 columns: date & value.
jppop = jppop[c("date", "value")]

# wb() returns the year as text, so make it numeric, then sort from oldest to newest
jppop$date = as.numeric(jppop$date)
jppop = jppop[order(jppop$date),]

# plot the data to see what it looks like
options(repr.plot.width = 7, repr.plot.height = 5) # set figure size
plot(jppop$value~jppop$date)

[Figure: total population of Japan by year, 1960-2017]

# remove the date column, we don't need it any more
jppop=jppop[c(2)]

# convert the dataframe into a time series (ts) object starting in 1960
data = ts(jppop, start = 1960)

# convert population to millions for easier reading
data = data / 1e6

Forecasting

# now we are in the forecasting business.
# first load the `fpp2` and `forecast` packages (loading just `fpp2` should work, but just in case).
library(fpp2)
library(forecast)
library(ggplot2) # you may or may not need it, but just in case

# plot the ts object we created, this time using the autoplot() function that comes with the `forecast` package
options(repr.plot.width = 7, repr.plot.height = 3) # set figure size
autoplot(data) + ggtitle("Population trend in Japan") + xlab("Year") +  ylab("Millions")

[Figure: "Population trend in Japan", year vs. population in millions]

From this plot alone we can say a lot about the population of Japan. Some things are obvious from the figure, while others need a little digging. Here are a few:

The current population of Japan is just under 127 million (2017)
The total population grew until around 2010 and has been declining ever since
Apart from a few Eastern European nations, Japan is one of the few developed countries to experience a population drop
There are many reasons for this, but an aging population and a younger generation less willing to have kids are two big causes

# we are forecasting out to 2030 (13 years beyond 2017, hence h=13) using five simple models
data.mean = meanf(data, h=13) # mean forecast
data.naive = naive(data, h=13) # naive forecast
data.rwf_drift = rwf(data, h=13, drift=TRUE) # random walk forecast with drift
data.spline = splinef(data, h=13) # local linear forecast (cubic smoothing spline)
data.ets = forecast(data, h=13) # automatic ETS forecast (Exponential Smoothing)
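To make the first three methods less of a black box, here is a minimal base-R sketch of the point forecasts they produce. The helper names (`mean_fc`, `naive_fc`, `drift_fc`) and the toy series `y` are made up for illustration; `y` is not the real population data. The formulas are the standard textbook ones, which is what `meanf()`, `naive()` and `rwf(..., drift=TRUE)` should return as point forecasts.

```r
# Toy series for illustration only (NOT the Japanese population data)
y = c(10, 12, 13, 15, 16)
h = 3  # number of steps ahead

# mean forecast: every future value equals the historical average
mean_fc = function(y, h) rep(mean(y), h)

# naive forecast: every future value equals the last observation
naive_fc = function(y, h) rep(y[length(y)], h)

# random walk with drift: extend the straight line joining the first and
# last observations, i.e. last value + step * average historical change
drift_fc = function(y, h) {
  n = length(y)
  slope = (y[n] - y[1]) / (n - 1)
  y[n] + slope * seq_len(h)
}

mean_fc(y, h)
naive_fc(y, h)
drift_fc(y, h)
```

This also hints at what to expect for Japan: the drift method extrapolates the average yearly change since 1960, so it keeps climbing even though the population has recently been falling, while the mean method drops straight to the 1960-2017 average.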

# view all the forecasts we just made together in one figure
options(repr.plot.width = 12, repr.plot.height = 4.5) # set figure size
autoplot(data) +
  autolayer(data.naive, series = "Naive", PI=FALSE) +
  autolayer(data.rwf_drift, series = "RWF with drift", PI=FALSE) +
  autolayer(data.mean, series = "Mean forecast", PI=FALSE) +
  autolayer(data.spline, series = "Local linear forecast", PI=FALSE) +
  autolayer(data.ets, series = "Automatic ETS forecast", PI=FALSE)

[Figure: historical population series with the five forecasts overlaid]

These forecasting results can be interpreted in many ways. First, neither the random walk with drift nor the mean model looks plausible; maybe in the longer term, but not over the next 13 years. The other three forecasts, on the other hand, look quite plausible. But let’s look at the actual forecast values.

# pull out the 2030 value (the 13th step) from each method, rounded to the nearest million
# (columns are named inline so we don't overwrite the mean() and naive() functions)
t(data.frame(
  mean = round(data.mean$mean[13]),
  naive = round(data.naive$mean[13]),
  rwf_drift = round(data.rwf_drift$mean[13]),
  ets = round(data.ets$mean[13]),
  local_linear = round(data.spline$mean[13])
))

mean	118
naive	127
rwf_drift	135
ets	125
local_linear	124

Are these models any good?

Across a range of scenarios and uncertainties, the UN World Population Prospects (www.population.un.org) median projection is that the population of Japan will be 121.5 million in 2030. The World Bank projection is more dire: 120.2 million by 2030. Among the three forecasts that looked plausible, the one nearest to the UN and World Bank projections is the local linear model, at 124 million. Any of these predictions could turn out to be right, depending on how Japan responds to the current population decline (though the random walk with drift projection is probably not going to happen).
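The comparison above is simple arithmetic, so it can be laid out as a small table. The sketch below uses only numbers quoted in this post (the five rounded 2030 forecasts and the two external projections); the object names are made up for illustration.

```r
# 2030 forecasts from the table above, in millions (rounded)
forecasts = c(mean = 118, naive = 127, rwf_drift = 135, ets = 125, local_linear = 124)

# external 2030 projections quoted in the text, in millions
un_median = 121.5
world_bank = 120.2

# absolute gap between each forecast and each projection
gaps = data.frame(
  forecast = forecasts,
  gap_to_un = abs(forecasts - un_median),
  gap_to_wb = abs(forecasts - world_bank)
)

# sort so the forecasts closest to the UN median come first
gaps[order(gaps$gap_to_un), ]
```

Sorted this way, the local linear forecast is the closest to the UN median (2.5 million away), and the random walk with drift is by far the furthest from both projections.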

Endnote:

With the wbstats package it’s really easy to make a population forecast. Just by changing the country in the wb() call, one can rerun the whole forecast (type “US” instead of “JP”, then run everything and watch!). In addition, one can have unlimited fun by changing the indicator in the same line of code!