My journey as a quantitative scientist

This is a not-so-brief history of my research journey: it took off when I was an undergrad, continued through grad school and a postdoc, and has brought me to where I am today, 8+ years into my post-PhD career.

I did my undergrad in forest resource management. It was all about understanding the different kinds of forests and how to manage them properly to balance the multiple goals of production, income and environmental sustainability. A lot of my undergrad work was field-based: going into forests, collecting all kinds of data on tree height, diameter, etc., then analyzing the data and building biometric models to predict future growth and utilization of forest resources and to suggest different management options.

Then I went on to grad school in Japan to study the socio-economic aspects of forest management: what kinds of trees people like to have, how they use different forest products, how much income they earn from selling those products, and so on. I did not have enough data, so I had to design household surveys to collect it and then develop the biophysical, economic and statistical models (mostly non-parametric comparative analysis, regression and some logistic regression). Besides econometric modeling, part of the analysis was segmentation of households and forest users, trying to understand the characteristics and behaviour that lead to the decisions they make in forest use and management.

After finishing my PhD, I accepted a postdoctoral fellowship in Canada to study the differences in environmental costs and benefits among different agricultural systems in Quebec. There were about 30 years of time series data on many features, such as production, growth, cost and revenue, collected from experimental plots of different agricultural systems. There was no established methodology for the analysis we planned, so I had to develop my own methodological process and then analysed the data to measure the environmental benefits and profitability of the different systems. I published both the method and the analyses in a top-tier journal in the field.

Then I moved out of academia to work for an environmental non-profit. My reasoning was that I had developed sufficient expertise and skills in the “laboratory” environment, and it was time to put my knowledge into practice and solve real-world problems. So I moved to the USA to work for an environmental organization called Conservation International.

Conservation International works in about 30 different countries, partnering with governments, businesses and all kinds of stakeholders for the protection, conservation and management of natural resources. Within the org we have a program called Ecosystem Values Assessments and Accounting. This multi-disciplinary program has scientists from all kinds of backgrounds: economists, statisticians, remote sensing specialists, conservationists, geographers, GIS analysts and hydrologists. What we basically do is support different countries in developing policies and making informed decisions about their natural resource management, through a wide range of environmental data, analytics, tools and maps. We work closely with the statistical agencies in each country, advising them on what data to collect, how to collect and analyse them, and how to deliver policy briefs based on the analysis.

My role here spans the entire analytics pipeline, from project design all the way to the development of policy suggestions. In a typical project cycle this is what I do:

  • Diagnose the problem, along with understanding the policy environment and decision-making priorities
  • Find out what data they have. Sometimes the data are organized, sometimes not; chasing data is a big part of my work
  • If there’s not enough data, design experiments and surveys to collect it
  • Once we have the data, the fun starts: analysing the data, writing reports, creating policy briefs and presenting to policy-makers
  • I also work closely with the United Nations Statistics Division (UNSD) on the development of methodological guidelines and standards for this kind of work.

At the moment I am leading 3 projects. The first project is in Cambodia, where I’m working with the World Bank. In that region there is a big forest that is being deforested and degraded. Several industries and businesses depend on the good health of this forest, notably hydropower companies and rice irrigation systems. There I am working to develop economic models to (1) measure the value of the forest to hydropower companies and to irrigation, and (2) develop alternative scenarios of how these businesses would be affected financially if all the forest were gone. (3) Based on that information, we will develop policy proposals on how to protect the forest and who will pay for the protection.

The second project is in the African country of Angola, where a dataset of 400k household surveys is available, covering a number of features. I am now searching for additional data to complement the household dataset and will analyse them later on.

The third project is in Liberia, in West Africa. It’s a really data-poor country; they do not have any data. So the first step is to develop some datasets through household surveys and forest inventory. This is a five-year project, so we have some time to go through the whole data science life cycle, from data collection all the way to policy development.

What tools do I use? I am basically model/tool agnostic; I do not have a specific preference for which tool to use. Because of the nature of the problems I deal with, it’s not productive to think tools first, problems later. Like many in academia I started with R, but then gradually moved to Python for data analysis. Currently Python is my go-to platform, although I still use R for some specific tasks such as time series forecasting, spatial data analysis and creating maps. For data visualization ggplot was my favorite, but because I now work in the Python environment most often, I switched to seaborn and matplotlib. For my machine learning side projects I don’t think there’s a great alternative to scikit-learn. For writing, of course, I use Jupyter Notebook, not just for writing code but sometimes for general writing as well. Nowadays Google Colab seems to be a good alternative for collaborative writing, but I use it less frequently. I also occasionally use GIS tools (e.g. ArcGIS, R libraries).