The Impact of Questionnaire Design on Global Poverty

Daniel Gerszon Mahler, Christoph Lakner, Elizabeth Foster, Samuel Kofi Tetteh-Baah, Zander Prinsloo, and Rostand Tchouakam Mbouendeu

Motivation

Comparing apples to oranges

Poverty rate comparisons across countries inform resource allocation and policy design.
Yet the welfare aggregates underlying poverty rates are fundamentally not comparable.
This matters because changes in how poverty is measured can have large effects on measured poverty rates.

Comparability issues

Lack of comparability arises from differences in the measurement of:

income or consumption
consumption across countries
consumption within countries over time
income across countries
income within countries over time

Data

The Poverty and Inequality Platform (PIP)

The source of monetary poverty and inequality estimates for the SDGs
Contains poverty estimates from 2500+ surveys spanning 170+ countries
Contains information on whether welfare aggregates within countries are comparable over time

The Poverty Measurement Database (PMD)

Includes more than 200 questions on the construction of welfare aggregates and national poverty lines
Filled out with the help of an AI algorithm that browses through poverty and household survey reports from national statistical offices and the World Bank
Followed by human cross-checking
Contains substantial missing information where details are unknown

Issues particularly relevant for poverty comparisons

Recall or diary
Recall periods for food items
Recall periods for non-food items
Food-away-from-home included
Durable goods included
Housing included
Spatial deflation accounted for

Method

Intuition

Define the “best practice” consumption aggregate:
- Durable goods, housing, food-away-from-home included
- Spatial deflation accounted for
- Multiple recall periods for food and non-food
Predict what consumption aggregates would have looked like in all countries
Compare poverty rates across countries

Modeling the impact

Denote the inclusion of housing in the consumption aggregate as \(x=1\) and \(x=0\) otherwise.
We want to estimate the causal impact of adding housing on mean consumption (\(y\)) in country \(c\) at time \(t\). Suppose we run an OLS regression:

\(\ln(y_{c,t}) = \beta_0 + \beta_1 \cdot x_{c,t} + \epsilon_{c,t}\)

The impact likely depends on a country’s income level

\(\ln(y_{c,t}) = \beta_0 + \beta_1 \cdot x_{c,t} + \color{red}{\beta_2 \cdot x_{c,t} \cdot \ln(GDPpc_{c,t})} + \epsilon_{c,t}\)

The impact likely differs within countries as well. Suppose we now observe mean consumption per decile, \(d\).

\(\ln(y_{\color{red}{d},c,t}) = \beta_0 + \beta_1 \cdot x_{c,t} + \beta_2 \cdot x_{c,t} \cdot \ln(GDPpc_{c,t}) + \color{red}{\beta_3 \cdot x_{c,t} \cdot d_{c,t}} + \epsilon_{d,c,t}\)
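
As a reading aid, the last specification implies the following effect of including housing (moving from \(x_{c,t}=0\) to \(x_{c,t}=1\)), holding everything else fixed. This is only algebra on the equation above, not an estimate:

\(\Delta \ln(y_{d,c,t}) = \beta_1 + \beta_2 \cdot \ln(GDPpc_{c,t}) + \beta_3 \cdot d_{c,t}\)

The corresponding proportional change in mean consumption is \(\exp(\Delta \ln(y_{d,c,t})) - 1\). The effect is therefore allowed to vary with income level through \(\beta_2\) and across the distribution through \(\beta_3\), but only linearly.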

Empirical challenges

We have 12 different x’s; with OLS, we would severely overfit
Strong linearity assumptions
Variation in consumption may reflect factors correlated with both measurement choices and well-being.

Our approach

Use gradient boosting to predict consumption.
Where possible, subtract elements from welfare aggregates (such as removing housing) to minimize omitted variable bias.
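
To make the idea of gradient boosting concrete, here is a minimal from-scratch sketch with squared loss and one-split trees (stumps). It is a toy illustration only, not the authors' implementation: the data are synthetic, the feature order `[housing, ln_gdppc, decile]` and all function names are hypothetical, and real work would use an established library with tuning and cross-validation.

```python
def avg(values):
    return sum(values) / len(values)

def fit_stump(X, residuals):
    """Find the one-feature threshold split that minimises squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = avg(left), avg(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)

def predict_stump(stump, row):
    j, thr, lm, rm = stump
    return lm if row[j] <= thr else rm

def fit_boosting(X, y, n_rounds=200, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = avg(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

# Synthetic log consumption on a balanced grid (assumed coefficients, for illustration).
rows, y = [], []
for housing in (0, 1):
    for ln_gdppc in (7.0, 8.0, 9.0, 10.0):
        for decile in range(1, 11):
            rows.append([housing, ln_gdppc, decile])
            y.append(1.0 + 0.3 * housing + 0.5 * ln_gdppc + 0.1 * decile)

model = fit_boosting(rows, y)
# Predicted change in log consumption from switching housing on, other features fixed.
effect = predict(model, [1, 8.0, 3]) - predict(model, [0, 8.0, 3])
```

Unlike the OLS specifications, nothing here imposes a linear functional form: each round adds a small piecewise-constant correction, and the learning rate shrinks each correction to limit overfitting.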

Towards comparable consumption aggregates

\(\ln(y_{d,c,t}) = f(\mathbf{x_{c,t}},GDPpc_{c,t},d)\)
Predicted log consumption with current measurement choices \(= \ln(\hat{y}_{d,c,t})\)
Define the best practice measurement choices with \(^*\), e.g. \(x_{housing}^*=1\)
Predicted log consumption with best practice measurement choices \(= \ln(\hat{y}^*_{d,c,t})\)
Adjusted log consumption aggregate \(= \ln(y_{d,c,t}) + (\ln(\hat{y}^*_{d,c,t})-\ln(\hat{y}_{d,c,t}))\)
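
The adjustment step can be sketched in a few lines of Python. This is an illustration under stated assumptions: `toy_predict` is a hypothetical stand-in for the fitted model \(f\) (it is a simple made-up formula, not gradient boosting), and the feature names and numbers are invented for the example.

```python
import math

def adjust_log_consumption(log_y, features, best_practice, predict):
    """Return ln(y) + (ln(y_hat_star) - ln(y_hat)) for one observation.

    `predict` maps a feature dict to predicted log consumption.
    `best_practice` overrides the measurement choices with their starred values.
    """
    current = predict(features)
    counterfactual = predict({**features, **best_practice})
    return log_y + (counterfactual - current)

def toy_predict(f):
    # Hypothetical stand-in for the fitted model; coefficients are made up.
    return 1.0 + 0.5 * f["ln_gdppc"] + 0.1 * f["decile"] + 0.3 * f["housing"]

# A decile whose survey excludes housing (housing = 0).
obs = {"housing": 0, "ln_gdppc": 8.0, "decile": 3}
log_y = math.log(1200.0)
adjusted = adjust_log_consumption(log_y, obs, {"housing": 1}, toy_predict)

# A survey that already follows best practice is left unchanged.
already_best = {"housing": 1, "ln_gdppc": 8.0, "decile": 3}
unchanged = adjust_log_consumption(log_y, already_best, {"housing": 1}, toy_predict)
```

Only the predicted difference between the two measurement regimes is added to the observed value, so the observed aggregate is kept and prediction error common to both predictions cancels out.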

Uzbekistan 2002

Coming up

Add more data to the PMD
Decompose the added welfare
Rederive the international poverty line based on comparable aggregates

Thank you for listening

dmahler@worldbank.org