Daniel Gerszon Mahler, Christoph Lakner, Elizabeth Foster, Samuel Kofi Tetteh-Baah, Zander Prinsloo, and Rostand Tchouakam Mbouendeu
Poverty rate comparisons across countries inform resource allocation and policy design.
Yet the welfare aggregates underlying poverty rates are fundamentally not comparable.
This matters because changes in how poverty is measured can have large impacts on measured poverty.
Lack of comparability arises due to differences in measurement of
The source of monetary poverty and inequality estimates for the SDGs
Contains poverty estimates from 2500+ surveys spanning 170+ countries
Contains information on whether welfare aggregates within countries are comparable over time
Includes more than 200 questions on the construction of welfare aggregates and national poverty lines
Filled out with the help of an AI algorithm that browses through poverty and household survey reports from national statistical offices and the World Bank
Followed by human cross-checking
Contains a substantial amount of missing information where details are unknown
Recall or diary
Recall periods for food items
Recall periods for non-food items
Food-away-from-home included
Durable goods included
Housing included
Spatial deflation accounted for
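One way to turn the choices above into model inputs \(\mathbf{x}_{c,t}\) is a flat 0/1 vector, with the categorical recall periods one-hot encoded. The sketch below is a minimal illustration only. The field names, the three recall categories and the resulting 11 features are hypothetical assumptions, not the PMD's actual coding.

```python
from dataclasses import dataclass

# Hypothetical recall-period categories (an assumption for illustration).
RECALL_OPTIONS = ("7 days", "30 days", "12 months")


@dataclass
class MeasurementChoices:
    """Hypothetical encoding of one survey's welfare-aggregate choices."""
    diary: bool                 # True = diary, False = recall
    food_recall: str            # one of RECALL_OPTIONS
    nonfood_recall: str         # one of RECALL_OPTIONS
    food_away_from_home: bool
    durables: bool
    housing: bool
    spatial_deflation: bool

    def to_features(self) -> list[float]:
        """Flat 0/1 feature vector; recall periods are one-hot encoded."""
        feats = [float(self.diary)]
        for period in (self.food_recall, self.nonfood_recall):
            if period not in RECALL_OPTIONS:
                raise ValueError(f"unknown recall period: {period}")
            feats += [float(period == opt) for opt in RECALL_OPTIONS]
        feats += [
            float(self.food_away_from_home),
            float(self.durables),
            float(self.housing),
            float(self.spatial_deflation),
        ]
        return feats


choices = MeasurementChoices(False, "7 days", "12 months", True, True, False, True)
features = choices.to_features()
```

In practice, the missing information noted above would need its own handling, for example a separate "unknown" category per choice.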
Define the “best practice” consumption aggregate:
Predict what consumption aggregates would have looked like in all countries
Compare poverty rates across countries
Denote the inclusion of housing in the consumption aggregate as \(x=1\), and \(x=0\) otherwise.
We want to estimate the causal impact of including housing on mean consumption \(y\) in country \(c\) at time \(t\). Suppose we run the OLS regression:
\(\ln(y_{c,t}) = \beta_0 + \beta_1 x_{c,t} + \epsilon_{c,t}\)
\(\ln(y_{c,t}) = \beta_0 + \beta_1 x_{c,t} + \color{red}{\beta_2 x_{c,t} \ln(GDPpc_{c,t})} + \epsilon_{c,t}\)
\(\ln(y_{\color{red}{d},c,t}) = \beta_0 + \beta_1 x_{c,t} + \beta_2 x_{c,t} \ln(GDPpc_{c,t}) + \color{red}{\beta_3 x_{c,t} d_{c,t}} + \epsilon_{d,c,t}\)
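Take the third specification and difference it between \(x_{c,t}=1\) and \(x_{c,t}=0\), holding \(GDPpc_{c,t}\), \(d_{c,t}\) and the error term fixed. This shows that the implied effect of including housing is no longer a single number. Instead, it varies with income and with \(d_{c,t}\):

```latex
\Delta \ln(y_{d,c,t})
  = \ln(y_{d,c,t} \mid x_{c,t}=1) - \ln(y_{d,c,t} \mid x_{c,t}=0)
  = \beta_1 + \beta_2 \ln(GDPpc_{c,t}) + \beta_3 d_{c,t}
```

The corresponding proportional change in consumption is \(\exp(\Delta \ln(y_{d,c,t})) - 1\).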
We have 12 different x’s; with OLS, we would severely overfit
Strong linearity assumptions
Variation may be driven by factors correlated with both measurement choices and well-being (omitted variable bias).
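The last concern can be illustrated with synthetic data. When \(x\) is binary, the OLS slope in the first specification equals the difference in mean \(\ln(y)\) between surveys with and without housing. Suppose richer countries are more likely to include housing. Then that difference also absorbs the income gap between the two groups. The selection rule, the 0.8 income elasticity and the "true" housing effect of 0.10 below are all hypothetical assumptions chosen for illustration.

```python
import random


def simulate(n=2000, true_effect=0.10, seed=0):
    """Synthetic country-years in which richer places tend to include housing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ln_gdppc = rng.uniform(6.0, 11.0)
        # Assumed selection: inclusion of housing rises with income.
        x = 1 if ln_gdppc + rng.gauss(0.0, 0.5) > 8.5 else 0
        # Assumed outcome model: income drives consumption; housing adds true_effect.
        ln_y = 0.8 * ln_gdppc + true_effect * x + rng.gauss(0.0, 0.1)
        rows.append((x, ln_y))
    return rows


def naive_housing_effect(rows):
    """OLS slope of ln(y) on a constant and binary x = difference in group means."""
    with_housing = [ly for x, ly in rows if x == 1]
    without = [ly for x, ly in rows if x == 0]
    return sum(with_housing) / len(with_housing) - sum(without) / len(without)


beta_1_hat = naive_housing_effect(simulate())
# beta_1_hat lands far above the assumed true effect of 0.10,
# because it also picks up the income gap between the two groups.
```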
Our approach
Use gradient boosting to predict consumption.
Where possible, subtract components from welfare aggregates (e.g., remove housing), so that the same survey is observed both with and without them; this minimizes omitted variable bias.
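The subtraction idea can be sketched for a survey whose aggregate already includes housing. Recomputing the aggregate without the housing component yields an \(x=0\) observation for the same households, so the two rows differ only in the measurement choice. This is a minimal sketch that assumes per-capita components stored in dicts. The component names are hypothetical, and it works with the survey mean even though the model below is specified at the level of \(d\).

```python
import math


def with_and_without(households, component):
    """Two (x, ln mean consumption) rows from a single survey.

    households: list of dicts mapping component name -> per-capita consumption
    component:  the element to subtract, e.g. "housing" (hypothetical key)
    """
    full = [sum(h.values()) for h in households]
    # Same households, same survey: only the measurement choice changes.
    reduced = [total - h.get(component, 0.0) for total, h in zip(full, households)]
    ln_full = math.log(sum(full) / len(full))
    ln_reduced = math.log(sum(reduced) / len(reduced))
    return [(1, ln_full), (0, ln_reduced)]


survey = [
    {"food": 60, "nonfood": 25, "housing": 15},
    {"food": 30, "nonfood": 10, "housing": 10},
]
rows = with_and_without(survey, "housing")
gap = rows[0][1] - rows[1][1]  # ln(75) - ln(62.5) = ln(1.2), about 0.182
```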
\(\ln(y_{d,c,t}) = f(\mathbf{x}_{c,t}, GDP_{c,t}, d)\)
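A standard library implementation, such as scikit-learn's `GradientBoostingRegressor`, would normally estimate \(f\). To show the mechanics without dependencies, here is a minimal from-scratch sketch of gradient boosting with squared loss and depth-1 trees (stumps). It is not the authors' implementation. The feature layout `[housing, ln GDP, d]`, the stump depth, the learning rate and the number of rounds are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """Depth-1 regression tree: one split on one feature."""
    feature: int
    threshold: float
    left: float   # value when row[feature] <= threshold
    right: float  # value when row[feature] > threshold

    def predict(self, row):
        return self.left if row[self.feature] <= self.threshold else self.right


def fit_stump(X, r):
    """Least-squares stump fitted to residuals r (falls back to a constant)."""
    mean_r = sum(r) / len(r)
    best, best_sse = Stump(0, float("inf"), mean_r, mean_r), float("inf")
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # midpoint between adjacent distinct values
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best_sse:
                best, best_sse = Stump(j, t, ml, mr), sse
    return best


class BoostedStumps:
    """Gradient boosting with squared loss and stump base learners."""

    def __init__(self, n_rounds=200, learning_rate=0.1):
        self.n_rounds = n_rounds
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.base = sum(y) / len(y)
        pred = [self.base] * len(y)
        self.stumps = []
        for _ in range(self.n_rounds):
            # For squared loss, the negative gradient is simply the residual.
            resid = [yi - pi for yi, pi in zip(y, pred)]
            stump = fit_stump(X, resid)
            self.stumps.append(stump)
            pred = [p + self.learning_rate * stump.predict(row) for p, row in zip(pred, X)]
        return self

    def predict(self, row):
        return self.base + self.learning_rate * sum(s.predict(row) for s in self.stumps)
```

Because trees split rather than impose a functional form, interactions between measurement choices, GDP and \(d\) do not have to be specified in advance. In the OLS specifications above, by contrast, each interaction had to be added by hand.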
Predicted log consumption with current measurement choices \(= \ln(\hat{y}_{d,c,t})\)
Denote best-practice measurement choices with \(^*\); e.g., \(x_{housing}^*=1\)
Predicted log consumption with best-practice measurement choices \(= \ln(\hat{y}^*_{d,c,t})\)
Adjusted log consumption aggregate \(= \ln(y_{d,c,t}) + \big(\ln(\hat{y}^*_{d,c,t}) - \ln(\hat{y}_{d,c,t})\big)\)
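The adjustment keeps the observed value and adds only the model-predicted gap between best-practice and current measurement choices. A minimal sketch follows, assuming `predict` is any fitted model that returns log consumption. Representing best practice as a `{feature index: value}` mapping is a hypothetical convention, and the toy linear predictor and its coefficients are illustrative only.

```python
def adjust_log_consumption(ln_y, features, best_practice, predict):
    """Adjusted ln(y) = ln(y) + (ln(y_hat*) - ln(y_hat)).

    ln_y          observed log consumption for one (d, c, t) cell
    features      model inputs: measurement choices, GDP, d (hypothetical layout)
    best_practice {feature index: best-practice value}, e.g. {HOUSING: 1}
    predict       fitted model returning predicted log consumption
    """
    ln_y_hat = predict(features)  # current measurement choices
    star = list(features)
    for idx, value in best_practice.items():
        star[idx] = value  # switch only the measurement choices; GDP and d stay
    ln_y_hat_star = predict(star)  # best-practice measurement choices
    return ln_y + (ln_y_hat_star - ln_y_hat)


HOUSING = 0  # hypothetical position of the housing indicator


def toy_predict(f):
    """Toy linear stand-in for a fitted model (hypothetical coefficients)."""
    return 0.5 + 0.12 * f[HOUSING] + 0.9 * f[1]


adjusted = adjust_log_consumption(1.0, [0, 1.2], {HOUSING: 1}, toy_predict)
# Housing was excluded, so the toy model adds its housing gap of 0.12: about 1.12.
```

Surveys that already follow best practice receive a zero adjustment, so their observed aggregates are left untouched.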
Add more data to the PMD
Decompose the added welfare
Rederive the international poverty line based on comparable aggregates
dmahler@worldbank.org
