There are some missing values in the consumption variables, e.g., 5 NAs in “consumed1300”. Are these supposed to be imputed or removed? How should I interpret such missing values, given that they seem to exist only in the consumption variables?
Can you confirm whether the percentile rank (p_t) corresponding to threshold t in the poverty rate MAPE computation is based on survey 300000, even for the leaderboard evaluation?
Sure, they are the same irrespective of the survey. But are those weights based on the survey 300000 data, i.e., the 3rd row in train_rates_gt.csv? Please confirm this!
I’m a little unclear on the question, so let me try to answer with an example:
The dollar values of the various poverty thresholds are fixed, so for every survey you are predicting the percentage of the population with consumption below $3.17, $3.94, $4.60, …. Your prediction for the poverty rate at $7.70 has a weight of 1. Your predictions for the poverty rates at $7.06 and $8.40 have a weight of 0.95. And so on, as specified in the metric section.
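To make "percentage of the population with consumption below a fixed dollar threshold" concrete, here is a minimal sketch. It assumes each record has a consumption value and a population weight, and that "below" means strictly less than the threshold. Both are assumptions, so check the metric section for the exact definition. `poverty_rates` is a hypothetical helper, not part of any competition code, and it returns fractions rather than percentages.

```python
from typing import Sequence


def poverty_rates(
    consumption: Sequence[float],
    weights: Sequence[float],
    thresholds: Sequence[float],
) -> dict[float, float]:
    """Weighted share of the population with consumption strictly below each threshold.

    Hypothetical helper: assumes one consumption value and one population
    weight per record, and a strict "<" comparison.
    """
    total = sum(weights)
    rates = {}
    for t in thresholds:
        # Sum the weights of records whose consumption falls below this dollar threshold.
        below = sum(w for c, w in zip(consumption, weights) if c < t)
        rates[t] = below / total
    return rates


# Hypothetical data: four households with equal weights.
print(poverty_rates([2.5, 4.0, 7.0, 9.0], [1, 1, 1, 1], [3.17, 7.70]))
```

Because the thresholds are fixed dollar amounts, the same threshold list is reused for every survey. Only the consumption data changes.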
So, the thresholds are set by survey 300000. The weights at each threshold are the same for all submitted predictions.
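A rough sketch of a weighted MAPE in which one fixed threshold-to-weight mapping is shared by every survey might look like the code below. Only the weights 1 (at $7.70) and 0.95 (at $7.06 and $8.40) come from this thread. Normalizing by the sum of the weights is an assumption, and the poverty rates in the usage example are made up. Treat `weighted_mape` as a hypothetical illustration, not the official scorer.

```python
from typing import Mapping


def weighted_mape(
    true_rates: Mapping[float, float],
    pred_rates: Mapping[float, float],
    weights: Mapping[float, float],
) -> float:
    """Weighted mean absolute percentage error over poverty thresholds.

    Hypothetical sketch: the same threshold -> weight mapping is used for
    every survey, and the result is normalized by the sum of the weights
    (an assumption). True rates must be non-zero.
    """
    num = 0.0
    den = 0.0
    for t, w in weights.items():
        y, y_hat = true_rates[t], pred_rates[t]
        num += w * abs(y - y_hat) / y  # relative error at threshold t, scaled by its weight
        den += w
    return num / den


# Weights taken from the thread; all poverty rates below are hypothetical.
weights = {7.06: 0.95, 7.70: 1.0, 8.40: 0.95}
true_rates = {7.06: 0.30, 7.70: 0.40, 8.40: 0.50}
pred_rates = {7.06: 0.30, 7.70: 0.44, 8.40: 0.50}
print(weighted_mape(true_rates, pred_rates, weights))
```

Here only the $7.70 prediction is off (by 10% relative error), so the score is roughly 0.1 / 2.9. The weights do not depend on which survey is being scored.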
Oh man! I’m not sure how costly this might be for me. I’ve been using the exact poverty rates at various thresholds from train_rates_gt.csv (survey `300000`) for the wMAPE calculation the entire time.
However, even the poverty rates in that file are very close to the aforementioned values. Thanks for the clarification!