How should I interpret NA values in food consumption variables?

There are some missing values in the consumption variables, e.g., 5 NAs in “consumed1300”. Should these be imputed or removed? How should I interpret such missing values, given that they seem to exist only in the consumption variables?

Also is the training data already weighted?

Hi @seanprb,

NA values are “not applicable,” effectively undefined or null. How you treat these values is up to you!

The training data (train_hh_features) contains weights in the “weight” column.
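As a rough illustration of the two options raised in the question (dropping versus imputing), here is a minimal pandas sketch. The column names “consumed1300” and “weight” come from this thread; the toy values, the 0/1 coding of the consumption variable, the fill value of 0, and the `was_missing` helper column are all assumptions made only for this example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train_hh_features; all values are invented for illustration.
df = pd.DataFrame({
    "consumed1300": [1.0, 0.0, np.nan, 1.0, np.nan],
    "weight": [2.0, 1.0, 3.0, 1.0, 3.0],
})

# Count missing values per column to see where the NAs are.
na_counts = df.isna().sum()

# Option 1: drop rows where the consumption variable is NA.
dropped = df.dropna(subset=["consumed1300"])

# Option 2: impute, keeping a flag so a model can still see that the value was missing.
# Filling with 0 is an arbitrary choice here, not a recommendation.
imputed = df.copy()
imputed["was_missing"] = imputed["consumed1300"].isna()
imputed["consumed1300"] = imputed["consumed1300"].fillna(0.0)

# Use the survey weights when computing a statistic, here on the non-missing rows.
weighted_share = np.average(dropped["consumed1300"], weights=dropped["weight"])
print(na_counts["consumed1300"], len(dropped), weighted_share)
```

Which option is appropriate depends on your model; the sketch only shows the mechanics.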

Best,
Chris

Can you confirm whether the percentile rank (p_t) for threshold t in the poverty rate MAPE computation is taken from survey 300000, even for leaderboard evaluation?

Hi @oknaitik,

The poverty rate distribution MAPE shown on the leaderboard corresponds to values in the test dataset, not in the training dataset.

Best,
Chris

But what about the weights w_t (see the attached screenshot) used in the leaderboard computation?

The weights w_t are the same at each threshold for each survey (i.e., the weight is 1 for the 40th percentile threshold for each survey).

Sure, they are the same irrespective of the survey. But are those weights based on survey 300000 data, i.e., the 3rd row in train_rates_gt.csv? Please confirm!

I’m a little unclear on the question, so let me try to answer with an example:

The dollar values of the various poverty thresholds are fixed, so for every survey you are predicting the percentage of the population with consumption below $3.17, $3.94, $4.60, … etc. Your prediction for the poverty rate at $7.70 is weighted with a weight of 1. Your predictions for the poverty rates at $7.06 and $8.40 are each weighted with a weight of 0.95. And so on, as specified in the metric section.

So, the thresholds are set by survey 300000. The weights at each threshold are the same for all submitted predictions.
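To make the weighting concrete, here is a minimal sketch of a weighted MAPE over three of the thresholds mentioned above. The thresholds and weights ($7.06 → 0.95, $7.70 → 1, $8.40 → 0.95) are taken from this post; the function name `weighted_mape`, the normalization by the sum of the weights, and the true and predicted rates are assumptions for illustration only. The metric section remains the authoritative definition.

```python
import numpy as np

def weighted_mape(pred_rates, true_rates, weights):
    """Weighted mean absolute percentage error across thresholds.

    Assumed form: sum(w_t * |pred_t - true_t| / true_t) / sum(w_t).
    """
    pred = np.asarray(pred_rates, dtype=float)
    true = np.asarray(true_rates, dtype=float)
    w = np.asarray(weights, dtype=float)
    ape = np.abs(pred - true) / true  # absolute percentage error per threshold
    return float(np.sum(w * ape) / np.sum(w))

# Fixed dollar thresholds and their weights, as quoted in this thread.
thresholds = [7.06, 7.70, 8.40]
weights = [0.95, 1.0, 0.95]

# Invented rates for one survey: share of the population below each threshold.
true_rates = [0.35, 0.40, 0.45]
pred_rates = [0.35, 0.44, 0.45]

score = weighted_mape(pred_rates, true_rates, weights)
print(score)
```

Note that the weights depend only on the threshold, so the same `weights` list is reused for every survey; only the true and predicted rates change from survey to survey.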

Does this answer your question?

Oh man! I’m not sure how costly this might be for me. I’ve been using the exact poverty rates at various thresholds from train_rates_gt.csv (survey `300000`) for the wMAPE calculation the entire time. :persevering_face:

However, the poverty rates even in that file are very close to the aforesaid values. Thanks for the clarification!