This category is for posts about the Poverty Prediction Challenge. Please keep all posts here specific to the competition.
I'd like to ask two questions. First, how should I understand the two leaderboard metrics? In my opinion the main score is the blended score and the second score is the poverty score. Is the poverty score computed as np.sum(weight * (1 if below threshold else 0)) / np.sum(weight), then averaged across thresholds using w2? Is that true? Second, how is the household MAPE calculated: is it the plain mean MAPE of the consumption values, or a weighted mean MAPE? Can anyone help me? Thank you very much.
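To be concrete, here is a small sketch of the two computations I am asking about; all the names here are placeholders I made up, not the actual column names, and the weighted-MAPE variant is just my guess at the alternative reading:

```python
import numpy as np

# Placeholder data: survey weights, true consumption, one poverty line.
weights = np.array([1.0, 2.0, 1.5, 0.5])
consumption = np.array([3.2, 1.1, 2.5, 0.9])
threshold = 2.0

# My reading of the poverty score at one threshold:
# the weighted share of households below the line.
below = (consumption < threshold).astype(float)
poverty_rate = np.sum(weights * below) / np.sum(weights)

# The two readings of the household MAPE I am asking about:
predicted = np.array([3.0, 1.3, 2.2, 1.0])
ape = np.abs((predicted - consumption) / consumption)
plain_mape = np.mean(ape)                                 # unweighted mean
weighted_mape = np.sum(weights * ape) / np.sum(weights)   # weighted mean
print(poverty_rate, plain_mape, weighted_mape)
```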
I tried to use some kind of "second stage" to transform household consumption into these threshold rates, but it seems that you can get the rates in train_rates_gt.csv from the household consumption in the train data combined with the corresponding weights in the train features csv.
For each survey and each threshold t, poverty rates can be recomputed as a weighted ratio over the consumption values, and the recomputation is correct up to about 1.1e-16. But inferring the poverty rates from predicted consumption values is not always the best way to predict them, because (see the sketch after this list):
- Mapping consumption → poverty rates is deterministic only when the consumption values are correct, and in this challenge the errors in consumption are far higher than the errors in poverty rates
- Error propagation is brutal at thresholds (step-function sensitivity)
- The poverty metric is a weighted MAPE across thresholds: it emphasizes certain ventiles and penalizes relative error
- Weighted aggregation (for poverty rates) amplifies mistakes in high-weight regions
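As a sketch of the recomputation (the file layout, the column names, and the alignment of rows across files are all assumptions on my part):

```python
import numpy as np
import pandas as pd

# Assumed layout (column names are guesses on my part):
# train_hh_gt.csv     -> household consumption
# train_features.csv  -> the survey weight for each household (rows
#                        assumed to align one-to-one with train_hh_gt.csv)
# train_rates_gt.csv  -> one ground-truth poverty rate per threshold column
c = pd.read_csv("train_hh_gt.csv")["consumption"].to_numpy()
w = pd.read_csv("train_features.csv")["weight"].to_numpy()
rates_gt = pd.read_csv("train_rates_gt.csv")

def weighted_rate(t):
    # Weighted share of households whose consumption falls below threshold t.
    return np.sum(w * (c < t)) / np.sum(w)

# Treat every numeric column name in train_rates_gt.csv as a threshold
# and compare the recomputed rate against the stored one (this assumes
# a single survey per row; .iloc[0] takes the first).
for col in rates_gt.columns:
    try:
        t = float(col)
    except ValueError:
        continue  # skip id / survey columns
    diff = abs(weighted_rate(t) - rates_gt[col].iloc[0])
    print(f"threshold={t}: |recomputed - gt| = {diff:.2e}")  # ~1e-16 in my runs
```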
Could you elaborate on how one could model the 2nd stage without overfitting, given that there are only 3 survey distributions? Also, did you think about data shift, given that the surveys are from different years? I want to learn and get better. Thanks!
Thanks for those inputs. Yes, the consumption MAPE is far greater than the poverty-rate MAPE, which probably means one has to build a robust yet accurate consumption predictor. I'd engineered a diverse set of features only to see marginal improvements in the validation score.
Also, did you think about data shift given that surveys are from different years? I want to learn more and get better. Thanks!
I think survey 100k is the outlier here. You can perform a drift study and compare surveys, but you cannot use it in solving this challenge, because they clearly state: “By default, this precludes using information gathered across multiple test samples as feature inputs or target labels for model training, for instance through pseudo labeling or unsupervised learning on the test set. As a result, running model training code with the same training data but a different set of test data or no test data should produce the same model weights and fitted feature parameters. Eligible solutions must be able to run inference on new test data automatically, without retraining the model.”
So I would not think a lot about data shift as an input to the modelling, but I will do calibration afterwards.
If I understand the response correctly, he was talking about getting poverty rates from consumption, which is completely deterministic. You can get the poverty rates from consumption via a single equation using the weights column and the consumption values. So there will be no overfitting here.
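To spell that single equation out (my notation, where w_i is the household weight and c_i the consumption; whether the comparison at the threshold is strict is a detail I have not verified):

$$r(t) = \frac{\sum_i w_i \,\mathbf{1}[c_i < t]}{\sum_i w_i}$$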
Hi @tomkyte, and apologies for my belated response here.
The way I think about the leaderboard metrics is that the primary metric is a “mixed” or blended metric, and the secondary metric is one of the components of the blended metric.
It’s easiest to start with the secondary metric. The secondary metric is a weighted average of the absolute percentage error between the predicted poverty rate at a given threshold and the actual poverty rate at a given threshold. In effect, this is measuring how well you measure the overall poverty distribution. Values closer to the 40th percentile of the distribution are given more weight.
The primary metric also factors in the household-level absolute percentage error between predicted consumption and actual consumption.
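In rough code, the structure looks like the sketch below. This is not the exact scoring implementation; in particular, the threshold weights and the way the two components are blended are left abstract here.

```python
import numpy as np

def secondary_metric(pred_rates, true_rates, threshold_weights):
    # Weighted average of absolute percentage errors on the poverty
    # rates; threshold_weights gives more weight to thresholds near
    # the 40th percentile of the distribution (exact values not shown).
    ape = np.abs((pred_rates - true_rates) / true_rates)
    return np.average(ape, weights=threshold_weights)

def household_mape(pred_cons, true_cons):
    # Household-level absolute percentage error on consumption,
    # averaged; the primary metric blends this with secondary_metric.
    return np.mean(np.abs((pred_cons - true_cons) / true_cons))
```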
I hope this explanation helps!
Best,
Chris
Is the train poverty rate in train_rates_gt.csv calculated from train_hh_gt.csv?
I tried to calculate the poverty rate from train_hh_gt.csv and got a different result.
Or is train_rates_gt.csv calculated from a bigger dataset?
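This is roughly what I tried (the column names are my guesses at the file layout). The unweighted mean gives me a different number from the weighted one, so maybe my mismatch comes from skipping the weights mentioned earlier in the thread:

```python
import numpy as np
import pandas as pd

# Column names here are my guesses at the file layout.
c = pd.read_csv("train_hh_gt.csv")["consumption"].to_numpy()
w = pd.read_csv("train_features.csv")["weight"].to_numpy()

t = 2.15  # example value; the real thresholds come from train_rates_gt.csv
unweighted = np.mean(c < t)                 # what I computed first
weighted = np.sum(w * (c < t)) / np.sum(w)  # using the survey weights
print(unweighted, weighted)
```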