Household feature hashing in two countries

sgenzer · December 26, 2017, 6:17pm

Hello. I'm just starting to look at the data here and I’m unclear about the hashing of the question names. Column “CtFxPQPT” appears in both the household country A training data and the household country C training data, but they are clearly different questions. In country A, column CtFxPQPT has two answers, both hashed text (8185 answers are “vSqQC” and 18 answers are “atYJj”), yet in country C, column CtFxPQPT looks like integers ranging from -1 to -1611. So may we assume that column labels are unique to each country, even when they share the same hashed header? And if so, should we assume that NO questions repeat across countries A, B, and C? Thank you!

caseyalan · December 27, 2017, 8:30pm

Hi @sgenzer,

Thanks for pointing this out. It looks like a small bug in our obfuscation process led to 6 hashing collisions in the household training data.

All occur between countries A and C:

SlDKnCuu, enTUTSQi, znHDEHZP, CtFxPQPT, CNkSTLvx, hJrMTBVd

We have confirmed that none of these correspond to the same question, e.g., question SlDKnCuu asks something different for country A than it does for country C.
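For anyone who wants to check their own copy of the data, here is a minimal sketch of how shared hashed headers could be listed. It works on plain lists of column names, so it makes no assumption about file names or loading code. The function name `shared_column_names` and the toy headers `colOnlyInA`, `colOnlyInB`, and `colOnlyInC` are made up for illustration. On real files, any non-question columns that legitimately share a name would need to be excluded first.

```python
from collections import defaultdict


def shared_column_names(columns_by_country):
    """Return {column name: sorted countries} for names seen in 2+ countries."""
    seen = defaultdict(set)
    for country, columns in columns_by_country.items():
        for name in columns:
            seen[name].add(country)
    return {
        name: sorted(countries)
        for name, countries in seen.items()
        if len(countries) > 1
    }


# Toy headers: only CtFxPQPT and SlDKnCuu come from the thread,
# the other names are hypothetical placeholders.
columns_by_country = {
    "A": ["CtFxPQPT", "SlDKnCuu", "colOnlyInA"],
    "B": ["colOnlyInB"],
    "C": ["CtFxPQPT", "SlDKnCuu", "colOnlyInC"],
}

collisions = shared_column_names(columns_by_country)
print(collisions)
```

A shared name found this way only says the header text matches. Whether the two columns mean the same thing still has to be judged from the values, as with the text versus integer answers in CtFxPQPT.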

As for your other question:

A small number of questions do overlap across countries, but not many. They were hashed differently because the original surveys coded the questions differently. So it’s best to assume no overlap.
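One way to act on the "assume no overlap" advice, if the countries are ever combined into one table, is to prefix each hashed header with its country code so that identical hashes cannot be mistaken for the same question. This is only a sketch under assumptions: the helper `namespace_columns`, the `keep` parameter, and the `id` column are hypothetical and not taken from the competition files.

```python
def namespace_columns(columns, country, keep=()):
    """Prefix each column name with the country code.

    Names listed in `keep` (e.g. a hypothetical shared identifier
    column) are left untouched.
    """
    return [
        name if name in keep else f"{country}_{name}"
        for name in columns
    ]


# "id" is a made-up example of a column that should stay shared.
a_cols = namespace_columns(["id", "CtFxPQPT"], "A", keep=("id",))
c_cols = namespace_columns(["id", "CtFxPQPT"], "C", keep=("id",))

print(a_cols)
print(c_cols)
```

Training a separate model per country avoids the issue altogether, since the headers from different countries never meet in that setup.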

Good luck!

sgenzer · January 29, 2018, 11:24pm

thank you @caseyalan!
