Hi…
Any idea how to aggregate hhold and indiv data by “id” after merging the two files. I want to aggregate and use the most frequent values of the features.
Thanks.
Hi…
Any idea how to aggregate hhold and indiv data by “id” after merging the two files. I want to aggregate and use the most frequent values of the features.
Thanks.
You can group the items by the id and then perform operations inside the groups by using the pandas groupby function.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html
pd.merge
pd.concat
pd.join
How would you do it in R?
it’s not streightforward but you can look at dplyr helpfile at join, this is my code:
train_ind_a <- fread("./data/A_indiv_train.csv", stringsAsFactors = TRUE)
test_ind_a <- fread("./data/A_indiv_test.csv", stringsAsFactors = TRUE)
combi_a_indiv <- bind_rows(train_ind_a, test_ind_a)
combi_a_indiv <- subset( combi_a_indiv, select = -c(iid, poor,country ) )
combi_a_indiv <- combi_a_indiv[!duplicated(combi_a_indiv$id), ]
combi_a <- join(combi_a, combi_a_indiv, by=“id”, type=“inner”)