How to aggregate in pandas



Any idea how to aggregate the hhold and indiv data by “id” after merging the two files? I want to aggregate them using the most frequent value of each feature.



You can group the rows by id with the pandas groupby function and then perform operations within each group.
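A minimal sketch of that idea, assuming small hypothetical DataFrames in place of the real hhold and indiv files and a hypothetical helper named `most_frequent`. It merges first, as in the question, and then collapses each id to one row by taking the mode of every column. Household columns are the same for every individual in a household, so their mode is simply that value.

```python
import pandas as pd

# Hypothetical toy data standing in for the hhold and indiv files
hhold = pd.DataFrame({"id": [1, 2], "poor": [True, False]})
indiv = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "gender": ["F", "M", "F", "M", "M"],
    "employed": ["yes", "no", "no", "yes", "yes"],
})

def most_frequent(values: pd.Series):
    """Most frequent non-missing value in a group; ties go to the smallest value."""
    modes = values.mode()  # sorted result, NaN dropped by default
    return modes.iloc[0] if not modes.empty else pd.NA

# Merge first: household columns are repeated for every individual in the household
merged = hhold.merge(indiv, on="id", how="inner")

# Then collapse to one row per id, using the most frequent value of each column
result = merged.groupby("id").agg(most_frequent).reset_index()
print(result)
```

`Series.mode()` can return several values when there is a tie, so the helper picks the first (smallest) one; if you need a different tie-breaking rule, that is the place to change it.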




How would you do it in R?


It’s not straightforward, but you can look at the dplyr help file for the join functions. This is my code:

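library(data.table)  # fread
library(dplyr)       # bind_rows, inner_join

# Stack the train and test individual-level files for country A
train_ind_a <- fread("./data/A_indiv_train.csv", stringsAsFactors = TRUE)
test_ind_a <- fread("./data/A_indiv_test.csv", stringsAsFactors = TRUE)
combi_a_indiv <- bind_rows(train_ind_a, test_ind_a)

# Drop the individual id, the target and the country columns
combi_a_indiv <- subset(combi_a_indiv, select = -c(iid, poor, country))
# Keep only the first individual row for each household id
combi_a_indiv <- combi_a_indiv[!duplicated(combi_a_indiv$id), ]

# combi_a is the household-level data, created earlier (not shown)
combi_a <- inner_join(combi_a, combi_a_indiv, by = "id")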
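For comparison, here is a rough pandas translation of the R steps above. It is only a sketch: the small DataFrames are hypothetical stand-ins for the CSV files (in practice you would load them with `pd.read_csv`), and `combi_a` is a made-up household frame. Note that keeping the first row per id, as this code does, is not the same as taking the most frequent value per id.

```python
import pandas as pd

# Hypothetical stand-ins for A_indiv_train.csv / A_indiv_test.csv (toy values only)
train_ind_a = pd.DataFrame({"id": [1, 1, 2], "iid": [1, 2, 1], "poor": [True, True, False],
                            "country": ["A"] * 3, "age": [40, 12, 33]})
test_ind_a = pd.DataFrame({"id": [3, 3], "iid": [1, 2], "poor": [False, False],
                           "country": ["A"] * 2, "age": [51, 49]})
# Hypothetical household-level frame playing the role of combi_a
combi_a = pd.DataFrame({"id": [1, 2, 3], "rooms": [3, 2, 4]})

# bind_rows(...) -> stack the rows of both frames
combi_a_indiv = pd.concat([train_ind_a, test_ind_a], ignore_index=True)
# subset(select = -c(iid, poor, country)) -> drop those columns
combi_a_indiv = combi_a_indiv.drop(columns=["iid", "poor", "country"])
# !duplicated(id) -> keep only the first row for each id
combi_a_indiv = combi_a_indiv.drop_duplicates(subset="id", keep="first")
# Inner join on id
combi_a = combi_a.merge(combi_a_indiv, on="id", how="inner")
print(combi_a)
```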