How to aggregate in pandas

Hi…

Any idea how to aggregate hhold and indiv data by “id” after merging the two files. I want to aggregate and use the most frequent values of the features.

Thanks.

You can group the items by the id and then perform operations inside the groups by using the pandas groupby function.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html

pd.merge
pd.concat
pd.join

How would you do it in R?

it’s not streightforward but you can look at dplyr helpfile at join, this is my code:

train_ind_a <- fread("./data/A_indiv_train.csv", stringsAsFactors = TRUE)
test_ind_a <- fread("./data/A_indiv_test.csv", stringsAsFactors = TRUE)

combi_a_indiv <- bind_rows(train_ind_a, test_ind_a)

combi_a_indiv <- subset( combi_a_indiv, select = -c(iid, poor,country ) )
combi_a_indiv <- combi_a_indiv[!duplicated(combi_a_indiv$id), ]

combi_a <- join(combi_a, combi_a_indiv, by=“id”, type=“inner”)

1 Like