How to aggregate in pandas

MaherIbrahim · February 3, 2018, 10:01pm

Hi…

Any idea how to aggregate the hhold and indiv data by “id” after merging the two files? I want to aggregate using the most frequent value of each feature.

Thanks.

RonL · February 17, 2018, 8:34am

You can group the items by the id and then perform operations inside the groups by using the pandas groupby function.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html
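As a rough sketch of that suggestion, the snippet below groups individual-level rows by id and keeps the most frequent value of each feature. The column names (employed, age_band), the sample rows, and the most_frequent helper are made up for illustration and are not from the competition data.

```python
import pandas as pd

# Hypothetical individual-level rows: several individuals share one household id.
indiv = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "employed": ["yes", "no", "yes", "no", "no"],
    "age_band": ["adult", "child", "adult", "senior", "senior"],
})


def most_frequent(values: pd.Series):
    """Return the most common value of a column within one group.

    Series.mode() returns all tied values in sorted order, so ties are
    broken by taking the first one. An all-missing group yields None.
    """
    modes = values.mode()
    return modes.iloc[0] if not modes.empty else None


# One row per id, each feature replaced by its most frequent value in the group.
indiv_by_id = indiv.groupby("id").agg(most_frequent).reset_index()
print(indiv_by_id)
```

Taking the first mode on ties is an arbitrary choice; a different tie-breaking rule may suit your features better.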

authman · February 20, 2018, 4:59pm

pd.merge
pd.concat
pd.DataFrame.join
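A minimal sketch of how the functions listed above could be used, assuming the individual features have already been reduced to one row per id. The frames, column names, and values here are hypothetical.

```python
import pandas as pd

# Hypothetical household table: one row per id.
hhold = pd.DataFrame({"id": [1, 2, 3], "roof": ["tin", "tile", "tin"]})

# Hypothetical individual features, already aggregated to one row per id.
indiv_by_id = pd.DataFrame({"id": [1, 2], "employed": ["yes", "no"]})

# Column-based merge. how="left" keeps every household, even one with no
# individual rows; validate raises an error if either side repeats an id.
merged = pd.merge(hhold, indiv_by_id, on="id", how="left", validate="one_to_one")

# Index-based equivalent using the DataFrame.join method.
joined = hhold.set_index("id").join(indiv_by_id.set_index("id"), how="left")

# pd.concat stacks frames that share columns, for example train and test rows.
# The second frame is just a shifted copy used to have something to stack.
stacked = pd.concat([hhold, hhold.assign(id=hhold["id"] + 100)], ignore_index=True)

print(merged)
```

Aggregating the individual rows before the merge keeps the result at one row per household; merging first and aggregating afterwards is also possible but repeats the household columns on every individual row.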

wbickelmann · February 21, 2018, 6:54pm

How would you do it in R?

payback · February 21, 2018, 7:13pm

It’s not straightforward, but you can look at the dplyr help file for join. This is my code:

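library(data.table)  # fread
library(dplyr)       # bind_rows

# Read the individual-level train and test files for country A
train_ind_a <- fread("./data/A_indiv_train.csv", stringsAsFactors = TRUE)
test_ind_a <- fread("./data/A_indiv_test.csv", stringsAsFactors = TRUE)

# Stack the train and test rows
combi_a_indiv <- bind_rows(train_ind_a, test_ind_a)

# Drop columns not wanted from the individual data, then keep the first row per id
combi_a_indiv <- subset(combi_a_indiv, select = -c(iid, poor, country))
combi_a_indiv <- combi_a_indiv[!duplicated(combi_a_indiv$id), ]

# combi_a (the household-level table) is not defined in this snippet; it is assumed to exist already
# join() with a type argument is plyr's function, so it is called as plyr::join here
combi_a <- plyr::join(combi_a, combi_a_indiv, by = "id", type = "inner")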
