Best Single Model Scores

nickil21 · February 25, 2018, 6:34pm

My stratified 10-fold CV scores from the single model that gives me my current best LB score of 0.1498 are as follows:

Country A: 0.2679 (with 344 features)
Country B: 0.1990 (with 1906 features)
Country C: 0.0189 (with 164 features)

This corresponds to an overall weighted mean logloss score of 0.1656
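As a rough illustration of how per-country scores combine into one number, here is a minimal sketch. The thread does not state which weights were used; this assumes each country is weighted by its number of rows. The function name `weighted_mean_logloss` and the row counts are hypothetical placeholders, so the printed value will not necessarily match 0.1656.

```python
def weighted_mean_logloss(scores, weights):
    """Average per-country log loss, weighting each country by `weights`."""
    total = sum(weights[c] for c in scores)
    return sum(scores[c] * weights[c] for c in scores) / total


# Per-country CV log loss quoted in the post above.
cv_scores = {"A": 0.2679, "B": 0.1990, "C": 0.0189}

# HYPOTHETICAL row counts, for illustration only; substitute the real ones.
row_counts = {"A": 8000, "B": 3000, "C": 6000}

overall = weighted_mean_logloss(cv_scores, row_counts)
print(f"overall weighted log loss: {overall:.4f}")
```

Because Country C has a very low log loss, the overall score is dominated by how Countries A and B are weighted.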

Knowing that the feature dimensions are pretty large, I tried feature selection using Boruta, RFE, and removing columns with too many missing values. This showed some improvement in my CV scores, but the LB scores worsened upon submitting.

I would like to know the best logloss scores you have achieved when you validated countrywise and also whether you had any luck with feature reduction/stacking/Neural Nets so far.

All the best to everyone for the last 3 days!

LastRocky · February 26, 2018, 9:01am

Thanks for sharing! My stratified 10-fold CV scores from the single model that gives me my current best LB score of 0.1526 are as follows:

Country A: 0.2619
Country B: 0.1962
Country C: 0.0159

This corresponds to an overall weighted mean logloss score of 0.1612. Compared to your results, it seems my local validation scores don’t correspond to the LB as well.

I haven’t had any success with feature reduction, and I haven’t tried stacking yet, but Neural Nets help me a lot. Good luck for the last 2 days! Cheers!

nickil21 · February 26, 2018, 9:19am

@LastRocky: I appreciate your response. It’s good to see Neural Nets are working for you. Maybe you could do a weighted average with another model to improve your scores even further. What is the country-wise breakdown of the feature dimensions you’re using?
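A minimal sketch of the weighted-average idea, in plain Python. The toy labels, the two probability vectors (`p_nn`, `p_gbm`) and the helper names are hypothetical; in practice the blend weight would be chosen on out-of-fold predictions, not on the data the models were trained on.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def blend(p_a, p_b, w):
    """Weighted average of two probability vectors; w is the weight on p_a."""
    return [w * a + (1 - w) * b for a, b in zip(p_a, p_b)]


# Toy held-out labels and predictions from two hypothetical models.
y_true = [1, 0, 1, 0]
p_nn = [0.9, 0.2, 0.6, 0.4]
p_gbm = [0.7, 0.1, 0.8, 0.3]

# Grid-search the blend weight in steps of 0.1.
candidates = [w / 10 for w in range(11)]
best_w = min(candidates, key=lambda w: log_loss(y_true, blend(p_nn, p_gbm, w)))
best_score = log_loss(y_true, blend(p_nn, p_gbm, best_w))
print(best_w, round(best_score, 4))
```

Since the grid includes 0 and 1, the blended score on the held-out set can never be worse than the better of the two models on that set; whether the gain carries over to the LB is a separate question.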

payback · February 26, 2018, 9:27am

Hi, I haven’t submitted my latest results yet; I’ll do so in the next few days.
With RFE based on random forest, my results are:
0.2943909 0.2014885 0.018
I’ll try submitting again with all the features, but when you say 1906 features, do you mean with dummy encoding?
For feature selection I tried Boruta and RFE (which took a very long time to compute) and, like you, removing features with too many NAs.

I have lost so much time trying to balance country B. I have tried:
oversampling the minority class
oversampling the whole dataset
SMOTE
ROSE
weighted observations
I also tried many times to use autoencoders.
All of these approaches failed to improve the score.
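For reference, the simplest of the balancing approaches listed above, random oversampling of the minority class, can be sketched in plain Python as follows. The function name and toy data are hypothetical, and this is not the R code used in the post. Resampling should be applied only to the training folds, never to the validation fold, or the CV score becomes optimistic. It also changes the class ratio the model sees, so predicted probabilities may need recalibration before being scored with log loss.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class is as large as the biggest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy training fold: 5 negatives, 1 positive.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
labels = [0, 0, 0, 0, 0, 1]

bal_rows, bal_labels = oversample_minority(rows, labels)
print(bal_labels.count(0), bal_labels.count(1))
```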

For the final model I’m using a stacked ensemble of a neural network, GBM, lasso, elastic net, ridge, and random forest. I would also like to use an SVM, but it isn’t implemented in h2o.

nickil21 · February 26, 2018, 9:39am

@payback: Wow, I see a lot of models there. And yes, those 1906 features are one hot encoded values of categorical variables. I believe if you focus on improving predictions for Country A, you can significantly increase your scores. All the best!

payback · February 26, 2018, 9:46am

@nickil21 Yes, for countries A and B I’ve built a lot of models! I know country A is the best way to improve the score, but I have lost too much time on country B and on feature selection.

payback · February 26, 2018, 3:47pm

@nickil21 Have you done any feature engineering/extraction? I have done one that led to significant improvements on A and B.

nickil21 · February 26, 2018, 3:52pm

Yes - Like I mentioned in the other thread, grouping on the ID column and aggregating features for both categorical and numerical independent variables and finally merging helped improve scores a bit. Apart from that, I haven’t been able to generate any.
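A minimal pandas sketch of that kind of group-and-merge step. The column names (`id`, `age`, `job`, `poor`) and the toy tables are hypothetical, and the choice of aggregates is only an example; the post does not say which ones were used.

```python
import pandas as pd

# Toy individual-level table: several rows (family members) per household id.
indiv = pd.DataFrame({
    "id": [1, 1, 2, 2, 2],
    "age": [34, 5, 61, 58, 20],        # hypothetical numerical column
    "job": ["a", "b", "a", "a", "c"],  # hypothetical categorical column
})

# Toy household-level table with one row per id.
hhold = pd.DataFrame({"id": [1, 2], "poor": [True, False]})

# Aggregate numerical and categorical columns separately, one row per id.
num_agg = indiv.groupby("id")["age"].agg(["mean", "min", "max", "count"]).add_prefix("age_")
cat_agg = indiv.groupby("id")["job"].agg(["nunique"]).add_prefix("job_")

# Merge the aggregated features back onto the household table.
features = hhold.merge(num_agg, left_on="id", right_index=True, how="left")
features = features.merge(cat_agg, left_on="id", right_index=True, how="left")
print(features)
```

Here `age_count` doubles as a household-size feature, and `job_nunique` is one simple way to summarise a categorical column per group.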

payback · February 26, 2018, 3:56pm

@nickil21 What do you mean by grouping on the ID column?
Maybe you mean the iid column in the individual train set?

nickil21 · February 26, 2018, 3:58pm

IId is just an indicator to capture the size, isn’t it?

payback · February 26, 2018, 4:01pm

iid identifies the other family members.

nickil21 · February 26, 2018, 4:03pm

Cool, I did mean the ID column.

payback · February 26, 2018, 4:13pm

@nickil21 please explain

nickil21 · February 26, 2018, 4:50pm

I think if you follow this thread closely, you should be able to comprehend fairly easily.

payback · February 26, 2018, 5:16pm

@nickil21 Cool. I use R, not Python. That Python function is very cool; it does implicit engineering. I was confused by its name, groupby. Thanks!
