Simple cleaned and processed data with random forest classifier implemented and score 0.8162

maverick_vaibhav · May 26, 2018, 7:19pm

Link to the notebook: GitHub link

Hi guys,

This is a simple visualisation of the data in Python. It tries to explain the meaning of the different features and the relationships between them through plots and charts, some of which show a clearly visible relationship between a feature and the target variable.

While cleaning the data, I removed some features that were similar to each other and created one new feature.
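
For anyone who wants to try the same kind of cleanup, here is a minimal sketch of one way to spot redundant categorical features: flag a column whenever its value is fully determined by another column. This is not necessarily what the notebook does. The column names (`pump_type`, `pump_type_group`, `water_source`) and the toy rows are hypothetical, and the code uses only the standard library so it stays self-contained.

```python
from itertools import permutations


def is_determined_by(rows, source, target):
    """Return True if every value of `source` always maps to one value of `target`."""
    mapping = {}
    for row in rows:
        # setdefault stores the first target value seen for this source value
        if mapping.setdefault(row[source], row[target]) != row[target]:
            return False
    return True


def redundant_pairs(rows, columns):
    """List (source, target) pairs where `target` adds no information beyond `source`."""
    return [
        (source, target)
        for source, target in permutations(columns, 2)
        if is_determined_by(rows, source, target)
    ]


# Hypothetical toy data: the group column is just a coarser version of pump_type.
rows = [
    {"pump_type": "handpump A", "pump_type_group": "handpump", "water_source": "spring"},
    {"pump_type": "handpump B", "pump_type_group": "handpump", "water_source": "river"},
    {"pump_type": "motor", "pump_type_group": "motorized", "water_source": "spring"},
    {"pump_type": "handpump A", "pump_type_group": "handpump", "water_source": "river"},
    {"pump_type": "motor", "pump_type_group": "motorized", "water_source": "river"},
]
columns = ["pump_type", "pump_type_group", "water_source"]

pairs = redundant_pairs(rows, columns)
print(pairs)
```

Each reported pair means the second column could be dropped in favour of the first. Be careful with ID-like columns: a column with a unique value per row trivially determines every other column, so it should be excluded before running a check like this.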

I mainly evaluated two classifiers:
1) Random Forest Classifier
2) XGBoost

Of the two, the Random Forest Classifier (RFC) proved to be the better one after I tried various forms of both classifiers.

Your suggestions on improving my model are most humbly welcome.

Nilzone · May 28, 2018, 12:50pm

My best attempt is 0.8248. I used virtually every feature + some very minor feature engineering. The model was a parameter-tuned version of xgboost.

In the last couple of days before the competition ends, I have played around with stacking different models, which has gotten me 0.82xx, but nothing better than 0.8248…yet.

There is not a lot of time left, but here are a few things you can try nonetheless:

ensembles (with bagging, boosting or stacking)
mean encoding of categorical features using the target variable
plot feature importance if using tree-based models, and drop unimportant features.
create new features from the ones you have (feature engineering).
try to remove features that are very similar to each other.
play around with both dummy variables and label encoding.
if you use only dummy variables and get A LOT of features, perform PCA or LDA on them to reduce the feature space.
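
To make the mean-encoding suggestion concrete, here is a minimal sketch in plain Python. It assumes a binary target (for a multi-class problem you could encode one indicator per class), and the `smoothing` parameter and toy values are my own illustrative choices, not something taken from an actual submission.

```python
from collections import defaultdict


def mean_encode(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    Rare categories are pulled towards the global mean, which limits
    overfitting on categories that appear only a few times.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, y in zip(categories, targets):
        sums[category] += y
        counts[category] += 1
    encoding = {
        category: (sums[category] + smoothing * global_mean) / (counts[category] + smoothing)
        for category in counts
    }
    return encoding, global_mean


# Hypothetical toy data: one categorical feature and a 0/1 target.
categories = ["a", "a", "a", "b"]
targets = [1, 1, 0, 0]

encoding, global_mean = mean_encode(categories, targets, smoothing=2.0)

# Categories never seen during fitting fall back to the global mean.
encoded = [encoding.get(c, global_mean) for c in ["a", "b", "c"]]
print(encoded)
```

Fit the encoding on the training folds only and then apply it to the validation fold or test set. Computing it on the same rows you evaluate on leaks the target into the feature and inflates the validation score.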

rahil049 · April 24, 2019, 4:31pm

Hi @Nilzone !
Congrats on your score. I would like to understand the .82xx score you have achieved…
Is it the score after submitting predictions on the test data, or is it your model's score on the training/validation data?
