My score is 0.7921 and my code is here
Hi @zlatankr! You mentioned that you had a bug in your code that caused more overfitting. Could you please elaborate a bit on this? I'm also overfitting and not sure why! Thanks!
Hello,
I am a novice in machine learning, currently ranked 104 in the blood donation challenge.
I am planning to add new features to my data. My question: when we add a feature to the training data, do we also need to add the same feature column to the holdout data?
I apologise if my question is unclear or needs clarification.
Thanks
Hi dcart,
Yes, your hold-out data set must always have the same structure as the training data set, so any feature column you add to one has to be added to the other as well.
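One way to keep the two sets in step is to put each new feature into a single function and run both frames through it. This is only a minimal sketch, assuming pandas; the column names (`months_since_first`, `num_donations`, `label`) and the `add_features` helper are hypothetical and not taken from the challenge data.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with one engineered column added (hypothetical feature)."""
    out = df.copy()
    # Donations per month of activity; clip(lower=1) avoids dividing by zero.
    out["donations_per_month"] = (
        out["num_donations"] / out["months_since_first"].clip(lower=1)
    )
    return out


# Toy stand-ins for the real training and hold-out files.
train = pd.DataFrame(
    {"months_since_first": [10, 0, 24], "num_donations": [5, 1, 12], "label": [1, 0, 1]}
)
holdout = pd.DataFrame({"months_since_first": [6, 30], "num_donations": [2, 3]})

# The same function is applied to both frames, so they cannot drift apart.
train_fe = add_features(train)
holdout_fe = add_features(holdout)

# Apart from the label, both frames should expose the same columns.
feature_cols = [c for c in train_fe.columns if c != "label"]
assert feature_cols == list(holdout_fe.columns)
```

Because the feature logic lives in one place, a later change to it reaches the hold-out set automatically.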
Score = 0.8125. Current rank = 794
Cleaned the data a little bit. Replaced some 0 and NaN values
Created 1 new feature
Transformed all categorical features with fewer than 100 categories
Dropped some highly correlated features
Used Random Forest & Gradient Boosting Tree Army. The Army performed much better
Watched out for overfitting
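Two of the steps above, encoding the smaller categorical features and dropping highly correlated ones, could be sketched roughly as follows. The post does not say which encoding or correlation cutoff was used, so the one-hot encoding, the 0.95 threshold, the helper names and the toy columns are all assumptions for illustration.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.95):
    """Drop one numeric column from each pair whose |correlation| exceeds threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop)


def encode_low_cardinality(df, max_categories=100):
    """One-hot encode non-numeric columns with fewer than max_categories levels."""
    cat_cols = [
        c for c in df.columns
        if not pd.api.types.is_numeric_dtype(df[c]) and df[c].nunique() < max_categories
    ]
    return pd.get_dummies(df, columns=cat_cols)


# Toy data: height_cm duplicates height, basin is a small categorical column.
df = pd.DataFrame(
    {
        "height": [1.0, 2.0, 3.0, 4.0],
        "height_cm": [100.0, 200.0, 300.0, 400.0],
        "pressure": [3.0, 1.0, 4.0, 1.0],
        "basin": ["a", "b", "a", "c"],
    }
)

cleaned = encode_low_cardinality(drop_correlated(df))
```

In this toy frame `height_cm` is dropped because it is perfectly correlated with `height`, and `basin` is expanded into one indicator column per category.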
Remember, when choosing your model, split your labeled data into a training set (which can be further split for cross-validation) and a test set. In the end, though, retrain your best model on the entire labeled set before making the final predictions on the unlabeled data.
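As a rough illustration of that workflow, here is a minimal sketch assuming scikit-learn. The synthetic data, the two candidate models and their settings are placeholders, not the configuration used for the score above.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the labeled data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# 1. Hold back a test set; cross-validate only on the training part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# 2. Check the chosen model once on the untouched test set.
best_model = clone(candidates[best_name]).fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)

# 3. Refit the same configuration on all labeled rows for the final predictions.
final_model = clone(candidates[best_name]).fit(X, y)
```

The test-set accuracy from step 2 is the honest estimate to report; the model from step 3 has seen every labeled row, so it has no held-out score of its own.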
Also, please double-check that the labeled set used for training and the unlabeled set have an identical set of independent features!
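A quick way to run that check is to compare the column names directly. The sketch below is plain Python and works on any iterable of names, including a pandas `DataFrame.columns`; the helper name `feature_mismatch` and the target name `label` are made up for the example. A common cause of a mismatch is one-hot encoding the two sets separately, so that a category present in only one of them produces a column the other lacks.

```python
def feature_mismatch(train_columns, unlabeled_columns, target="label"):
    """Return (missing_in_unlabeled, extra_in_unlabeled) as sorted lists of names."""
    train_features = set(train_columns) - {target}
    unlabeled_features = set(unlabeled_columns)
    missing = sorted(train_features - unlabeled_features)
    extra = sorted(unlabeled_features - train_features)
    return missing, extra


# Toy column lists: the two sets were encoded separately and ended up different.
missing, extra = feature_mismatch(
    ["age", "basin_a", "basin_b", "label"],
    ["age", "basin_a", "basin_c"],
)
print("missing in unlabeled set:", missing)
print("unexpected in unlabeled set:", extra)
```

Both lists should be empty before the final model is used for prediction.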
Hello,
We are a team of 3 students and we would like to share our approach.
Our current score is 0.7991.
Here is our code:
I currently hold rank 504 with a score of 0.8235. I used an ensemble of four tuned models to get to my final score.

I have written 4 Medium articles on my approach to EDA, data cleaning, feature engineering and modelling, which you can find here: Brenda Loznik – Medium
All code is available on my GitHub: BrendaLoznik/waterpumps (github.com)