Train/Test split ratio

BKR · January 22, 2015, 1:22am

Hi,

With such a (relatively) small dataset, what ratio do you use for training/testing? Also, which validation method is recommended in this particular scenario: Bootstrapping or K-Fold?

I would love to hear your thoughts.

Cheers
BKR

jhpincus · June 9, 2016, 6:28pm

There are no hard and fast rules for sample splitting. A 70/30 (training/test) split worked for me. I use the sample.split function in R’s caTools package.
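
For anyone not working in R, here is a rough sketch of a 70/30 split in plain Python that keeps the class proportions about the same in both parts (which, as far as I know, is also what sample.split aims for). The function name `stratified_split` and the toy labels are made up for illustration; this is not how caTools implements it.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_ratio=0.7, seed=0):
    """Return (train_idx, test_idx) with each label split by train_ratio."""
    rng = random.Random(seed)

    # Group row indices by their label so each class is split separately.
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)

    train_idx, test_idx = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_ratio)
        train_idx.extend(idxs[:cut])
        test_idx.extend(idxs[cut:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels (hypothetical): 30 positives, 70 negatives.
labels = [1] * 30 + [0] * 70
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Changing `seed` gives a different random split with the same proportions, which is handy for checking how much a score depends on the particular split.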

zenzizenzi · September 20, 2016, 4:17am

Are you talking about taking the dataset where the target variable is known and splitting it further into training/test datasets? I believe that helps you test the algorithms you have created.

neonqwerty · October 4, 2016, 5:43am

The fact that the dataset is so small is kind of annoying. From what I’ve read, the 70/30 split that jhpincus mentioned seems standard. That said, I was frustrated to see that the model that performed best when tested with the 70/30 strategy ended up in the middle of the pack when I decided to just go ahead and upload all the sane models I tried out.
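
One way to check how much a single holdout score can be trusted on a small dataset is to score the same model on several different splits, for example with the K-Fold approach BKR asked about. Below is a minimal sketch of generating K-Fold indices in plain Python. `k_fold_indices` is a hypothetical helper, the row count of 20 is a toy value, and the model fitting and scoring step is left out.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each row lands in exactly one test fold."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)

    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Toy example (hypothetical size): 20 rows, 5 folds.
splits = list(k_fold_indices(n=20, k=5))
for train, test in splits:
    print(len(train), len(test))
```

If the scores differ a lot from fold to fold, a ranking of models based on one 70/30 split is probably not very reliable either.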

I’d love to hear more thoughts by experienced users on how to handle small datasets.
