Hello, I have been working through the magnet benchmark blog notebook (link below) and I think the validation set has been taken from the test set, i.e. they share the same data points.
In notebook cell 19 an interim set is created which excludes the test set; however, the validation set is taken from the master data set. Given that the split holds out the last 6000/3000 rows per period for test/val, don't we end up with the validation set being the last half of the test set?
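A toy reproduction of what I mean (column name and sizes scaled down, 6 test / 3 val instead of 6000/3000 — these exact names are my assumption, not the notebook's):

```python
import pandas as pd

# One period with rows 0..9; hold out the last 6 for test and the last 3 for val,
# both drawn from the full frame as the notebook does.
data = pd.DataFrame({"period": 0, "x": range(10)})
test = data.groupby("period").tail(6)   # rows 4..9
val = data.groupby("period").tail(3)    # rows 7..9 -- taken from data, not from the remainder

# Every validation row is also a test row:
print(val.index.isin(test.index).all())  # True
```

So the validation set is fully contained in the test set, and is exactly half its size.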
If I’m wrong, and I probably am, can anyone put me straight?
If I'm right and others are re-using the code, reported test scores will be optimistic and the models may not generalise as well as they appear to.
Any comments appreciated. Thanks.
@JM1000 Thanks for pointing this out, I hadn't noticed it. In fact, I was about to reply saying that you were missing that the already-used rows are removed via the interim dataframe. But I hadn't noticed that a bug was introduced in the get_train_test_val function:
test = data.groupby("period").tail(test_per_period)
interim = data[~data.index.isin(test.index)]

The problem is here:

val = data.groupby("period").tail(val_per_period)  # BUG

It should be:

val = interim.groupby("period").tail(val_per_period)
train = interim[~interim.index.isin(val.index)]
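Putting it together, a minimal self-contained sketch of the corrected split (the function signature and "period" column are assumed from the snippet above; the toy data is mine):

```python
import pandas as pd

def get_train_test_val(data, test_per_period, val_per_period):
    # Hold out the last test_per_period rows of each period for test.
    test = data.groupby("period").tail(test_per_period)
    # Remove them before drawing validation, so val comes from interim, not data.
    interim = data[~data.index.isin(test.index)]
    val = interim.groupby("period").tail(val_per_period)
    train = interim[~interim.index.isin(val.index)]
    return train, test, val

# Sanity check on toy data: 3 periods x 10 rows, hold out 4 test / 2 val per period.
data = pd.DataFrame({"period": [p for p in range(3) for _ in range(10)]})
train, test, val = get_train_test_val(data, test_per_period=4, val_per_period=2)
assert test.index.intersection(val.index).empty          # no leakage
assert len(train) + len(test) + len(val) == len(data)    # partition is complete
```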
Hope this helps others too.
This might explain part of the big difference between my local test score and the leaderboard: half of my local test set is in fact validation data.
@JM1000 Thanks for catching this bug! You are indeed right: validation should be taken from the interim set instead of data. @ngcferreira's fixes are correct. I'll update the blog today to reflect this. Thanks again to both of you!
Hi, thanks for the responses. Glad this was sorted.
Thanks a lot, great catch!