Are we allowed to use metadata to make predictions, or will that lead to disqualification?
Hi @Loki_K! Can you be a little more specific about what metadata you are considering using?
I’m talking about the metadata provided on the competition website. I’m wondering if I can use the date/time feature, since the train and test date ranges overlap.
Also, can I know what percentage of the actual test data is used for the public leaderboard?
You can use the date and time metadata for prediction. The date ranges do overlap between train and test, but geographic areas do not. This means that none of the test set locations are in the training data, so your model’s performance will be measured on unseen locations.
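Because scoring happens on locations the model has never seen, a local validation split that holds out whole locations (rather than random rows) should give a closer estimate of leaderboard behaviour. Below is a minimal sketch in plain Python. It assumes each sample is a dict carrying a `location` identifier; that field name and the `split_by_location` helper are hypothetical, and the actual column in the competition data may be named differently.

```python
import random


def split_by_location(samples, val_fraction=0.2, seed=0):
    """Split samples so that no location appears in both train and validation.

    Assumes each sample is a dict with a hypothetical 'location' key.
    Whole locations are assigned to one side, mimicking a test set of
    locations that are absent from the training data.
    """
    # Sort first so the shuffle is reproducible for a given seed.
    locations = sorted({s["location"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(locations)

    # Hold out at least one location for validation.
    n_val = max(1, int(len(locations) * val_fraction))
    val_locations = set(locations[:n_val])

    train = [s for s in samples if s["location"] not in val_locations]
    val = [s for s in samples if s["location"] in val_locations]
    return train, val


if __name__ == "__main__":
    # Toy data: 5 locations, 3 dates each.
    samples = [{"location": loc, "date": d} for loc in "ABCDE" for d in range(3)]
    train, val = split_by_location(samples, val_fraction=0.4)
    print(sorted({s["location"] for s in val}))
```

Note that the date ranges are allowed to overlap across the two sides here, matching the setup described above; only the locations are kept disjoint.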
If you are worried about a more specific approach, don’t hesitate to reach out again!
We are not releasing the percentage of the test data used for the public leaderboard.
Ohh. I didn’t see this. Thank you.
Sorry for these silly questions. I just want to make sure my approach is right.
So does this refer to metadata too?
Yes, that does also refer to metadata. For a given sample, you can only use information that was already available at the time the sample was taken.
Thank you for the clarification!
My question is similar to Loki’s. Is it possible to use any feature in the metadata (month, latitude, longitude and density) in the modeling? I understand dates should not overlap. That is, if I am making a prediction for a datapoint in Jan-2015 I can only use data obtained before Jan-2015 to make that prediction.
I guess you can use any feature as long as there is no overlap.
@trex3 Yes, you can use features in the provided metadata (either from the competition data or from one of the approved sources) as long as the data was available at the time of the sample you are predicting on. Your example is correct: for a data point in Jan 2015, you could only use metadata from Jan 2015 or earlier.
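To make the "only data available at the time of the sample" rule concrete, here is a minimal plain-Python sketch of a backward-looking (as-of) lookup. It assumes a metadata history stored as `(date, value)` pairs sorted by date, for example density readings for one location; the format and the helper names `available_records` and `latest_available` are hypothetical. It compares full dates, which is the conservative reading of the rule; the month-level example above (Jan 2015 or earlier) would permit slightly more.

```python
from bisect import bisect_right
from datetime import date


def available_records(history, sample_date):
    """Keep only records dated on or before sample_date.

    history: list of (date, value) pairs (hypothetical format).
    """
    return [(d, v) for d, v in history if d <= sample_date]


def latest_available(history, sample_date):
    """Return the most recent value dated on or before sample_date.

    history must be sorted by date. Returns None if nothing had been
    recorded yet, so no future information can leak into the feature.
    """
    dates = [d for d, _ in history]
    # bisect_right puts an exact date match on the "available" side.
    i = bisect_right(dates, sample_date)
    return history[i - 1][1] if i > 0 else None


if __name__ == "__main__":
    history = [(date(2014, 11, 1), 0.3), (date(2015, 1, 1), 0.5), (date(2015, 3, 1), 0.9)]
    print(latest_available(history, date(2015, 1, 20)))  # the Jan 2015 reading, not March
```

For a sample taken on 20 Jan 2015 this returns the 1 Jan 2015 value and ignores the March reading, which would not have existed yet.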
Feel free to reach out with any other questions!