Can we use metadata in modelling?

Loki_K · January 5, 2023, 2:24am

Are we allowed to use metadata to make predictions, or will that lead to disqualification?

kwetstone · January 5, 2023, 2:18pm

Hi @Loki_K! Can you be a little more specific about what metadata you are considering using?

Loki_K · January 5, 2023, 5:30pm

I’m talking about the metadata provided on the competition website. I’m wondering if I can use the date/time feature, since the test and train date ranges overlap, right?

Loki_K · January 5, 2023, 5:33pm

Also, can I know what percentage of the actual test data is used for the public leaderboard?

kwetstone · January 5, 2023, 10:35pm

You can use the date and time metadata for prediction. The date ranges do overlap between train and test, but geographic areas do not. This means that none of the test set locations are in the training data, so your model’s performance will be measured on unseen locations.

If you are worried about a more specific approach, don’t hesitate to reach out again!

We are not releasing the percent of the test data used for the public leaderboard.

Good luck!
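Since performance is measured on unseen locations, local validation probably wants to hold out whole locations as well. A minimal sketch of that idea, assuming each sample carries some location label: the `location` key, the sample rows, and the `group_holdout_split` helper are hypothetical names for illustration, not part of the competition data or any library.

```python
import random

def group_holdout_split(rows, group_key, holdout_fraction=0.25, seed=0):
    """Split rows so every group lands entirely in train or entirely in validation."""
    # Collect the distinct groups and shuffle them reproducibly.
    groups = sorted({row[group_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)

    # Hold out at least one whole group for validation.
    n_holdout = max(1, round(len(groups) * holdout_fraction))
    holdout = set(groups[:n_holdout])

    train = [r for r in rows if r[group_key] not in holdout]
    valid = [r for r in rows if r[group_key] in holdout]
    return train, valid

# Hypothetical samples; "location" stands in for whatever geographic unit you group by.
samples = [
    {"uid": "a", "location": "north", "date": "2015-01-03"},
    {"uid": "b", "location": "north", "date": "2016-07-10"},
    {"uid": "c", "location": "south", "date": "2015-01-20"},
    {"uid": "d", "location": "east", "date": "2017-03-05"},
    {"uid": "e", "location": "west", "date": "2015-06-11"},
    {"uid": "f", "location": "west", "date": "2018-09-30"},
]

train, valid = group_holdout_split(samples, "location")
```

Dates may overlap between the two sides of this split, but no location appears in both, which mirrors how the train and test sets are described above.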

Loki_K · January 6, 2023, 4:56am

Ohh. I didn’t see this. Thank you.

Sorry for these silly questions. I just want to make sure my approach is right.

So does this refer to metadata too?

kwetstone · January 9, 2023, 5:19pm

Yes, that does also refer to metadata. For a given sample, you can only use information that was already available at the time the sample was taken.

Loki_K · January 10, 2023, 1:40am

Thank you for the clarification.

trex3 · January 28, 2023, 1:13pm

Hi Katie,

My question is similar to Loki’s. Is it possible to use any feature in the metadata (month, latitude, longitude and density) in the modeling? I understand dates should not overlap. That is, if I am making a prediction for a datapoint in Jan-2015 I can only use data obtained before Jan-2015 to make that prediction.

Regards,
T.

Loki_K · January 29, 2023, 7:30am

I guess you can use any feature as long as there is no overlap.

kwetstone · January 30, 2023, 4:12pm

@trex3 Yes, you can use features in the provided metadata (either from the competition data or from one of the approved sources) as long as the data was available at the time of the sample you are predicting on. Your example is correct: for a data point in Jan 2015, you could only use metadata from Jan 2015 or earlier.

Feel free to reach out with any other questions!
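One way to apply the "available at the time of the sample" rule when building features is to filter by date before computing anything. This is only a rough sketch: the `usable_records` helper, the record fields, and the values are made up for illustration, and treating a same-day record as available is an assumption.

```python
from datetime import date

def usable_records(records, sample_date):
    """Keep only records dated on or before the sample being predicted."""
    # Assumption: a record from the same day as the sample counts as available.
    return [r for r in records if r["date"] <= sample_date]

# Hypothetical metadata records with made-up values.
records = [
    {"date": date(2014, 12, 20), "value": 1.0},
    {"date": date(2015, 1, 15), "value": 2.0},
    {"date": date(2015, 2, 1), "value": 3.0},
]

# Predicting for a sample taken in Jan 2015: the Feb 2015 record is excluded.
sample_date = date(2015, 1, 15)
allowed = usable_records(records, sample_date)
```

Any aggregate feature (a mean, a lag, a nearest earlier reading) would then be computed from `allowed` only, so nothing from after the sample date leaks in.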
