Site id in test set but not in train set

There’s a site id in the test set (STOK) that does not appear in the train set. If this is deliberate, where can I get the lat/long for this site id?

Hi @jash.shah, thanks for bringing this to our attention! It is indeed one of the cases where the first observations occur in the test set. Since we’re not asking you to predict the location of the sites, here’s the site information:

site_id: STOK
camlr_region: 48.1
longitude_epsg_4326: -59.85
latitude_epsg_4326: -62.4

Thank you @charles.hornbaker!

Hi @jash.shah and @charles.hornbaker,

Could I ask two questions about this, please?

First, what do we mean by “test set” here? Is it the nest_counts.csv file? If so, that’s not really labelled test data in the standard sense, is it? Isn’t it just a consolidated time series view of the training_set_observations.csv data?

Second, I can see STOK in nest_counts.csv but not in training_set_observations.csv. In nest_counts.csv, as far as I can tell, it has no observations whatsoever against it. This is also true in the error file (training_set_e_n.csv). So what does Charles mean by saying STOK is “one of the cases where the first observations occur in the test set”?


Hi @jgaines,

By “test set”, I’m referring to the set of nest counts that you need to predict (nest count data for 2014-2017). If you look in the submission_format.csv file, the first two columns contain the site and species pairs that you will provide predictions for. For a few of these, such as STOK, the first nest count observation occurs in 2014, so they do not appear in the observations you received in the training set. It’s up to you how to make predictions for these sites using the available information.
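
If it helps, here is a rough sketch of one way to list the site and species pairs that are in the submission format but have no rows in the training observations. It is only an illustration under a few assumptions: it takes the first two columns of submission_format.csv as the site and species (as described above), it assumes the training observations file also has site and species in its first two columns (adjust `train_site_col` and `train_species_col` if not), and the header names, the `SITE_A` site and the `species_x` species in the toy data are made up.

```python
import csv
import io


def read_pairs(csv_file, site_col=0, species_col=1):
    """Return the set of (site, species) pairs found in an open CSV file."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    return {(row[site_col], row[species_col]) for row in reader if row}


def unseen_pairs(submission_file, training_file,
                 train_site_col=0, train_species_col=1):
    """Pairs that must be predicted but have no training observations."""
    to_predict = read_pairs(submission_file)  # first two columns
    observed = read_pairs(training_file, train_site_col, train_species_col)
    return sorted(to_predict - observed)


# Toy data standing in for the real files (values are invented).
submission_csv = io.StringIO(
    "site_id,species,2014\n"
    "SITE_A,species_x,0\n"
    "STOK,species_x,0\n"
)
training_csv = io.StringIO(
    "site_id,species,season\n"
    "SITE_A,species_x,2010\n"
)

new_pairs = unseen_pairs(submission_csv, training_csv)
print(new_pairs)  # [('STOK', 'species_x')]
```

With the real data you would pass file objects opened with `open(path, newline="")` in place of the `io.StringIO` stand-ins.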