Historical input/output features

I am still a bit confused about which historical features we are allowed to use. Could you please clarify? I’ll try to ask as precisely as possible.

Inputs:

a) Ground measures, including SNOTEL and CDEC (ONLY those in the provided CSVs)

b) MODIS Terra MOD10A1 and Aqua MYD10A1

c) Landsat 8 Collection 2 Level-2

d) Sentinel 2 Level-2A

e) Sentinel 1 GRD data

f) HRRR

g) Copernicus DEM

Output:

a) Labels (actual cell SWE)

Suppose that today is “Feb 15, 2022”.

For train and inference:

Assumption 1: We are not allowed to use historical output values from before “Feb 15, 2022” (no lag on the target is allowed).

Assumption 2: We are allowed to use historical values of the inputs (a-g) from before “Feb 15, 2022”.

Assumption 3: We are not allowed to use any historical values of the inputs (a-g) from before “Feb 15, 2022” (only the current date's value is allowed).

Can you please let us know which assumption is correct and/or any modification needed for the assumptions?

Also, does “Feb 15, 2022” mean 00:01 on Feb 15, 2022 or 23:59 on Feb 15, 2022?

@glipstein Can you please answer this one as well?

@glipstein or @tglazer Could you please shed some light on this? It will affect our models! We are so close to the final stage of the competition, and we need to package everything and freeze our models.

Hi @nima.shahbazi @FBykov - It sounds like assumption 2 is correct. As stated on the competition website:

You may only use data up through the day of estimation when generating predictions. Use of any future data after the day of estimation is prohibited.

So for Feb 15, 2022, you may use the approved features with timestamps through 23:59 on Feb 15, 2022.
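
One way to enforce this cutoff in a feature pipeline is to filter every input by timestamp before building features. The following is only a minimal sketch, not an official implementation of the rule: the `(timestamp, value)` record layout and the helper name `filter_up_to_estimation_day` are hypothetical, the values are made up, and it assumes all timestamps are naive datetimes in one consistent time zone.

```python
from datetime import date, datetime, time


def filter_up_to_estimation_day(records, estimation_day):
    """Keep records timestamped no later than the end of estimation_day.

    records: iterable of (timestamp, value) tuples (hypothetical layout).
    estimation_day: datetime.date for the day predictions are generated.
    """
    # time.max is 23:59:59.999999, i.e. the last instant of the day.
    cutoff = datetime.combine(estimation_day, time.max)
    return [(ts, value) for ts, value in records if ts <= cutoff]


# Made-up example records for illustration only.
records = [
    (datetime(2022, 2, 8, 12, 0), 10.2),    # earlier history: kept
    (datetime(2022, 2, 15, 23, 59), 11.0),  # same day, before midnight: kept
    (datetime(2022, 2, 16, 0, 1), 11.4),    # after the estimation day: dropped
]
allowed = filter_up_to_estimation_day(records, date(2022, 2, 15))
```

Applying the same filter to each approved source (a-g) before feature engineering keeps historical values available while excluding anything after the day of estimation.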

It’s not clear exactly what you mean by assumption 1. If you want to use your own estimates from previous weeks you can, so long as those also adhere to the rules. SWE is a cumulative process. If you mean you cannot use the ground truth measures that aren’t approved, that is correct.
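
To illustrate the distinction as described in this reply: a lag feature built from your own earlier predictions is acceptable, while a lag on ground-truth measures that are not in the approved data is not. Below is a minimal sketch of the former; the dictionary layout, the `cell_a` identifier, the helper name `previous_estimate`, and the value are all hypothetical.

```python
from datetime import date, timedelta


def previous_estimate(own_estimates, cell_id, estimation_day, weeks_back=1):
    """Return our own earlier prediction for a cell, or None if there is none.

    own_estimates: dict mapping (cell_id, date) -> predicted SWE
                   (hypothetical layout; model outputs, not ground truth).
    """
    lag_day = estimation_day - timedelta(weeks=weeks_back)
    return own_estimates.get((cell_id, lag_day))


# Made-up prediction from the previous week's run.
own_estimates = {("cell_a", date(2022, 2, 8)): 9.7}

lag_feature = previous_estimate(own_estimates, "cell_a", date(2022, 2, 15))
missing = previous_estimate(own_estimates, "cell_a", date(2022, 2, 15), weeks_back=2)
```

Because the lagged values come from the model's own earlier outputs, they are only compliant if those earlier predictions were themselves generated from data allowed on their estimation days.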