The problem is that the algorithm could not find SNOTEL data for a period of several tens of days preceding the issue date. Could you tell me whether this is expected behaviour? Specifically, there is no data for April and May 2023. Is that expected, or did some ETL process just not put the data in the right folder?
To my knowledge, there is nothing wrong with our ETL, and no other competitors have reported any issues like the one you are describing.
As with any real world data source, there can be many reasons why some specific station might not have had data available in a certain time period. Additionally, this data is often provisional and can be updated several months later by NRCS, which means it can often be added or removed.
My algorithm aggregates data from the n days before the forecast generation date (issue date); in the current implementation, n is 90. Data does not have to be present for every day in that period. At least one day of data within those several months is enough to build features for the prediction model.
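To make the windowing concrete, here is a minimal sketch of that kind of aggregation. It is not the actual competition code: the function name, the `date -> value` dictionary, and the chosen statistics are all assumptions for illustration. It shows that sparse data still yields features, and that a window with no observations at all is the only failing case.

```python
from datetime import date, timedelta
from statistics import mean


def window_features(observations, issue_date, n_days=90):
    """Aggregate observations from the n_days before issue_date.

    observations: hypothetical dict mapping date -> value; days may be missing.
    Returns None when the window contains no observations at all.
    """
    start = issue_date - timedelta(days=n_days)
    # Keep only values that fall inside [start, issue_date).
    values = [v for d, v in observations.items() if start <= d < issue_date]
    if not values:
        return None
    return {"n_obs": len(values), "mean": mean(values), "max": max(values)}


# Made-up observations: only two days of data in the whole period.
obs = {date(2023, 3, 1): 10.0, date(2023, 3, 15): 14.0}
features = window_features(obs, date(2023, 5, 20))  # both days are in the window
empty = window_features(obs, date(2023, 9, 1))      # nothing in the window
```

With this shape, the error described above would only appear when `window_features` returns `None`, that is, when every day in the window is missing.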
The algorithm generated predictions for all sites and for all previous issue dates (6500 cases) without any errors, which confirms that it works correctly in the execution environment.
So I came to the conclusion that perhaps it was the execution environment that didn’t load the data.
As for your comment that no other participants have reported this, I see two possible reasons:
Not all participants use SNOTEL data in their models.
Such errors can simply go unnoticed by participants. For example, when the number of runs is limited, contributors may wrap code in try-except blocks to ensure fail-safe execution. Since custom logs from developers are not output to the console, we (as participants) may simply not know that a few cases failed to predict and that a fallback prediction algorithm (usually very simple and inaccurate) was used instead.
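The second point can be sketched as follows. Everything here is hypothetical (the model functions, the site names, the fallback value); the sketch only illustrates how a try-except fallback hides failures unless each one is logged and counted, and it assumes that output written to stdout is visible to the participant.

```python
import logging
import sys

# Send log records to stdout so they appear alongside normal console output.
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logger = logging.getLogger("submission")


def main_model(site_id, features):
    # Hypothetical main model: fails when no features could be built.
    if features is None:
        raise ValueError(f"no features for {site_id}")
    return features["mean"] * 2.0


def fallback_model(site_id):
    # Hypothetical simple fallback, here just a constant value.
    return 100.0


def predict(site_id, features):
    """Return (prediction, used_fallback) and log whenever the fallback runs."""
    try:
        return main_model(site_id, features), False
    except Exception:
        # logger.exception records the traceback along with the message.
        logger.exception("main model failed for %s, using fallback", site_id)
        return fallback_model(site_id), True


results = [predict("site_a", {"mean": 12.0}), predict("site_b", None)]
n_fallback = sum(used for _, used in results)
logger.info("fallback used in %d of %d cases", n_fallback, len(results))
```

Returning the `used_fallback` flag alongside the prediction keeps the fail-safe behaviour while making the number of silently degraded cases easy to report.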
So I can offer a counterargument to any assumption about the data (I hope that doesn’t come across as rude). As an engineer, I would prefer to see, for example, logs about the loaded data. But as far as I understand, that is no longer possible.
Anyway, thank you for the reply (and for the opportunity to compete).
This is incorrect. You are expected and encouraged to log information about your submission to ensure that it is running the way you expect. The provided example includes a custom logging statement that prints successfully.
From a quick look at the stations in sites_to_snotel_stations.csv associated with hungry_horse_reservoir_inflow, there are observations present in March and April 2023 as expected. If you believe there is specific data missing, you will need to be more specific.
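One way to make such a report specific is to log, from inside the submission, how many observations each station has per month in the period of interest. The sketch below is only an illustration: the `(station, date)` row format, the station names, and the sample rows are made up and do not reflect the actual SNOTEL file layout.

```python
from collections import Counter
from datetime import date


def monthly_coverage(rows, start, end):
    """Count observations per (station, year, month) with start <= date <= end."""
    counts = Counter()
    for station, day in rows:
        if start <= day <= end:
            counts[(station, day.year, day.month)] += 1
    return counts


# Made-up rows standing in for the loaded station data.
rows = [
    ("station_1", date(2023, 3, 30)),
    ("station_1", date(2023, 4, 2)),
    ("station_2", date(2023, 4, 2)),
    ("station_2", date(2023, 6, 1)),
]
coverage = monthly_coverage(rows, date(2023, 3, 1), date(2023, 5, 31))
for key in sorted(coverage):
    print(key, coverage[key])
```

Station and month combinations that never appear in the output are the ones with no data in the runtime, which is the level of detail needed to investigate further.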
Given that the data appears to be available, you may want to consider whether the error you are seeing has a different cause.