Dear competition organisers, I have a question about the “Water Supply Forecast Rodeo: Hindcast Evaluation”.
My last submission (id-247784) failed to complete due to this error:
2023-12-21 23:03:26.739 | INFO | __main__:main:103 - 90%|████████▉ | 6500/7240 [59:26<06:50, 1.80it/s]
2023-12-21 23:03:47.443 | ERROR | __main__:main:122 - Error predicting ('hungry_horse_reservoir_inflow', '2023-05-08')
The problem is that the algorithm could not find SNOTEL data for a period of several tens of days preceding the issue date. Could you tell me whether this is expected behaviour? That is, is the absence of data for April and May 2023 expected, or did some ETL process just fail to put the data in the right folder?
Thanks in advance!
Happy New Year, everyone! I’d be grateful for a response. I have no problem predicting for these dates when running locally.
To my knowledge, there is nothing wrong with our ETL, and no other competitors have reported any issues like the one you are describing.
As with any real world data source, there can be many reasons why some specific station might not have had data available in a certain time period. Additionally, this data is often provisional and can be updated several months later by NRCS, which means it can often be added or removed.
Dear Jay, thank you very much for your reply!
Just to be clear:
- My algorithm aggregates data from the n days before the forecast generation date (issue date); in the current implementation, n = 90. Data does not need to be present for every day in that period. At least one day of data within those several months is enough to build features for the prediction model
- The algorithm generated predictions for all sites and for all previous issue dates (6500 cases) without any errors, which confirms the fact that it works correctly in the execution environment
- When I saw the error, I decided to check how my code runs locally. I did this the day after the submission (the same date as the question, 10 days ago). To do this I ran the code in the runtime repository (GitHub - drivendataorg/water-supply-forecast-rodeo-runtime: Data and runtime repository for the Water Supply Forecast Rodeo competition on DrivenData), and the code worked without errors, including for the case ‘hungry_horse_reservoir_inflow’ - ‘2023-05-08’.
So I came to the conclusion that perhaps it was the execution environment that didn’t load the data.
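To make the window logic above concrete, here is a minimal sketch of it, not the actual competition code. The function name `window_features`, the `dict` of date-to-value observations, and the returned feature names are all assumptions made for illustration.

```python
from datetime import date, timedelta
from statistics import mean


def window_features(observations, issue_date, window_days=90):
    """Aggregate observations from the window_days before issue_date.

    observations is a hypothetical dict mapping datetime.date -> float.
    Returns None when the window holds no observations at all, which is
    the situation that would make a prediction fail.
    """
    start = issue_date - timedelta(days=window_days)
    # Keep only observations inside [start, issue_date), ordered by date.
    in_window = sorted(
        (day, value) for day, value in observations.items()
        if start <= day < issue_date
    )
    if not in_window:
        return None
    values = [value for _, value in in_window]
    return {"n_obs": len(values), "mean": mean(values), "last": values[-1]}


if __name__ == "__main__":
    obs = {date(2023, 3, 1): 10.0, date(2023, 4, 1): 20.0}
    print(window_features(obs, date(2023, 5, 8)))
```

With this logic a single observation anywhere in the window is enough, so a failure would imply that the whole 90-day window was empty in the environment where the code ran.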
Regarding your comment that no other participants have reported such issues, I see two possible reasons:
- Not all participants use SNOTEL data in the model
- Such errors can simply go unnoticed by participants. For example, when the number of runs is limited, participants may wrap their code in try/except blocks to ensure fail-safe execution. Since custom logs from developers are not output to the console, we (as participants) may simply not know that a few cases failed to predict and that a fallback prediction algorithm (usually very simple and inaccurate) was used instead
So for any assumption about the data, I can offer counterarguments (I hope that doesn’t seem rude on my part). As an engineer, I would prefer to see, for example, logs about the loaded data. But as far as I understand, that is no longer possible
Anyway, thank you for the reply (and for the opportunity to compete)
This is incorrect. You are expected and encouraged to log information about your submission to ensure that it is running the way you expect. The provided example includes a custom logging statement that prints out successfully.
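As an illustration of the kind of logging meant here, the following is a minimal sketch using only Python's standard `logging` module, assuming the runtime captures standard output. The helper `predict_with_fallback` and its arguments are hypothetical and not part of the runtime repository.

```python
import logging
import sys

# Send log records to stdout so they show up in the captured run log.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(name)s - %(message)s",
)
logger = logging.getLogger("submission")


def predict_with_fallback(predict, fallback, site_id, issue_date):
    """Return (prediction, used_fallback) for one site and issue date.

    predict and fallback are hypothetical callables taking
    (site_id, issue_date). Any failure in predict is logged with its
    full traceback before the fallback is used.
    """
    try:
        return predict(site_id, issue_date), False
    except Exception:
        logger.exception(
            "Prediction failed for (%s, %s); using fallback",
            site_id, issue_date,
        )
        return fallback(site_id, issue_date), True
```

Logging the traceback alongside the site and issue date means a silent fallback becomes visible in the log, together with the underlying exception.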
Regarding the data that you believe is missing: you can find a list of the files for WY2023 in the data drive here: water-supply-forecast-rodeo-runtime/data.find.txt at d91b4339779530aad5857d7fdf5b8b40b5deab49 · drivendataorg/water-supply-forecast-rodeo-runtime · GitHub
From a quick look at the stations in sites_to_snotel_stations.csv associated with hungry_horse_reservoir_inflow, there are observations present in March and April 2023 as expected. If you believe there is specific data missing, you will need to be more specific.
Given that the data looks to be available, you may want to consider a different cause for the error that you are seeing.
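One way to narrow this down is to log an inventory of the mounted data directory at the start of a run and compare it with the file listing above. The sketch below is only an illustration. The function name `count_files_by_folder` is hypothetical, and the actual directory layout in the runtime is not assumed.

```python
import logging
from collections import Counter
from pathlib import Path

logger = logging.getLogger("submission")


def count_files_by_folder(data_dir):
    """Count files under each top-level folder of data_dir and log them.

    Files directly inside data_dir are counted under ".".
    """
    root = Path(data_dir)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            parts = path.relative_to(root).parts
            top = parts[0] if len(parts) > 1 else "."
            counts[top] += 1
    for folder, n_files in sorted(counts.items()):
        logger.info("%s: %d files", folder, n_files)
    return dict(counts)
```

If the counts in the run log match the listing, the data was loaded and the cause of the error is more likely in how the code reads or filters it.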
Dear Jay, thank you very much for the clarification!