Why do submission_format.csv and test_2h.csv have different lengths?

And different timestamps, actually.

Could you explain, please?

Edit: Ok, I get it — no micro data.

Yep, that’s right. Sorry if it is a little confusing.

For other folks who may be wondering the same thing, the easiest way to deal with this is a left outer join on the microclimate data. If you’re using Python, you can see an example of how to do this in pandas at the end of our benchmark blog post.
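
For anyone who wants to see the idea without opening the blog post, here is a minimal sketch of that kind of left join. It is not the benchmark code: the timestamps, the column names (`yield`, `temp`, `humidity`) and the toy values are all made up for illustration, and it assumes both tables are indexed by timestamp.

```python
import pandas as pd

# Hypothetical toy data: the submission format has a row for every
# timestamp we must predict, while the microclimate table has gaps.
submission = pd.DataFrame(
    {"yield": [0.0, 0.0, 0.0, 0.0]},
    index=pd.to_datetime(
        ["2014-01-01 00:00", "2014-01-01 02:00",
         "2014-01-01 04:00", "2014-01-01 06:00"]
    ),
)
micro = pd.DataFrame(
    {"temp": [10.5, 11.0], "humidity": [0.80, 0.75]},
    index=pd.to_datetime(["2014-01-01 00:00", "2014-01-01 04:00"]),
)

# Left join on the index: keep every submission timestamp, attach the
# microclimate columns where a reading exists, and leave NaN elsewhere.
features = submission[[]].join(micro, how="left")
print(features)
```

The result has one row per submission timestamp, so its length matches submission_format.csv, and the rows with no microclimate reading show up as NaN.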

OK, so you are predicting a bunch of NaNs?


In the blog post model, those NaNs get imputed using a simple imputation strategy. Competitors have a number of options for imputing those NaNs:

  • Basic strategies like the example (e.g., mean imputation)
  • Create forecasts using the existing microclimate data
  • Use the available macroclimate data during those time periods
  • Any other strategy you can think of
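
As a rough illustration of the basic strategies in the first bullet, here is a sketch of two simple options in pandas: filling with the column mean and carrying the last observed reading forward. The column names and values are hypothetical, and this is not necessarily what the blog post model does.

```python
import numpy as np
import pandas as pd

# Hypothetical joined features with gaps where no microclimate
# reading was available.
features = pd.DataFrame(
    {"temp": [10.0, np.nan, 14.0, np.nan],
     "humidity": [0.5, np.nan, 0.75, np.nan]},
    index=pd.to_datetime(
        ["2014-01-01 00:00", "2014-01-01 02:00",
         "2014-01-01 04:00", "2014-01-01 06:00"]
    ),
)

# Option 1: mean imputation, replacing each NaN with its column mean.
mean_filled = features.fillna(features.mean())

# Option 2: forward fill, reusing the most recent observed reading.
ffilled = features.ffill()

print(mean_filled)
print(ffilled)
```

In a real pipeline you would probably compute the fill values from the training data rather than from the test rows themselves.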

Part of the challenge is to answer the question: how well can we predict water yield if we don’t have a weather station right next to the nets? Or, to put it another way, can we predict water yield accurately from an understanding of the larger meteorological picture?