Why do submission_format.csv and test_2h.csv have different lengths?

Littus · March 2, 2016, 11:20am

And time stamps, actually.

Could you explain, please?

Edit: Ok, I get it — no micro data.

bull · March 2, 2016, 3:30pm

Yep, that’s right. Sorry if it is a little confusing.

For other folks who may be wondering the same thing, the easiest way to deal with this is a left outer join on the micro data. If you’re using Python, you can see an example of how to do this in pandas at the end of our benchmark blogpost.
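For anyone who wants the idea without opening the blog post, here is a minimal sketch of that left join. It is not the benchmark code: the two small frames are hypothetical stand-ins for submission_format.csv and test_2h.csv, and the column names (`yield`, `temp`, `humidity`) and timestamps are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for submission_format.csv: one row per timestamp to predict.
submission = pd.DataFrame(
    {"yield": [0.0, 0.0, 0.0, 0.0]},
    index=pd.to_datetime(
        ["2016-01-01 00:00", "2016-01-01 02:00",
         "2016-01-01 04:00", "2016-01-01 06:00"]
    ),
)

# Hypothetical stand-in for test_2h.csv: micro data exists for only some timestamps.
micro = pd.DataFrame(
    {"temp": [10.5, 11.0], "humidity": [0.90, 0.85]},
    index=pd.to_datetime(["2016-01-01 00:00", "2016-01-01 04:00"]),
)

# Left outer join on the timestamp index: every submission row is kept,
# and rows without micro data get NaN in the micro columns.
features = submission.join(micro, how="left").drop(columns="yield")

print(features)
```

The result has the same number of rows as the submission format, with NaN wherever the micro data has no reading for that timestamp.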

lucasribeiroabreu · March 2, 2016, 10:38pm

OK, so you are predicting a bunch of NaNs?

bull · March 2, 2016, 10:45pm

In the blog post model, those NaNs get imputed using a simple imputation strategy. Competitors have a number of options for imputing those NaNs:

Basic strategies like the example (e.g., mean imputation)
Create forecasts using the existing microclimate data
Use the available macroclimate data during those time periods
Any other strategy you can think of
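As a rough sketch of the first option, mean imputation in pandas can look like the following. The frame and its column names (`temp`, `humidity`) are hypothetical, and the means here are taken from the same frame for brevity; in practice you would probably compute them on the training data.

```python
import numpy as np
import pandas as pd

# Hypothetical feature frame after the left join: NaN where micro data is missing.
features = pd.DataFrame(
    {
        "temp": [10.5, np.nan, 11.0, np.nan],
        "humidity": [0.90, np.nan, 0.85, np.nan],
    }
)

# Column means, computed over the non-missing values only (pandas skips NaN).
column_means = features.mean()

# Replace each NaN with the mean of its own column.
imputed = features.fillna(column_means)

print(imputed)
```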

Part of the challenge is to answer the question: how well can we predict water yield if we don’t have a weather station right next to the nets? Or, to put it another way, can we predict water yield accurately from understanding the larger meteorological picture?
