Duplicate rows in 5-minute microclimate data


There seem to be some duplicate timestamps in the 5-minute microclimate data.

For example, 12:00 on 20th Jan 2014 (5-minute training dataset) appears twice, with different values for the gust and wind features. I've checked, and the 2-hour dataset is an average of the 5-minute dataset, including the duplicates. Should we therefore assume that two readings were taken in some 5-minute intervals?



Hi @david.foster, yep, there are some duplicates, a hazard of working with real data! We haven't heard yet from the net management team why that might be the case, but thanks for bringing it up. For the competition, you'll have to decide what's best to do with the duplicate measurements.
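
In case it helps anyone deciding how to handle them, here is a minimal sketch of two common options: keeping the first reading per timestamp, or averaging the readings that share a timestamp. The rows, the column order (timestamp, gust, wind speed), and the function names are all made up for illustration and are not taken from the competition files, so treat this as one possible approach rather than a recommended one.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sample rows: (timestamp, gust, wind_speed).
# The values are invented; only the duplicated 12:00 timestamp mirrors the thread.
rows = [
    (datetime(2014, 1, 20, 11, 55), 3.0, 1.0),
    (datetime(2014, 1, 20, 12, 0), 4.0, 2.0),
    (datetime(2014, 1, 20, 12, 0), 6.0, 4.0),
    (datetime(2014, 1, 20, 12, 5), 5.0, 3.0),
]


def group_by_time(rows):
    """Collect all readings that share a timestamp, preserving file order."""
    groups = defaultdict(list)
    for ts, *values in rows:
        groups[ts].append(values)
    return groups


def duplicated_times(rows):
    """Return the sorted timestamps that occur more than once."""
    return sorted(ts for ts, readings in group_by_time(rows).items() if len(readings) > 1)


def keep_first(rows):
    """Option 1: keep only the first reading seen for each timestamp."""
    return [(ts, *readings[0]) for ts, readings in sorted(group_by_time(rows).items())]


def average_duplicates(rows):
    """Option 2: replace readings that share a timestamp with their column-wise mean."""
    return [
        (ts, *(mean(col) for col in zip(*readings)))
        for ts, readings in sorted(group_by_time(rows).items())
    ]


print(duplicated_times(rows))
print(keep_first(rows))
print(average_duplicates(rows))
```

Which option suits the competition depends on what the duplicates actually represent, which is still unknown. Running `duplicated_times` first at least shows how many timestamps are affected before anything is dropped or merged.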