I am confused about the real meaning of the ‘Timestamp’ column in submission_format.csv. Does ‘Timestamp’ mark the end of a time period, or something else? For example, for series_id 102781, the first Timestamp in submission_format.csv is “2013-03-03 00:00:00”, while the latest timestamp in cold_start_test.csv for this series_id is “2013-03-02 23:00:00” and the prediction_window is ‘daily’. Does that mean we need to predict the total consumption from “03-02 23:00:00” to “03-03 23:00:00” (24 hours overall), or from “03-02 00:00:00” to “03-03 00:00:00” (if the timestamp is the end of the period)?
In the first case, unlike the “hourly” case, the “Timestamp” is not the end of the period but a value with no clear meaning. The second case makes no sense either, because the consumption data from “03-02 00:00:00” to “03-02 23:00:00” are already given, so we would only need to predict the consumption from “03-02 23:00:00” to “03-03 00:00:00” (one hour) and add it to the 23 values already provided to get the ‘daily’ consumption.
I think it means the start point, not the end point.
In that case, take series_id 102493, whose prediction_window is ‘weekly’: the first timestamp in submission_format.csv is “2015-10-17 00:00:00”, while the latest timestamp in cold_start_test.csv is “2015-10-10 23:00:00”. So you mean the first prediction period for this series_id runs from “2015-10-17 00:00:00” to “2015-10-24 00:00:00”? I don’t think that is right, because it would leave the week after “2015-10-10 23:00:00” uncovered…
FYI - I had an issue with this because of how pandas aggregates when resampling. By default, weekly resolutions and above are labelled with the period end, while daily and below are labelled with the period beginning.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html
See the closed and label kwargs.
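To make the default behaviour concrete, here is a minimal sketch. The hourly series is made up for illustration (two full weeks of constant consumption, not competition data), and the weekly anchor W-SAT is an assumption chosen because “2015-10-17” falls on a Saturday.

```python
import pandas as pd

# Hypothetical hourly series: two full weeks (Sunday 2015-10-11 through
# Saturday 2015-10-24), one unit of consumption per hour.
index = pd.date_range(
    "2015-10-11 00:00:00", periods=14 * 24, freq=pd.Timedelta(hours=1)
)
hourly = pd.Series(1.0, index=index)

# Daily bins default to closed="left", label="left": each label is the
# first instant of the day it covers.
daily = hourly.resample("D").sum()

# Weekly bins default to closed="right", label="right": each label is the
# last day of the week it covers (weeks ending on Saturday here).
weekly = hourly.resample("W-SAT").sum()

print(daily.head(2))
print(weekly)

# Recover the first day of each weekly period from its period-ending label.
weekly_start = weekly.index - pd.Timedelta(days=6)
print(weekly_start)
```

With this toy data the daily labels start at 2015-10-11 and each holds 24 hours, while the two weekly labels are 2015-10-17 and 2015-10-24 and each holds 168 hours. Passing closed and label explicitly to resample is probably the safest way to avoid relying on these defaults.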
Thanks for bringing this up @LastRocky, and thanks for the tip @c3josh. Upon review, we can see that the weekly timestamps are actually endpoints, likely because of the issue that @c3josh mentioned. Apologies for any confusion!
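For anyone who wants to check which hours a submission row covers, here is a rough sketch of the reading discussed in this thread. The function name forecast_window is hypothetical, and the rules are an assumption based on the posts above: hourly and daily timestamps are treated as period starts, and weekly timestamps as the last day of the week.

```python
from datetime import datetime, timedelta


def forecast_window(timestamp, prediction_window):
    """Return (first_hour, last_hour) covered by one submission row.

    Assumption (from this thread, not from official documentation):
    hourly and daily rows are labelled by period start, weekly rows by
    the last day of the week they cover.
    """
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    one_hour = timedelta(hours=1)
    if prediction_window == "hourly":
        return ts, ts
    if prediction_window == "daily":
        return ts, ts + timedelta(days=1) - one_hour
    if prediction_window == "weekly":
        # The label is midnight of the final day, so the week starts
        # six days earlier and runs through the label day's last hour.
        return ts - timedelta(days=6), ts + timedelta(days=1) - one_hour
    raise ValueError(f"unknown prediction_window: {prediction_window!r}")


# The two series mentioned in this thread.
print(forecast_window("2013-03-03 00:00:00", "daily"))
print(forecast_window("2015-10-17 00:00:00", "weekly"))
```

Under these assumptions, the weekly row for series_id 102493 would start at “2015-10-11 00:00:00”, directly after the last cold start timestamp “2015-10-10 23:00:00”.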