While looking into the sampling horizon of solar_wind and the target values t0 and t1, I found that t0 is the record from an hour in the past and t1 is the current one, relative to the last aggregated time (strictly speaking, t0 is 59 min in the past and t1 is 1 min in the future), whereas they are supposed to be the current hour and one hour ahead. My suggestion is to rewrite process_labels as:
def process_labels(dst):
    y = dst.copy()
    y["dst"] = y.groupby("period").dst.shift(-1)  # <---- inserted line
    y["t1"] = y.groupby("period").dst.shift(-1)  # shifts the already-shifted dst, so -2 overall
    y.columns = YCOLS
    return y
But a one-minute lag still remains between the aggregated solar wind data and the targets. Any comments on this are welcome.
From the task description:
Thus, your task is to build a model that can predict Dst in real-time for both the current hour and the next hour. For example, if the current timestep is 10:00 am, you must predict Dst for both 10:00 am and 11:00 am using data up until but not including 10:00 am.
So you get data up to 9:59 am (the aggregate transform maps this to … 9:00). The shifts must therefore be -1 and -2.
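A small sketch of why the hourly row labelled 9:00 already contains data up to 9:59, assuming the aggregation floors each minute-level timestamp to its hour (the column names bt, hour, bt_mean and last_minute are hypothetical):

```python
import pandas as pd

# Hypothetical minute-level feature: one reading per minute, 09:00 to 10:59.
minutes = pd.date_range("2020-01-01 09:00", periods=120, freq="min")
solar_wind = pd.DataFrame({"time": minutes, "bt": range(120)})

# Assumption: aggregation floors each timestamp to the start of its hour.
solar_wind["hour"] = solar_wind["time"].dt.floor("h")
hourly = solar_wind.groupby("hour").agg(
    bt_mean=("bt", "mean"),
    last_minute=("time", "max"),  # latest reading that went into the row
)
print(hourly)
```

Under that assumption, the row labelled 09:00 covers 09:00 through 09:59, so it is the row to pair with Dst at 10:00 and 11:00.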
No. When you’re making predictions, you’re guaranteed to get feature data up until but not including t0 for timesteps t0 and t1 - you do not have to do any re-alignment. This only affects alignment for the training code. That said, if you trained your model with this bug in place, your model will be trying to predict t-1 and t0, not t0 and t1.
@hklee I’m not sure I completely follow your scheme, but in general, yes, I think it’s fine to keep t0 stationary and instead shift t1 and your solar_wind features. It all depends on how you process your features. The main thing that you have to ensure is that you are only using data up until t0, not during or after.
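As a rough illustration of that alternative (toy data, hypothetical column names, and assuming each hourly feature row aggregates the minutes from its own hour), keeping t0 in place and shifting the features and t1 could look like this:

```python
import pandas as pd

# Toy hourly table: bt_mean aggregates minutes in [hour, hour + 1),
# dst is the Dst value reported at `hour`. Values are invented.
data = pd.DataFrame({
    "period": ["a", "a", "a", "a"],
    "hour": [8, 9, 10, 11],
    "bt_mean": [1.0, 2.0, 3.0, 4.0],
    "dst": [-10.0, -11.0, -12.0, -13.0],
})

by_period = data.groupby("period")
aligned = pd.DataFrame({
    "period": data["period"],
    "hour": data["hour"],
    # Features from the previous hour, i.e. only data before t0.
    "bt_mean_prev": by_period["bt_mean"].shift(1),
    "t0": data["dst"],                  # left in place
    "t1": by_period["dst"].shift(-1),   # one hour ahead
}).dropna()
print(aligned)
```

Each remaining row then pairs features ending just before t0 with the targets for t0 and t1, which gives the same pairing as shifting the labels by -1 and -2.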