Sampling issue in benchmark

hklee · January 24, 2021, 4:08pm

While looking into the sampling horizon of solar_wind and the target values t0 and t1, I found that t0 is the record from an hour ago and t1 is the current one with respect to the last aggregated time (strictly speaking, t0 is 59 min in the past and t1 is 1 min in the future), whereas they are assumed to be the current hour and one hour ahead. My suggestion is to rewrite process_labels as

def process_labels(dst):
    y = dst.copy()
    y["dst"] = y.groupby("period").dst.shift(-1)  # <---- inserted line
    y["t1"] = y.groupby("period").dst.shift(-1)
    y.columns = YCOLS
    return y

But there is still a one-minute lag between the aggregated solar wind data and the targets. Any comments on this are welcome.

leigh.plt · January 25, 2021, 3:22pm

From the task description:
Thus, your task is to build a model that can predict Dst in real-time for both the current hour and the next hour. For example, if the current timestep is 10:00 am, you must predict Dst for both 10:00 am and 11:00 am using data up until but not including 10:00 am.

So you get data up to 9:59 am (the aggregation maps it to … 9:00). The shifts must be -1 and -2.

cszc · January 25, 2021, 5:25pm

Hi @hklee, thanks for bringing this up.

I believe @leigh.plt is correct. In that example, data from 9am - 9:59am (aggregated to 9am) should be used to predict 10am and 11am.

Given this, I think the correct way to process the labels is this:

def process_labels(dst):
    y = dst.copy()
    y["t0"] = y.groupby("period").dst.shift(-1)
    y["t1"] = y.groupby("period").dst.shift(-2)
    return y[YCOLS]
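
As a quick sanity check of that alignment, here is a minimal sketch on toy data. The Dst values, the two period names, and YCOLS = ["t0", "t1"] are assumptions made for illustration, and "period" is kept as a plain column here, which may differ from how the benchmark indexes its data.

```python
import pandas as pd

YCOLS = ["t0", "t1"]  # assumed label column names


def process_labels(dst):
    y = dst.copy()
    # Row h holds features aggregated over [h, h+1), so the "current hour"
    # target is the next row and the "next hour" target is two rows ahead.
    y["t0"] = y.groupby("period").dst.shift(-1)
    y["t1"] = y.groupby("period").dst.shift(-2)
    return y[YCOLS]


# Toy data: two periods with four made-up hourly Dst values each.
dst = pd.DataFrame(
    {
        "period": ["a"] * 4 + ["b"] * 4,
        "dst": [10.0, 11.0, 12.0, 13.0, 20.0, 21.0, 22.0, 23.0],
    }
)

labels = process_labels(dst)
print(pd.concat([dst, labels], axis=1))
```

Because the shift is done per period, the last row of each period has no t0 and the last two rows have no t1; those rows would need to be dropped or masked before training.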

@hklee and @leigh.plt Let me know if that makes sense and you agree. If so I’ll update the blog today.

Thanks!

leigh.plt · January 25, 2021, 7:51pm

For my solution, I use -1 and -2. I don’t use the process_labels function from the blog, so I missed this from the beginning.

mchahhou · January 25, 2021, 8:59pm

Is this bug also present in the way you make t0 and t1 during prediction on test data?

cszc · January 25, 2021, 10:41pm

No. When you’re making predictions, you’re guaranteed to get feature data up until but not including t0 for timesteps t0 and t1 - you do not have to do any re-alignment. This only affects alignment for the training code. That said, if you trained your model with this bug in place, your model will be trying to predict t-1 and t0, not t0 and t1.
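
To make the one-step offset concrete, here is a small sketch comparing the two label schemes on made-up data. It assumes the pre-fix labels were t0 = dst (unshifted) and t1 = dst shifted by -1, which is what the original suggestion in this thread implies once its inserted line is removed; the single period and its values are hypothetical.

```python
import pandas as pd

# One period of made-up hourly Dst values.
dst = pd.DataFrame({"period": ["a"] * 5, "dst": [10.0, 11.0, 12.0, 13.0, 14.0]})
g = dst.groupby("period").dst

# Assumed pre-fix labels: t0 is the unshifted value, t1 is one hour ahead.
old = pd.DataFrame({"t0": dst["dst"], "t1": g.shift(-1)})

# Labels as corrected in this thread: one and two hours ahead.
new = pd.DataFrame({"t0": g.shift(-1), "t1": g.shift(-2)})

# Moving the old targets one hour forward reproduces the corrected ones,
# i.e. the old scheme was effectively targeting t-1 and t0.
t0_matches = old["t0"].shift(-1).equals(new["t0"])
t1_matches = old["t1"].shift(-1).equals(new["t1"])
print(t0_matches, t1_matches)
```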

hklee · January 26, 2021, 8:49am

@cszc, @leigh.plt, yes, both codes are the same. But I am thinking about shifting solar_wind by -59 min instead, and t1 by -1 hour, to be exact.

cszc · January 26, 2021, 4:24pm

@hklee I’m not sure I completely follow your scheme, but in general, yes, I think it’s fine to keep t0 stationary and instead to shift t1 and your solar_wind features. It all depends on how you process your features. The main thing that you have to ensure is that you are only using data up until t0, not during or after.
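
One possible reading of that alternative, as a rough sketch rather than the benchmark's actual code: aggregate the minute-level features per hour, stamp each aggregate with the end of its window instead of the start, and then leave t0 unshifted. The column name "bt", the toy values, and the timedelta index are assumptions for illustration.

```python
import pandas as pd

# Made-up minute-level feature: the value equals the minutes elapsed.
minutes = pd.to_timedelta(list(range(180)), unit="min")
solar_wind = pd.DataFrame({"bt": [float(i) for i in range(180)]}, index=minutes)

# Aggregate each hour under its starting timestamp (0h, 1h, 2h).
hourly = solar_wind.groupby(solar_wind.index.floor("h")).mean()

# Move each aggregate forward one hour so the row stamped T summarizes
# [T - 1h, T), i.e. only data strictly before T.
features = hourly.copy()
features.index = features.index + pd.Timedelta(hours=1)

# Made-up hourly Dst; t0 stays in place and t1 is one hour ahead.
dst = pd.Series(
    [10.0, 11.0, 12.0, 13.0],
    index=pd.to_timedelta(list(range(4)), unit="h"),
    name="dst",
)
labels = pd.DataFrame({"t0": dst, "t1": dst.shift(-1)})

# Keep only timestamps that have both features and labels.
train = features.join(labels, how="inner")
print(train)
```

Either way of aligning gives the same feature and target pairs; the difference is only whether the row is stamped with the start or the end of the feature window.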
