Did anyone notice that the benchmark LSTM solution is not correct?

In the LSTM benchmark solution, the input consumption values are ordered from latest to oldest when the model is trained, but from oldest to latest when it is used for prediction. It seems the array needs to be flipped before predicting.

Could you please share the code? I just want to understand what you mean by flipping the array.

In their benchmark solution, check the generate_hourly_forecast function and see what I did with the X array (the X1 line) below.

import numpy as np
import pandas as pd


def generate_hourly_forecast(num_pred_hours, consumption, model, scaler, lag):
    """ Uses last hour's prediction to generate next for num_pred_hours,
        initialized by most recent cold start prediction. Inverts scale of
        predictions before return.
    """
    # allocate prediction frame
    preds_scaled = np.zeros(num_pred_hours)
    # initial X is last lag values from the cold start (oldest first)
    X = scaler.transform(consumption.values.reshape(-1, 1))[-lag:]
    # forecast
    for i in range(num_pred_hours):
        # flip the window so the most recent value comes first, as in training
        X1 = np.flip(X.ravel(), axis=0)
        # predict scaled value for next time step
        yhat = model.predict(X1.reshape(1, 1, lag), batch_size=1)[0][0]
        preds_scaled[i] = yhat
        # update X to be latest data plus prediction (drop oldest, append yhat)
        X = pd.Series(X.ravel()).shift(-1).fillna(yhat).values

    # revert scale back to original range
    hourly_preds = scaler.inverse_transform(preds_scaled.reshape(-1, 1)).ravel()
    return hourly_preds
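
To see the mismatch on toy data, here is a minimal sketch. It assumes the training features are built as lag columns with pandas `shift`, most recent lag first; the variable names (`train_X`, `window`) and the toy series are made up for illustration and are not taken from the benchmark.

```python
import numpy as np
import pandas as pd

# Toy hourly series: each value equals its time index, so the order is easy to read.
series = pd.Series(np.arange(6, dtype=float), name="consumption")
lag = 3

# Assumed training layout: one column per lag, most recent first (t-1, t-2, t-3).
train_X = pd.concat([series.shift(i) for i in range(1, lag + 1)], axis=1).dropna()
last_train_row = train_X.iloc[-1].to_numpy()  # features for the target at t=5

# Prediction-time slice taken as [-lag:], which is chronological (oldest first).
window = series.to_numpy()[-lag:]

# Flipping the slice restores the (t-1, t-2, t-3) order used by the training rows.
flipped = np.flip(window, axis=0)

print("training row :", last_train_row)
print("raw window   :", window)
print("flipped      :", flipped)
```

The training row reads newest to oldest, while the raw window reads oldest to newest, so only the flipped window has the same layout as what the model saw during training.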

Good catch! I didn’t notice that when I first ran through the benchmark code, but it looks like the input array is not in the proper order of (t-1, t-2, t-3, t-4, …). You would have to shift it by 1 instead of -1, though, so that the latest prediction ends up at t-1.
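
Which shift direction is right depends on how the window is stored between iterations. Below is a toy sketch of both options; the helper names `update_chronological` and `update_newest_first` are hypothetical and only for illustration. If the window is kept oldest first and flipped just before `model.predict` (as in the snippet above), `shift(-1)` already puts the prediction at t-1 after the flip. If the window is kept in (t-1, t-2, …) order the whole time, `shift(1)` is the one to use.

```python
import numpy as np
import pandas as pd


def update_chronological(window, yhat):
    """Window stored oldest -> newest: drop the oldest value, append the prediction."""
    return pd.Series(window).shift(-1).fillna(yhat).to_numpy()


def update_newest_first(window, yhat):
    """Window stored newest -> oldest (t-1, t-2, ...): put the prediction in front."""
    return pd.Series(window).shift(1).fillna(yhat).to_numpy()


chron = np.array([3.0, 4.0, 5.0])        # oldest first
newest_first = np.flip(chron, axis=0)    # same values, newest first
yhat = 6.0                               # stand-in for the model's next prediction

# Option A: keep chronological storage, flip only when building the model input.
a = np.flip(update_chronological(chron, yhat), axis=0)
# Option B: keep newest-first storage and shift the other way.
b = update_newest_first(newest_first, yhat)

print(a, b)
```

Both options hand the model the same window, with the new prediction in the t-1 slot. One caveat of the `fillna` trick in either form: it would also overwrite any NaN already present in the window, so it assumes the lag values are complete.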