The LSTM benchmark model fitted a new scaler each time it was trained on a series. Can anyone explain the advantages and disadvantages of fitting a new scaler per series, as opposed to scaling the entire dataset with a single scaler?
My gut instinct is that a new scaler each time will not represent the differences in magnitude between the series accurately. For example, the peak of a series with a highest consumption of 4 MWh would be scaled to the same value as the peak of a series with a highest consumption of 20 kWh.
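
To make the concern concrete, here is a minimal sketch of the two approaches. It assumes min-max scaling to [0, 1], which may not be what the benchmark actually uses, and the two series and the `min_max_scale` helper are made up for illustration (both in kWh, so 4 MWh = 4000 kWh).

```python
def min_max_scale(values, lo, hi):
    """Scale values to [0, 1] using the given minimum and maximum."""
    span = hi - lo
    return [(v - lo) / span for v in values]


# Hypothetical consumption series, both in kWh (4 MWh = 4000 kWh).
large = [1000.0, 2500.0, 4000.0]
small = [5.0, 12.5, 20.0]

# Per-series scaling: each series is scaled with its own min and max.
large_own = min_max_scale(large, min(large), max(large))
small_own = min_max_scale(small, min(small), max(small))

# Global scaling: one min and max shared across all series.
all_values = large + small
g_lo, g_hi = min(all_values), max(all_values)
large_global = min_max_scale(large, g_lo, g_hi)
small_global = min_max_scale(small, g_lo, g_hi)

print("per-series:", large_own, small_own)
print("global:    ", large_global, small_global)
```

With per-series scaling both series come out as `[0.0, 0.5, 1.0]`, so the difference in magnitude is lost, but each series uses the full input range. With a single global scaler the magnitudes are preserved, but the small series is squeezed into a tiny interval near zero (all values below 0.01 here), which can make its variation hard for the model to pick up.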