Hi, this nice competition gave me the opportunity to study and improve my skills on time series data. I tried both Prophet and ARIMA (actually SARIMAX) and now I'm trying deep learning models, specifically an LSTM. The model seems to perform pretty well on the test data but poorly on the submission set. Any ideas on how to improve it?
I'm using 11 time steps (n_input) with just a subset of features (7 plus total cases: n_features = 8), a batch size of 26, and 130 units (one bidirectional LSTM layer with ReLU activation and one dense layer):
model = Sequential()
model.add(Bidirectional(LSTM(130, activation='relu'), input_shape=(n_input, n_features)))
model.add(Dense(1))  # the single dense output layer mentioned above
model.compile(optimizer='adam', loss='mae')  # loss/optimizer not shown in the original post
model.fit(generator, epochs=65, validation_data=generator_test, shuffle=False)
I tried a plain XGBRegressor and it works fine. I found that feature selection and engineering impact the results immensely; careful selection can give you good accuracy on the test set.
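For what it's worth, here is a minimal sketch of one simple way to do feature selection: rank features by absolute correlation with the target and keep the top k. The column names and the toy data below are made up for illustration, not taken from the competition data or the benchmark:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the training frame; feature names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "humidity": rng.normal(17.0, 1.5, n),
    "avg_temp": rng.normal(27.0, 1.0, n),
    "noise": rng.normal(0.0, 1.0, n),
})
# Target loosely driven by humidity, so the ranking has something to find.
df["total_cases"] = (2.0 * df["humidity"] + rng.normal(0.0, 1.0, n)).clip(lower=0)

# Rank features by absolute correlation with the target and keep the top k.
corr = df.corr()["total_cases"].drop("total_cases").abs()
top_features = corr.sort_values(ascending=False).head(2).index.tolist()
print(top_features)  # "humidity" should rank first
```

The kept columns would then feed the regressor instead of the full feature set.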
Did you drop many features?
I tried implementing XGBoost on the test data in the competition and got 27 MAE.
Hi, thank you. I tried XGBoost too with feature selection (no feature engineering so far), but I had better results with Negative Binomial; well, better than XGBoost anyway (average score 25.3438). I was curious about LSTM since it seems to perform very well on the test set and poorly on submission, and I wanted to understand why.
Hi, I also tried a boosting model with some feature engineering. The results on the test set seem very good (MAE around 4 for San Juan), but when I submit it performs poorly. I really don't understand what's wrong.
Hi everyone and @adalseno. I have a similar issue where my RF leads to much better scores than the benchmark model (Negative Binomial model with feature selection, https://www.drivendata.co/blog/dengue-benchmark/) on my validation/holdout set but performs worse (27) than the benchmark (25.8) on the test set. I split my train/validation set in the same way that the benchmark does so I could better compare my model to theirs. I performed very little feature engineering so that, again, it matches the benchmark’s methods and leads to a better comparison. Does anyone know why this is happening?
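One thing worth double-checking in a setup like this is that the train/validation split is chronological, since a shuffled split lets the model peek at the future and inflates holdout scores. A minimal sketch of a chronological split (the 3/4 ratio and column names here are illustrative assumptions, not taken from the benchmark code):

```python
import pandas as pd

# Toy weekly series standing in for one city's data.
df = pd.DataFrame({"weekofyear": range(1, 105), "total_cases": range(104)})

# Chronological split: train on the first ~3/4, validate on the rest.
split = int(len(df) * 0.75)
train, valid = df.iloc[:split], df.iloc[split:]
print(len(train), len(valid))  # 78 26
```

Every validation row comes strictly after every training row in time, which makes the holdout score a fairer proxy for the submission score.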
Hi, you probably don't need the answer anymore, but this looks to me like a classic case of overfitting.
RNNs are meant to deal with a lot of data, and you have a really small training dataset.
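The pattern described in this thread (great test scores, poor submission scores) is exactly what overfitting looks like. A toy numpy illustration, with polynomial regression standing in for any high-capacity model such as an LSTM: on a small, chronologically split series, the over-parameterized model drives training error down while doing far worse on the held-out later points:

```python
import numpy as np

rng = np.random.default_rng(42)
# Small noisy series, split chronologically like the competition data.
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)
x_tr, y_tr, x_va, y_va = x[:20], y[:20], x[20:], y[20:]

def errors(degree):
    """Fit a polynomial of the given degree; return (train MSE, val MSE)."""
    coefs = np.polyfit(x_tr, y_tr, degree)
    tr = float(np.mean((np.polyval(coefs, x_tr) - y_tr) ** 2))
    va = float(np.mean((np.polyval(coefs, x_va) - y_va) ** 2))
    return tr, va

tr_small, va_small = errors(3)   # modest capacity
tr_big, va_big = errors(10)      # over-parameterized: memorizes training noise
# Expect: tr_big <= tr_small, but va_big much larger than va_small.
```

The usual remedies map straight back to the LSTM case: shrink the model, add dropout or other regularization, and use early stopping on a chronologically held-out validation set.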