Team, first of all, thank you for sharing the benchmark model. It gave me a baseline. Let me know if anyone is interested in discussing ways we can manipulate the data to build smarter models. Specifically, how much of the time-series structure do we need to incorporate when building models? For example, would ARIMA models make sense for this problem set? What would be a better approach to fill in missing data? And how do we overcome the drawbacks of the base model?
Could you comment on what you mean by ARIMA models here?
In general I am interested in theory from epidemiology that could help me; I am an absolute beginner in epidemiology who is just trying his usual R tricks.
I guess that spatial and temporal models both play a role, especially temporal models in this particular case (where we do not have to predict the location, since there are only the two fixed locations).
For missing values I used prediction models to fill in the gaps. Basically, I treated imputation as an ML problem: I split the data into rows where the value is observed (training) and rows where it is missing (prediction), and used random forests to predict the NaN values.
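The idea above can be sketched as follows. This is a minimal illustration, not the poster's actual code: the column names and data are made up, and scikit-learn's RandomForestRegressor stands in for whatever random forest implementation was used.

```python
# Sketch of imputation-as-ML: for a column with NaNs, train a random
# forest on the rows where the value is observed, then predict it for
# the rows where it is missing. All names and data are illustrative.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "temperature": rng.normal(25, 3, 200),
    "humidity": rng.normal(80, 5, 200),
})
# Knock out 40 "humidity" values to simulate missing data.
df.loc[rng.choice(200, 40, replace=False), "humidity"] = np.nan

observed = df["humidity"].notna()
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(df.loc[observed, ["temperature"]], df.loc[observed, "humidity"])

# Fill the NaNs with the model's predictions.
df.loc[~observed, "humidity"] = model.predict(df.loc[~observed, ["temperature"]])
assert df["humidity"].isna().sum() == 0
```

With several predictor columns this becomes one model per incomplete column, which is where the "a lot of work" concern below comes from.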
Since all of the independent variables contain NULLs, using ML to fill them in is a lot of work. Is it worthwhile? Did you find that it improved the final model?
I used standard data imputation algorithms like Hmisc and Amelia. Used them separately on the test and training sets and then combined them, applied random forest algorithm.
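Hmisc and Amelia are R packages; a rough Python analog of that workflow, assuming scikit-learn, is IterativeImputer (itself experimental) followed by a random forest. As described in the post, the imputer is fit separately on the training and test splits; the data and dimensions below are invented for illustration.

```python
# Rough Python analog of the described workflow: impute train and test
# separately, then fit a random forest on the imputed training data.
# IterativeImputer stands in for the R packages Hmisc/Amelia.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.normal(size=(150, 4))
X_test = rng.normal(size=(50, 4))
y_train = X_train[:, 0] + rng.normal(scale=0.1, size=150)

# Punch holes into both splits to simulate NULLs.
X_train[rng.choice(150, 30), rng.choice(4, 30)] = np.nan
X_test[rng.choice(50, 10), rng.choice(4, 10)] = np.nan

# Separate imputers per split, as described in the thread.
X_train_imp = IterativeImputer(random_state=0).fit_transform(X_train)
X_test_imp = IterativeImputer(random_state=0).fit_transform(X_test)

rf = RandomForestRegressor(n_estimators=50, random_state=0)
rf.fit(X_train_imp, y_train)
preds = rf.predict(X_test_imp)
```

One caveat on this design: fitting a single imputer on the training split and only transforming the test split is the more common choice, since it keeps the two splits on the same learned imputation model.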
How good was your model?