Hi Everyone,
Would people mind sharing what approaches they took, and what worked and what didn’t? I’m writing a chapter of my dissertation on the competition, so I’m very curious to know what ended up being successful, especially for teams in the top 100 or so.
Roughly speaking, we built a shallow 4-layer CNN on MODIS imagery (pulled from the provided Azure storage), built a 6-layer CNN on the Sentinel-1 “VV” band via Google Earth Engine (GEE), and aggregated their predictions with a linear model plus a couple of day-of-year dummies. We got very little lift from the Sentinel CNN; the majority of our performance came from MODIS (although we may have had data quality issues with the Sentinel data from GEE). We also tried more complicated aggregators like random forests, and got a score of roughly 10.0 using just the fuzzed coordinates in a random forest.
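To make that setup concrete, here’s a minimal sketch of the aggregation step and the coordinates-only baseline. Everything here (variable names, bins, distributions) is an illustrative stand-in, not our actual pipeline; the real code is in the repo linked below.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Stand-ins for per-sample predictions from the two CNNs.
modis_pred = rng.normal(10.0, 3.0, n)     # shallow 4-layer CNN on MODIS
sentinel_pred = rng.normal(10.0, 3.0, n)  # 6-layer CNN on Sentinel-1 VV
doy = rng.integers(1, 366, n)             # day of year per sample
swe = modis_pred + rng.normal(0.0, 1.0, n)  # fake SWE target for the sketch

# A couple of coarse day-of-year dummies (the season bins are made up).
doy_dummies = np.column_stack([
    doy < 90,
    (doy >= 90) & (doy < 180),
]).astype(float)

# Linear aggregator over the two CNN outputs plus the dummies.
X = np.column_stack([modis_pred, sentinel_pred, doy_dummies])
agg = LinearRegression().fit(X, swe)
print(agg.coef_)  # one weight per CNN, plus the seasonal shifts

# The ~10.0 baseline: a random forest on the fuzzed coordinates alone.
coords = rng.uniform(-120.0, -105.0, (n, 2))  # fuzzed (lon, lat) stand-ins
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(coords, swe)
```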
We had a hard time getting any extra performance from the weather variables or the digital elevation model, which surprised me, but perhaps we could have handled those better. A significant amount of hand-tuning probably explains most of our drop from ~10.0 to ~8.0, with the Sentinel CNN getting us the rest of the way there. Lastly, we ran out of time to control for cloud cover or aggregate over time to improve image quality in MODIS.
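For anyone curious, here’s a rough sketch of the kind of cloud-aware temporal compositing we had in mind but never got to. The array shapes and the `cloudy` mask are assumptions for the example, not anything from our pipeline.

```python
import numpy as np

def temporal_composite(tiles: np.ndarray, cloudy: np.ndarray) -> np.ndarray:
    """tiles: (T, H, W) stack of MODIS tiles over a trailing time window;
    cloudy: (T, H, W) boolean cloud mask. Returns the per-pixel median
    over the window, skipping cloud-flagged observations."""
    masked = np.where(cloudy, np.nan, tiles)
    # NaN remains wherever a pixel is cloudy in every tile of the window.
    return np.nanmedian(masked, axis=0)

# Toy usage: an 8-day window over a 64x64 tile.
rng = np.random.default_rng(0)
tiles = rng.random((8, 64, 64))
cloudy = rng.random((8, 64, 64)) < 0.3
composite = temporal_composite(tiles, cloudy)
```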
Generally we were frustrated by how much time went into preprocessing, and by how building the pipeline absorbed much of the effort we would otherwise have put into optimizing additional datasets, but I suppose that’s just part of being an applied ML practitioner.
I’ve made my team’s repository public, so anyone who’d like can take a look: GitHub - M-Harrington/SnowComp: Competition for Bureau of Reclamation's SWE competition. Please pardon the mess there; we started the competition a month late, so there was a fair bit of rushing to finish!
Thanks!
Matt
PS: if you’d like to keep some of your methods secret, that’s totally understandable, but anything you can share would be super helpful!