Would the top-ranked competitors be willing to share their approaches for educational purposes?
I think most people will do that after phase 2.
Yep, I think if you win, your solution becomes the property of DrivenData, but if not, you can share it as you wish. So after the phase 2 results are out (they said around late November), I’m planning to release a GitHub repo with a nice README explaining what I did and why, along with my Jupyter notebook. I’ll definitely add it to the “Share my work” section on here, and I’ll try to remember to post it here too. Excited to read what other people did too.
Not top-ranked (16th place), but my approach was simple. Briefly:
- append reverse complement (GAT -> GAT + NNN + ATC)
- TF-IDF (custom n-grams)
- TruncatedSVD (TSVD, 550 components)
- MLPClassifier (1 hidden layer, 800 units)
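For anyone unfamiliar with the first step in the list above, here is a minimal sketch of appending the reverse complement with an `NNN` spacer. The function names and the choice to uppercase the input are my own assumptions for illustration, not taken from the original submission.

```python
# Complement table for the four bases; N (unknown) maps to itself.
# Any other character is left unchanged by str.translate.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def append_reverse_complement(seq: str, spacer: str = "NNN") -> str:
    """Return seq + spacer + reverse complement, e.g. GAT -> GAT + NNN + ATC."""
    seq = seq.upper()
    return seq + spacer + reverse_complement(seq)
```

The spacer presumably keeps n-grams from spanning the junction between the two strands without at least one `N` in them, but that is my reading rather than something stated in the post.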
Main estimator: a soft-voting ensemble of 3 of these base estimators, each with a different TF-IDF n-gram generator window (set through the custom analyzer parameter):
[0,1,2,3,4,5,7], [0,1,2,3,4,7], [0,1,2,3,4,8]
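A rough sketch of how such an ensemble could be wired up in scikit-learn follows. It assumes the window lists are character offsets inside a sliding window (gapped n-grams), which is only one reading of the post. `gapped_ngrams`, `WINDOWS`, and `build_ensemble` are hypothetical names, and every hyperparameter not mentioned in the post is left at its scikit-learn default.

```python
from functools import partial


def gapped_ngrams(seq, offsets):
    """Slide a window over seq and keep only the characters at `offsets`."""
    span = max(offsets) + 1
    return ["".join(seq[i + o] for o in offsets)
            for i in range(len(seq) - span + 1)]


# Offset lists copied from the post; treating them as gapped windows is assumed.
WINDOWS = [
    (0, 1, 2, 3, 4, 5, 7),
    (0, 1, 2, 3, 4, 7),
    (0, 1, 2, 3, 4, 8),
]


def build_ensemble(windows=WINDOWS):
    """Soft-voting ensemble of TF-IDF -> TruncatedSVD -> MLP pipelines."""
    # scikit-learn is imported here so the analyzer above works without it.
    from sklearn.decomposition import TruncatedSVD
    from sklearn.ensemble import VotingClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline

    members = []
    for k, offsets in enumerate(windows):
        pipe = make_pipeline(
            # A callable analyzer receives each raw sequence string.
            TfidfVectorizer(analyzer=partial(gapped_ngrams, offsets=offsets)),
            TruncatedSVD(n_components=550),
            MLPClassifier(hidden_layer_sizes=(800,)),
        )
        members.append((f"window_{k}", pipe))
    # Soft voting averages the predicted class probabilities of the members.
    return VotingClassifier(estimators=members, voting="soft")
```

Usage would be something like `build_ensemble().fit(sequences, labels)`, where `sequences` is a list of DNA strings that already have the reverse complement appended. Note that `TruncatedSVD` with 550 components needs a TF-IDF vocabulary larger than 550 features, so this will not fit on a toy dataset.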
Much later than I planned, but here’s the repo with my final submission and report, along with the Jupyter notebooks from my model development along the way. Hope it helps.
My report is a bit brief because of the page limit. I plan to extend it at some point in the future using the feedback I’ve received.