Would the top-ranked competitors be willing to share their approaches for educational purposes?
I think most people will do that after phase 2.
Yep, I think if you win it becomes the property of DrivenData, but if not you can share it as you wish. So after the phase 2 results are out (they said around late November), I’m planning to release a GitHub repo with a nice README explaining what I did and why, along with my Jupyter notebook. I’ll definitely add it to the “Share my work” section on here, and I’ll try to remember to post it here too. Excited to read what other people did too.
I wasn't top-ranked (16th place), but my approach was simple. Briefly:
- append the reverse complement to each sequence (GAT -> GAT + NNN + ATC)
- TF-IDF (custom n-grams)
- truncated SVD (550 components)
- MLPClassifier (1 hidden layer, 800 units)
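For anyone unfamiliar with the first step, here is a minimal sketch of appending the reverse complement, matching the GAT example above. The function names and the handling of `N` (mapped to itself) are my own assumptions, not necessarily what the poster did.

```python
# Complement table for DNA bases; N (unknown base) is assumed to map to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def append_reverse_complement(seq: str, spacer: str = "NNN") -> str:
    """Return seq + spacer + reverse complement of seq."""
    seq = seq.upper()
    return seq + spacer + reverse_complement(seq)


print(append_reverse_complement("GAT"))  # GATNNNATC
```

The `NNN` spacer presumably stops n-grams from spanning the join between the two strands, although the post does not say so explicitly.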
Main estimator: a soft-voting ensemble of 3 base estimators, each using a different window for the TF-IDF n-gram generator (set through the custom analyzer parameter):
[0, 1, 2, 3, 4, 5, 7], [0, 1, 2, 3, 4, 7], [0, 1, 2, 3, 4, 8]
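The post doesn't say exactly how these windows are used, so the following is only a rough sketch under one plausible reading: each list is a set of character offsets inside a sliding window, and the characters at those offsets form one "gapped" n-gram token. The helper `gapped_kmers`, the `build_ensemble` function, and the use of scikit-learn's `TfidfVectorizer`, `TruncatedSVD`, `MLPClassifier`, and `VotingClassifier` are assumptions made for illustration. All other hyperparameters are left at library defaults.

```python
from functools import partial


def gapped_kmers(seq, offsets):
    """Slide over seq and emit one token per position, built from the
    characters at the given offsets (an assumed reading of the windows)."""
    span = max(offsets) + 1
    return ["".join(seq[i + o] for o in offsets)
            for i in range(len(seq) - span + 1)]


# The three windows listed in the post.
WINDOWS = [
    (0, 1, 2, 3, 4, 5, 7),
    (0, 1, 2, 3, 4, 7),
    (0, 1, 2, 3, 4, 8),
]


def build_ensemble(windows=WINDOWS, n_components=550, hidden_units=800):
    """Soft-voting ensemble with one TF-IDF -> SVD -> MLP pipeline per window."""
    # Imported here so gapped_kmers stays usable without scikit-learn installed.
    from sklearn.decomposition import TruncatedSVD
    from sklearn.ensemble import VotingClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline

    members = []
    for idx, offsets in enumerate(windows):
        pipe = make_pipeline(
            # A callable analyzer replaces the built-in tokenisation entirely.
            TfidfVectorizer(analyzer=partial(gapped_kmers, offsets=offsets)),
            TruncatedSVD(n_components=n_components),
            MLPClassifier(hidden_layer_sizes=(hidden_units,)),
        )
        members.append((f"window_{idx}", pipe))
    # Soft voting averages the predicted class probabilities of the members.
    return VotingClassifier(estimators=members, voting="soft")
```

Usage would look like `build_ensemble().fit(sequences, labels)`, where `sequences` is a list of strings that already have the reverse complement appended and `labels` holds the target classes.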