We’re excited to start phase 2!
It doesn’t seem realistic to align all teams across the competition on a single Docker image containing exactly the same set of dependencies. It’s going to become increasingly complex and result in a lot of strife - particularly where we’re trying to use elements from our existing solution inside the container. Is there a particular goal here that might be better achieved in a different way?
In particular, we’re likely to submit using a patched version of PyTorch and Java libraries (Spark), which won’t work with a universal image shared by everyone.
There are a few ways I can think of to resolve this:
- We submit “vendored” dependencies inside the zip, e.g. a full conda environment with all required dependencies. This might end up making the code packages large, e.g. 1 GB+ - is there a size limit?
- We add a layer on top of your Docker container with our dependencies and submit that instead, e.g. a Dockerfile starting with FROM drivendata/pets-prize:gpu-local, otherwise sticking to the prescribed API.
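As a rough sketch of the first option (vendoring), assuming the submission zip is unpacked with a directory of pure-Python or image-compatible packages next to the entry point: the directory name `vendor`, and both helper names, are hypothetical and not part of the competition's API. This only covers packages importable via `sys.path`; a full conda environment with its own interpreter, or a Java library, would need more than this.

```python
import sys
from pathlib import Path


def add_vendored_path(vendor_dir: Path) -> bool:
    """Put vendor_dir at the front of sys.path if it exists.

    Returns True if the directory is on sys.path afterwards, False if the
    directory does not exist.
    """
    vendor_dir = vendor_dir.resolve()
    if not vendor_dir.is_dir():
        return False
    entry = str(vendor_dir)
    if entry not in sys.path:
        # Front of the list, so vendored packages shadow the image's copies
        # for any module that has not been imported yet.
        sys.path.insert(0, entry)
    return True


def directory_size_bytes(root: Path) -> int:
    """Total size of all regular files under root (before compression)."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Hypothetical layout: <submission>/main.py and <submission>/vendor/
    vendor = Path(__file__).parent / "vendor"
    if add_vendored_path(vendor):
        size_mb = directory_size_bytes(vendor) / 1e6
        print(f"vendored packages: {size_mb:.1f} MB uncompressed")
```

The call has to run before the first import of any vendored package, since modules that are already imported are not reloaded from the new path.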
Our other thought was what happens if there are 100+ participants and multiple PRs in flight:
i) Packages frequently conflict with each other. Adding a package might mean another, dependent package goes back a few versions to meet all the dependency requirements, which breaks someone’s code if they’re depending on something being there.
ii) As a consequence, all participants are going to have to test that their code still works every time a PR is merged.
iii) There’s no guarantee that two completely separate teams will both be able to build their solutions if this happens - one team will have a working build with the PR, while the other team’s will break once the PR is merged. The only way to fix this would be to go upstream and fix the transitive dependency problem there. Eeek.
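To make the scenario in i) to iii) concrete, here is a minimal sketch that compares each team's exact pins against the shared image's pins before and after a merged PR. Everything in it is an assumption for illustration: the function names, the team names, the package `somepkg`, and the version numbers. Real resolvers deal with version ranges and transitive dependencies, which this deliberately ignores.

```python
def find_pin_conflicts(image_pins, team_pins):
    """Return (package, image_version, team_version) for each mismatch.

    Packages absent from the image are not reported: they need adding,
    but they do not conflict with anything yet.
    """
    conflicts = []
    for package, wanted in sorted(team_pins.items()):
        installed = image_pins.get(package)
        if installed is not None and installed != wanted:
            conflicts.append((package, installed, wanted))
    return conflicts


def teams_broken_by(image_pins, teams):
    """Map each team name to its conflicts against the given image pins."""
    report = {}
    for team, pins in teams.items():
        conflicts = find_pin_conflicts(image_pins, pins)
        if conflicts:
            report[team] = conflicts
    return report


# Illustrative data only: team_b's PR adds somepkg, and resolving it
# pushes numpy back to an older release that team_a does not expect.
image_before = {"numpy": "1.23.5", "scipy": "1.9.3"}
image_after = {"numpy": "1.21.6", "scipy": "1.9.3", "somepkg": "0.4.0"}
teams = {
    "team_a": {"numpy": "1.23.5"},
    "team_b": {"somepkg": "0.4.0"},
}

print(teams_broken_by(image_before, teams))
print(teams_broken_by(image_after, teams))
```

Run against the image before the merge, no team is reported; after the merge, only team_a is, which is the situation described in iii).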
I’m just really concerned this becomes a recipe for intractable Python dependency hell - and I’m more than happy to help with testing any solution, including submitting a PR to package conda environments or similar.
Hi @matthew.shipton, thanks for posting!
Despite the potential concerns, this hasn’t turned out to be a problem in past code execution challenges with similar setups. Between version pinning by maintainers (us) and the ability to selectively vendorize within a submission, I think you should be able to achieve your goals and planned setup without much trouble.
Of course, you’re always free to reach out if you’re having issues, and as you rightly pointed out, we have the ability to tweak the setup or loosen some constraints if necessary. Well-reasoned and thoroughly worked-out GitHub issues and PRs will be evaluated on their technical merits, so we can certainly revisit this, but for now it probably makes sense to give it a shot and see if you hit any snags.