Code submission - is it realistic that everyone has the same dependencies?

matthew.shipton · October 14, 2022, 2:12pm

Hi All,

We’re excited to start phase 2!

It doesn’t seem realistic to align all teams across the competition on a single Docker image containing exactly the same set of dependencies. It’s going to become increasingly complex and cause a lot of strife - particularly where we’re trying to use elements from our existing solution inside the container. Is there a particular goal here that might be better achieved in a different way?

In particular, we’re likely to submit using a patched version of PyTorch and Java libraries (Spark), which won’t work with a universal image shared by everyone.

There are a few ways I can think of to resolve this:

i) We submit “vendored” dependencies inside the zip, e.g. a full conda environment with all required dependencies. This might end up making the code packages large, e.g. 1 GB+ - is there a size limit?
ii) We add a layer on top of your Docker container with our dependencies and submit that instead, e.g. FROM drivendata/pets-prize:gpu-local, otherwise sticking to the prescribed API.
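
For the vendoring option, here is a minimal sketch of how a submission might pick up packages shipped inside the zip. The function name add_vendored_path and the directory name "vendor" are assumptions for illustration, not part of the competition API. It only covers pure-Python packages; compiled ones (a patched PyTorch, for example) would also need to be built for the container's platform.

```python
import sys
from pathlib import Path


def add_vendored_path(submission_root, vendor_dirname="vendor"):
    """Put a vendored package directory at the front of sys.path.

    Returns the directory as a string if it exists, otherwise None.
    Both the function and the "vendor" directory name are hypothetical.
    """
    vendor_dir = Path(submission_root).resolve() / vendor_dirname
    if not vendor_dir.is_dir():
        return None
    vendor_str = str(vendor_dir)
    # Prepend so vendored packages win over same-named ones in the base image.
    if vendor_str not in sys.path:
        sys.path.insert(0, vendor_str)
    return vendor_str
```

Calling add_vendored_path(Path(__file__).parent) at the top of the submission's entry point, before any other imports, would make the vendored copies take precedence.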

matthew.shipton · October 14, 2022, 6:00pm

Our other thought was that if there are 100+ participants and multiple PRs happening:

i) Packages frequently conflict with each other. Adding a package might force another package it depends on back a few versions to meet all the dependency requirements, which breaks someone’s code if they’re depending on something in the newer version being there.
ii) As a consequence, all participants are going to have to test that their code still works every time a PR is merged.
iii) There’s no guarantee that two completely separate teams will both be able to build their solutions if this happens - one will have a working build with its PR, and the other’s will stop working once that PR is merged. The only way to fix this would be to go upstream and fix the transitive dependency problem there. Eeek.
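
To make the failure mode in i) to iii) concrete, here is a toy sketch. The package "libx", its versions, and the three teams' constraints are all made up, and real resolvers (pip, conda) are far more involved than this range intersection.

```python
def compatible_versions(available, constraints):
    """Return versions of one package that satisfy every constraint.

    available:   list of version tuples, e.g. [(1, 0), (1, 1), (2, 0)]
    constraints: list of (minimum, maximum) tuples, both inclusive,
                 one per package that depends on this one
    """
    return [
        v for v in sorted(available)
        if all(low <= v <= high for low, high in constraints)
    ]


# Hypothetical package "libx" with three released versions.
libx_versions = [(1, 0), (1, 1), (2, 0)]

# Team A's dependency needs libx >= 1.1; the newest match is 2.0.
team_a = ((1, 1), (2, 0))
before_pr = compatible_versions(libx_versions, [team_a])

# Team B's PR adds a package that needs libx <= 1.1, forcing a downgrade.
team_b = ((1, 0), (1, 1))
after_pr = compatible_versions(libx_versions, [team_a, team_b])

# Team C's PR needs exactly libx 2.0: no version satisfies everyone.
team_c = ((2, 0), (2, 0))
after_second_pr = compatible_versions(libx_versions, [team_a, team_b, team_c])

print(before_pr[-1])    # (2, 0)
print(after_pr[-1])     # (1, 1)
print(after_second_pr)  # []
```

The empty list at the end is the case where the shared image cannot be built for all three teams at once.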

I’m just really concerned this becomes a recipe for intractable python dependency hell - and I’m more than happy to help with testing any solution, including submitting a PR to package conda environments or similar.

isms · October 17, 2022, 1:35pm

Hi @matthew.shipton, thanks for posting!

Despite the potential concerns, this hasn’t turned out to be a problem in past code execution challenges with similar setups. Between version pinning by maintainers (us) and the ability to selectively vendorize within a submission, I think you should be able to achieve your goals and planned setup without much trouble.

Of course you’re always free to reach out if you’re having issues, and as you rightly pointed out we have the ability to tweak the setup or loosen some constraints if necessary. Well-reasoned and thoroughly worked-out GitHub issues and PRs will be evaluated on their technical merits, so we can certainly revisit, but for now it probably makes sense to give it a shot and see if you hit any snags.
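
As a rough illustration of relying on version pinning, a submission could check at startup that the runtime matches the versions it was tested against. This is only a sketch: find_pin_mismatches is a hypothetical helper, and the package names and versions in the usage note are placeholders, not the actual pins in the runtime image.

```python
from importlib import metadata


def find_pin_mismatches(pins, get_version=metadata.version):
    """Compare pinned versions against what is installed.

    pins: dict mapping distribution name -> expected version string.
    Returns a dict of name -> (expected, installed), where installed is
    None if the distribution is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

For example, find_pin_mismatches({"numpy": "1.23.3"}) returns an empty dict when the installed numpy matches, and {"numpy": ("1.23.3", <installed version or None>)} otherwise, which is a cheap way to notice that a merged PR changed something you depend on.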
