Developing Own Submission

Hi,
I’m trying to test the development of my own submission using the steps outlined in the runtime repo. I’m struggling with a few items and perhaps you could clarify them for me:

  1. It seems there needs to be a directory named submission in order for it to create the submission.zip file. ‘SUBMISSION_TRACK=fincrime make pack-submission’ only worked once I manually added the submission directory.

  2. I’ve edited my .yml file, but
    ‘SUBMISSION_TRACK=fincrime SUBMISSION_TYPE=centralized make test-submission’
    does not seem to use my local .yml file, so none of my packages are being loaded into the environment. Do I need to wait until the Pull Request is merged? Can I test this locally?

  3. I’m unclear about the pred_format.csv file. I don’t see it in the repo. Are we supposed to add it, and if so, where?

Thanks!

Hi @jimking100,

  1. That is correct. Thanks for reporting—we will fix the make recipes to create that directory if it doesn’t already exist.
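
    For now, the workaround is to create it yourself before packing, e.g. from the repository root:

    ❯ mkdir -p submission
    ❯ SUBMISSION_TRACK=fincrime make pack-submission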

  2. The dependencies are installed from environment-*.yml at image build time, and not at container run time. If you would like to test your submission locally with local dependency changes, you can build the image yourself and then use the local image.

    • The make build command will build an image from your local copies of environment-*.yml. You can optionally specify CPU_OR_GPU=cpu or CPU_OR_GPU=gpu explicitly.
    • By default, make test-submission will use a local image if one exists. You can explicitly override which image to use (if you also have the official image pulled) with the SUBMISSION_IMAGE variable; see the example commands below.
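
  For example, a typical local test with modified dependencies could look like the following (the image tag in the last command is a placeholder for whatever your local build is called):

  ❯ CPU_OR_GPU=cpu make build
  ❯ SUBMISSION_TRACK=fincrime SUBMISSION_TYPE=centralized make test-submission

  or, pinning a specific image explicitly:

  ❯ SUBMISSION_IMAGE=<your-local-image> SUBMISSION_TRACK=fincrime SUBMISSION_TYPE=centralized make test-submission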
  3. We don’t currently have the predictions format files available for download, but we will add them. I will update here when those are available. In the meantime, you can see the documented specifications in the challenge documentation and create your own.


Hi,

Thanks! Can you tell me where the predictions format files will be located in the repo? Also, how will the current Centralized Submission Testing work with the .yml files, since the Pull Requests have yet to be merged?

Jim


The predictions format files are always named predictions_format.csv and will be located in the test/ directory within the centralized case, or within each test/{partition_name}/ directory for the federated case.

Here is a file tree that should be instructive for how to set things up locally in the runtime repository:

❯ tree data/ -F
data/
├── fincrime/
│   ├── centralized/
│   │   ├── test/
│   │   │   ├── bank_dataset.csv
│   │   │   ├── data.json
│   │   │   ├── predictions_format.csv
│   │   │   └── swift_transaction_test_dataset.csv
│   │   └── train/
│   │       ├── bank_dataset.csv
│   │       ├── data.json
│   │       └── swift_transaction_train_dataset.csv
│   ├── scenario01/
│   │   ├── test/
│   │   │   ├── bank01/
│   │   │   │   └── bank_dataset.csv
│   │   │   ├── bank02/
│   │   │   │   └── bank_dataset.csv
│   │   │   ├── partitions.json
│   │   │   └── swift/
│   │   │       ├── predictions_format.csv
│   │   │       └── swift_transaction_test_dataset.csv
│   │   └── train/
│   │       ├── bank01/
│   │       │   └── bank_dataset.csv
│   │       ├── bank02/
│   │       │   └── bank_dataset.csv
│   │       ├── partitions.json
│   │       └── swift/
│   │           └── swift_transaction_train_dataset.csv
│   └── scenarios.txt
└── pandemic/
    ├── centralized/
    │   ├── test/
    │   │   ├── data.json
    │   │   ├── predictions_format.csv
    │   │   ├── va_activity_location_assignment.csv.gz
    │   │   ├── va_activity_locations.csv.gz
    │   │   ├── va_disease_outcome_training.csv.gz
    │   │   ├── va_household.csv.gz
    │   │   ├── va_person.csv.gz
    │   │   ├── va_population_network.csv.gz
    │   │   └── va_residence_locations.csv.gz
    │   └── train/
    │       ├── data.json
    │       ├── predictions_format.csv
    │       ├── va_activity_location_assignment.csv.gz
    │       ├── va_activity_locations.csv.gz
    │       ├── va_disease_outcome_training.csv.gz
    │       ├── va_household.csv.gz
    │       ├── va_person.csv.gz
    │       ├── va_population_network.csv.gz
    │       └── va_residence_locations.csv.gz
    ├── scenario01/
    │   ├── test/
    │   │   ├── client01/
    │   │   │   ├── predictions_format.csv
    │   │   │   ├── va_activity_location_assignment.csv.gz
    │   │   │   ├── va_activity_locations.csv.gz
    │   │   │   ├── va_disease_outcome_training.csv.gz
    │   │   │   ├── va_household.csv.gz
    │   │   │   ├── va_person.csv.gz
    │   │   │   ├── va_population_network.csv.gz
    │   │   │   └── va_residence_locations.csv.gz
    │   │   ├── client02/
    │   │   │   ├── predictions_format.csv
    │   │   │   ├── va_activity_location_assignment.csv.gz
    │   │   │   ├── va_activity_locations.csv.gz
    │   │   │   ├── va_disease_outcome_training.csv.gz
    │   │   │   ├── va_household.csv.gz
    │   │   │   ├── va_person.csv.gz
    │   │   │   ├── va_population_network.csv.gz
    │   │   │   └── va_residence_locations.csv.gz
    │   │   ├── client03/
    │   │   │   ├── predictions_format.csv
    │   │   │   ├── va_activity_location_assignment.csv.gz
    │   │   │   ├── va_activity_locations.csv.gz
    │   │   │   ├── va_disease_outcome_training.csv.gz
    │   │   │   ├── va_household.csv.gz
    │   │   │   ├── va_person.csv.gz
    │   │   │   ├── va_population_network.csv.gz
    │   │   │   └── va_residence_locations.csv.gz
    │   │   └── partitions.json
    │   └── train/
    │       ├── client01/
    │       │   ├── predictions_format.csv
    │       │   ├── va_activity_location_assignment.csv.gz
    │       │   ├── va_activity_locations.csv.gz
    │       │   ├── va_disease_outcome_training.csv.gz
    │       │   ├── va_household.csv.gz
    │       │   ├── va_person.csv.gz
    │       │   ├── va_population_network.csv.gz
    │       │   └── va_residence_locations.csv.gz
    │       ├── client02/
    │       │   ├── predictions_format.csv
    │       │   ├── va_activity_location_assignment.csv.gz
    │       │   ├── va_activity_locations.csv.gz
    │       │   ├── va_disease_outcome_training.csv.gz
    │       │   ├── va_household.csv.gz
    │       │   ├── va_person.csv.gz
    │       │   ├── va_population_network.csv.gz
    │       │   └── va_residence_locations.csv.gz
    │       ├── client03/
    │       │   ├── predictions_format.csv
    │       │   ├── va_activity_location_assignment.csv.gz
    │       │   ├── va_activity_locations.csv.gz
    │       │   ├── va_disease_outcome_training.csv.gz
    │       │   ├── va_household.csv.gz
    │       │   ├── va_person.csv.gz
    │       │   ├── va_population_network.csv.gz
    │       │   └── va_residence_locations.csv.gz
    │       └── partitions.json
    └── scenarios.txt

Submissions made through the provided infrastructure will always use the latest official image, so you won’t be able to run those with any dependencies that are still in open pull requests. Thanks for your patience—we’ll try to get your pull request reviewed soon.

Hi,
I have created my local image; it has my local dependencies and it begins to run my code. However, after reading in the swift and bank data (centralized version), it fails with a cryptic ‘/tmp/tmpt3ppoeij: line 3: 99 Killed python main_centralized_train.py’. Basically, any attempt to manipulate the dataframes after the data is loaded causes a failure. The only explanation I can come up with is that it’s a RAM-related issue, since the code runs fine outside the image. Does the image significantly reduce the available RAM, and does this error message provide any insight from your end? I’m running a Mac, M2, 24GB, no GPU.

Hi @jimking100,

It looks like the docker run command was written to provide 8 GB of memory:
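
For reference, the relevant fragment of the docker run recipe in the Makefile looks roughly like this (a sketch; see the Makefile for the exact recipe):

docker run \
    ...
    --shm-size 8g \
    ...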

You can change that to a different number as appropriate for your local test.

EDIT: Fixed repo link.


Thanks for this file tree. Is there a reason the file data/fincrime/centralized/test/data.json doesn’t contain a line for the file predictions_format.csv, since that is a required entry?


Hi @markblunk,

If you take a look at how the evaluation runner is implemented, the predictions format file is separately specified as an input argument when calling provided client factories.

I was asking about the centralized solution, not the federated one.

I got to step 6 in GitHub - drivendataorg/pets-prize-challenge-runtime: Evaluation runtime for Phase 2 of the PETs Prize Challenge and was attempting to test my centralized solution by running SUBMISSION_TRACK=fincrime SUBMISSION_TYPE=centralized make test-submission. This command fails if the file data/fincrime/centralized/test/predictions_format.csv is missing (as you mention above in Developing Own Submission - #4 by jayqi). I didn’t realize this file was required from the instructions in the repository README, because those instructions refer you to the various data.json files, which do not mention predictions_format.csv. Hence my original question:
" Is there a reason the file data/fincrime/centralized/test/data.json doesn’t contain a line for the file predictions_format.csv, since that is a required entry?"

Thanks!

Hi,
The link does not seem to work.

Hi @markblunk,

Apologies for linking to the wrong thing. As you can see, the centralized evaluation code is implemented in exactly the same way:

Thank you for pointing out that the README is wrong in omitting the predictions_format.csv files. We will get that fixed.

@jimking100 Fixed in the original post. Sorry about that!

Hi,
I’ve changed the shm-size to 24g in two places in the Makefile, deleted the local image using Docker Desktop, rebuilt the image, and re-ran the code, but I get the same (or a similar) error. When I rebuild the image, it doesn’t seem to build from scratch but from cache; still, my logs show 24g for the shm-size instead of 8g. Are there settings I need for my local Docker setup? I just installed Docker with the default installation on a Mac.

Aha! I answered my own question: Docker Desktop defaults to 2 GB of memory at installation on a Mac. I upped it to 24 GB and it works now. So you need to both change the Makefile and check the memory settings in Docker Desktop.
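
For anyone else hitting this: you can check how much memory Docker actually has available with

❯ docker info --format '{{.MemTotal}}'

which prints the total memory (in bytes) that Docker can provide to containers.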

Thanks @jayqi
