Hi, it does not appear that we can use multiprocessing in our final submissions; perhaps you could shed some light on this. For example, the Pandemic Centralized Code Submission states:
"
Your submission should be a zip archive named with the extension .zip
(e.g., submission.zip
). The root level of the archive must contain a solution_centralized.py
module that contains the following named functions:
-
fit
: A function that fits your model on the training data and writes your model to disk -
predict
: A function that loads your model and performs inference on the test data
When you make a submission, this will kick off a containerized evaluation job. This job will run a Python main_centralized_train.py
script which will import your fit
function and call it with the appropriate training data access and filesystem access. Then, the job will run a Python main_centralized_test.py
script which will import your predict
function with the appropriate test data access and filesystem access.
"
Because our solution_centralized.py script is imported rather than executed, I don't believe we can initiate multiprocessing, as the parent process needs to be started from an entry point guarded by if __name__ == "__main__":. In your baseline solution you call a shell script (e.g., logistic_regression.sh) that executes several scripts from the command line and also makes use of storage to hold processed training and test data. It appears we will not have access to executing scripts directly from a command line, or to storage (other than for the model)? Am I interpreting this correctly? Multiprocessing and storage can greatly increase the speed of both the fit and predict functions.
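For concreteness, here is a minimal sketch of the pattern I mean (the function names and the guard placement are my own illustration, not from the submission spec). The Python docs recommend guarding pool creation with if __name__ == "__main__": because, on platforms that use the spawn start method, each worker re-imports the main module:

```python
import multiprocessing as mp

def _square(x):
    # Worker function; must be importable by child processes.
    return x * x

def fit(data):
    # Hypothetical fit that parallelizes preprocessing across workers.
    # If this module is imported (as the evaluation harness does) rather
    # than run as __main__, whether this is safe depends on the start
    # method the container uses (fork vs. spawn).
    with mp.Pool(processes=2) as pool:
        return pool.map(_square, data)

if __name__ == "__main__":
    # Standard entry-point guard for multiprocessing scripts.
    print(fit([1, 2, 3]))
```

Since the harness imports fit rather than running our script directly, it is unclear whether this pattern is supported in the evaluation container.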