Cannot run local eval for FL solutions, fincrime

I ran the following command:

SUBMISSION_TRACK=fincrime SUBMISSION_TYPE=federated GITHUB_ACTIONS_NO_TTY=true make test-submission

The submission I tested was the example source code. After configure_fit completed, it got stuck at "fit_round 1: strategy sampled 5 clients (out of 5)", and after dozens of minutes it threw Ray errors.

How can I fix this? I am using a Windows 10 machine and running the make commands in Git Bash.

Hi @panheng,

Taking the error messages at face value, it seems like Ray is unable to find the requested client resources to spawn the client processes.

Earlier in the logs, at the start of federated training, you should see some logs that look something like this:

INFO flower 2023-01-23 02:56:31,321 | app.py:140 | Starting Flower simulation, config: ServerConfig(num_rounds=2, round_timeout=None)
2023-01-23 02:56:33,139 INFO worker.py:1518 -- Started a local Ray instance.
INFO flower 2023-01-23 02:56:34,302 | app.py:174 | Flower VCE: Ray initialized with resources: {'memory': 3547435008.0, 'node:127.0.0.1': 1.0, 'CPU': 4.0, 'object_store_memory': 1773717504.0}

In your logs, how many CPU and GPU resources does the "Ray initialized with resources" line report on your machine?
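If it helps, here is a minimal Python sketch for pulling the CPU and GPU counts out of that log line. The helper name `parse_ray_resources` is made up for illustration and is not part of Flower or Ray. It assumes the line has the same shape as the example above, and it treats a missing `GPU` key as zero GPUs.

```python
import ast
import re


def parse_ray_resources(log_line: str) -> dict:
    """Extract the resource dict from a 'Ray initialized with resources' log line."""
    match = re.search(r"Ray initialized with resources:\s*(\{.*\})", log_line)
    if match is None:
        raise ValueError("no Ray resource dict found in log line")
    # The logged dict is a plain Python literal, so literal_eval can read it safely.
    return ast.literal_eval(match.group(1))


# The example log line from above, split across string literals for readability.
line = (
    "INFO flower 2023-01-23 02:56:34,302 | app.py:174 | Flower VCE: Ray initialized "
    "with resources: {'memory': 3547435008.0, 'node:127.0.0.1': 1.0, 'CPU': 4.0, "
    "'object_store_memory': 1773717504.0}"
)

resources = parse_ray_resources(line)
cpus = resources.get("CPU", 0.0)
gpus = resources.get("GPU", 0.0)  # no 'GPU' key means Ray did not detect a GPU
print(f"CPU: {cpus}, GPU: {gpus}")
```

For the example line this reports 4 CPUs and no GPU. If your own line also has no `GPU` entry while you are running the GPU image, that would match the symptom you describe.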

For reference, here is the relevant source code that sets the resource requirements for clients. Each client requests (available - 1) CPUs on your system, plus 1 GPU if you are running the GPU version of the image. Is it right that you’ve allocated 16 cores to the Docker container, and that you have a GPU on your system?
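To make that rule concrete, here is a rough Python sketch of the check. It is not the actual runtime code: `requested_client_resources` and `can_schedule` are hypothetical helpers, and the available CPU count is passed in as a plain argument rather than detected.

```python
def requested_client_resources(available_cpus: int, gpu_image: bool) -> dict:
    """Mirror the rule described above: (available - 1) CPUs, plus 1 GPU on the GPU image."""
    return {
        "num_cpus": available_cpus - 1,
        "num_gpus": 1 if gpu_image else 0,
    }


def can_schedule(requested: dict, ray_resources: dict) -> bool:
    """Return True if Ray's reported resources cover a single client's request."""
    return (
        requested["num_cpus"] <= ray_resources.get("CPU", 0)
        and requested["num_gpus"] <= ray_resources.get("GPU", 0)
    )


# Resources as Ray reported them in the example log: 4 CPUs and no GPU entry.
ray_resources = {"CPU": 4.0}

gpu_request = requested_client_resources(available_cpus=4, gpu_image=True)
cpu_request = requested_client_resources(available_cpus=4, gpu_image=False)

print(can_schedule(gpu_request, ray_resources))  # GPU requested, but Ray sees none
print(can_schedule(cpu_request, ray_resources))  # CPU-only request fits
```

With these example numbers the GPU request cannot be placed and the CPU-only request can. An unplaceable request would be consistent with the simulation hanging right after the clients are sampled.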

If you don’t need a GPU for your code, and Ray is having trouble finding your GPU, you can try the CPU version of the image by setting the variable CPU_OR_GPU=cpu when running all Makefile commands (pull, build, test-submission, etc.). For example, prefix your test-submission command with CPU_OR_GPU=cpu alongside the other variables you are already setting.

Unfortunately, I’m not all that familiar with Windows, and I’m not sure if anything here involves any specific Windows gotchas.
