Persistent model training states under Flower


I have a question regarding maintaining the state of a local model.

Suppose we have a naive learning protocol — SWIFT trains a neural net using SGD and asks the bank for the flag information every batch. In the current harness given under Flower, the client representing SWIFT will no longer exist while the bank client is active. To keep the training procedure going, SWIFT has to save the model state to disk every batch and reload it after listening to the bank. This is computationally prohibitive.
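For concreteness, the save/reload pattern described above might look like the sketch below. This is only an illustration, not the harness API: the checkpoint path, the plain-dict "state", and the pickle round-trip are all assumptions standing in for a real model/optimizer state dict and the harness's client data directory.

```python
import os
import pickle
import tempfile

# Hypothetical checkpoint path; in the real harness this would live in
# the SWIFT client's persistent data directory.
CKPT = os.path.join(tempfile.gettempdir(), "swift_model_state.pkl")

def save_state(state: dict, path: str = CKPT) -> None:
    """Persist the model/optimizer state before the client is torn down."""
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_state(path: str = CKPT):
    """Reload the saved state when the client is instantiated again."""
    if not os.path.exists(path):
        return None  # first invocation: no checkpoint yet
    with open(path, "rb") as f:
        return pickle.load(f)

# One client invocation: restore (or start fresh), do one step, save again.
state = load_state() or {"step": 0, "weights": [0.0, 0.0]}
state["step"] += 1
save_state(state)
```

The cost being discussed in this thread is exactly the `save_state`/`load_state` pair running once per batch.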

Is there a way to make the client more persistent in memory instead of having to save it to disk every time? Please let me know if I’m misunderstanding the harness and the Flower framework. Thank you very much!

Hi @yizhewan,

I believe your understanding is correct. Unfortunately, this is a limitation of how evaluation is instrumented.

If disk read/write overhead is a bottleneck, you may want to consider alternative approaches, such as using larger batch sizes, transmitting multiple batches together, or transmitting all of the bank’s data at once and batching within the SWIFT client.
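The "transmit multiple batches together" suggestion amortizes the checkpoint cost: if the SWIFT client exchanges a chunk of k batches per invocation, the model state is written to disk once per chunk instead of once per batch. A minimal sketch of the chunking (the helper name and k value are illustrative, not part of the harness):

```python
def chunk_batches(batches, k):
    """Group an iterable of batches into chunks of size k (last may be shorter)."""
    chunk = []
    for b in batches:
        chunk.append(b)
        if len(chunk) == k:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# With k = 8, 20 batches become 3 exchanges with the bank, so the
# save/reload round-trip runs 3 times instead of 20.
chunks = list(chunk_batches(range(20), 8))
```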

Please note also that the federated evaluation has a time limit of 9 hours.