Track B: execution failed for federated without messages

Hi @jayqi!

I’m observing a weird failure for my federated smoke test (submitted at 2023-01-23 20:23:47 UTC).

My code ran successfully up to the testing stage of the 3rd scenario, and then it failed abruptly without any error messages. I’m not too sure it’s a memory issue, given that:

  • it has successfully finished the first 2 scenarios for all clients
  • the most memory intensive data preprocessing stage finished successfully
  • the same model training/inference code can finish for the central evaluation runtime

Could you please give some pointers as to why the submission failed? Could it be due to the time limit or the memory limit? Thanks!

Also, if it is indeed a time/memory issue, would you please consider increasing the limit for the runtime? From the public leaderboard it seems that not a lot of teams have successfully made a submission to the federated runtime.

Thanks!

Hi @kzliu,

For your federated smoke test submitted at 2023-01-23 20:23:47 UTC, it was likely killed because you hit the time limit of 9 hours. This is consistent with the final log message being at 2023-01-24 05:22:28 UTC.
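As a quick sanity check on those two timestamps, here is a minimal sketch that computes the gap between them. The helper name `elapsed_hours` is just for illustration, and it measures from the submission timestamp, which may be slightly earlier than the moment the job actually started running.

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"

def elapsed_hours(start: str, end: str) -> float:
    """Hours between two UTC timestamps given in FMT (hypothetical helper)."""
    t0 = datetime.strptime(start, FMT).replace(tzinfo=timezone.utc)
    t1 = datetime.strptime(end, FMT).replace(tzinfo=timezone.utc)
    return (t1 - t0).total_seconds() / 3600

# Submission timestamp and final log message quoted in this thread.
hours = elapsed_hours("2023-01-23 20:23:47", "2023-01-24 05:22:28")
print(f"{hours:.2f} h between submission and last log line")
```

The gap comes out just under 9 hours, which is what you would expect if the job was stopped at the limit shortly after its last log line.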

The smoke test uses approximately 30% as much data as a normal submission, and its time limit is set to half of the normal limit. (A normal evaluation submission for Track B has a time limit of 18 hours.)
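To get a rough feel for what those ratios imply, here is a back-of-the-envelope sketch. It assumes runtime scales linearly with the amount of data, which is only an assumption for illustration (fixed startup costs and per-scenario overhead will make the real relationship different), and the function names are made up for this example.

```python
SMOKE_DATA_FRACTION = 0.30  # smoke test uses ~30% of the full data
SMOKE_LIMIT_H = 9.0         # smoke test time limit, in hours
FULL_LIMIT_H = 18.0         # normal Track B time limit, in hours

def projected_full_hours(smoke_hours: float,
                         data_fraction: float = SMOKE_DATA_FRACTION) -> float:
    """Projected full-run time, ASSUMING runtime is linear in data size."""
    return smoke_hours / data_fraction

def max_smoke_hours(full_limit: float = FULL_LIMIT_H,
                    data_fraction: float = SMOKE_DATA_FRACTION) -> float:
    """Smoke-test runtime that would project to exactly the full limit."""
    return full_limit * data_fraction

# A smoke test that runs into the 9 h limit projects well past 18 h.
print(f"projected full run: {projected_full_hours(SMOKE_LIMIT_H):.1f} h")
print(f"smoke-test target:  {max_smoke_hours():.1f} h")
```

Under that linear assumption, a smoke test that uses its whole 9 hours would project to a full run well over the 18 hour limit, so finishing the smoke test comfortably inside its limit is a reasonable target.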

Given that you ran out of time on the smoke test, you should consider making optimizations or adjusting hyperparameters to speed things up.
