Our recent submission has been pending for more than 8 hours, even though there was only one submission ahead of us in the queue at the time of submission. Is this normal? Thanks.
There was a long delay between when the previous (now running) submission was submitted and when its execution was started on the cluster (likely due to Azure), so the runtime isn’t over 8 hours. Yours is next!
Thanks Emily!
We would also like to bring to your attention a matter regarding the execution speed of our recent submission, and we would appreciate your assistance in clarifying this issue.
Specifically, we have observed that our latest submission (dated 2023-05-02 16:53:36 UTC) experienced significantly slower inference than our previous submission (dated 2023-04-27 06:07:21 UTC). In particular, the inference time required was almost double that of our previous submission. This discrepancy left us unable to complete predictions for all test cases within the allotted time frame. By the way, the two submissions are completely identical; the second was submitted purely as a test.
We are wondering if there has been any change to the running environment or hardware settings of the clusters. For example, have multiple containers been running simultaneously recently on the cluster and sharing the hard disk/GPU/CPU during the latest submissions? We have been working diligently to optimize the inference speed by fully utilizing the entire GPU/CPU in the past few weeks, and we hope that the running environment can be kept consistent throughout the challenge. Thank you.
My submission has been queued for nearly a day and is still pending. I would like to make two suggestions:
- Set a cooldown period (e.g., 8 hours) for timed-out submissions. People often make minor adjustments and resubmit their timed-out submissions, which increases queue load and further delays other people's submissions.
- Add more workers to help process submissions faster.
The program runs for a maximum of 8 hours and is forced to close when the time limit is exceeded. Our program had previously been tested to stay within that limit, but recently, even though we submitted two identical copies of the code, the prediction over 450 samples that originally took only 4.3 hours now takes 7-8 hours. How can two identical copies of the code show such a big time difference?
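One way to narrow down where the time goes on a re-run is to log per-sample wall time, so you can see whether the slowdown is uniform (suggesting slower hardware overall) or concentrated in a few samples (suggesting contention spikes). A minimal sketch; `predict_fn` stands in for your own inference call:

```python
import time

def timed_predict(samples, predict_fn):
    """Run predict_fn over samples, recording per-sample wall time.

    `predict_fn` is a placeholder for your own inference function.
    Returns the predictions and the list of per-sample durations.
    """
    durations = []
    results = []
    for sample in samples:
        start = time.perf_counter()
        results.append(predict_fn(sample))
        durations.append(time.perf_counter() - start)

    total = sum(durations)
    # Summary line: compare mean vs. max to spot contention spikes.
    print(f"{len(samples)} samples in {total:.1f}s "
          f"(mean {total / len(samples):.3f}s, max {max(durations):.3f}s)")
    return results, durations
```

Writing the durations to a log file inside the container would let the organizers compare the timing profiles of the two identical submissions directly.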
Since the host indicated that our submissions run on Azure, the hardware is virtualized. CPUs and RAM are generally virtualized, and their performance varies under different loads. GPUs may be dedicated via passthrough or virtualized (the K80 is a datacenter card that supports virtualization). Which difference shows up in inference depends on your bottleneck. The worst case may be when your code does heavy disk loading and the VM is backed by an HDD. This is just my speculation; only the platform can identify the real issue.
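A quick way to test this hypothesis inside the container is to benchmark disk read throughput and raw compute separately; if the disk number varies a lot between runs while the compute number stays stable, the bottleneck is likely storage contention. A minimal sketch (the file path is a placeholder; point it at one of your own data files):

```python
import time
import numpy as np

def disk_read_throughput(path, chunk_mb=64):
    """Measure sequential read throughput of a file, in MB/s.

    `path` is a placeholder; use one of your own large data files.
    """
    chunk = chunk_mb * 1024 * 1024
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            total_bytes += len(data)
    elapsed = time.perf_counter() - start
    return total_bytes / (1024 * 1024) / elapsed

def cpu_compute_rate(n=2048, repeats=5):
    """Time a fixed float32 matrix multiply as a rough CPU benchmark (GFLOP/s)."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    elapsed = time.perf_counter() - start
    flops = 2 * n ** 3 * repeats  # multiply-adds in an n x n matmul
    return flops / elapsed / 1e9
```

Running both probes at the start of each submission and printing the numbers would give the organizers concrete evidence of whether the environment changed between runs.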
Our program does rely on a lot of disk reads, though not on writes, so it would be very bad if other submissions were competing for read bandwidth at the same time.
We are also concerned about how long submissions spend in the queue.
Submissions are not sharing hard disk, GPU, or CPU. Each is run on a separate VM.
We just increased the number of nodes to 4 (from 1) so submissions will spend much less time waiting in the queue. Thanks for the great engagement with this competition!