When I submit code, I find the following in the logs:
< ... WARNING: logs capped at 1,000 lines; dropping 77 more >
Therefore, I cannot see what errors occur after the first 1,000 lines. Could you raise the limit from 1,000 to 10,000? At the moment I cannot figure out the bug. Thanks,
@wenhaowang I’ve updated the limit; let me know if you hit it again and I can increase it further!
Great, many thanks for this.
It seems I have successfully run the code, but the muAP on the subset is about 0.06 (please refer to my latest submission). Is that normal? I am not sure whether this submission is valid for Phase 2.
I would expect them to be comparable if inference is running identically. One thing to check locally is whether, in your local container, the descriptors you generate on a subset (any random subset will do) are identical to those you generate on the full query set. I will investigate whether there’s an issue on our end contributing to the score difference.
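For what it’s worth, here is a minimal sketch of that check, assuming descriptors are saved as `.npz` files with `video_ids` and `features` arrays and that both runs sample frames identically (adjust the keys to whatever your submission format actually uses):

```python
import numpy as np

def descriptors_match(path_a, path_b, atol=1e-6):
    """Check that two descriptor files agree on their shared videos.

    Assumes each .npz archive contains a "video_ids" array and a
    "features" array with one row per sampled frame, and that both
    runs used the same frame sampling per video.
    """
    a, b = np.load(path_a), np.load(path_b)
    # Only compare videos that appear in both files (e.g. subset vs. full run)
    common = np.intersect1d(np.unique(a["video_ids"]), np.unique(b["video_ids"]))
    for vid in common:
        feat_a = a["features"][a["video_ids"] == vid]
        feat_b = b["features"][b["video_ids"] == vid]
        if feat_a.shape != feat_b.shape or not np.allclose(feat_a, feat_b, atol=atol):
            return False
    return True

# Example: compare the subset run against the full-query run
# print(descriptors_match("subset_run.npz", "full_run.npz"))
```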
Thanks. In my local container, I find the descriptors for any subset are the same (regardless of the order of the video frames, because in the descriptor track the time frames are generated randomly). I think the difference in score may come from whether you use the same ground truth for both the full set and the subset. If the full ground truth is used, the muAP will be low for the subset because it contains only about 1/10 of the videos.
I believe we should be subsetting the ground truth to only the portion that contains videos relevant to the query subset, but it’s possible there’s an error in that processing; that’s a good callout.
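To illustrate the kind of filtering being described, here is a rough sketch, assuming the ground truth is a CSV with a `query_id` column and the subset file is a CSV with a `video_id` column (both column names are hypothetical; adjust to the actual schema):

```python
import pandas as pd

def subset_ground_truth(gt_csv, subset_csv, out_csv):
    """Keep only ground-truth pairs whose query video is in the subset.

    Scoring subset predictions against the *full* ground truth would
    count every absent query's positives as misses and deflate the muAP.
    """
    gt = pd.read_csv(gt_csv)
    subset_ids = set(pd.read_csv(subset_csv)["video_id"])
    gt[gt["query_id"].isin(subset_ids)].to_csv(out_csv, index=False)

# subset_ground_truth("ground_truth.csv", "subset.csv", "ground_truth_subset.csv")
```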
This was, in fact, the problem! The subset file on our platform was out of date and has now been corrected. On your next submission, I believe the uAP should be comparable.
Thank you again @wenhaowang for your help in identifying and debugging these issues! I’m very grateful to you for promptly calling them out and communicating frequently with us.
Thanks a lot for your effort.
Thanks, the scores are comparable now.