Hi,
May I know if the MessageId is a unique identifier of the transactions in the final evaluation sets? Is UETR unique too? Additionally, will the train and test transactions have overlapping UETR and/or MessageId? Thank you!
Hi @yizhewan,
Yes, both MessageId and UETR are unique identifiers, as described here.
There should not be overlap in values between the train and test splits.
Thank you. I’m asking because my code concatenates the train and test datasets at test time, with UETR as the index column. It passes the smoke test as well as my own test using the published train/test sets. However, it raises
ValueError: cannot reindex on an axis with duplicate labels
when I do a groupby operation.
Since I can’t reproduce the error on the smoke test or on my local test data, I suspect there may be duplicated UETR or MessageId values in the final train/test datasets. Thank you for your response.
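For anyone hitting the same error, here is a minimal sketch of how overlapping keys could be detected before the concatenated frame is indexed. It assumes pandas and uses a made-up `Amount` column and toy values. The helper names `find_overlap` and `concat_unique` are hypothetical, and keeping the first occurrence of a repeated UETR is only one possible policy, not something the organizers prescribe.

```python
import pandas as pd


def find_overlap(train: pd.DataFrame, test: pd.DataFrame, key: str = "UETR") -> list:
    """Return the sorted key values that appear in both frames."""
    return sorted(set(train[key]) & set(test[key]))


def concat_unique(train: pd.DataFrame, test: pd.DataFrame, key: str = "UETR") -> pd.DataFrame:
    """Concatenate train and test, then index by `key` with a uniqueness guarantee.

    Assumption: when a key is repeated, the first occurrence (the train row) is kept.
    """
    combined = pd.concat([train, test], ignore_index=True)
    combined = combined.drop_duplicates(subset=key, keep="first")
    # verify_integrity=True raises ValueError if duplicates somehow remain.
    return combined.set_index(key, verify_integrity=True)


# Toy data (hypothetical values): "c" appears in both splits.
train = pd.DataFrame({"UETR": ["a", "b", "c"], "Amount": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"UETR": ["c", "d"], "Amount": [3.0, 4.0]})

overlap = find_overlap(train, test)
combined = concat_unique(train, test)
```

Logging `overlap` at test time would make a data problem like this visible right away, instead of surfacing later as a reindexing error inside a groupby.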
Hi @yizhewan,
Thank you for bringing this to our attention. There was indeed an error in the evaluation data: some UETR values were present in both the train and test splits. This has been fixed, so your submissions going forward should not have this issue.
Teams that have made successful submissions for Track A before this fix have had their submissions cleared and are being asked to resubmit. The previously erroneous data should have no impact on final results.
Apologies for the mistake, and thanks again for reporting this.