Bank name in transaction is not found

I got an exception from the official runtime:


Generally, in our FL solution, each bank client sends the swift client the list of banks it holds, and the swift client then builds a dict mapping bank name to client ID. However, I got the above exception, which says that a bank name that exists in the transaction data was not found in any of the lists received from the bank clients.
Maybe there's something wrong with my implementation. I wonder whether I am missing something, or whether I misunderstand how the data partitioning algorithm works.
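
As a rough illustration of the mapping described above, here is a minimal sketch. All names in it (`build_bank_to_client`, the client IDs, and the bank names) are hypothetical and are not taken from the actual submission or the challenge runtime.

```python
from typing import Dict, Iterable, Mapping


def build_bank_to_client(bank_lists: Mapping[str, Iterable[str]]) -> Dict[str, str]:
    """Invert {client_id: [bank names]} into {bank name: client_id}."""
    bank_to_client: Dict[str, str] = {}
    for client_id, banks in bank_lists.items():
        for bank in banks:
            bank_to_client[bank] = client_id
    return bank_to_client


# Made-up values standing in for the lists received from the bank clients.
bank_lists = {
    "bank01": ["AAAAXXXX", "BBBBXXXX"],
    "bank02": ["CCCCXXXX"],
}
bank_to_client = build_bank_to_client(bank_lists)
```

With a plain `bank_to_client[name]` lookup, any bank name that appears in the transactions but in none of the received lists raises a `KeyError`, which matches the kind of failure described here.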

Hi @panheng,

I believe this is a bug with your implementation. I have checked and the bank in your error is indeed present in the partitioned datasets.

I am not so sure about that. I added one line of code inside the train_client_factory function to log whether the bank GVNMTRIS is in a bank client's dataset, as shown in the following figure. The log of a smoke test on the fincrime track shows that, in scenario 1 with 2 bank clients, neither of them contains the bank named GVNMTRIS.

I think the data selection for the smoke test may be causing the problem. The log file from a normal test shows that no banks are missing, but the log file from a smoke test shows that some are.

Hi @panheng,

I looked at this more closely, and yes, that is correct. Due to the downsampling for creating the smoke test data, a small number of banks present in the transactions are not present in the bank datasets. (This also applies to the centralized smoke test data for the Financial Crime track.)

This is not the case in general for the full datasets—either the evaluation dataset or the development dataset that you have download access to.

Apologies for this difference between the smoke test data and the evaluation data. You may want to have some logic to handle this defensively.
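
One possible way to handle this defensively is sketched below. It assumes a bank-name-to-client-ID dict like the one described earlier in the thread; the function name `route_transactions`, the transaction tuples, and all values are hypothetical. Transactions whose bank is unknown are set aside instead of raising, and only a count is logged.

```python
import logging
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

logger = logging.getLogger(__name__)


def route_transactions(
    transactions: Iterable[Tuple[str, str]],
    bank_to_client: Mapping[str, str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group transaction IDs by owning client; collect those with unknown banks.

    `transactions` is an iterable of (transaction_id, bank_name) pairs.
    """
    routed: Dict[str, List[str]] = defaultdict(list)
    unmatched: List[str] = []
    for txn_id, bank in transactions:
        client_id = bank_to_client.get(bank)  # None instead of KeyError
        if client_id is None:
            unmatched.append(txn_id)
        else:
            routed[client_id].append(txn_id)
    if unmatched:
        # Log only an aggregate count, not the raw bank names or rows.
        logger.warning("%d transactions reference an unknown bank", len(unmatched))
    return dict(routed), unmatched


# Made-up values for illustration.
bank_to_client = {"AAAAXXXX": "bank01", "CCCCXXXX": "bank02"}
transactions = [("t1", "AAAAXXXX"), ("t2", "ZZZZXXXX"), ("t3", "CCCCXXXX")]
routed, unmatched = route_transactions(transactions, bank_to_client)
```

What to do with the unmatched transactions (for example, falling back to features that do not need bank-side data) depends on the model and is left open here.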

Also, please exercise care when logging information about the data, especially with regard to the evaluation dataset. In general, printing out information about held-out data is disallowed. Given the complex nature of the task for this challenge and the value in helping debug issues, we will not be completely strict about this for small amounts of diagnostics for debugging purposes. Please do not print out a significant amount of raw data, or print out substantial analyses of the data.