Questions about dataset in Data Track A

Hi,

What is the exact input to the collaboratively trained model? The provided test data “swift_transaction_test_dataset.zip” consists of all elements of the SWIFT dataset. Does this mean that all elements in the SWIFT dataset can be used as model inputs? Can I also use elements from the Account data as (private) input at the same time, e.g. the flag of OrderingAccount and/or BeneficiaryAccount?

Are there elements of the SWIFT dataset that the banks involved in a transaction can access or hold? Of course, unrelated banks should not learn these. For example, can I assume that the “Sender” bank (resp. “Receiver” bank) of a transaction knows the MessageId, the “Receiver” bank (resp. “Sender” bank), Timestamp, Label, or more? If so, I would like to know the precise scope of the transaction elements that the involved banks can hold.

In addition, according to the description of “Scope of sensitive data”, can I assume that the sensitive information in the SWIFT dataset does not include Sender, Receiver, SettlementCurrency, SettlementAmount, InstructedCurrency, InstructedAmount, and Label?

Thanks!

Hi @jikeyu,

At inference time, both the SWIFT transaction dataset (rows belonging to the test set) and the bank account dataset will be available in a similar way as at training time. More details on this will be forthcoming in the “What to Expect in Phase 2” documentation that we will publish in the next few weeks.

For the scope of sensitive data in this challenge, please consider SWIFT’s transaction data table as belonging to SWIFT and sensitive, and the bank data table as belonging to the respective bank partition and sensitive. The situation in real life may be more complex, but this is simplified for the challenge.


These questions are regarding Track A:

  1. Can we assume that the final trained model, as well as the results of inferences made by this model, will only be known to SWIFT? This matters because under this assumption it would be fine for the final model to leak some information about the SWIFT data, since it would only be seen internally by SWIFT. Of course, it shouldn’t leak any information to SWIFT about the banks’ data.

  2. Is it correct that the only additional information to be gained from the banks’ data (Dataset 2) is the “Flags” field? All the other fields in Dataset 2, such as name and address, seem to be already included in the SWIFT data (Dataset 1).


Hi @mdecock,

Apologies for the long delay in answering your questions, and thank you for your patience.

Regarding (1): the challenge organizers are working on formulating a precise answer, and we will follow up here later with a response.

Regarding (2): You are correct. The SWIFT data and the bank data are both sources of notionally the same information for names and addresses. The Flags field is the only field that is available exclusively from the banks.
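
To make the point about the Flags field concrete, here is a minimal sketch of the lookup it implies. It is illustration only: it joins the tables in the clear, which a real solution must not do across parties, and the key names `Account`, `OrderingFlag`, and `BeneficiaryFlag` as well as the helper `attach_flags` are assumptions, not the official schema.

```python
def attach_flags(transactions, bank_accounts):
    """Add the bank-only Flags value for both accounts of each transaction.

    Hypothetical helper: 'Account', 'OrderingFlag' and 'BeneficiaryFlag'
    are assumed names, not the official challenge schema.
    """
    # Map each account identifier to its flag, as held by the bank.
    flags_by_account = {row["Account"]: row["Flags"] for row in bank_accounts}

    enriched = []
    for tx in transactions:
        row = dict(tx)  # copy so the input rows are left untouched
        # Accounts held by a different bank partition resolve to None.
        row["OrderingFlag"] = flags_by_account.get(tx["OrderingAccount"])
        row["BeneficiaryFlag"] = flags_by_account.get(tx["BeneficiaryAccount"])
        enriched.append(row)
    return enriched
```

A `None` flag here simply means the account is not in the table that was passed in, for example because it belongs to another bank's partition.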


Hi @mdecock,

The challenge rules don’t strictly define what constitutes the “model” or who gets access to it. Instead, the parties engage in a “training” phase and an “inference” phase, and local state may be saved for each party across these phases. So, it is certainly valid for each party (including SWIFT) to train a local model that is not shared with the other parties, as long as your solution can leverage the local model appropriately during inference. The inferences made during the inference phase are not considered sensitive, and can be revealed to all parties.
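
As a rough sketch of what saving local state across the two phases could look like, the toy example below has one party write a private "model" to its own directory during training and read it back during inference. The `Party` class, the JSON file layout, and the mean-threshold "model" are all assumptions made for illustration; they are not the challenge's actual API.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


class Party:
    """Hypothetical participant (SWIFT or a bank) with private local state."""

    def __init__(self, name: str, state_dir: Path):
        self.name = name
        self.state_path = state_dir / f"{name}_state.json"

    def train(self, local_values: list[float]) -> None:
        # Stand-in for real training: the "model" is the mean of local values.
        model = {"threshold": sum(local_values) / len(local_values)}
        # The state is written to this party's own storage and never shared.
        self.state_path.write_text(json.dumps(model))

    def predict(self, value: float) -> int:
        # Inference phase: reload the state saved during training.
        model = json.loads(self.state_path.read_text())
        return int(value > model["threshold"])


def run_demo() -> list[int]:
    with tempfile.TemporaryDirectory() as tmp:
        Party("swift", Path(tmp)).train([10.0, 20.0, 30.0])
        # A fresh object stands in for the separate inference phase.
        swift_at_inference = Party("swift", Path(tmp))
        return [swift_at_inference.predict(v) for v in (5.0, 25.0)]
```

Only the predictions returned by `predict` leave the party, which matches the statement that inference results are not considered sensitive.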
