PETs Challenge - Data Availability

There is no public data for this yet, but the Track page says that participants will be provided with datasets representing synthetic population data that include:

  • a social contact network, capturing when and where any two people come into contact, and the duration of the contact
  • demographic attributes of individuals
  • observations of individuals’ health state (i.e., whether they are infected or not).

Any idea when that will be?

Hi, I wonder how to download Track A’s data. Thanks.

Hi @sony.rajan and @dwtqq, welcome to the PETs Prize Challenge!

I’ve merged your two topics together and renamed it, so that I can address both of your questions in one place.

The challenge organizers are still finalizing both the Track A and Track B datasets. The details in the two data overview pages are also being updated as the datasets become finalized. We expect that everything will be ready soon (within the next two weeks), at which time we will release the datasets for both tracks for download. We will be sure to have clear announcements when this happens.

Thank you for your patience!

Hi,
I am also unable to find the Track A: Financial Crime data.
Can you please confirm that it has not been posted yet?
Best,
Samer Hanoudi

Hi @shanoudi,

The data has not yet been posted for either track. We expect that it will be ready sometime this week, and we will follow up with announcements and with a response in this thread.

Hi @jayqi,
Thank you for your response.
Do you know if the due date will be extended because of this delay?
Best,
Samer

Hi everyone,

The development data for “Track B: Pandemic Forecasting” is now available on the data download page for the challenge. This data is intended for teams to use locally for solution development in Phase 1 and Phase 2.

We have also updated the data overview page with additional details about the data and the modeling task. Please review that page for an explanation of the dataset and the risk forecasting task.


The data and an update to the overview page for “Track A: Financial Crime” are not yet available but will be coming soon. Thank you for your patience.


@shanoudi: There are no changes to the competition timeline planned.

Hi everyone,

The development data for “Track A: Financial Crime” is now available on the data download page for the challenge. This data is intended for teams to use locally for solution development in Phase 1 and Phase 2.

We have also updated the data overview page with additional details about the data and the modeling task. Please review that page for an explanation of the dataset and the anomaly detection task.

Additionally, code for two example baseline models (Random Forest and XGBoost) developed by SWIFT is available on the data download page. These two models use only SWIFT transaction data and are meant to be examples to help you get started on model development.

Hi @jayqi, the data download links result in this error:

“Page not found.
Surprised to be here? If you followed a competition-specific link, please make sure that you are signed in and signed up for that competition.”

Hi @ewdavis, which link/URL are you trying to access that is not working for you?

Thanks @jayqi, we were trying the main link from the track pages labeled “data download”.

Hi @ewdavis, as the error message that you got suggests, you must be logged in and registered for the competition in order to access the data. It looks like your account is not registered for the challenge. Every account must individually register for the challenge and agree to the challenge rules.

Hi @jayqi, will we be provided with a baseline model for Track B as well? I also wonder whether we need to specify the scope of each federation unit ourselves when developing our solution.

Thanks!

The baseline models for Track B are available. Please see the latest challenge announcement for more details.

Hello @jayqi, when I go to the download page I see an error message.

Can you please advise how to enable my account to access the data download?

Thank you,
Adam

Hi, I am facing the same issue: I cannot see or access the data. Regards,

Hi @lzambra2,

The data is only available to PETs Prize Challenge participants. It does not look like you are registered for Phase 1 or Phase 2 of the challenge. The registration for those phases is closed.

If you are interested in the challenge, you may want to sign up for Phase 3 (Red Teams). See here for more information: Competition: U.S. PETs Prize Challenge: Phase 3 (Red Teams)

Hi @jayqi,

I am a Ph.D. researcher working on enhancing transaction monitoring and was wondering if it would be possible for me to access the dataset. Hope to hear from you soon.

Thank you.

The datasets for this challenge are only available for the purposes of participating in the challenge.

Currently, only Phase 1 and Phase 2 blue team participants have been given access to the data.

Fully registered Phase 3 red team participants will be given access after registration closes.

The general public will not be given access to the challenge datasets while the challenge is active. There are no plans to make the dataset public after the close of the competition, but it is possible that the data partners at SWIFT and/or UVA-BII may separately choose to make their datasets public following the end of the challenge in March 2023.

Hi @jayqi,

Is there any indication of how the Virginia dev population data was split to generate the example AUPRC scores presented in the README of the WH-challenge-baselines repository (GitHub: NSSAC/WH-challenge-baselines, "Example baseline implementations for WH PETS challenge")?

There is a note saying that 50% of the Virginia dev population data was used to generate the results. Is this a random split or a partition?

Thanks,
Sarah
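
For anyone trying to reproduce comparable numbers locally, here is a minimal sketch of what a seeded random 50% split by person, followed by an AUPRC-style score, could look like. This is only an illustration of the "random split" option raised in the question above. It is not the method used by the baseline authors, which this thread does not confirm. The function names `random_half_split` and `average_precision` are hypothetical, and average precision is used here as one common step-wise estimate of AUPRC.

```python
import random


def random_half_split(person_ids, seed=0):
    """Shuffle person IDs with a fixed seed and cut the list in half.

    Hypothetical helper: assumes the split is made per individual,
    which is an assumption and not something the baseline README states.
    """
    ids = sorted(person_ids)      # fixed starting order for reproducibility
    rng = random.Random(seed)     # local RNG, does not touch global state
    rng.shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


def average_precision(y_true, y_score):
    """Average precision over a ranking, a common estimate of AUPRC.

    Tied scores are ranked in input order here; library implementations
    may treat ties differently.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest predicted risk.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            ap += hits / rank     # precision at this positive's rank
    return ap / total_pos


# Toy usage with made-up IDs, labels, and scores (not challenge data).
train_ids, test_ids = random_half_split(range(10), seed=42)
toy_score = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

A partition, by contrast, would assign people to halves by some fixed attribute (for example a region or an ID range) rather than by shuffling, so scores from the two approaches are not necessarily comparable.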