Question about rules: Publishing a synthetic dataset based on contest voices

ZFTurbo · April 25, 2026, 7:48am

Hello. During the contest, we created a large children’s speech dataset using voice cloning techniques and external text datasets. Since our data is based on the voices from the contest dataset, and the rules forbid sharing the original contest data, we wanted to clarify a point. Because our generated dataset isn’t strictly the original contest data, are we allowed to publish it?

hannahmoro · April 27, 2026, 9:06pm

The competition data was approved strictly for use in the challenge for the duration of the challenge period. For uses beyond the challenge, you will need to follow up with the original data providers. We are putting together a list of providers that we will share on the About page of each track.

ZFTurbo · April 27, 2026, 9:32pm

Thank you for the clarification. Does this restriction also apply to sharing our trained models, given that their weights inherently encode the original contest data?

hannahmoro · May 4, 2026, 6:21pm

Please see the competition rules for terms relevant to sharing solutions, including the following:

Participants are permitted to publicly share source or executable code developed in connection with or based upon the Data, or otherwise relevant to the Competition, provided that such sharing does not violate the intellectual property rights of any third party. By so sharing, the sharing Participant is thereby deemed to have licensed the shared code under the MIT License (an open source software license commonly described at The MIT License - Open Source Initiative).
