Hi @_NQ @wenhaowang, thanks for asking about the YFCC100M data.
Because YFCC100M is the source of the challenge data, it does not count as permitted external data, and training on data from this source is prohibited (except for the training data provided through the challenge). The 1 million training images were drawn from YFCC100M in a way that ensures they do not overlap with the test data, so they are a great resource to use for training.
This has also been clarified in the Problem Description.