Official pre-trained models / external data thread

Pre-trained models and external data are allowed in this competition as long as they can be released under an Open Source License. We want to build on the best of what’s available. If you do use a pre-trained model or external data, please make sure to share it in this thread for the rest of the community.

Thanks and good luck!

To start us off, here are some great external datasets that are allowed for use once shared here, with proper attribution:

Generally speaking, external data is allowable if it is publicly available, disclosed here for all participants to benefit from, and licensed in a way that enables its use in models released under the open source software licenses mentioned above.

If you are wondering whether a specific dataset or pre-trained model is allowed, please let the challenge organizers know in this thread and we’ll get back to you with an answer. Thank you!

Dave

Do we have to share if we’re using a model trained on ImageNet (e.g., most of the default models in fast.ai and similar libraries)?

Well, you just did 🙂

For good measure, here’s a starter list of ImageNet pre-trained models:

Many thanks @daveluo_gfdrr 🙂

Hello, I recently participated in the xView2 challenge. Is it okay if I use their data as well?
Here is the link to the official xView2 challenge dataset website: https://xview2.org/dataset

Thank you!

Hi @Hasan_N,

Welcome to the challenge and thank you for checking about using the xView2 dataset.

Unfortunately, that dataset can’t be used here. xView2 data is licensed as Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0), and the NonCommercial-ShareAlike part potentially limits the open sourcing of solutions developed in this challenge.

Dave

Thank you @daveluo_gfdrr

Open source deep learning code and pretrained models:
https://modelzoo.co/

Any idea whether we can use this dataset: https://github.com/dronedeploy/dd-ml-segmentation-benchmark ? I can’t really find the license.

And this one: https://project.inria.fr/aerialimagelabeling/

Hi @azkalot1, thanks for sharing these resources and checking about their usage!

The Inria Aerial Image Labeling dataset should be okay to use as all the imagery and labels are from public domain sources per their website:

The dataset was constructed by combining public domain imagery and public domain official building footprints.

and the paper states their intention for the dataset to be open access:

Let us first highlight the fact that we can only focus on regions where both the images and the reference data are available. In addition, we require the data to be open access in order to freely share our derived dataset with the community.

Re: the DroneDeploy segmentation benchmark dataset, there doesn’t seem to be any info anywhere about its license or permitted usage outside of benchmarking on Weights & Biases. I’m inquiring about it now and will update here if I hear back. Until an update is posted here, the DroneDeploy dataset should not be used in this challenge.

I’m not sure about the data from here:

https://www.crowdai.org/challenges/mapping-challenge
Mapping Challenge
Building Missing Maps with Machine Learning
License: This dataset is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. I presume this would be OK to use?

Also:

https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection/data
https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection/rules

The Dstl competition data comes from here:

http://worldview3.digitalglobe.com/

But I can’t find a specific note anywhere stating that the data is open source.

Hi @Agedev, thanks for checking about these datasets!

Re: the CrowdAI Mapping Challenge dataset, because its license contains the NonCommercial clause, it’s not permissive enough and can’t be used in this challenge (same issue as with the xView2 dataset asked about earlier in this thread).

In any case, I know that the CrowdAI dataset is a derived subset of the SpaceNet 2 Buildings dataset. The original SpaceNet datasets are OK for use in this challenge as they all have a Creative Commons Attribution-ShareAlike 4.0 International License.

Re: the DSTL Kaggle competition dataset, I can’t find any data licensing info either. If it’s WorldView imagery from DigitalGlobe/Maxar, that imagery is usually not licensed for open usage. In general, if there’s no explicit and permissive open source license or public statement by the producers about how a dataset is intended to be used by others, it can’t be used in our challenge. So the DSTL dataset is not OK to use in our challenge.
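
For anyone who wants to pre-screen a dataset before asking here, the rulings in this thread so far can be sketched as a small Python check. This is only an illustration, not an official tool or a substitute for asking the organizers. The `screen_license` function, the `Verdict` enum, and the use of SPDX-style license identifiers (for example `CC-BY-NC-SA-4.0`) are all assumptions of this sketch, and the approved set contains only the one license explicitly OK'd above.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ALLOWED = "allowed"
    NOT_ALLOWED = "not allowed"
    ASK_ORGANIZERS = "ask in this thread"


# Hypothetical names, for illustration only.
# Licenses explicitly OK'd in this thread (SpaceNet: CC BY-SA 4.0).
APPROVED_IN_THREAD = {"CC-BY-SA-4.0"}
# License clauses ruled out in this thread (xView2, CrowdAI: NonCommercial).
BLOCKING_CLAUSES = {"NC"}


def screen_license(license_id: Optional[str]) -> Verdict:
    """Roughly mirror the thread's rulings for an SPDX-style license identifier."""
    # No explicit license found (DroneDeploy, DSTL cases): not usable.
    if license_id is None or not license_id.strip():
        return Verdict.NOT_ALLOWED

    normalized = license_id.strip().upper()

    # Any NonCommercial clause is not permissive enough.
    if BLOCKING_CLAUSES & set(normalized.split("-")):
        return Verdict.NOT_ALLOWED

    if normalized in APPROVED_IN_THREAD:
        return Verdict.ALLOWED

    # Anything else has not been ruled on here, so check with the organizers.
    return Verdict.ASK_ORGANIZERS


if __name__ == "__main__":
    for candidate in ["CC-BY-NC-SA-4.0", "CC-BY-SA-4.0", None, "MIT"]:
        print(candidate, "->", screen_license(candidate).value)
```

Anything the sketch does not recognize falls through to "ask in this thread" on purpose, since only the organizers can confirm whether a license is permissive enough.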