Official pre-trained models / external data thread

glipstein · December 18, 2019, 3:55pm

Pre-trained models and external data are allowed in this competition as long as they can be released under an Open Source License. We want to build on the best of what’s available. If you do use a pre-trained model or external data, please make sure to share it in this thread for the rest of the community.

Thanks and good luck!

daveluo_gfdrr · December 20, 2019, 7:02pm

To start us off, here are some great external datasets allowable for use upon sharing here and with the proper attributions:

SpaceNet datasets, licensed as CC BY-SA 4.0:
- also their SpaceNet-pretrained models, available via solaris library by CosmiQ Works
Any OpenStreetMap data, licensed as ODbL 1.0. Can access via:
- HOTOSM’s export tool
- overpass turbo
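
For anyone who prefers scripting over the web tools, here is a minimal sketch of pulling OSM building footprints through the Overpass API, which is the service behind overpass turbo. The endpoint shown is one public instance, and the function names and the example bounding box are made up for illustration, so adjust them to your area of interest. Remember that ODbL data still needs proper attribution.

```python
import json
import urllib.parse
import urllib.request

# One public Overpass instance; picking this one is an assumption, others exist.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_building_query(south, west, north, east, timeout=25):
    """Build an Overpass QL query for buildings inside a bounding box.

    Overpass expects the bounding box as (south, west, north, east).
    """
    bbox = f"{south},{west},{north},{east}"
    return (
        f"[out:json][timeout:{timeout}];"
        f'(way["building"]({bbox});relation["building"]({bbox}););'
        "out geom;"
    )


def fetch_buildings(south, west, north, east):
    """Send the query to the Overpass API and return the list of OSM elements."""
    query = build_building_query(south, west, north, east)
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=data, timeout=60) as response:
        return json.load(response)["elements"]


if __name__ == "__main__":
    # Hypothetical bounding box, only here to show the query text.
    print(build_building_query(-6.18, 39.18, -6.16, 39.20))
```

Calling `fetch_buildings(...)` with the same arguments performs the actual request. Keep the bounding box small, since the public instances are shared and rate limited.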

Generally speaking, allowable external data is data that is publicly available, disclosed here for all participants to benefit from, and licensed in a way that enables its use in models released under the open source software licenses mentioned above.

If you are wondering if a specific dataset or pre-trained model is allowed for use or not, please let the challenge organizers know in this thread and we’ll get back to you with an answer. Thank you!

Dave

johnowhitaker · December 21, 2019, 9:06am

Do we have to share if we’re using a model trained on ImageNet (e.g., most of the default models in fast.ai et al.)?

daveluo_gfdrr · December 21, 2019, 3:51pm

Well, you just did

For good measure, here’s a starter list of ImageNet pre-trained models:

Fast.ai’s vision models zoo (thanks @johnowhitaker): https://docs.fast.ai/vision.models.html
PyTorch/torchvision pretrained models: https://pytorch.org/docs/stable/torchvision/models.html
More PyTorch pretrained models from:
- Cadene: https://github.com/Cadene/pretrained-models.pytorch
- Ross Wightman: https://github.com/rwightman/pytorch-image-models
Tensorflow Keras pretrained models: https://www.tensorflow.org/api_docs/python/tf/keras/applications

johnowhitaker · December 21, 2019, 3:55pm

Many thanks @daveluo_gfdrr

Hasan_N · January 18, 2020, 5:05pm

Hello, I recently participated in the xView2 challenge. Is it okay if I use their data as well?
This is the link to the official xView2 challenge dataset website: https://xview2.org/dataset

Thank you!

daveluo_gfdrr · January 19, 2020, 10:24pm

Hi @Hasan_N,

Welcome to the challenge and thank you for checking about using the xView2 dataset.

Unfortunately, that dataset can’t be used here. xView2 data is licensed as Creative Commons Attribution-Noncommercial-Sharealike 4.0 International (CC BY-NC-SA 4.0) and the NonCommercial-Sharealike part potentially limits the open sourcing of solutions developed in this challenge.

Dave

Hasan_N · January 20, 2020, 5:45am

Thank you @daveluo_gfdrr

akashintsev · February 14, 2020, 6:01am

Open source deep learning code and pretrained models:
https://modelzoo.co/

azkalot1 · February 22, 2020, 11:00pm

Any ideas if we can use this https://github.com/dronedeploy/dd-ml-segmentation-benchmark dataset? Can’t really find the license.

azkalot1 · February 23, 2020, 3:57am

And this one: https://project.inria.fr/aerialimagelabeling/

daveluo_gfdrr · February 24, 2020, 5:07pm

Hi @azkalot1, thanks for sharing these resources and checking about their usage!

The Inria Aerial Image Labeling dataset should be okay to use, as all the imagery and labels are from public domain sources per their website:

“The dataset was constructed by combining public domain imagery and public domain official building footprints.”

and the paper states their intention for the dataset to be open access:

“Let us first highlight the fact that we can only focus on regions where both the images and the reference data are available. In addition, we require the data to be open access in order to freely share our derived dataset with the community.”

Re: the DroneDeploy segmentation benchmark dataset, there doesn’t seem to be any info anywhere about its license or permitted usage outside of benchmarking on Weights & Biases. I’m inquiring about it now and will update here if I hear back. Until updated otherwise here, the DroneDeploy dataset should not be used in this challenge.

akashintsev · February 26, 2020, 7:15am

Agedev · March 1, 2020, 7:29am

I’m not sure about the data from here:

https://www.crowdai.org/challenges/mapping-challenge
Mapping Challenge
Building Missing Maps with Machine Learning
License: This dataset is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. I presume this would be OK to use?

Also:

https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection/data
https://www.kaggle.com/c/dstl-satellite-imagery-feature-detection/rules

The Dstl competition data comes from here:

http://worldview3.digitalglobe.com/

But I can’t find a specific note anywhere saying that the data is open source.

qubvel · March 1, 2020, 9:36pm

daveluo_gfdrr · March 2, 2020, 6:25pm

Hi @Agedev, thanks for checking about these datasets!

RE: the CrowdAI Mapping Challenge dataset, because its license contains the NonCommercial clause, it’s not permissive enough and can’t be used in this challenge (same issue as with the xView2 dataset inquired about earlier in thread).

In any case, I know that the CrowdAI dataset is a derived subset of the SpaceNet 2 Buildings dataset. The original SpaceNet datasets are OK for use in this challenge as they all have a Creative Commons Attribution-ShareAlike 4.0 International License.

Re: the DSTL Kaggle competition dataset, I can’t find any data licensing info either. If it’s WorldView imagery from DigitalGlobe/Maxar, those are usually not licensed for open usage. In general, if there’s no explicit and permissive open source license or public statement by the producers of how a dataset is intended for use by others, it can’t be used in our challenge. So for the DSTL dataset case, it’s not OK to be used in our challenge.
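
To make the rule of thumb concrete, here is a small sketch that encodes the rulings given so far in this thread. It is only an illustration and not an official checker: the SPDX-style identifiers, the "public-domain" label, and the `is_allowed` helper are my own shorthand, and anything not explicitly approved here is treated as not allowed until the organizers say otherwise.

```python
# Licenses approved in this thread. "public-domain" is an informal label
# used here for the Inria dataset, not an official license identifier.
ALLOWED_LICENSES = {"cc-by-sa-4.0", "odbl-1.0", "public-domain"}


def is_allowed(license_id):
    """Return True only for an explicit license that was approved in this thread."""
    if not license_id:
        return False  # no stated license means the data can't be used
    normalized = license_id.strip().lower()
    if "-nc" in normalized or "noncommercial" in normalized:
        return False  # NonCommercial clauses are not permissive enough
    return normalized in ALLOWED_LICENSES


# Datasets discussed in this thread, with the license stated for each (None = none found).
datasets = {
    "SpaceNet": "CC-BY-SA-4.0",
    "OpenStreetMap": "ODbL-1.0",
    "Inria Aerial Image Labeling": "public-domain",
    "xView2": "CC-BY-NC-SA-4.0",
    "CrowdAI Mapping Challenge": "CC-BY-NC-SA-4.0",
    "DSTL Kaggle competition": None,
    "DroneDeploy benchmark": None,
}

for name, license_id in datasets.items():
    status = "OK" if is_allowed(license_id) else "not OK"
    print(f"{name}: {status}")
```

If your dataset or model comes out as "not OK" or isn't covered, please ask in this thread before using it.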
