Data download issues

How do I get the SAS files? The links provided in data_download_instructions don't work.

<Error>
<Code>AuthenticationFailed</Code>
<Message>
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:28b7b0a6-d01e-0009-574a-e7731c000000 Time:2021-12-02T07:01:43.5624113Z
</Message>
<AuthenticationErrorDetail>
Signature did not match. String to sign used was rl 2022-08-01T12:00Z /blob/cloudcoverdatawesteurope/$root 2018-11-09 c
</AuthenticationErrorDetail>
</Error>

I faced the same problem; I restarted the downloader and everything worked.

I think I see: the URLs in the download instructions are the SAS tokens; they are not meant to be downloaded directly.

For example, wget "https://cloudcoverdatawesteurope.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=ABC" will fail with the authentication error you are seeing. (Quote the URL in your shell; an unquoted & would split the command.)

Instead, you need to pass that URL string directly to the download_data.py script, like this:

python download_data.py --sas-url "https://cloudcoverdatawesteurope.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=ABC"

Or you can write the token to a file and pass the file path instead:

python download_data.py --sas-url sas.txt

where sas.txt is a plain text file containing https://cloudcoverdatawesteurope.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=ABC.

(Note that the above examples use a fake token; see the download page for the actual SAS token.)
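If it helps to see the idea in code, here is a minimal sketch of how a script could accept either a raw SAS URL or a path to a text file containing one. The function name resolve_sas_url is made up for illustration; the actual handling in download_data.py may differ.

from pathlib import Path

def resolve_sas_url(value: str) -> str:
    """Return the SAS URL, reading it from a file if `value` is a path."""
    path = Path(value)
    if path.is_file():
        # --sas-url pointed at a file like sas.txt; read the URL from it
        return path.read_text().strip()
    # Otherwise treat the value as the SAS URL itself
    return value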

Let me know if that clears things up. Apologies for the confusion; I'll update the instructions to be clearer. Happy cloud detecting!

Thanks. Now it's clear, and the download is working fine.

It would probably be better just to make sas_europe.txt, sas_us.txt, and sas_asia.txt available for download.


I like that idea. Updated!

I am not sure if I did something wrong, but I got the same files three times when I downloaded from the three given URLs, e.g.:

python download_data.py --sas-url sas_westeurope.txt --local-directory data/westeurope
python download_data.py --sas-url sas_centralus.txt --local-directory data/centralus
python download_data.py --sas-url sas_eastasia.txt --local-directory data/eastasia

Each folder gets the same 58740 files with exactly the same sizes and md5sums, even though the URLs are different:

https://cloudcoverdatacentralus.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=Bhyvh/jgnWKtcBbZ62nOJKalUByIzDikBenFxLJs7FU%3D
https://cloudcoverdataeastasia.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=nL3TY7pT/tSppIfZ13UeCXvrNE/wT9o0rTXlyJi8aic%3D
https://cloudcoverdatawesteurope.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=DrqaBLSI9t1nnx1sekyPaMgsqMiO9%2BBzjU/JwDhfQ64%3D

Any idea?
Thank you,

Henrique

From "data_download_instructions.txt": "Each region includes identical data, so choose the region closest to the machine you are downloading the data to."

You only need to download from one of the three regions.
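If you want to verify that two regions really contain the same data, here is a small sketch that hashes every file under two of the download directories from the commands above and checks that the mappings match (the paths are the examples used earlier in this thread):

import hashlib
from pathlib import Path

def dir_hashes(root):
    """Map each file's relative path to its MD5 hex digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.md5(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }

# Identical regions should produce identical mappings
assert dir_hashes("data/westeurope") == dir_hashes("data/centralus")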


I think they are supposed to be identical; the three SAS files just point to different Azure blob storage regions. Not sure why, but I'm on us-east, centralus is not working for me, and westeurope is very slow…


Thank you @fischcheng and @rdeggau !
I should have read it properly, my bad…
It did take me a couple of hours to download, and it failed a couple of times with a lost connection, but at least retrying afterwards was faster.

When I search for the additional data following the Additional data section of the Problem description, I cannot get any data matching the datetimes from train_metadata.csv. The section suggests using both the timestamp and the coordinates to get more data, but even searching for the timestamp alone does not produce any results. For example, the timestamp for chip_id cjge is 2019-11-12T11:02:20Z, but I cannot see any captures made by Sentinel-2 at this exact time point… Here is how I search:

from pystac_client import Client

# Connect to the Planetary Computer STAC API
catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# Search Sentinel-2 L2A items in a four-minute window around the chip timestamp
search = catalog.search(
    collections=["sentinel-2-l2a"],
    datetime="2019-11-12T11:00:00/2019-11-12T11:04:00",
)
items = list(search.get_items())

# This prints nothing
for item in items:
    print(f"{item.id}: {item.datetime}")

To get any results from the above search, I need to expand the search datetime to '2019-11-12T11:00:00/2019-11-12T11:28:00'.
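For reference, the widened search looks like this (a self-contained version of the snippet above). The bbox is a made-up placeholder, included only because the Additional data section suggests combining the timestamp with the chip's coordinates:

from pystac_client import Client

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# Widened 28-minute window plus a spatial filter; the bbox is a placeholder
# [min_lon, min_lat, max_lon, max_lat], not chip cjge's actual footprint.
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[30.0, 29.5, 30.5, 30.0],
    datetime="2019-11-12T11:00:00/2019-11-12T11:28:00",
)
for item in search.get_items():
    print(f"{item.id}: {item.datetime}")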

So I'm wondering what I'm missing and what could be the reason for the empty search.

@imakarov see this thread about pulling additional bands from the Planetary Computer. We’ll be posting a tutorial soon with a lot more detail!

@imakarov our tutorial on pulling in data from the Planetary Computer is now published here! Hope that helps.
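The tutorial linked above is the authoritative reference, but as a minimal sketch of the general pattern (reusing the hypothetical search from earlier): items returned by a Planetary Computer search must be signed with the planetary-computer package before their asset URLs can be read.

import planetary_computer
import rasterio
from pystac_client import Client

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
items = list(
    catalog.search(
        collections=["sentinel-2-l2a"],
        bbox=[30.0, 29.5, 30.5, 30.0],  # hypothetical placeholder bbox
        datetime="2019-11-12T11:00:00/2019-11-12T11:28:00",
    ).get_items()
)

# Sign the item so its asset URLs carry a short-lived access token
signed = planetary_computer.sign(items[0])

# Read the red band (B04) of the signed item as a numpy array
with rasterio.open(signed.assets["B04"].href) as src:
    red = src.read(1)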


Has anyone uploaded the dataset (training data) to Drive / OneDrive / Kaggle? The official instructions aren't working well on my side; I tried several times, and the download either got stuck or disconnected partway through.

See below. Does anyone have a solution for this?

If you're running on the Planetary Computer, you don't need to download the data; it is available at /driven-data/cloud-cover.
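For example, a quick check from a Planetary Computer Hub notebook (the path is taken from the post above):

from pathlib import Path

# List a few entries under the mounted competition data directory
for p in sorted(Path("/driven-data/cloud-cover").iterdir())[:5]:
    print(p)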