Data download issues

How do I get the SAS files? The links provided in data_download_instructions don't work.

<Error>
<Code>AuthenticationFailed</Code>
<Message>
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:28b7b0a6-d01e-0009-574a-e7731c000000 Time:2021-12-02T07:01:43.5624113Z
</Message>
<AuthenticationErrorDetail>
Signature did not match. String to sign used was rl 2022-08-01T12:00Z /blob/cloudcoverdatawesteurope/$root 2018-11-09 c
</AuthenticationErrorDetail>
</Error>

I faced the same problem; I restarted the downloader and everything worked.

I think I see: the URLs in the download instructions are the SAS tokens; they are not meant to be downloaded directly.

For example, wget "https://cloudcoverdatawesteurope.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=ABC" will fail with the authentication error you are seeing. (Quote the URL in your shell; an unquoted & would split the command.)

Instead, you need to pass that URL string directly to the download_data.py script, like this:

python download_data.py --sas-url "https://cloudcoverdatawesteurope.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=ABC"

Or you can write the token to a file and pass the file path instead:

python download_data.py --sas-url sas.txt

where sas.txt is a plain text file containing https://cloudcoverdatawesteurope.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=ABC.

(Note that the above examples use a fake token; see the download page for the actual SAS token.)
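If it helps to see the idea in code, here is a minimal sketch of how a script could accept either a raw SAS URL or a path to a text file containing one. The function name resolve_sas_url is made up for illustration; the actual handling in download_data.py may differ.

from pathlib import Path

def resolve_sas_url(value: str) -> str:
    """Return the SAS URL, reading it from a file if `value` is a path."""
    path = Path(value)
    if path.is_file():
        # --sas-url pointed at a file like sas.txt; read the URL from it
        return path.read_text().strip()
    # Otherwise treat the value as the SAS URL itself
    return value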

Let me know if that clears things up. Apologies for the confusion; I'll update the instructions to be clearer. Happy cloud detecting!

Thanks. Now it's clear, and the download is working fine.

It would probably be better just to make sas_europe.txt, sas_us.txt, and sas_asia.txt available for download.


I like that idea. Updated!

I am not sure if I did something wrong, but I got the same files three times when I downloaded from the three given URLs, e.g.:

python download_data.py --sas-url sas_westeurope.txt --local-directory data/westeurope
python download_data.py --sas-url sas_centralus.txt --local-directory data/centralus
python download_data.py --sas-url sas_eastasia.txt --local-directory data/eastasia

Each folder gets the same 58740 files with exactly the same sizes and md5sums, even though the URLs are different:

https://cloudcoverdatacentralus.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=Bhyvh/jgnWKtcBbZ62nOJKalUByIzDikBenFxLJs7FU%3D
https://cloudcoverdataeastasia.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=nL3TY7pT/tSppIfZ13UeCXvrNE/wT9o0rTXlyJi8aic%3D
https://cloudcoverdatawesteurope.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=DrqaBLSI9t1nnx1sekyPaMgsqMiO9%2BBzjU/JwDhfQ64%3D

Any idea?
Thank you,

Henrique

From "data_download_instructions.txt": "Each region includes identical data, so choose the region closest to the machine you are downloading the data to."

You only need to download from one of the three regions.
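If you want to verify that two regions really contain the same data, here is a small sketch that hashes every file under two of the download directories from the commands above and checks that the mappings match (the paths are the examples used earlier in this thread):

import hashlib
from pathlib import Path

def dir_hashes(root):
    """Map each file's relative path to its MD5 hex digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.md5(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }

# Identical regions should produce identical mappings
assert dir_hashes("data/westeurope") == dir_hashes("data/centralus")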


I think they are supposed to be identical; the three SAS files just point to different Azure blob storage regions. Not sure why, but I'm on us-east, centralus is not working for me, and westeurope is very slow…


Thank you @fischcheng and @rdeggau !
I should have read it properly, my bad…
It did take me a couple of hours to download, and it failed a couple of times with a lost connection, but at least retrying afterwards was faster.

When I search for the additional data following the Additional data section of the Problem description, I cannot get any data matching the datetimes from train_metadata.csv. The section suggests using both the timestamp and the coordinates to get more data, but even searching for the timestamp alone does not produce any results. For example, the timestamp for chip_id cjge is 2019-11-12T11:02:20Z, but I cannot see any captures made by Sentinel-2 at this exact time point… Here is how I search:

from pystac_client import Client

# Connect to the Planetary Computer STAC API
catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# Search Sentinel-2 L2A items in a four-minute window around the chip timestamp
search = catalog.search(
    collections=["sentinel-2-l2a"],
    datetime="2019-11-12T11:00:00/2019-11-12T11:04:00",
)
items = list(search.get_items())

# This prints nothing
for item in items:
    print(f"{item.id}: {item.datetime}")

To get any results from the above search, I need to expand the search datetime to '2019-11-12T11:00:00/2019-11-12T11:28:00'.
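For reference, the widened search looks like this (a self-contained version of the snippet above). The bbox is a made-up placeholder, included only because the Additional data section suggests combining the timestamp with the chip's coordinates:

from pystac_client import Client

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

# Widened 28-minute window plus a spatial filter; the bbox is a placeholder
# [min_lon, min_lat, max_lon, max_lat], not chip cjge's actual footprint.
search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[30.0, 29.5, 30.5, 30.0],
    datetime="2019-11-12T11:00:00/2019-11-12T11:28:00",
)
for item in search.get_items():
    print(f"{item.id}: {item.datetime}")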

So I'm wondering what I'm missing and what could be the reason for the empty search.

@imakarov see this thread about pulling additional bands from the Planetary Computer. We’ll be posting a tutorial soon with a lot more detail!

@imakarov our tutorial on pulling in data from the Planetary Computer is now published here! Hope that helps.
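The tutorial linked above is the authoritative reference, but as a minimal sketch of the general pattern (reusing the hypothetical search from earlier): items returned by a Planetary Computer search must be signed with the planetary-computer package before their asset URLs can be read.

import planetary_computer
import rasterio
from pystac_client import Client

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
items = list(
    catalog.search(
        collections=["sentinel-2-l2a"],
        bbox=[30.0, 29.5, 30.5, 30.0],  # hypothetical placeholder bbox
        datetime="2019-11-12T11:00:00/2019-11-12T11:28:00",
    ).get_items()
)

# Sign the item so its asset URLs carry a short-lived access token
signed = planetary_computer.sign(items[0])

# Read the red band (B04) of the signed item as a numpy array
with rasterio.open(signed.assets["B04"].href) as src:
    red = src.read(1)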


Has anyone uploaded the dataset (training data) to Drive / OneDrive / Kaggle? The official instructions aren't working well on my side; I tried several times, and the download either got stuck or disconnected partway through.

See below. Does anyone have a solution for this?

If you're running on the Planetary Computer, you don't need to download the data; it is available at /driven-data/cloud-cover.
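For example, a quick check from a Planetary Computer Hub notebook (the path is taken from the post above):

from pathlib import Path

# List a few entries under the mounted competition data directory
for p in sorted(Path("/driven-data/cloud-cover").iterdir())[:5]:
    print(p)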