How do I get the SAS files? The links provided in data_download_instructions don’t work.
<Error>
<Code>AuthenticationFailed</Code>
<Message>
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:28b7b0a6-d01e-0009-574a-e7731c000000 Time:2021-12-02T07:01:43.5624113Z
</Message>
<AuthenticationErrorDetail>
Signature did not match. String to sign used was rl 2022-08-01T12:00Z /blob/cloudcoverdatawesteurope/$root 2018-11-09 c
</AuthenticationErrorDetail>
</Error>
I think I see – the URLs in the download instructions are the SAS tokens; they are not meant to be downloaded.
For example wget https://cloudcoverdatawesteurope.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=ABC will fail with the authentication error you are seeing.
Instead, you need to pass that URL string directly to the download_data.py script.
Or you can write the token to a file and use it that way:
python download_data.py --sas-url sas.txt
where sas.txt is a plain text file containing https://cloudcoverdatawesteurope.blob.core.windows.net/public?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=ABC.
(note that the above example uses a fake token; see the download page for the actual SAS token).
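In case it helps to see what is inside one of these URLs, here is a minimal sketch that splits a SAS URL into its parts using only the standard library. The function name describe_sas_url is hypothetical (it is not part of download_data.py), and the URL is the same fake example as above. The fields line up with the "String to sign" in the error message: sp is the permissions, se the expiry, sv the service version, and sr the resource type.

```python
from urllib.parse import urlsplit, parse_qs


def describe_sas_url(sas_url: str) -> dict:
    """Break a SAS URL into its host, container, and token fields.

    Hypothetical helper for illustration only; not part of download_data.py.
    """
    parts = urlsplit(sas_url)
    # parse_qs also decodes percent-escapes, e.g. %3A becomes ":"
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "account_host": parts.netloc,
        "container": parts.path.lstrip("/"),
        "expiry": query.get("se"),            # when the token stops working
        "permissions": query.get("sp"),       # "rl" = read + list
        "service_version": query.get("sv"),
        "resource": query.get("sr"),          # "c" = container
        "has_signature": "sig" in query,
    }


# Fake token from the example above; use the real one from the download page.
info = describe_sas_url(
    "https://cloudcoverdatawesteurope.blob.core.windows.net/public"
    "?se=2022-08-01T12%3A00Z&sp=rl&sv=2018-11-09&sr=c&sig=ABC"
)
for field, value in info.items():
    print(f"{field}: {value}")
```

If the expiry has passed or part of the query string gets cut off (for example by an unquoted & in the shell), the signature check fails.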
Let me know if that clears things up. Apologies for the confusion, and I’ll update the instructions to be more clear. Happy cloud detecting!
From “data_download_instructions.txt”: “Each region includes identical data, so choose the region closest to the machine you are downloading the data to.”
I think they are supposed to be identical; the three different SAS files just point to different Azure blobs. Not sure why, but I’m on us-east and centralus is not working for me, and westeurope is very slow…
thank you @fischcheng and @rdeggau !
I should have read it properly, my bad…
It did take me a couple of hours to download it, and it failed a couple of times with lost connection, but at least re-trying after was faster.
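For anyone else hitting dropped connections, a rough sketch of how the re-trying could be automated. The retry helper is hypothetical and generic; it simply re-runs whatever callable it is given, waiting a bit longer after each failure. Whether download_data.py skips files that are already on disk is an assumption based on my re-runs being faster, so treat this as a starting point only.

```python
import time


def retry(action, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds, at most `attempts` times.

    Waits base_delay, 2*base_delay, 4*base_delay, ... seconds between tries
    and re-raises the last error if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Possible usage (assumes sas.txt holds the SAS URL, as described above):
#
#   import subprocess, sys
#   retry(lambda: subprocess.run(
#       [sys.executable, "download_data.py", "--sas-url", "sas.txt"],
#       check=True,  # raise if the script exits with an error
#   ), attempts=5, base_delay=30)
```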
When I search for the additional data following the Additional data section of the Problem description, I cannot get any data using the datetime from train_metadata.csv. The section suggests using both timestamp and coordinates to get more data, but even searching by timestamp alone does not produce any results. For example, the timestamp for chip_id cjge is 2019-11-12T11:02:20Z, but I cannot see any captures made by Sentinel-2 at this exact time… Here is how I search:
from pystac_client import Client
import rasterio

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
search = catalog.search(
    collections=["sentinel-2-l2a"],
    datetime="2019-11-12T11:00:00/2019-11-12T11:04:00",
)
items = list(search.get_items())

# This prints nothing
for item in items:
    print(f"{item.id}: {item.datetime}")
To get any results from the above search, I need to expand the search datetime to '2019-11-12T11:00:00/2019-11-12T11:28:00'.
So, I’m wondering what I’m missing and what can be the reason for the empty search.
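For reference, this is roughly how I build the widened range from a chip timestamp. It is only a sketch: datetime_window is my own hypothetical helper, and the 30-minute margin is an arbitrary guess, chosen only because it covers the range that returned results above. The output is a "start/end" string that can be passed as the datetime argument of catalog.search.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def datetime_window(timestamp: str, minutes: int = 30) -> str:
    """Return a 'start/end' range centered on a UTC timestamp.

    Hypothetical helper; the default margin is an assumption, not a value
    taken from the competition documentation.
    """
    center = datetime.strptime(timestamp, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)
    margin = timedelta(minutes=minutes)
    start = (center - margin).strftime(TIMESTAMP_FORMAT)
    end = (center + margin).strftime(TIMESTAMP_FORMAT)
    return f"{start}/{end}"


# Timestamp of chip_id cjge from train_metadata.csv
window = datetime_window("2019-11-12T11:02:20Z", minutes=30)
print(window)  # 2019-11-12T10:32:20Z/2019-11-12T11:32:20Z
```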
Has anyone uploaded the dataset (training data) to Drive / OneDrive / Kaggle? The official instructions aren’t convenient on my side: I tried several times, and the download either gets stuck or gets disconnected partway through.