Downloading Data

smcoder5 · January 27, 2022, 10:10am

I am trying to download the data using boto3 and cloudpathlib in Python, but I am getting the error “Unable to locate Credentials”. I don’t know why I am getting this error. Can someone tell me the name of the Amazon S3 bucket I can download from?

cszc · January 27, 2022, 4:06pm

As mentioned in another post, the bucket names and the individual file paths can be found in satellite_metadata.csv and in the data download file on the Data Download page.

Cloudpathlib should work, but to be explicit you can set the no_sign_request boolean kwarg when initializing the s3 client. Hope that helps!
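For the boto3 route from the original question, a rough equivalent of `no_sign_request` is to build the client with botocore's `UNSIGNED` signature version, so boto3 sends anonymous requests and never searches for credentials. This is a minimal sketch that assumes the objects are publicly readable; `split_s3_url` and `download_unsigned` are hypothetical helper names, not part of boto3.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_unsigned(url: str, local_path: str) -> None:
    """Download a public S3 object without credentials (hypothetical helper)."""
    # Imported here so split_s3_url stays usable even without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    bucket, key = split_s3_url(url)
    # UNSIGNED makes botocore skip request signing, so it does not look up
    # credentials (the lookup that raises "Unable to locate credentials").
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    s3.download_file(bucket, key, local_path)
```

Usage would be something like `download_unsigned("s3://drivendata-competition-airathon-public-as/pm25/train/maiac/2018/20180201T191000_maiac_la_0.hdf", "1.hdf")`. If the request fails with a region error, passing `region_name=` to `boto3.client` may help; the buckets' regions are not stated in this thread.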

smcoder5 · January 28, 2022, 7:48am

Hello CSZC, I am using Google Colab for now, for initial experiments.

I am using cloudpathlib as follows:

from cloudpathlib import CloudPath

cp = CloudPath("s3://drivendata-competition-airathon-public-as/pm25/train/maiac/2018/20180201T191000_maiac_la_0.hdf --no-sign-request")
cp.download_to("1.hdf")

but I still get the credentials error.
I have tried it both with and without “--no-sign-request”.

This is the first link in satellite_metadata.csv.
Thank you

cszc · January 28, 2022, 6:06pm

If it’s not working for you out of the box, I would recommend instantiating an s3 client directly:

from cloudpathlib import S3Path, S3Client

s3_cli = S3Client(no_sign_request=True)

From there, you can create an S3 path to a specific file, as in your example, or to a whole directory. In either case, pass in the S3 client you instantiated. Here’s an example for the maiac directory:

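maiac_path = S3Path("s3://drivendata-competition-airathon-public-as/pm25/train/maiac", client=s3_cli)
# list every .hdf file under the maiac directory (recursively)
maiac_files = list(maiac_path.rglob("*.hdf"))
# get first file
fp = maiac_files[0]
# download the file to the client's local cache (a temp directory) and get its local path
local_path = fp.fspath
# or download it to a local path of your choosing
fp.download_to("1.hdf")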
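If you only want some of the files rather than `maiac_files[0]`, you can filter the listing by name before downloading. The file names in this thread look like `<timestamp>_<product>_<location>_<n>.<ext>` (e.g. `20180201T191000_maiac_la_0.hdf`); that layout is inferred from those examples only, and `parse_granule_name` and `pick_files` are hypothetical helpers.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class GranuleName(NamedTuple):
    timestamp: datetime
    product: str
    location: str
    index: str  # meaning of the trailing number is not documented in this thread


def parse_granule_name(name: str) -> GranuleName:
    """Parse e.g. '20180201T191000_maiac_la_0.hdf' (layout assumed from examples)."""
    stem = name.rsplit(".", 1)[0]
    stamp, product, location, index = stem.split("_", 3)
    return GranuleName(datetime.strptime(stamp, "%Y%m%dT%H%M%S"), product, location, index)


def pick_files(names: Iterable[str], location: str,
               start: datetime, end: datetime) -> List[str]:
    """Keep names for one location whose timestamp falls in [start, end)."""
    picked = []
    for name in names:
        g = parse_granule_name(name)
        if g.location == location and start <= g.timestamp < end:
            picked.append(name)
    return picked
```

With the listing above, something like `wanted = set(pick_files((p.name for p in maiac_files), "la", datetime(2018, 2, 1), datetime(2018, 3, 1)))` followed by `p.download_to(p.name)` for each `p` whose name is in `wanted` would fetch only February 2018 files.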

eyastaifour · February 1, 2022, 6:50am

Personally, I had some issues accessing the files until I ran the AWS CLI (without arguments) and it asked me to set up some AWS things. I entered random digits and moved forward.

Eisenberg · February 2, 2022, 7:07pm

Hello, the train dataset alone is over 90 GB for me, but the instructions mention something like 7 GB. Is that normal?

cszc · February 2, 2022, 7:17pm

To which track/product are you referring?

Eisenberg · February 2, 2022, 7:37pm

I used this command to download the directory to my local machine:

aws s3 cp s3://drivendata-competition-airathon-public-eu/no2/train/ train/ --no-sign-request --recursive

Is that normal?

kar95ar · February 3, 2022, 4:33am

I think you’ve downloaded the entire directory, which includes both maiac and misr satellite data.
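Before copying a whole directory, it can help to check how the total size splits across products. A minimal sketch, assuming boto3 with anonymous access and that the product name is the path component right after `train/` (as in `no2/train/tropomi/...`); `size_by_product` and `list_objects_unsigned` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def size_by_product(objects: Iterable[Tuple[str, int]], prefix: str) -> Dict[str, int]:
    """Sum object sizes (bytes) by the first path component after `prefix`."""
    totals: Dict[str, int] = defaultdict(int)
    for key, size in objects:
        if not key.startswith(prefix):
            continue
        product = key[len(prefix):].split("/", 1)[0]
        totals[product] += size
    return dict(totals)


def list_objects_unsigned(bucket: str, prefix: str) -> Iterator[Tuple[str, int]]:
    """Yield (key, size) for every object under `prefix`, without credentials."""
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    # list_objects_v2 returns at most 1000 keys per call; the paginator
    # follows the continuation tokens across pages.
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]
```

For the command above, `size_by_product(list_objects_unsigned("drivendata-competition-airathon-public-eu", "no2/train/"), "no2/train/")` would return bytes per product, which you could divide by `1e9` to compare against the sizes in the instructions.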

Eisenberg · February 3, 2022, 12:06pm

Got it, I figured it out. It’s just that this is my first competition, so I don’t really know where I should start, how many files to download, etc.

snande23 · February 3, 2022, 7:49pm

Is there a way to copy the competition data from the AWS S3 bucket to Azure Blob Storage? Please help, since this is my first competition.
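One possible approach, sketched under assumptions: stream each object from the public bucket with boto3 (anonymous access) and upload it with the `azure-storage-blob` package, without saving files to local disk first. The connection string, container name, and the helper names `blob_name_for` and `copy_prefix_to_azure` are hypothetical; check the Azure SDK documentation for the authentication option that fits your account.

```python
from typing import Optional


def blob_name_for(key: str, dest_prefix: str = "") -> str:
    """Map an S3 key to a blob name, optionally under a destination folder."""
    dest_prefix = dest_prefix.strip("/")
    return f"{dest_prefix}/{key}" if dest_prefix else key


def copy_prefix_to_azure(bucket: str, prefix: str, connection_string: str,
                         container: str, dest_prefix: str = "",
                         limit: Optional[int] = None) -> int:
    """Copy objects under an S3 prefix into an Azure container; returns the count."""
    import boto3
    from azure.storage.blob import BlobServiceClient
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    container_client = BlobServiceClient.from_connection_string(
        connection_string).get_container_client(container)

    copied = 0
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if limit is not None and copied >= limit:
                return copied
            # the S3 response body is file-like, so it is passed straight to
            # upload_blob instead of being written to local disk first
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
            container_client.upload_blob(
                name=blob_name_for(obj["Key"], dest_prefix),
                data=body, length=obj["Size"], overwrite=True)
            copied += 1
    return copied
```

Starting with a small `limit` (e.g. `limit=5`) is a cheap way to confirm the credentials and container name before copying a whole product directory.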

JM1000 · February 18, 2022, 10:42am

Hi, is anyone else struggling to connect to the s3/bucket endpoint? I have the following error:
fatal error: Could not connect to the endpoint URL: “https://drivendata-competition-airathon-public-eu-central-1.s3.London.amazonaws.com/no2/train/tropomi/2019/20190101T213357_tropomi_la_0.nc”

I’ve successfully pinged s3.amazonaws.com… and have tried various region permutations. I’ve also tried pulling the data with a script following cszc’s response above, but I get the same error.

Just wondering if anyone else has had this problem, and if so, how did you solve it?

Help… pulling my hair out

Thanks
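For reference when reading a URL like the one in that error: an S3 object's virtual-hosted-style HTTPS URL keeps the bucket name and the AWS region code in separate positions, `https://<bucket>.s3.<region>.amazonaws.com/<key>`, and region codes look like `eu-central-1` rather than a city name. In general it is simpler to let boto3 or the AWS CLI build this URL than to set it by hand. A minimal sketch; `to_https_url` is a hypothetical helper, and the region must be supplied by you since the buckets' actual regions are not stated in this thread.

```python
from urllib.parse import quote, urlparse


def to_https_url(s3_url: str, region: str) -> str:
    """Convert s3://bucket/key to the virtual-hosted-style HTTPS URL."""
    parsed = urlparse(s3_url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URL: {s3_url!r}")
    # percent-encode the key but keep '/' separators
    key = quote(parsed.path.lstrip("/"))
    return f"https://{parsed.netloc}.s3.{region}.amazonaws.com/{key}"
```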

Carl_Malings · February 18, 2022, 7:02pm

Hello,

I’ve had success with the following snippet of Python code:

# Import the needed libraries:
from cloudpathlib import S3Path, S3Client
# URL of the file I want to download, from the satellite metadata csv file:
s_url = 's3://drivendata-competition-airathon-public-eu/no2/train/tropomi/2019/20190101T213357_tropomi_la_0.nc'
# Where I want the downloaded file to go on my computer:
s_file_location = 'C:\\Users\\carlm\\Downloads'
# Access the file:
Remote_file = S3Path(s_url,client=S3Client(no_sign_request=True))
# Download it:
Remote_file.download_to(s_file_location)

Hope this helps.
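To download more than one file, the same pattern can be looped over the URLs listed in satellite_metadata.csv. A minimal sketch using the standard `csv` module; the name of the column holding the S3 URLs is not given in this thread, so `url_column` must be set from the header of your copy of the file, and `urls_from_metadata` and `download_all` are hypothetical helpers.

```python
import csv
from pathlib import Path
from typing import Iterable, List


def urls_from_metadata(lines: Iterable[str], url_column: str, contains: str = "") -> List[str]:
    """Return values of url_column that start with s3:// and contain `contains`."""
    urls = []
    for row in csv.DictReader(lines):
        url = (row.get(url_column) or "").strip()
        if url.startswith("s3://") and contains in url:
            urls.append(url)
    return urls


def download_all(urls: Iterable[str], dest_dir: str) -> None:
    """Download each URL into dest_dir anonymously, skipping files already present."""
    from cloudpathlib import S3Client, S3Path

    client = S3Client(no_sign_request=True)
    out = Path(dest_dir)
    out.mkdir(parents=True, exist_ok=True)
    for url in urls:
        remote = S3Path(url, client=client)
        target = out / remote.name
        if not target.exists():
            remote.download_to(target)
```

In practice you would open the CSV with `open("satellite_metadata.csv", newline="")`, pass the file object as `lines` along with the real column name and a filter such as `"/tropomi/2019/"`, and hand a small slice like `urls[:5]` to `download_all` first.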

cszc · February 22, 2022, 8:51pm

Hi @JM1000, is this working for you yet? I’m not sure where the URL https://drivendata-competition-airathon-public-eu-central-1.s3.London.amazonaws.com/no2/train/tropomi/2019/20190101T213357_tropomi_la_0.nc is coming from. Could you please post more of your code?

Topic	Replies	Views	Activity
How to have access to the data (The BioMassters)	6	1168	November 17, 2022
Larger Dataset no longer on Data Download (Clog Loss: Advance Alzheimer’s Research)	10	1119	February 16, 2021
Data Images Download from Python code (VisioMel Challenge)	3	324	April 23, 2023
AWS CLI access forbidden (Overhead Geopose Challenge)	6	693	June 28, 2021
Data download issues (On Cloud N)	14	1089	January 24, 2022
