Downloading Data

I am trying to download the data using boto3 and cloudpathlib in Python, but I get the error “Unable to locate credentials”. I don’t know why this happens. Can someone tell me the name of the Amazon S3 bucket I can download from?

As mentioned in another post, the name of the buckets and the individual filepaths can be found in the satellite_metadata.csv and data download file found on the Data Download page.

Cloudpathlib should work, but to be explicit you can set the no_sign_request boolean kwarg when initializing the s3 client. Hope that helps!
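Since the question also mentions boto3, here is a minimal sketch of the same idea there. It assumes that unsigned (anonymous) requests are enough for these public buckets. The helper names `split_s3_uri` and `download_unsigned` are made up for illustration. In boto3, `Config(signature_version=UNSIGNED)` plays the role of the CLI's `--no-sign-request`.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_unsigned(uri: str, local_path: str) -> None:
    """Download one public object without AWS credentials. Hypothetical helper."""
    # Imported lazily so split_s3_uri works even without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    bucket, key = split_s3_uri(uri)
    # An unsigned client never looks for credentials (assumes a public bucket).
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    s3.download_file(bucket, key, local_path)
```

For example, `download_unsigned(uri, "1.hdf")` with a URI taken from satellite_metadata.csv should save that file locally, assuming the bucket allows anonymous reads.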

Hello CSZC, I am using Google Colab for now, for initial experiments:

I am using cloudpathlib as follows:

cp = CloudPath("s3://drivendata-competition-airathon-public-as/pm25/train/maiac/2018/20180201T191000_maiac_la_0.hdf --no-sign-request")
cp.download_to("1.hdf")

but I am still getting the credentials error.
I have tried both with and without “--no-sign-request”.

This is the first link in satellite_metadata.csv.
Thank you

If it’s not working for you out of the box, I would recommend instantiating an s3 client directly:

from cloudpathlib import S3Path, S3Client

s3_cli = S3Client(no_sign_request=True)

From there you can create an s3 path to a specific file, as in your example, or to the whole directory. In either case, you want to pass the s3 client you instantiated. Here’s an example to the maiac directory:

maiac_path = S3Path("s3://drivendata-competition-airathon-public-as/pm25/train/maiac", client=s3_cli)
maiac_files = list(maiac_path.rglob("*.hdf"))
# get first file
fp = maiac_files[0]
# download file to temp directory
fp.fspath
# download to local
fp.download_to("1.hdf")

Personally, I had some issues accessing the files until I ran the AWS CLI (without arguments) and it asked me to set up some AWS things. I entered random digits and moved forward.

Hello, I’m reaching over 90 GB only for the train dataset… But in the instructions they mention something like 7 GB. Is that normal?

To which track/product are you referring?

I used this command line to download the directory on my local machine:

aws s3 cp s3://drivendata-competition-airathon-public-eu/no2/train/ train/ --no-sign-request --recursive

is that normal?

I think you’ve downloaded the entire directory - which includes both maiac and misr satellite data.
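To avoid pulling every product at once, one option is to filter the file list from the satellite metadata before downloading. The sketch below assumes the `<pollutant>/<split>/<product>/` path layout seen in the URIs posted in this thread. `select_product` is a hypothetical helper, not part of any library.

```python
from typing import Iterable, List


def select_product(uris: Iterable[str], pollutant: str, split: str, product: str) -> List[str]:
    """Keep only URIs under <pollutant>/<split>/<product>/ (layout is an assumption)."""
    marker = f"/{pollutant}/{split}/{product}/"
    return [uri for uri in uris if marker in uri]


# URIs taken from posts in this thread
example_uris = [
    "s3://drivendata-competition-airathon-public-as/pm25/train/maiac/2018/20180201T191000_maiac_la_0.hdf",
    "s3://drivendata-competition-airathon-public-eu/no2/train/tropomi/2019/20190101T213357_tropomi_la_0.nc",
]

# Only the NO2 training files from the tropomi product remain
tropomi_only = select_product(example_uris, "no2", "train", "tropomi")
```

The filtered list can then be downloaded file by file with whichever client works for you, rather than copying the whole train/ directory recursively.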

Gotcha, I figured it out. It’s just that this is my first competition, so I don’t really know where I should start, how many files to download, etc. 🙂

Is there a way to copy the competition data from the AWS S3 bucket to Azure Blob Storage? Please help me with this, because this is my first competition.

Hi, is anyone else struggling to connect to the S3 bucket endpoint? I get the following error:
fatal error: Could not connect to the endpoint URL: "https://drivendata-competition-airathon-public-eu-central-1.s3.London.amazonaws.com/no2/train/tropomi/2019/20190101T213357_tropomi_la_0.nc"

I’ve successfully pinged s3.amazonaws.com and have tried various region permutations. I’ve also tried pulling the data with a script, per cszc’s response above, but I get the same error.

Just wondering whether anyone else has had this problem and, if so, how you solved it.

Help… pulling my hair out 😂

Thanks

Hello,

I’ve had success with the following snippet of Python code:

# Import the needed libraries:
from cloudpathlib import S3Path, S3Client
# URL of the file I want to download, from the satellite metadata csv file:
s_url = 's3://drivendata-competition-airathon-public-eu/no2/train/tropomi/2019/20190101T213357_tropomi_la_0.nc'
# Where I want the downloaded file to go on my computer:
s_file_location = 'C:\\Users\\carlm\\Downloads'
# Access the file:
Remote_file = S3Path(s_url,client=S3Client(no_sign_request=True))
# Download it:
Remote_file.download_to(s_file_location)

Hope this helps.

Hi @JM1000 - is this working for you yet? Not sure where the url https://drivendata-competition-airathon-public-eu-central-1.s3.London.amazonaws.com/no2/train/tropomi/2019/20190101T213357_tropomi_la_0.nc is coming from - could you please post more of your code?