I am trying to download the data with boto3 and cloudpathlib in Python, but I am getting the error "Unable to locate credentials". I don't know why I am getting this error. Can someone tell me the name of the Amazon S3 bucket I should download from?
As mentioned in another post, the bucket names and the individual filepaths can be found in
satellite_metadata.csv and the data download file on the Data Download page.
Cloudpathlib should work, but to be explicit you can set the
no_sign_request boolean kwarg when initializing the s3 client. Hope that helps!
Hello CSZC, I am using Google Colab for now, for an initial experiment:
I am using cloudpathlib as follows:
cp = CloudPath("s3://drivendata-competition-airathon-public-as/pm25/train/maiac/2018/20180201T191000_maiac_la_0.hdf --no-sign-request")
but I am still getting the credentials error.
I have tried it both ways, i.e. with and without "--no-sign-request".
This is the first link in satellite_metadata.csv.
If it’s not working for you out of the box, I would recommend instantiating an s3 client directly:
from cloudpathlib import S3Path, S3Client

s3_cli = S3Client(no_sign_request=True)
From there you can create an s3 path to a specific file, as in your example, or to the whole directory. In either case, you want to pass the s3 client you instantiated. Here's an example for the maiac directory:
maiac_path = S3Path("s3://drivendata-competition-airathon-public-as/pm25/train/maiac", client=s3_cli)
maiac_files = list(maiac_path.rglob("*.hdf"))
# get first file
fp = maiac_files[0]
# download file to temp directory
fp.fspath
# download to local
fp.download_to("1.hdf")
Personally, I had some issues accessing the files until I ran the AWS CLI (without arguments) and it asked me to set up some AWS things - I entered random digits and moved forward.
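For reference, that one-time setup step is `aws configure`; a sketch of the prompts (the dummy values shown are placeholders of my choosing - for unsigned `--no-sign-request` downloads the actual values don't matter, they just stop the CLI from complaining about missing configuration):

```shell
aws configure
# AWS Access Key ID [None]: dummy
# AWS Secret Access Key [None]: dummy
# Default region name [None]: us-east-1
# Default output format [None]: json
```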
Hello, I'm reaching over 90 GB just for the train dataset... but the instructions mention something like 7 GB. Is that normal?
To which track/product are you referring?
I used this command line to download the directory on my local machine:
aws s3 cp s3://drivendata-competition-airathon-public-eu/no2/train/ train/ --no-sign-request --recursive
is that normal?
I think you’ve downloaded the entire directory - which includes both maiac and misr satellite data.
Gotcha, I figured it out. It's just that this is my first competition, so I don't really know where to start, how many files to download, etc.
Is there a way to copy the competition data in aws s3 bucket to an Azure Blob Storage? Please help me with this because this is my first competition.
Hi, is anyone else struggling to connect to the s3/bucket endpoint? I have the following error:
fatal error: Could not connect to the endpoint URL: "https://drivendata-competition-airathon-public-eu-central-1.s3.London.amazonaws.com/no2/train/tropomi/2019/20190101T213357_tropomi_la_0.nc"
I've successfully pinged s3.amazonaws.com and have tried various region permutations. I've also tried pulling data with a script per cszc's response above, but I get the same error.
Just wondering if anyone else has had this problem, and how you solved it.
Help... pulling my hair out.
I’ve had success with the following snippet of Python code:
# Import the needed libraries:
from cloudpathlib import S3Path, S3Client

# URL of the file I want to download, from the satellite metadata csv file:
s_url = 's3://drivendata-competition-airathon-public-eu/no2/train/tropomi/2019/20190101T213357_tropomi_la_0.nc'

# Where I want the downloaded file to go on my computer:
s_file_location = 'C:\\Users\\carlm\\Downloads'

# Access the file:
remote_file = S3Path(s_url, client=S3Client(no_sign_request=True))

# Download it:
remote_file.download_to(s_file_location)
Hope this helps.
Hi @JM1000 - is this working for you yet? I'm not sure where the URL
https://drivendata-competition-airathon-public-eu-central-1.s3.London.amazonaws.com/no2/train/tropomi/2019/20190101T213357_tropomi_la_0.nc is coming from - could you please post more of your code?