How to have access to the data

nick.burns · November 15, 2022, 7:02am

Ahhh, I understand! That worried me too, that it might error out during the download.

Could you perhaps write a script (like the Python one below) to loop over the files and download them individually? I just checked and this works nicely. If you ran a few scripts in parallel, it wouldn’t be too slow. Sure, it’ll take a while, but that’s usually one of the pain points of working with good-sized data.

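import os
import subprocess

import pandas as pd

BUCKET = "s3://drivendata-competition-biomassters-public-us/train_features"

metadata = pd.read_csv("features_metadata.csv")
for i, row in metadata.iterrows():
    # Skip files that are already present, so the script can be re-run after a failure.
    if not os.path.exists(row.filename):
        # --no-sign-request sends the request without AWS credentials.
        cmd = ["aws", "s3", "cp", f"{BUCKET}/{row.filename}", "./", "--no-sign-request"]
        subprocess.run(cmd)

    # Stop after the first few rows as a quick test; remove this check to download everything.
    if i > 5:
        break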
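
If you'd rather not launch several copies of the script by hand, a small thread pool inside one script gives a similar effect. This is only a rough sketch, not something I've run against the full dataset: the helper names (build_command, download_one, download_all), the max_workers default and the runner parameter are all made up for illustration, and it assumes the aws CLI is on your PATH.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Bucket prefix taken from the script above.
S3_PREFIX = "s3://drivendata-competition-biomassters-public-us/train_features"


def build_command(filename, dest="./"):
    """Return the `aws s3 cp` command for one file as an argument list."""
    return ["aws", "s3", "cp", f"{S3_PREFIX}/{filename}", dest, "--no-sign-request"]


def download_one(filename, dest="./", runner=subprocess.run):
    """Download one file unless it already exists; return (filename, status)."""
    if os.path.exists(os.path.join(dest, filename)):
        return filename, "skipped"
    # `runner` is a hypothetical hook so the command can be stubbed out in tests.
    result = runner(build_command(filename, dest))
    return filename, "ok" if result.returncode == 0 else "failed"


def download_all(filenames, dest="./", max_workers=4, runner=subprocess.run):
    """Download files concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda name: download_one(name, dest, runner), filenames))
```

With the metadata frame from the script above, download_all(metadata["filename"]) would return a list of (filename, status) pairs, so anything marked "failed" can simply be retried. Threads are enough here because each worker just waits on an external aws process.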
