How to access the data

Ahhh, I understand! I was worried about that too: a single big download could error out partway through.

Could you perhaps write a script (like the Python one below) that loops over the files and downloads them one at a time? I just tested this and it works nicely. If you run a few copies of the script in parallel, each on a different subset of the files, it shouldn't be too slow. Sure, it'll still take a while :frowning:, but that's one of the usual pain points of working with large datasets.

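import os
import subprocess

import pandas as pd

# Assumes every file listed in the CSV lives under train_features/
BUCKET = "s3://drivendata-competition-biomassters-public-us/train_features"
MAX_FILES = 7  # small limit for a quick test; set to None to download everything

metadata = pd.read_csv("features_metadata.csv")
for i, row in enumerate(metadata.itertuples(index=False)):
    if MAX_FILES is not None and i >= MAX_FILES:
        break

    # Skip files already on disk, so the script can simply be re-run after a failure
    if not os.path.exists(row.filename):
        cmd = ["aws", "s3", "cp", f"{BUCKET}/{row.filename}", "./", "--no-sign-request"]
        if subprocess.run(cmd).returncode != 0:
            print(f"Download failed, re-run to retry: {row.filename}")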
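Instead of launching several copies of the script by hand, you could also run the downloads in parallel from a single script with a thread pool. Here's a minimal sketch, assuming the AWS CLI is installed and on your PATH and that every file sits under the same `train_features/` prefix as in the loop above. `download_all`, `aws_download`, `max_workers` and `retries` are names I made up for illustration; they're not part of any DrivenData tooling.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List

# Assumed prefix, copied from the loop above; only covers train_features/
BUCKET = "s3://drivendata-competition-biomassters-public-us/train_features"


def aws_download(filename: str, dest_dir: str) -> bool:
    """Fetch one file with the AWS CLI; return True if the command succeeded."""
    result = subprocess.run(
        ["aws", "s3", "cp", f"{BUCKET}/{filename}", dest_dir, "--no-sign-request"],
        capture_output=True,
    )
    return result.returncode == 0


def download_all(
    filenames: Iterable[str],
    dest_dir: str = ".",
    download_one: Callable[[str, str], bool] = aws_download,
    max_workers: int = 4,
    retries: int = 2,
) -> List[str]:
    """Download files in parallel, skipping ones already in dest_dir.

    Each file is attempted up to 1 + retries times. Returns the sorted names
    that still failed, so you can re-run on just those.
    """
    # Only schedule files that aren't on disk yet
    pending = [f for f in filenames if not os.path.exists(os.path.join(dest_dir, f))]

    def fetch(name: str) -> bool:
        # Retry the same file a few times before giving up on it
        for _ in range(1 + retries):
            if download_one(name, dest_dir):
                return True
        return False

    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, name): name for name in pending}
        for fut in as_completed(futures):
            if not fut.result():
                failed.append(futures[fut])
    return sorted(failed)
```

Threads are enough here because each worker spends almost all its time waiting on an `aws` subprocess. Since files already on disk are skipped, if something errors out midway you can just run it again. Usage, reusing the `metadata` DataFrame from the loop above: `failed = download_all(metadata["filename"], max_workers=8)`.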