Ahhh, I understand! That worried me too, that it might error out during the download.
Could you perhaps write a script (like the Python one below) to loop over the files and download them individually? I just checked and this works nicely. If you ran a few scripts in parallel, it wouldn't be too slow. Sure, it'll take a while, but that's usually one of the pain points of working with good-sized data.
import os
import pandas as pd

metadata = pd.read_csv("features_metadata.csv")
for i, row in metadata.iterrows():
    # Skip files that are already on disk, so the script can be re-run safely
    if not os.path.exists(row.filename):
        cmd = f"aws s3 cp s3://drivendata-competition-biomassters-public-us/train_features/{row.filename} ./ --no-sign-request"
        os.system(cmd)
    if i > 5:  # demo limit: stop after the first few files; remove this to download everything
        break
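Instead of launching several copies of the script by hand, you could also parallelize the downloads inside one script with a thread pool. This is just a sketch along those lines; the `BUCKET` constant and the `download_all`/`download` helper names are my own, and it assumes the same `features_metadata.csv` and AWS CLI setup as above.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Assumed bucket path, copied from the command above
BUCKET = "s3://drivendata-competition-biomassters-public-us/train_features"

def download(filename):
    """Download one file via the AWS CLI, skipping files already on disk."""
    if not os.path.exists(filename):
        subprocess.run(
            ["aws", "s3", "cp", f"{BUCKET}/{filename}", "./", "--no-sign-request"],
            check=True,  # raise if the CLI call fails, instead of failing silently
        )
    return filename

def download_all(filenames, fetch=download, workers=8):
    """Fetch files concurrently; `fetch` is injectable so it can be stubbed in tests."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, filenames))
```

You would then call something like `download_all(metadata.filename)`. Threads are fine here because the work is I/O-bound; the `workers` count is a guess and worth tuning against your connection.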