I don't have enough space for the data

Hello, I am new to this community and I would like to start participating in competitions of this style. My problem is that most of the competitions require a lot of space to store the data. How do you handle it? In this competition, for example, the dataset has these sizes… How do you download all of this locally?
|                | # files | size     |
|----------------|---------|----------|
| train_features | 189078  | 215.9 GB |
| test_features  | 63348   | 73.0 GB  |
| train_agbm     | 8689    | 2.1 GB   |
Any comments would be helpful,

Thank you very much


Hi Robert,

I have had similar issues with the dataset size. The way I see it, you can either take subsets of the training data and train multiple times, or you can stream the data with a pipeline such as torchdata's DataPipes.
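For the subset approach, something like this minimal pure-Python sketch works: shuffle the list of file IDs once, split it into manageable chunks, then download/train/delete one chunk at a time. (`make_subsets` and the download/train helpers in the usage comment are hypothetical names, not part of the competition tooling.)

```python
import random

def make_subsets(file_ids, subset_size, seed=0):
    """Shuffle the file list once (reproducibly), then split it
    into roughly equal subsets you can train on one at a time."""
    rng = random.Random(seed)
    ids = list(file_ids)
    rng.shuffle(ids)
    return [ids[i:i + subset_size] for i in range(0, len(ids), subset_size)]

# Hypothetical usage: split the 189078 training chips into ~20k-file
# subsets, and cycle disk space between rounds:
#   for subset in make_subsets(all_train_ids, subset_size=20000):
#       download(subset)        # fetch just this chunk
#       train_one_round(subset) # fine-tune on it
#       delete(subset)          # free the disk before the next chunk
```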

I have been experimenting with DataPipes and it seems to work.
You'll want to check out `S3FileLoader`, since these files are stored in an AWS bucket.
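Here's a minimal sketch of that streaming pipeline. A few caveats: the S3 DataPipes require torchdata built with the AWSSDK extension (hence the import inside the function), and the bucket prefix in the usage comment is a placeholder, not the competition's actual bucket. Nothing is downloaded until you actually iterate over the pipe.

```python
def is_tif(url):
    """Keep only GeoTIFF keys from the bucket listing."""
    return url.endswith(".tif")

def stream_tifs(bucket_prefix):
    """Build a lazy torchdata pipeline that lists and streams files
    from an S3 prefix; yields (url, stream) pairs on iteration."""
    # Imported here because the S3 DataPipes need torchdata's
    # AWSSDK extension, which may not be installed everywhere.
    from torchdata.datapipes.iter import IterableWrapper, S3FileLister, S3FileLoader

    dp = IterableWrapper([bucket_prefix])
    dp = S3FileLister(dp)       # yields s3:// URLs under the prefix
    dp = dp.filter(is_tif)      # drop anything that isn't a chip
    return S3FileLoader(dp)     # opens each URL as a byte stream

# Hypothetical usage (placeholder bucket):
#   for url, stream in stream_tifs("s3://<competition-bucket>/train_features/"):
#       process(url, stream.read())
```

The point of doing it this way is that you only ever hold one file's bytes in memory, so the 200+ GB never has to fit on your disk.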


Thanks so much for your help! I will try everything you recommend. Thanks a lot!