Hi everyone,
Hope you are all doing well! I need a hardware suggestion for running ML algorithms on such a huge dataset. I have been trying tree-based algorithms with both 8GB and 12GB of RAM and still get a memory error every time I run the code.
Could anyone please advise on the hardware configuration required, or suggest a cloud platform where we can access higher-spec hardware?
I do not think hardware is an issue here.
GPU: GTX 1070
I could reach 0.91 on the leaderboard with the following time to result (from raw CSVs to submission):
- without GPU: 3 to 4 hours
- with GPU: 20 mins
I would recommend at least 16GB though, and use chunking and an iterator to avoid holding everything in memory at once.
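To make the chunking suggestion concrete, here is a minimal sketch using pandas' `chunksize` option, which turns `read_csv` into an iterator of DataFrames so only one chunk lives in memory at a time. The in-memory buffer below just stands in for your real CSV path:

```python
import io
import pandas as pd

# Stand-in for a large file on disk; replace with your actual CSV path.
csv_data = "a,b\n" + "\n".join(f"{i},{i * 2}" for i in range(10))
buffer = io.StringIO(csv_data)

# chunksize makes read_csv return an iterator of DataFrames,
# so only one chunk is held in memory at a time.
row_count = 0
col_sum = 0
for chunk in pd.read_csv(buffer, chunksize=3):
    row_count += len(chunk)          # process/aggregate each chunk here
    col_sum += chunk["b"].sum()

print(row_count, col_sum)  # 10 90
```

You can aggregate per chunk, downcast dtypes, or write intermediate results to disk, then combine at the end instead of loading the raw CSVs whole.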
"I could reach 0.91 on leaderboard with the following time to result … with GPU : 20 mins"
I would say those are impressive results. Good work!
Those are very good results. Do you have any suggestions for a cloud-based platform that provides 16GB or more of memory?
Google Colab is really good, and it's free. It only gives 12GB of RAM, but if you pay $10 a month you get access to higher-memory (24GB) VMs too. I've been using the free tier (0.8784 on the leaderboard atm) and found 12GB of RAM is enough. I did run into out-of-memory issues like yours. I'd focus on working out whether there are any hyperparameters you can change to reduce how much memory the algorithm requires; if not, then think about using a different algorithm.
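Besides hyperparameters, a cheap win before training is downcasting numeric columns; tree-based models generally don't need float64 precision, and going to float32 halves the memory of each numeric column. A toy sketch (the column names here are made up, not from any specific dataset):

```python
import numpy as np
import pandas as pd

# Toy frame; in practice this would be your training data after read_csv.
df = pd.DataFrame({
    "feature": np.arange(1000, dtype=np.float64),
    "label": np.random.randint(0, 2, size=1000),
})

before = df["feature"].memory_usage(deep=True)
# float64 -> float32 halves per-value storage for this column;
# tree splits are rarely sensitive to the lost precision.
df["feature"] = df["feature"].astype(np.float32)
after = df["feature"].memory_usage(deep=True)
print(before, after)
```

You can also pass a `dtype` mapping straight to `read_csv` so the data never gets loaded at full width in the first place.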