trainer.fit() Object Memory Error in Local Machine #63
Originally created by @satyamnyati on GitHub (Feb 2, 2024).
I run out of memory in trainer.fit(). I have 8 GB of RAM and an 8th-gen i7 with 12 cores, and the object memory error is raised inside trainer.fit(). Is there any way to reduce the load on RAM, or will I need more RAM to be able to run this?
@totovivi commented on GitHub (Feb 9, 2024):
These changes helped me:
But I am surprised that I had to make these changes, given my computer's specs.
@satyamnyati commented on GitHub (Feb 13, 2024):
Yes, I reduced the batch size and got it to run on Ubuntu. On Windows, however, I couldn't get it to run even with these changes; the path manipulations cause some problems.
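For anyone looking for a concrete starting point, here is a minimal sketch of where a smaller batch size takes effect inside the per-worker training loop. It assumes the Ray Train / Ray Data setup used in the course; the function body and the values are illustrative, not the course's actual code.

```python
import ray.train

def train_loop_per_worker(config):
    # Illustrative stand-in for the course's per-worker loop.
    train_shard = ray.train.get_dataset_shard("train")
    # Smaller batches mean fewer samples materialized in RAM at any one time.
    for batch in train_shard.iter_torch_batches(batch_size=config["batch_size"]):
        ...  # forward/backward pass would go here
```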
@capmichal commented on GitHub (Feb 26, 2024):
Same issue here on both of my laptops (a powerful Dell XPS 15 and a weaker HP from work). With num_workers=1 and batch_size=32 it ran just fine; training the model took about 30 minutes.
If I work on a single machine, does num_workers affect this OOM problem, or do I only need to adjust the batch_size variable? Would switching to 2 workers let me use a larger batch size, or can my computer (each core) simply not handle more than batch_size=32?
I would love an explanation of how num_workers, resources_per_worker, and batch_size relate to my RAM.
In addition: using the GPU on my Dell solves the issue and trains the model in a minute, but I am interested in using only the CPU. While the whole setup is running, my RAM usage is about 70%, which leaves roughly 4-5 GB of RAM for training. That is simply not enough, so will no configuration (without the GPU) let me train comfortably?
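As a rough mental model (hedged, with illustrative values rather than the course defaults): num_workers is the number of parallel training processes, and each one holds its own model copy plus its own batches; resources_per_worker only tells Ray how many CPUs to reserve per process, it does not cap RAM; batch_size controls how much data each process materializes per step. A minimal CPU-only sketch with Ray Train's TorchTrainer:

```python
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    ...  # stand-in for the course's per-worker loop; it would read config["batch_size"]

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    train_loop_config={"batch_size": 32},    # smaller batches -> less RAM per step
    scaling_config=ScalingConfig(
        num_workers=1,                       # one worker = one model copy in memory
        use_gpu=False,
        resources_per_worker={"CPU": 4},     # CPUs reserved per worker; not a RAM limit
    ),
)
result = trainer.fit()
```

On an 8 GB machine, one worker with a small batch size is usually the safest combination, since adding a second worker duplicates the model in memory before it buys any throughput.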
@Koowah commented on GitHub (May 4, 2024):
@capmichal @totovivi I had the same reactions and questions.
After a bit of research and chatting with GPT, here's what I gathered:
So I guess the main driver of memory use is how much data is passed per iteration (batch_size * num_workers). Using fewer workers should nevertheless be better, since each worker has overhead and must load its own copy of the model.
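To make that concrete, a rough back-of-envelope sketch (all numbers are illustrative assumptions, not measurements of the course's model):

```python
# Illustrative estimate only: real usage also includes gradients, optimizer state,
# activations, Ray object store overhead, and everything else running on the machine.
num_workers = 2
batch_size = 256
bytes_per_sample = 512 * 4          # e.g. 512 float32 features per tokenized sample
model_params = 70_000_000           # assume a small ~70M-parameter model, fp32 weights
model_bytes = model_params * 4

per_worker = model_bytes + batch_size * bytes_per_sample   # model copy + one batch
total = num_workers * per_worker                           # each worker duplicates the model
print(f"~{total / 1e9:.2f} GB minimum, before gradients and optimizer state")
```

Under these assumptions, halving num_workers removes a whole model copy, while halving batch_size only removes half a batch's worth of data, which matches the observation that fewer workers tends to help more.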
@satyamnyati commented on GitHub (Nov 16, 2024):
What worked for me was reducing all of the training parameters, such as the batch size.