dependent-tan
How to set the RAM used by the crawler
I've scoured the docs and used ChatGPT/Perplexity, and I cannot for the life of me work out how to set the RAM available to the crawler. I want to give it 20 GB; I have a 32 GB system.
quickest-silver•7mo ago
Hi, please refer to this link:
https://docs.apify.com/platform/limits
dependent-tanOP•7mo ago
I'm not using Apify. Is this not the place for Crawlee Python questions?
quickest-silver•7mo ago
Are you going to set the RAM in your local Python code?
In most cases, we discuss Crawlee as used in Apify Actors here.
Hey @Grespino
Use the memory_mbytes setting of Configuration: https://crawlee.dev/python/api/class/Configuration#memory_mbytes
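A minimal sketch of wiring memory_mbytes into a crawler, for anyone landing here with the same question. It assumes Crawlee for Python 0.5 or later, where Configuration lives in crawlee.configuration and accepts memory_mbytes as a keyword argument, crawler classes are importable from crawlee.crawlers, and crawlers accept a configuration= argument; check the Configuration API page linked above against your installed version. BeautifulSoupCrawler (which needs the crawlee[beautifulsoup] extra) is just an example, so swap in whichever crawler class you use. The helper names gb_to_mbytes and make_crawler are hypothetical. The value is in megabytes, so 20 GB is 20 * 1024 = 20480.

```python
# A minimal sketch, assuming Crawlee for Python >= 0.5 is installed and that
# Configuration accepts memory_mbytes and crawlers accept configuration=.
# gb_to_mbytes and make_crawler are hypothetical helper names.


def gb_to_mbytes(gigabytes: float) -> int:
    """Convert gigabytes to the megabyte value that memory_mbytes expects."""
    return int(gigabytes * 1024)


def make_crawler(memory_gb: float = 20):
    """Build a crawler whose autoscaling uses memory_gb as its memory budget."""
    # Imported lazily so gb_to_mbytes works even without Crawlee installed.
    from crawlee.configuration import Configuration
    from crawlee.crawlers import BeautifulSoupCrawler  # assumption: any crawler class works here

    config = Configuration(memory_mbytes=gb_to_mbytes(memory_gb))
    return BeautifulSoupCrawler(configuration=config)
```

Usage: call crawler = make_crawler(20), register your request handler, and run it as usual. If you would rather not change code, setting the environment variable CRAWLEE_MEMORY_MBYTES=20480 before starting the script should have a similar effect, assuming your version reads that variable.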
dependent-tanOP•7mo ago
Hi, so I've found that, but I'm clearly not a good developer, because I haven't figured out what I actually need to write in the code to use it.
Doesn’t work for me.
Is it not working when used in this way?
dependent-tanOP•7mo ago
That's the correction I was probably looking for. I've just tried it on my MacBook, and it starts crawling without errors. When I get home, I'll try it on my actual computer and let you know how it goes.
dependent-tanOP•7mo ago
OK, so it works in the script, but I'm not quite sure it's working properly. It doesn't seem to be going any faster, although that could be because I'm only crawling one domain at the moment rather than many in parallel?
Here's the autoscaling output from the console:
[crawlee._autoscaling.autoscaled_pool] INFO current_concurrency = 25; desired_concurrency = 200; cpu = 0.0; mem = 0.0; event_loop = 0.295; client_info = 0.0
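To make that status line easier to read, a small stdlib-only sketch that splits it into named values. It assumes the "key = value" pairs are separated by semicolons exactly as in the line above, and the function name parse_autoscaling_status is hypothetical. Treating cpu, mem, event_loop and client_info as overload ratios (0.0 meaning that resource was not flagged as overloaded in recent samples) is an assumption about Crawlee's internals; on that reading, mem = 0.0 suggests the memory limit is not what keeps current_concurrency at 25.

```python
def parse_autoscaling_status(line: str) -> dict[str, float]:
    """Parse 'key = value; key = value' pairs from an autoscaled_pool log line."""
    # Drop the logger prefix ("[...] INFO ") if present; otherwise parse the whole line.
    head, sep, payload = line.partition(" INFO ")
    if not sep:
        payload = head
    status: dict[str, float] = {}
    for part in payload.split(";"):
        key, eq, value = part.partition("=")
        if eq:  # skip fragments without a key = value pair
            status[key.strip()] = float(value.strip())
    return status


line = (
    "[crawlee._autoscaling.autoscaled_pool] INFO current_concurrency = 25; "
    "desired_concurrency = 200; cpu = 0.0; mem = 0.0; event_loop = 0.295; "
    "client_info = 0.0"
)
status = parse_autoscaling_status(line)
# Assumption: these four values are overload ratios, so anything above 0
# marks a resource that was reported as busy.
busy = [name for name in ("cpu", "mem", "event_loop", "client_info") if status[name] > 0]
print(status["current_concurrency"], status["desired_concurrency"], busy)
# prints: 25.0 200.0 ['event_loop']
```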
A PR related to parallelism is waiting to be merged: https://github.com/apify/crawlee-python/pull/780
Your issue may be the problem it solves.