Crawlee & Apify7mo ago
dependent-tan

How to set RAM used by the crawler

I've scoured the docs and used ChatGPT/Perplexity, but for the life of me I cannot work out how to set the RAM available to the crawler. I want to give it 20 GB; I have a 32 GB system.
11 Replies
Hall
Hall7mo ago
Someone will reply to you shortly. In the meantime, this might help:
quickest-silver
quickest-silver7mo ago
Hi, please refer to this link: https://docs.apify.com/platform/limits
Limits | Platform | Apify Documentation
Learn the Apify platform's resource capability and limitations such as max memory, disk size and number of Actors and tasks per user.
dependent-tan
dependent-tanOP7mo ago
I'm not using Apify. Is this not the place for Crawlee Python questions?
quickest-silver
quickest-silver7mo ago
Are you going to set the RAM in your local Python code? In most cases, we discuss Apify Actors and Crawlee here.
Mantisus
Mantisus7mo ago
Configuration | API | Crawlee for Python · Fast, reliable Python we...
Crawlee helps you build and maintain your Python crawlers. It's open source and modern, with type hints for Python to help you catch bugs early.
dependent-tan
dependent-tanOP7mo ago
Hi, so I've found that, but clearly I'm not a good developer, because I haven't figured out what I actually need to write in the code to use it.
from crawlee import Configuration
Doesn't work for me.
Mantisus
Mantisus7mo ago
Is it not working when used in this way?
from crawlee.http_crawler import HttpCrawler, HttpCrawlingContext
from crawlee.configuration import Configuration


async def main() -> None:
    crawler = HttpCrawler(
        configuration=Configuration(memory_mbytes=20480)
    )
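For reference, Crawlee's Configuration is also documented as reading values from environment variables, so the same limit can likely be set without touching the constructor. A minimal sketch, assuming the `CRAWLEE_MEMORY_MBYTES` variable maps to `memory_mbytes` (check the Configuration docs linked above to confirm):

```python
import os

# Assumption: Crawlee reads configuration fields from environment variables
# with a CRAWLEE_ prefix, so this should have the same effect as passing
# Configuration(memory_mbytes=20480) explicitly. Set it before the crawler starts.
os.environ["CRAWLEE_MEMORY_MBYTES"] = str(20 * 1024)  # 20 GiB expressed in MiB
```

This can be handy when you want to change the limit per machine without editing the script.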
dependent-tan
dependent-tanOP7mo ago
from crawlee.configuration import Configuration
That's the correction I was probably looking for. I've just tried it on my MacBook and it starts crawling without errors. When I get home I'll try it on my actual computer and let you know how it goes.
dependent-tan
dependent-tanOP7mo ago
OK, so it works in the script, but I'm not quite sure if it's working properly. It doesn't seem to be going any faster, although that could be because I'm only trying one domain at the moment rather than many in parallel? Here's the autoscaling output from the console: [crawlee._autoscaling.autoscaled_pool] INFO current_concurrency = 25; desired_concurrency = 200; cpu = 0.0; mem = 0.0; event_loop = 0.295; client_info = 0.0
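One way to read that log line: the fields after INFO are simple `key = value` pairs, so they can be pulled apart programmatically. A small stdlib-only sketch (the line is copied verbatim from the message above; the interpretation of `mem = 0.0` as "no memory pressure yet" is my reading, not an official statement):

```python
# Parse the autoscaled_pool status line into a dict of floats.
line = ("current_concurrency = 25; desired_concurrency = 200; "
        "cpu = 0.0; mem = 0.0; event_loop = 0.295; client_info = 0.0")
stats = {key.strip(): float(value)
         for key, value in (part.split("=") for part in line.split(";"))}

# mem = 0.0 suggests the pool currently sees no memory pressure, so the
# 20 GB limit is not what's throttling the crawl; concurrency is simply
# still ramping up (25 of a desired 200).
print(stats["current_concurrency"])  # 25.0
```

If `mem` stays near 0 while concurrency climbs, the bottleneck is more likely elsewhere (e.g. a single slow domain) than the memory limit.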
Mantisus
Mantisus7mo ago
There's a PR related to parallelism that the maintainers expect to merge: https://github.com/apify/crawlee-python/pull/780. The problem it solves may be related to what you're seeing.
