Crawlee & Apify7mo ago
dependent-tan

How to set RAM used by the crawler

I've scoured the docs and used ChatGPT/Perplexity, but for the life of me I cannot work out how to set the RAM available to the crawler. I want to give it 20 GB; I have a 32 GB system.
11 Replies
Hall
Hall7mo ago
Someone will reply to you shortly. In the meantime, this might help:
quickest-silver
quickest-silver7mo ago
Hi, please refer to this link: https://docs.apify.com/platform/limits
Limits | Platform | Apify Documentation
Learn the Apify platform's resource capability and limitations such as max memory, disk size and number of Actors and tasks per user.
dependent-tan
dependent-tanOP7mo ago
I'm not using Apify. Is this not the place for Crawlee Python questions?
quickest-silver
quickest-silver7mo ago
Are you going to set the RAM in your local Python code? In most cases, we discuss Apify Actors and Crawlee here.
Mantisus
Mantisus7mo ago
Configuration | API | Crawlee for Python · Fast, reliable Python we...
Crawlee helps you build and maintain your Python crawlers. It's open source and modern, with type hints for Python to help you catch bugs early.
dependent-tan
dependent-tanOP7mo ago
Hi, so I've found that, but clearly I'm not a good developer, because I haven't figured out what I actually need to write in the code to use it.
from crawlee import Configuration
Doesn't work for me.
Mantisus
Mantisus7mo ago
Is it not working when used in this way?
from crawlee.http_crawler import HttpCrawler, HttpCrawlingContext
from crawlee.configuration import Configuration


async def main() -> None:
    crawler = HttpCrawler(
        configuration=Configuration(memory_mbytes=20480)
    )
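For reference, Crawlee's Configuration is also documented as reading values from environment variables, so the same limit can likely be set without touching the constructor. A minimal sketch, assuming the `CRAWLEE_MEMORY_MBYTES` variable maps to `memory_mbytes` (check the Configuration docs linked above to confirm):

```python
import os

# Assumption: Crawlee reads configuration fields from environment variables
# with a CRAWLEE_ prefix, so this should have the same effect as passing
# Configuration(memory_mbytes=20480) explicitly. Set it before the crawler starts.
os.environ["CRAWLEE_MEMORY_MBYTES"] = str(20 * 1024)  # 20 GiB expressed in MiB
```

This can be handy when you want to change the limit per machine without editing the script.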
dependent-tan
dependent-tanOP7mo ago
from crawlee.configuration import Configuration
That's the correction I was probably looking for. I've just tried it on my MacBook and it starts crawling without errors. When I get home I'll try it on my actual computer and let you know how it goes.
dependent-tan
dependent-tanOP7mo ago
OK, so it works in the script, but I'm not quite sure if it's working properly. It doesn't seem to be going any faster, although that could be because I'm only trying one domain at the moment rather than many in parallel? Here's the autoscaling output from the console: [crawlee._autoscaling.autoscaled_pool] INFO current_concurrency = 25; desired_concurrency = 200; cpu = 0.0; mem = 0.0; event_loop = 0.295; client_info = 0.0
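One way to read that log line: the fields after INFO are simple `key = value` pairs, so they can be pulled apart programmatically. A small stdlib-only sketch (the line is copied verbatim from the message above; the interpretation of `mem = 0.0` as "no memory pressure yet" is my reading, not an official statement):

```python
# Parse the autoscaled_pool status line into a dict of floats.
line = ("current_concurrency = 25; desired_concurrency = 200; "
        "cpu = 0.0; mem = 0.0; event_loop = 0.295; client_info = 0.0")
stats = {key.strip(): float(value)
         for key, value in (part.split("=") for part in line.split(";"))}

# mem = 0.0 suggests the pool currently sees no memory pressure, so the
# 20 GB limit is not what's throttling the crawl; concurrency is simply
# still ramping up (25 of a desired 200).
print(stats["current_concurrency"])  # 25.0
```

If `mem` stays near 0 while concurrency climbs, the bottleneck is more likely elsewhere (e.g. a single slow domain) than the memory limit.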
Mantisus
Mantisus7mo ago
There's a PR related to parallelism that the maintainers expect to merge: https://github.com/apify/crawlee-python/pull/780. The problem it solves may be related to what you're seeing.
