Max Depth option
Hello! Just wondering whether it is possible to set max depth for the crawl?
Previous posts (2023) seems to make use of 'userData' to track the depth.
Thank you.
4 Replies
Someone will reply to you shortly. In the meantime, this might help:
inland-turquoise•4mo ago
Hi, this can be set. You can add a maxdepth parameter in the INPUT_SCHEMA.json file, then retrieve it from the code and implement the specified depth functionality.
fascinating-indigo•4mo ago
@mjh Hey, when using python, you can use
max_crawl_depth
(https://crawlee.dev/python/api/class/BasicCrawlerOptions#max_crawl_depth) .
Otherwise, you can track the current depth using request.userData
. When adding new links in the request handler, just increment the depth by one and use that for each newly enqueued request (https://crawlee.dev/api/core/class/Request)Request | API | Crawlee · Build reliable crawlers. Fast.
Crawlee helps you build and maintain your crawlers. It's open source, but built by developers who scrape millions of pages every day for a living.
BasicCrawlerOptions | API | Crawlee for Python · Fast, reliable Pyt...
Crawlee helps you build and maintain your Python crawlers. It's open source and modern, with type hints for Python to help you catch bugs early.
fascinating-indigoOP•4mo ago
Got this. thank you