Handling of 4xx and 5xx in default handler (Python)
I built a crawler for crawling websites and now I'm trying to add functionality to also handle error pages/links (4xx and 5xx). I wasn't able to find any documentation about that.
So, the question is: is this supported, and if yes, what direction should I look in?
genetic-orange•2mo ago
Hey @rast42
Standard Crawlee has its own behavior for handling error status codes:
5xx - the request is retried
401, 403, 429 - trigger session rotation (if sessions are used)
other 4xx - the request is marked as failed without retrying
If you want to handle any of these statuses yourself, you can use ignore_http_error_status_codes (see the sketch below).
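A minimal sketch of how that parameter is used, assuming an HttpCrawler (the import path can differ between Crawlee versions, and ignoring only 404 here is just an illustration):

```python
import asyncio

from crawlee.crawlers import HttpCrawler, HttpCrawlingContext

# Status codes listed here are NOT treated as errors, so those
# responses reach the default handler like any other page.
crawler = HttpCrawler(ignore_http_error_status_codes=[404])

@crawler.router.default_handler
async def default_handler(context: HttpCrawlingContext) -> None:
    # A 404 response now arrives here instead of being marked as failed.
    context.log.info(
        f'{context.http_response.status_code} -> {context.request.url}'
    )

asyncio.run(crawler.run(['https://example.com/missing-page']))
```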
adverse-sapphire•2mo ago
Do I need to include all the codes in this setting, or can I set it to ignore all codes?
genetic-orange•2mo ago
You need to include them all. Something like the sketch below.
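(The original snippet wasn't preserved here, so this is a guess at what it showed: passing the whole 4xx/5xx range explicitly.)

```python
from crawlee.crawlers import HttpCrawler

# Ignore every 4xx and 5xx status, so no response is treated as an error.
crawler = HttpCrawler(
    ignore_http_error_status_codes=set(range(400, 600)),
)
```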
adverse-sapphire•2mo ago
Crazy. Is there no better solution for overriding the error handling?
genetic-orange•2mo ago
Could you give examples of the kind of behavior you want to achieve?
Perhaps error_handler is better for your case:
https://crawlee.dev/python/api/class/BasicCrawler#error_handler
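A sketch of registering one, per the linked BasicCrawler docs (the decorator and the (context, error) signature come from there; the context type annotation and the handler body are just illustrations):

```python
from crawlee.crawlers import HttpCrawler, HttpCrawlingContext

crawler = HttpCrawler()

# Invoked when a request handler raises, before the request is retried,
# so you can inspect or log the failure yourself.
@crawler.error_handler
async def error_handler(context: HttpCrawlingContext, error: Exception) -> None:
    context.log.warning(f'Error while handling {context.request.url}: {error}')
```

There is also failed_request_handler on the same class, which fires only once all retries are exhausted.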