Handling of 4xx and 5xx in default handler (Python)

I built a crawler for crawling websites and I'm now trying to add functionality to also handle error pages/links (4xx and 5xx responses). I was not able to find any documentation about this. So the question is: is it supported, and if yes, in what direction should I look?
Hall · 2mo ago
This post was marked as solved by rast42.
genetic-orange · 2mo ago
Hey @rast42 Standard Crawlee has its own default behavior for HTTP error statuses:
- 5xx - the request is retried
- 403, 429, 401 - the session is rotated (if sessions are used)
- other 4xx - the request is marked as failed without retries

If you want to handle any of these statuses yourself, you can use ignore_http_error_status_codes (see the sketch below).
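A minimal sketch of that approach, assuming a recent crawlee version where the crawlers are exported from crawlee.crawlers (older releases use module paths like crawlee.http_crawler); the start URL is a placeholder:

```python
import asyncio

from crawlee.crawlers import HttpCrawler, HttpCrawlingContext


async def main() -> None:
    # Tell the crawler not to treat any 4xx/5xx status as an error,
    # so those responses reach the default handler like normal pages.
    crawler = HttpCrawler(
        ignore_http_error_status_codes=list(range(400, 600)),
    )

    @crawler.router.default_handler
    async def handler(context: HttpCrawlingContext) -> None:
        status = context.http_response.status_code
        if status >= 400:
            context.log.info(f'Got error page {status}: {context.request.url}')
            return
        # ...normal handling for successful responses...

    await crawler.run(['https://example.com/'])  # placeholder URL


if __name__ == '__main__':
    asyncio.run(main())
```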
adverse-sapphire · 2mo ago
Do we need to include all the codes in this setting, or can we set it to ignore all codes?
genetic-orange · 2mo ago
You need to include them all. Something like:
list(range(400, 600))
adverse-sapphire · 2mo ago
Crazy. Is there no better way to override the error handling?
genetic-orange · 2mo ago
Could you give examples of the kind of behavior you want to achieve? Perhaps error_handler is a better fit for your case: https://crawlee.dev/python/api/class/BasicCrawler#error_handler
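For reference, a rough sketch of registering error_handler (and the related failed_request_handler); import paths vary between crawlee versions, and the handler bodies here are only illustrative logging:

```python
import asyncio

from crawlee.crawlers import BasicCrawlingContext, HttpCrawler, HttpCrawlingContext


async def main() -> None:
    crawler = HttpCrawler()

    @crawler.router.default_handler
    async def handler(context: HttpCrawlingContext) -> None:
        context.log.info(f'Processing {context.request.url}')

    # Runs whenever the request handler raises, before the request is retried.
    @crawler.error_handler
    async def on_error(context: BasicCrawlingContext, error: Exception) -> None:
        context.log.warning(f'{context.request.url} failed once: {error}')

    # Runs after a request has exhausted all of its retries.
    @crawler.failed_request_handler
    async def on_failed(context: BasicCrawlingContext, error: Exception) -> None:
        context.log.error(f'Giving up on {context.request.url}: {error}')

    await crawler.run(['https://example.com/'])  # placeholder URL


if __name__ == '__main__':
    asyncio.run(main())
```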
