stormy-gold
How to retry when hit with 429
When using crawlee-js it works fine, but when using Python a 429 is not getting retried. Is there anything I am missing?
I am using BeautifulSoupCrawler. Please help.
12 Replies
afraid-scarlet•7mo ago
Is it still an issue? Could you please provide a short code reproduction so we can check it?
stormy-goldOP•7mo ago
Hi, sorry for the delayed reply. Yes, when we get a 429, a 403, or anything else in the 400 range, it's not retrying.
Hi, could you please show a code sample? I'm wondering how you configure the crawler (max_request_retries and max_session_rotations), and whether you additionally handle the error cases somehow.
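For reference, a minimal sketch of the configuration being asked about; the values are placeholders and the import path may differ between crawlee versions:

```python
from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler  # newer releases expose this under crawlee.crawlers

crawler = BeautifulSoupCrawler(
    # How many times a failed request is retried before it is marked as failed.
    max_request_retries=5,
    # How many times the session is rotated when a request looks blocked.
    max_session_rotations=10,
)
```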
Is it possible that when you get a 429 response, a re-request is executed, but it happens too fast, so all the re-requests get a 429 status too?
A 403 response signals that you have been blocked. I don't think a plain re-request should be performed in that case; a session change is more appropriate.
A 400 usually signals that the request itself is invalid; I don't think such requests should be repeated.
In general, if you are running into 429s, it seems to me you should adjust ConcurrencySettings to make the scraping less aggressive.
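As an illustration of that suggestion, a hedged sketch of less aggressive ConcurrencySettings; the numbers are arbitrary placeholders:

```python
from crawlee import ConcurrencySettings
from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler

crawler = BeautifulSoupCrawler(
    concurrency_settings=ConcurrencySettings(
        # Keep the number of parallel requests low so the target is not flooded.
        max_concurrency=2,
        # Cap the overall request rate; repeated 429s usually mean this is too high.
        max_tasks_per_minute=30,
    ),
)
```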
Also, what HTTP client are you using?
afraid-scarlet•7mo ago
@Shine
Yes, please share some code to reproduce the issue, including the configuration of your scraper.
Also, provide logs or proof showing that your requests are not being retried in case of a 429 response.
Without this information, it’s difficult to assist, as your case seems quite unusual. By default, such requests should be retried automatically.
stormy-goldOP•7mo ago
Hi, the code is below.
stormy-goldOP•7mo ago
Log
stormy-goldOP•7mo ago
If there is a 403 error and we try again, then it is accessible, so I want to retry for this status code too.
Yes, it looks like retries are not invoked for status codes in the 400-499 range:
https://github.com/apify/crawlee-python/blob/master/src/crawlee/basic_crawler/_basic_crawler.py#L653
I don't think it's supposed to work that way
stormy-goldOP•7mo ago
For now, what I did is handle it in the request handler: the error page has distinctive elements, so the request fails there and the retry works from that point.
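A rough sketch of that workaround, assuming the blocked or rate-limited page contains a recognizable marker element (the selector and the error message here are hypothetical):

```python
from crawlee.beautifulsoup_crawler import BeautifulSoupCrawler, BeautifulSoupCrawlingContext

crawler = BeautifulSoupCrawler(max_request_retries=5)

@crawler.router.default_handler
async def handler(context: BeautifulSoupCrawlingContext) -> None:
    # Hypothetical marker for a blocked / rate-limited page; adjust to the real site.
    if context.soup.select_one('.error-message') is not None:
        # Raising makes the crawler treat the request as failed, so it is retried
        # up to max_request_retries times.
        raise RuntimeError('Blocked page detected, forcing a retry')
    # ... normal extraction continues here ...
```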
I created an Issue on this problem - https://github.com/apify/crawlee-python/issues/756
I'll post here when it's resolved.
Once v0.5.0 is released, you will be able to invoke retries for 403 or 429 with additional_http_error_status_codes.
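Once that release is out, usage would presumably look something like this (a sketch, not verified against the final API; the import path may differ by version):

```python
from crawlee.crawlers import BeautifulSoupCrawler

crawler = BeautifulSoupCrawler(
    max_request_retries=5,
    # Treat 403 and 429 as retryable HTTP errors instead of giving up on them.
    additional_http_error_status_codes=[403, 429],
)
```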
See PR https://github.com/apify/crawlee-python/pull/812.
stormy-goldOP•6mo ago
thank you for the update