CheerioCrawler Timeout after 320 Seconds Error/Exception
In some of our CheerioCrawler actors, we keep getting seemingly random timeout errors after 320 seconds that cause the runs to crash. This is an example of the error:
This is an example of another error message that occurs:
I have tried wrapping the code in the failedRequestHandler in a try/catch block, but that doesn't provide any additional information.
I manually resurrected two recent jobs that failed, and their ids are and . Any insight would be greatly appreciated, as this is impacting production. Thank you!
7 Replies
other-emerald•2y ago
Firstly - I see you are sending the request twice. CheerioCrawler already uses got-scraping under the hood, and you have access to the response body directly in the requestHandler. Then you send the request again with
response = await gotScraping({ url: request.url, proxyUrl: proxyUrl});
- this line is not needed. Secondly, I would change the Node version in the Dockerfile to 18.
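To illustrate the first point, here is a minimal sketch of a handler that reads the response the crawler has already fetched, with no second request (the URL and selector are placeholders, not from the original actor):

```javascript
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, body }) {
        // `$` is the parsed Cheerio object and `body` is the raw response
        // that got-scraping already fetched for this request, so calling
        // gotScraping({ url: request.url, ... }) again here is redundant.
        const title = $('title').text();
        console.log(`${request.url}: ${title}`);
    },
});

await crawler.run(['https://example.com']);
```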
But generally speaking - I would start by changing the two things I pointed out above. I would also recommend removing unused dependencies from package.json.
dependent-tanOP•2y ago
Thank you so much for your assistance. I made the changes, and they appear to have resolved the issue. I will try making the changes to our other actors, too. Thanks again!
other-emerald•2y ago
Happy to help 👍
dependent-tanOP•2y ago
Hi, @Andrey Bykov! I thought I had resolved this issue by making the changes you recommended. However, another of our actors is experiencing similar behavior. This is the run ID: . Any other suggestions of what I could try?
other-emerald•2y ago
At the beginning of the run I only see proxy errors (which probably means a bad proxy or some network issue). At the end it's the 30-second timeout, which in theory could also be a network issue (if the server takes too long to respond).
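Both symptoms can be addressed in the crawler options. A sketch, assuming the Apify SDK v3 / Crawlee APIs; the proxy group name, timeout, and retry values here are illustrative choices, not taken from the original actor:

```javascript
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

await Actor.init();

// Route requests through residential proxies, which tend to avoid the
// proxy errors seen with datacenter IPs on some targets.
const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    // Give slow servers more headroom than the default 30-second
    // handler timeout, and retry transient network failures.
    requestHandlerTimeoutSecs: 60,
    maxRequestRetries: 5,
    async requestHandler({ request, $ }) {
        console.log(`${request.url}: ${$('title').text()}`);
    },
});

await crawler.run(['https://example.com']);
await Actor.exit();
```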
dependent-tanOP•2y ago
Thank you for this suggestion. I have tried specifying residential proxies, and that seems to be working.
other-emerald•2y ago
Perfect, happy to help!