Canada411 site failing after 4 hours
I am using a CheerioCrawler actor to process input files of 500,000 records against this dynamically populated URL: https://www.canada411.ca/search/?stype=re&what=
The actor has been mysteriously failing after 4 to 4.5 hours, which we have not observed before. I have included below the log from the end of the failed run (#KcMSz5QQp8qIQnbYF).
Any insight on this error message would be greatly appreciated. Thank you!
exotic-emerald•3y ago
The error seems to be due to a timeout.
Maybe you have been banned by the site.
Are you using proxies?
metropolitan-bronzeOP•3y ago
I am using proxies with these statements:
continuing-cyan•3y ago
Without any settings, the proxy configuration only uses the datacenter proxies available to your account. So either the target site has started blocking datacenter IPs, or it enforces a per-IP limit and you have reached it on all of your available IPs.
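If residential proxies are available on the account, they have to be requested explicitly. A minimal sketch, assuming the Apify SDK v3 (the apify package): the object below is what would be passed to Actor.createProxyConfiguration(). The local interface is a hypothetical stand-in that models only the two fields used here, and the actor wiring is left as comments because it needs the apify and crawlee packages.

```typescript
// Hypothetical local type mirroring two fields of the options object accepted by
// Actor.createProxyConfiguration() in the Apify SDK; the real type has more fields.
interface ProxyConfigurationOptions {
  groups?: string[];    // proxy groups to use, e.g. "RESIDENTIAL"
  countryCode?: string; // two-letter country code for the proxy exit IPs
}

// Ask for Canadian residential IPs instead of the default datacenter pool.
// Assumes residential proxies are enabled on the account.
const proxyOptions: ProxyConfigurationOptions = {
  groups: ["RESIDENTIAL"],
  countryCode: "CA",
};

// Inside the actor (needs the "apify" and "crawlee" packages, not run here):
// const proxyConfiguration = await Actor.createProxyConfiguration(proxyOptions);
// const crawler = new CheerioCrawler({ proxyConfiguration, requestHandler });

console.log(JSON.stringify(proxyOptions));
```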
metropolitan-bronzeOP•3y ago
Thank you for that insight.
correct-apricot•3y ago
The reason your actor crashed is that you have an error in failedRequestHandler:
https://crawlee.dev/api/cheerio-crawler/interface/CheerioCrawlerOptions#failedRequestHandler
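One way to guarantee that a failedRequestHandler can never take the run down is to wrap it once, where it is passed to the crawler. A minimal TypeScript sketch; neverThrow, FailedContext and FailedHandler are hypothetical names, and FailedContext models only the two request fields used here (the real Crawlee context carries many more).

```typescript
// Hypothetical minimal shape of the context Crawlee passes to failedRequestHandler;
// the real context object carries many more fields.
interface FailedContext {
  request: { url: string; retryCount: number };
}

type FailedHandler = (ctx: FailedContext, error: Error) => Promise<void>;

// Wraps a failedRequestHandler so that anything it throws (or any promise it
// rejects) is caught and reported instead of propagating into the crawler.
function neverThrow(
  handler: FailedHandler,
  report: (msg: string) => void = (msg) => console.error(msg),
): FailedHandler {
  return async (ctx, error) => {
    try {
      await handler(ctx, error);
    } catch (handlerError) {
      const msg = handlerError instanceof Error ? handlerError.message : String(handlerError);
      report(`failedRequestHandler threw for ${ctx.request.url}: ${msg}`);
    }
  };
}

// Usage in the actor (needs the "crawlee" package, not run here):
// const crawler = new CheerioCrawler({
//   requestHandler,
//   failedRequestHandler: neverThrow(async ({ request }, error) => {
//     // code that may fail, e.g. saving the failed URL somewhere
//   }),
// });
```

Wrapping at registration means any code added to the handler later is covered as well.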
The code there must not throw, so either remove the code that can fail or wrap it in try/catch.
metropolitan-bronzeOP•3y ago
Thank you, Lukas!
metropolitan-bronzeOP•3y ago
Hi! I know this is an old thread, but we are continuing to experience this issue. As @Lukas Krivka suggested, I wrapped the code within the failedRequestHandler in a try/catch block, as in the following, but the catch block wasn't reached. Any suggestions would be greatly appreciated.
Hello @danhelfman
The code looks good. Are we still talking about the error:
?
metropolitan-bronzeOP•3y ago
Yes, that is the error we are still experiencing.
@danhelfman could you send me the ID of the run in a PM?
metropolitan-bronzeOP•3y ago
6cQvh6Q8MfvM37VmA
I am not even able to reach that page, not even with CA residential proxies.
There has been discussion over the last two days at https://updownradar.com/status/canada411.ca#comments that the website is not working... :\
correct-apricot•3y ago
Hmm, it seems the actual error was swallowed by the log limit. If everything is fully wrapped in try/catch, then it might have crashed inside Crawlee itself; I will check the code. But this will be hard to reproduce, since your run is going crazy fast.
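Since the log limit can cut off the line that explains why a run ended, one option is to keep a compact error summary in memory and persist it outside the log. A minimal sketch; ErrorTally is a hypothetical helper, and the commented-out Actor.setValue call (Apify SDK) is an assumption about where the summary would be stored, namely the run's default key-value store.

```typescript
type Group = { count: number; urls: string[] };

// Hypothetical helper: groups errors by the first line of their message and
// keeps a few example URLs per group, so a long run yields a short summary
// instead of thousands of log lines.
class ErrorTally {
  private readonly groups = new Map<string, Group>();

  constructor(private readonly maxUrlsPerError = 3) {}

  record(url: string, error: unknown): void {
    const raw = error instanceof Error ? error.message : String(error);
    const key = raw.split("\n")[0];
    const entry: Group = this.groups.get(key) ?? { count: 0, urls: [] };
    entry.count += 1;
    if (entry.urls.length < this.maxUrlsPerError) entry.urls.push(url);
    this.groups.set(key, entry);
  }

  // Returns the groups sorted by count, most frequent first.
  summary(): { message: string; count: number; urls: string[] }[] {
    return Array.from(this.groups.entries())
      .map(([message, g]) => ({ message, count: g.count, urls: g.urls }))
      .sort((a, b) => b.count - a.count);
  }
}

// Usage in the actor (needs the "apify" package, not run here):
// const tally = new ErrorTally();
// In failedRequestHandler:  tally.record(request.url, error);
// Periodically, so the summary survives a crash:
//   await Actor.setValue("ERROR_SUMMARY", tally.summary());
```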