Canada411 site failing after 4 hours

I am using a CheerioCrawler actor to process input files of 500,000 records against this dynamically populated URL: https://www.canada411.ca/search/?stype=re&what= The actor has been mysteriously failing after 4 to 4.5 hours, and we have not observed this behavior before. I have included below the log from toward the end of the failed run (#KcMSz5QQp8qIQnbYF). Any insight into this error message would be greatly appreciated. Thank you!
2022-11-09T21:39:19.012Z ERROR CheerioCrawler: An exception occurred during handling of failed request. This places the crawler and its underlying storages into an unknown state and crawling will be terminated. This may have happened due to an internal error of Apify's API or due to a misconfigured crawler.
2022-11-09T21:39:19.015Z Error: Handling request failure of https://www.canada411.ca/search/?stype=re&what=4505855104 (undefined) timed out after 320 seconds.
2022-11-09T21:39:19.017Z at Timeout._onTimeout (/usr/src/app/node_modules/@apify/timeout/index.js:62:68)
2022-11-09T21:39:19.018Z at listOnTimeout (node:internal/timers:559:17)
2022-11-09T21:39:19.020Z at processTimers (node:internal/timers:502:7)
2022-11-09T21:39:19.022Z ERROR CheerioCrawler:AutoscaledPool: runTaskFunction failed.
2022-11-09T21:39:19.023Z Error: Handling request failure of https://www.canada411.ca/search/?stype=re&what=4505855104 (undefined) timed out after 320 seconds.
2022-11-09T21:39:19.025Z at Timeout._onTimeout (/usr/src/app/node_modules/@apify/timeout/index.js:62:68)
2022-11-09T21:39:19.027Z at listOnTimeout (node:internal/timers:559:17)
2022-11-09T21:39:19.029Z at processTimers (node:internal/timers:502:7)
...
2022-11-09T21:39:23.868Z ERROR Actor finished with an error (exit code 91)
14 Replies
exotic-emerald
exotic-emerald•3y ago
The error seems to be due to a timeout. Maybe you have been banned by the site. Are you using proxies?
metropolitan-bronze
metropolitan-bronzeOP•3y ago
I am using proxies with these statements:
const proxyConfiguration = await Actor.createProxyConfiguration();
...
let proxyUrl = proxyInfo.url;
...
response = await gotScraping({ url: request.url, proxyUrl: proxyUrl});
continuing-cyan
continuing-cyan•3y ago
Without any settings, the proxy configuration will only use the datacenter proxies available for your account, so the target site may have added blocking for datacenter IPs, or there may be a per-IP limit that you have already reached on all of your available IPs.
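For example, a minimal sketch of requesting a specific proxy group instead of the default datacenter pool (this assumes your plan includes residential proxies; the countryCode is optional):
// Assumption: the RESIDENTIAL group is available on your Apify plan.
const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'], // residential IPs instead of the default datacenter pool
    countryCode: 'CA',       // Canadian exit IPs, since Canada411 is a Canadian site
});
const proxyUrl = await proxyConfiguration.newUrl();
response = await gotScraping({ url: request.url, proxyUrl });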
metropolitan-bronze
metropolitan-bronzeOP•3y ago
Thank you for that insight.
correct-apricot
correct-apricot•3y ago
The reason your actor crashed is that you have an error in failedRequestHandler (https://crawlee.dev/api/cheerio-crawler/interface/CheerioCrawlerOptions#failedRequestHandler). The code there must not crash, so either remove the code that can fail or wrap it in try/catch.
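A minimal sketch of that wrapping (the fields pushed here are only placeholders, not your actual schema):
failedRequestHandler: async ({ request, error }) => {
    try {
        // anything that can throw (pushData, reading userData, etc.) belongs inside the try
        await Actor.pushData({ url: request.url, errorMessage: error?.message });
    } catch (err) {
        // the handler itself must never throw, otherwise the crawler ends up in an unknown state
        console.log('failedRequestHandler failed:', err);
    }
}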
metropolitan-bronze
metropolitan-bronzeOP•3y ago
Thank you, Lukas!
metropolitan-bronze
metropolitan-bronzeOP•3y ago
Hi! I know this is an old thread, but we are continuing to experience this issue. As @Lukas Krivka suggested, I wrapped the code within the failedRequestHandler in a try/catch block, as in the following, but the catch block wasn't reached. Any suggestions would be greatly appreciated.
failedRequestHandler: async ({ request, error }) => {
    try {
        //console.log(`000 ERROR: ${request.url}`);
        await Actor.pushData({
            a_InputUniqueID: request.userData.inputId,
            b_InputFirstName: request.userData.inputFirstName,
            c_InputMiddleName: request.userData.inputMiddleName,
            d_InputLastName: request.userData.inputLastName,
            e_InputAddress: request.userData.inputAddress,
            f_InputCity: request.userData.inputCity,
            g_InputState: request.userData.inputState,
            h_InputZip: request.userData.inputZip,
            i_InputPhone: request.userData.inputPhone,
            j_ReturnedName: '',
            k_ReturnedAddress: '',
            l_ReturnedAddress2: '',
            m_ReturnedPhone: '',
            n_ReturnedURL: '',
            o_ReturnedHTTPCode: '000'
        });
    } catch (err) {
        console.log('In failedRequestHandler error block!!!!!!!');
        console.log(err);
    }
}
Pepa J
Pepa J•3y ago
Hello @danhelfman, the code looks good. Are we still talking about the error:
ERROR CheerioCrawler: An exception occurred during handling of failed request. This places the crawler and its underlying storages into an unknown state and crawling will be terminated. This may have happened due to an internal error of Apify's API or due to a misconfigured crawler.
?
metropolitan-bronze
metropolitan-bronzeOP•3y ago
Yes, that is the error we are still experiencing.
Pepa J
Pepa J•3y ago
@danhelfman could you send me the ID of the run in a PM?
metropolitan-bronze
metropolitan-bronzeOP•3y ago
6cQvh6Q8MfvM37VmA
Pepa J
Pepa J•3y ago
I am not even able to get to that page, not even with CA residential proxies. There is a discussion (https://updownradar.com/status/canada411.ca#comments) from the last two days saying that the website is not working... :\
correct-apricot
correct-apricot•3y ago
Hmm, it seems the actual error was swallowed by the log limit. If you have it fully wrapped in try/catch, then it might have crashed inside Crawlee; I will check the code. But this will be hard to reproduce since your run is going crazy fast.
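One way to make the next failing run easier to inspect is to throttle it and log the error up front so it is not lost to the log limit. A sketch, assuming you can afford to slow the run down temporarily (the maxConcurrency value is arbitrary):
const crawler = new CheerioCrawler({
    proxyConfiguration,
    maxConcurrency: 10, // slow the run down so the relevant errors are not drowned out
    failedRequestHandler: async ({ request, error }) => {
        // log the full error first, so it survives even if the pushData below throws
        console.log(`Failed request ${request.url}:`, error);
        // ... existing try/catch-wrapped pushData logic ...
    },
    // ... requestHandler and other options ...
});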
