failedRequestHandler, error argument, detailed error message lost

I am using PlaywrightCrawler and the failedRequestHandler to handle errors. Something like this:
const crawler = new PlaywrightCrawler({
...
async failedRequestHandler({request, response, page, log}, error) {

...
And sometimes I see errors in the log:
ERROR failedRequestHandler: Request failed and reached maximum retries. page.goto: SSL_ERROR_BAD_CERT_DOMAIN
But when I inspect the error argument of failedRequestHandler with JSON.stringify(error), I see only this: {"name":"Error"}. It seems the detailed error message shown in the log is not accessible through the error argument. So how can I access the detailed error message in code?
6 Replies
adverse-sapphire, 3y ago
Try passing ignoreHTTPSErrors in launchContext.launchOptions, e.g.:
const crawler = new PlaywrightCrawler({
    launchContext: {
        launchOptions: {
            headless: false,
            // Do not fail navigation on invalid or mismatched certificates
            ignoreHTTPSErrors: true,
        },
    },
});
molecular-blue (OP), 3y ago
👍 Yes, this fixes the SSL_ERROR_BAD_CERT_DOMAIN problem. Great! However, there might be another 100 different errors, and I would like to see the error messages in the error argument mentioned above. I cannot always look into the log files, and I assume the error argument exists for exactly this purpose.
broad-brown, 3y ago
Maybe https://crawlee.dev/api/browser-crawler/interface/BrowserCrawlerOptions#postNavigationHooks? It sounds like you want to add logic around the response, so hooks might be a better way.
adverse-sapphire, 3y ago
Maybe add a response listener before starting the navigation. One option is to use page.on() [1], something like this (Python):
import logging
import sys

log = logging.getLogger(__name__)

def handle_response(response):
    if response.status == 500:  # stop on a server error
        log.error("Error: %s", response.status)
        sys.exit(1)

page.on("response", handle_response)
[1] https://playwright.dev/python/docs/api/class-page?_highlight=page.on#pageonresponse
molecular-blue (OP), 3y ago
@LeMoussel, @Alexey Udovydchenko thanks for your responses, but let us keep simple things simple. PlaywrightCrawler has two functions for handling situations when something goes wrong: errorHandler() and failedRequestHandler(). Both have an error argument, and both are called at the right time, when a timeout, an SSL error, or something else happens. Great. But the argument seems to contain nothing: {"name":"Error"} is not helpful. On the other hand, the information about the error does exist, because I see it in the Crawlee log. I think this is a bug in Crawlee.
molecular-blue (OP), 3y ago
Well, this is finally solved: https://github.com/apify/crawlee/discussions/1755 Short version: it was a bad idea to call JSON.stringify(error) to check the contents of error. It is enough to read error.message.
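For anyone who lands here later, a minimal sketch of why JSON.stringify(error) hides the message. It uses plain JavaScript with no Crawlee import. JSON.stringify() only serializes enumerable own properties, while message and stack on an Error are non-enumerable. The assumption here is that the error in the thread carried name as an ordinary (enumerable) property, which would produce exactly {"name":"Error"}. The helper describeError is hypothetical and is not part of Crawlee or Playwright.

```javascript
// Simulate an error like the one from the thread (assumption: `name` was
// assigned as an ordinary property, which makes it enumerable).
const error = new Error('page.goto: SSL_ERROR_BAD_CERT_DOMAIN');
error.name = 'Error';

// `message` and `stack` are non-enumerable, so they are skipped here.
const stringified = JSON.stringify(error);

// Hypothetical helper: copy every own property, enumerable or not,
// into a plain object that JSON.stringify() can serialize fully.
function describeError(err) {
    const out = {};
    for (const key of Object.getOwnPropertyNames(err)) {
        out[key] = err[key];
    }
    // `name` usually lives on the prototype, so add it if it was not copied.
    if (!('name' in out)) out.name = err.name;
    return out;
}

const described = describeError(error);

console.log(stringified);        // the lossy form seen in the thread
console.log(error.message);      // the detail is available directly
console.log(described.message);  // and survives in the plain copy
```

Inside failedRequestHandler({ request, log }, error) the same applies: read error.message (or error.stack) directly, or serialize a copy such as describeError(error) if the whole error has to be stored as JSON.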