Mark session as bad when request times out or proxy responds with 502
I'm using CheerioCrawler and I'd like to mark sessions as bad when the request either times out or there's a proxy error. Those cases trigger an error before reaching requestHandler and the request is added back to the queue without me having the opportunity to mark the session. Is there a hook somewhere that I can use? Or should I override _requestFunctionErrorHandler?
fair-rose•3y ago
I would like to know this as well
optimistic-gold•3y ago
You can mark a session as bad with the session.markBad() function within the errorHandler function (which runs on every failed request, as opposed to failedRequestHandler, which runs once a request has reached its max retries).
But if you just want a session to be thrown away if it fails once, you can do this instead in the sessionPoolOptions:
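The snippet that originally followed this message did not survive in the thread. Below is a minimal sketch of both approaches described above, assuming the lost snippet lowered maxErrorScore in sessionOptions. The interfaces are local stand-ins for Crawlee's crawling context so the sketch compiles on its own, and looksLikeProxyOrTimeout is a hypothetical helper: the actual error messages depend on your proxy and HTTP client.

```typescript
// Local stand-ins for the parts of Crawlee's crawling context used here,
// so the sketch compiles without the library installed.
interface SessionLike { markBad(): void }
interface SessionContext { session?: SessionLike }

// Hypothetical check (an assumption, not part of Crawlee): adjust the
// pattern to the error messages you actually see for timeouts and 502s.
function looksLikeProxyOrTimeout(error: Error): boolean {
    return /timed out|502/i.test(error.message);
}

// Intended to be passed as the crawler's `errorHandler` option,
// which runs on every failed attempt, before the request is retried.
function markBadOnProxyError({ session }: SessionContext, error: Error): void {
    if (looksLikeProxyOrTimeout(error)) session?.markBad();
}

// Alternative: let a single error be enough to discard a session.
// Assumption: this is what the lost snippet configured.
const sessionPoolOptions = {
    sessionOptions: {
        maxErrorScore: 1, // Crawlee's default is 3
    },
};

// Wiring, for example:
// new CheerioCrawler({
//     useSessionPool: true,
//     sessionPoolOptions,
//     errorHandler: markBadOnProxyError,
//     requestHandler,
// });
```

With maxErrorScore set to 1, the first markBad() call already pushes the session over its error limit, so the pool stops handing it out.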
mute-goldOP•3y ago
Amazing thank you @thek1tten I didn't know about errorHandler
One more question: how can I access the error in errorHandler? Is it passed as parameter?
All good I found my answer in the docs!
@thek1tten can I prevent the request from being retried depending on the error from the errorHandler?
optimistic-gold•3y ago
This should work
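The code attached to this reply is missing from the thread. Judging by the follow-up messages, the suggestion appears to have been to set request.noRetry to true inside errorHandler. A minimal sketch of that idea follows; the interfaces are local stand-ins for Crawlee's types, and isFatal is a hypothetical rule for deciding which errors should not be retried. Whether the flag is honored when set from errorHandler depends on the Crawlee version, as the rest of the thread shows.

```typescript
// Local stand-ins for Crawlee's Request and crawling context.
interface RetryRequest { url: string; noRetry?: boolean }
interface RetryContext { request: RetryRequest }

// Hypothetical rule (an assumption): treat "404" errors as not worth retrying.
function isFatal(error: Error): boolean {
    return /404/.test(error.message);
}

// Intended to be passed as the crawler's `errorHandler` option.
function skipRetryOnFatalError({ request }: RetryContext, error: Error): void {
    if (isFatal(error)) {
        // Ask the crawler not to put this request back in the queue.
        request.noRetry = true;
    }
}
```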
mute-goldOP•3y ago
So I've tried that but without success, the request still ends up being retried
Is there any other way to prevent a retry? Maybe throwing a NonRetryableError?
mute-goldOP•3y ago
As the logs show, I print the request right after setting request.noRetry to true in errorHandler, and the request is still retried right after.
optimistic-gold•3y ago
Hmm, that means it’s going off of the old value and reassigning it here does nothing. Let me look into it.
mute-goldOP•3y ago
Thanks!
optimistic-gold•3y ago
This feature doesn’t seem to exist yet. I’m making a PR on Crawlee’s GitHub to fix this
mute-goldOP•3y ago
Thank you @thek1tten let me know if there is a link to the issue that I can follow
optimistic-gold•3y ago
GitHub
feat(basic-crawler): allow request skipping by mstephen19 · Pull Re...
See this Discord post to fully understand the use case: https://discord.com/channels/801163717915574323/1019936393235017769
Didn't want to make big changes to existing code so kept the else sta...
optimistic-gold•3y ago
@fab8203 It was merged with master
mute-goldOP•3y ago
Thank you for the follow up @thek1tten
exotic-emerald•3y ago
Btw: retiring sessions and not retrying requests are two completely different concepts. Request and Session are separate objects that may be connected temporarily (a Request is performed using a given Session).
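To illustrate the distinction, here is a small sketch with local stand-in interfaces (not Crawlee's real types). It relies on session.retire() and request.noRetry, both mentioned or implied in this thread; the function names dropSession and dropRequest are made up for the example.

```typescript
// Local stand-ins for the two separate objects.
interface SessionStandIn { retire(): void }
interface RequestStandIn { noRetry?: boolean }

// Takes the session out of rotation. The request is unaffected
// and can still be retried with a different session.
function dropSession(session: SessionStandIn): void {
    session.retire();
}

// Stops further retries of the request. The session is unaffected
// and stays available for other requests.
function dropRequest(request: RequestStandIn): void {
    request.noRetry = true;
}
```

Inside an errorHandler you could call either one, or both, depending on whether the failure says something about the session, the request, or each of them.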