The code sometimes returns 'undefined', and I don't know why
Hi, I'm trying to add the link to the next page to the requestQueue. I know the link should always be there, but for some reason I'm getting an error (see screenshot). I'm not sure why it's doing that. Any suggestions?



5 Replies
genetic-orange•2y ago
Hey @ThalfPant, I would suggest saving the parsed HTML to the key-value (KV) store and checking whether the link is actually there. The logic you sent seems correct, so the response itself is probably inconsistent. If you want more help, it would also be useful to share more details about your project (the website you're scraping, whether you're using a browser or only plain HTTP requests, and so on).
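A minimal sketch of the snapshot idea, in TypeScript since Crawlee is a TypeScript/JavaScript library. To keep it runnable on its own, it writes to a small in-memory store behind a hypothetical `KeyValueStoreLike` interface. Crawlee's `KeyValueStore` has a `setValue(key, value, { contentType })` method of roughly this shape, but check its documentation before wiring it in. The `hasNextLink` check and the key format are also assumptions for illustration.

```typescript
// Hypothetical minimal interface, modeled loosely on a key-value store's
// setValue(key, value, options) method.
interface KeyValueStoreLike {
  setValue(key: string, value: unknown, options?: { contentType?: string }): Promise<void>;
}

// In-memory stand-in so the sketch runs without any crawler library.
class MemoryStore implements KeyValueStoreLike {
  readonly records = new Map<string, { value: unknown; contentType?: string }>();

  async setValue(key: string, value: unknown, options?: { contentType?: string }): Promise<void> {
    this.records.set(key, { value, contentType: options?.contentType });
  }
}

// Build a store key from the URL and attempt number, replacing characters
// outside a conservative set (letters, digits, ! - _ . ' ( )) and capping length.
function snapshotKey(url: string, attempt: number): string {
  const safe = url.replace(/[^a-zA-Z0-9!\-_.'()]/g, "_").slice(0, 200);
  return `${safe}-attempt${attempt}`;
}

// Save the raw HTML only when the next-page link is missing, so the failing
// responses can be inspected and compared with good ones later.
// Returns the key that was written, or null if nothing was saved.
async function snapshotIfNextMissing(
  store: KeyValueStoreLike,
  url: string,
  html: string,
  attempt: number,
  hasNextLink: (html: string) => boolean,
): Promise<string | null> {
  if (hasNextLink(html)) return null;
  const key = snapshotKey(url, attempt);
  await store.setValue(key, html, { contentType: "text/html" });
  return key;
}
```

In a Crawlee request handler, `attempt` could be taken from the request's `retryCount`, so each retry of the same URL gets its own snapshot and a failing response can be diffed against a good one.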
fair-roseOP•2y ago
It doesn't give the error every time, only sometimes; other times it works as intended. This is a server-side rendered (SSR) page, so I believe the links should be in the HTML, and most of the time they are. But sometimes it throws these errors, and I don't know what's going on.
fair-roseOP•2y ago
@vojtechmaslan Sorry for the ping. It seems that sometimes the next-page link isn't on the page, but after a retry or two the crawler usually manages to find it. The same thing happens when it reaches the last page: after 3 tries it throws an error and ends the crawl (which is expected). What's the best way to handle this situation? Or should I just leave it as is? TIA!

fair-roseOP•2y ago
I would have left it alone, except it's seriously hurting the crawler's performance. 😦
genetic-orange•2y ago
Many things could be causing this. I would usually save the HTML of the parsed page and look for a recurring pattern in its structure that tells you whether you got a valid response or whether this really is the last page.
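One way to turn that recurring pattern into code is sketched below. It assumes you have found three markers in the saved HTML, all hypothetical here: something every complete response contains, the next-page link, and something that appears only on the final page. A page with neither a next link nor a last-page marker is treated as an incomplete response worth retrying, not as the end of the listing.

```typescript
// Possible outcomes for a fetched listing page.
type PageVerdict = "has-next" | "last-page" | "incomplete";

// Hypothetical markers; replace them with whatever recurring structure you
// find when comparing saved HTML snapshots. Use non-global regexes so that
// .test() does not carry lastIndex state between calls.
interface PageMarkers {
  validPage: RegExp; // present on every complete response (e.g. the results list)
  nextLink: RegExp;  // the "next page" link
  lastPage: RegExp;  // present only on the final page (e.g. a disabled next button)
}

function classifyPage(html: string, markers: PageMarkers): PageVerdict {
  if (markers.nextLink.test(html)) return "has-next";
  // No next link: only trust "last page" if the response looks complete.
  if (markers.validPage.test(html) && markers.lastPage.test(html)) return "last-page";
  return "incomplete";
}

// What the request handler should do for each verdict.
const ACTIONS: Record<PageVerdict, "enqueue-next" | "stop" | "retry"> = {
  "has-next": "enqueue-next",
  "last-page": "stop",
  incomplete: "retry",
};
```

Inside a Crawlee request handler, the `retry` case would typically be handled by throwing an error, which makes the crawler retry the request, while `stop` simply returns without enqueueing anything, so the last page no longer uses up its retries.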
If the page shows the total number of pages somewhere, you could also construct the request objects yourself and, on the first page, directly enqueue requests for all the other pages.
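A sketch of enqueueing every page up front, assuming the total appears as text like "Page 1 of 12" and that pages are addressed by a `page` query parameter (both hypothetical; adjust them to the site). The resulting URLs could then be added in one batch from the first page's handler, for example with the crawler's `addRequests` method.

```typescript
// Parse a total like "Page 1 of 12" (hypothetical wording; adapt the regex).
function parseTotalPages(text: string): number | null {
  const match = /Page\s+\d+\s+of\s+(\d+)/i.exec(text);
  return match ? Number(match[1]) : null;
}

// Build URLs for pages 2..total, assuming a `page` query parameter.
// Page 1 is skipped because it is the page currently being handled.
function buildPageUrls(firstPageUrl: string, total: number, param = "page"): string[] {
  const urls: string[] = [];
  for (let page = 2; page <= total; page++) {
    const url = new URL(firstPageUrl);
    url.searchParams.set(param, String(page));
    urls.push(url.toString());
  }
  return urls;
}
```

This removes the dependency on finding a next link at all, so an inconsistent response only affects the single page it belongs to instead of cutting the pagination chain short.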