Crawlee scraper invoking the same handler multiple times
Hey all! I've built a Crawlee scraper, but for some reason it invokes the same handler multiple times, creating a lot of duplicate requests and entries in my dataset. Also:
- I've already tried manually setting uniqueKeys for all my requests.
- I've also tried setting maxConcurrency: 1 for the crawler.
- As you can see from the logs below, the issue is not that I'm adding the same requests multiple times. It's Crawlee that's invoking the handlers multiple times with the same request.
Has anyone experienced the same issue? Any clue about what could be happening here?
I've posted the question and all the details (code and logs) on StackOverflow: https://stackoverflow.com/questions/77358550/crawlee-scrapper-visiting-the-same-url-multiple-times
7 Replies
like-gold•2y ago
You can try logging the uniqueKey of each request as it is being processed. That way we can tell whether it is a bug in Crawlee or in your code.
narrow-beigeOP•2y ago
I already did. In main.ts I have:
Is this what you mean? Or is there a better way to log them?
like-gold•2y ago
this is in the add request, no? I meant in the handler function
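A minimal sketch of that check, in case it helps: count how many times each uniqueKey actually reaches a handler and warn on repeats. The helper name recordHandlerInvocation and the Map are hypothetical (not part of Crawlee); the only assumption about Crawlee is that the handler context exposes request.uniqueKey.

```typescript
// Hypothetical helper (not a Crawlee API): tracks how often each uniqueKey
// reaches a handler, so repeated invocations show up in the logs.
const handlerInvocations = new Map<string, number>();

function recordHandlerInvocation(uniqueKey: string): number {
    const count = (handlerInvocations.get(uniqueKey) ?? 0) + 1;
    handlerInvocations.set(uniqueKey, count);
    if (count > 1) {
        // A count above 1 means the handler ran again for a key it already processed.
        console.warn(`Handler invoked ${count} times for uniqueKey: ${uniqueKey}`);
    }
    return count;
}

// Assumed usage at the top of each handler:
//   recordHandlerInvocation(request.uniqueKey);
```

If the warning fires for a key that was only added once, the duplication is happening on the crawler side rather than in the code that adds requests.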
narrow-beigeOP•2y ago
Ah, sorry. These are the updated logs, with the uniqueKeys being logged from both addRequest and the handlers.
I've simplified the keys a bit, so now they are just the target URL (but they are still added manually).
You can see it starts with 2 requests with keys https://site.com/page-a/user-0 and https://site.com/page-a/user-1. Those two are processed first and second, but for some reason the same handler is invoked later with the same key https://site.com/page-a/user-1 (but no additional request for this was added).
like-gold•2y ago
ok, so it looks like it is this issue: https://github.com/apify/crawlee/issues/2078
try to remove sameDomainDelaySecs
narrow-beigeOP•2y ago
Ok, thanks. Good to know what it is then, I'll try to just add an await sleep() in the handlers and see if it works the same 😛
Ok, JFYI, I had the same issue with version 3.5.2 and even 3.5.0. Removing sameDomainDelaySecs and adding a sleep at the end of the handlers works well though, so I'll stick to that.