initial-rose
Duplicated Requests due to Migration of Host
Sometimes I see this from my actor.
Is there a way to prevent the migration? If that's not possible, is there a way to fail the request instead of creating duplicates?
2 Replies
There is no way to prevent migration. To handle duplicates in a logically correct way, save your data just before the handleFunction finishes. That way, when the request is restarted, your crawler will parse the data again but will save it only once, as a unique data item.
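A rough sketch of the "save at the very end" idea, in case it helps. Everything here is hypothetical: `dataset` is a plain in-memory array standing in for real storage, `handleRequest` is a made-up handler, and the migration is simulated by throwing before the save. It is not the actual SDK API, just the shape of the pattern.

```javascript
// Hypothetical in-memory stand-in for a dataset (not the real storage API).
const dataset = [];

// Hypothetical handler: collect everything in memory first, save once at the end.
function handleRequest(request, simulateMigration = false) {
    const items = request.rows.map((row) => ({ url: request.url, value: row }));

    // If the run is interrupted here, only in-memory work is lost;
    // nothing has been saved yet, so the retry cannot create a duplicate.
    if (simulateMigration) throw new Error('simulated migration before save');

    // Single save as the last step of the handler.
    dataset.push(...items);
}

const request = { url: 'https://example.com/page', rows: ['a', 'b'] };

try {
    handleRequest(request, true); // first attempt is interrupted before saving
} catch (err) {
    // The request would be retried after the migration.
}
handleRequest(request); // the retry parses again and saves once

console.log(dataset.length); // 2
```

The point is only that the save is the last thing the handler does, so an interruption earlier in the handler leaves nothing behind.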
continuing-cyan•3y ago
There is an Actor.on('migrating', ...) event that you can respond to.
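To illustrate what responding to that event could look like, here is a small sketch. It does not use the real SDK: a Node.js `EventEmitter` named `events` stands in for the platform's event source, and `persisted` is a hypothetical in-memory map standing in for persistent storage. In a real actor the listener would be registered through the Actor.on call mentioned above, and the state would go to real storage.

```javascript
const { EventEmitter } = require('node:events');

// Stand-in for the platform's event source (assumption for this sketch).
const events = new EventEmitter();

// Hypothetical stand-in for persistent storage that survives a migration.
const persisted = new Map();

// In-progress state that would otherwise be lost when the host changes.
const state = { savedUrls: [] };

// When a migration is announced, write the state out before the process stops.
events.on('migrating', () => {
    persisted.set('STATE', JSON.stringify(state));
});

// Simulate some work followed by a migration notice.
state.savedUrls.push('https://example.com/a');
events.emit('migrating');

// After the restart, the new process would load the state back.
const restored = JSON.parse(persisted.get('STATE'));
console.log(restored.savedUrls.length); // 1
```

With the list of already-saved URLs restored, the handler could skip saving an item it has already written.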
It is super rare for this to produce any duplicates, though. Usually you push data at the very end of the request, which means the request is marked as done right after. If the actor migrates before the request is fully done, the request will be retried. Of course, there is still a small chance this happens, but I would probably just deduplicate these rare cases afterwards.
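For the "deduplicate afterwards" part, a minimal sketch could look like this. `dedupeByKey` is a hypothetical helper, and using `url` as the unique key is an assumption; pick whatever field identifies an item in your data.

```javascript
// Keep the first item seen for each key; later items with the same key are dropped.
function dedupeByKey(items, keyOf) {
    const seen = new Set();
    const unique = [];
    for (const item of items) {
        const key = keyOf(item);
        if (seen.has(key)) continue; // duplicate from a retried request
        seen.add(key);
        unique.push(item);
    }
    return unique;
}

// Example data with one duplicate, as a retried request might produce.
const items = [
    { url: 'https://example.com/a', title: 'A' },
    { url: 'https://example.com/b', title: 'B' },
    { url: 'https://example.com/a', title: 'A' },
];

const unique = dedupeByKey(items, (item) => item.url);
console.log(unique.length); // 2
```

This runs as a post-processing step over the exported items, so the crawler itself does not need to change.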