PlaywrightCrawler: new request results are bleeding into old requests. RequestQueue issue?
Hello, first some code:
crawl function
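Roughly what it looks like, trimmed down (the job shape, the img selector, and the handler body are placeholders for my real code):

```ts
import { PlaywrightCrawler } from 'crawlee';

// Simplified: crawl a job's start URL and collect image URLs.
async function fetchImagesUrls(job: { id: string; startUrl: string }): Promise<string[]> {
    const results: string[] = [];

    const crawler = new PlaywrightCrawler({
        // No explicit requestQueue is passed, so Crawlee falls back to
        // the process-wide default queue.
        async requestHandler({ page }) {
            const urls = await page.$$eval('img', (imgs) =>
                imgs.map((img) => (img as HTMLImageElement).src),
            );
            results.push(...urls);
        },
    });

    await crawler.run([job.startUrl]);
    return results;
}
```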
setInterval calls this function
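Again simplified (getNextPendingJob and saveResults stand in for my DB calls):

```ts
// Simulated cron: every minute, pick up the next pending job and crawl it.
setInterval(async () => {
    const job = await getNextPendingJob(); // placeholder: DB fetch
    if (!job) return;

    const urls = await fetchImagesUrls(job);
    await saveResults(job.id, urls); // placeholder: persist results
}, 60_000);
```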
Background: I'm calling fetchImagesUrls from a setInterval callback that simulates a cron job. I purposely have setInterval pick up Job #1 (its details are fetched from a DB), and once Job #1 starts, I make Job #2 available for processing.
Behavior: Job #1 and Job #2 now run from two separate calls, but their results are getting mixed into each other.
I've tried useState() and my own callback (as shown here). Is there a way to keep each new call isolated to its own result set?
I understand I might be missing something regarding JS fundamentals, but some guidance would be much appreciated. Thanks!
fair-rose (OP) • 3y ago
Other stuff I tried:
1. Injecting the jobId as a key into the cb array, pushing each job's results under that key, and returning them from the cb array via the corresponding key, like: { 'jobId': ['url1', 'url2', 'url2'] }
genetic-orange • 3y ago
You need to create multiple request queues or request lists, one for each crawler. Then the results won't mix.
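Something like this (a sketch, names illustrative):

```ts
import { PlaywrightCrawler, RequestQueue } from 'crawlee';

// One queue per job: named queues are isolated from the default
// queue and from each other.
const requestQueue = await RequestQueue.open(jobId);
const crawler = new PlaywrightCrawler({ requestQueue /* , ...handlers */ });
```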
fair-rose (OP) • 3y ago
Thanks! That seemed easy, and I think it worked. I can see that storage -> request_queues now has the assigned jobId (uuid). So I added a named queue and passed it into my crawl function, then into the crawler's init object as requestQueue: rQueue.
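Something like this (a sketch; generating the jobId with randomUUID is illustrative, my real ids come from the job records):

```ts
import { PlaywrightCrawler, RequestQueue } from 'crawlee';
import { randomUUID } from 'node:crypto';

// Each job gets its own named queue, which shows up under
// storage/request_queues/<jobId>.
const jobId = randomUUID();
const rQueue = await RequestQueue.open(jobId);

const crawler = new PlaywrightCrawler({
    requestQueue: rQueue, // isolates this job's requests from other jobs
    async requestHandler({ page }) {
        // ...collect this job's image URLs...
    },
});
```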
I will do more testing, but thanks again for your guidance!
genetic-orange • 3y ago
You will just need to clean up the named queues afterwards:
await rQueue.drop()
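For example, dropping it in a finally block ensures the queue is cleaned up even if the run throws (sketch):

```ts
const rQueue = await RequestQueue.open(jobId);
try {
    const crawler = new PlaywrightCrawler({ requestQueue: rQueue /* , ...handlers */ });
    await crawler.run([job.startUrl]);
} finally {
    // Removes the named queue from storage/request_queues.
    await rQueue.drop();
}
```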
fair-rose (OP) • 3y ago
ok thanks Lukas!