Is there a way to initiate crawlee crawl + scraping jobs from a server?
Context:
- I'm currently using playwright on my nextjs api routes and persist some data in my database (postgres)
- since I need IP rotation with session management though, I'd love to offload the scraping to crawlee
- I'm also considering apify as the platform to deploy this crawlee scraper to (as that seems the recommended setup?)
Is there an example somewhere on how to trigger my crawlee scraping job on apify from another server?
Edit: I'm cross-posting to #apify-platform as it seems like I'm interested in triggering a serverless actor.
8 Replies
Hello @p6l.richard, you can start the run via the Apify SDK or the Apify Platform API
check https://help.apify.com/en/articles/3224035-run-actor-task-and-retrieve-data-via-api
https://docs.apify.com/sdk/js/
https://docs.apify.com/sdk/python/
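A minimal sketch of the "Apify Platform API" route, to show what triggering a run from another server (for example a Next.js API route) might look like. It assumes the public "run actor" endpoint (POST /v2/acts/{actorId}/runs) with a bearer token; the actor id, token variable, and input fields are placeholders and not taken from this thread. The fetch function is passed in so the sketch has no dependencies.

```typescript
// Sketch only: the actor id, token, and input used with these helpers are placeholders.

interface RunRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Minimal shape of the fetch function we need (Node 18+ global fetch fits this).
type FetchLike = (
  url: string,
  init: RunRequest["init"],
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build the HTTP request that starts an actor run with the given JSON input.
function buildRunRequest(actorId: string, token: string, input: unknown): RunRequest {
  return {
    // Actor ids in URLs use the "username~actor-name" form.
    url: `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input),
    },
  };
}

// Start the run and return its id plus the id of its default dataset.
// The call returns as soon as the run is created; it does not wait for it to finish.
async function startActorRun(
  fetchFn: FetchLike,
  actorId: string,
  token: string,
  input: unknown,
): Promise<{ id: string; defaultDatasetId: string }> {
  const { url, init } = buildRunRequest(actorId, token, input);
  const res = await fetchFn(url, init);
  if (!res.ok) throw new Error(`Apify API responded with status ${res.status}`);
  const body = (await res.json()) as { data: { id: string; defaultDatasetId: string } };
  return body.data;
}
```

From a server this could be called as startActorRun(fetch, "username~my-scraper", process.env.APIFY_TOKEN, { startUrls: [...] }), and the scraped items read later from the run's default dataset. The apify-client package wraps the same endpoints if you would rather not build the requests by hand.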
absent-sapphireOP•2y ago
And can I share a browser session between runs?
absent-sapphireOP•2y ago
I'd like to execute the first scraping step and then schedule the next ones to run in the background.
The background jobs need the same session as the first run though
Not sure if apify supports streaming responses or returning early data or something?
"schedule the next ones to run in the background."
This is possible using the API.
"The background jobs need the same session as the first run though"
You need to pass the sessionId from the first run to the others and then always use the same session based on that id.
"Not sure if apify supports streaming responses or returning early data or something?"
It is not supported by default, but nothing stops you from implementing it yourself: you could stream the data to a specific endpoint, or, perhaps a better solution, use websockets.
absent-sapphireOP•2y ago
Thank you, I’ll try out the session sharing first. 🙏
Okay, let's see if I get this right (haven't found a way to run it successfully so far).
If I provide a sessionId as an input to an actor such that I run it via actor.run({ sessionId }).
Then, within the actor, I set the persistStateKey in the SessionPool based on this very input:
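A minimal sketch of that idea: deriving the session pool settings from the sessionId in the actor input. The option names (maxPoolSize, persistStateKeyValueStoreId, persistStateKey) are intended to mirror Crawlee's SessionPoolOptions; the interface is redeclared locally so the sketch has no dependencies. The store name "shared-sessions", the key prefix, and the helper name are assumptions made for illustration. A named key-value store is used because each run gets its own default store, so a key in the default store would not be visible to other runs.

```typescript
// Local mirror of a few Crawlee SessionPoolOptions fields (assumed names, see lead-in).
interface SharedSessionPoolOptions {
  maxPoolSize: number;
  persistStateKeyValueStoreId: string;
  persistStateKey: string;
}

// Hypothetical actor input shape: the caller passes the session id to reuse.
interface ActorInput {
  sessionId: string;
}

// Derive session pool settings so every run started with the same sessionId
// reads and writes the same persisted session state.
function sessionPoolOptionsFor(
  input: ActorInput,
  storeName: string = "shared-sessions", // hypothetical named key-value store
): SharedSessionPoolOptions {
  if (!input.sessionId) {
    throw new Error("sessionId is required in the actor input");
  }
  return {
    maxPoolSize: 1, // a single session, so all requests reuse it
    persistStateKeyValueStoreId: storeName,
    persistStateKey: `SESSION_POOL_${input.sessionId}`,
  };
}
```

The result would be passed as sessionPoolOptions to the crawler, together with useSessionPool: true. Note that, as far as I understand, what gets persisted this way is the session state (cookies and similar), not a live browser process.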
Can I then have a separate Actor use the same browser session?
The other actor will then use useState():
Will these two separate actors then share the same browser session?
I was thinking more about creating a custom createSessionFunction:
Now all the Actors with this configuration should share the same session.
absent-sapphireOP•2y ago
Thank you! 🙏