How do we handle authenticated scrapers with the Apify CLI locally?
I'm using the Playwright + Crawlee TypeScript template. Should I handle login, session saving, and session injection myself, or is there an Apify tool that can help with that?
6 Replies
-# This post was marked as solved by IrshaiD'.
What I mean is: I need to scrape a government contracts website where the data is only accessible after logging in. It contains multiple contract listings and around 150,000 contract detail pages, so it's a large-scale operation. I want every request to carry the authenticated session.
How would you approach this using Apify? Let me know if you have any suggestions.
My approach is to set up two separate actors with an integrated flow:
One actor logs in once and handles scraping the list pages.
The other actor scrapes the detail pages and sends the data to an S3 bucket.
This follows a divide-and-conquer approach. However, I want to avoid logging in every time in the details actor; I'd prefer to log in once and share the authenticated session across both actors.
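One way to wire the two actors together is to have the list actor collect the detail URLs and then start the detail actor with `Actor.call` from the Apify SDK. This is only a sketch: the actor ID, input shape, and store name below are assumptions for illustration, not anything the thread specifies.

```typescript
import { Actor } from 'apify';

await Actor.init();

// ...list-page scraping happens here, collecting detail URLs...
// Placeholder data for illustration:
const detailUrls = ['https://example.gov/contracts/1'];

// Hand the collected URLs to the detail actor. The actor ID and
// input shape are hypothetical.
await Actor.call('my-org/contract-detail-scraper', {
    startUrls: detailUrls,
    // Name of a shared named key-value store holding the login cookies,
    // so the detail actor can reuse the session instead of logging in again.
    sessionStoreName: 'contracts-session',
});

await Actor.exit();
```

The detail actor would then read the cookies out of that named store on startup, which keeps the two runs decoupled while still sharing one login.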
Just save the cookies after login and reuse them in your requests. Also, they probably have an API, so you may not need Playwright at all.
@IrshaiD' Yes, in most cases you need to handle the initial login yourself and then persist the browser cookies (for example, to the key-value store). In a pre-navigation hook you can load the cookies from the key-value store and set them on the page context. You may also need to handle cookie expiration.
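A minimal sketch of that pre-navigation hook with Crawlee's `PlaywrightCrawler`, assuming the cookies were previously saved under a `SESSION_COOKIES` key (the key name and start URL are assumptions):

```typescript
import { PlaywrightCrawler } from 'crawlee';
import { Actor } from 'apify';

await Actor.init();

// Load the cookies saved earlier by the login step.
const cookies = await Actor.getValue<any[]>('SESSION_COOKIES');

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        async ({ page }) => {
            // Inject the saved session into the browser context
            // before every navigation.
            if (cookies?.length) {
                await page.context().addCookies(cookies);
            }
        },
    ],
    async requestHandler({ page, request, log }) {
        log.info(`Scraping ${request.url}`);
        // ...extract contract details here...
    },
});

await crawler.run(['https://example.gov/contracts']);
await Actor.exit();
```

For cookie expiration, one approach is to detect a redirect to the login page in the request handler, re-run the login flow, and overwrite the stored cookies before retrying the request.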
Yep, got it. Thanks guys!