Getting past Google's sign-in page
I'm writing a script to scrape part of Google Maps. I can get Chromium to open, go to Google Maps, and click the cookie consent button, but I'm not sure how to get past the Google sign-in page.
What's the secret for this?
Thanks.
15 Replies
like-gold•2y ago
You shouldn't need to sign in just because of cookie consent
xenial-blackOP•2y ago
The following script ends up on the sign-in page.
const puppeteer = require('puppeteer');
const { newInjectedPage } = require('fingerprint-injector');

(async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await newInjectedPage(browser, {
        fingerprintOptions: {
            devices: ['desktop'],
            operatingSystems: ['windows'],
        },
    });
    try {
        await page.goto('https://www.google.com/maps/search/malls+in+London');
        const consentButtonSelector = '[jsname="V67aGc"]'; // Consent button selector
        // Wait for the consent button to be visible
        await page.waitForSelector(consentButtonSelector, { visible: true, timeout: 10000 });
        // Wait for 3 seconds before clicking the button
        // (plain timer; page.waitForTimeout is gone from recent Puppeteer releases)
        await new Promise((resolve) => setTimeout(resolve, 3000));
        // Click the consent button and wait for any navigation it triggers.
        // The wait is started first so a fast redirect is not missed.
        await Promise.all([
            page.waitForNavigation({ timeout: 10000 }),
            page.click(consentButtonSelector),
        ]);
        // Add your scraping logic after the consent interaction
        // ...
    } catch (error) {
        console.error('An error occurred:', error);
    }
    // Uncomment the lines below if you want to keep the browser open for debugging
    // await new Promise((resolve) => setTimeout(resolve, 20000)); // Adjust or remove for production
    // await browser.close();
})();
like-gold•2y ago
Not sure why; when I click on '[action^="https://consent.google"] button', it just goes to the website
xenial-blackOP•2y ago
Oh… well, on the one hand that’s good news, I guess. On the other, I’ve got absolutely no idea what to do about it.
xenial-blackOP•2y ago
I assume you’re using chromium?
Puppeteer.
like-gold•2y ago
Yep, with Crawlee
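For reference, clicking the consent-form selector mentioned above could look roughly like the sketch below. It assumes `page` is a Puppeteer Page (for example the one Crawlee passes to `requestHandler`). The helper names `classifyGoogleUrl` and `dismissConsent`, and the hostname prefixes used to tell the consent and sign-in pages apart, are illustrative assumptions and not part of Puppeteer or Crawlee.

```javascript
// Sketch only: helper names and hostname checks are assumptions for illustration.

// The consent-form selector quoted earlier in this thread.
const CONSENT_BUTTON = '[action^="https://consent.google"] button';

// Classify a URL by hostname so the caller knows which kind of page it is on.
function classifyGoogleUrl(url) {
    const { hostname } = new URL(url);
    if (hostname.startsWith('consent.')) return 'consent';
    if (hostname.startsWith('accounts.')) return 'sign-in';
    return 'content';
}

// If the browser is on the consent page, click the consent button and wait
// for the resulting navigation. Returns the kind of page we ended up on.
async function dismissConsent(page) {
    if (classifyGoogleUrl(page.url()) !== 'consent') {
        return classifyGoogleUrl(page.url());
    }
    await page.waitForSelector(CONSENT_BUTTON, { visible: true });
    // Start waiting for the navigation before clicking, so a fast redirect
    // cannot complete before the wait is registered.
    await Promise.all([
        page.waitForNavigation({ waitUntil: 'domcontentloaded' }),
        page.click(CONSENT_BUTTON),
    ]);
    return classifyGoogleUrl(page.url());
}
```

Inside a handler this would be called as `const landedOn = await dismissConsent(page);`, and a result of `'sign-in'` would point at the redirect described in this thread.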
xenial-blackOP•2y ago
I also tried with Crawlee but got the same problem.
I’m new to this and normally just use Google Sheets to scrape.
Or some simple Python.
And for anything needing more than 1k lines I use Octoparse.
xenial-blackOP•2y ago
I've also tried this:
const { PuppeteerCrawler } = require('crawlee');

const crawler = new PuppeteerCrawler({
    launchContext: {
        launchOptions: {
            headless: false,
            // Set a common user-agent to avoid detection of automated browsing.
            // No inner quotes: each args entry is passed to Chromium verbatim.
            args: ['--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'],
        },
    },
    requestHandler: async ({ page, request }) => {
        const consentButtonSelector = '[jsname="V67aGc"]';
        // Before clicking the consent button, set some cookies/localStorage if needed
        await page.waitForSelector(consentButtonSelector, { visible: true });
        // Start waiting for the navigation before clicking, to avoid a race
        await Promise.all([
            page.waitForNavigation(),
            page.click(consentButtonSelector),
        ]);
        // Check if we have been redirected to the sign-in page
        const currentUrl = page.url();
        // If redirected to sign-in, handle accordingly, otherwise proceed
        if (currentUrl.includes('accounts.google.com')) {
            // Logic to handle sign-in page
        } else {
            // We are on the correct page, proceed with scraping
            const pageTitle = await page.title();
            console.log(`Title of ${request.url}: ${pageTitle}`);
        }
        // Additional scraping logic will go here
    },
});

(async () => {
    await crawler.addRequests(['https://www.google.com/maps/search/malls+in+London']);
    await crawler.run();
})();
xenial-blackOP•2y ago
Do you think I need to modify the Chromium settings in some way?
like-gold•2y ago
I still have no idea why it would redirect to sign up, it never did for me, manually or with Puppeteer
xenial-blackOP•2y ago
If I do it manually, it doesn't happen.
Are you signed in to a Google account on Chromium?
like-gold•2y ago
No. No idea why it would happen. Make sure you use proxies.
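A rough sketch of wiring proxies into a Crawlee crawler, in case it helps. `ProxyConfiguration` and the `proxyConfiguration` crawler option are Crawlee's; `buildCrawler` is a hypothetical helper written for this example, and the proxy URL in the usage note is a placeholder, not a real endpoint. The Crawlee module is passed in as an argument only so the helper can be exercised without launching a browser.

```javascript
// Sketch only: buildCrawler is a hypothetical helper, not part of Crawlee.
// `crawlee` is the module object, i.e. the result of require('crawlee').
function buildCrawler(crawlee, proxyUrls, requestHandler) {
    const { PuppeteerCrawler, ProxyConfiguration } = crawlee;
    // Crawlee rotates requests across the proxy URLs listed here.
    const proxyConfiguration = new ProxyConfiguration({ proxyUrls });
    return new PuppeteerCrawler({
        proxyConfiguration,
        launchContext: { launchOptions: { headless: false } },
        requestHandler,
    });
}

// Usage (needs `npm install crawlee puppeteer`; the proxy URL is a placeholder):
// const crawler = buildCrawler(
//     require('crawlee'),
//     ['http://user:pass@proxy.example.com:8000'],
//     async ({ page, request }) => console.log(request.url, await page.title()),
// );
// crawler.run(['https://www.google.com/maps/search/malls+in+London']);
```

Whether a proxy actually stops the redirect is not established anywhere in this thread; it only changes the IP address the requests come from.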
xenial-blackOP•2y ago
OK. Thanks. I'll keep trying