magic-amber

Moving from Playwright to Crawlee/Playwright for Scraping

Are there actually any resources on building a scraper with Crawlee other than the docs? For example, where do I set all the browser context options?

import { chromium } from "playwright";

// Launch Chromium, create a configured context and return a fresh page
const launchPlaywright = async () => {
  const browser = await chromium.launch({
    headless: true,
    args: ["--disable-blink-features=AutomationControlled"],
  });

  const context = await browser.newContext({
    viewport: { width: 1280, height: 720 },
    userAgent:
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3",
    geolocation: { longitude: 7.8421, latitude: 47.9978 },
    permissions: ["geolocation"],
    locale: "en-US",
    storageState: "playwright/auth/user.json",
  });
  return await context.newPage();
};

3 Replies

Hall•5mo ago

Someone will reply to you shortly. In the meantime, this might help:

azzouzana•5mo ago

In the launch context. Here's an example: https://docs.apify.com/sdk/js/docs/examples/playwright-crawler
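A minimal sketch of where the question's `newContext()` options could go, assuming Crawlee v3 and its browser-pool hooks: with `launchContext.useIncognitoPages: true` each page gets its own context, and a `prePageCreateHooks` entry in `browserPoolOptions` receives the options object that is used when that page (and its context) is created. The names `CONTEXT_OPTIONS`, `applyContextOptions` and `crawlerOptions` are hypothetical, and `useFingerprints: false` is included on the assumption that Crawlee's default fingerprinting would otherwise supply its own user agent and viewport. Check the behaviour against your Crawlee version before relying on it.

```javascript
// Context options copied from the question's browser.newContext() call.
const CONTEXT_OPTIONS = {
  viewport: { width: 1280, height: 720 },
  userAgent:
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3",
  geolocation: { longitude: 7.8421, latitude: 47.9978 },
  permissions: ["geolocation"],
  locale: "en-US",
  storageState: "playwright/auth/user.json",
};

// Hypothetical prePageCreateHook. browser-pool calls these hooks as
// (pageId, browserController, pageOptions); pageOptions is only an object
// when incognito pages are enabled, otherwise it is undefined.
function applyContextOptions(pageId, browserController, pageOptions) {
  if (!pageOptions) return; // no per-page context options to set
  Object.assign(pageOptions, CONTEXT_OPTIONS);
}

// Plain options object meant to be spread into new PlaywrightCrawler({...}).
const crawlerOptions = {
  launchContext: {
    useIncognitoPages: true, // one fresh context per page, so pageOptions is used
    launchOptions: {
      headless: true,
      args: ["--disable-blink-features=AutomationControlled"],
    },
  },
  browserPoolOptions: {
    useFingerprints: false, // assumption: keeps fingerprints from overriding UA/viewport
    prePageCreateHooks: [applyContextOptions],
  },
};
```

You would then pass it as `new PlaywrightCrawler({ ...crawlerOptions, requestHandler })`. Keeping everything in one object mirrors the question's single `launchPlaywright` helper; the trade-off is that incognito pages do not share cookies or cache with each other, so every page starts from the same `storageState`.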


azzouzana•5mo ago

Or within a pre-navigation hook. Something like:

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
  preNavigationHooks: [
    async ({ page }) => {
      // The BrowserContext is reached through the page
      await page.context().addCookies([
        // addCookies needs either `url` or both `domain` and `path`
        { name: 'session', value: '12345', domain: 'example.com', path: '/' },
      ]);
      // Playwright pages have no setUserAgent(); this only overrides the
      // User-Agent request header, not navigator.userAgent
      await page.setExtraHTTPHeaders({
        'User-Agent':
          'Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Mobile/15E148 Safari/604.1',
      });
    },
  ],
  requestHandler: async ({ page, request }) => {
    console.log(`Visiting ${request.url}`);
    const content = await page.content();
    console.log(`Content length: ${content.length}`);
  },
});

await crawler.run(['https://example.com']);
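Not every option from the question can be changed once a context exists: Playwright offers no way to change `locale` or `userAgent` on an existing context, and `storageState` is only read when a context is created. A sketch of a pre-navigation hook that applies the parts that can be changed at runtime, using Playwright's `page.setViewportSize`, `browserContext.grantPermissions`, `browserContext.setGeolocation` and `browserContext.addCookies`. `applyRuntimeContext` is a hypothetical name and the cookie is the placeholder from the reply above; Crawlee calls each pre-navigation hook with the crawling context (which includes `page`) as its first argument.

```javascript
// Hypothetical pre-navigation hook; only the `page` field of the
// crawling context is used.
async function applyRuntimeContext({ page }) {
  // In Playwright the BrowserContext is reached through the page.
  const context = page.context();

  // The viewport can be resized on an existing page.
  await page.setViewportSize({ width: 1280, height: 720 });

  // Grant the permission, then set the position the site will see.
  await context.grantPermissions(["geolocation"]);
  await context.setGeolocation({ longitude: 7.8421, latitude: 47.9978 });

  // addCookies needs either `url` or both `domain` and `path`.
  await context.addCookies([
    { name: "session", value: "12345", domain: "example.com", path: "/" },
  ]);
}
```

Usage would be `new PlaywrightCrawler({ preNavigationHooks: [applyRuntimeContext], requestHandler })`. Because the hook runs before every navigation, these calls repeat per request; they overwrite the same values each time, so this costs a few extra calls rather than changing behaviour.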
