Proxy settings appear to be cached

Hi, I'm trying to use residential proxies with a Playwright crawler, but even when I comment out the proxyConfiguration there still seems to be an attempt to use a proxy. I created a fresh project with a minimal test to debug it and it worked fine, until I had a proxy failure, and then it happened again. The error is:

WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Detected a session error, rotating session... goto: net::ERR_TUNNEL_CONNECTION_FAILED

so it's clearly trying to use a proxy. I've verified this by looking at the browser process arguments, which include --proxy-bypass-list=<-loopback> --proxy-server=http://127.0.0.1:63572. Any ideas? It's driving me insane. Code as follows:
import { PlaywrightCrawler } from 'crawlee'

// const proxyConfiguration = new ProxyConfiguration({
//     proxyUrls: [
//         '...'
//     ],
// })

const crawler: PlaywrightCrawler = new PlaywrightCrawler({
    launchContext: {
        launchOptions: {
            headless: false,
            // channel: 'chrome',
            // viewport: null,
        },
    },
    // proxyConfiguration,
    maxRequestRetries: 0,
    maxRequestsPerCrawl: 5,
    sessionPoolOptions: {
        blockedStatusCodes: [],
    },
    async requestHandler({ request, page, log }) {
        log.info(`Processing ${request.url}...`)
        await page.waitForTimeout(100000)
    },
    failedRequestHandler({ request, log }) {
        log.info(`Request ${request.url} failed too many times.`)
    },
    // browserPoolOptions: {
    //     useFingerprints: false,
    // },
})

await crawler.addRequests([
    'https://abrahamjuliot.github.io/creepjs/'
])

await crawler.run()

console.log('Crawler finished.')
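
(Side note for anyone debugging the same symptom: Crawlee's crawling context exposes a proxyInfo object, which should only be populated when a ProxyConfiguration is actually attached to the crawler, so you can cross-check from inside the crawler rather than from the browser's process arguments. Below is a minimal sketch of that check; the diagnosticCrawler name and the reuse of the creepjs URL are just for illustration, not anything from the original post.

import { PlaywrightCrawler } from 'crawlee'

// Minimal diagnostic: log whether Crawlee itself thinks a proxy is in use.
// With no proxyConfiguration attached, proxyInfo should be undefined.
const diagnosticCrawler = new PlaywrightCrawler({
    maxRequestsPerCrawl: 1,
    async requestHandler({ request, proxyInfo, log }) {
        log.info(`Processing ${request.url}`)
        log.info(`proxyInfo: ${proxyInfo ? proxyInfo.url : 'none (no proxy configured)'}`)
    },
})

await diagnosticCrawler.run(['https://abrahamjuliot.github.io/creepjs/'])

If proxyInfo logs as none but the launched browser still carries a --proxy-server argument, the proxy is being injected from outside the crawler options, which fits the stale/cached code explanation in the replies below.)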
fascinating-indigo (OP), 3w ago
After some frenetic debugging, trying everything I could think of (removing node_modules, the user data dir, and the browsers, then reinstalling everything), it appears the issue was with bun. I'm not sure what in particular was causing it, but it must have somehow been running cached code.
optimistic-gold, 3w ago
From what I remember, bun still throws errors when it's combined with Crawlee; some internal packages complain. Is there any particular reason you want to use bun?
fascinating-indigo (OP), 3w ago
Just that it's fast and generally works well. The issues seem to have resolved themselves, but if they come back I'll probably jump to pnpm.
