Crawlee & Apify

This is the official developer community of Apify and Crawlee.

Fingerprint and workers

Hello! I used the fingerprint generator, but CreepJS shows me inconsistencies between the fingerprint and the workers. Is it possible to patch the fingerprint so that the workers are patched as well?

How to set 'locale' and 'timezoneId' on browsers or pages?

In Playwright I can create a page this way and set `locale` and `timezoneId`:

```typescript
browser.newPage({ locale: 'zh-TW', ...
```
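
In Crawlee these are browser-context options rather than launch options, so one plausible route (as I understand the browser-pool API) is a `prePageCreateHooks` entry that mutates the options Crawlee passes when it opens a new page; the hook only takes effect when each page gets its own context, hence `useIncognitoPages`. A minimal sketch, with `zh-TW`/`Asia/Taipei` as example values:

```typescript
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    launchContext: {
        // give each page its own context so per-page context options apply
        useIncognitoPages: true,
    },
    browserPoolOptions: {
        prePageCreateHooks: [
            (pageId, browserController, pageOptions) => {
                // pageOptions are forwarded to the new browser context,
                // which is where Playwright reads locale/timezoneId
                if (pageOptions) {
                    Object.assign(pageOptions, { locale: 'zh-TW', timezoneId: 'Asia/Taipei' });
                }
            },
        ],
    },
    async requestHandler({ page }) {
        console.log(await page.evaluate(() => navigator.language)); // should print 'zh-TW'
    },
});
```

If you also use the fingerprint generator, keep the two consistent, since injected fingerprints carry their own locale/timezone hints.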

crawlee eating memory like hell

It is eating 3 GB after running just for 2 days

Can't purge named datasets

When I create a named dataset like `const dataset = await Dataset.open("test");` and let the script run, the data gets appended after each run. I tried to call `purgeDefaultStorages()` but this has no effect. What am I doing wrong?
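
`purgeDefaultStorages()` only clears the *default* (unnamed) storages; named storages are deliberately persistent across runs. If I read the API right, dropping the named dataset explicitly should reset it:

```typescript
import { Dataset } from 'crawlee';

const dataset = await Dataset.open('test');
await dataset.drop();                      // deletes the named dataset and all its items
const fresh = await Dataset.open('test');  // reopen it empty for the new run
```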

SyntaxError Unexpected end of JSON input!

Hi! Does anyone know why I get this error and how I can solve it? I noticed it happens when I have many files loaded in the directory. Is there a way to get past this? Here is the full log message:

```
SyntaxError: Unexpected end of JSON input
    at JSON.parse (<anonymous>)
    at findOrCacheDatasetByPossibleId (C:\Users\misag\OneDrive\Documents\Joiakim\Neontech\my-crawler\node_modules\@crawlee\memory-storage\cache-helpers.js:48:39)
    at async DatasetClient.get (C:\Users\misag\OneDrive\Documents\Joiakim\Neontech\my-crawler\node_modules\@crawlee\memory-storage\resource-clients\dataset.js:79:23)
    ...
```

How to create a new cheerio instance $?

I need to instantiate a new cheerio object. I'm doing a search in a set of elements and need to select just one element for further processing. My actual code is:

```typescript
function getOrigin($: typeof cheerioModule) {
    let origin = ""
    const specElements = $('#product_specs table tr').toArray()
    ...
```
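
You usually don't need a second cheerio instance here: the `$` you already have can re-wrap any element returned by `toArray()`, and if you truly want an isolated instance, `cheerio.load($.html(element))` creates one. A sketch of the first approach; the `th`/`td` layout and the `Origin` label are assumptions about your markup:

```typescript
import * as cheerio from 'cheerio';

function getOrigin($: cheerio.CheerioAPI): string {
    for (const el of $('#product_specs table tr').toArray()) {
        const row = $(el); // wrap a single element with the existing instance
        if (row.find('th').text().trim() === 'Origin') {
            return row.find('td').text().trim();
        }
    }
    return '';
}
```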

net::ERR_TUNNEL_CONNECTION_FAILED

I am trying to use a proxy with the Crawlee PlaywrightCrawler to connect to a page on a non-standard port (444), and I am getting this proxy error: `PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.goto: net::ERR_TUNNEL_CONNECTION_FAILED`. Any suggestions? Without the proxy it works fine locally. On the platform I get a timeout, which could be because of a banned AWS IP range....

How to pass UserData when executing crawler

When I do `await crawler.run(['https://crawlee.dev'], { userData: { depth: 0 } });` I get this error: `Uncaught ArgumentError: Did not expect property userData to exist, got [object Object] in object options`. How can I set userData in the options?...
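
The second argument of `crawler.run()` doesn't accept `userData`; it belongs on the individual request. Passing request objects instead of bare URLs should work:

```typescript
// attach userData to the request itself, not to the run() options
await crawler.run([
    { url: 'https://crawlee.dev', userData: { depth: 0 } },
]);
```

Inside the `requestHandler`, it is then available as `request.userData.depth`.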

Crawler does not work anymore due to error

Hi all, I was updating some packages, and afterwards I wanted to test whether my crawler still worked. The console logged the error in the screenshot. I tried going back to old versions but the error was still there, and I have no idea how to solve this. Does anyone have an idea?...

Trying to combine a content checker with a login on Apify (new to Apify and web scraping)

The content checker actor is what I need to get an alert when the content on a web page changes. The page is behind a login and I have learned how to export cookies. But I can't seem to marry the two. What happens is: (1) I keep getting a picture of the login page! ...
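
One way to marry the two is to inject the exported cookies before each navigation, so the checker renders the logged-in page. A hedged sketch for a Puppeteer-based crawler; `cookies` is whatever array you exported from your browser:

```typescript
import { PuppeteerCrawler } from 'crawlee';

const cookies = [ /* your exported cookies, in Puppeteer's cookie format */ ];

const crawler = new PuppeteerCrawler({
    preNavigationHooks: [
        async ({ page }) => {
            // set the session cookies before the page is requested
            await page.setCookie(...cookies);
        },
    ],
    async requestHandler({ page }) {
        // the page should now show the content behind the login
    },
});
```

If the site expires sessions quickly, the cookies will need refreshing, or the login itself has to be scripted.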

How to solve "Navigation timed out after 60 seconds"

INFO PlaywrightCrawler: Error analysis: {"totalErrors":53,"uniqueErrors":2,"mostCommonErrors":["46x: Navigation timed out after 60 seconds. (C:\Scrapers\ZolStock\my-crawler\node_modules\@crawlee\core\crawlers\crawler_utils.js:13:11)","7x: Navigation timed out after 60 seconds. (<anonymous>)"]}
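
The first knobs to try are the crawler-level navigation timeout and the retry count; whether 60 s is genuinely too short, or the proxy/target is just slow or blocking, is worth checking separately. A sketch, assuming PlaywrightCrawler:

```typescript
const crawler = new PlaywrightCrawler({
    navigationTimeoutSecs: 180, // default is 60
    maxRequestRetries: 5,       // give slow pages more chances
    async requestHandler({ page }) {
        // ...
    },
});
```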

Proxy Rotation Apify-Python

Hi, I am writing an actor in Python. The problem is: how can I let a user enable Apify proxy rotation via the actor input? I am unable to find that in the docs. I will highly appreciate any help.

Keep browser context alive in puppeteer crawler?

By default, a new context is created for each new request. This means that all data (localStorage, sessionStorage, ...) is wiped out. Is there a way to keep the context across multiple requests?...
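
If I read the browser-pool defaults right, the per-request incognito contexts come from `useIncognitoPages`; turning that off keeps pages in one shared context, and `persistCookiesPerSession` keeps cookies attached to the session. A sketch:

```typescript
const crawler = new PuppeteerCrawler({
    launchContext: {
        useIncognitoPages: false, // share one browser context across requests
    },
    persistCookiesPerSession: true, // cookies survive across requests in a session
    async requestHandler({ page }) {
        // localStorage written here stays visible to later pages in the same browser
    },
});
```

Note that `sessionStorage` is per-tab by spec, so it won't survive a page being closed no matter how the context is configured.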

Use JSDOMCrawler to crawl multiple consecutive links?

I want to crawl one page => get a link from it => crawl that link => get some other link from it => crawl the third link => get an HTML table from there. I need to do it like this because the 2nd and 3rd links change a lot. How can I chain link crawling like this?...
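
Labels let you chain hops like this inside one `requestHandler`: each stage extracts the next URL and enqueues it with the label of the following stage. A sketch; the `a.to-second`/`a.to-third` selectors are placeholders for however you find the links:

```typescript
import { JSDOMCrawler } from 'crawlee';

const crawler = new JSDOMCrawler({
    async requestHandler({ request, window, addRequests }) {
        const { document } = window;
        // resolve a relative href against the page we actually loaded
        const next = (selector: string) => {
            const href = document.querySelector(selector)?.getAttribute('href');
            return href ? new URL(href, request.loadedUrl).href : undefined;
        };
        if (request.label === 'SECOND') {
            const url = next('a.to-third'); // hypothetical selector
            if (url) await addRequests([{ url, label: 'THIRD' }]);
        } else if (request.label === 'THIRD') {
            const table = document.querySelector('table');
            // ...extract the rows you need from `table` here
        } else {
            const url = next('a.to-second'); // hypothetical selector
            if (url) await addRequests([{ url, label: 'SECOND' }]);
        }
    },
});

await crawler.run(['https://example.com/start']);
```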

How to stop Puppeteer crawler without causing error?

I have forks in my script, and if certain conditions are met, I would like to stop the script. How should I do that? `page.close()` creates issues, especially if I run concurrently.
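
Rather than closing pages yourself, aborting the autoscaled pool lets in-flight requests finish and shuts the crawler down cleanly; the `crawler` instance is available on the handler context. A sketch with a hypothetical `stopConditionMet` check:

```typescript
const crawler = new PuppeteerCrawler({
    async requestHandler({ page, crawler }) {
        if (await stopConditionMet(page)) { // hypothetical predicate
            // stop fetching new requests and wind the crawler down gracefully
            await crawler.autoscaledPool?.abort();
        }
    },
});
```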

Sessions and proxies?

I am having a hard time understanding sessions and proxies. I have the following crawler setup:

```typescript
const crawler = new PuppeteerCrawler({
    requestList,
    ...
```
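
The mental model, as I understand it: a session is one "identity" (cookies plus, when a proxy configuration is present, a sticky proxy URL), and the session pool rotates and retires identities as they get blocked. A sketch wiring the pieces together:

```typescript
const crawler = new PuppeteerCrawler({
    requestList,
    proxyConfiguration,             // each session sticks to one proxy URL
    useSessionPool: true,
    persistCookiesPerSession: true, // cookies travel with the session, not the page
    sessionPoolOptions: {
        maxPoolSize: 20,            // at most 20 concurrent identities
        sessionOptions: { maxUsageCount: 50 }, // retire a session after 50 uses
    },
});
```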

Increasing a memory limit

Hello, I'm trying to increase the memory limit on my computer with 4 GB RAM total, from the default 1 GB to 2 GB. I tried to set `CRAWLEE_MEMORY_MBYTES` to 2048 via crawlee.json, global settings, and a custom configuration too, but it's still only 1 GB. Any idea where the problem could be? Thanks
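
If the env var and crawlee.json are not being picked up, passing a `Configuration` instance straight to the crawler is another route; `memoryMbytes` is the option behind `CRAWLEE_MEMORY_MBYTES`. A sketch:

```typescript
import { Configuration, PlaywrightCrawler } from 'crawlee';

const config = new Configuration({ memoryMbytes: 2048 });

// crawler constructors accept a Configuration as the second argument
const crawler = new PlaywrightCrawler({ /* ...options... */ }, config);
```

Also note the autoscaled pool only targets a fraction of the limit (`maxUsedMemoryRatio`, 0.7 by default), so logs will report less than the configured total.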

Stop crawler at specific request

Is it possible to stop the crawler at a specific request and leave the window open to inspect it via devtools? When using `headless: false`, it seems like the window is closed after the requestQueue has been processed. It would also be nice to have the `devtools: true` option in the Puppeteer config...
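
One workaround is to park the handler on the request you care about so the page never closes, with a long handler timeout and devtools enabled at launch. A sketch; the `inspect-me` URL marker is hypothetical:

```typescript
const crawler = new PuppeteerCrawler({
    launchContext: {
        // devtools is a plain Puppeteer launch option, so it can go here
        launchOptions: { headless: false, devtools: true },
    },
    requestHandlerTimeoutSecs: 3600, // don't time out while you inspect
    async requestHandler({ page, request }) {
        if (request.url.includes('inspect-me')) { // hypothetical marker
            await new Promise(() => {}); // park the handler; the window stays open
        }
    },
});
```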

Using SessionStorage in PuppeteerCrawler

How can we use sessionStorage in PuppeteerCrawler? I didn't find anything related to session storage in the documentation, so I tried to guess some reasonable config values. ...
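
If you mean the browser's `sessionStorage` (as opposed to Crawlee's session pool), there is no dedicated config option for it; one workaround is seeding it before each navigation via Puppeteer's `evaluateOnNewDocument`. A sketch with a hypothetical key:

```typescript
const crawler = new PuppeteerCrawler({
    preNavigationHooks: [
        async ({ page }) => {
            // runs in the page before any of its own scripts, on every navigation
            await page.evaluateOnNewDocument(() => {
                sessionStorage.setItem('token', 'value'); // hypothetical entry
            });
        },
    ],
    async requestHandler({ page }) {
        // the page sees the seeded sessionStorage from its first script onward
    },
});
```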

Specific timeout for single request in PuppeteerCrawler

I'm aware that it's possible to set a navigation timeout for the complete crawling process, but I need to wait longer for one specific page without slowing down the whole crawl. Is there a way to do so? Right now I'm just using a setTimeout function, but I wonder if there is a better way to achieve this...
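
A preNavigationHook receives the `gotoOptions` for that navigation, so the timeout can be raised only for requests you mark, leaving the crawl-wide default untouched. A sketch; the `SLOW` label is an assumption about how you'd tag such requests:

```typescript
const crawler = new PuppeteerCrawler({
    navigationTimeoutSecs: 30, // default for everything else
    preNavigationHooks: [
        async ({ request }, gotoOptions) => {
            if (request.label === 'SLOW' && gotoOptions) {
                gotoOptions.timeout = 120_000; // ms, only for this request
            }
        },
    ],
    async requestHandler({ page }) {
        // ...
    },
});
```

Enqueue the slow page as `{ url, label: 'SLOW' }` and everything else keeps the short timeout.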