Crawlee & Apify


This is the official developer community of Apify and Crawlee.


crawlee-js

apify-platform

crawlee-python

💻hire-freelancers

🚀actor-promotion

💫feature-request

💻devs-and-apify

🗣general-chat

🎁giveaways

programming-memes

🌐apify-announcements

🕷crawlee-announcements

👥community

How to push to the same dataset from 2 different URLs?

I have a site I’m scraping, but I’m facing a problem: there is information about the same thing on 2 different pages of the website, and I want to store that information in the same JSON dataset....
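A minimal sketch (not the asker's code) of one common pattern: carry the partial data from the first page in `request.userData`, then push a single combined item once the second page has been handled. The URLs, labels, and field names here are illustrative assumptions.

```ts
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, crawler }) {
        if (request.label === 'DETAIL') {
            // Second page: merge the data carried over from the first page with the new data.
            await Dataset.pushData({
                ...request.userData.partial,
                description: $('meta[name="description"]').attr('content'),
            });
            return;
        }

        // First page: collect partial data and enqueue the second page with it attached.
        const partial = { title: $('title').text(), sourceUrl: request.url };
        await crawler.addRequests([{
            url: 'https://example.com/details', // hypothetical second page
            label: 'DETAIL',
            userData: { partial },
        }]);
    },
});

await crawler.run(['https://example.com/overview']);
```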

Output Schema not mapping properly

I have created an Actor that takes an array of URLs as input and processes them one by one. After each one, I save an object using Actor.pushData(my_output_object), where my_output_object = { "url": result }, but this schema is not being mapped properly to the output when I test it in the Console...
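A minimal sketch, not the asker's Actor: dataset views in an output schema reference top-level field names, so pushing a flat object per URL (rather than nesting the whole result under one key) tends to map more predictably. The field names and the `processUrl` helper are hypothetical.

```ts
import { Actor } from 'apify';

await Actor.init();

const { urls = [] } = (await Actor.getInput<{ urls: string[] }>()) ?? {};

for (const url of urls) {
    const result = await processUrl(url); // hypothetical helper doing the actual scraping

    // One flat item per URL; each key can then be referenced by the dataset view.
    await Actor.pushData({
        url,
        title: result.title,
        price: result.price,
    });
}

await Actor.exit();

// Hypothetical placeholder for the asker's per-URL processing.
async function processUrl(url: string): Promise<{ title: string; price: number }> {
    return { title: `Title of ${url}`, price: 0 };
}
```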

Duplicate requests

How can I allow duplicate requests in the request queue by default?
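A minimal sketch: the request queue deduplicates requests by their `uniqueKey` (derived from the URL by default), so the usual way to allow "duplicates" is to give each request its own `uniqueKey`. The URL and key values below are illustrative.

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, log }) {
        log.info(`Processing ${request.url} (uniqueKey: ${request.uniqueKey})`);
    },
});

// The same URL twice: without distinct uniqueKeys the second one would be dropped as a duplicate.
await crawler.run([
    { url: 'https://example.com/page', uniqueKey: 'page-run-1' },
    { url: 'https://example.com/page', uniqueKey: 'page-run-2' },
]);
```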

Get ads reactions

I use this scraper, https://apify.com/apify/facebook-ads-scraper, to get Facebook ads. Now I want to get the Facebook post URL related to each ad...

Proxy URLs

I am trying to scrape a website and save some data from it. I am using gotScraping to fetch the URL, but when I try to use proxyConfiguration in the gotScraping options object I get this error: "The 'Proxy external access' feature is not enabled for your account. Please upgrade your plan or contact [email protected]". I have tried some other combinations as well, but I get a 403 Forbidden error. In my Console tab I can see 5 datacenter IPs under my account, and I am copying exactly that URL, but I still get the error. Does anyone have experience scraping data with proxies?...
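A minimal sketch, not the asker's code: passing a proxy to gotScraping via its `proxyUrl` option, with the URL obtained from a ProxyConfiguration. The "Proxy external access" error usually appears when Apify Proxy is used from outside the Apify platform, so this sketch assumes the code runs as an Actor on the platform; the target URL is illustrative.

```ts
import { Actor } from 'apify';
import { gotScraping } from 'got-scraping';

await Actor.init();

// Default (datacenter) proxies; groups or countryCode could be passed here if the plan allows it.
const proxyConfiguration = await Actor.createProxyConfiguration();
const proxyUrl = await proxyConfiguration?.newUrl();

const { statusCode, body } = await gotScraping({
    url: 'https://example.com',
    proxyUrl,
});
console.log(statusCode, body.length);

await Actor.exit();
```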

Crawl Taobao, 1688

Hi everyone! My name is Giang and I am a new developer. I am building an ordering site that sources products from Taobao, 1688, Tmall, etc., but I have a big problem when crawling Taobao: I can only crawl 10 or 15 products before being blocked by the anti-scraping protection. If I use a proxy, the problem is that I must be logged in to see product items, and I have tried using cookies, but I think logging in from many IPs makes it easy for my account to get blocked. If anyone has experience scraping Tmall/Taobao and could offer some advice or help, that would be hugely helpful. Thanks!

Integrating Cheerio in React or Vue

Is it possible to run Cheerio in a React or Vue website? Background: I need people to tell me how many pages their website has, so I wanted to provide a little app that counts the pages of a given domain. To prevent abuse, I wanted to run it client-side in Vue 3 and not host a Node server myself: `import HelloWorld from './components/HelloWorld.vue'` ...
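A minimal sketch of the idea, under two big assumptions: that Cheerio bundles for the browser in the given Vue 3/Vite setup, and that the target site allows cross-origin fetches (CORS); otherwise this only works behind a proxy. The function name and the "count unique same-domain hrefs" heuristic are illustrative.

```ts
import * as cheerio from 'cheerio';

export async function countInternalLinks(domain: string): Promise<number> {
    const response = await fetch(`https://${domain}/`);
    const html = await response.text();
    const $ = cheerio.load(html);

    // Collect unique same-domain paths as a rough "page count".
    const pages = new Set<string>();
    $('a[href]').each((_, el) => {
        const href = $(el).attr('href');
        if (!href) return;
        try {
            const url = new URL(href, `https://${domain}/`);
            if (url.hostname.endsWith(domain)) pages.add(url.pathname);
        } catch {
            // Ignore hrefs that are not valid URLs (e.g. "javascript:void(0)").
        }
    });
    return pages.size;
}
```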

facebook event scraping - high-res image URLs

Can I get high-res image URLs from Facebook events using your web app, or do I need the SDK and to scrape each event page manually?

Youtube web scraping

Hi, can someone please help me with getting a list of all of the channels under this hashtag? https://www.youtube.com/hashtag/some2...

How can I include extra URLs that have a different domain?

I see that I can, for example, exclude some paths on the same domain. I want to include other domains that appear on the page. In this case I want to crawl https://pizzeriapopularrn.com/, whose page https://pizzeriapopularrn.com/carta/ links, for example, to https://www.cucina.link/ords/pedidos/r/pedidos/categorias-digital?t=pizzeria-popular-catamarca . As you can see, this last URL is on a different domain, and Crawlee doesn't scrape it....
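A minimal sketch, not the asker's code: by default `enqueueLinks()` only follows links on the same hostname, and the `strategy` option relaxes that (globs/regexps can then narrow it back down to specific domains). The crawler type and start URL are illustrative.

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, enqueueLinks, log }) {
        log.info(`Processing ${request.url}`);
        await enqueueLinks({
            // Follow links to any domain, not just the current hostname (the default).
            strategy: 'all',
        });
    },
});

await crawler.run(['https://pizzeriapopularrn.com/']);
```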

Inspect the request headers

Is there any way to inspect/log the request headers after going to a page, just like what you can see in the Developer Console? I tried to look at `ctx.response?.request().allHeaders()`, but it is always empty. ...
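A minimal sketch, not the asker's code: in PlaywrightCrawler, `allHeaders()` is async and must be awaited; `headers()` alone only returns a basic subset. The start URL is illustrative.

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, response, log }) {
        // Headers of the main navigation request, as the browser actually sent them.
        if (response) {
            const sentHeaders = await response.request().allHeaders();
            log.info('Navigation request headers', sentHeaders);
        }

        // Alternatively, log every request the page makes from this point on.
        page.on('request', async (req) => {
            log.info(`${req.method()} ${req.url()}`, await req.allHeaders());
        });
    },
});

await crawler.run(['https://crawlee.dev']);
```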

Adding puppeteer dependency in crawlee

Hi, I'm using Crawlee's Puppeteer crawler. I've imported Crawlee via package.json like this: ...

Crawlee Vercel Serverless Session Management

Hi! Sorry to bother anyone, I was wondering if I could ask a few questions regarding session management with Crawlee. I'm working on a personal site that lets me log in to certain websites and handle automatic uploads, resolve orders, etc. for a company management tool. The service we use doesn't expose an API for this, so I need to do it via Crawlee, and I was wondering how I can keep a session alive using serverless. I wrote a bunch of methods that are now split up into their own routes, but to...

Prevent Crawler from adding failed requests to the default RequestQueue

Is there a way to prevent the crawler from adding a failed request to the default RequestQueue? `const crawler = new PuppeteerCrawler({ proxyConfiguration, ...`
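A minimal sketch, not the asker's code: with `maxRequestRetries: 0` a failed request is not put back into the queue for another attempt, and `failedRequestHandler` lets you deal with it explicitly instead. The URLs are illustrative.

```ts
import { PuppeteerCrawler } from 'crawlee';

const crawler = new PuppeteerCrawler({
    // With 0 retries, a failed request is not re-enqueued for another attempt.
    maxRequestRetries: 0,
    async requestHandler({ request, page, log }) {
        log.info(`Scraping ${request.url}`);
        // ... scraping logic ...
    },
    async failedRequestHandler({ request, log }) {
        // Runs once per request that has exhausted its retries; log or persist it elsewhere.
        log.warning(`Request ${request.url} failed and will not be retried.`);
    },
});

await crawler.run(['https://crawlee.dev']);
```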

Free Usage doesn't reset

Does anyone know why the free usage doesn't reset every month?

How do you handle crawl requests sent simultaneously by different users?

Currently, I can't crawl 2 different websites from 2 different API calls. The second API call is just ignored and the API continues crawling the website from the 1st request. I'm using Express for my API....
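A minimal sketch, not the asker's code: giving each API call its own crawler instance and its own uniquely named RequestQueue, so concurrent calls don't share a queue or crawl state. The endpoint, request shape, and port are illustrative assumptions.

```ts
import express from 'express';
import { CheerioCrawler, RequestQueue } from 'crawlee';
import { randomUUID } from 'node:crypto';

const app = express();
app.use(express.json());

app.post('/crawl', async (req, res) => {
    const { startUrl } = req.body; // assumed request body shape

    // A separate, uniquely named queue per call keeps crawls isolated from each other.
    const requestQueue = await RequestQueue.open(`crawl-${randomUUID()}`);

    const crawler = new CheerioCrawler({
        requestQueue,
        async requestHandler({ request, enqueueLinks }) {
            await enqueueLinks();
            // ... extract and store data for request.url here ...
        },
    });

    await crawler.run([startUrl]);
    res.json({ status: 'finished', startUrl });
});

app.listen(3000);
```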

CheerioCrawler only scrapes one URL

Hi, I'm trying to scrape https://www.noom.com, but it crawls only that URL and not all the URLs available on that page. I added retryOnBlocked: true and played with every EnqueueStrategy, but it's still the same; it happens on this website only. Is there a way to make it work? Thank you...
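A minimal sketch, not a confirmed fix: if the page builds its links with JavaScript, CheerioCrawler sees almost no `<a>` tags in the raw HTML, so a browser-based crawler such as PlaywrightCrawler, which renders the page before `enqueueLinks()` runs, is one thing worth trying. Logging how many links were actually enqueued helps verify this.

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    retryOnBlocked: true,
    async requestHandler({ request, enqueueLinks, log }) {
        log.info(`Visited ${request.url}`);
        const { processedRequests } = await enqueueLinks();
        log.info(`Enqueued ${processedRequests.length} links from ${request.url}`);
    },
});

await crawler.run(['https://www.noom.com']);
```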

Why does Chromium close itself after each request even though the queued links aren't completed?

I use PlaywrightCrawler with headless Chromium in Crawlee, and I have many links in enqueueLinks, but when I check my Activity Monitor (Task Manager) I see a Chromium process, and when the requestHandler has finished the task for the current link, Chromium closes and restarts to continue with the next link. This costs a lot of time and makes the whole scrape really slow. How do I fix it?...
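A minimal sketch, not a confirmed fix: browser retirement is controlled by the browser pool and session pool, so raising these limits keeps one Chromium instance alive across more requests. The numbers are illustrative, and this assumes the browser is closing because it is being retired rather than because the run ends.

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    browserPoolOptions: {
        // How many pages a single browser handles before it is retired and replaced.
        retireBrowserAfterPageCount: 500,
    },
    sessionPoolOptions: {
        // Sessions are tied to browsers, so short-lived sessions can mean short-lived browsers.
        sessionOptions: { maxUsageCount: 200 },
    },
    // Keep several requests in flight so the browser is not idle between links.
    minConcurrency: 2,
    maxConcurrency: 10,
    async requestHandler({ request, enqueueLinks }) {
        await enqueueLinks();
        // ... scraping logic for request.url ...
    },
});

await crawler.run(['https://crawlee.dev']);
```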

How to have an enqueueLinks selector for an <a> tag that is inside a div with a specific class?

The <a> tag itself has no class, just an href attribute. I need to get the value inside href.
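A minimal sketch: a CSS descendant selector targets `<a>` tags inside a div with a given class, both for enqueueing and for reading the href values directly. The class name "product-card" and the start URL are illustrative assumptions.

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ $, enqueueLinks, log }) {
        // Enqueue only links found inside <div class="product-card">.
        await enqueueLinks({ selector: 'div.product-card a[href]' });

        // Or just read the href values without enqueueing them.
        $('div.product-card a[href]').each((_, el) => {
            log.info(`Found href: ${$(el).attr('href')}`);
        });
    },
});

await crawler.run(['https://example.com']);
```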

Instagram profile save

Hi, can someone please help me find/build a scraper that can download videos/photos/stories/... all of those things from any Instagram profile?