Crawlee & Apify


This is the official developer community of Apify and Crawlee.


crawlee-js

apify-platform

crawlee-python

💻hire-freelancers

🚀actor-promotion

💫feature-request

💻devs-and-apify

🗣general-chat

🎁giveaways

programming-memes

🌐apify-announcements

🕷crawlee-announcements

👥community

How to push to the same dataset from 2 different URLs?

I have a site I’m scraping, but I’m facing a problem: there is information about the same thing on 2 different pages of the website, and I want to store that information in the same JSON dataset....
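A minimal sketch (not the asker's code) of one common pattern: carry the partial data from the first page in `request.userData`, then push a single combined item once the second page has been handled. The URLs, labels, and field names here are illustrative assumptions.

```ts
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, $, crawler }) {
        if (request.label === 'DETAIL') {
            // Second page: merge the data carried over from the first page with the new data.
            await Dataset.pushData({
                ...request.userData.partial,
                description: $('meta[name="description"]').attr('content'),
            });
            return;
        }

        // First page: collect partial data and enqueue the second page with it attached.
        const partial = { title: $('title').text(), sourceUrl: request.url };
        await crawler.addRequests([{
            url: 'https://example.com/details', // hypothetical second page
            label: 'DETAIL',
            userData: { partial },
        }]);
    },
});

await crawler.run(['https://example.com/overview']);
```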

Output Schema not mapping properly

I have created an Actor that takes an array of URLs as input and processes them one by one. After each one, I save an object using Actor.pushData(my_output_object), where my_output_object = { "url": result }, but this schema is not being mapped properly to the output when I test it in the Console...
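A minimal sketch, not the asker's Actor: dataset views in an output schema reference top-level field names, so pushing a flat object per URL (rather than nesting the whole result under one key) tends to map more predictably. The field names and the `processUrl` helper are hypothetical.

```ts
import { Actor } from 'apify';

await Actor.init();

const { urls = [] } = (await Actor.getInput<{ urls: string[] }>()) ?? {};

for (const url of urls) {
    const result = await processUrl(url); // hypothetical helper doing the actual scraping

    // One flat item per URL; each key can then be referenced by the dataset view.
    await Actor.pushData({
        url,
        title: result.title,
        price: result.price,
    });
}

await Actor.exit();

// Hypothetical placeholder for the asker's per-URL processing.
async function processUrl(url: string): Promise<{ title: string; price: number }> {
    return { title: `Title of ${url}`, price: 0 };
}
```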

Duplicate requests

How can I allow duplicate requests in the request queue by default?
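A minimal sketch: the request queue deduplicates requests by their `uniqueKey` (derived from the URL by default), so the usual way to allow "duplicates" is to give each request its own `uniqueKey`. The URL and key values below are illustrative.

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, log }) {
        log.info(`Processing ${request.url} (uniqueKey: ${request.uniqueKey})`);
    },
});

// The same URL twice: without distinct uniqueKeys the second one would be dropped as a duplicate.
await crawler.run([
    { url: 'https://example.com/page', uniqueKey: 'page-run-1' },
    { url: 'https://example.com/page', uniqueKey: 'page-run-2' },
]);
```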

Get ads reactions

I use this scraper, https://apify.com/apify/facebook-ads-scraper, to get Facebook ads. Now I want to get the Facebook post URL related to each ad...

Proxy URLs

I am trying to scrape a website and save some data from it. I am using gotScraping to fetch the URL, but when I try to use proxyConfiguration in the gotScraping options object I get this error: "The 'Proxy external access' feature is not enabled for your account. Please upgrade your plan or contact [email protected]". I have tried some other combinations as well, but I get a 403 Forbidden error. In my Console tab I can see 5 datacenter IPs under my account, and I am copying exactly that URL, but I still get the error. Does anyone have experience scraping data with proxies?...
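A minimal sketch, not the asker's code: passing a proxy to gotScraping via its `proxyUrl` option, with the URL obtained from a ProxyConfiguration. The "Proxy external access" error usually appears when Apify Proxy is used from outside the Apify platform, so this sketch assumes the code runs as an Actor on the platform; the target URL is illustrative.

```ts
import { Actor } from 'apify';
import { gotScraping } from 'got-scraping';

await Actor.init();

// Default (datacenter) proxies; groups or countryCode could be passed here if the plan allows it.
const proxyConfiguration = await Actor.createProxyConfiguration();
const proxyUrl = await proxyConfiguration?.newUrl();

const { statusCode, body } = await gotScraping({
    url: 'https://example.com',
    proxyUrl,
});
console.log(statusCode, body.length);

await Actor.exit();
```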

Crawl Taobao, 1688

Hi everyone! My name is Giang and I am a new developer. I am building an ordering site that sources products from Taobao, 1688, Tmall, etc., but I have a big problem when crawling Taobao: I can only crawl 10 or 15 products before being blocked by the anti-scraping protection. If I use a proxy, the problem is that I must be logged in to see product items, and I have tried using cookies, but I think logging in from many IPs makes it easy for my account to get blocked. If anyone has experience scraping Tmall/Taobao and could offer some advice or help, that would be hugely helpful. Thanks!

Integrating Cheerio in React or Vue

Is it possible to run Cheerio in a React or Vue website? Background: I need people to tell me how many pages their website has, so I wanted to provide a little app that counts the pages of a given domain. To prevent abuse, I wanted to run it client-side in Vue 3 and not host a Node server myself: `import HelloWorld from './components/HelloWorld.vue'` ...
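A minimal sketch of the idea, under two big assumptions: that Cheerio bundles for the browser in the given Vue 3/Vite setup, and that the target site allows cross-origin fetches (CORS); otherwise this only works behind a proxy. The function name and the "count unique same-domain hrefs" heuristic are illustrative.

```ts
import * as cheerio from 'cheerio';

export async function countInternalLinks(domain: string): Promise<number> {
    const response = await fetch(`https://${domain}/`);
    const html = await response.text();
    const $ = cheerio.load(html);

    // Collect unique same-domain paths as a rough "page count".
    const pages = new Set<string>();
    $('a[href]').each((_, el) => {
        const href = $(el).attr('href');
        if (!href) return;
        try {
            const url = new URL(href, `https://${domain}/`);
            if (url.hostname.endsWith(domain)) pages.add(url.pathname);
        } catch {
            // Ignore hrefs that are not valid URLs (e.g. "javascript:void(0)").
        }
    });
    return pages.size;
}
```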

facebook event scraping - high-res image URLs

Can I get high-res image URLs from Facebook events using your web app, or do I need the SDK and to scrape each event page manually?

Youtube web scraping

Hi, can someone please help me with getting a list of all of the channels under this hashtag? https://www.youtube.com/hashtag/some2...

How can I include extra URLs that have a different domain?

I see that I can, for example, exclude some paths on the same domain. I want to include other domains that appear on the page. In this case I want to crawl https://pizzeriapopularrn.com/, whose page https://pizzeriapopularrn.com/carta/ links, for example, to https://www.cucina.link/ords/pedidos/r/pedidos/categorias-digital?t=pizzeria-popular-catamarca . As you can see, this last URL is on a different domain, and Crawlee doesn't scrape it....
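A minimal sketch, not the asker's code: by default `enqueueLinks()` only follows links on the same hostname, and the `strategy` option relaxes that (globs/regexps can then narrow it back down to specific domains). The crawler type and start URL are illustrative.

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ request, enqueueLinks, log }) {
        log.info(`Processing ${request.url}`);
        await enqueueLinks({
            // Follow links to any domain, not just the current hostname (the default).
            strategy: 'all',
        });
    },
});

await crawler.run(['https://pizzeriapopularrn.com/']);
```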

Inspect the request headers

Is there any way to inspect/log the request headers after going to a page, just like what you can see in the Developer Console? I tried to look at `ctx.response?.request().allHeaders()`, but it is always empty. ...
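A minimal sketch, not the asker's code: in PlaywrightCrawler, `allHeaders()` is async and must be awaited; `headers()` alone only returns a basic subset. The start URL is illustrative.

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, response, log }) {
        // Headers of the main navigation request, as the browser actually sent them.
        if (response) {
            const sentHeaders = await response.request().allHeaders();
            log.info('Navigation request headers', sentHeaders);
        }

        // Alternatively, log every request the page makes from this point on.
        page.on('request', async (req) => {
            log.info(`${req.method()} ${req.url()}`, await req.allHeaders());
        });
    },
});

await crawler.run(['https://crawlee.dev']);
```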

Adding puppeteer dependency in crawlee

Hi, I'm using Crawlee's Puppeteer crawler. I've imported Crawlee via package.json like this: ...

Crawlee Vercel Serverless Session Management

Hi! Sorry to bother anyone, I was wondering if I could ask a few questions regarding session management with Crawlee. I'm working on a personal site that lets me log in to certain websites and handle automatic uploads, resolve orders, etc. for a company management tool. The service we use doesn't expose an API for this, so I need to do it via Crawlee, and I was wondering how I can keep a session alive using serverless. I wrote a bunch of methods that are now split up into their own routes, but to...

Prevent Crawler from adding failed requests to the default RequestQueue

Is there a way to prevent the crawler from adding a failed request to the default RequestQueue? `const crawler = new PuppeteerCrawler({ proxyConfiguration, ...`
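A minimal sketch, not the asker's code: with `maxRequestRetries: 0` a failed request is not put back into the queue for another attempt, and `failedRequestHandler` lets you deal with it explicitly instead. The URLs are illustrative.

```ts
import { PuppeteerCrawler } from 'crawlee';

const crawler = new PuppeteerCrawler({
    // With 0 retries, a failed request is not re-enqueued for another attempt.
    maxRequestRetries: 0,
    async requestHandler({ request, page, log }) {
        log.info(`Scraping ${request.url}`);
        // ... scraping logic ...
    },
    async failedRequestHandler({ request, log }) {
        // Runs once per request that has exhausted its retries; log or persist it elsewhere.
        log.warning(`Request ${request.url} failed and will not be retried.`);
    },
});

await crawler.run(['https://crawlee.dev']);
```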

Free Usage doesn't reset

Does anyone know why the free usage doesn't reset every month?

How do you handle crawl requests sent simultaneously by different users?

Currently, I can't crawl 2 different websites from 2 different API calls. The second API call is just ignored and the API continues crawling the website from the 1st request. I'm using Express for my API....
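A minimal sketch, not the asker's code: giving each API call its own crawler instance and its own uniquely named RequestQueue, so concurrent calls don't share a queue or crawl state. The endpoint, request shape, and port are illustrative assumptions.

```ts
import express from 'express';
import { CheerioCrawler, RequestQueue } from 'crawlee';
import { randomUUID } from 'node:crypto';

const app = express();
app.use(express.json());

app.post('/crawl', async (req, res) => {
    const { startUrl } = req.body; // assumed request body shape

    // A separate, uniquely named queue per call keeps crawls isolated from each other.
    const requestQueue = await RequestQueue.open(`crawl-${randomUUID()}`);

    const crawler = new CheerioCrawler({
        requestQueue,
        async requestHandler({ request, enqueueLinks }) {
            await enqueueLinks();
            // ... extract and store data for request.url here ...
        },
    });

    await crawler.run([startUrl]);
    res.json({ status: 'finished', startUrl });
});

app.listen(3000);
```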

CheerioCrawler only scrapes one URL

Hi, I'm trying to scrape https://www.noom.com, but it crawls only that URL and not all the URLs available on that page. I added retryOnBlocked: true and played with every EnqueueStrategy, but it's still the same; it happens on this website only. Is there a way to make it work? Thank you...
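A minimal sketch, not a confirmed fix: if the page builds its links with JavaScript, CheerioCrawler sees almost no `<a>` tags in the raw HTML, so a browser-based crawler such as PlaywrightCrawler, which renders the page before `enqueueLinks()` runs, is one thing worth trying. Logging how many links were actually enqueued helps verify this.

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    retryOnBlocked: true,
    async requestHandler({ request, enqueueLinks, log }) {
        log.info(`Visited ${request.url}`);
        const { processedRequests } = await enqueueLinks();
        log.info(`Enqueued ${processedRequests.length} links from ${request.url}`);
    },
});

await crawler.run(['https://www.noom.com']);
```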

Why does Chromium close itself after each request even though the queued links aren't completed?

I use PlaywrightCrawler with headless Chromium in Crawlee, and I have many links in enqueueLinks, but when I check my Activity Monitor (Task Manager) I see a Chromium process, and when the requestHandler has finished the task for the current link, Chromium closes and restarts to continue with the next link. This costs a lot of time and makes the whole scrape really slow. How do I fix it?...
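A minimal sketch, not a confirmed fix: browser retirement is controlled by the browser pool and session pool, so raising these limits keeps one Chromium instance alive across more requests. The numbers are illustrative, and this assumes the browser is closing because it is being retired rather than because the run ends.

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    browserPoolOptions: {
        // How many pages a single browser handles before it is retired and replaced.
        retireBrowserAfterPageCount: 500,
    },
    sessionPoolOptions: {
        // Sessions are tied to browsers, so short-lived sessions can mean short-lived browsers.
        sessionOptions: { maxUsageCount: 200 },
    },
    // Keep several requests in flight so the browser is not idle between links.
    minConcurrency: 2,
    maxConcurrency: 10,
    async requestHandler({ request, enqueueLinks }) {
        await enqueueLinks();
        // ... scraping logic for request.url ...
    },
});

await crawler.run(['https://crawlee.dev']);
```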

How to have an enqueueLinks selector for an <a> tag that is inside a div with a specific class?

The <a> tag itself has no class, just an href attribute. I need to get the value inside href.
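A minimal sketch: a CSS descendant selector targets `<a>` tags inside a div with a given class, both for enqueueing and for reading the href values directly. The class name "product-card" and the start URL are illustrative assumptions.

```ts
import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    async requestHandler({ $, enqueueLinks, log }) {
        // Enqueue only links found inside <div class="product-card">.
        await enqueueLinks({ selector: 'div.product-card a[href]' });

        // Or just read the href values without enqueueing them.
        $('div.product-card a[href]').each((_, el) => {
            log.info(`Found href: ${$(el).attr('href')}`);
        });
    },
});

await crawler.run(['https://example.com']);
```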

Instagram profile save

Hi, can someone please help me find/build a scraper that can download videos/photos/stories/... all of those things from any Instagram profile?