Crawlee & Apify

This is the official developer community of Apify and Crawlee.

crawlee-js

apify-platform

crawlee-python

💻hire-freelancers

🚀actor-promotion

💫feature-request

💻devs-and-apify

🗣general-chat

🎁giveaways

programming-memes

🌐apify-announcements

🕷crawlee-announcements

👥community

grumpy-cyan (8/19/2023)

Google Maps Polygon

I am trying to create a custom polygon in Maps and am getting an error. "MultiPolygon must be a 4 nested array with longitude and latitude numbers." Here is my array. What am I missing?...

quickest-silver (8/19/2023)

Pricing plans question

Hey, do you think I should buy the $49 plan to scrape the 78,786 company URLs from this site? https://clutch.co/agencies/digital-marketing

Client Headers

Is it possible to get a user's client HTTP headers (User-Agent, etc.) whenever a user accesses the actor?

unwilling-turquoise (8/18/2023)

Is it possible to stream scraping results as they become available?

I want to scrape a lot of data and don't want to wait for the actor to finish running before I can use it.

foreign-sapphire (8/16/2023)

Failed to launch the browser process

Hey! I'm using the latest PuppeteerCrawler in a brand-new Apify project, and when I push and run my actor, I get this error: `Failed to launch the browser process! spawn /root/.cache/puppeteer/chrome/linux-115.0.5790.170/chrome-linux64/chrome ENOENT`. Any ideas what's going wrong? It works when I run it locally on my machine (which has Puppeteer installed), and package.json includes puppeteer as a dependency...

fair-rose (8/16/2023)

Logging maximum

I have an Apify actor that performs some quick actions (multiple per second) for a brief window of time, but the logs don't reflect all of the activity the actor is supposedly doing. Is there a limit on how much logging Apify displays?

inland-turquoise (8/16/2023)

URLs with commas

Hi, in Web Scraper I'm trying to pass a list of URLs via Start URLs -> text file. Unfortunately, the URLs contain commas (for example, https://www.xyz.com/en/page,id,2), and when I run the task, it reads only the part of the URL before the first comma (https://www.xyz.com/en/page). How can I pass or encode the URLs so that the commas are preserved? Thanks!

environmental-rose (8/16/2023)

Python Apify client not working inside a Docker container

I have built an app using Streamlit that uses the Python client to scrape some data. The app works correctly outside of Docker, but I need to deploy it using a container. When I try to import apify_client inside the container, I get the following error:
```
import apify_client
Traceback (most recent call last):...
```

like-gold (8/15/2023)

CoinGecko and Kraken delay

I'm making a trading bot that takes data from CoinGecko and makes transactions on Kraken, but I can't tell whether there is a delay between CoinGecko's and Kraken's prices. Can someone please help me? Where do these sites get their data from? If there is a single global source, there shouldn't be a delay.

eager-peach (8/15/2023)

Scrape Instagram captions and timestamps ONLY

Hi, I'm new to Apify and am enjoying it greatly so far! I'm currently using the Instagram Scraper actor, but I'd like to limit what it scrapes even more than the actor seems to allow: I want to scrape JUST the post captions and timestamps from the last 30 days (mostly to speed up scraping). Is this possible with that actor, or would a completely new one need to be coded? Thanks!

optimistic-gold (8/14/2023)

Pay per result vs. monthly vs. "free" confusion

My example use case: I want to track 1,000 TikTok profiles over time. Let's say that once a week I want to scrape 1,000 profile pages and each profile's most recent video. I set up Apify, and it seems to work well. But I have three different options, and I'm not certain how the cost works out for each. For my example, I took 100 random profiles and gave them to each of these actors, asking for one video per profile. Each returns 89 results (I assume the other 11 were broken, private, or bad links; that's fine).
(1) TikTok Scraper actor: https://console.apify.com/actors/GdWCkxBtKWOsKjdch/information/latest/readme, for $49/month. Scraping the 100 example profiles with 89 results cost me $0.439, ~0.5 cents per result, on top of the $49/month(?). So crawling 4,000 profiles a month would cost me $49 + $20 = $69 a month.
(2) TikTok Profile Scraper actor: https://console.apify.com/actors/0FXVyOXXEmdGcV88a/information/latest/readme, for $5 per 1,000 videos (0.5 cents per video). This does end up charging me 0.5 cents per result, so it would cost me $20 a month. ...
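The cost comparison in this post can be checked with a few lines of arithmetic. This is a sketch that only reproduces the poster's own assumptions (4 weekly runs per month, exactly one result per profile, and the test-run usage being billed on top of the $49 monthly fee, which the poster marks as uncertain); the variable names are illustrative, and actual Apify billing may differ.

```python
# Back-of-the-envelope check of the two options described in the post.
# Assumptions (from the post, not from Apify's pricing docs):
#   - 1,000 profiles scraped once a week ~= 4 runs per month
#   - every profile yields exactly one result
#   - option (1) bills usage on top of the $49 monthly fee

PROFILES_PER_RUN = 1000
RUNS_PER_MONTH = 4  # "once a week", rounded to 4 weeks per month

results_per_month = PROFILES_PER_RUN * RUNS_PER_MONTH

# (1) TikTok Scraper: monthly fee plus usage measured in the test run.
test_cost_usd = 0.439
test_results = 89
usage_per_result = test_cost_usd / test_results  # ~$0.0049, i.e. ~0.5 cents
option1 = 49 + results_per_month * usage_per_result

# (2) TikTok Profile Scraper: pay per result at $5 per 1,000 videos.
price_per_result = 5 / 1000
option2 = results_per_month * price_per_result

print(f"usage per result: ${usage_per_result:.4f}")
print(f"option 1: ${option1:.2f}/month")
print(f"option 2: ${option2:.2f}/month")
```

With the measured rate, option (1) comes out at about $68.73 a month, which matches the post's rounded $69 estimate; option (2) comes out at $20.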

absent-sapphire (8/13/2023)

How to input specific URLs?

I would like to add an input to my actor like this:
- "website1.com"
- "website2.com"
...

extended-salmon (8/12/2023)

Free Plan API limits

I am testing on the free plan and am sending API requests via Promise.all to hit multiple API endpoints in Apify. It works fine with up to 4 requests but errors with 5 or more.
Is there a limit to how many API calls I can send at one time on the free plan? Is the issue the maximum number of concurrent runs? Even the free plan allows 25 concurrent runs...

reduced-jade (8/11/2023)

Error with API token

Hi, I'm trying to save my data store to my Apify cloud account, so I'm using the forceCloud option. I have the APIFY_TOKEN environment variable set to one of my API tokens, but I'm getting the following error message. Any help would be appreciated. CODE: `const store = await Apify.openKeyValueStore('dataStore', { forceCloud: true });` ...

metropolitan-bronze (8/10/2023)

Rotate proxy when request is timing out

I'm using a third-party proxy provider with Playwright. Whenever a page load starts timing out, I want to drop the proxy the browser is using and rotate to a new one. This should apply to all browser requests currently being handled or queued for processing. Unfortunately, I can't find suitable documentation or a clear example for this. Could someone point me to an example or help me resolve this issue? Error message: `Reclaiming failed request back to the list or queue. page.goto: net::ERR_TUNNEL_CONNECTION_FAILED at https://example.com`...

harsh-harlequin (8/9/2023)

Queueing concurrent runs

Hey everyone, I have a limit of 32 concurrent runs, and I need 200-300 runs to complete. I don't need to scrape often, so it's usually a one-time thing. Is there a way to queue runs so that, once memory is freed, the next task in line starts?

eastern-cyan (8/9/2023)

Facebook Group Scraper / Apify Pricing

Could someone help me understand how much scraping costs in total, even as a ballpark? It cost me 40 cents of usage just now for 200 results, but I'm on a free trial, and I see in your (Apify's) pricing that there's a variety of different charges with different free quotas that kick in at different times. Or would multiplying the usage from my first scrape in trial mode be a fair indicator? We would be scraping 3 groups daily, 60 results per group, so that's a total of 5,400 results per month on average. ...
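The "multiply the trial usage" approach the poster asks about can be written out as a simple linear extrapolation. This is a sketch that assumes cost scales linearly with the number of results and that a month has 30 days; it deliberately ignores free-tier quotas, plan fees, and any other pricing details, so it is a rough bound rather than Apify's actual billing. The names are illustrative.

```python
# Linear extrapolation from the trial run described in the post.
# Assumption: cost scales linearly with result count; free quotas and
# plan fees are ignored, so this is only a rough ballpark.

GROUPS = 3
RESULTS_PER_GROUP_PER_DAY = 60
DAYS_PER_MONTH = 30  # rough average

results_per_month = GROUPS * RESULTS_PER_GROUP_PER_DAY * DAYS_PER_MONTH

trial_cost_usd = 0.40
trial_results = 200
cost_per_result = trial_cost_usd / trial_results  # $0.002 per result

estimated_monthly_cost = results_per_month * cost_per_result
print(f"{results_per_month} results/month, about ${estimated_monthly_cost:.2f}")
# prints "5400 results/month, about $10.80"
```

Under these assumptions, the 5,400 monthly results from the post work out to roughly $10.80 of usage per month.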

national-gold (8/9/2023)

PC Part Picker Scraping

Hi. I'm very new to this and struggling to figure it out, sorry. I would like to scrape this list (all 8 pages) for basic data, then click through to each product page and scrape a sub-table of all the vendors and prices. Is this possible? https://pcpartpicker.com/products/video-card/#sort=price&page=1&X=15000,300000 Thank you very much...

broad-brown (8/8/2023)

Google Shopping scraper

Hi, I am interested in getting a Google Shopping scraper that extracts information on large volumes of products. I found the "Google Search Results Scraper" (https://console.apify.com/actors/nFJndFXA5zjCTuudP/console), which gets most of the job done, except that it doesn't seem to extract image URLs (in my tests, the image column always contains the same value: "data:image/gif;base64,R0lGODlhAQABAIAAAP///////yH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=="). How can I best address this problem? I...