Crawlee & Apify


This is the official developer community of Apify and Crawlee.


crawlee-js

apify-platform

crawlee-python

💻hire-freelancers

🚀actor-promotion

💫feature-request

💻devs-and-apify

🗣general-chat

🎁giveaways

programming-memes

🌐apify-announcements

🕷crawlee-announcements

👥community

extended-salmon · 10/20/2023

Cannot Extract Data

Hi guys, I'm getting this error, can you please check? ERROR Cannot extract data from https://www.tripadvisor.com/Hotel_Review-g198564-d557450-Reviews-Das_Konig_Ludwig_Inspiration_SPA-Schwangau_Upper_Bavaria_Bavaria.html. 2023-10-20T19:19:08.293Z Traceback (most recent call last):...
rare-sapphire · 10/20/2023

How to parse out results in Langchain ApifyDatasetLoader

I'm using the Google Search Results Scraper, which returns a single JSON object with Paid, Organic, and a few other keys. I'd like to parse out the Organic titles and URLs into a Langchain agent, but it's not clear how to iterate over them. Any suggestions? loader = apify.call_actor( actor_id="apify/google-search-scraper", ...
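A minimal sketch of one way to do the iteration, assuming the dataset items carry an `organicResults` list whose entries have `title` and `url` keys (field names are assumptions; check your actual dataset output):

```python
# Sketch: flatten one Google Search Results Scraper dataset item into
# title/url pairs. The "organicResults", "title" and "url" field names
# are assumed from typical scraper output, not verified against the actor.

def extract_organic(item: dict) -> list[dict]:
    """Pull title/url pairs out of one dataset item, ignoring Paid results."""
    return [
        {"title": r.get("title"), "url": r.get("url")}
        for r in item.get("organicResults", [])
    ]

sample = {
    "paidResults": [],
    "organicResults": [
        {"title": "Apify", "url": "https://apify.com", "description": "..."},
        {"title": "Crawlee", "url": "https://crawlee.dev", "description": "..."},
    ],
}
docs = extract_organic(sample)
```

A function like this could also be passed as the `dataset_mapping_function` of Langchain's `ApifyDatasetLoader` so each organic result becomes its own document.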
conscious-sapphire · 10/20/2023

nodemon causing my server to restart on every crawl, causing it to error out

Wondering if anyone knows why nodemon would cause it to restart? I added a nodemon.json file to ignore the crawler as well as the storage files, but for some reason it keeps restarting on the first run. When I remove nodemon, the code runs great. Just wondering what it is watching that causes a restart...
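Crawlee writes request queue and dataset files under `./storage` by default, and every write there can trigger a nodemon restart. A minimal `nodemon.json` sketch that narrows what nodemon watches (the `src` path and glob patterns are assumptions; adjust them to your project layout):

```json
{
  "watch": ["src"],
  "ext": "js,json",
  "ignore": ["storage/**/*", "**/storage/**"]
}
```

Limiting `watch` to the source directory is often more reliable than ignore patterns alone, since any storage path outside the watch list is never considered.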
foreign-sapphire · 10/20/2023

create new

I am using the following code to use the Apify Website Content Crawler: ` from apify_client import ApifyClient # Initialize the ApifyClient with your API token...
inland-turquoise · 10/18/2023

[Incorrect Types] Apify JS Client

I am using the Apify JS client version 2.8.0 to programmatically create a schedule with one task action. The code looks like the following: ```ts const schedule = await client.schedules().create({ isEnabled: true,...
broad-brown · 10/18/2023

Apify Proxies Help

Hello, I created an actor using the CLI, logged in using an API token, and then wrote ```await Actor.init(); const proxyConfiguration = await Actor.createProxyConfiguration({ groups: ['RESIDENTIAL'],...
ratty-blush · 10/17/2023

How does apify tiktok scraper decide which videos to scrape?

Hello, I am very new to this and have zero background in tech or coding, so sorry if this is a very basic question... I just would like to know how the TikTok hashtag scraper decides which videos to scrape. If I input a hashtag and a limit of 100 results, does it scrape data from the 100 most popular videos? Or how does that work? If it's not based on popularity, is there a way to change the settings so it scrapes based on popularity? Thanks in advance!...
blank-aquamarine · 10/16/2023

Actor run all results

Does anyone know why my run stops at 135, whilst there are 1000+ results? I set the run timer to...
constant-blue · 10/14/2023

Is there any way to set cookies before the Puppeteer run?

Hi all! I'm just trying out Crawlee. Is there any way to set up a Crawlee launch with cookies? I know there is page.setCookie(), but it's only available in a handler. Is there any way I can set it up beforehand, so I don't have to go through a default handler just to set a cookie? ```js...
ambitious-aqua · 10/12/2023

Help with pricing on AI Product Matcher

Has anyone found a solution to lowering the cost of the AI Product Matcher? At $10 per 1,000 results it is way too cost-prohibitive. I have three sites I want to compare, and if I match 10 products across 3 sites, that is 10 x 10 x 10, or 1,000 results, which costs $10. ...
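The cost math in the post, written out with its own numbers and the quoted $10 per 1,000 results rate (the flat rate is an assumption; tiered pricing would change the result):

```python
# Sketch of the cost calculation described above.
products_per_site = 10
sites = 3

# Counting combinations the way the post does: every product against every
# product on each of the three sites, i.e. 10 x 10 x 10.
results = products_per_site ** sites

price_per_1000 = 10.0
cost = results / 1000 * price_per_1000
```

So even a small catalog produces a combinatorial number of comparisons, which is why the cost grows so quickly.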
fair-rose · 10/11/2023

Instagram scraper - set date range.

Hi, is it possible to set a date range for scraping Instagram posts, so only posts between two dates are returned? 'Older than' is mentioned in the documentation, but I'm not sure how to use it.
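When an actor only exposes an "older than" cutoff, one workaround is to over-fetch and filter the dataset client-side by timestamp. A minimal sketch, assuming each item has an ISO 8601 `timestamp` field (the field name is an assumption; check the actual dataset items):

```python
from datetime import datetime, timezone

# Sketch: client-side date-range filter over scraped posts. The "timestamp"
# field name and ISO 8601 format are assumptions about the dataset schema.
def in_range(post: dict, start: datetime, end: datetime) -> bool:
    ts = datetime.fromisoformat(post["timestamp"].replace("Z", "+00:00"))
    return start <= ts <= end

posts = [
    {"id": "1", "timestamp": "2023-09-01T12:00:00Z"},
    {"id": "2", "timestamp": "2023-10-05T08:30:00Z"},
]
start = datetime(2023, 10, 1, tzinfo=timezone.utc)
end = datetime(2023, 10, 31, tzinfo=timezone.utc)
kept = [p for p in posts if in_range(p, start, end)]
```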
criminal-purple · 10/10/2023

Bulk SEO Scraper

Hi guys, I am looking for a scraper that can bulk scrape 100s of homepages for SEO datapoints, with the ability to import 100s of URLs. Does one exist already?
optimistic-gold · 10/9/2023

Function isn't running when posted in Apify

Hello everyone, I have the following code working perfectly when I am not using Apify; however, once I use Apify it doesn't run the second function. I am using the Apify template for Scrapy. Thanks for the help...
foreign-sapphire · 10/8/2023

Reconstructing clickable URLs from FB comment scraper output

Hi, I am new to Apify, and so far it's what I've been looking for. But after looking for some sort of data dictionary that defines what the output fields are, I have not been able to get a clear idea of how the value in (for example) the replyToCommentId field can be converted into a clickable URL that would open someone's browser to that comment. I have not been able to find the data dictionary or code examples that show how one might reconstitute the URL from the FB comment scraper output fields. Wou...
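A sketch of the string-building step, heavily hedged: the URL pattern below is a common shape for Facebook comment permalinks, not documented output of the actor, and the `post_id` / `comment_id` inputs stand in for whichever scraper fields actually carry those IDs:

```python
# Sketch: rebuild a clickable comment permalink from scraper output fields.
# The URL pattern and the field-to-parameter mapping are assumptions;
# Facebook permalink formats vary and are not guaranteed to stay stable.
def comment_url(post_id: str, comment_id: str) -> str:
    return f"https://www.facebook.com/{post_id}/?comment_id={comment_id}"

url = comment_url("12345", "67890")
```

Before relying on this, it is worth opening a few generated URLs manually to confirm the pattern matches what the scraped page actually uses.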
afraid-scarlet · 10/8/2023

Limiting Number of Concurrent Agents?

Hi -- I'm just starting to work with the platform. I've been very impressed so far, but I'm running into a challenge constraining things so I don't get a lot of what are effectively "out of resources" errors. What I am doing is trying to use Apify as the crawler end of an Airtable knowledge-tracking tool we use internally. Users submit links into an Airtable, and we pull the links in, including content, with Zapier as the orchestration tool. Apify is much better at the pulling-in part, but the challenge I'm running into is that we get a lot of errors because this is a bursty process and we request too many page parsings at the same time. Zapier doesn't seem to have a way to do this elegantly, but I'm hoping Apify does. Ideally there is a setting somewhere that says "only run up to 4 concurrent agents at once for this task" or "queue agents by available RAM on the account." Is there any way to approach this, especially from a no-code angle?...
vicious-gold · 10/8/2023

Function not executing for some reason.

```py async def main(): async with Actor:...
conscious-sapphire · 10/6/2023

Mix Cheerio and Playwright same crawler

Hi, I need to crawl a website that has only a certain type of page that needs JS to be scraped. So for speed and resource reasons I'm using Cheerio to scrape all the possible data and enqueue every link, including links to the pages requiring JS. After the Cheerio scrape ends, I launch a Playwright scrape, but how can I get Playwright to pick up the request queue from the first crawl and scrape data for a specific label?...
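A language-agnostic sketch of the two-pass pattern being described: the first (cheap, HTTP-only) pass enqueues everything, tagging pages that need a browser with a label; the second (browser) pass drains only those labeled requests. In Crawlee this maps to two crawlers opening the same named RequestQueue and routing handlers by request label; the `/app` URL rule below is a made-up example of a labeling heuristic:

```python
from collections import deque

# Shared queue standing in for a Crawlee RequestQueue.
queue = deque()

def first_pass(links):
    """Cheap pass: scrape static data (elided) and enqueue with labels."""
    for url in links:
        label = "NEEDS_JS" if url.endswith("/app") else "STATIC"
        queue.append({"url": url, "label": label})

def second_pass():
    """Browser pass: handle only the requests labeled as needing JS."""
    handled = []
    while queue:
        req = queue.popleft()
        if req["label"] == "NEEDS_JS":
            handled.append(req["url"])  # browser-rendered scrape goes here
        # STATIC pages were already scraped in the first pass, so skip them
    return handled

first_pass(["https://example.com/a", "https://example.com/app"])
js_pages = second_pass()
```

The key design point is that the queue, not the crawler, owns the pending work, so a second crawler with different capabilities can resume it.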
conscious-sapphire · 10/6/2023

Is it possible to queue Actors execution?

Hi, when running my actors I easily reach the 8 GB actor RAM limit on the free plan. I'm planning to switch to the Starter plan, but I will also hit its 32 GB limit. Is there a way to queue actor execution on Apify when the RAM limit is reached? Thank you
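One client-side workaround is to batch runs so concurrent memory stays under the plan limit. A sketch with illustrative numbers (the 32 GB / 8 GB figures come from the post; check your plan's actual limits):

```python
# Sketch: split actor run inputs into batches sized so that running one
# batch at a time never exceeds the plan's memory limit.
def batches(run_inputs, ram_limit_gb=32, ram_per_run_gb=8):
    per_batch = max(1, ram_limit_gb // ram_per_run_gb)
    return [
        run_inputs[i:i + per_batch]
        for i in range(0, len(run_inputs), per_batch)
    ]

groups = batches(list(range(10)), ram_limit_gb=32, ram_per_run_gb=8)
```

Each batch would then be started and awaited before the next one begins, so at most `ram_limit_gb // ram_per_run_gb` runs ever execute concurrently.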
adverse-sapphire · 10/6/2023

judgereini

I'm using the Tweet Flash - Twitter Scraper actor: https://apify.com/shanes/tweet-flash It delivers exactly what I need. However, it only provides 500 results per run, and I need many more. Does anyone know if this is a limitation of the free version? Or did I do something wrong in the configuration?...