Crawlee & Apify

This is the official developer community of Apify and Crawlee.

like-gold, 7/20/2023

Google play reviews scraper

Can anyone look at the code below? I don't know what's wrong, but I can't scrape reviews from play.google.com with it.

xenial-black, 7/20/2023

Running nested crawlers

How can I run a crawler inside a running crawler? I have a CheerioCrawler running and want to start a new crawler, for example a JSDOMCrawler, for each page the CheerioCrawler visits. I know that I can run them in parallel, but what I want is to run them nested.

passive-yellow, 7/19/2023

Langchain Apify Dataset Loader Issue

Whenever I try to create a vectorstore index using Langchain and the Apify dataset loader, I get the following error: 'ApifyDatasetLoader' object has no attribute 'page_content'. This is odd because the code was working just last week (following the exact documentation on Langchain) and now it refuses to work. The data does not have the attribute 'page_content' despite my specifying it in the mapping function....

yappiest-sapphire, 7/18/2023

Strange error when building actors

Hi everyone, I started getting a strange error when building actors. It's now happening on two different actors, even after I undid the changes that I initially made (which caused the actor to fail). Has anyone encountered this before, and could you offer any advice?

afraid-scarlet, 7/18/2023

Facebook group returned error

I ran the Facebook group scraper (https://console.apify.com/view/runs/PdvKUyNLzEtYA7YAD). The run finished with a success status, but there is no data. I looked into the log and found an error. (The error log can be seen in the linked run.)

rare-sapphire, 7/17/2023

Using Zillow ZIP Code Scraper & Zillow Detail Scraper

Using the ZIP Code Scraper I got 10K results, but when I passed its Dataset ID to the Detail Scraper, I only got 6K results. Please help!
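
One way to narrow down a gap like this is to compare identifiers between the two exports: duplicates in the input and inputs that never produced an output are the usual suspects. The following is a minimal sketch, assuming both datasets have been downloaded as lists of dictionaries and that they share an identifier field. The field name `zpid` and the helper `compare_datasets` are assumptions made for illustration, not something confirmed by either scraper's documentation.

```python
from collections import Counter


def compare_datasets(input_items, output_items, key="zpid"):
    """Count duplicate inputs and list input IDs that have no matching output.

    `key` is a hypothetical shared identifier field; adjust it to whatever
    field both exports really have in common.
    """
    input_counts = Counter(item[key] for item in input_items if key in item)
    output_ids = {item[key] for item in output_items if key in item}
    duplicates = sum(count - 1 for count in input_counts.values())
    missing = sorted(set(input_counts) - output_ids)
    return {
        "unique_inputs": len(input_counts),
        "duplicate_inputs": duplicates,
        "missing_from_output": missing,
    }


# Tiny made-up sample standing in for the two exported datasets.
search_results = [{"zpid": 1}, {"zpid": 2}, {"zpid": 2}, {"zpid": 3}]
detail_results = [{"zpid": 1}, {"zpid": 2}]
report = compare_datasets(search_results, detail_results)
print(report)
```

If most of the gap turns out to be duplicates, the two counts are consistent; if many IDs are missing, the run log for those specific items is the next place to look.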

fascinating-indigo, 7/17/2023

Trying to figure out pricing

Hello everyone! I just found out about Apify and I have a question about pricing. Let's say I want to run a job that goes through 10K Instagram accounts by running the actor apify/instagram-scraper...

flat-fuchsia, 7/16/2023

Is there a Python script example showing how to use the Initial cookies option with Apify’s API?

So with the Twitter Scraper, there is a note close to the top of the page saying: “On 30 June 2023, Twitter put all content behind a login, so scraping publicly available Twitter content is no longer possible. To extract data behind the login, you can use the Initial cookies option. Beware that this is an experimental/research feature, and you use it on your responsibility.” I was wondering if anyone has an example script on how to use the Initial cookies option with Apify’s API in Python? ...
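
As a rough illustration of the shape such a script could take, the sketch below builds (but does not send) a request to the Apify API endpoint that starts an actor run, with cookies passed in the run input. The input field name `initialCookies`, the cookie name, the actor ID `username~actor-name`, and the helper `build_run_request` are all placeholders and assumptions; the real field name has to be taken from the actor's input schema.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, run_input):
    """Build a POST request that would start an actor run with the given input."""
    url = f"{API_BASE}/acts/{actor_id}/runs"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Placeholder cookie exported from a logged-in browser session.
cookies = [{"name": "auth_token", "value": "<cookie value>", "domain": ".twitter.com"}]

# ASSUMPTION: "initialCookies" is a guessed field name, check the actor's input schema.
run_input = {"initialCookies": cookies}

request = build_run_request("username~actor-name", "<APIFY_TOKEN>", run_input)
# Calling urllib.request.urlopen(request) would send it and start the run.
print(request.full_url)
```

The official `apify-client` package wraps the same endpoint, so the run input dictionary would look the same there.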

sunny-green, 7/14/2023

Automating Exports from Apify to Airtable with make.com

Hi! With make.com I created a scenario where each time I update or add a row in Airtable, it triggers the run of an Apify task (Google Search Result scraper). It overrides the query with the input from my table. I would like to automate receiving the results (i.e. the "description" column) from the Apify output in Airtable, ideally all combined in one field. Make.com offers me some data to export, but not the description. I asked make.com support, whose feedback was: "You can use the HT...
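
The "all combined in one field" step can be sketched independently of make.com. The snippet below assumes the dataset items have already been fetched as a list of dictionaries and that each result carries a `description` key, as the post describes; the helper name `join_descriptions` and the sample data are hypothetical.

```python
def join_descriptions(items, separator="\n\n"):
    """Concatenate the non-empty `description` values of dataset items."""
    descriptions = [
        item["description"].strip()
        for item in items
        if item.get("description")
    ]
    return separator.join(descriptions)


# Made-up items standing in for the scraper output.
sample_items = [
    {"title": "A", "description": " First result. "},
    {"title": "B"},
    {"title": "C", "description": "Second result."},
]
summary = join_descriptions(sample_items, separator=" | ")
print(summary)
```

The resulting string is what would be written into the single Airtable field.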

aware-green, 7/13/2023

Starting a new actor

I started a new actor through the console. It keeps saying that network authentication is required and that it cannot access 127.0.0.1. What am I missing? Thanks

continuing-cyan, 7/13/2023

Best practices/examples of hardening an actor that handles tens of thousands of records?

I was told to post this here instead of #chat by DanielDo: I'm looking for any helpful links/articles/source code for writing actors that split a collection of objects from a dataset into paged collections for batching? I want to support actor input for capping the total dataset records that are allowed to be processed, the size of each page/batch, etc. The objects retrieved will have a url in one of their keys that the actor will then go fetch and save to the local fs, so I'd like to make sure the actor can stop and resume where it left off without redundant fetches or fs operations. ...
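
The requirements in this question (a cap on total records, a configurable batch size, and resuming without repeating work) can be sketched with an offset checkpoint. This is a minimal, platform-independent illustration: the function names and the JSON checkpoint file are assumptions, and on the Apify platform the checkpoint would more likely live in a key-value store than on the local file system.

```python
import json
import tempfile
from pathlib import Path


def load_checkpoint(path):
    """Return the offset to resume from, or 0 when no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text())["next_offset"]
    return 0


def save_checkpoint(path, next_offset):
    """Write the checkpoint via a temp file so a crash cannot leave it half-written."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"next_offset": next_offset}))
    tmp.replace(path)


def process_in_batches(records, handle, checkpoint_path, batch_size=100, max_records=None):
    """Process records page by page, persisting progress after each page."""
    total = len(records) if max_records is None else min(len(records), max_records)
    offset = load_checkpoint(checkpoint_path)
    while offset < total:
        batch = records[offset:min(offset + batch_size, total)]
        for record in batch:
            handle(record)
        offset += len(batch)
        save_checkpoint(checkpoint_path, offset)
    return offset


with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint = Path(tmp_dir) / "checkpoint.json"
    records = [{"url": f"https://example.com/file-{i}"} for i in range(10)]
    seen = []
    # First run is capped at 6 records; the second run resumes at offset 6.
    process_in_batches(records, seen.append, checkpoint, batch_size=4, max_records=6)
    first_run_count = len(seen)
    process_in_batches(records, seen.append, checkpoint, batch_size=4)
    total_count = len(seen)
```

Because progress is saved per batch, a crash in the middle of a batch replays that batch on restart, so the handler should itself be idempotent (for example, skipping a download when the target file already exists).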

absent-sapphire, 7/12/2023

Generalizing a Simple Cheerio Scraper for Different Pages

I have a working Cheerio Task, which uses a Glob to find links and CSS Classes to find the content to be extracted on the child page. I have almost 40 more similar pages, but each will have a different Glob for links and a different CSS Class for the target information. What is the best way to generalize my working scraper for the other 40, or maybe soon 100, pages? My thought is that I need to collect the same info mentioned from all the pages... how terribly boring. Is there any AI for that, either on the platform or elsewhere? Assuming I get that done manually, I expect there is something smarter than creating 100 clones of the Task.... I have done some JS automation on Uilicious, calculating the inputs using a Google Sheet to populate an array.......
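
The idea of one scraper driven by a table of per-site settings, rather than 100 cloned Tasks, can be sketched as a lookup from URL to configuration. The site names, globs, selectors, and the helper `config_for_url` below are all invented for illustration; the rows could just as well be loaded from a Google Sheet export, as the post suggests. Note that Python's `fnmatchcase` treats `*` as matching any characters including `/`, which may differ from how globs behave in the scraper itself.

```python
from fnmatch import fnmatchcase

# Hypothetical per-site settings: one row per site instead of one Task per site.
SITE_CONFIGS = [
    {
        "name": "site-a",
        "link_glob": "https://site-a.example/articles/*",
        "content_selector": ".article-body",
    },
    {
        "name": "site-b",
        "link_glob": "https://site-b.example/news/*",
        "content_selector": ".post__text",
    },
]


def config_for_url(url, configs=SITE_CONFIGS):
    """Return the first config whose link glob matches the URL, else None."""
    for config in configs:
        if fnmatchcase(url, config["link_glob"]):
            return config
    return None


matched = config_for_url("https://site-b.example/news/42")
print(matched["content_selector"] if matched else "no config for this URL")
```

A single page handler can then call `config_for_url` for each request and use the returned selector, so adding a site means adding a row.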

deep-jade, 7/12/2023

Seeking Help to Access Safari Reader View Mode HTML Code

I'm currently working on a project that requires accessing the HTML code generated by Safari's Reader View mode. This mode simplifies the webpage content, making it cleaner and easier to parse. I understand that the reader view mode content appears after clicking on the Reader Mode button. I'm curious to know if there are any tools or methods within the Apify ecosystem that could assist me in obtaining the HTML code from Safari's Reader View mode. Any insights or suggestions on how to accomplish this would be greatly appreciated!...

like-gold, 7/12/2023

Need help with shadow-root

I was making an actor for my business that extracts reviews from different platforms, but I am having issues with one website: it keeps all of its data inside a shadow-root, so no results are coming back. I couldn't find a solution on the internet, so I came here. Any help would be appreciated!...

conscious-sapphire, 7/10/2023

Invalid 'connection' header: close, getaddrinfo EAI_AGAIN and getaddrinfo ENOTFOUND

What are these errors caused by? In the case of the invalid 'connection' header, if I go to the website manually, it works fine.
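
For context on the two `getaddrinfo` errors: both come from DNS resolution. `EAI_AGAIN` signals a temporary resolver failure, while Node's `ENOTFOUND` generally means the host name could not be resolved at all. The sketch below shows the same distinction with Python's `socket` module, retrying only the temporary case. The helper `resolve_with_retry` and the fake resolver are illustrative assumptions, and the sketch does not address the invalid 'connection' header error.

```python
import socket
import time


def resolve_with_retry(host, resolver=socket.getaddrinfo, attempts=3, delay=0.5):
    """Resolve a host, retrying only on temporary DNS failures (EAI_AGAIN)."""
    for attempt in range(1, attempts + 1):
        try:
            return resolver(host, 443)
        except socket.gaierror as error:
            # Permanent failures (such as an unknown host) are re-raised at once;
            # temporary ones are retried until the attempts run out.
            if error.errno != socket.EAI_AGAIN or attempt == attempts:
                raise
            time.sleep(delay * attempt)


# Fake resolver that fails twice with a temporary error, then succeeds,
# so the example runs without network access.
calls = []


def flaky_resolver(host, port):
    calls.append(host)
    if len(calls) < 3:
        raise socket.gaierror(socket.EAI_AGAIN, "Temporary failure in name resolution")
    return [("fake-address", port)]


result = resolve_with_retry("example.com", resolver=flaky_resolver, delay=0)
print(result, len(calls))
```

In a crawler, frequent `EAI_AGAIN` errors often point at the DNS resolver or proxy being overloaded rather than at the target site.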

national-gold, 7/10/2023

Google Maps scraper

Hello, we are interested in using the Google Maps scraper on Apify. I was able to pull in information for local businesses; however, I noticed that the descriptions and images were not scraped from Google Maps. Before we sign up for a plan, it is imperative that we are able to scrape this information. Do you have any suggestions? Thank you....

inland-turquoise, 7/10/2023

Scraping for name, title and company info

Is there a way to scrape multiple sites to search for individuals with specific titles, along with their names and company info? Please note I am not a technical person, so if any freelancer is interested in connecting with me on this, I am happy to do so as well. Thank you in advance

continuing-cyan, 7/8/2023

Accessing actor dataset locally in a monorepo

I haven't been able to find any information on how accessing datasets via the client works for local development. Does this only work on the platform? I have a monorepo with two actors and I'd like to access a named dataset from one actor inside the other. If accessing the datasets of other actors is not possible via openDataset locally, what alternatives are there?
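
One possible local alternative is to read the other actor's storage folder directly. The sketch below assumes the local storage layout in which each dataset is a directory of numbered JSON files under `storage/datasets/<name>/`; that layout, the metadata-file convention, and the helper `read_local_dataset` are assumptions to verify against your own `storage` folder, not a documented contract.

```python
import json
import tempfile
from pathlib import Path


def read_local_dataset(storage_dir, dataset_name):
    """Load every item file of a locally stored dataset, in file-name order."""
    dataset_dir = Path(storage_dir) / "datasets" / dataset_name
    items = []
    for item_file in sorted(dataset_dir.glob("*.json")):
        # Skip bookkeeping files such as __metadata__.json (assumed naming).
        if item_file.name.startswith("__"):
            continue
        items.append(json.loads(item_file.read_text()))
    return items


# Build a throwaway storage folder to demonstrate the function.
with tempfile.TemporaryDirectory() as tmp_dir:
    dataset_dir = Path(tmp_dir) / "datasets" / "shared-results"
    dataset_dir.mkdir(parents=True)
    (dataset_dir / "000000001.json").write_text(json.dumps({"id": 1}))
    (dataset_dir / "000000002.json").write_text(json.dumps({"id": 2}))
    (dataset_dir / "__metadata__.json").write_text(json.dumps({"name": "shared-results"}))
    items = read_local_dataset(tmp_dir, "shared-results")

print(items)
```

In a monorepo, the second actor would point `storage_dir` at the first actor's storage folder, or both actors could be configured to share one storage directory.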

continuing-cyan, 7/7/2023

Example monorepo repository + CLI for deployments

The example monorepo (seen here: https://github.com/apify/actor-monorepo-example) doesn't cover how apify push is intended to be used – the only place you're able to use it is from the root of the repository, but doing it there shows the following in console:
```bash
apify push
Info: Created actor with name undefined on Apify....
```