Crawlee & Apify

This is the official developer community of Apify and Crawlee.

robust-apricot 7/8/2023

Accessing actor dataset locally in a monorepo

I haven't been able to find any information on how accessing datasets via the client works for local development. Does this only work on the platform? I have a monorepo with two actors and I'd like to access a named dataset from one actor inside the other. If accessing the datasets of other actors is not possible via openDataset locally, what alternatives are there?
robust-apricot 7/7/2023

Example monorepo repository + CLI for deployments

The example monorepo (seen here: https://github.com/apify/actor-monorepo-example) doesn't cover how apify push is intended to be used – the only place you're able to use it is from the root of the repository, but doing it there shows the following in console: ```bash apify push Info: Created actor with name undefined on Apify....
robust-apricot 7/7/2023

Using env vars with secret values via `apify secrets:add`, `actor.json`, and default start script

The documentation (https://docs.apify.com/cli/docs/vars) doesn't cover how to access environment variables set in actor.json from an actor's source code. I'm using the monorepo example repository.
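A minimal sketch of one way to read such a variable, assuming that entries declared under `environmentVariables` in actor.json (including `@secretName` references) reach the running actor as ordinary process environment variables. The helper `requireEnv` and the variable name `MY_API_KEY` are hypothetical and only serve as illustration.

```typescript
// A plain map of environment variables; in Node.js, process.env fits this shape.
type Env = Record<string, string | undefined>;

// Return the value of a required variable, or fail loudly if it is missing or empty.
function requireEnv(env: Env, name: string): string {
    const value = env[name];
    if (value === undefined || value === '') {
        throw new Error(`Missing environment variable: ${name}`);
    }
    return value;
}

// Inside an actor you would pass the real environment (assumption, see above):
//   const apiKey = requireEnv(process.env, 'MY_API_KEY');
// Stand-in environment so the sketch runs on its own:
const demoEnv: Env = { MY_API_KEY: 'demo-value' };
console.log(requireEnv(demoEnv, 'MY_API_KEY'));
```

Failing early on a missing value makes a misconfigured secret visible at startup rather than midway through a run.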
dependent-tan 7/6/2023

Scraping skips big texts in the URL; I have tried changing the input unsuccessfully.

So I am trying to scrape the text in this url: https://www.svila.it/en/our-story/ This is my input: def run_text_scraper(self, url):...
constant-blue 7/3/2023

get facebook event cover

Hi. Is there any way to get Facebook event cover pictures instead of the thumbnail with the Facebook event scraper? The thumbnail images are super low quality.
foreign-sapphire 7/2/2023

start urls input

I can get the input from Apify in my Crawlee Playwright code and console.log() the start URLs, but I am not sure how to access them because TypeScript says the start URLs are of type any instead of an array of strings. Can you provide some example code for extracting them so I can use them as start URLs in my code?
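One possible approach, sketched under the assumption that the start URLs arrive as an array whose entries are either plain strings or objects with a `url` key: narrow the untyped value with a small helper before handing it to the crawler. The names `ActorInput`, `StartUrlEntry`, and `toUrlList` are hypothetical.

```typescript
// Assumed shape of one start-URL entry: an object with a `url` key.
interface StartUrlEntry {
    url: string;
}

// Hypothetical input shape; adjust it to the fields your own input schema defines.
interface ActorInput {
    startUrls?: unknown;
}

// Narrow an untyped value to a list of URL strings.
// Accepts plain strings and `{ url: string }` objects, and skips anything else.
function toUrlList(raw: unknown): string[] {
    if (!Array.isArray(raw)) return [];
    const urls: string[] = [];
    for (const entry of raw) {
        if (typeof entry === 'string') {
            urls.push(entry);
        } else if (
            typeof entry === 'object' &&
            entry !== null &&
            typeof (entry as StartUrlEntry).url === 'string'
        ) {
            urls.push((entry as StartUrlEntry).url);
        }
    }
    return urls;
}

// In an actor you would pass the parsed input, for example:
//   const input = await Actor.getInput<ActorInput>();
//   await crawler.run(toUrlList(input?.startUrls));
// Stand-in input so the sketch runs on its own:
const example: ActorInput = {
    startUrls: [{ url: 'https://example.com' }, 'https://example.org', 42],
};
console.log(toUrlList(example.startUrls));
```

Treating the field as `unknown` and narrowing it explicitly keeps the compiler honest about what the input may actually contain.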
rising-crimson 7/2/2023

How can I scrape a whole Facebook group with 50,000 posts and not get the same ones every time?

Basically that! If I do a run with the maximum per run, I have no way to avoid getting the same posts back every time. I'd like to drain every post from a whole group that I run. Even limiting by year doesn't work, as 2021 has over 20,000 posts.
xenial-black 7/2/2023

Google Search Results Scraper always showing 10 per page

No matter what I try to set for the "Results per Google page" option, Google is only returning 10 items per page. How do we increase this?
sunny-green 6/30/2023

How can I automatically get the dataset ID from Instagram post scraper into Merge, Dedup & Transform

As above: I'm using the Instagram post scraper once a week, and I used to use code to keep only the fields I needed. I only need 3 fields, which I then push into Make (Integromat) to run some automated tasks. But now the Instagram post scraper no longer allows the code that removed the 100 values I don't need. So I've been advised to use Merge, Dedup & Transform Datasets to get the data into the needed format, but I can't work out how to get this all to run automatically. So the flow would be: run the Instagram post scraper, somehow get the dataset ID from the successful run into Merge, Dedup & Transform Datasets, and use that to remove all dataset items except 3...
conscious-sapphire 6/30/2023

which twitter scraper is currently working?

the og one is not working, the tweet flash is taking forever to reach a small number of tweets (so maybe it also is not working). Is perhaps Twitter Profile Scraper working for any of you guys? Or any others?
exotic-emerald 6/30/2023

query Instagram based on location area

How can I query for posts published on Instagram in a geographic area (for example: by a point and a radius up to 300 meters.... not based on the specific link)?

User Info

I am using an API call to get user info:
user_info = await Actor.apify_client.user('me').get()
...
wise-white 6/30/2023

Twitter Actors not working anymore

Hey guys, it seems that Twitter has blocked access to all information without logging in. Will you be able to fix this? Or should we think of other options? Are you considering an actor with login information? Please let us know about the future plans....
xenial-black 6/27/2023

Requested Monetization

Hola, last week I requested monetization for an Actor I would like to publish in the store, but I haven't heard anything yet. How long does this usually take? Just super excited about this 🙂 Thank you!
quickest-silver 6/27/2023

Questions about FacebookAds actor results

Hi everyone, I am currently evaluating the Apify Facebook Ad Actor for a client project. However, I have many questions about the data returned by the scraper and I can't find any reliable documentation. Here are my main questions:...
fair-rose 6/27/2023

Time specification in Facebook post scraper

Hi, I would like to customize the Facebook posts scraper to scan the same Facebook pages every day at a given time and only scrape the last 24 hours (this option works well with the Twitter scraper, but is not offered in the Facebook posts scraper). Does anyone know how to achieve that?
multiple-amethyst 6/27/2023

Can't get the Dun & Bradstreet Scraper working

I'm trying to use the Dun & Bradstreet Scraper, and I get this warning: Reclaiming failed request back to the list or queue. Request blocked - received 403 status code. and this error: Request failed and reached maximum retries. Error: Request blocked - received 403 status code. ...
sunny-green 6/26/2023

Why is `ts-node-esm` not installed?

I wonder how I need to build the app so that it actually runs in apify. I'm a bit at a loss to understand how to include this dependency in the docker-image given that it should be a dev dependency. ```bash richardpoelderl@Richards-MBP apify % pnpm add ts-node-esm -D  WARN  deprecated [email protected]: This package has been deprecated and is no longer maintained. Please use @rollup/plugin-terser...
wise-white 6/25/2023

Twitter List Scraper Bug

Hey there! I've noticed that the 'is_root_thread' function was working fine in the past, but now it seems like the logic for 'is_thread' and 'is_root_thread' is broken. Regardless of whether it's a thread or the first tweet of a thread, it's showing 'false' for all records. Could you please take a look and see if you can fix it? I've already created an issue for this. Thanks!
optimistic-gold 6/23/2023

Can Apify be used to crawl reviewer details from platforms like G2?

Well, the title says it all. Is it possible to use Apify to crawl reviewer details from platforms like G2?