Crawlee & Apify

This is the official developer community of Apify and Crawlee.

foreign-sapphire · 6/7/2023

BrowserPoolOptions

Hello, I have browserPoolOptions in PlaywrightCrawler like this: browserPoolOptions: { maxOpenPagesPerBrowser: 0, useFingerprints: true, preLaunchHooks: [async (pageId, launchContext) => {...
initial-rose · 6/7/2023

Webhook integration - more detailed headers?

I've set up a webhook integration for an actor to fire on success. It works fine, but I'm concerned about API security. Is there any way to send more headers with the webhook (or get more reliable, static information)? I'd like my server to accept the webhook only if, for example, a token matches or the origin/referer is an Apify domain. The few keys and IDs currently sent in the headers don't seem to be static or retrievable. I know I can create variables in the payload itself, but I want to stop the webhook at the header level, before I parse the payload...
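One way to reject a request before touching the payload is to carry a shared secret in the webhook URL itself, since that URL is chosen by whoever configures the webhook. The sketch below assumes a hypothetical `token` query parameter and hypothetical helper names (`safeEqual`, `getQueryParam`, `isAuthorizedWebhook`); it says nothing about which headers Apify actually sends.

```typescript
// Sketch only: the `token` parameter name and all function names are assumptions.

/** Compare two strings without returning early on the first mismatch. */
function safeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    // charCodeAt returns NaN past the end of a string; `| 0` turns that into 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

/** Read one query parameter from a raw request URL such as "/hook?token=abc". */
function getQueryParam(rawUrl: string, name: string): string | undefined {
  const queryStart = rawUrl.indexOf("?");
  if (queryStart === -1) return undefined;
  for (const pair of rawUrl.slice(queryStart + 1).split("&")) {
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    if (decodeURIComponent(key) === name) {
      return eq === -1 ? "" : decodeURIComponent(pair.slice(eq + 1));
    }
  }
  return undefined;
}

/** Decide from the URL alone whether the request may proceed to body parsing. */
function isAuthorizedWebhook(rawUrl: string, expectedToken: string): boolean {
  try {
    const token = getQueryParam(rawUrl, "token");
    return token !== undefined && expectedToken.length > 0 && safeEqual(token, expectedToken);
  } catch {
    // Malformed percent-encoding in the URL: treat as unauthorized.
    return false;
  }
}
```

With this in place, the webhook would be registered with a URL like `https://example.com/hook?token=<secret>` (a placeholder address), and the server would call `isAuthorizedWebhook(req.url, secret)` before reading the request body.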
eastern-cyan · 6/6/2023

Instagram - actor Instagram Profile Scraper

Hello! Good afternoon! I am from Brazil. I'm a developer and a new user on Apify. I'm using the platform to get public Instagram info....
like-gold · 6/5/2023

Maximum number of posts that can be retrieved at once (free subscription)

I am trying to download Reddit posts, but when I set the limit for the number of posts to 1000, it only gets 500. Is this the maximum I can get with a free subscription?
rare-sapphire · 6/5/2023

How to access key-value stores via API?

I want to access a file I created in a key-value store via the API, but the store ID keeps changing for every run. How can I access that file?
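The ID changes because each run gets its own default store. A named store keeps a stable address. The sketch below builds a record URL assuming the Apify API v2 path `key-value-stores/{storeId}/records/{recordKey}` and the `username~store-name` form of the store ID; the function name `recordUrl` and the example user, store, and key are hypothetical, and private stores would additionally need authentication.

```typescript
// Sketch only: `recordUrl` and the sample names are illustrative assumptions.
const API_BASE = "https://api.apify.com/v2";

/**
 * Build the URL of one record in a key-value store.
 * `storeIdOrName` is either a store ID or "username~store-name" for a named store.
 */
function recordUrl(storeIdOrName: string, recordKey: string): string {
  const store = encodeURIComponent(storeIdOrName);
  const key = encodeURIComponent(recordKey);
  return `${API_BASE}/key-value-stores/${store}/records/${key}`;
}

// A named store resolves to the same URL on every run.
const exampleUrl = recordUrl("john-doe~my-results", "OUTPUT");
```

The Actor would need to write to that named store (rather than the default one) for the URL to point at fresh data after each run.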
rare-sapphire · 6/5/2023

How to modify the dataset

I want to modify the dataset: remove all the old data and save a modified version of the data as JSON. Calling await Dataset.pushData(myModifiedData) adds the data but doesn't remove the old data.
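Datasets are append-only, so `pushData` can only add items. One possible workaround, sketched here under assumptions, is to keep the modified data as a single JSON record in a key-value store, where writing to a key replaces whatever was stored under it. `KeyValueStoreLike` and `saveCleanSnapshot` are hypothetical names; the interface is meant to mirror the `setValue` method of Crawlee's `KeyValueStore`, but that compatibility is an assumption, not something checked here.

```typescript
// Sketch only: interface and function names are illustrative assumptions.

/** Minimal shape of a store whose keys can be overwritten. */
interface KeyValueStoreLike {
  setValue(key: string, value: unknown): Promise<void>;
}

/**
 * Apply `transform` to every item, drop items mapped to null,
 * and store the result under `key`, replacing any earlier snapshot.
 */
async function saveCleanSnapshot<T>(
  store: KeyValueStoreLike,
  key: string,
  items: T[],
  transform: (item: T) => T | null,
): Promise<T[]> {
  const cleaned: T[] = [];
  for (const item of items) {
    const result = transform(item);
    if (result !== null) cleaned.push(result);
  }
  await store.setValue(key, cleaned); // overwrites the previous value under `key`
  return cleaned;
}
```

Because the whole snapshot lives under one key, rerunning the function leaves exactly one current version instead of an ever-growing list.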
wise-white · 6/3/2023

Search comment but also export posts

Hi, I am trying this new scraping tool and noticed something strange. I am trying to export the 10 most recent Reddit comments containing the word "polymaker". It is supposed to output only comments, but it also outputs posts. Do you know what I am doing wrong? Attached are the actor and the Excel results...
absent-sapphire · 6/2/2023

image download/upload empty zip issues

```shell 2023-06-02T21:07:23.777Z INFO Downloading image https://image.api.playstation.com/vulcan/ap/rnd/202105/... 2023-06-02T21:07:23.783Z INFO Downloading image https://image.api.playstation.com/gs2-sec/appkgo/prod/C... 2023-06-02T21:07:24.367Z INFO Downloading image https://image.api.playstation.com/vulcan/ap/rnd/202102/... 2023-06-02T21:07:24.947Z INFO Downloading image https://image.api.playstation.com/gs2-sec/appkgo/prod/C......

Analytics Sorting

Spotted a problem on the Analytics tab. Sorting doesn't seem to work; the results are not sorted as expected.
ratty-blush · 6/2/2023

Could Airbnb be blocking Apify Airbnb scraper?

We have many timed out requests in Airbnb Scraper (with timeout execution time of 50 seconds and memory limit of 8192). Could Airbnb be blocking the IP address from scraping? Should we consider getting a different IP address?
ratty-blush · 6/2/2023

Inconsistent timeout with the same request

Hello, we have been using Airbnb Scraper for a while and we are having many problems with the duration of the requests. We have set a timeout of 70 seconds, much more than successful requests usually need. However, many requests end up timing out. The most worrisome part is that THE SAME QUERY often ENDS UP EITHER SUCCESSFUL (in much less than 70 seconds) OR TIMING OUT across different requests.
conscious-sapphire · 6/1/2023

Truncated tweet content returned by Twitter URL Scraper?

Hi there, I'm trying Apify for the first time today. I've just run a test to see how it handles historical Twitter scrapes, and it seems to work well except for one issue: the tweet content it returns is truncated at around the 140-character mark. Is there any way to run this and get it to return the full text of tweets longer than 140 characters? See the attached screenshot for a comparison between the data returned for a tweet by the Apify Twitter URL Scraper and the same tweet displayed on twitter.com...
rival-black · 5/31/2023

Getting error from Axios

Hi, I am following the sample code from langchain site https://js.langchain.com/docs/modules/indexes/document_loaders/examples/web_loaders/apify_dataset and I am getting into ``` [Nest] 18014 - 05/31/2023, 9:03:45 PM ERROR [ExceptionsHandler] Request failed with status code 400 Error: Request failed with status code 400 at createError (/Users/adamsobotka/Develop/apifipt/node_modules/.pnpm/[email protected]/node_modules/axios/lib/core/createError.js:16:15) at settle (/Users/adamsobotka/Develop/apifipt/node_modules/.pnpm/[email protected]/node_modules/axios/lib/core/settle.js:17:12)...

Suggestion (feature request)

Apify Console is a great tool for data visualization. Just another idea: 1. An alternative data view like postcards (columns instead of rows). 2. The ability to view an image when it is clicked. 3. For video, the ability to play it (a video player) would be great...
quickest-silver · 5/31/2023

Facebook Comment Scraper

Hey guys 🙂 I'm trying to run the Facebook comment scraper but keep getting 'There was an uncaught exception during the run of the Actor and it was not handled.' The run ID is: nZasjD7E4t6DwpPHm...
extended-yellow · 5/31/2023

Sorting Facebook Group Posts By Post Time

Hi guys, I've been using the Facebook Group Scraper (https://apify.com/apify/facebook-groups-scraper) and I was wondering if I can sort the posts by post time. I've tried all of the sort options and none of them sorted by post time.
absent-sapphire · 5/31/2023

Facebook Groups Scraper

Hi there, I wonder if we can specify a date-time range when getting the dataset of posts from a Facebook group.
metropolitan-bronze · 5/30/2023

Instagram hashtag scraping doesn't return all the posts

I've been trying to use the Instagram hashtag scraper, but after several scrapes I'm finding that it returns only half the results and also returns multiple duplicates. I've upgraded my subscription and added IPs and memory, but I'm still not getting even half of the actual posts.
conscious-sapphire · 5/30/2023

Question about userData as option in enqueueLinks

Scraping a few pages of a forum. Some information is on the thread list page and some is inside the thread. In the end, this information has to be loaded into a pg db. Example of a thread list, where the information scraped is the href to feed into enqueueLinks, plus views and replies ```md | Title | Views | Replies |...
absent-sapphire · 5/30/2023

One-off charge or lower price for ability to use proxies locally?

$49/month is too steep for me at this point in time. I'm just exploring the platform and am hoping that it's possible to pay a handful of Schrute bucks so I can run my actor locally on my own machine.