Crawlee & Apify

This is the official developer community of Apify and Crawlee.

Selecting a build version

I have an actor for which I would like to select a different build version when it is run manually. On the actor's page, under source tab > input tab > run options section, the build is set to latest and cannot be changed there, with a note saying "Version of Actor is fixed to latest build of currently selected version. To change version, please use the selector on top of this page." When I try to select from the top of the page, my only option is latest: I cannot type the version I would like to use, and the builds listed on the actor's build tab are not offered. I have the same issue in the builds tab, where the dropdown only provides latest and I cannot type a specific build into it. I thought I could delete the latest build to return to the version I would like to use, then rebuild once I have completed the manual run. However, I cannot delete this build: "Error: Cannot delete build (Deleting an Actor's default build is not allowed.)" Yet when the actor is set up to run on a schedule, I can provide the specific build version. ...

How to generate token with read-only permission to share task result in dataset?

I referred to https://blog.apify.com/how-to-turn-any-website-into-an-rss-feed-a8f9f216e1b0/ and wrote my own RSS task, but the only way to get the task's execution result from the dataset in Storage is to access:
https://api.apify.com/v2/actor-tasks/changchiyou~wildrift-news-zh-tw/runs/last/dataset/items?token=[apify_api_MYAPI]&format=rss&clean=true
...
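
For reference, the URL in the post can be assembled from its parts as in the minimal sketch below. The helper names `lastRunItemsUrl` and `datasetItemsUrl` are hypothetical. The second helper targets the dataset by ID and leaves the token out; it assumes the dataset is readable by ID without a token, which depends on the storage's access settings and should be verified before sharing a link. A default per-run dataset also gets a new ID on every run, so this only suits a dataset that is reused across runs.

```typescript
// Output formats accepted by the dataset items endpoint (subset).
type ItemsFormat = "json" | "csv" | "xml" | "rss" | "html";

// Same URL shape as in the post: items of the task's last run, token in the query.
function lastRunItemsUrl(taskId: string, token: string, format: ItemsFormat): string {
  const query = `token=${encodeURIComponent(token)}&format=${format}&clean=true`;
  return `https://api.apify.com/v2/actor-tasks/${taskId}/runs/last/dataset/items?${query}`;
}

// ASSUMPTION: the dataset can be read by its ID without a token.
function datasetItemsUrl(datasetId: string, format: ItemsFormat): string {
  const id = encodeURIComponent(datasetId);
  return `https://api.apify.com/v2/datasets/${id}/items?format=${format}&clean=true`;
}
```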

GCP Authentication

I want to authenticate a custom actor to write to GCP Cloud Storage. While coding this up is easy, does Apify provide an option in the console to store the keys used for authentication? I also wonder if there is a way to provide authentication to GCP via GitHub CI/CD. Can someone help me with this?...

Timeout When Pulling Docker Image

We've been noticing an uptick in the number of timeouts while pulling an Actor's Docker image. Can someone recommend any solutions on how to prevent this? Alternatively, is it possible to restart the actor under this scenario? The "Restart on error" Actor setting does not seem to apply, since the Actor never gets started in the first place. Any help is much appreciated! Details: ``` Actor Name: Custom Actor...

Apify IP Addresses to Whitelist

I'm developing a bot that needs to call an API service behind a VPN (under my control). The bot will be hosted on Apify, but to do that I need to add Apify's IP addresses to my VPN whitelist. How can I do this? I asked on the live chat in my company's Apify account, but I haven't received a reply yet. Can anyone help me, please?...

Unsure how to use captcha solver

I'm trying to use the TikTok captcha solver in the browser (Firefox). The page source doesn't have any wid info, so I inspected the page. In the storage section, the cookies contain nothing called 'auth_token'. Local storage contains multiple tea_cache_tokens. Right now they are of the format web id, unique user id, timestamp, except for one token, which is of the format unique user id, web id, timestamp. I assume the token with the different format is the correct one, but I haven't tested it yet. Also, how do I find the install id and the verFP? Once that's all figured out, how do I use the captcha solver? I'm using Selenium to navigate Firefox; how do I ensure that it waits for the captcha to appear before solving it, and how do I check whether it's done?...

Suggestion: JavaScript Fetch Example

On the API Console there is an example using cURL, but there is no example using the most used language in the world, which is pure JS Fetch. Adding a Fetch example may help users a lot. E.g.: ```javascript...
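
A Fetch-style call of the kind suggested here might look like the sketch below. It is not taken from the API Console. It assumes the run-Actor endpoint (`POST /v2/acts/{actorId}/runs`) with a bearer token, and the names `buildRunActorRequest`, `runActor` and `FetchLike` are hypothetical. The fetch function is passed in as a parameter so the request can be inspected without making a network call.

```typescript
interface ApiRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Minimal shape of fetch that this sketch relies on.
type FetchLike = (
  url: string,
  init: ApiRequest["init"],
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Build the request that starts an Actor run with the given JSON input.
function buildRunActorRequest(actorId: string, token: string, input: unknown): ApiRequest {
  return {
    url: `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
      body: JSON.stringify(input),
    },
  };
}

// Send the request and return the parsed JSON body, throwing on HTTP errors.
async function runActor(fetchFn: FetchLike, actorId: string, token: string, input: unknown): Promise<unknown> {
  const { url, init } = buildRunActorRequest(actorId, token, input);
  const response = await fetchFn(url, init);
  if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
  return response.json();
}

// Usage in a runtime with a global fetch (placeholder Actor ID and token):
// runActor(fetch, "username~my-actor", "<YOUR_API_TOKEN>", { startUrls: [] }).then(console.log);
```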

PlaywrightCrawler actor not finishing requestQueue

I have a Playwright Actor that has 10 URLs added to its queue before I kick it off with .run(). But the actor doesn't finish all 10 URLs. It will process between 4 and 7, then the log for the run just shows the statistics message repeated every second. Note that this happens in my local runs of this Actor as well. The total number of URLs scraped (out of 10) varies from run to run, minimum 1 URL and max 7 (of 10 total). This is the message it shows on repeat, on my local machine and on the Apify platform:...

Actor Run page showing 0 results with named dataset

I'm developing an actor where the user can choose to use a named dataset and request queue instead of the default ones by setting input fields. When I run it with a named dataset and request queue, it works properly and the data and requests show up in the 'Storage' tab, but the Run detail page shows 0 results and references an unnamed dataset and request queue, both of which are empty. Do I need to do something in my code to indicate that the named dataset and request queues should be used (oth...

Bug reporting.

There is this bug in the payout section.

Memory is full, however no tasks running...

Hi, I have an issue with memory: it shows 8 GB/8 GB, but there are no tasks in progress. How do I fix this?

Platform down - ETA or updates?

Hello, according to the status page, most of the platform is currently down. Is there any ETA, or somewhere we can check for updates?

Using own cloud storage

I'm planning to set up some scrapers using Apify. However, I want to store the data in my own cloud, for example Google Cloud Storage. I could not find any articles on this and would appreciate some help. I was also wondering if it's possible for me to run the Apify scrapers in my own compute...!...

Q: Is it a bad idea to use Residential Proxy instead of Google Serp Proxy?

Yes, I understand the functional differences between the two. However, I've found that Google requests come through just fine on the Residential Proxy as well. That made me curious, so I'm asking: is it a bad idea to search Google with Residential Proxy instead of Google SERP Proxy?

File size upload to KeyValueStore

I created an actor which downloads videos and uploads them to KeyValueStore. Some of the videos are pretty big, and I'm getting this error: 2024-05-18T15:54:15.572Z Failed to upload video: RangeError [ERR_FS_FILE_TOO_LARGE]: File size (2183982088) is greater than 2 GiB. Is there any way to get around this, and what are the best practices on Apify for downloading and uploading big files?...
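
The `ERR_FS_FILE_TOO_LARGE` error comes from reading a whole file of more than 2 GiB into memory in one call, so one possible direction is to process the file in byte ranges instead. The sketch below only plans those ranges; `planChunks` is a hypothetical helper, and storing each part under its own key is an assumption about how the parts would be kept, not an Apify recommendation. Each range could then be read with Node's `fs.createReadStream(path, { start, end })`, where `end` is inclusive.

```typescript
interface ByteRange {
  start: number; // first byte of the part
  end: number;   // last byte of the part (inclusive)
}

// Split a file of totalBytes into consecutive ranges of at most maxChunkBytes.
function planChunks(totalBytes: number, maxChunkBytes: number): ByteRange[] {
  if (totalBytes < 0 || maxChunkBytes <= 0) {
    throw new RangeError("totalBytes must be >= 0 and maxChunkBytes must be > 0");
  }
  const ranges: ByteRange[] = [];
  for (let start = 0; start < totalBytes; start += maxChunkBytes) {
    ranges.push({ start, end: Math.min(start + maxChunkBytes, totalBytes) - 1 });
  }
  return ranges;
}

// The file size from the error message, split into parts of at most 1 GiB.
const ONE_GIB = 1024 * 1024 * 1024;
const videoParts = planChunks(2183982088, ONE_GIB);
```

Whether the store accepts records of the resulting size is a separate limit to check.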

Searching for an actor to scrape reviews from non-third-party e-commerce platforms

Hi Team, is there an actor that will help me scrape e-commerce store reviews? Note: I want something that is not tied to third-party platforms like Shopify or WordPress. Basically, an actor that does this for websites built from scratch without any third-party tools...

Trouble with booksscraper in Apify console

Hello, everyone! I am currently following the tutorial: https://www.youtube.com/watch?v=4nxStxC1BJM&list=PLObrtcm1Kw6PEnu5BpeEFb8XEoQXMw0g7&index=11 And at this part, dealing with the deployment of ‘booksscraper’: 05:33 Demo: building an Actor using templates and tools I observe the following problem: It works perfectly in CLI, but bugs as an Actor in the Apify console… 😦...

Resurrect Timed Out Actor via JavaScript API SDK?

I have an integration configured to send a webhook event to my server when actor runs finish, time out, or error out. In the case of a timeout, how do I use the JavaScript API SDK to resurrect the run?...
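
One way to approach this is sketched below, assuming the default webhook payload shape (`eventType` plus `eventData.actorRunId`) and the HTTP endpoint `POST /v2/actor-runs/{runId}/resurrect`. The helper `planResurrect` is hypothetical. With the `apify-client` package the equivalent call is believed to be `client.run(runId).resurrect()`, which should be checked against the client documentation.

```typescript
// Subset of the webhook payload this sketch relies on (assumed default template).
interface RunWebhookPayload {
  eventType: string;
  eventData: { actorRunId: string };
}

interface PlannedRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
}

// Return the resurrect request for timed-out runs, or null for any other event.
function planResurrect(payload: RunWebhookPayload, token: string): PlannedRequest | null {
  if (payload.eventType !== "ACTOR.RUN.TIMED_OUT") return null;
  const runId = encodeURIComponent(payload.eventData.actorRunId);
  return {
    method: "POST",
    url: `https://api.apify.com/v2/actor-runs/${runId}/resurrect`,
    headers: { Authorization: `Bearer ${token}` },
  };
}
```

The webhook handler would pass the parsed request body to `planResurrect` and send the returned request only when it is not `null`.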

Actor marked under maintenance

I developed an actor that I published yesterday to the Store: https://apify.com/xyzzy/open-router Today I received an email saying the actor is marked under maintenance since the system test using the default inputs failed. This actor requires an external API key to OpenRouter as a secret input. I'm not sure what the platform-approved way of doing this is, and didn't find an answer in the documentation. I emailed support too, but let me know the appropriate channel for getting help on this. Thanks!...

Gmail integration to send report about execution

Hi there, I'm trying the Apify Gmail integration to send a report about the execution of a crawler. I enabled it using the "Dataset file as Text" option, so I can add the report to a dataset and receive the data as a JSON attachment via email. That seems to work fine, but I think this way every dataset will be sent via email. My question is: is there a way to send only one kind of dataset?...