Crawlee & Apify

This is the official developer community of Apify and Crawlee.

Question for any team members: if you scrape using residential proxies where required for FB/IG, is it possible to get a handle on the costs? I know it looks expensive at $10/GB! @netmilk @vladdy this is a slight concern...

I need help. I have the Scale plan, and out of nowhere I'm getting "You do not have permission to run this public Actor."

Any members of the team: I'd like to know what happens if we use a lot of resources. How long do we have to pay the bill, or do we always need enough pre-paid credit on hand? @vladdy @JameEnder let me know; this will help us plan better, thanks.

You need to explicitly pass the requestQueueId when starting the actor.
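
A minimal sketch of what that can look like with the apify-client JS package, assuming the Actor reads requestQueueId from its input; the Actor name and queue ID below are placeholders:

```ts
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the run with an explicit requestQueueId in the Actor input.
// 'username/my-actor' and the queue ID are placeholders.
const run = await client.actor('username/my-actor').call({
    requestQueueId: 'YOUR_REQUEST_QUEUE_ID',
});
console.log(`Run ${run.id} finished with status ${run.status}`);
```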

In the Python SDK, is there a way to get the number of links scraped so far?

Hi! I'm building something using the Apify SDK for crawling. I'm currently trying to figure out how I can tell the Actor which URLs to skip during recrawls. Is excludeUrlGlobs the right input setting for this? Is there a limit on exclusions? The plan is to regularly crawl news sites, but I would like to only process a site when new (not previously visited) URLs are found...
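
For the custom-crawler case, a sketch of forwarding an excludeUrlGlobs-style input to enqueueLinks' exclude option; the input field name mirrors the question and the globs are examples, so check your own input schema:

```ts
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

await Actor.init();
// Hypothetical input field mirroring the question above.
const input = await Actor.getInput<{ excludeUrlGlobs?: string[] }>();
const excludeUrlGlobs = input?.excludeUrlGlobs ?? [];

const crawler = new CheerioCrawler({
    async requestHandler({ enqueueLinks }) {
        await enqueueLinks({
            globs: ['https://example.com/**'],
            exclude: excludeUrlGlobs, // e.g. ['https://example.com/tag/**']
        });
    },
});

await crawler.run(['https://example.com']);
await Actor.exit();
```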

I have a problem with the Instagram Scraper Actor.

Hello guys, I have a PuppeteerCrawler. In the requestHandler I'm trying to click the pagination "next" button, and I cannot determine whether the content has changed or not. How can I do it? waitForNetworkIdle does not seem to work here. Any ideas?...
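
One common way to detect the change: capture a marker from the current page, click, then wait until the marker differs. A sketch with PuppeteerCrawler; the selectors are placeholders for your page's markup:

```ts
import { PuppeteerCrawler } from 'crawlee';

const crawler = new PuppeteerCrawler({
    async requestHandler({ page }) {
        // Capture the first result's text before paginating.
        const before = await page.$eval('.result-item', (el) => el.textContent);
        await page.click('.pagination-next');
        // Wait until the first result's text differs from the captured one.
        await page.waitForFunction(
            (prev) => document.querySelector('.result-item')?.textContent !== prev,
            { timeout: 15_000 },
            before,
        );
    },
});
```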

Support regarding Website Content Crawler

Hello, I'm using apify/website-content-crawler and want to render JavaScript before the crawling process. According to the documentation, the related property for this setting is crawlerType, and it says that if I choose Headless Browser, I can render JavaScript...
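
A sketch of selecting the headless-browser mode via the API client; the 'playwright:firefox' enum value is an assumption on my part, so check the Actor's input schema for the exact options:

```ts
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor('apify/website-content-crawler').call({
    startUrls: [{ url: 'https://example.com' }],
    crawlerType: 'playwright:firefox', // headless browser => JavaScript is rendered
});
```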

Hi there! Does anybody know how to increase HttpCrawler request timeouts? According to the docs there is a requestQueue.timeoutSecs property, but even when it is set to e.g. 60 secs, all my HttpCrawler requests fail after a 30-second timeout 😦
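
In current Crawlee the relevant options live on the crawler itself: navigationTimeoutSecs bounds the HTTP request, requestHandlerTimeoutSecs bounds your handler. A minimal sketch:

```ts
import { HttpCrawler } from 'crawlee';

const crawler = new HttpCrawler({
    navigationTimeoutSecs: 60,      // time budget for the HTTP request itself
    requestHandlerTimeoutSecs: 120, // time budget for your handler code
    async requestHandler({ request, body }) {
        console.log(`Fetched ${request.url} (${body.length} bytes)`);
    },
});

await crawler.run(['https://example.com']);
```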

I have made a scraper, and I have written the code in Python, using Flask for the server. I have several routes; let's say: 1. route1...

Excuse me, I have a problem. If I want to use the Twitter Followers Scraper, do I need to pay the Actor fee of $25/month as well as the Apify platform Starter plan at $49/month, for a total of $74/month?

Hi, I am using the Smart Article Extractor Actor to extract info as JSON from an article URL. When running it from Postman, the Actor runs flawlessly in the Apify Console, but in Postman I only get a 201 with no response data. How can I get the response there? Please help.
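
A likely explanation: the plain run endpoint returns 201 with run metadata only, not the scraped items. To get the items in the same request, the run-sync-get-dataset-items endpoint can be used; a sketch where the Actor path and input field are placeholders, so check the Actor's input schema:

```ts
// Node 18+ (global fetch). USERNAME~smart-article-extractor is a placeholder path.
const res = await fetch(
    `https://api.apify.com/v2/acts/USERNAME~smart-article-extractor/run-sync-get-dataset-items?token=${process.env.APIFY_TOKEN}`,
    {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ articleUrls: [{ url: 'https://example.com/article' }] }),
    },
);
const items = await res.json(); // the dataset items, not just run metadata
```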

The builtwith technology scraper

Hi, I am currently trying to split the routes of my PlaywrightRouter into separate files. But how do I do this? File1: export const router = createPlaywrightRouter(); File2: router.addHandler(...) does not do the job for me. I guess the reason is that File2 never gets executed.
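
The usual fix is to register the handlers in a module that is actually imported, since addHandler() calls run as a side effect of the import. A sketch, with file names as examples:

```ts
// routes.ts - create the router and register handlers in the same module.
import { createPlaywrightRouter } from 'crawlee';

export const router = createPlaywrightRouter();

router.addDefaultHandler(async ({ page, log }) => {
    log.info(`Visiting ${page.url()}`);
});

router.addHandler('DETAIL', async ({ page }) => {
    // handle detail pages...
});

// main.ts - importing routes.ts executes the addHandler() calls above.
// import { PlaywrightCrawler } from 'crawlee';
// import { router } from './routes.js';
// const crawler = new PlaywrightCrawler({ requestHandler: router });
// await crawler.run(['https://example.com']);
```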

Hello everyone, I would assume there is a custom GPT that can be asked all kinds of questions about Apify and its Agents, trained on the documentation and other resources. Is there anything like it? Looking forward to learning more...

Hi, I am building a web scraper and I want to use a kind of persistent cache to determine which links I have scraped recently in previous runs and which I have not. Is the best solution for this to use KeyValueStore.getValue and KeyValueStore.setValue inside the requestHandlers?
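
A sketch of that approach using a named store, since named stores persist across runs while unnamed default ones may be purged; the store name is an example, and URLs are hashed because key-value store keys have a restricted charset:

```ts
import { createHash } from 'node:crypto';
import { KeyValueStore } from 'crawlee';

const seen = await KeyValueStore.open('seen-urls'); // named => persists across runs

// Hash the URL so it becomes a valid key-value store key.
const urlKey = (url: string) => createHash('sha256').update(url).digest('hex');

// Inside a requestHandler:
// if (await seen.getValue(urlKey(request.url))) return;   // already scraped
// ...scrape the page...
// await seen.setValue(urlKey(request.url), { at: Date.now() });
```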

Hey, does anyone know if I can programmatically change the proxy? Like, when certain conditions are met, I want to switch to the next random proxy.
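
One way this is commonly done in Crawlee: tie proxies to sessions and retire the session when your condition is met, so the retried request gets a different session and proxy. A sketch with placeholder proxy URLs:

```ts
import { CheerioCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: ['http://proxy-1:8000', 'http://proxy-2:8000'], // placeholders
});

const crawler = new CheerioCrawler({
    proxyConfiguration,
    useSessionPool: true,
    async requestHandler({ session, response }) {
        if (response.statusCode === 403) { // put your own condition here
            session?.retire();             // next attempt uses another session/proxy
            throw new Error('Blocked, retrying with a different proxy');
        }
        // ...scrape...
    },
});
```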

Hi, thank you for the PlaywrightCrawler tool. I want to ask: how can I handle 429 status codes caused by requesting too often? Is there a sleep-for-a-few-seconds method to handle this? Thanks for the attention.
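
Rather than sleeping ad hoc, the usual approach is to throttle the crawler, and Crawlee also retries failed requests on its own. A sketch of the relevant options:

```ts
import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    maxConcurrency: 2,        // fewer parallel pages
    maxRequestsPerMinute: 30, // global rate limit to stay under the server's radar
    maxRequestRetries: 5,     // keep retrying throttled requests
    async requestHandler({ page }) {
        // ...scrape...
    },
});
```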

ERROR Actor failed with an exception
Traceback (most recent call last):
  File "/usr/src/app/src/main.py", line 150, in main
    actor_input = await Actor.get_input()
                  ^^^^^^^^^^^^^^^^^^^^^^^ ...