Crawlee & Apify

This is the official developer community of Apify and Crawlee.

optimistic-gold · 9/9/2023

None of the requests were handled

I run my scraper on a schedule every day. Today it started, but did not process any of the requests. When I ran it manually with the same input data, it processed all of them. Does anyone know why this might be? Link to run information: https://console.apify.com/view/runs/R3fxP9FwsbIHO1MFp
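One common cause on the platform is a persisted (named) request queue being reused across scheduled runs: once all of its requests are marked as handled, the next run finds nothing left to process, while a manual run with fresh input works. A minimal sketch of the difference, assuming the Python Apify SDK (the queue name is made up):

```
from apify import Actor

async def main() -> None:
    async with Actor:
        # The default, unnamed queue is scoped to the current run,
        # so every scheduled run starts with a fresh queue.
        fresh_queue = await Actor.open_request_queue()

        # A named queue ('daily-scrape' is a made-up example) persists
        # between runs; once its requests are handled, later scheduled
        # runs see an already-finished queue and process nothing.
        shared_queue = await Actor.open_request_queue(name='daily-scrape')
```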
extended-salmon · 9/8/2023

Actor getting SIGTERM and restarting from scratch without any user action

I have a Python requests Actor that is somehow restarting itself after many hours, with the same input, and appending the results to the existing output. That's very annoying! If there was a crash, it should just abort and I should be able to restart it on my own if needed. But restarting with the same input over again just wastes time and money for me. Does anyone know why this is happening? Here's what the log has -- there's no other info about any errors, the crawler was running just fine right up until 11:08:34, was about 21% done, and then I get: ``` 2023-09-08T11:08:34.534Z ACTOR: Sending Docker container SIGTERM signal....
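The symptoms (SIGTERM followed by the run starting again with the same input) match the platform migrating the Actor to another server rather than a crash. A hedged way to avoid redoing the work is to persist progress to the key-value store and resume from it on startup; a minimal sketch, assuming the Python Apify SDK (the 'progress' key and scrape_url helper are hypothetical):

```
from apify import Actor

async def main() -> None:
    async with Actor:
        urls = (await Actor.get_input() or {}).get('urls', [])

        # Resume from the last persisted position instead of starting over.
        progress = await Actor.get_value('progress') or {'next_index': 0}

        for i in range(progress['next_index'], len(urls)):
            item = scrape_url(urls[i])  # hypothetical per-URL scraping helper
            await Actor.push_data(item)

            # Persist progress periodically so a migrated or restarted run
            # can continue where it left off.
            if i % 50 == 0:
                await Actor.set_value('progress', {'next_index': i + 1})
```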
unwilling-turquoise · 9/8/2023

ERR_CONNECTION_REFUSED on a headful scraper

Sample code:
```
async with async_playwright() as playwright:
    browser = await playwright.chromium.launch(headless=False)
    context = await browser.new_context(...
```
yappiest-sapphire · 9/4/2023

Google Maps Scraper Source Code on GitHub

Hello, everyone! Does the link to the GitHub repository containing the source code for the Google Maps Scraper still exist? The link on the Apify platform leads to a 404 error. Thank you!...
extended-salmon · 9/4/2023

Maintain order of columns in dataset from Scrapy crawler

I am creating scrapers using Scrapy and Python. I have to conform to an output format. The order of columns in the dataset automatically turns alphabetical. It is a very tedious task to manually adjust the order again. Is there a way to maintain the order of columns as that of the output?...
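If the export goes through Scrapy's own feed exporter, column order can be fixed with the FEED_EXPORT_FIELDS setting; a minimal sketch with placeholder field names:

```
# settings.py
# Scrapy's CSV/feed exporters write columns in exactly this order instead
# of whatever order the item keys happen to serialize in.
FEED_EXPORT_FIELDS = [
    'name',   # placeholder field names; use the ones your items define
    'price',
    'url',
]
```

For datasets downloaded from the platform, the fields query parameter of the dataset items endpoint can likewise pick an explicit set and order of columns.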
extended-salmon · 9/4/2023

Making Facebook Group Scraper Run Faster

Hi guys, I was wondering whether we can make the scraper run faster? I have tried increasing the memory but there is no noticeable increase in speed. The runs I have tried (same Facebook group and resultsLimit): 1. memory 1 GB: https://console.apify.com/view/runs/nmA94iQrukhzDv33Y 2. memory 2 GB: https://console.apify.com/view/runs/RYVckuHJVhtTvDhzG...

How to run headful on the platform?

Failed to launch the browser process! undefined
2023-09-01T21:35:32.962Z [141:164:0901/213532.948704:ERROR:bus.cc(399)] Failed to connect to the bus: Failed to connect to socket /run/dbus/system_bus_socket: No such file or directory
2023-09-01T21:35:32.964Z [141:141:0901/213532.952831:ERROR:ozone_platform_x11.cc(240)] Missing X server or $DISPLAY
2023-09-01T21:35:32.966Z [141:141:0901/213532.952846:ERROR:env.cc(255)] The platform failed to initialize. Exiting.
...
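Those log lines mean there is no X server inside the container, so a headful browser has nothing to attach to. One hedged workaround is a virtual display (Xvfb), assuming the xvfb package is installed in the Actor's Docker image; a minimal Python sketch:

```
import os
import subprocess
from playwright.async_api import async_playwright

async def launch_headful_browser():
    # Start a virtual framebuffer so Chromium has a display to attach to.
    # Assumes the Docker image has the xvfb package installed.
    subprocess.Popen(['Xvfb', ':99', '-screen', '0', '1280x720x24'])
    os.environ['DISPLAY'] = ':99'

    playwright = await async_playwright().start()
    # headless=False now works because DISPLAY points at the Xvfb display.
    browser = await playwright.chromium.launch(headless=False)
    return playwright, browser
```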
correct-apricot · 9/1/2023

How to run Website Content Crawler asynchronously

Hi all, I'm new to Apify and was wondering how to get the Website Content Crawler to crawl pages asynchronously. I used the Python ApifyClient and tried passing 50 URLs to the "startUrls" input field. I would have expected the Actor to run these asynchronously on its own, but the run took 25 min so I'm assuming it didn't. Another option I guess is to use the ApifyClientAsync and run 50 individual async runs and then collect their results....
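If the goal is to fan the URLs out across several concurrent runs, the asynchronous client can start them in parallel and gather the results; a minimal sketch, assuming the apify-client package (splitting the URLs into 5 runs is just an example):

```
import asyncio
from apify_client import ApifyClientAsync

async def crawl_in_parallel(token: str, urls: list[str], batches: int = 5):
    client = ApifyClientAsync(token)
    actor = client.actor('apify/website-content-crawler')

    # Split the URL list into roughly equal batches, one per run.
    chunks = [urls[i::batches] for i in range(batches)]

    async def run_batch(chunk):
        run = await actor.call(run_input={'startUrls': [{'url': u} for u in chunk]})
        dataset = client.dataset(run['defaultDatasetId'])
        page = await dataset.list_items()
        return page.items

    # Start all runs concurrently and gather their dataset items.
    results = await asyncio.gather(*(run_batch(c) for c in chunks))
    return [item for batch in results for item in batch]
```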
eager-peach · 8/31/2023

Easy Twitter Search Scraper stopping after $1 usage

Hi, I had a free trial of Apify. I was using Easy Twitter Search Scraper. Every time I ran it, it stopped after about 1 hour/$1 of usage, or about 4,000 tweets. I selected a period of about 2 years of Twitter data and turned off the timeout. I thought I would have up to $5 with the free trial? Do I have some other settings incorrect?
extended-salmon · 8/29/2023

Scrapy scraper immediately stopped working!!

Hi, I am writing scrapers in Python with Scrapy and Apify. I have multiple Actors that were working great till last night, but suddenly today, when I create a new build and start a run, they give the error "AttributeError: 'AsyncioSelectorReactor' object has no attribute '_handleSignals'". This is the same for all the scrapers; if I just run the previous build, they run fine. How can I fix this?...
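The error matches a known incompatibility between Twisted 23.8.0 (released at the end of August 2023, which removed AsyncioSelectorReactor._handleSignals) and Scrapy versions that still call it; new builds resolve the newest Twisted, while old builds keep their frozen dependencies. A hedged fix is to pin the dependency explicitly:

```
# requirements.txt (hedged workaround: hold Twisted below the release that
# removed AsyncioSelectorReactor._handleSignals)
Twisted<23.8.0
# ...or upgrade Scrapy to a release that supports Twisted 23.8+.
```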
extended-salmon · 8/28/2023

Python crawlers running in parallel

Hi, I have a custom Python + requests Actor that works great. It's pretty simple: it works against a list of starting URLs and pulls out a piece of information per URL. My question is: if (for example) one run of 1,000 input URLs takes an hour to complete, I would like to parallelize it 4 ways so that I can run 4,000 URLs in an hour. What's the best way to do this? I could kick off 4 copies of the run with segmented data, but this seems like something Apify could support natively. ...
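Besides kicking off several runs with segmented input, the work can also be parallelized inside a single run; since the Actor already uses requests, a thread pool is a simple hedged option (scrape_one and the worker count are made-up placeholders):

```
from concurrent.futures import ThreadPoolExecutor

import requests
from apify import Actor

def scrape_one(url: str) -> dict:
    # Hypothetical per-URL extraction; replace with the real parsing logic.
    response = requests.get(url, timeout=30)
    return {'url': url, 'status': response.status_code}

async def main() -> None:
    async with Actor:
        urls = (await Actor.get_input() or {}).get('startUrls', [])

        # requests is blocking, so a thread pool gives simple parallelism for
        # I/O-bound work; note this blocks the event loop while it runs, which
        # is acceptable for a plain requests-based Actor like this one.
        with ThreadPoolExecutor(max_workers=4) as pool:
            for item in pool.map(scrape_one, urls):
                await Actor.push_data(item)
```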
complex-teal · 8/28/2023

Error creating new ad-hoc webhook via actor run API call

I am trying to create an ad-hoc webhook when calling the API to start an Actor run, and I keep getting an error saying "Webhooks parameter is not a valid JSON: Unexpected token z in JSON at position 0". I'm pretty confident the JSON is correct, so I'm not sure what to do here. The code being used to submit the HTTP request (I am using Rails 5.2 with Ruby 2.7.7): ...
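When webhooks are passed to the run-Actor endpoint as a query parameter, the API expects a Base64-encoded JSON array rather than raw JSON, which commonly produces exactly this parse error. A minimal sketch in Python (the original code is Ruby; the Actor ID and webhook URL below are placeholders):

```
import base64
import json

import requests

webhooks = [{
    'eventTypes': ['ACTOR.RUN.SUCCEEDED'],
    'requestUrl': 'https://example.com/apify-webhook',  # placeholder endpoint
}]

# The `webhooks` query parameter must be a Base64-encoded JSON array.
encoded = base64.b64encode(json.dumps(webhooks).encode()).decode()

response = requests.post(
    'https://api.apify.com/v2/acts/ACTOR_ID/runs',  # placeholder Actor ID
    params={'token': 'APIFY_TOKEN', 'webhooks': encoded},
    json={},  # run input
)
print(response.json())
```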
continuing-cyan · 8/28/2023

proxy configuration

I'm currently writing a crawler and noticed on another crawler that you could select a proxy via the inputs (screenshot). I have the proxy configured inside my code but can't seem to set anything on Apify. Am I missing something here?...
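The proxy picker shown in such screenshots comes from the Actor's input schema: a field with "editor": "proxy" exposes the selector on the platform, and its value is then fed into the proxy configuration in code. A minimal sketch of the code side, assuming the Python Apify SDK and an input field named proxyConfiguration:

```
from apify import Actor

async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}

        # Turns the platform's proxy-picker input (e.g. Apify Proxy groups)
        # into a usable proxy configuration.
        proxy_configuration = await Actor.create_proxy_configuration(
            actor_proxy_input=actor_input.get('proxyConfiguration'),
        )
        proxy_url = await proxy_configuration.new_url()
        Actor.log.info(f'Using proxy: {proxy_url}')
```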
evident-indigo · 8/27/2023

ApifyWrapper import using LangChain

I installed all the necessary dependencies in my code, but when I try to import ApifyWrapper using: from langchain.utilities import ApifyWrapper I get an ImportError that says ...
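ApifyWrapper in LangChain needs the apify-client package installed, so an ImportError here usually means that dependency is missing; a minimal hedged check:

```
# Requires both packages (hedged: the exact error text depends on versions):
#   pip install langchain apify-client
from langchain.utilities import ApifyWrapper

apify = ApifyWrapper()  # reads APIFY_API_TOKEN from the environment
```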
ratty-blush · 8/26/2023

Can Google Maps Scraper pull emails from website pages or contact us sections?

I receive phone numbers and website URLs, which are helpful, but I want to send my product demo via email. Could emails be a field added to these scrapes? Would there be another Apify scraper available that would get emails off of website pages?
vicious-gold · 8/25/2023

How to get the latest dataset from your scheduler?

Hi, I'm trying to use the API to get the latest created dataset. I thought the https://api.apify.com/v2/actor-runs? API route would help, filtering for status SUCCEEDED, but there are events in here with a datasetId that have no linked dataset associated with it.
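Rather than listing all runs, the "last run" helpers tie the latest successful run and its default dataset together; a minimal sketch with the Python client (the Actor ID is a placeholder):

```
from apify_client import ApifyClient

client = ApifyClient('APIFY_TOKEN')

# last_run() resolves the most recent run of the Actor; status narrows it
# to the latest run that actually SUCCEEDED.
last_succeeded = client.actor('ACTOR_ID').last_run(status='SUCCEEDED')

# The default dataset of that run, without resolving dataset IDs by hand.
items = last_succeeded.dataset().list_items().items
print(len(items))
```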
quickest-silver · 8/24/2023

NO RESULTS... Impossible!

I just tried to search for home remodelers in North Carolina using a rented Actor, and it responded NO RESULTS, which is impossible. There must be many hundreds or thousands there. What am I doing wrong? How can I get help? Will the Actor creator provide help?
extended-salmon · 8/24/2023

Facebook Group Data Result Schema Documentation

Hi guys, I was wondering whether you have documentation for the data result schema, because we found that different types of Facebook Group posts have different data schemas, for example posts with a 3D image, polls, etc.
fair-rose · 8/23/2023

Website keyword scraper

I have been running the keyword extractor several times and it worked fine. https://console.apify.com/actors/fgCg268Rg9Yrvrx24/console However, I've started experiencing an issue with it. I have 14 keywords as input but the output only shows 2 of the keywords. Also, I don't know if it's a big issue, but I get a lot of errors all the time. (ERROR BasicCrawler: handleRequestFunction failed, reclaiming failed request back to the list or queue {"url":"https://c4belts.com","retryCount":1,"id":"nXV0yuB9kdoSv6s"})...