Crawlee & Apify


This is the official developer community of Apify and Crawlee.


Not getting right content when crawling Levis website

Using the Apify Website Content Crawler, I crawled https://www.levi.com/US/en_US/clothing/men/jeans/straight/501-original-fit-mens-jeans/p/005010115 . Instead of the product details, I am getting only the following content, which does not make any sense. Can someone please help me understand what has gone wrong? The input params JSON file is attached. Returned content: Installments by 4 interest-free payments due every 2 weeks when you select Afterpay at checkout Select Afterpay as your payment method at checkout Available on orders $35 - $1,000. All you need to apply is your debit or credit card. Complete your checkout No long forms and you'll receive an instant approval decision. Pay over 4 equal payments Enjoy your purchase right away! Pay every two weeks with zero interest and no fees when you pay on time. Afterpay not available on orders with Gift Cards. You must be over 18, a resident of the U.S. and meet additional eligibility criteria to qualify. Estimated payment amounts shown on product pages exclude taxes and shipping charges, which are added at checkout. Late Fees apply. Click here Terms & Conditions. ©2020 Afterpay...

Website Content Crawler With Site Login

The default Website Content Crawler has been great for my work, but I'm wondering if there's a version that can log into websites? Or is there a setting on the Input tab that I'm missing?

Output schema not accepted

Hi there, I'm trying to use an output schema. I have a line in my actor.json file "output": "./output_schema.json", and then an output_schema.json file in the .actor folder (attached). When I try building my actor I get the following error: { "instancePath": "", "schemaPath": "#/required",...

ForceCloud

I find the docs around the forceCloud behavior confusing. Is this feature still available? None of these statements from the SDK docs seem to hold true currently, with the SDK:
Note that you can force usage of the cloud storage also by passing the forceCloud option to Dataset.open function, even if the APIFY_LOCAL_STORAGE_DIR variable is set.
Dataset stores its data either on local disk or in the Apify cloud, depending on whether the APIFY_LOCAL_STORAGE_DIR or APIFY_TOKEN environment variables are set....

Apollo Scrap

I want to scrape unlimited data from Apollo.io. Can anyone here give me instructions, please?

Download CSV using playwright in an Apify Actor

So I have written some JavaScript that navigates to and downloads a CSV file using Playwright with Chromium. I am using .saveAs and defining the file path on my local machine, but I am not sure how to convert this to work on Apify. I have tried various things. Everything works except the download. It is not clear to me whether Apify can even save a .csv file. I see mention that it is possible to save files to the key-value store, but I remain unsure....

Scrape tiktok bio AND location of user

I need to input motorcycling as a hashtag, then scrape all users from that hashtag, but I NEED their bio AND location. Right now I have to use 2 different scrapers: one to get the hashtag profiles, THEN another to get the location of each user. Please help!...

Apify - Tiktok Scraper

Hello, I am new to Apify and trying to figure out what is possible. I am trying to get Apify's TikTok Scraper to pull from a Google Sheet with a list of TikTok posts. However, I do not see any way to do this in the Input settings. Does anyone know if Apify has the ability to do this? The documentation here seems to say it does, but I cannot find many more details on this. ...

Why "Actor finished successfully" when it actually ERRORed?

Why does Apify think this completed successfully when the log shows that it had an error? ```log <snip>...

Can I pass custom data to Website Content Crawler?

I really need to pass a user id to Website Content Crawler and have that included in webhooks, but I don't see any way to do so. Is it not possible? If not possible to include that data in the webhook, can I at least somehow see the entire request body from my API request, using the run id? I tried get run and the response doesn't include any of the input data/request body from my api call....

Unable to Close Issues on Apify - Button Disappeared

Hello everyone, I've encountered a peculiar issue on the Apify platform for the past few days. For some reason, I am no longer able to close my issues: the button to do so seems to have disappeared. I've tried various troubleshooting steps, such as checking different browsers and clearing the cache, but the problem persists. Has anyone else faced a similar issue, or is it something specific to my account (I'm using an organization account where I'm an admin)? Any guidance on how to resolve this would be greatly appreciated. If it's a wider technical problem, I hope it can be brought to the attention of the technical team for a prompt resolution....

compass/crawler-google-places optimization

I like the scraper and would like to use it more frequently and effectively, and I plan on upgrading to a paid plan soon. I understand I can use the orchestrator to have multiple runs that execute at the same time to best utilize my resources, but I want to make sure that I am optimizing the individual runs. I have a wide range of categories that I would like to query for, but I am wondering whether I should do multiple runs per location and query for different categories on each run, or whether I should slim the list down so that I can do a single run per location. Also, at what point will there be no new results? For instance, I have gone through all the categories listed that I can put as an input and have picked out things like bar, brewery, brewpub, etc., but I noticed that most of the entries in the log say either that there is no data for the search term or that all the data is duplicate. Are there certain categories that are better than others or that will encompass others? Another question: if I abort a run because I had too many categories in the input, and then I start another run in the same location but with a smaller set of categories, will data be duplicated, or will these places be passed over because I already have them stored?...

compass/crawler-google-places API clients documentation?

How do I get documentation or an explanation of the Actor input for this Actor's API clients?

Facebook: I want to scrape all posts from one user in multiple groups, which tool can I use?

I want to scrape all posts from one user in multiple groups. Which tool can I use?

Illogical New Pricing: $3.5 per 1000 results for Twitter Scraper

Hello, I just got an email about the new pricing for Twitter API - $3.5 per post. Honestly, this pricing seems off. Also, for the Twitter URL scraper, the rate you're considering is $2.5 per 1000 results. Let's break it down and compare it with what Twitter API and Apify cost....

Can I run my Actor in the Apify cloud in headful (not headless) mode?

I am having trouble with captcha detection while in headless mode. I cannot find any information about running the cloud Actor in non-headless mode.

Pass a variable from pageFunction() via Webhooks JSON

I'm trying to pass a variable returned from pageFunction via webhooks, but I cannot get the value. My code: async function pageFunction(context) { await context.skipLinks(); const $ = context.jQuery;...

BUG: Notification

I am receiving this notification and am not sure where it comes from.

Expand clickable elements setting - Website Content Crawler

Hi there, I'm trying to scrape this website: https://www.msci.com/research-and-insights/ . There's a "load more" button that I want the crawler to click so that it extracts all the content. I tried this setting in different ways, but it keeps failing. The CSS selector for that element would be #research-items-load-more a . I tried setting values like ["#research-items-load-more a"=\"true\"] or just ['#research-items-load-more a']. The run eventually fails. I would appreciate quick help here....