Twitter scraping by both keyword and profile

Making the API call with one of the filters and then post-processing the results with the second filter is too computationally intensive and slow for me. Is it possible to make a single API call that scrapes with both a keyword filter and a profile filter, or can I only do one or the other? Thanks!
11 Replies
foreign-sapphire (OP), 3y ago
I see this question is similar to the Facebook scraper post. Is it the same case here, that you are unable to filter by both simultaneously in one API call?
Pepa J, 3y ago
Hello @Deleted User, Twitter has advanced search capabilities of its own. Could you fill in the advanced search form ( https://twitter.com/search-advanced?lang=en ) and then copy and paste the resulting query into the Actor's input? If that does not help, what combination of keywords and profiles are you trying to scrape?
foreign-sapphire (OP), 3y ago
For some reason, when I use advanced search with both a user and a keyword on Apify, it only searches by the keyword. Is that supposed to happen?
Pepa J, 3y ago
@Deleted User which specific Actor do you use? I just tried Twitter Scraper, and 90% of the results are from the user I set in the input, with the right keywords.
foreign-sapphire (OP), 3y ago
I use the same one. I'm asking if it's possible to set both a keyword and a user and have the results be the intersection of both, that is, only tweets from that user that contain the keyword.
vicious-gold, 3y ago
Can you give us a more specific example and a step-by-step description of what you are trying to achieve?
foreign-sapphire (OP), 3y ago
Sure, so say I want to scrape all tweets by https://twitter.com/JoeBiden containing the word "president". I am currently using this code:

import json
import requests

# api_token and api_endpoint are defined elsewhere
actor_input = {
    "addTweetViewCount": True,
    "addUserInfo": False,
    "browserFallback": False,
    "debugLog": False,
    "extendOutputFunction": "async ({ data, item, page, request, customData, Apify }) => {\n return item;\n}",
    "extendScraperFunction": "async ({ page, request, addSearch, addProfile, _, addThread, addEvent, customData, Apify, signal, label }) => {\n \n}",
    "fromDate": "2021-11-02",
    "handle": ["https://twitter.com/JoeBiden"],
    "handlePageTimeoutSecs": 5000,
    "maxIdleTimeoutSecs": 60,
    "maxRequestRetries": 6,
    "mode": "own",
    "profilesDesired": 10,
    "proxyConfig": {"useApifyProxy": True},
    "searchTerms": ["president"],
    "tweetsDesired": 10000,
    "useAdvancedSearch": True,
    "useCheerio": True,
}
headers = {
    "Content-Type": "application/json; charset=utf-8",
    "Authorization": f"Bearer {api_token}",
}
# Serialize the input and start the Actor run
data = json.dumps(actor_input)
response = requests.post(api_endpoint, headers=headers, data=data)
foreign-sapphire (OP), 3y ago
However, it looks like the Actor is retrieving tweets from any user that contain the search term 'president'. I am only interested in tweets from "https://twitter.com/JoeBiden" containing the term 'president'. Thanks!
Pepa J, 3y ago
@Deleted User yes, with this general input I am also receiving a lot of irrelevant results. That's why I suggested generating the expression from the advanced search form (on the Twitter website) and using it for the searchTerms attribute. The input then looks like this:
{
...
"searchTerms": [
"\"president\" (from:JoeBiden) -filter:links -filter:replies"
],
...
}
Now all the results belong to the specified Twitter account.
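For anyone building this input from Python, here is a minimal sketch of how the combined query could be assembled. The helper name `build_search_term` and its parameters are hypothetical, made up for illustration; only the query syntax (`from:`, `-filter:links`, `-filter:replies`) and the `searchTerms`, `tweetsDesired`, and `useAdvancedSearch` keys come from the messages above. Which other input fields to keep alongside it is an assumption you would need to check against the Actor's input schema.

```python
import json


def build_search_term(keyword, handle, exclude_links=True, exclude_replies=True):
    """Combine a keyword and a profile into one advanced-search query string.

    Hypothetical helper, not part of any Apify library.
    """
    # Accept a profile URL, "@name", or a bare handle and keep only the name.
    name = handle.rstrip("/").rsplit("/", 1)[-1].lstrip("@")
    parts = [f'"{keyword}"', f"(from:{name})"]
    if exclude_links:
        parts.append("-filter:links")
    if exclude_replies:
        parts.append("-filter:replies")
    return " ".join(parts)


term = build_search_term("president", "https://twitter.com/JoeBiden")

# Only searchTerms changes compared with the original input; the other
# keys shown here are copied from the input posted earlier in the thread.
actor_input = {
    "searchTerms": [term],
    "tweetsDesired": 10000,
    "useAdvancedSearch": True,
}

print(json.dumps(actor_input, indent=2))
```

The point of doing it this way is that the filtering by user happens inside the search query itself, so no second pass over the results is needed.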
foreign-sapphire (OP), 3y ago
Ahh okay, I was wrongly under the impression that the API would have done this for me. Thank you so much!