Crawlee & Apify12mo ago
xenial-black

Instagram scraper // Error no public data

Hi everyone 🙂 I'm currently developing a Python-based solution using your API. After a few successful tests earlier today, it now seems that nothing works anymore: the "instagram-scraper" Actor returns a message saying that my "results" object no longer contains any data.
21 Replies
xenial-black
xenial-blackOP12mo ago
Is there a daily limit I don't know about? Is this something you can unblock on your side? I'd be extremely grateful.
Marco
Marco12mo ago
I would suggest monitoring the Instagram scraper on the platform to see what's happening: you should see the runs you triggered from Python 🙂 There is $5 of free monthly usage on Apify; you can read more here.
Apify
Apify Console
Manage the Apify platform and your account.
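For reference, a minimal sketch of how those runs could be inspected from Python with the apify-client package (the Actor ID apify/instagram-scraper and the token placeholder are assumptions):

```python
from apify_client import ApifyClient

# Assumes a valid Apify API token.
client = ApifyClient("<YOUR_API_TOKEN>")

# List the most recent runs of the Instagram scraper Actor, newest first,
# to check their status and duration.
runs = client.actor("apify/instagram-scraper").runs().list(desc=True, limit=5)
for run in runs.items:
    print(run["id"], run["status"], run.get("startedAt"))
```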
xenial-black
xenial-blackOP12mo ago
Thank you very much for your feedback @Marco. I'm well under $5 at the moment, though. And when I launch a run from the Apify Console (rather than from my code via the API), I get everything I need in much less time. With my code, I used to fetch the last 30 posts in under a minute, but my last run only returned 13 of the 30 posts after 2 minutes...
Marco
Marco12mo ago
Check whether the run triggered from Python had at least 4096 MB of memory, which is the default value. If not, you can specify the memory in the call method. Also check whether there are any errors in the run's log.
ActorClient | API | API client for Python | Apify Documentation
Sub-client for manipulating a single actor.
xenial-black
xenial-blackOP12mo ago
Interesting! But how can I check and change this 4096 MB setting?
Marco
Marco12mo ago
Just click on the run in the page from your last screenshot and check the amount of memory that the run was using. You can specify a custom value in Python with client.actor(...).call(run_input={...}, memory_mbytes=4096). If you check the documentation link above, you can see all the available options.
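A minimal sketch of such a call, assuming the apify/instagram-scraper Actor ID and a trimmed-down version of the input from this thread; memory_mbytes is set explicitly for illustration:

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_API_TOKEN>")

# Call the Actor with an explicit memory limit (4096 MB is also the default)
# and wait for the run to finish.
run = client.actor("apify/instagram-scraper").call(
    run_input={
        "directUrls": ["https://www.instagram.com/andie_ella/"],
        "resultsType": "posts",
        "resultsLimit": 30,
    },
    memory_mbytes=4096,
)

# The scraped posts end up in the run's default dataset.
items = client.dataset(run["defaultDatasetId"]).list_items().items
print(len(items), "posts scraped")
```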
xenial-black
xenial-blackOP12mo ago
Great, I can see the value in my run as well as the explanation in the documentation. What do you think about explicitly doubling it from 4096 in my code, for example?
MEE6
MEE612mo ago
@ClemApify just advanced to level 1! Thanks for your contributions! 🎉
xenial-black
xenial-blackOP12mo ago
@Marco
Marco
Marco12mo ago
I don't think it will make a significant difference: it should work correctly with 4 GB, since that is the default value. You should check the bad run's log to see if there is any error. If you don't see anything strange, post the run ID here so I can check it and possibly forward the problem to internal support.
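If it helps, a sketch of pulling a run's log from Python to scan for errors (the run ID and token are placeholders):

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_API_TOKEN>")

# Fetch the run's full text log and print only the ERROR lines.
log_text = client.run("<RUN_ID>").log().get()
for line in (log_text or "").splitlines():
    if "ERROR" in line:
        print(line)
```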
MEE6
MEE612mo ago
@Marco just advanced to level 1! Thanks for your contributions! 🎉
xenial-black
xenial-blackOP12mo ago
@Marco After rechecking the log, there are no errors. Here's the run ID: dJThRXKo1ILFad9zf. I'll let you get back to me as soon as you or internal support know more!
xenial-black
xenial-blackOP12mo ago
(the duration alone is unreasonable, something is wrong)
Marco
Marco12mo ago
Actually, there are many retries due to calls blocked by Instagram, and some requests exceeded the maximum number of retries; that's why you only have 13 results instead of 30 and it took so long. You can see this in the "logs" tab. Try removing the proxy settings from the input: this tells the scraper to use its default settings, which is a good thing in most cases. If you are interested, you can learn more about proxies here and here.
Proxies | Academy | Apify Documentation
Learn all about proxies, how they work, and how they can be leveraged in a scraper to avoid blocking and other anti-scraping tactics.
xenial-black
xenial-blackOP12mo ago
2024-06-11T09:55:38.365Z ERROR CheerioCrawler: Request failed and reached maximum retries. Error: Request got blocked. Will retry with different session
2024-06-11T09:55:38.499Z INFO CheerioCrawler: All requests from the queue have been processed, the crawler will shut down.
2024-06-11T09:55:38.628Z INFO CheerioCrawler: Final request statistics: {"requestsFinished":13,"requestsFailed":1,"retryHistogram":[11,2,null,null,null,null,null,null,null,null,1],"requestAvgFailedDurationMillis":9733,"requestAvgFinishedDurationMillis":5527,"requestsFinishedPerMinute":8,"requestsFailedPerMinute":0,"requestTotalDurationMillis":81581,"requestsTotal":14,"crawlerRuntimeMillis":100892}
2024-06-11T09:55:38.631Z INFO CheerioCrawler: Error analysis: {"totalErrors":1,"uniqueErrors":1,"mostCommonErrors":["1x: Request got blocked. Will retry with different session (file:///usr/src/app/dist/src/routes-feed.js:15:15)"]}
2024-06-11T09:55:38.634Z INFO CheerioCrawler: Finished! Total 14 requests: 13 succeeded, 1 failed. {"terminal":true}
2024-06-11T09:55:38.653Z INFO [Status message]: Post scraper finished
OK, I finally see what the problem is, thanks for that. But can you show me how to "remove the proxy settings from the input", @Marco? Here is my current input:
{
"addParentData": false,
"directUrls": [
"https://www.instagram.com/andie_ella/"
],
"enhanceUserSearchWithFacebookPage": false,
"isUserTaggedFeedURL": false,
"resultsLimit": 30,
"resultsType": "posts",
"searchLimit": 1,
"searchType": "hashtag",
"proxy": {
"useApifyProxy": true
}
}
Do I need to change anything here?
Marco
Marco12mo ago
Exactly: you are passing this object in Python when you execute client.actor('...').call(run_input={..., "proxy": {...}}). Just remove the "proxy" key and its content.
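A sketch of what the call could look like with the "proxy" key dropped, so the scraper falls back to its default proxy configuration (the Actor ID and token placeholder are assumptions):

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_API_TOKEN>")

# Same input as before, but without the "proxy" key.
run_input = {
    "addParentData": False,
    "directUrls": ["https://www.instagram.com/andie_ella/"],
    "enhanceUserSearchWithFacebookPage": False,
    "isUserTaggedFeedURL": False,
    "resultsLimit": 30,
    "resultsType": "posts",
    "searchLimit": 1,
    "searchType": "hashtag",
}
run = client.actor("apify/instagram-scraper").call(run_input=run_input)
```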
xenial-black
xenial-blackOP12mo ago
That's it, I've removed "proxy" from the input, so I now have:
MEE6
MEE612mo ago
@ClemApify just advanced to level 2! Thanks for your contributions! 🎉
xenial-black
xenial-blackOP12mo ago
{
"addParentData": false,
"directUrls": [
"https://www.instagram.com/justezoe/"
],
"enhanceUserSearchWithFacebookPage": false,
"isUserTaggedFeedURL": false,
"resultsLimit": 30,
"resultsType": "posts",
"searchLimit": 1,
"searchType": "hashtag"
}
I've just made a new run and I'm sharing its ID, "S5lQPQde1X3LnX1sl", so that you can check. Everything works, even if the execution time still seems long.
Marco
Marco12mo ago
Unfortunately, there are many factors that can influence the run duration, such as the Instagram servers' connection speed, the number of Instagram users connected right now, the status of the proxies, and so on. 😅
xenial-black
xenial-blackOP12mo ago
No problem at all with that, the important thing is that it all works! And thank you so much for this, you've been a great help. See you soon 🤗
