Crawlee & Apify12mo ago
xenial-black

Instagram scraper // Error no public data

Hi everyone 🙂 I'm currently developing a Python-based solution using your API. After a few successful tests earlier today, it now seems that nothing works anymore: the "instagram-scraper" Actor returns a message saying that my "results" object no longer contains any data.
21 Replies
xenial-black
xenial-blackOP12mo ago
Is there a daily limit I don't know about? Is this something you can unblock on your side? I'd be extremely grateful.
Marco
Marco12mo ago
I would suggest monitoring the Instagram scraper on the platform to see what's happening: you should see the runs you triggered from Python 🙂 There is $5 of free monthly usage on Apify; you can read more here.
Apify
Apify Console
Manage the Apify platform and your account.
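For reference, a minimal sketch of how those runs could be inspected from Python with the apify-client package (the Actor ID apify/instagram-scraper and the token placeholder are assumptions):

```python
from apify_client import ApifyClient

# Assumes a valid Apify API token.
client = ApifyClient("<YOUR_API_TOKEN>")

# List the most recent runs of the Instagram scraper Actor, newest first,
# to check their status and duration.
runs = client.actor("apify/instagram-scraper").runs().list(desc=True, limit=5)
for run in runs.items:
    print(run["id"], run["status"], run.get("startedAt"))
```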
xenial-black
xenial-blackOP12mo ago
Thank you very much for your feedback @Marco. I'm well under $5 at the moment, though. And when I launch a run from the Apify Console (rather than from my code via the API), I get everything I need in much less time. With my code, I used to fetch the last 30 posts in under a minute, but my last run only returned 13 of the 30 posts after 2 minutes...
Marco
Marco12mo ago
Check whether the run triggered from Python had at least 4096 MB of memory, which is the default value. If not, you can specify the memory in the call method. Also check whether there are any errors in the run's log.
ActorClient | API | API client for Python | Apify Documentation
Sub-client for manipulating a single actor.
xenial-black
xenial-blackOP12mo ago
Interesting! But how can I check and change this 4096 MB setting?
Marco
Marco12mo ago
Just click on the run in the page from your last screenshot and check the amount of memory that the run was using. You can specify a custom value in Python with client.actor(...).call(run_input={...}, memory_mbytes=4096). If you check the documentation link above, you can see all the available options.
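A minimal sketch of such a call, assuming the apify/instagram-scraper Actor ID and a trimmed-down version of the input from this thread; memory_mbytes is set explicitly for illustration:

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_API_TOKEN>")

# Call the Actor with an explicit memory limit (4096 MB is also the default)
# and wait for the run to finish.
run = client.actor("apify/instagram-scraper").call(
    run_input={
        "directUrls": ["https://www.instagram.com/andie_ella/"],
        "resultsType": "posts",
        "resultsLimit": 30,
    },
    memory_mbytes=4096,
)

# The scraped posts end up in the run's default dataset.
items = client.dataset(run["defaultDatasetId"]).list_items().items
print(len(items), "posts scraped")
```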
xenial-black
xenial-blackOP12mo ago
Great, I can see the value in my run as well as the explanation in the documentation. What do you think about explicitly doubling it from 4096 in my code, for example?
MEE6
MEE612mo ago
@ClemApify just advanced to level 1! Thanks for your contributions! 🎉
xenial-black
xenial-blackOP12mo ago
@Marco
Marco
Marco12mo ago
I don't think it will make a significant difference: it should work correctly with 4 GB, since that is the default value. You should check the bad run's log to see if there is any error. If you don't see anything strange, post the run ID here so I can check it and possibly forward the problem to internal support.
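If it helps, a sketch of pulling a run's log from Python to scan for errors (the run ID and token are placeholders):

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_API_TOKEN>")

# Fetch the run's full text log and print only the ERROR lines.
log_text = client.run("<RUN_ID>").log().get()
for line in (log_text or "").splitlines():
    if "ERROR" in line:
        print(line)
```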
MEE6
MEE612mo ago
@Marco just advanced to level 1! Thanks for your contributions! 🎉
xenial-black
xenial-blackOP12mo ago
@Marco After rechecking the log, there are no errors. Here's the run ID: dJThRXKo1ILFad9zf. I'll let you get back to me as soon as you or internal support know more!
xenial-black
xenial-blackOP12mo ago
(the duration alone is unreasonable, something is wrong)
Marco
Marco12mo ago
Actually, there are many retries due to calls blocked by Instagram, and some requests exceeded the maximum number of retries; that's why you only have 13 results instead of 30 and it took so long. You can see this in the "logs" tab. Try removing the proxy settings from the input: this tells the scraper to use its default settings, which is a good thing in most cases. If you are interested, you can learn more about proxies here and here.
Proxies | Academy | Apify Documentation
Learn all about proxies, how they work, and how they can be leveraged in a scraper to avoid blocking and other anti-scraping tactics.
xenial-black
xenial-blackOP12mo ago
2024-06-11T09:55:38.365Z ERROR CheerioCrawler: Request failed and reached maximum retries. Error: Request got blocked. Will retry with different session
2024-06-11T09:55:38.499Z INFO CheerioCrawler: All requests from the queue have been processed, the crawler will shut down.
2024-06-11T09:55:38.628Z INFO CheerioCrawler: Final request statistics: {"requestsFinished":13,"requestsFailed":1,"retryHistogram":[11,2,null,null,null,null,null,null,null,null,1],"requestAvgFailedDurationMillis":9733,"requestAvgFinishedDurationMillis":5527,"requestsFinishedPerMinute":8,"requestsFailedPerMinute":0,"requestTotalDurationMillis":81581,"requestsTotal":14,"crawlerRuntimeMillis":100892}
2024-06-11T09:55:38.631Z INFO CheerioCrawler: Error analysis: {"totalErrors":1,"uniqueErrors":1,"mostCommonErrors":["1x: Request got blocked. Will retry with different session (file:///usr/src/app/dist/src/routes-feed.js:15:15)"]}
2024-06-11T09:55:38.634Z INFO CheerioCrawler: Finished! Total 14 requests: 13 succeeded, 1 failed. {"terminal":true}
2024-06-11T09:55:38.653Z INFO [Status message]: Post scraper finished
OK, I finally see what the problem is, thanks for that. But can you show me how to "remove the proxy settings from the input", @Marco? Here is my current input:
{
"addParentData": false,
"directUrls": [
"https://www.instagram.com/andie_ella/"
],
"enhanceUserSearchWithFacebookPage": false,
"isUserTaggedFeedURL": false,
"resultsLimit": 30,
"resultsType": "posts",
"searchLimit": 1,
"searchType": "hashtag",
"proxy": {
"useApifyProxy": true
}
}
Do I need to change anything here?
Marco
Marco12mo ago
Exactly: you are passing this object in Python when you execute client.actor('...').call(run_input={..., "proxy": {...}}). Just remove the "proxy" key and its content.
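A sketch of what the call could look like with the "proxy" key dropped, so the scraper falls back to its default proxy configuration (the Actor ID and token placeholder are assumptions):

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_API_TOKEN>")

# Same input as before, but without the "proxy" key.
run_input = {
    "addParentData": False,
    "directUrls": ["https://www.instagram.com/andie_ella/"],
    "enhanceUserSearchWithFacebookPage": False,
    "isUserTaggedFeedURL": False,
    "resultsLimit": 30,
    "resultsType": "posts",
    "searchLimit": 1,
    "searchType": "hashtag",
}
run = client.actor("apify/instagram-scraper").call(run_input=run_input)
```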
xenial-black
xenial-blackOP12mo ago
That's it, I've removed "proxy" from the input, so I now have:
MEE6
MEE612mo ago
@ClemApify just advanced to level 2! Thanks for your contributions! 🎉
xenial-black
xenial-blackOP12mo ago
{
"addParentData": false,
"directUrls": [
"https://www.instagram.com/justezoe/"
],
"enhanceUserSearchWithFacebookPage": false,
"isUserTaggedFeedURL": false,
"resultsLimit": 30,
"resultsType": "posts",
"searchLimit": 1,
"searchType": "hashtag"
}
I've just made a new run and I'm sharing its ID, "S5lQPQde1X3LnX1sl", so that you can check. Everything works, even if the execution time still seems long.
Marco
Marco12mo ago
Unfortunately, there are many factors that can influence the run duration, such as the Instagram servers' connection speed, the number of Instagram users connected right now, the status of the proxies, and so on. 😅
xenial-black
xenial-blackOP12mo ago
No problem at all with that, the important thing is that it all works! And thank you so much for this, you've been a great help. See you soon 🤗
