Issue with Null Data Returned After Multiple Attempts (10+ Attempts with Python SDK / cURL)
I am writing to report an issue where I consistently receive null data after multiple attempts using both the Python SDK and cURL.
Despite trying various combinations of pageOptions and crawlerOptions, including/excluding different URLs, implementing retries with exponential backoff, and writing custom code to wait on and monitor jobs, I am still encountering this problem.
No matter what I attempt, once the limit exceeds approximately 500, the result is always null (it only works for roughly 100 URLs).
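For reference, this is roughly the submit-and-poll loop I am running, reduced to a minimal sketch. It uses raw HTTP calls rather than the SDK; the v0 endpoint paths, the jobId/status/data response fields, and the onlyMainContent option are my assumptions here, simplified from my actual code:

```python
import time

import requests

API_KEY = "fc-YOUR-KEY"  # placeholder
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
BASE_URL = "https://api.firecrawl.dev/v0"  # assumed base URL for these endpoints


def start_crawl(url: str, limit: int) -> str:
    """Submit a crawl job and return its job ID."""
    resp = requests.post(
        f"{BASE_URL}/crawl",
        headers=HEADERS,
        json={
            "url": url,
            "crawlerOptions": {"limit": limit},  # null data appears once limit > ~500
            "pageOptions": {"onlyMainContent": True},
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["jobId"]


def wait_for_crawl(job_id: str, max_attempts: int = 10):
    """Poll the job status with exponential backoff until it completes."""
    delay = 5
    for _ in range(max_attempts):
        status = requests.get(
            f"{BASE_URL}/crawl/status/{job_id}", headers=HEADERS, timeout=60
        ).json()
        if status.get("status") == "completed":
            return status.get("data")  # consistently comes back null at higher limits
        time.sleep(delay)
        delay = min(delay * 2, 120)  # exponential backoff, capped at two minutes
    return None
```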
Here are the job IDs from my recent attempts:
id: 038cb167-bc7f-4814-9cb8-a5b7bd8a5e0d
id: 8cc40afb-0900-4aa8-9c7e-b1454ef6cd01
id: f75fa037-2d67-4807-9e01-3feffa814ab0
id: 6f5d6379-4be1-4abb-bed4-307d85085679
id: 9b6d7af2-617d-4ad3-93b8-12d2bde6fe5f
id: 89a867a9-e9e3-432d-b169-cdc2c164c34a
id: 8a51aaec-697c-48fe-8991-2d2e6ebb4f9b
id: 68e1f6c6-c115-4fdf-8a66-6d945828c888
id: 662304f0-e531-45fa-bc73-80dddc997e0b
id: 5dd96f23-9dcb-4934-aebc-c60f58c199df
id: 827ba9cf-205e-4a8d-9c90-035ee85542ee
id: 440bc2ef-d89a-4a26-8d98-0ae2591980a5
id: c4d2f5d6-4e68-419b-9f08-1f46145a3306
id: 43b7c689-cb65-448a-bb96-d2ab1ad0c513
id: 19bf1ecb-2356-4fb6-9afb-11e7ba32ea55
id: 690b4d54-7ec8-46d6-8410-6fb49116bca0
id: da93c050-fad8-4ecd-beec-0554de82a518
id: 747c6ddf-0945-469f-a6c6-ab11a91906d6
id: 17033ef9-2231-472d-af45-7fe140a5970b
id: 91f60d21-2172-43b1-8d81-4b182e330600
id: 32be9ad8-19cc-47af-9331-f9eebeea6fb1
id: a232c8c7-8fce-4858-8b0f-7fcdf7e2d6fd
id: 8d19cc73-4e3a-4460-959a-638cf2389d5e
id: 3ce02737-7b6d-4699-8c5e-b250b9d7a06c
id: 947bf4d5-7fe3-429b-985f-320f8b2f8c35
id: 755a6d3c-56ff-4f6b-9f3e-9aacbb855172
id: 74e2cdf8-6a28-4719-8fc8-0745766cee44
id: c7210237-c765-42e6-b848-3442085b3e9e
Could someone please assist me in resolving this issue?
Thank you in advance for your help!
3 Replies
Hey there, this is a known issue that's mostly been addressed. Let me know if it's still occurring!
Hi Caleb, good morning!
I'm trying to fetch roughly 4,000 documents using the crawler, but it's only returning 25 of them, while still consuming the 3,472 credits. Could it be the same problem?
IDs:
06ebf7af-24c6-493d-8bce-a181ea9fa86c
644726fb-76d6-463e-8bec-8aed8d1737ca
e9f18417-c5c9-47ca-b95b-0fda5ff8d0c0
a814bc2f-4529-48c1-80c3-f0533bd6922e
7bdc409f-1c48-46ec-b04b-51e75a210c76
1982e6b5-a511-4ebd-a7df-7512739ab301
Latest Python SDK!
Do you see a "next" property on the crawl status response? It's probably not being paginated through correctly.
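For anyone else landing here: if the status response does include a "next" field, the missing documents can usually be recovered by paginating through it manually. A minimal sketch, assuming "next" holds the URL of the following page of results (the function name and key handling are mine, not from the SDK):

```python
import requests

API_KEY = "fc-YOUR-KEY"  # placeholder
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def fetch_all_crawl_data(status_url: str) -> list:
    """Follow "next" links on the crawl status response to collect every page."""
    documents = []
    url = status_url
    while url:
        page = requests.get(url, headers=HEADERS, timeout=60).json()
        documents.extend(page.get("data") or [])
        url = page.get("next")  # absent/None on the final page
    return documents
```

If only the first page is being read, that would explain getting 25 documents back from a much larger crawl.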