Null Data Returned After 10+ Attempts (Python SDK / cURL)

I am writing to report an issue where I consistently receive null data after repeated attempts using both the Python SDK and cURL. I have tried various combinations of pageOptions and crawlerOptions, included/excluded different URLs, implemented retries with exponential backoff, and written custom code to wait on and monitor jobs, but the problem persists. Whenever the limit exceeds roughly 500, the result is always null; crawls only succeed at around 100 URLs. Here are the job IDs from my recent attempts:

038cb167-bc7f-4814-9cb8-a5b7bd8a5e0d
8cc40afb-0900-4aa8-9c7e-b1454ef6cd01
f75fa037-2d67-4807-9e01-3feffa814ab0
6f5d6379-4be1-4abb-bed4-307d85085679
9b6d7af2-617d-4ad3-93b8-12d2bde6fe5f
89a867a9-e9e3-432d-b169-cdc2c164c34a
8a51aaec-697c-48fe-8991-2d2e6ebb4f9b
68e1f6c6-c115-4fdf-8a66-6d945828c888
662304f0-e531-45fa-bc73-80dddc997e0b
5dd96f23-9dcb-4934-aebc-c60f58c199df
827ba9cf-205e-4a8d-9c90-035ee85542ee
440bc2ef-d89a-4a26-8d98-0ae2591980a5
c4d2f5d6-4e68-419b-9f08-1f46145a3306
43b7c689-cb65-448a-bb96-d2ab1ad0c513
19bf1ecb-2356-4fb6-9afb-11e7ba32ea55
690b4d54-7ec8-46d6-8410-6fb49116bca0
da93c050-fad8-4ecd-beec-0554de82a518
747c6ddf-0945-469f-a6c6-ab11a91906d6
17033ef9-2231-472d-af45-7fe140a5970b
91f60d21-2172-43b1-8d81-4b182e330600
32be9ad8-19cc-47af-9331-f9eebeea6fb1
a232c8c7-8fce-4858-8b0f-7fcdf7e2d6fd
8d19cc73-4e3a-4460-959a-638cf2389d5e
3ce02737-7b6d-4699-8c5e-b250b9d7a06c
947bf4d5-7fe3-429b-985f-320f8b2f8c35
755a6d3c-56ff-4f6b-9f3e-9aacbb855172
74e2cdf8-6a28-4719-8fc8-0745766cee44
c7210237-c765-42e6-b848-3442085b3e9e

Could someone please assist me in resolving this issue? Thank you in advance for your help! For reference, the wait-and-monitor loop I'm using is roughly the sketch below.
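(A minimal sketch of the polling logic, simplified from my actual code; the status-endpoint path, response fields, and the FIRECRAWL_API_KEY variable are stand-ins, not exact values.)

```python
import os
import time

import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]  # placeholder env var name
STATUS_URL = "https://api.firecrawl.dev/v0/crawl/status"  # path is approximate

def wait_for_crawl(job_id: str, max_attempts: int = 10):
    """Poll the crawl job, doubling the delay after each incomplete check."""
    delay = 5.0  # seconds between polls; doubled each attempt (exponential backoff)
    for _ in range(max_attempts):
        resp = requests.get(
            f"{STATUS_URL}/{job_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        if body.get("status") == "completed":
            return body.get("data")  # this is where I keep getting null
        time.sleep(delay)
        delay *= 2
    return None  # gave up after max_attempts polls
```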
Caleb · 14mo ago
Hey there, this is a known issue that's mostly been addressed. Let me know if it's still occurring!
andreichiro (OP) · 13mo ago
Hi Caleb, good morning! I'm trying to fetch roughly 4,000 documents using the crawler, but it's only returning 25 of them, even though 3,472 credits were consumed. Could it be the same problem? Job IDs:

06ebf7af-24c6-493d-8bce-a181ea9fa86c
644726fb-76d6-463e-8bec-8aed8d1737ca
e9f18417-c5c9-47ca-b95b-0fda5ff8d0c0
a814bc2f-4529-48c1-80c3-f0533bd6922e
7bdc409f-1c48-46ec-b04b-51e75a210c76
1982e6b5-a511-4ebd-a7df-7512739ab301

This is on the latest Python SDK!
mogery · 13mo ago
Do you see a next property on the crawl status response? Large crawls return their results in pages, so if your code only reads the first response, you only see the first page (which would explain getting 25 of ~4,000 documents). It's probably not being paginated through correctly; you need to follow each next link until it's gone, as in the sketch below.
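Roughly like this (an untested sketch, assuming next is an absolute URL that accepts the same Authorization header and that each page carries its documents under data):

```python
import os

import requests

API_KEY = os.environ["FIRECRAWL_API_KEY"]

def fetch_all_pages(status_url: str) -> list:
    """Follow the `next` links on the crawl status response until exhausted."""
    documents = []
    url = status_url
    while url:
        resp = requests.get(
            url,
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        documents.extend(body.get("data") or [])  # accumulate this page's docs
        url = body.get("next")  # missing/None on the final page ends the loop
    return documents
```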
