Can't retrieve results from a crawl job
client.get_crawl_status(crawl_job_id) has been stuck for 25 minutes.
I also tried to download the results from the website UI, but that seems stuck as well, even though the job is marked as completed.
JOB_ID = 019acae3-c1ea-712d-a07f-8f0bdd3e127f
Hi @Romain,
I checked the job ID details from the backend.
The Python SDK auto-paginates through all results by default. Your crawl has ~3000+ documents, which means it makes ~30+ sequential HTTP requests to fetch everything. This is why client.get_crawl_status() appears stuck: it is slowly fetching all of the pages.
As a temporary fix, you may set auto_paginate=False and page through the results manually.
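For reference, a minimal sketch of what that could look like. It assumes the get_crawl_status call and the auto_paginate parameter mentioned in this thread; the data list, the next cursor, and the next_url keyword are hypothetical names borrowed from typical paginated crawl APIs, so check your SDK version's actual response shape:

```python
documents = []

# Fetch only the first page instead of auto-paginating through all ~30 pages.
# With auto_paginate=False this call should return quickly.
status = client.get_crawl_status(JOB_ID, auto_paginate=False)
documents.extend(status.data)  # "data" is an assumed field name

# Follow the pagination cursor one page at a time, so a single slow request
# no longer makes the whole call appear stuck. "next" and "next_url" are
# hypothetical; substitute your SDK's cursor field and parameter.
while getattr(status, "next", None):
    status = client.get_crawl_status(JOB_ID, auto_paginate=False,
                                     next_url=status.next)
    documents.extend(status.data)

print(f"Retrieved {len(documents)} documents")
```

Fetching page by page also means partial results survive a failure: anything already appended to documents is kept even if a later request times out.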
Thanks for the answer. It has been running for 100 minutes now and there are still no results. I really need to get all of the results. I will try auto_paginate=False on Monday, after the weekend; I hope the job results will still be available by then.

Also, I noticed that I should have set ignore_query_parameters=True (it defaults to False), because most of the URLs scraped are irrelevant and I lost my 3k credits due to this...
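(For anyone hitting the same credit drain later: a hypothetical sketch of where that option goes when starting a crawl. Only the ignore_query_parameters parameter comes from this thread; the crawl_url method name and the URL are illustrative assumptions, not confirmed SDK API.)

```python
# Hypothetical start-crawl call for illustration only; check your SDK's
# actual method for launching a crawl job.
job = client.crawl_url(
    "https://example.com",
    ignore_query_parameters=True,  # treat ?utm_source=... variants as one page
)
print(job.id)
```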
Oh, the job has already failed. Could you please share this via the app chat? I can re-add the credits used for this failed job.