Crawl job not stopping, and token usage has not changed for quite some time
The code:
import requests
import time
import os
import re
from dotenv import load_dotenv
load_dotenv()
FIRE_CRAWL_API_KEY = os.getenv("FIRE_CRAWL_API_KEY")
# CONFIGURATION
CRAWL_URL = "https://api.firecrawl.dev/v2/crawl"
JOB_STATUS_URL = "https://api.firecrawl.dev/v2/crawl/status/"
OUTPUT_DIR = "crawleddata"
# Crawl job parameters
payload = {
    "url": "https://www.fhs.unizg.hr/",
    "sitemap": "include",
    "crawlEntireDomain": True,
    "excludePaths": [
        ".*/en/.*",
        ".*news.*"
    ],
    "scrapeOptions": {
        "onlyMainContent": True,
        "maxAge": 172800000,
        "parsers": [
            "pdf"
        ],
        "formats": [
            "markdown"
        ]
    }
}
headers = {
    "Authorization": f"Bearer {FIRE_CRAWL_API_KEY}",
    "Content-Type": "application/json"
}
# START THE CRAWL JOB
print("Starting crawl job...")
response = requests.post(CRAWL_URL, json=payload, headers=headers)
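
For reference, here is a minimal sketch of the polling and saving step the script stops short of. The response fields used below ("id", "status", "data", "markdown", "metadata"/"sourceURL") are assumptions about the v2 job and status payloads, not something confirmed in this thread, so check them against the API docs before relying on this.

# Minimal polling sketch (assumed response schema, see note above).
job = response.json()
job_id = job.get("id")
if not job_id:
    raise SystemExit(f"Crawl did not start: {job}")
print(f"Crawl started, job id: {job_id}")

MAX_WAIT_SECONDS = 3600     # stop polling after an hour instead of waiting forever
POLL_INTERVAL = 30          # seconds between status checks

os.makedirs(OUTPUT_DIR, exist_ok=True)
start = time.time()
status = {}

while time.time() - start < MAX_WAIT_SECONDS:
    status_response = requests.get(JOB_STATUS_URL + job_id, headers=headers)
    status = status_response.json()
    print(f"Status: {status.get('status')}")
    if status.get("status") in ("completed", "failed"):
        break
    time.sleep(POLL_INTERVAL)

# Save whatever markdown came back, one file per page (assumed item schema).
for i, page in enumerate(status.get("data", [])):
    markdown = page.get("markdown")
    if not markdown:
        continue
    source_url = page.get("metadata", {}).get("sourceURL", f"page_{i}")
    filename = re.sub(r"[^a-zA-Z0-9]+", "_", source_url).strip("_") + ".md"
    with open(os.path.join(OUTPUT_DIR, filename), "w", encoding="utf-8") as f:
        f.write(markdown)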


9 Replies
Hi @Suppa, this works for me on the playground with the same configuration. Is it just not working with the python-sdk for you?
@Gaurav Chadha The playground may work because it runs a limited scrape, but there are over 1,000 pages on that domain, and when I run it with my own credits it does not stop at all.
I worry that it veered off the main page and went to a different domain.
The activity log says the crawl is still running, yet my credits stay at 83% with nothing changing.
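
One thing that might help while debugging: capping the crawl size so a runaway job cannot eat the remaining credits. This assumes the v2 crawl endpoint accepts a "limit" field for the maximum number of pages, as earlier Firecrawl versions did; that is an assumption, not something confirmed in this thread.

# Sketch: run a small capped test crawl first ("limit" is assumed to be supported).
test_payload = dict(payload)
test_payload["limit"] = 50    # stop after roughly 50 pages
test_response = requests.post(CRAWL_URL, json=test_payload, headers=headers)
print(test_response.json())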
@Suppa I checked your job ID: it processed 3 PDF documents and then the job timed out at 21:09:38 UTC. Don't worry, your credits were not consumed and the job was terminated in the backend.
I'll share this with the team so that statuses for timeouts and failures are handled properly.
Thank you. Could you please look into it again and confirm that it has several thousand markdown documents?
PDFs are not of much concern, but I needed a markdown copy of the entire site, excluding news URLs, for our college chatbot.
I paid 100 USD, and that was our entire budget.
I can only do it through a SaaS like yours, as the site is built with jQuery and other older technologies with fragmented URLs, so any scraper I write is of little use.
Do you think I could get those files on the Activity Log page by the end of the week? I already have the pipeline and everything built; I just need the markdown files to complete the project.
I started the project at around 10% of 100,000 credits used, and I'm now at 83% of 100,000 from that single job.
EDIT: Now that I think about it, did you say it only processed PDF files?
I'm a bit worried they may have posted large PDF files, as that was not my intention; I only wanted to scrape the site content. I was asked to scrape some PDFs about guidelines, but I didn't consider that there might be entire books up there.
Thank you.
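
If the PDFs really are unwanted, one option would be to skip PDF parsing entirely and exclude .pdf URLs from the crawl. The exact behaviour of an empty "parsers" list and of excludePaths regexes is assumed here rather than confirmed, so treat this as a sketch:

# Sketch: HTML content only; skip PDF parsing and PDF URLs (assumed options).
payload["scrapeOptions"]["parsers"] = []          # don't parse PDF files
payload["excludePaths"].append(".*\\.pdf$")       # assumed regex syntax for excludePaths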
@Suppa The status should be set to failed or removed from there. Can you please refresh and check again? It should not have cost you credits if it was an unsuccessful request.
For the PDFs, it just parsed them as rest data; they didn't go through. You can refresh and make a new request via the API.
Thank you for your involvement in the matter.
As you can see, I can't download files from that job. I tested with 3 jobs and spent 10,000 credits, then ran the 4th one, which does not even show as failed.
Could you please refund me the credits for that job that failed so I can try to run the job again?
This time I will go over the parameters with you before running, just to be sure I didn't miss something.
Even though I don't have the crawl status or the files, it still spent credits.



Can you please also share your email?
Sent. Thank you for your prompt involvement in this matter.
@Suppa we'll reimburse the wasted credits; I've shared it with the team. Could you please also send us an email at help@firecrawl.dev? You can add a link to this conversation.
Also, this is a known bug that will be fixed this week.