Firecrawl13mo ago
Kaleb

`next` pagination using js-sdk

Using version 1.2.2, does the js-sdk have a built-in method to follow the `next` URL? I call asyncCrawlUrl() and then checkCrawlStatus() every 5 seconds until the job status is complete. However, I'm not sure how to get the next page of results from the `next` property. More detailed examples of how to use asyncCrawlUrl, checkCrawlStatus, and the `next` URL would be appreciated!
import dotenv from 'dotenv'
import FirecrawlApp, {type FirecrawlDocument} from '@mendable/firecrawl-js'

dotenv.config()
const apiKey = process.env.FIRECRAWL_API_KEY
const firecrawl = new FirecrawlApp({apiKey})

const crawlUrl = async (url: string) => {
  // start the crawl
  const crawlResponse = await firecrawl.asyncCrawlUrl(url, crawlerConfig)

  if (!crawlResponse.success) {
    throw new Error(`Error starting crawl: ${crawlResponse.error}`)
  }

  // loop until the crawl is complete
  let completedFlag = false
  let statusCheck = null
  const pages: FirecrawlDocument[] = []

  while (!completedFlag) {
    // Get the status
    console.log('checking status')
    statusCheck = await firecrawl.checkCrawlStatus(crawlResponse.id)

    if (!statusCheck.success) {
      throw new Error(`Error checking crawl status: ${statusCheck.error}`)
    }

    if (statusCheck.status === 'failed') {
      throw new Error('Error: crawl failed')
    }

    // Check if crawl is completed
    if (statusCheck.status === 'completed') {
      if (!Array.isArray(statusCheck.data)) {
        throw new Error('Error: crawl resulted in no data')
      }

      statusCheck.data.forEach((page) => pages.push(page))
      completedFlag = true
    } else {
      // polling interval
      await new Promise((resolve) => setTimeout(resolve, 5000))
    }
  }
  return pages
}
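
Calling the helper looks like this (the URL is just a placeholder):

// Example usage of the helper above (placeholder URL)
crawlUrl('https://example.com')
  .then((pages) => console.log(`Crawled ${pages.length} pages`))
  .catch((error) => console.error(error))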
7 Replies
mogery
mogery13mo ago
Hi there @Kaleb, this is a bug, working on this today. Will let you know when it's done.

Hi @Kaleb, we added a getAllData parameter to checkCrawlStatus. Please update to 1.3.0 and set the parameter to true. Thank you for your patience!
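
To make that concrete, here's a minimal sketch of the inside of the polling loop, assuming checkCrawlStatus takes getAllData as its second argument in 1.3.0:

// Minimal sketch assuming checkCrawlStatus(id, getAllData) in @mendable/firecrawl-js >= 1.3.0
const statusCheck = await firecrawl.checkCrawlStatus(crawlResponse.id, true)

if (!statusCheck.success) {
  throw new Error(`Error checking crawl status: ${statusCheck.error}`)
}

if (statusCheck.status === 'completed') {
  // With getAllData set to true, the SDK keeps following `next` internally,
  // so statusCheck.data should already hold every crawled document.
  pages.push(...(statusCheck.data ?? []))
  completedFlag = true
}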
Kaleb
KalebOP13mo ago
hey @mogery thank you, we will give it a shot. I appreciate the update!

Hey @mogery, we're using the new v1.3 SDK and the getAllData flag. If the response exceeds 10 MB, will we still need to handle pagination by fetching the `next` URL? I want to make sure we handle that case if possible.
mogery
mogery13mo ago
Nope, `getAllData` will fire off requests until `next` doesn't exist anymore, i.e. all data has been retrieved.
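
For anyone on an older SDK (or curious what getAllData does under the hood), here's a rough sketch of following `next` by hand. It assumes each page of the status response looks like { data, next } and that the `next` URL accepts the same API key as a Bearer token:

// Rough sketch of manual pagination; getAllData does this for you.
// Assumes each status page looks like { data: FirecrawlDocument[], next?: string }
// and that `next` accepts the API key as a Bearer token.
const collectAllPages = async (first: {data?: FirecrawlDocument[]; next?: string}) => {
  const pages: FirecrawlDocument[] = [...(first.data ?? [])]
  let nextUrl = first.next

  while (nextUrl) {
    const res = await fetch(nextUrl, {
      headers: {Authorization: `Bearer ${apiKey}`},
    })
    const page = (await res.json()) as {data?: FirecrawlDocument[]; next?: string}
    pages.push(...(page.data ?? []))
    nextUrl = page.next
  }
  return pages
}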
Kaleb
KalebOP13mo ago
ah, I see! thanks so much, that really simplifies our script.
mogery
mogery13mo ago
Yup! Should've been there from the start, sorry about that 😅
Kaleb
KalebOP13mo ago
Well I can't complain because this is still 10x easier than puppeteer 😁
Caleb
Caleb13mo ago
Love to hear, nice name btw 🙂
