How to make crawlee try to refetch?

If the response from the HTTP API I crawl does not meet expectations, but the HTTP status is 200, how can I mark the request as failed and have Crawlee fetch it again with the next proxy?
7 Replies
multiple-amethyst•3y ago
From what I understood, you want to make a request based on the data you receive from the initial request? If yes, then you can use the context object in the requestHandler to make a new request or enqueue a new request like this.
import { HttpCrawler, type RequestOptions } from '@crawlee/http';

const crawler = new HttpCrawler({
    async requestHandler({ crawler, sendRequest, request }) {
        // Send a request right away and get the response
        const { body } = await sendRequest({
            url: request.url,
        });

        // Inspect `body` here and only re-enqueue when it is not what you expect;
        // re-enqueueing unconditionally would keep the crawl running forever.

        // RequestOptions with a custom uniqueKey so Crawlee does not treat it as a duplicate request
        const newRequest: RequestOptions = {
            url: request.url,
            uniqueKey: Date.now().toString(),
        };

        // Enqueue the request
        await crawler.addRequests([newRequest]);
    },
});

await crawler.run([
    'http://www.example.com/page-1',
    'http://www.example.com/page-2',
]);
typical-coral•3y ago
Wouldn't that make Crawlee think it's a duplicate?
multiple-amethyst•3y ago
Good point, from my understanding it shouldn't be a problem if you're using sendRequest for the new request, but if you're using crawler.addRequests you will have to manually generate a uniqueKey for each RequestOptions to prevent it from being marked as duplicate. I have updated my snippet to show how to do this.
typical-coral•3y ago
thanks
multiple-amethyst•3y ago
throw new Error("REASONOFRETRY")
optimistic-gold•3y ago
You can also call session.retire() before the throw to ensure the session is discarded. Normally, a thrown error only increases the session's error score.
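Putting the last two replies together, here is a minimal sketch of a handler that treats a 200 response with an unexpected body as a failure. The payload shape (`ok`, `items`), the helper names `isApiPayload` and `parseOrFail`, and the `SessionLike` / `HandlerContextLike` types are assumptions made up for illustration; the two small interfaces only stand in for the parts of Crawlee's handler context used here so the snippet compiles on its own.

```typescript
// Stand-ins for the parts of Crawlee's handler context used below (assumption:
// in a real crawler these values are supplied by Crawlee, not defined by you).
interface SessionLike {
    retire(): void;
}

interface HandlerContextLike {
    body: { toString(): string };
    request: { url: string };
    session?: SessionLike;
}

// Hypothetical success shape; adjust to whatever your API returns.
interface ApiPayload {
    ok: true;
    items: unknown[];
}

function isApiPayload(value: unknown): value is ApiPayload {
    if (typeof value !== 'object' || value === null) return false;
    const candidate = value as { ok?: unknown; items?: unknown };
    return candidate.ok === true && Array.isArray(candidate.items);
}

// Returns the parsed payload, or retires the session and throws.
function parseOrFail(rawBody: string, url: string, session?: SessionLike): ApiPayload {
    let parsed: unknown;
    try {
        parsed = JSON.parse(rawBody);
    } catch {
        parsed = undefined; // not JSON at all, handled like any other bad payload
    }
    if (!isApiPayload(parsed)) {
        // Discard the session first, then throw so the request counts as failed.
        session?.retire();
        throw new Error(`Unexpected payload from ${url}`);
    }
    return parsed;
}

async function requestHandler({ body, request, session }: HandlerContextLike): Promise<void> {
    const payload = parseOrFail(body.toString(), request.url, session);
    console.log(`${request.url}: ${payload.items.length} items`);
}
```

To use it, pass `requestHandler` to `new HttpCrawler({ requestHandler, ... })`. An error thrown from the handler makes Crawlee retry the request until `maxRequestRetries` is exhausted. Whether the retry actually goes out through a different proxy depends on your proxy and session pool configuration, so it is worth checking that in your own setup.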
metropolitan-bronzeOP•3y ago
I got it, thanks all 💓
