How to disable the duplicates check?

import { HttpCrawler, log, LogLevel } from 'crawlee';

log.setLevel(LogLevel.DEBUG);

const crawler = new HttpCrawler({
    useSessionPool: false,
    persistCookiesPerSession: false,
    minConcurrency: 1,
    maxConcurrency: 5,
    maxRequestRetries: 1,
    requestHandlerTimeoutSecs: 30,
    maxRequestsPerCrawl: 10,
    async requestHandler({ request, body }) {
        log.debug(`Processing ${request.url}...`);
        log.debug(`${body}`);
    },
    failedRequestHandler({ request }) {
        log.debug(`Request ${request.url} failed twice.`);
    },
});

// The same URL is passed twice, but the crawler deduplicates
// requests by default, so only one of them gets processed.
await crawler.run([
    'https://httpbin.org/ip',
    'https://httpbin.org/ip',
]);

log.debug('Crawler finished.');
This is my current code.
other-emerald • 3y ago
Give each request a uniqueKey that is actually unique. Or, if the payloads/headers differ between requests but the URL is the same, you can set the useExtendedUniqueKey option to true. Both options go into RequestOptions, where you also configure the url, label, headers, etc.
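For example, here is a minimal sketch of both approaches; the uniqueKey values and the POST payloads are made up for illustration:

import { HttpCrawler, log } from 'crawlee';

const crawler = new HttpCrawler({
    async requestHandler({ request }) {
        log.info(`Processed ${request.url} (uniqueKey: ${request.uniqueKey})`);
    },
});

await crawler.run([
    // Option 1: an explicit uniqueKey makes identical URLs distinct,
    // so the duplicates check lets both of them through.
    { url: 'https://httpbin.org/ip', uniqueKey: 'ip-1' },
    { url: 'https://httpbin.org/ip', uniqueKey: 'ip-2' },
    // Option 2: useExtendedUniqueKey derives the key from the method
    // and payload as well as the URL, so these two POSTs both run.
    { url: 'https://httpbin.org/post', method: 'POST', payload: '{"n":1}', useExtendedUniqueKey: true },
    { url: 'https://httpbin.org/post', method: 'POST', payload: '{"n":2}', useExtendedUniqueKey: true },
]);

Note that either way the deduplication itself stays on; you are just making the keys distinct so that none of your requests is treated as a duplicate.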
exotic-emerald (OP) • 3y ago
thank you
