Hi everyone! πŸ‘‹

Hi everyone! πŸ‘‹ I’m working with Crawlee and was wondering: Is there a way to enqueue links so that instead of sending a GET request, it sends a POST request with a body (and any other necessary information)? My goal is to scrape data from a GraphQL API. I’m not sure if Crawlee is the best tool for this task. Does anyone have suggestions on how to automate this process smoothly? Specifically: - Handling multiple requests efficiently - Managing proxies - Any best practices or recommendations for working with GraphQL APIs? Thanks in advance for your help! 🌟
4 Replies
harsh-harlequin
harsh-harlequinβ€’4mo ago
Of course, you can enqueue post requests with payload. Just set useExtendedUniqueKey to true otherwise requestQueue will deduplicate the requests with the same url.
deep-jade
deep-jadeβ€’4mo ago
where do i set this parameter? in the enqueueLinks method?
import { HttpCrawler, Dataset } from "@crawlee/http";

const crawler = new HttpCrawler({
requestList,
async requestHandler({ request, response, body, contentType, enqueueLinks }) {
// Save the data to dataset.
await Dataset.pushData({
url: request.url,
html: body,
});

enqueueLinks({
globs: ["http://www.example.com/**"],
});
},
});

await crawler.run(["http://www.example.com/page-1", "http://www.example.com/page-2"]);
import { HttpCrawler, Dataset } from "@crawlee/http";

const crawler = new HttpCrawler({
requestList,
async requestHandler({ request, response, body, contentType, enqueueLinks }) {
// Save the data to dataset.
await Dataset.pushData({
url: request.url,
html: body,
});

enqueueLinks({
globs: ["http://www.example.com/**"],
});
},
});

await crawler.run(["http://www.example.com/page-1", "http://www.example.com/page-2"]);
harsh-harlequin
harsh-harlequinβ€’4mo ago
I think it is possible in enqueueLinks also but you will have more control with enqueing urls like this
async requestHandler({ request, response, body, contentType, crawler }) {
await crawler.addRequests([{url: 'url to enqueue', method:'POST', payload:'some payload', useExtendedUniqueKey:true, label:'some label'}];
}
async requestHandler({ request, response, body, contentType, crawler }) {
await crawler.addRequests([{url: 'url to enqueue', method:'POST', payload:'some payload', useExtendedUniqueKey:true, label:'some label'}];
}
deep-jade
deep-jadeβ€’4mo ago
thanks ill try that!

Did you find this page helpful?