Crawlee & Apify•2y ago

Initiate a crawler's actor with a POST fetch & avoid browser

I have a use case where an Actor should start out with a POST request and not the usual GET request. After that, I'll simply do a series of HTTP requests and parse the responses myself. There is no need for a browser, just plain fetch. Is the BasicCrawler or the HttpCrawler the right option? I'm still looking into Crawlee & Apify to manage the proxies and sessions.

2 Replies

rival-black•2y ago

Hey @p6l.richard, I think the best option here would be to use the HttpCrawler. The request object lets you specify the HTTP method in the method field, and you can send the body in the payload field. Generally speaking, unless you actually need the browser, it is good to avoid it and use the HttpCrawler/CheerioCrawler for performance reasons (CheerioCrawler is the same thing as HttpCrawler, but it also parses the page by default using the cheerio library). With BasicCrawler you need to do the fetching yourself.
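A minimal sketch of what such a start request could look like, assuming a JSON API. The URL, the body fields, and the `buildPostRequest` helper are made up for illustration; `url`, `method`, `payload`, `headers` and `useExtendedUniqueKey` are Crawlee request options. The commented part at the bottom shows roughly how the object would be handed to an HttpCrawler once Crawlee is installed.

```typescript
// Subset of Crawlee's request options used in this sketch.
interface PostRequestOptions {
    url: string;
    method: 'POST';
    payload: string;
    headers: Record<string, string>;
    useExtendedUniqueKey: boolean;
}

// Hypothetical helper: builds a POST start request with a JSON body.
function buildPostRequest(url: string, data: Record<string, unknown>): PostRequestOptions {
    return {
        url,
        method: 'POST',
        // The body goes into `payload` as a string.
        payload: JSON.stringify(data),
        headers: { 'content-type': 'application/json' },
        // Include method and payload in the uniqueKey, so several POSTs
        // to the same URL are not deduplicated as one request.
        useExtendedUniqueKey: true,
    };
}

// Placeholder URL and body, not taken from the thread.
const startRequest = buildPostRequest('https://example.com/api/search', { query: 'crawlee' });

// With Crawlee installed, the request could be used roughly like this:
//
//   import { HttpCrawler } from 'crawlee';
//
//   const crawler = new HttpCrawler({
//       async requestHandler({ request, body }) {
//           // `body` holds the raw response; parse it however you need.
//           console.log(request.method, request.url, body.toString().length);
//       },
//   });
//   await crawler.run([startRequest]);
```

Follow-up requests can be enqueued from inside the handler in the same shape, so the whole chain stays on plain HTTP without a browser.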

fascinating-indigoOP•2y ago

Got you, thank you for clarifying, Vojtech. 🙏 I'm going with the following mental model then:
- BasicCrawler skips the navigation, and I fetch/parse everything myself.
- HttpCrawler handles navigation (fetch) with a custom request class instance (assuming some fetch wrapper) but no response parsing.
- CheerioCrawler handles navigation (fetch) and response parsing.

If this is somewhat correct, my pick would actually be the BasicCrawler, because I wouldn't have to learn how the HttpCrawler works and could use fetch directly. But I will test it out! Thank you.
