Custom storage provider for RequestQueue?
Exclude query parameter URLs from crawl jobs
Best practice for rendering javascript, then doing a deep or structuredclone of the window object?
About define route
Extracting text from list elements
`await page.locator("div.my_class > ul > li").textContent();` causes an error: `strict mode violation: locator('div.my_class > ul > li') resolved to x elements`. The presence of multiple elements is expected since this is a list.
Playwright itself doesn't seem to have an issue with selectors that return multiple elements, and I did find the `strictSelectors` parameter in the crawlee docs, but didn't manage to set it to false (if that is even the solution).
In scrapy `item.add_css("list", "div.my_class > ul > li::text")` returns a list of the text for each list item, which is what I'm looking for.
Does anyone know how to solve this?...
Playwright in Docker image doesn't work
`npx crawlee create my-crawler`, built the docker image and deployed it to a server.
When the image runs I get the log in the image below.
I didn't change any line of code.
Tried with crawlee versions 3.0.0 and 3.1.2.
...
Disable statistics `storage` dir by default?
requestQueue doesn't delete requests after visiting and saving data
Run Puppeteer docker locally (actor-node-puppeteer-chrome)
How do we assign a session to a request without having to use proxy?
How to handle sequential steps (like a login flow or a wizard) in headless browser?
`page.goto` as is done in the forms example[1]? Should we set up handlers for each page type (`loginHandler` and `contentPageHandler`) and just add the pages to the `RequestQueue`? Or do we do something else entirely?...
how to set payload in cheerio crawler preNavigationHooks
```javascript
preNavigationHooks: [async (crawlingContext, gotOptions) => {
    const { request } = crawlingContext;
    request.payload =
    .......`;...
```
Collecting url from the nested Xml
Get data old link crawler
How to handle a huge Json file?
Canada411 site failing after 4 hours
How do I delay requests with HttpCrawler?
`setTimeout` and `Promise` like this and awaiting on it
```ts
export function delay(seconds: number): Promise<void> {...
```
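The preview is cut off, so the poster's actual helper is not shown. As a minimal sketch of the pattern it describes (wrapping `setTimeout` in a `Promise` and awaiting it), a helper could look like the one below. The names `sleepSeconds` and `demo` are hypothetical and do not come from the thread, and how the helper would be wired into an HttpCrawler is left out.

```ts
// Hypothetical helper (the name is an assumption, not from the thread):
// returns a promise that resolves after roughly the given number of seconds.
export function sleepSeconds(seconds: number): Promise<void> {
    return new Promise<void>((resolve) => {
        // setTimeout expects milliseconds, so convert from seconds.
        setTimeout(resolve, seconds * 1000);
    });
}

// Usage sketch: awaiting the helper pauses only this async function.
export async function demo(): Promise<number> {
    const start = Date.now();
    await sleepSeconds(0.05);
    return Date.now() - start; // elapsed time in milliseconds
}
```

Awaiting such a helper delays only the async function that calls it. Other requests that are already running concurrently are not paused.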