NeoNomade
Crawlee & Apify
Created by MrSquaare on 5/10/2025 in #crawlee-js
Timeout in Docker (with Camoufox image)
I'm not using Apify images. I use node:slim images.
8 replies
Crawlee & Apify
Created by MrSquaare on 5/10/2025 in #crawlee-js
Timeout in Docker (with Camoufox image)
It shouldn't be. I've tested this on an M3 MacBook without issues.
8 replies
Crawlee & Apify
Created by NeoNomade on 4/23/2025 in #crawlee-js
preNavigationHooks not followed
But thanks for taking the time to test!
13 replies
Crawlee & Apify
Created by NeoNomade on 4/23/2025 in #crawlee-js
preNavigationHooks not followed
Strange. For me it only worked with my workaround.
13 replies
Crawlee & Apify
Created by genetic-orange on 4/19/2025 in #crawlee-js
Proxy settings appear to be cached
From what I remember, Bun still throws errors when it's combined with Crawlee, with some internal packages complaining. Is there any particular reason you want to use Bun?
5 replies
Crawlee & Apify
Created by NeoNomade on 4/23/2025 in #crawlee-js
preNavigationHooks not followed
Hi @Olexandra, I found my workaround. The preNavigationHooks are applied only if the response is 200 (which IMO is a bit of a bug; they should be applied to all allowed status codes). But in my routes file, when I handle the 401 response, I set the page route there, and it is picked up. Thanks for your support!
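A minimal sketch of that workaround, under stated assumptions: `PageLike` and `RouteLike` are stand-in types that mirror only the few Playwright methods used here, and `blockAssets` and `reloadWithAssetsBlocked` are hypothetical helper names, not part of Crawlee or Playwright. The idea is to register the route inside the handler that deals with the 401, right before reloading the page.

```typescript
// Stand-in types mirroring the few Playwright methods this sketch relies on,
// so it runs without Playwright or Crawlee installed.
interface RequestLike {
  url(): string;
  resourceType(): string;
}
interface RouteLike {
  request(): RequestLike;
  abort(): Promise<void>;
  continue(): Promise<void>;
}
interface PageLike {
  route(pattern: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
  goto(url: string): Promise<unknown>;
}

const RESOURCE_EXCLUSIONS = ['image', 'media', 'font', 'stylesheet'];

// Hypothetical helper: registers a route that aborts excluded resource types
// and lets every other request continue.
async function blockAssets(page: PageLike): Promise<void> {
  await page.route('**/*', (route) =>
    RESOURCE_EXCLUSIONS.includes(route.request().resourceType())
      ? route.abort()
      : route.continue(),
  );
}

// Hypothetical handler body for the 401 case: install the route first,
// then navigate to the same address again (the "hard refresh").
async function reloadWithAssetsBlocked(page: PageLike, url: string): Promise<void> {
  await blockAssets(page);
  await page.goto(url);
}
```

In a real router handler, the `page` from the crawling context would be passed in place of the stand-in, with the current request URL as the second argument.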
13 replies
Crawlee & Apify
Created by NeoNomade on 4/23/2025 in #crawlee-js
preNavigationHooks not followed
I think the issue lies somewhere in my routes file. For my requests I initially get a 401 status code, which I allow, and in the routes file I perform a page.goto to the same address (basically a hard refresh). The page works, but it does not respect the abortAssets hook.
13 replies
Crawlee & Apify
Created by NeoNomade on 4/23/2025 in #crawlee-js
preNavigationHooks not followed
I've recreated the project on a VM, but it still shows the same behaviour.
13 replies
Crawlee & Apify
Created by NeoNomade on 4/23/2025 in #crawlee-js
preNavigationHooks not followed
Hi @Olexandra, I removed the console.logs from there to keep the code short. I'm running my script with those preNavigationHooks, and in headful mode I saw all the images, styles, and analytics requests loading. So I wanted to log which requests are being blocked, and then, surprise: logs inside the routes are not working at all. Now I've switched to an even easier approach:
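import type { PlaywrightHook } from 'crawlee';

const abortAssets: PlaywrightHook = async ({ page }) => {
    // Resource types to block; everything else is allowed through.
    const RESOURCE_EXCLUSIONS = ['image', 'media', 'font', 'stylesheet'];
    console.log('Welcome to AbortAssets');
    await page.route('**/*', (route) => {
        // url() is a method on Playwright's Request, so it has to be called.
        console.log(route.request().url());
        if (RESOURCE_EXCLUSIONS.includes(route.request().resourceType())) {
            return route.abort();
        }
        return route.continue();
    });
};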
I see the 'Welcome to AbortAssets' log, but not the next one. I can't figure out where this bug is coming from. I've replaced Crawlee's internal logger with Pino, and I was using OpenTelemetry to get some host analytics. I disabled the OpenTelemetry SDK, and the routes are still not working...
13 replies
Crawlee & Apify
Created by NeoNomade on 4/23/2025 in #crawlee-js
preNavigationHooks not followed
Same behaviour with plain Firefox... I don't get it...
13 replies
Crawlee & Apify
Created by xenial-black on 4/16/2025 in #crawlee-js
Customising logging
@je no need for the apify package:
import { log, LogLevel } from 'crawlee';

log.setOptions({
    logger: customLogger,
    level: LogLevel.DEBUG,
});
5 replies
Crawlee & Apify
Created by NeoNomade on 3/13/2025 in #crawlee-js
Camoufox failing
@nikus I found the issue; it's probably something in the camoufox-js port. I didn't have time to look into it. I had my crawler script in src/crawler/crawler.ts, moved it to src/crawler.ts, and it worked properly. That was absolutely all.
5 replies
Crawlee & Apify
Created by vicious-gold on 1/31/2024 in #💻hire-freelancers
Is anyone scraping indeed with Apify and
That puppeteer-real-browser is just a collection of settings for Chrome. Nothing magic happens.
14 replies
Crawlee & Apify
Created by other-emerald on 1/31/2024 in #💻hire-freelancers
Is anyone scraping indeed with Apify and
Try going headful with xvfb, and try using specific waits for scripts to be loaded. The whole trick with captchas is to learn what triggers them and avoid that as much as possible. Just throwing residential proxies at the problem does not solve it.
14 replies
Crawlee & Apify
Created by rival-black on 1/31/2024 in #💻hire-freelancers
Is anyone scraping indeed with Apify and
It's not a matter of Apify. Bot protection has become a lot better, especially fingerprinting. I'm also using other tools and facing the same issues.
14 replies
Crawlee & Apify
Created by NeoNomade on 5/10/2023 in #crawlee-js
change proxies while running
Yes, it gets messed up somehow when concurrency is higher than 1. I'm trying to create a function to do this.
8 replies
Crawlee & Apify
Created by NeoNomade on 5/10/2023 in #crawlee-js
change proxies while running
bump!
8 replies
Crawlee & Apify
Created by NeoNomade on 10/16/2023 in #crawlee-js
error handling
Thanks
4 replies
Crawlee & Apify
Created by exotic-emerald on 10/17/2023 in #crawlee-js
Add label to pages via `crawler.addRequests()`?
Create a Request object and put it into your array. In the Request object you can set a label.
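A minimal sketch of that idea, assuming plain `{ url, label }` objects are enough for your case. `LabeledRequest` and `withLabel` are hypothetical names used only for illustration, and the URLs and the `DETAIL` label are placeholders.

```typescript
// Only the two fields used here; Crawlee's own request options accept more.
interface LabeledRequest {
  url: string;
  label: string;
}

// Hypothetical helper: turns a list of URLs into request objects
// that all carry the same label.
function withLabel(urls: string[], label: string): LabeledRequest[] {
  return urls.map((url) => ({ url, label }));
}

const requests = withLabel(
  ['https://example.com/a', 'https://example.com/b'],
  'DETAIL',
);

// With a crawler instance in scope, the array would then be passed on:
// await crawler.addRequests(requests);
```

The label then decides which router handler processes each request.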
3 replies