preNavigationHooks not followed

I'm using the Camoufox JS integration. If I log something before the await page.route call, it works; inside the page.route callback, it doesn't.
preNavigationHooks: [
    async (gotoOptions) => {
        gotoOptions.waitUntil = "load";
    },
    async ({ page }) => {
        await page.route("**/*", async (route) => {
            const url = route.request().url();
            const resourceType = route.request().resourceType();
            const trackingScriptRegex =
                /googletagmanager|facebook|sentry|ads|tracking|metrics|analytics|optimizely|segment/i;
            const extraBlocklistRegex =
                /tiktok|facebook|prismic-images|bing|ads|tracking|metrics|analytics|contentsquare|lytics|adtrafficquality|adsrvr|tmol|snapchat|ticketm\.net/i;

            const isBlockedResourceType = ["stylesheet", "font", "media"].includes(resourceType);
            const isBlockedScript = resourceType === "script" && trackingScriptRegex.test(url);
            const isBlockedByExtraPatterns = extraBlocklistRegex.test(url);

            const shouldBlock =
                !url.includes("recaptcha") &&
                (isBlockedResourceType || isBlockedScript || isBlockedByExtraPatterns);

            if (shouldBlock) {
                await route.abort();
                return;
            }

            await route.continue();
        });
    },
],
9 Replies
unwilling-turquoise (OP) • 3w ago
Same behaviour with plain Firefox... I don't get it...
vicious-gold • 3w ago
Hi! I don't see either log.info() or console.log() in your code, so I don't know what exactly you are trying to log. I tested logging locally with a fresh Crawlee + Playwright + Camoufox JS template, and it worked for me. Here is a snippet:
preNavigationHooks: [
    async ({ page, log }) => {
        try {
            log.info('Log before route.');
            await page.route('https://apify.com/', async (route) => {
                log.info('Log from inside.');
                await route.continue();
            });
        } catch (error) {
            log.error('Error log from outside.');
        }
    },
],
Can you please be more specific about what you're trying to print?
unwilling-turquoise (OP) • 3w ago
Hi @Olexandra, I removed the console.logs from the snippet to keep the code short. I'm running my script with those preNavigationHooks, and in headful mode I saw all the images, styles, and analytics requests still loading. So I wanted to log which requests are being blocked, and to my surprise the logs inside the routes don't appear at all. Now I've switched to an even simpler approach:
const abortAssets: PlaywrightHook = async ({ page }) => {
    const RESOURCE_EXCLUSIONS = ['image', 'media', 'font', 'stylesheet'];
    console.log('Welcome to AbortAssets');
    await page.route('**/*', (route) => {
        // url() is a method; without the call, the function itself is logged.
        console.log(route.request().url());
        if (RESOURCE_EXCLUSIONS.includes(route.request().resourceType())) {
            return route.abort();
        }
        return route.continue();
    });
};
I see the "Welcome to AbortAssets" log, but not the next one. I can't figure out where this bug is coming from. I've replaced Crawlee's internal logger with Pino, and I was using OpenTelemetry to get some host analytics. I disabled the OpenTelemetry SDK, and the routes still don't work. I've recreated the project on a VM, with the same behaviour. I think the issue lies somewhere in my routes file: my requests initially get a 401 status code, which I allow, and in the routes file I perform a page.goto to the same address (basically a hard refresh). The page works, but it doesn't respect the abortAssets hook.
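One way to narrow this down is to move the abort decision and the log line out of the inline callback, so they can be checked without a browser. The following is only a sketch: RequestLike, RouteLike, shouldAbort, and handleRoute are hypothetical names, and the two interfaces are minimal stand-ins for the Playwright objects, not Playwright's own types.

```typescript
// Minimal structural stand-ins for Playwright's Request and Route,
// so this sketch compiles and runs without Playwright installed.
interface RequestLike {
    url(): string;
    resourceType(): string;
}
interface RouteLike {
    request(): RequestLike;
    abort(): Promise<void>;
    continue(): Promise<void>;
}

const RESOURCE_EXCLUSIONS = ['image', 'media', 'font', 'stylesheet'];

// Pure decision with no I/O, so it can be tested on its own.
function shouldAbort(resourceType: string): boolean {
    return RESOURCE_EXCLUSIONS.includes(resourceType);
}

// Same shape as the callback passed to page.route().
// The logger is injected, so any logger (console, Pino, ...) can be used.
async function handleRoute(route: RouteLike, logLine: (msg: string) => void): Promise<void> {
    const request = route.request();
    const blocked = shouldAbort(request.resourceType());
    logLine(`${blocked ? 'ABORT' : 'CONTINUE'} ${request.resourceType()} ${request.url()}`);
    if (blocked) {
        await route.abort();
        return;
    }
    await route.continue();
}
```

Inside the hook this could be wired up as page.route('**/*', (route) => handleRoute(route, console.log)). If the log line never appears, the callback itself is not being invoked for that navigation, which points away from the logger.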
vicious-gold • 3w ago
@NeoNomade thanks for the clarification. A manual page.goto() is not handled the same way as a regular request navigation. If you need to reach this page again with preNavigationHooks being triggered, consider adding the request to the requestQueue with a uniqueKey and additional data, which will prevent you from getting the 401 again. Hope it helps.
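A rough sketch of that suggestion, under assumptions: Crawlee deduplicates queued requests by uniqueKey, so re-adding the same URL needs a different key to be processed again. The helper name buildRevisitRequest and the userData fields are hypothetical; only url, uniqueKey, and userData are taken from Crawlee's request options.

```typescript
// Only the request fields used in this sketch; other options are omitted.
interface RequestInput {
    url: string;
    uniqueKey: string;
    userData: Record<string, unknown>;
}

// Hypothetical helper: builds a second request for a URL that was already visited.
function buildRevisitRequest(url: string, attempt: number): RequestInput {
    return {
        url,
        // A distinct uniqueKey keeps the queue from treating this as a duplicate.
        uniqueKey: `${url}#revisit-${attempt}`,
        // Hypothetical flag so the handler can tell the revisit from the first pass.
        userData: { revisitAfter401: true, attempt },
    };
}

// In a request handler this would replace the manual page.goto(), for example:
//   await crawler.addRequests([buildRevisitRequest(request.url, 1)]);
```

Because the revisit goes through the queue as a regular request, it is navigated by the crawler itself, so preNavigationHooks run for it.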
unwilling-turquoise (OP) • 3w ago
Hi @Olexandra, I found my workaround. The preNavigationHooks are applied only if the response is 200 (which IMO is a bit of a bug; they should be applied for all allowed status codes). But in my routes file, when I handle the 401 response, I set the page route there, and it is picked up. Thanks for your support!
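The workaround might look roughly like the sketch below: the route is registered inside the handler, before the manual page.goto. RouteLike, PageLike, and refreshWithBlocking are hypothetical names, and the interfaces are minimal stand-ins rather than Playwright's types. Whether the hooks really are skipped for non-200 responses is not established in this thread.

```typescript
// Minimal structural stand-ins for the Playwright objects used below,
// so the sketch runs without Playwright installed.
interface RouteLike {
    request(): { resourceType(): string };
    abort(): Promise<void>;
    continue(): Promise<void>;
}
interface PageLike {
    route(pattern: string, handler: (route: RouteLike) => Promise<void>): Promise<void>;
    goto(url: string): Promise<unknown>;
}

const BLOCKED_TYPES = ['image', 'media', 'font', 'stylesheet'];

// Hypothetical helper: call it from the request handler with the first response's status.
async function refreshWithBlocking(page: PageLike, url: string, status: number): Promise<boolean> {
    if (status !== 401) return false; // only the 401 case needs the manual refresh
    // Register the route first so it is already in place for the manual navigation.
    await page.route('**/*', async (route) => {
        if (BLOCKED_TYPES.includes(route.request().resourceType())) {
            await route.abort();
            return;
        }
        await route.continue();
    });
    await page.goto(url); // the "hard refresh" described above
    return true;
}
```

In a Crawlee handler the call could be something like refreshWithBlocking(page, request.url, response?.status() ?? 0), assuming the handler context exposes page, request, and response.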
stormy-gold • 2w ago
unwilling-turquoise (OP) • 2w ago
Strange. For me it only worked with my workaround. But thanks for taking the time to test!
stormy-gold • 2w ago
Np, you resolved it yourself 🙂 Gl with your projects...
