Injecting Axe a11y tester

I would like to use Crawlee to crawl a bunch of internal sites and run the Axe accessibility scanner on each page. I figured out how to inject the script referenced in the axe-core getting started docs (https://github.com/dequelabs/axe-core#getting-started) using page.addInitScript:
import { PlaywrightCrawler, Dataset } from 'crawlee';
import axe from 'axe-core';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        await page.addInitScript('./node_modules/axe-core/axe-min.js');

        const title = await page.title();
        log.info(`Title of ${request.loadedUrl} is '${title}'`);

        const results = await axe.run();
        console.log(results.violations)

        await Dataset.pushData({ title, url: request.loadedUrl });

        await enqueueLinks();
    },
});

await crawler.run(['https://dequeuniversity.com/demo/mars/']);
But every page after that throws this error.
INFO PlaywrightCrawler: Starting the crawl
INFO PlaywrightCrawler: Title of https://dequeuniversity.com/demo/mars/ is 'Mars Commuter: Travel to Mars for Work or Pleasure!'
WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. Required "window" or "document" globals not defined and cannot be deduced from the context. Either set the globals before running or pass in a valid Element.
{"id":"9syPc5JbUuAjPx1","url":"https://dequeuniversity.com/demo/mars/","retryCount":1}
If I comment out the call to axe.run() the error goes away and things 'work'. Any idea what could be causing this?
8 Replies
fair-rose (OP) • 3y ago
My code was wrong. This should be a better example:
import { PlaywrightCrawler, Dataset } from 'crawlee';
import AxeBuilder from '@axe-core/playwright'

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        await page.addInitScript('./node_modules/axe-core/axe-min.js');

        const title = await page.title();
        log.info(`Title of ${request.loadedUrl} is '${title}'`);

        const results = await new AxeBuilder({page}).analyze();
        console.log(results.violations)

        await Dataset.pushData({ title, url: request.loadedUrl });

        await enqueueLinks();
    },
});

await crawler.run(['https://dequeuniversity.com/demo/mars/']);
Ahh, and I needed to add a try/catch to log the error that was being hidden from stdout. Never mind, nothing to see here :S
absent-sapphire • 3y ago
I would not recommend calling page.addInitScript in the requestHandler, because requestHandler only runs once the load event has fired. Try adding it to preNavigationHooks instead, like this:
import { PlaywrightCrawler, Dataset } from 'crawlee';
import axe from 'axe-core';

const crawler = new PlaywrightCrawler({
    preNavigationHooks: [
        async ({ page }) => {
            await page.addInitScript('./node_modules/axe-core/axe-min.js');
        },
    ],
    async requestHandler({ request, page, enqueueLinks, log }) {
        const title = await page.title();
        log.info(`Title of ${request.loadedUrl} is '${title}'`);

        const results = await axe.run();
        console.log(results.violations);

        await Dataset.pushData({ title, url: request.loadedUrl });

        await enqueueLinks();
    },
});

await crawler.run(['https://dequeuniversity.com/demo/mars/']);
You're getting the error because the axe script never ran on the page: the page had already been initialized by the time the init script was added.
If the above doesn't work, you can try this hacky solution I used a while back to run an init script. It registers the init script at the browser level to ensure it is ALWAYS run before any other scripts on the page:
import { PlaywrightCrawler, Dataset } from 'crawlee';
import axe from 'axe-core';

const crawler = new PlaywrightCrawler({
    browserPoolOptions: {
        postLaunchHooks: [
            async (_, controller) => {
                const promises = [];

                for (const context of controller.browser.contexts()) {
                    promises.push(context.addInitScript('./node_modules/axe-core/axe-min.js'));
                }

                await Promise.all(promises);
            },
        ],
    },
    async requestHandler({ request, page, enqueueLinks, log }) {
        const title = await page.title();
        log.info(`Title of ${request.loadedUrl} is '${title}'`);

        const results = await axe.run();
        console.log(results.violations);

        await Dataset.pushData({ title, url: request.loadedUrl });

        await enqueueLinks();
    },
});

await crawler.run(['https://dequeuniversity.com/demo/mars/']);
The second one should work 100%. Hope this helps!
fair-rose (OP) • 3y ago
I tried the second one, and after wrapping the call to axe.run in a try/catch I get this error:
Error: Required "window" or "document" globals not defined and cannot be deduced from the context. Either set the globals before running or pass in a valid Element.
    at setupGlobals (/Users/LGoolsby/dev/playground/axe/sample/node_modules/axe-core/axe.js:21231:15)
    at Object.run4 [as run] (/Users/LGoolsby/dev/playground/axe/sample/node_modules/axe-core/axe.js:21486:7)
    at PlaywrightCrawler.requestHandler [as userProvidedRequestHandler] (file:///Users/LGoolsby/dev/playground/axe/sample/src/main.ts:43:39)
    at async wrap (/Users/LGoolsby/dev/playground/axe/sample/node_modules/@apify/src/index.ts:77:27)
fair-rose (OP) • 3y ago
Same error with the first one as well. The line in axe.js that's blowing chunks is doing this:
function setupGlobals(context5) {
    var hasWindow = window && 'Node' in window && 'NodeList' in window;
    var hasDoc = !!document;
    if (hasWindow && hasDoc) {
        return;
    }
    if (!context5 || !context5.ownerDocument) {
        throw new Error('Required "window" or "document" globals not defined and cannot be deduced from the context. Either set the globals before running or pass in a valid Element.');
    }
    if (!hasDoc) {
        cache_default.set('globalDocumentSet', true);
        document = context5.ownerDocument;
    }
    if (!hasWindow) {
        cache_default.set('globalWindowSet', true);
        window = document.defaultView;
    }
}
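For context: the stack trace shows axe.js being called from main.ts, which suggests axe.run() is executing in the Node process rather than inside the browser page, so there is no window or document for it to find. A minimal TypeScript sketch of the same kind of check is below. hasBrowserGlobals and Scope are hypothetical names for illustration, not part of axe-core.

```typescript
// Hypothetical helper (not part of axe-core): mirrors the idea behind the
// setupGlobals check by looking for browser globals on a given scope object.
type Scope = { window?: unknown; document?: unknown };

function hasBrowserGlobals(scope: Scope): boolean {
    const win = scope.window;
    const hasWindow =
        typeof win === 'object' && win !== null && 'Node' in win && 'NodeList' in win;
    const hasDoc = Boolean(scope.document);
    return hasWindow && hasDoc;
}

// In a plain Node process neither global exists, so this is expected to log false.
console.log(hasBrowserGlobals(globalThis as unknown as Scope));
```

Inside a real page the same check would pass, which is why the scan has to run in the page context rather than in the crawler's own process.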
fair-rose (OP) • 3y ago
GitHub
feat: add playwright.utils.injectJQuery by barjin · Pull Request ...
closes #1336 JQuery requires the global document object, which is not available when Page.addInitScript-s are added. This solution, therefore, introduces the new injectFile option waitForDOM, which...
fair-rose (OP) • 3y ago
I figured it out! Here's a working solution:
import { PlaywrightCrawler, Dataset } from 'crawlee';
import { createRequire } from "module";
const require = createRequire(import.meta.url);
const AxeBuilder = require('@axe-core/playwright').default;

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        const title = await page.title();
        log.info(`Title of ${request.loadedUrl} is '${title}'`);

        try {
            // AxeBuilder runs the scan inside the already-loaded page.
            const results = await new AxeBuilder({ page }).analyze();
            await Dataset.pushData({ title, url: request.loadedUrl, violations: results.violations });
        }
        catch (e) {
            console.log(e)
        }

        await enqueueLinks();
    },
});

// Same start URL as in the earlier snippets.
await crawler.run(['https://dequeuniversity.com/demo/mars/']);
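The solution above stores the full results.violations array for each page. If a flatter record per rule is easier to query later, a minimal sketch could look like this. summarizeViolations and the two interfaces are hypothetical names, the sample data is made up for illustration, and only a few fields that axe reports on each violation (id, impact, help, nodes) are used.

```typescript
// Minimal shape of the fields used below; real axe results carry more than this.
interface ViolationLike {
    id: string;
    impact?: string | null;
    help: string;
    nodes: unknown[];
}

interface ViolationSummary {
    rule: string;
    impact: string;
    help: string;
    affectedNodes: number;
}

// Hypothetical helper: reduce each violation to a small, flat record.
function summarizeViolations(violations: ViolationLike[]): ViolationSummary[] {
    return violations.map((v) => ({
        rule: v.id,
        impact: v.impact ?? 'unknown',
        help: v.help,
        affectedNodes: v.nodes.length,
    }));
}

// Illustrative data only, not real scan output.
const sample: ViolationLike[] = [
    { id: 'image-alt', impact: 'critical', help: 'Images must have alternate text', nodes: [{}, {}] },
    { id: 'example-rule', impact: null, help: 'Example help text', nodes: [{}] },
];

console.log(summarizeViolations(sample));
```

In the handler, the summarized array could be passed to Dataset.pushData in place of results.violations, assuming the raw node details are not needed downstream.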
absent-sapphire • 3y ago
Super nice!! Running the script after the page has loaded seems to also work.
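Running axe after load works because the scan then executes inside the page, where window and document exist, rather than in the Node process. That appears to be what @axe-core/playwright arranges internally. A rough hand-rolled equivalent is sketched below under a few assumptions: runAxeInPage and PageLike are hypothetical names, PageLike only declares the two Playwright page methods the sketch relies on (addScriptTag and evaluate), and the path to axe.min.js depends on where axe-core is installed.

```typescript
// Minimal structural type covering the two Playwright Page methods used here,
// so the sketch does not need to import Playwright itself.
interface PageLike {
    addScriptTag(options: { path: string }): Promise<unknown>;
    evaluate<T>(pageFunction: () => T | Promise<T>): Promise<T>;
}

// Hypothetical helper: inject the axe bundle into the already-loaded page,
// then run it inside the page, where window and document are defined.
async function runAxeInPage(page: PageLike, axePath: string): Promise<unknown[]> {
    await page.addScriptTag({ path: axePath });
    return page.evaluate(async () => {
        // This callback is executed in the browser, not in Node.
        const results = await (globalThis as any).axe.run();
        return results.violations as unknown[];
    });
}
```

Inside a requestHandler this might be called as runAxeInPage(page, './node_modules/axe-core/axe.min.js'), assuming the Playwright page object satisfies PageLike. Using AxeBuilder as in the working solution above remains the simpler route.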
