How to access Actor input inside route handlers?

Hello, I am having trouble figuring out how to do the following and cannot find anything in the docs. Suppose I want to use variables provided in my input_schema, e.g.:
const {
    startUrls = [
        'https://enzosbbq.com/'
    ],
    proxyConfig = {
        useApifyProxy: true
    },
    maxRequestsPerCrawl = 100,
    navigationTimeoutSecs = 30,
} = await Actor.getInput<Input>() ?? {} as Input;
inside my route handlers, e.g.:
export const router = createPlaywrightRouter()

router.addHandler("test-handler", async (ctx) => {
    // TODO how do I access input variables here?
})
How can I access input variables inside of my route handlers?
13 Replies
correct-apricot, 15mo ago
You can pass input via userData (https://crawlee.dev/api/core/interface/RequestOptions#userData)
const input = await Actor.getInput<HasPlate>() ?? {};

const startUrls = [];

startUrls.push({
    url: 'https://www.example.com',
    userData: {
        input,
    },
});
And then get it in your handler like this: const { input } = ctx.request.userData;
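To make the userData suggestion concrete, here is a minimal sketch of attaching the Actor input to every start request. `StartRequest` and `withInput` are hypothetical names introduced for illustration; only the `url` and `userData` fields correspond to Crawlee's RequestOptions. The crawler calls are left as comments so the snippet runs without Crawlee installed.

```typescript
// Subset of Crawlee's RequestOptions used here: url + userData.
interface StartRequest<T> {
    url: string;
    userData: { input: T };
}

// Hypothetical helper: wrap plain URL strings so each request carries the input.
function withInput<T>(urls: string[], input: T): StartRequest<T>[] {
    return urls.map((url) => ({ url, userData: { input } }));
}

// Example input; in an Actor this would come from `await Actor.getInput()`.
const input = { socialsConfig: { facebook: true, instagram: false } };
const startRequests = withInput(['https://www.example.com'], input);

// In main.ts (assumed usage):   await crawler.run(startRequests);
// In a handler (assumed usage): const { input } = ctx.request.userData;
console.log(JSON.stringify(startRequests[0]));
```

As far as I know, userData is stored together with the request, so it should stay JSON-serializable, and requests enqueued later (for example with enqueueLinks) only carry it if you pass userData along again.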
correct-apricot (OP), 15mo ago
Will crawler.run accept the dict format for startUrls vs raw strings?
const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    maxRequestsPerCrawl,
    navigationTimeoutSecs,
    requestHandler: router,
});
await crawler.run(startUrls);
correct-apricot, 15mo ago
Here are the docs with all the types: https://crawlee.dev/api/playwright-crawler/class/PlaywrightCrawler#run
requests: (string | Request<Dictionary> | RequestOptions<Dictionary>)[]
correct-apricot (OP), 15mo ago
Thanks for the info. I am still having trouble. When I put a valid RequestOptions object into the start URLs array and pass it to crawler.run, request.userData is still an empty dict.
main.ts:
...
import { router } from './routes.js';
const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    maxRequestsPerCrawl,
    navigationTimeoutSecs,
    requestHandler: router,
});

let startURLs = [];

startURLs.push({
    url: 'https://www.example.com',
    userData: {
        "test": "test"
    },
});

await crawler.run(startUrls);

// Exit successfully
await Actor.exit();
routes.ts
import { Dataset, createPlaywrightRouter } from "crawlee";
export const router = createPlaywrightRouter();

router.addDefaultHandler(async ({ enqueueLinks, page, log, request }) => {

    // NOTE userData is empty dict here, even though it is provided in the startUrls object

    console.log('request.userData', request.userData)
});
Is there another way to achieve this with useState? I cannot find any examples that show how to define state in main.ts and then use it in routes.ts.

Edit: The issue was related to me importing a type defined in my main.ts file into routes.ts. I am able to get this working now. Thanks!

I am still confused about how to define state with useState in main.ts and then access it in routes.ts. This approach would actually be preferable for me, since I want to set global configuration that I can reference from state in each of my route handlers. I attempted to define the state in my main.ts file with `export const state = useState("socialsConfig:", socialsConfig)`, but how can I use this state in my route handlers in routes.ts? I tried importing state, but this caused a cyclic dependency.
correct-apricot, 15mo ago
You don't need to export it from main. Here is a working example:
// main.ts

import { useState } from 'crawlee';

const DEFAULT_STATE: { playwrightDetails: EventData[]; listingDone: boolean } = {
    playwrightDetails: [],
    listingDone: false,
};

const state = await useState(undefined, DEFAULT_STATE); // initiate state


// routes.ts

import { useState } from 'crawlee';

const { playwrightDetails } = await useState();
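For read-only configuration, an alternative that does not appear in this thread is a small module that imports neither main.ts nor routes.ts, which avoids the cyclic dependency mentioned above. The sketch below is only an illustration under that assumption: the module name config.ts and the functions `setSocialsConfig` and `getSocialsConfig` are hypothetical, and everything is shown in one file so it runs on its own (in a real project the two functions would be exported).

```typescript
// config.ts (hypothetical module): it imports neither main.ts nor routes.ts,
// so both files can import it without creating a cycle.
interface SocialsConfig {
    facebook: boolean;
    instagram: boolean;
    tiktok: boolean;
    pinterest: boolean;
}

let current: SocialsConfig | undefined;

// Called once from main.ts after reading the Actor input.
function setSocialsConfig(config: SocialsConfig): void {
    // Freeze a copy so handlers cannot mutate the shared configuration.
    current = Object.freeze({ ...config });
}

// Called from route handlers; fails loudly if main.ts has not set it yet.
function getSocialsConfig(): SocialsConfig {
    if (current === undefined) {
        throw new Error('socialsConfig was read before main.ts set it');
    }
    return current;
}

// main.ts side (assumed usage):
setSocialsConfig({ facebook: true, instagram: true, tiktok: false, pinterest: false });

// routes.ts side (assumed usage, inside a handler):
const { tiktok } = getSocialsConfig();
console.log(tiktok);
```

Unlike useState, this holder keeps nothing between runs, which seems acceptable for configuration because main.ts reads the input again on every start.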
correct-apricot (OP), 15mo ago
I copied this code verbatim, and playwrightDetails is undefined in my routes.ts
correct-apricot (OP), 15mo ago
Do you know why this may be? Here is my exact code.
main.ts
import { Actor } from 'apify';
// For more information, see https://crawlee.dev
import { PlaywrightCrawler, RequestOptions, useState } from 'crawlee';
// this is ESM project, and as such, it requires you to specify extensions in your relative imports
// read more about this here: https://nodejs.org/docs/latest-v18.x/api/esm.html#mandatory-file-extensions
// note that we need to use `.js` even when inside TS files
import { router } from './routes.js';

interface ProxyConfig {
    useApifyProxy: boolean;
    proxyUrls?: string[]
}

export interface SocialsConfig {
    facebook: boolean;
    instagram: boolean;
    tiktok: boolean;
    pinterest: boolean;
}

interface Input {
    startUrls: string[];
    maxRequestsPerCrawl: number;
    navigationTimeoutSecs: number;
    proxyConfig: ProxyConfig;
    socialsConfig: SocialsConfig;
}

// Initialize the Apify SDK
await Actor.init();

// Structure of input is defined in input_schema.json
const {
    startUrls = [
        'https://enzosbbq.com/'
    ],
    proxyConfig = {
        useApifyProxy: true,
        groups: [
            "RESIDENTIAL"
        ]
    },
    socialsConfig = {
        facebook: true,
        instagram: true,
        tiktok: true,
        pinterest: true
    },
    maxRequestsPerCrawl = 100,
    navigationTimeoutSecs = 15,
} = await Actor.getInput<Input>() ?? {} as Input;

// TODO should this be input?
const maxRequestRetries: number = 2;

const proxyConfiguration = await Actor.createProxyConfiguration(
    proxyConfig
);

const DEFAULT_STATE: { playwrightDetails: any; listingDone: boolean } = {
    playwrightDetails: [],
    listingDone: false,
};

const state = await useState(undefined, DEFAULT_STATE);

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    maxRequestsPerCrawl,
    navigationTimeoutSecs,
    requestHandler: router,
    maxRequestRetries: maxRequestRetries
});

console.log("Initiating Crawler...")
await crawler.run(startUrls);

// Exit successfully
await Actor.exit();
routes.ts
import { Dataset, createPlaywrightRouter, useState } from "crawlee";
const { playwrightDetails } = await useState();
export const router = createPlaywrightRouter();

router.addDefaultHandler(async ({ page}) => {
    // wait for page to load and all the JS to render
    await page.waitForLoadState('networkidle',);

    console.log("PlaywrightDetails")
    console.log(playwrightDetails)
});
No matter what I do, the state in routes.ts is always undefined
correct-apricot, 15mo ago
You should move const { playwrightDetails } = await useState(); into the default handler:
router.addDefaultHandler(async ({ page}) => {
    // wait for page to load and all the JS to render
    await page.waitForLoadState('networkidle',);

    const { playwrightDetails } = await useState();

    console.log("PlaywrightDetails")
    console.log(playwrightDetails)

});
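A likely reason this fixes it, offered as an interpretation rather than something stated in the thread: main.ts imports routes.js, and ES modules evaluate their imports before running their own body, so a top-level useState() in routes.ts runs before main.ts has supplied DEFAULT_STATE. The sketch below models that ordering with a simplified, synchronous, in-memory stand-in. `useStateModel` and `store` are hypothetical names, and this is not Crawlee's implementation (the real useState is asynchronous and backed by the key-value store).

```typescript
// Simplified model of a shared state where only the first call's default is used.
const store = new Map<string, Record<string, unknown>>();

function useStateModel(
    name: string,
    defaultValue: Record<string, unknown> = {},
): Record<string, unknown> {
    // Initialize the named state once; later defaults are ignored.
    if (!store.has(name)) store.set(name, defaultValue);
    return store.get(name)!;
}

// Broken order: the routes module is evaluated first, so its call wins.
const early = useStateModel('A');                      // initializes 'A' as {}
useStateModel('A', { playwrightDetails: [] });         // default arrives too late
const earlyDetails = early.playwrightDetails;          // missing in this model

// Fixed order: main initializes the state, the handler reads it afterwards.
useStateModel('B', { playwrightDetails: [] });
const lateDetails = useStateModel('B').playwrightDetails;

console.log(earlyDetails, lateDetails);
```

Calling useState inside the handler postpones the read until a request is processed, by which point main.ts has already initialized the state.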
correct-apricot (OP), 15mo ago
Thanks I got it working @Oleg V. 🙏
ambitious-aqua, 2mo ago
Why is it so hard to find this information in the Crawlee docs?
ambitious-aqua, 2mo ago
I've got it working the way I wanted.
