Anything special about .php websites?

When I try to make a request to a website URL that ends with .php, it appears the request is skipped. Is there anything peculiar I need to know about .php sites and how to reach them via Crawlee?
7 Replies
NeoNomade•2y ago
What tool from Crawlee are you using?
rare-sapphireOP•2y ago
I'm just using a headless PlaywrightCrawler instance and feeding it a single URL string (that ends with .php). If I provide another URL I get the expected behavior (the requestHandler gets called), but the .php one gets 'ignored'.
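For reference, a minimal sketch of the setup being described (the handler body and domain are placeholders, not the actual code from this thread):
import { PlaywrightCrawler } from 'crawlee';

// Hypothetical minimal reproduction of the setup described above.
const crawler = new PlaywrightCrawler({
    headless: true,
    async requestHandler({ request, log }) {
        // If this never fires for the .php URL, the request was skipped.
        log.info(`Handled ${request.loadedUrl}`);
    },
});

// A single URL string ending in .php (placeholder domain).
await crawler.run(['https://domain.com/page.php']);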
NeoNomade•2y ago
Any errors? Any other logs?
rare-sapphireOP•2y ago
No.
INFO PlaywrightCrawler: Crawl finished. Final request statistics: {"requestsFinished":0,"requestsFailed":0,"retryHistogram":[],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":null,"requestsFinishedPerMinute":0,"requestsFailedPerMinute":0,"requestTotalDurationMillis":0,"requestsTotal":0,"crawlerRuntimeMillis":21}
DEBUG PlaywrightCrawler: Terminal status message: Finished! Total 0 requests: 0 succeeded, 0 failed.
The structure of the URL is as follows: https://domain.com/page.php
ratty-blush•2y ago
It works fine with this example code and the following URL: https://www.example.com/example.php
import {
    PlaywrightCrawler,
} from 'crawlee';

const crawler = new PlaywrightCrawler({
    headless: false,

    async requestHandler({ request, response, page, log, enqueueLinks }) {
        await page.waitForTimeout(5000);

        const title = await page.title();
        log.info(`${request.loadedUrl} Title: '${title}'`);

        await enqueueLinks();
    },
});

await crawler.run([
    'https://www.example.com/example.php',
]);
INFO PlaywrightCrawler Status message: Crawled 0/1 pages, 0 errors.
INFO PlaywrightCrawler Starting the crawl
INFO PlaywrightCrawler https://www.example.com/example.php Title: 'Example Domain'
INFO PlaywrightCrawler All requests from the queue have been processed, the crawler will shut down.
INFO PlaywrightCrawler Crawl finished. Final request statistics {"requestsFinished":1,"requestsFailed":0,"retryHistogram":[1],"requestAvgFailedDurationMillis":null,"requestAvgFinishedDurationMillis":30450,"requestsFinishedPerMinute":2,"requestsFailedPerMinute":0,"requestTotalDurationMillis":30450,"requestsTotal":1,"crawlerRuntimeMillis":32699}
INFO PlaywrightCrawler Terminal status message: Finished! Total 1 requests: 1 succeeded, 0 failed.
rare-sapphireOP•2y ago
aha, thanks for the example. Adding a www resolved my issue.
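For later readers, the change that fixed it amounted to calling the crawl against the www host instead of the bare domain (placeholder domain, reusing the crawler instance from the example above):
// Before: the .php request was silently skipped.
// await crawler.run(['https://domain.com/page.php']);

// After: with www added, the requestHandler fires as expected.
await crawler.run(['https://www.domain.com/page.php']);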
