Question about Includes param on crawl

I am trying to crawl specific sections of https://www.lsu.edu using the includes param. When I do this, some of the pages I expect are returned but others are not.

For example, with this code:
crawl_url = 'https://www.lsu.edu/'
params = {
    'crawlerOptions': {
        'limit': 100,
        'maxDepth': 6,
        'includes': [
            '/cas',
            '/testing',
            '/cmda/theatre/resources/student/advising/index.php',
            '/science/student-services/advising/',
            '/registrar/academics/academic-calendars/index.php',
            '/financialaid/types_of_scholarships/academic_common_market/index.php',
            '/majors/fast-tracks.php',
            '/eng/current/advising/index.php',
            '/cce/academics/undergraduate/advising.php',
            '/agriculture/students/student-services/advisors.php',
            '/financialaid/apply_for_scholarships/'
        ]
    },
    'pageOptions': {
        'onlyMainContent': True,
        'parsePDF': True,
    }
}

crawl_result = app.crawl_url(crawl_url, params=params)

I get everything I expect for /cas, /testing and /majors/fast-tracks.php, but I don't get anything back for '/cmda/theatre/resources/student/advising/index.php'.

But if I crawl https://www.lsu.edu/cmda/theatre/resources/student/advising/index.php directly, I get back the page I am expecting.
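
To narrow down which include entries came back empty, something like the sketch below might help. It only uses the standard library and takes a plain list of returned page URLs; how to pull those URLs out of crawl_result depends on the SDK version, so that step is left out. The helper names (group_by_include, missing_includes) are made up for illustration, and the simple path-prefix check is an assumption that may not match how the crawler itself interprets includes.

```python
from urllib.parse import urlparse


def group_by_include(urls, includes):
    """Map each include entry to the returned URLs whose path starts with it.

    Assumption: a plain prefix match on the URL path. The crawler may treat
    includes differently (for example as patterns), so this is a rough check.
    """
    hits = {pattern: [] for pattern in includes}
    for url in urls:
        path = urlparse(url).path
        for pattern in includes:
            if path.startswith(pattern):
                hits[pattern].append(url)
    return hits


def missing_includes(urls, includes):
    """Return the include entries that no returned URL matched."""
    grouped = group_by_include(urls, includes)
    return [pattern for pattern, matched in grouped.items() if not matched]


if __name__ == "__main__":
    # Hypothetical returned URLs, for illustration only.
    returned = [
        "https://www.lsu.edu/cas/index.php",
        "https://www.lsu.edu/testing/",
        "https://www.lsu.edu/majors/fast-tracks.php",
    ]
    wanted = [
        "/cas",
        "/testing",
        "/majors/fast-tracks.php",
        "/cmda/theatre/resources/student/advising/index.php",
    ]
    for pattern in missing_includes(returned, wanted):
        print("no pages returned for:", pattern)
```

Running this against the real list of returned URLs would show whether only the /cmda entry is missing or whether several of the deeper paths are, which could help tell a matching problem apart from the crawl simply stopping at limit or maxDepth before reaching those pages.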
