Extract URL from HTML code
Hello all,
I would like to extract a URL from HTML code with the Apify scraper.
Here is the HTML code containing the URL to extract:
<a class="app-aware-link profile-rail-card__profile-link t-16 t-black t-bold tap-target" href="https://www.linkedin.com/in/benjaminejzenberg?miniProfileUrn=urn%3Ali%3Afs_miniProfile%3AACoAAAj58zYBTN8loEzvrFJhh-16iFZ8gnfPSGU" data-test-app-aware-link="">
<div class="single-line-truncate t-16 t-black t-bold mt2">
Voir le profil complet
</div>
</a>
Here is my input:
async function pageFunction(context) {
    const $ = context.jQuery;
    const pageTitle = $('title').first().text();
    const h1 = $('h1').first().text();
    const first_h2 = $('h2').first().text();
    const random_text_from_the_page = $('p').first().text();
    const author_profile_link = $('div.scaffold-layout.scaffold-layout--breakpoint-xl.scaffold-layout--sidebar-main-aside.scaffold-layout--reflow > div > div > div > div > div > div > div.pt3.ph3.pb4.break-words > a:nth-child(5) a[href]').text();
    context.log.info(`URL: ${context.request.url}, TITLE: ${pageTitle}`);
    await context.enqueueRequest({ url: 'http://www.example.com' });
    return { url: context.request.url, pageTitle, h1, first_h2, random_text_from_the_page, author_profile_link };
}
Thanks for your help 🙂
1 Reply
provincial-silver•3y ago
.text() returns the text you see as visual output in the browser (here "Voir le profil complet"). To get the URL you need .attr('href') instead.
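For illustration, here is a minimal sketch of the page function with that change applied. It assumes the class profile-rail-card__profile-link, copied from the HTML in the question, is enough to identify the link. That shorter selector is my assumption, not something confirmed against the live page. Note also that the original selector ends in "a:nth-child(5) a[href]", which looks for an anchor nested inside another anchor and may match nothing; targeting the anchor itself is probably what was intended.

```javascript
async function pageFunction(context) {
    const $ = context.jQuery;
    const pageTitle = $('title').first().text();

    // Assumption: this class (taken from the posted HTML) identifies the profile link.
    const profileAnchor = $('a.profile-rail-card__profile-link').first();

    // .text() would return the visible label ("Voir le profil complet");
    // .attr('href') returns the value of the href attribute instead.
    // jQuery returns undefined when nothing matches, so fall back to null.
    const author_profile_link = profileAnchor.attr('href') || null;

    context.log.info(`URL: ${context.request.url}, TITLE: ${pageTitle}`);

    return { url: context.request.url, pageTitle, author_profile_link };
}
```

The returned value is the href exactly as written in the page, so it still carries the ?miniProfileUrn=... query string. If author_profile_link comes back as null, the selector did not match and needs adjusting.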