Not skipping over URLs for unfound elements

When I'm scraping product data from product URLs, I sometimes want to check whether a tag is available and fall back to a different tag if it isn't, or the tag simply isn't found at all. In those cases I don't want the crawler to throw a full error for that one missing element and skip scraping and saving the rest of the data. How do I avoid this "skipping" by overriding or changing the crawler's default behavior? I've even tried try/catch statements and if/else statements, and nothing works.
1 Reply
sensitive-blue (OP) · 2y ago
code:
```js
lifeWithoutPlasticRouter.addHandler('LIFE_WITHOUT_PLASTIC_PRODUCT', async ({ page, request }) => {
    try {
        await page.goto(page.url(), { waitUntil: 'domcontentloaded' });

        console.log('Scraping products');

        const storeName = 'Life Without Plastic';

        const title = await page.$eval('h1.product-title', (el) => el.textContent?.trim() || '');

        let image = await page.$eval('a.product-image', (img) => img.getAttribute('href'));

        let description = await page.$$eval('div.product-description-wrapper p', (paragraphs) => {
            return paragraphs.map((p) => p.textContent?.trim()).join(' ');
        });

        let salePrice = await page.$eval('span.price-value', (el) => el.textContent?.trim() || '');
        let newTag = await page.$eval('span.price-ns', (el) => el.textContent?.trim() || '');
        let originalPrice = salePrice;

        if (newTag) {
            originalPrice = newTag;
        } else {
            return;
        }
        originalPrice = originalPrice.replace('$', '');
        originalPrice = originalPrice.replace('USD', '');

        salePrice = salePrice.replace('$', '');
        salePrice = salePrice.replace('USD', '');

        const shippingInfo = 'Free Shipping on orders above $100';
        ...
});
```
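One pattern worth noting here: Playwright's `page.$eval` rejects when the selector matches nothing, whereas `page.$` resolves to `null`, so an optional tag can be probed before reading it. A minimal sketch of that pattern, with a stub that mimics just enough of `page.$` and `ElementHandle` to run outside a browser (the stub and its contents are illustrative, not from the real site):

```javascript
// page.$ resolves to null for a missing element, unlike page.$eval,
// which rejects and, if uncaught, fails the whole request handler.
async function readOptionalText(page, selector) {
    const handle = await page.$(selector); // null when not found
    return handle ? (await handle.textContent())?.trim() : null;
}

// Minimal stand-in for a Playwright page, for demonstration only.
const demoPage = {
    nodes: { 'span.price-value': ' $49.00 USD ' },
    async $(selector) {
        const text = this.nodes[selector];
        return text === undefined ? null : { textContent: async () => text };
    },
};

(async () => {
    const salePrice = await readOptionalText(demoPage, 'span.price-value'); // '$49.00 USD'
    const newTag = await readOptionalText(demoPage, 'span.price-ns');       // null, not an error
    // Use the optional tag when present; otherwise keep going with salePrice.
    const originalPrice = newTag ?? salePrice;
    console.log(originalPrice);
})();
```

Because the missing element becomes `null` instead of a rejection, the handler can keep scraping and saving the rest of the fields.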
Especially this part: it doesn't avoid the error. Even a try/catch that tries the other tag and, on failure, just logs an error and returns doesn't work either:
```js
let salePrice = await page.$eval('span.price-value', (el) => el.textContent?.trim() || '');
let newTag = await page.$eval('span.price-ns', (el) => el.textContent?.trim() || '');
let originalPrice = salePrice;

if (newTag) {
    originalPrice = newTag;
} else {
    return;
}
```
I've tried all different combinations of error handling, but none of them avoid the built-in Crawlee error.
