stormy-gold

Scraping public records

I have a script that scrapes public records and it works fine. However, I've been trying to make it also download the PDF file that is attached to each listing, and I've run into some issues there. Clicking on a row opens a pop-up that should contain the PDF, but when I open it with Selenium the pop-up doesn't load and I'm unable to download the file. When I open it manually it comes up just fine. The website: https://officialrecords.broward.org/AcclaimWeb/search/SearchTypeDocType My code (it's a bit ugly, but I'm still in the process of writing it) is attached. I've also attached a picture of the pop-up when I click it and when Selenium clicks it.

new.py

5 Replies

HonzaS•2y ago

Can't you just catch the request with the ID/URL of the file and then download it?

stormy-goldOP•2y ago

I tried getting the URL of the file to download it, but it downloaded as a corrupt file. I can show you if you'd like.

extended-salmon•2y ago

Hey @Tragiik, is this bounty still open?

HonzaS•2y ago

I have tried with got and it works for me; I can open the file after downloading.

import fs from 'node:fs';
import { gotScraping } from 'got-scraping';
const buffer = await gotScraping('https://officialrecords.broward.org/AcclaimWeb/Image/DocumentPdfAllPages/zjP_EtaJkyiDUhvAzG_n1S9-zp_wBV-oTcZ81ttcaml9UGwtn7ON0mM_v31nrXTo').buffer();
fs.writeFileSync('test.pdf', buffer);
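
If a download still opens as corrupt, one thing worth checking is whether the saved bytes are a PDF at all, since a server can return an HTML error or login page under the same URL. PDF files begin with the marker `%PDF-`, so a rough check could look like the sketch below. The helper name `looksLikePdf` is made up for illustration and is not part of got or Selenium.

```javascript
// Hypothetical helper (not from any library): returns true when the data
// starts with the PDF magic number "%PDF-".
function looksLikePdf(bytes) {
  // Read only the first five bytes and compare them to the PDF marker.
  const header = Buffer.from(bytes).subarray(0, 5).toString('latin1');
  return header === '%PDF-';
}

// Sample data for illustration, not real downloads.
console.log(looksLikePdf(Buffer.from('%PDF-1.7\n...')));      // true
console.log(looksLikePdf(Buffer.from('<!DOCTYPE html>...'))); // false
```

Calling it on the downloaded buffer before `fs.writeFileSync` would show whether the problem is in the response itself rather than in how the file is written.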

I have tried catching the request to get the URL and then downloading the file, and it works very well. I am not familiar with Selenium, but it should have this possibility too. Or you can use JavaScript and Playwright.

page.on('popup', async (popup) => {
    // Watch every response the pop-up receives and pick out the PDF request.
    popup.on('response', async (response) => {
        const url = response.url();
        if (url.includes('https://officialrecords.broward.org/AcclaimWeb/Image/DocumentPdfAllPages')) {
            // Fetch the PDF again by its URL and save it to disk.
            const buffer = await gotScraping(url).buffer();
            fs.writeFileSync('test.pdf', buffer);
        }
    });
});
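
The handler above writes every file to `test.pdf`, so when scraping many listings each download would overwrite the previous one. A possible way around that, sketched here under the assumption that the last path segment of the PDF URL is unique per document, is to derive the file name from the URL. `isDocumentPdfUrl` and `pdfFileNameFor` are hypothetical helper names, and the short ID in the example is invented.

```javascript
// URL prefix taken from the handler above; the helpers below are illustrative.
const PDF_PREFIX = 'https://officialrecords.broward.org/AcclaimWeb/Image/DocumentPdfAllPages';

// Hypothetical helper: true when a response URL points at a document PDF.
function isDocumentPdfUrl(url) {
  return url.startsWith(PDF_PREFIX);
}

// Hypothetical helper: build a file name from the last path segment of the URL.
// Assumes that segment identifies the document, which the site does not promise.
function pdfFileNameFor(url) {
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  const id = segments[segments.length - 1];
  // Replace anything other than letters, digits, "_" and "-" so the name is file-system safe.
  return `${id.replace(/[^A-Za-z0-9_-]/g, '_')}.pdf`;
}

// Invented example ID, for illustration only.
const exampleUrl = `${PDF_PREFIX}/abc-123_XYZ`;
console.log(isDocumentPdfUrl(exampleUrl)); // true
console.log(pdfFileNameFor(exampleUrl));   // abc-123_XYZ.pdf
```

Inside the response handler, `fs.writeFileSync(pdfFileNameFor(url), buffer)` could then replace the fixed `'test.pdf'` name.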
