Crawlee & Apify•2y ago
stormy-gold

Scraping public records

I have a script that scrapes public records and it works fine. However, I've been trying to make it also download the PDF file that is attached to each listing, and I've run into some issues. Clicking on a row opens a pop-up that should contain the PDF. When I open it with Selenium, the pop-up doesn't load and I'm unable to download the file, but when I open it manually it comes up just fine. The website: https://officialrecords.broward.org/AcclaimWeb/search/SearchTypeDocType My code (it's a bit ugly, but I'm still in the process of writing it) is attached. I've also attached a picture of the pop-up when I click it and when Selenium clicks it.
5 Replies
HonzaS•2y ago
Can't you just catch the request with the ID/URL of the file and then download it?
stormy-goldOP•2y ago
I tried getting the URL of the file to download it, but the downloaded file was corrupt. I can show you if you'd like.
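One quick way to narrow down a "corrupt" download like this is to check whether the saved bytes are a PDF at all. PDF files begin with the signature `%PDF-`, so if the file starts with something else (an HTML error or login page, for example), the server returned a different response than the document. The helper below is only a sketch: `looksLikePdf` is a hypothetical name, not part of any library, and it inspects the header only, so it cannot detect a truncated file.

```javascript
// Hypothetical helper (not from any library): returns true when the
// bytes start with the PDF signature "%PDF-".
function looksLikePdf(buffer) {
  return buffer.length >= 5 && buffer.subarray(0, 5).toString('latin1') === '%PDF-';
}

// A real PDF header passes; an HTML page saved with a .pdf name does not.
console.log(looksLikePdf(Buffer.from('%PDF-1.7\n')));             // true
console.log(looksLikePdf(Buffer.from('<!DOCTYPE html><html>')));  // false
```

Running the same check on the buffer before writing it to disk would show whether the problem is in the download step or in how the file is saved.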
extended-salmon•2y ago
Hey @Tragiik, is this bounty still open?
HonzaS•2y ago
I tried it with got (via gotScraping) and it works for me. I can open the file after downloading.
import fs from 'node:fs';
import { gotScraping } from 'got-scraping';
// Download the PDF as a Buffer and write it to disk unchanged.
const buffer = await gotScraping('https://officialrecords.broward.org/AcclaimWeb/Image/DocumentPdfAllPages/zjP_EtaJkyiDUhvAzG_n1S9-zp_wBV-oTcZ81ttcaml9UGwtn7ON0mM_v31nrXTo').buffer();
fs.writeFileSync('test.pdf', buffer);
I have also tried catching the request to get the URL and then downloading the file, and it works very well. I am not familiar with Selenium, but it should have this possibility too. Or you can use JavaScript and Playwright.
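// Assumes `page` is a Playwright Page and that `fs` and `gotScraping` are already in scope.
page.on('popup', (popup) => {
  // Watch every response in the pop-up for the PDF document URL.
  popup.on('response', async (response) => {
    const url = response.url();
    if (url.includes('https://officialrecords.broward.org/AcclaimWeb/Image/DocumentPdfAllPages')) {
      // Download the PDF again with got-scraping and save it.
      const buffer = await gotScraping(url).buffer();
      fs.writeFileSync('test.pdf', buffer);
    }
  });
});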
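Since the snippet writes every match to `test.pdf`, scraping many listings would overwrite the same file each time. A minimal sketch of one way around that is to derive the file name from the caught URL. It assumes the path segment after `DocumentPdfAllPages/` is a per-document ID, which the thread does not confirm, and `pdfFileNameFromUrl` is a hypothetical helper name.

```javascript
// Path prefix of the PDF endpoint, taken from the URL in the thread.
const PDF_PATH = '/AcclaimWeb/Image/DocumentPdfAllPages/';

// Hypothetical helper: returns "<id>.pdf" for a document URL,
// or null for any other URL.
// Assumption: the segment after PDF_PATH identifies one document.
function pdfFileNameFromUrl(url) {
  const { pathname } = new URL(url);
  if (!pathname.startsWith(PDF_PATH)) return null;
  const id = pathname.slice(PDF_PATH.length).split('/')[0];
  return id ? `${id}.pdf` : null;
}
```

Inside the response handler, the returned name could replace the hard-coded `'test.pdf'`, and a `null` result could be used to skip responses that are not the document.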