application/octet-stream in cheerio
I'm trying to scrape a second page in a working scraper, but this page returns its response as "application/octet-stream". Is there something I can do to fix this, or should I switch to Puppeteer/Playwright? It looks about the same to me, since the page is fully static.
Here is the error message:
Thank you so much
15 Replies
Change the headers to match the request your browser sends, especially the content type if it's application/json.
Hello @Lamp, can you share a link to such a website? Generally, application/octet-stream is used when the page provides some data to download.
rival-blackOP•2y ago
Well, actually it's a website that I downloaded via wget, so I could test my stuff locally before doing it with the real site.
Now that you mention it, I noticed that in the network requests I get text/html. Dunno why wget changes it.
Thank you for the help, I'll look for another way to download the site
So you are browsing/scraping the website from your filesystem?
rival-blackOP•2y ago
yep http://127.0.0.1/
I usually follow this process of first downloading a portion and then launching on the actual thing
With tools like HTTrack or just wget
it generally works! :3
How do you serve the content? I don't think this is wget related; the content type is more webserver related. Does the page have a proper extension like .html?
rival-blackOP•2y ago
nop, the website spits out in text/html

rival-blackOP•2y ago
maybe wget just started digging into some other stuff
I probably have to just refine the options
By website, do you mean the original website or your local one? The content type is not saved in the file; the webserver provides it in the HTTP response if it is not specified at the application level.
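To illustrate the point above, here is a minimal sketch of a local static server that picks the Content-Type from the file extension and treats extensionless files as HTML. The thread doesn't say how the local mirror is actually served, so this is only one possible setup; the names `contentTypeFor` and `createMirrorServer` and the `./mirror` directory are made up for the example.

```javascript
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');

// Extension -> Content-Type map for the file kinds a mirrored site usually has.
const MIME_TYPES = {
  '.html': 'text/html; charset=utf-8',
  '.htm': 'text/html; charset=utf-8',
  '.css': 'text/css; charset=utf-8',
  '.js': 'text/javascript; charset=utf-8',
  '.json': 'application/json; charset=utf-8',
  '.png': 'image/png',
  '.jpg': 'image/jpeg',
  '.svg': 'image/svg+xml',
};

// Assumption: a file saved without an extension is an HTML page.
// Unknown extensions still fall back to application/octet-stream.
function contentTypeFor(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  if (ext === '') return 'text/html; charset=utf-8';
  return MIME_TYPES[ext] || 'application/octet-stream';
}

// Builds (but does not start) a server for the files under rootDir.
function createMirrorServer(rootDir) {
  const root = path.resolve(rootDir);
  return http.createServer((req, res) => {
    let urlPath;
    try {
      urlPath = decodeURIComponent(new URL(req.url, 'http://localhost').pathname);
    } catch {
      res.writeHead(400, { 'Content-Type': 'text/plain' });
      res.end('Bad request');
      return;
    }
    let filePath = path.join(root, urlPath);
    // Refuse anything that resolves outside the mirror directory.
    if (filePath !== root && !filePath.startsWith(root + path.sep)) {
      res.writeHead(403, { 'Content-Type': 'text/plain' });
      res.end('Forbidden');
      return;
    }
    fs.stat(filePath, (statErr, stats) => {
      // Directory requests are answered with their index.html.
      if (!statErr && stats.isDirectory()) filePath = path.join(filePath, 'index.html');
      fs.readFile(filePath, (readErr, body) => {
        if (readErr) {
          res.writeHead(404, { 'Content-Type': 'text/plain' });
          res.end('Not found');
          return;
        }
        res.writeHead(200, { 'Content-Type': contentTypeFor(filePath) });
        res.end(body);
      });
    });
  });
}
```

Calling `createMirrorServer('./mirror').listen(8080)` would then serve the downloaded copy on `http://127.0.0.1:8080/`, with extensionless pages sent as text/html instead of application/octet-stream.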
rival-blackOP•2y ago
this was the original one
let me check on the 127 one
but yeah, tools like HTTrack may be smart enough to save everything with a .html filename extension, so in the end it might solve your issue
rival-blackOP•2y ago
no content type on the local one

rival-blackOP•2y ago
ye generally that works perfectly. Tho only httrack could get past the restriction of this website
rival-blackOP•2y ago
this one is another website, that I could get with htt
