wise-white

Seeking Help to Access Safari Reader View Mode HTML Code

I'm currently working on a project that requires accessing the HTML code generated by Safari's Reader View mode. This mode simplifies the webpage content, making it cleaner and easier to parse. I understand that the reader view mode content appears after clicking on the Reader Mode button. I'm curious to know if there are any tools or methods within the Apify ecosystem that could assist me in obtaining the HTML code from Safari's Reader View mode. Any insights or suggestions on how to accomplish this would be greatly appreciated!
11 Replies
automatic-azure · 2y ago
I'd be surprised if Safari offered an API or any sort of programmatic access to that, but I'll let the wizards chime in on that because I could be wrong.
automatic-azure · 2y ago
As a potential alternative, Mozilla does offer Readability: https://github.com/mozilla/readability
automatic-azure · 2y ago
You may be able to feed your target document into it and parse from there.
automatic-azure · 2y ago
Since you're asking about Safari's Reader mode specifically, I assume you're doing legit in-chrome / non-headless scraping, but the Readability README does note how to parse in Node.js, if that's of any use: https://github.com/mozilla/readability#nodejs-usage
wise-white (OP) · 2y ago
Hello @shovelandsandbox, thank you for your quick response and for suggesting Mozilla's Readability library. I have already used both Readability and newspaper3k for similar tasks. However, I've found that neither is as reliable as Safari's Reader Mode at consistently producing clean, simplified HTML, which is why I'm particularly interested in tapping into Safari's Reader View functionality. Unfortunately, I couldn't find any existing Python solution that replicates it. I've even tried using Selenium to activate Reader Mode in Safari, but to no avail. Do you know of any Apify actors that might be able to accomplish this? Any advice or direction would be greatly appreciated!
automatic-azure · 2y ago
Hmm, I see, and that doesn't surprise me re: Safari Reader producing cleaner results more reliably. @logical mirror what exactly are you using for scraping in your actor?
wise-white (OP) · 2y ago
@shovelandsandbox News articles
automatic-azure · 2y ago
@logical mirror I mean: Playwright, Puppeteer, etc.
wise-white (OP) · 2y ago
@shovelandsandbox So far, I haven't used any actors, as the manual/logical scraping isn't the main challenge. The key issue is finding a generic way to extract information that's applicable to all articles. The question stands: does Apify have a solution for this?
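For readers wondering what a "generic" extractor looks like in practice, here is a deliberately tiny sketch of the kind of heuristic that reader-style tools build on: collect paragraphs, skip obvious page chrome, and drop blocks that are too short or are mostly link text. This is not Safari's algorithm and not Readability's or newspaper3k's actual implementation. The names `ParagraphCollector` and `extract_main_text`, the skipped tag list, and both thresholds are assumptions made up for illustration.

```python
from html.parser import HTMLParser

# Assumption: containers that usually hold page chrome rather than article text.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class ParagraphCollector(HTMLParser):
    """Collects the text of each <p> and counts how much of it sits inside links."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # list of (text, link_chars) tuples
        self._skip_depth = 0  # > 0 while inside a SKIP_TAGS container
        self._in_p = False
        self._link_depth = 0
        self._buf = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "p" and self._skip_depth == 0:
            self.flush()  # an unclosed <p> ends where the next one starts
            self._in_p = True
        elif tag == "a" and self._in_p:
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "p":
            self.flush()
        elif tag == "a" and self._link_depth > 0:
            self._link_depth -= 1

    def handle_data(self, data):
        if self._in_p and self._skip_depth == 0:
            self._buf.append(data)
            if self._link_depth:
                self._link_chars += len(data.strip())

    def flush(self):
        """Store the paragraph gathered so far (if any) and reset the state."""
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if text:
            self.paragraphs.append((text, self._link_chars))
        self._buf, self._link_chars = [], 0
        self._in_p, self._link_depth = False, 0


def extract_main_text(html, min_chars=40, max_link_ratio=0.5):
    """Return paragraphs that look like article body text, joined by blank lines.

    Both thresholds are illustrative guesses, not tuned values.
    """
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # pick up a trailing paragraph that was never closed
    kept = []
    for text, link_chars in parser.paragraphs:
        if len(text) < min_chars:
            continue  # too short: likely a button label or caption
        if link_chars / len(text) > max_link_ratio:
            continue  # mostly link text: likely navigation or "related" teasers
        kept.append(text)
    return "\n\n".join(kept)
```

Calling `extract_main_text(page_html)` on a fetched article returns plain text rather than cleaned HTML, and a heuristic this simple will misfire on many real pages. It only shows why results differ so much between tools: each one makes different choices about what to skip and what to keep.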
automatic-azure · 2y ago
@logical mirror There may be something relevant in the marketplace, but I'm assuming you've already checked through everything there.
