wise-white
Seeking Help to Access Safari Reader View Mode HTML Code
I'm working on a project that needs the HTML generated by Safari's Reader View. Reader View strips a page down to its main content, which makes it much cleaner and easier to parse. As far as I understand, that simplified content only appears after clicking the Reader View button.
Are there any tools or methods in the Apify ecosystem that could help me get the HTML from Safari's Reader View? Any insights or suggestions on how to do this would be greatly appreciated!
automatic-azure•2y ago
I'd be surprised if Safari offered an API or any other programmatic access to that, but I could be wrong, so I'll let the wizards chime in.
automatic-azure•2y ago
As a potential alternative, Mozilla does offer Readability: https://github.com/mozilla/readability
automatic-azure•2y ago
You might be able to feed your target document into it and parse the result from there.
automatic-azure•2y ago
Since you're asking about Safari's Reader mode specifically, I assume you're scraping in a real, headed (non-headless) browser, but the Readability README does describe how to parse documents in Node.js, if that's of any use: https://github.com/mozilla/readability#nodejs-usage
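For anyone reading along in Python, as the OP is: here is a deliberately tiny, stdlib-only sketch of the core idea behind readability-style extractors. It collects the text of every `<p>`, groups it by the element that contains it, and keeps the container with the most paragraph text. This is *not* Mozilla's Readability algorithm or Safari's Reader logic, both of which weigh many more signals; the names `ParagraphCollector`, `extract_main_text` and the `SKIP_TAGS` list are made up for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Assumption: a small hand-picked set of tags treated as page chrome, not content.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
# Void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ParagraphCollector(HTMLParser):
    """Records the text of each <p> together with the element that contains it."""

    def __init__(self):
        super().__init__()
        self.stack = []                 # open elements as (tag, unique_id)
        self.next_id = 0
        self.texts = defaultdict(list)  # container id -> paragraph texts
        self.current = None             # text chunks of the <p> being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "p":
            self.current = []
        self.stack.append((tag, self.next_id))
        self.next_id += 1

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise pop back to the matching open tag.
        if not any(t == tag for t, _ in self.stack):
            return
        while self.stack:
            open_tag, _ = self.stack.pop()
            if open_tag == tag:
                break
        if tag == "p" and self.current is not None:
            text = " ".join("".join(self.current).split())  # collapse whitespace
            parent = self.stack[-1][1] if self.stack else -1
            if text and not self._inside_skipped():
                self.texts[parent].append(text)
            self.current = None

    def handle_data(self, data):
        if self.current is not None and not self._inside_skipped():
            self.current.append(data)

    def _inside_skipped(self):
        return any(t in SKIP_TAGS for t, _ in self.stack)


def extract_main_text(page_html):
    """Return the paragraphs of the container holding the most paragraph text."""
    parser = ParagraphCollector()
    parser.feed(page_html)
    parser.close()
    if not parser.texts:
        return []
    return max(parser.texts.values(), key=lambda ps: sum(len(p) for p in ps))
```

On a page with a nav bar, a short sidebar blurb and an `<article>` holding two paragraphs, this returns the two article paragraphs. Real news pages break it quickly (unclosed `<p>` tags, body text split across sibling containers), which is a big part of why purpose-built libraries, and Safari's own Reader, do so much more.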
wise-whiteOP•2y ago
Hello @shovelandsandbox,
Thank you for your quick response and suggestion about Mozilla's readability library. I appreciate your input.
Actually, I've already used the readability and newspaper3k libraries for similar tasks. However, I've found that neither is as reliable as Safari's Reader Mode at consistently producing clean, simplified HTML. That's why I'm specifically interested in using Safari's Reader View itself.
Unfortunately, I couldn't find any existing Python solution that replicates Safari's Reader View. I've even tried using Selenium to activate Reader Mode in Safari, but to no avail.
Do you know of any Apify actors that might be able to accomplish this? Any advice or direction in this matter would be greatly appreciated!
automatic-azure•2y ago
Hmm, I see, and it doesn't surprise me that Safari's Reader produces cleaner results more reliably.
@logical mirror what exactly are you using for scraping in your actor?
wise-whiteOP•2y ago
@shovelandsandbox News articles
automatic-azure•2y ago
@logical mirror I mean the tooling: Playwright, Puppeteer, etc.
wise-whiteOP•2y ago
@shovelandsandbox So far, I haven't used any actors, as the manual/logical scraping isn't the main challenge. The key issue is finding a generic way to extract information that's applicable to all articles. The question stands: does Apify have a solution for this?
automatic-azure•2y ago
@logical mirror there may be something relevant in the marketplace, but I'm assuming you've already checked everything there.