I am making a news website. It should fetch headlines hourly, and the user should be able to choose very precisely which news is fetched. However, I keep running into problems that force me to make compromises.
v1: The first thing I tried was scraping articles directly from the news sources, but the websites blocked my requests.
v2: Then I tried the free API from NewsAPI. The news is 24h old, but it worked. Now I needed a backend to automate the process.
v3: I used Django for the backend and hosted the site on https://render.com. I hosted the DB on https://www.cockroachlabs.com/ for 10 GB of DB storage, and used GitHub Actions to run the automated fetch requests. Great, but the news is still at least 24h old, the articles only include a description, and the service is painfully slow.
v4: Then I tried RSS feeds, which are up to date, but again don't include the full articles.
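For reference, the RSS step from v4 looks roughly like the sketch below. It is a minimal illustration using only the Python standard library and assumes a plain RSS 2.0 feed. The function names, the User-Agent string, and the simple keyword filter are placeholders for illustration, not my actual code.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Return a list of dicts (title, link, description) from RSS 2.0 XML."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when the element is missing
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


def matches(item, keywords):
    """True if any keyword occurs in the title or description (case-insensitive)."""
    text = (item["title"] + " " + item["description"]).lower()
    return any(k.lower() in text for k in keywords)


def fetch_headlines(feed_url, keywords, timeout=10):
    """Download one feed and keep only the items matching the user's keywords."""
    # Hypothetical User-Agent; some servers reject requests without one.
    req = urllib.request.Request(
        feed_url, headers={"User-Agent": "news-fetcher-sketch/0.1"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        xml_text = resp.read()
    return [item for item in parse_rss(xml_text) if matches(item, keywords)]
```

Calling `fetch_headlines(feed_url, ["climate", "energy"])` once an hour from a scheduled job would give the filtered headlines, but as noted, each item only carries a title, a link, and a short description.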
Today I got an e-mail from Cockroach Labs. It turns out they are not free after all, and they will delete my DB in 2 weeks.
So I am not allowed to scrape (is it even illegal?), and APIs and RSS feeds are very limited in content and timeliness. In terms of hosting, I don't seem to have any free options that support Python, a database, and scheduled workflows.
Could somebody point me in the right direction, or tell me this is impossible?