How to save network requests made by the webpage I am scraping?
Hello, the scraping that I'm trying to do is not of actual content on the page, but rather of network requests. I don't need anything too fancy. I basically just want to dump everything one would see in the Network tab of the browser's Inspect tool. I tried searching through the docs, but "request" gives back a lot of unrelated stuff since that word is pretty central to how Crawlee works :).
If there is another tool that would be more appropriate for this, please let me know. I still need to be able to deal with JS-heavy pages, and I still need to be able to follow links. It's just that the end product I need is requests, not page elements.
7 Replies
sunny-green•10mo ago
Hi, you can use the Python `requests` module.
jolly-crimsonOP•10mo ago
Hmm, I'm not sure I understand. These webpages are still JavaScript-heavy, so if I just do `requests.get(url)`, I'm not actually going to get the results. I still need an actual headless browser solution like Crawlee to open up the webpage. I then need to dump all of the requests that happen inside of that headless browser into an output file of some sort.
sunny-green•10mo ago
So do you think it's possible to combine Crawlee and Python?
jolly-crimsonOP•10mo ago
If it's possible I'd love some guidance because I don't know where to start. I'm not sure which Crawlee-generated object I would pass to which function in `requests`.
yelping-magenta•2w ago
Hi, have you found a solution? I also want to capture network traffic.
like-gold•2w ago
Hey @uandsaeed
To capture network traffic, you can use Playwright with the `record_har_path` parameter.
Browser | Playwright Python
A Browser is created via browser_type.launch(). An example of using a [Browser] to create a [Page]:
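For anyone landing here later, the `record_har_path` suggestion could look roughly like the sketch below. It assumes plain Playwright for Python (sync API) rather than Crawlee, and the function names `record_har` and `list_har_requests` and the file name `network.har` are made up for illustration. `record_har_path` is passed to `browser.new_context()`, and the HAR file is only written once the context is closed.

```python
import json
from pathlib import Path


def record_har(url: str, har_path: str = "network.har") -> None:
    """Open `url` in headless Chromium and save its network traffic as a HAR file."""
    # Imported lazily so the HAR-reading helper below works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        # Every page opened in this context is recorded into the HAR file.
        context = browser.new_context(record_har_path=har_path)
        page = context.new_page()
        page.goto(url, wait_until="networkidle")
        # The HAR file is written to disk when the context closes.
        context.close()
        browser.close()


def list_har_requests(har: dict) -> list[tuple[str, str, int]]:
    """Return (method, url, status) for every entry of a parsed HAR document."""
    return [
        (entry["request"]["method"], entry["request"]["url"], entry["response"]["status"])
        for entry in har["log"]["entries"]
    ]


def load_har(har_path: str = "network.har") -> dict:
    """Read a HAR file (it is plain JSON) from disk."""
    return json.loads(Path(har_path).read_text(encoding="utf-8"))
```

Calling `record_har("https://example.com")` and then `list_har_requests(load_har())` should give roughly the same list you would see in the Network tab. The URL is a placeholder.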
stormy-gold•2w ago
Hi, I think the following guides might be helpful (even though they are for JS):
- https://docs.apify.com/academy/puppeteer-playwright/reading-intercepting-requests
- https://docs.apify.com/academy/node-js/using-proxy-to-intercept-requests-puppeteer
You can do the same thing, but in Python. There are also more similar guides (unfortunately, most are for JS). Try to read through them and find what you need.
IV - Reading & intercepting requests | Academy | Apify Documentation
You can use DevTools, but did you know that you can do all the same stuff (plus more) programmatically? Read and intercept requests in Puppeteer/Playwright.
Using man-in-the-middle proxy to intercept requests in Puppeteer | ...
This article demonstrates how to set up a reliable interception of HTTP requests in headless Chrome / Puppeteer using a local proxy.
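A rough Python version of the "reading requests" idea from those guides might look like the sketch below. It assumes plain Playwright for Python (sync API), and the names `request_to_record`, `dump_requests` and `requests.jsonl` are hypothetical. The listener is attached with `page.on("request", ...)` before `page.goto()` so that the earliest requests are not missed. Wiring this into a Crawlee crawler that follows links is not shown here.

```python
import json
from typing import Any


def request_to_record(request: Any) -> dict:
    """Pick the fields of a Playwright Request that the Network tab also shows."""
    return {
        "method": request.method,
        "url": request.url,
        "resource_type": request.resource_type,
        "headers": dict(request.headers),
    }


def dump_requests(url: str, out_path: str = "requests.jsonl") -> int:
    """Load `url` in headless Chromium and write one JSON line per request made."""
    # Imported lazily so request_to_record can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    records: list[dict] = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Attach the listener before navigating so the first requests are captured.
        page.on("request", lambda req: records.append(request_to_record(req)))
        page.goto(url, wait_until="networkidle")
        browser.close()

    with open(out_path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return len(records)
```

Swapping `"request"` for `"response"` in `page.on(...)` gives access to status codes and response headers in the same way.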