How to send a POST request (I'm doing reverse engineering)
I'm conducting reverse engineering and have discovered a link that retrieves all the data I need using the POST method. I've copied the request as cURL to analyze the parameters required for making the correct request.
I've modified the parameters to make the request using the POST method. I've successfully tested this using httpx, but now I want to implement it using the Crawlee framework.
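Roughly, the working httpx call looks like this (the URL and parameters below are placeholders, not the real ones from the cURL copy):
```python
import httpx

# Placeholder URL and parameters; the real ones come from the copied cURL request.
url = 'https://example.com/api/search'
params = {'page': 1, 'genreId': 12}

response = httpx.post(url, params=params, json={'filter': 'latest'})
response.raise_for_status()
print(response.json())
```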
How can I change the method used by the HTTP client to retrieve the data, and how can I pass the modified parameters I've prepared?
Additionally, if anyone has experience, I'd appreciate any insights on handling POST requests within this framework.
Thanks
helpful-purple•8mo ago
Hey @frankman
Here's an example in the documentation - https://crawlee.dev/python/docs/examples/fill-and-submit-web-form
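That example builds a Request with an explicit HTTP method and body and hands it to the crawler. A rough sketch along those lines (import paths vary between crawlee versions, and httpbin is used here just for illustration):
```python
import asyncio
import json

# Import paths differ between crawlee versions; newer releases expose the
# crawlers under `crawlee.crawlers` instead of `crawlee.http_crawler`.
from crawlee.http_crawler import HttpCrawler, HttpCrawlingContext
from crawlee import Request


async def main() -> None:
    crawler = HttpCrawler()

    @crawler.router.default_handler
    async def request_handler(context: HttpCrawlingContext) -> None:
        # Depending on the crawlee version, read() may need to be awaited.
        body = context.http_response.read().decode('utf-8')
        context.log.info(f'Response: {body}')

    # httpbin.org/post just echoes the request back, which makes it easy to
    # verify that the method and body are what you intended to send.
    request = Request.from_url(
        url='https://httpbin.org/post',
        method='POST',
        payload=json.dumps({'foo': 'bar'}).encode(),
    )
    await crawler.run([request])


if __name__ == '__main__':
    asyncio.run(main())
```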
conscious-sapphireOP•7mo ago
Hi, the above answer doesn't work for me. I have found this open issue and maybe it is related, because I'm trying to do a POST request and I'm not getting any data.
https://github.com/apify/crawlee-python/issues/560
I'm doing this:
Here's how I'm adding the request:
Here's the response when I try to save the JSON:
GitHub: Unable to execute POST request with JSON payload · Issue #560 · apify/crawlee-python
conscious-sapphireOP•7mo ago
The issue also is related with this PR: https://github.com/apify/crawlee-python/pull/542
I'm adding this URL to follow the issue. I'm interested in helping because I use Crawlee and Apify a lot.
GitHub: fix!: merge payload and data fields of Request by vdusek · Pull Request #542 · apify/crawlee-python
From the PR description: the Request model had both data and payload fields, but payload was not being provided to the HTTP clients, only the data field; this PR merges them together.
helpful-purple•7mo ago
Hey, @frankman
Yes, I created issue 560 🙂
About your URL: I don't see any payload in it. That is, you pass all the parameters as query parameters in the URL, not in the body of the POST request.
Are you sure you are creating it correctly?
Are you doing the same thing using HTTPX?
If you look at how the site sees it using httpbin.org/post, you'll get this response format.
This is completely correct for your example: all parameters are in args.
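For example, with httpx you can see the difference between query parameters and an actual request body (httpbin echoes query parameters under args and a JSON body under json):
```python
import httpx

# Query-string parameters: httpbin reports them under "args".
r1 = httpx.post('https://httpbin.org/post', params={'genreId': 12})
print(r1.json()['args'])   # {'genreId': '12'}

# JSON body: httpbin reports it under "json" ("form" for form-encoded data).
r2 = httpx.post('https://httpbin.org/post', json={'genreId': 12})
print(r2.json()['json'])   # {'genreId': 12}
```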
You'll also see an error in your URL 🙂 You forgot the & before the genreId parameter.
The correct URL should be
conscious-sapphireOP•7mo ago
Sorry, I had deleted the domain name and some parameters, so there was a mistake and you analyzed it based on that. I will put the original link so you can check it again.
Continues 🧵
The output was:
If I do the same but only with httpx:
helpful-purple•7mo ago
Hi. All the code works correctly.
The problem is in what you are doing.
1. context.request.model_dump_json(): as you can see, it outputs the Request metadata, which does not include the server response.
As a result, you are comparing the request metadata from crawlee with the server response in httpx...
2. I don't really understand why you need BeautifulSoupCrawler when working with json. I think it would be more appropriate to use ParselCrawler or HttpCrawler with a convenient library for working with json.
Here is sample code that will do what you expect it to do.
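A minimal sketch of that approach, reading the response body and parsing it as JSON (the endpoint and payload below are placeholders, and import paths vary between crawlee versions):
```python
import asyncio
import json

from crawlee import Request
from crawlee.http_crawler import HttpCrawler, HttpCrawlingContext  # `crawlee.crawlers` in newer versions


async def main() -> None:
    crawler = HttpCrawler()

    @crawler.router.default_handler
    async def handler(context: HttpCrawlingContext) -> None:
        # context.request only holds the request metadata; the server's answer
        # is in context.http_response (read() may need `await` in newer versions).
        data = json.loads(context.http_response.read())
        # Save the parsed JSON into the default dataset.
        await context.push_data(data)

    await crawler.run([
        Request.from_url(
            url='https://httpbin.org/post',  # placeholder endpoint
            method='POST',
            payload=json.dumps({'page': 1, 'genreId': 12}).encode(),  # hypothetical body
        )
    ])


asyncio.run(main())
```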
conscious-sapphireOP•7mo ago
You're right Mantisus, now I'm using HttpCrawler() and I'm getting the data I want:
This code does what I want:
Thanks Mantisus