Hey all, with the introduction of the new /map endpoint, we need to list the returned URLs and then have the user click to select the URLs they want crawled/scraped.
Sending one /scrape request per URL seems inefficient on both sides and will blow through our rate limit.
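If the selected URLs have to go through /scrape one at a time, one way to stay under the rate limit is to deduplicate the selection and pace the calls on the client. A minimal Python sketch, assuming a hypothetical scrape_one function that sends a single /scrape request and a limit expressed in requests per minute (substitute whatever limit applies to your account):

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def scrape_selected(
    urls: Iterable[str],
    scrape_one: Callable[[str], T],
    max_per_minute: int,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[T]:
    """Scrape the user's selection one URL at a time, starting calls at
    least 60 / max_per_minute seconds apart.

    scrape_one is a hypothetical stand-in for whatever code sends a single
    /scrape request; sleep and clock are injectable so the pacing can be
    tested without real waiting.
    """
    if max_per_minute <= 0:
        raise ValueError("max_per_minute must be positive")
    interval = 60.0 / max_per_minute  # minimum gap between call starts, in seconds
    results: List[T] = []
    next_allowed = clock()
    # dict.fromkeys drops duplicate selections while keeping the user's order.
    for url in dict.fromkeys(urls):
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)
        next_allowed = clock() + interval  # earliest start for the next call
        results.append(scrape_one(url))
    return results
```

Pacing only spreads the requests out; it does not reduce how many are sent, so scraping N selected URLs still costs N requests.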
It seems like includedPaths does not limit the /crawl endpoint to the included paths; pages outside them get scraped too. ...
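To check whether a crawl stayed inside the include list, the returned URLs can be filtered on the client. A minimal sketch, assuming the include entries are regular expressions matched against the URL path; that matching rule is an assumption for illustration, not a description of how /crawl actually applies includedPaths:

```python
import re
from typing import Iterable, List
from urllib.parse import urlparse


def filter_by_paths(urls: Iterable[str], include_patterns: Iterable[str]) -> List[str]:
    """Keep only URLs whose path matches at least one include pattern.

    Patterns are treated as regular expressions searched against the URL
    path (an assumption for this sketch).
    """
    compiled = [re.compile(p) for p in include_patterns]
    kept = []
    for url in urls:
        path = urlparse(url).path or "/"  # treat an empty path as the site root
        if any(rx.search(path) for rx in compiled):
            kept.append(url)
    return kept
```

Anything in the crawl results that filter_by_paths drops is a page that was scraped outside the include list, which gives a concrete example to attach to a bug report.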
https://helpme.freshnrebel.com/hc/sitemap.xml
I'm trying to map all pages with the English locale in this sitemap; however, when using the /map endpoint with url = https://helpme.freshnrebel.com/hc/en-gb, it only returns 80-ish results. ...
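As a cross-check, the English pages can be counted straight from the sitemap. A minimal sketch, assuming the sitemap is a standard sitemaps.org urlset file that you have already downloaded, and that English pages are exactly those whose URL starts with the en-gb prefix:

```python
import xml.etree.ElementTree as ET
from typing import List, Union

# Namespace defined by the sitemaps.org protocol for <urlset> files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def locale_urls(sitemap_xml: Union[str, bytes], locale_prefix: str) -> List[str]:
    """Return the <loc> URLs of a <urlset> sitemap that start with locale_prefix.

    A <sitemapindex> file (a sitemap of sitemaps) is not expanded here; each
    child sitemap would need to be fetched and passed in separately.
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        # Strip whitespace that some generators leave inside <loc>.
        if loc.text and loc.text.strip().startswith(locale_prefix):
            urls.append(loc.text.strip())
    return urls
```

Passing in the downloaded sitemap.xml with locale_prefix="https://helpme.freshnrebel.com/hc/en-gb" gives an independent count of English pages to compare against the roughly 80 URLs that /map returned.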
Or is it possible to use the /crawl endpoint without actually scraping every URL, and have it return only a list of URLs that I can then pass to the /scrape endpoint?
Hello everyone, maybe someone can help.
I have a client with a very specific request, and I am wondering how best to build something like this.
We have 100s of websites. For each website we have a product #...
Found another issue with the map endpoint. How many URLs can the map endpoint retrieve? It only returns 51 URLs, but the website's domain actually contains thousands of URLs. Is there a limit to the number of URLs the /map endpoint can retrieve? @Adobe.Flash
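Before asking about a cap, it helps to measure the gap: the sketch below compares the URLs the map call returned with an independent list such as the site's sitemap. The normalization rules (lower-cased scheme and host, no fragment, no trailing slash) are assumptions for this sketch; adjust them to the site:

```python
from typing import Dict, Iterable, Set
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lower-case scheme and host, drop the fragment, and strip a trailing
    slash so trivially different spellings of the same page compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def coverage(mapped: Iterable[str], expected: Iterable[str]) -> Dict[str, object]:
    """Compare URLs returned by /map against an expected list (e.g. a sitemap)."""
    got: Set[str] = {normalize(u) for u in mapped}
    want: Set[str] = {normalize(u) for u in expected}
    return {
        "returned": len(got),
        "expected": len(want),
        "missing": sorted(want - got),  # pages the map call did not return
    }
```

A report with a concrete "missing" list is easier to act on than a bare count of 51 versus thousands.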

FEDEAFES archivos - Confederación Salud ...
Hi @Adobe.Flash
Can you check why scraping these URLs doesn't work? I've given two examples:
https://consaludmental.org/tag/fedeafes/...

Is it possible to exclude raw HTML from responses?
It's quite weird, some of the objects in my result do have rawHtml keys, and some only have markdown keys in them.
I'm using the new crawl endpoint in the v1 API, with scrapeOptions.formats = markdown
...
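Until the cause of the mixed results is clear, the extra field can be stripped on the client before results are stored. A minimal sketch, assuming each crawled page arrives as a JSON object (a Python dict) with keys such as markdown and rawHtml, as described in the post; it cleans the data after the fact and does not make the responses any smaller:

```python
from typing import Any, Dict, Iterable, List

# Only rawHtml is mentioned in the post; pass other keys explicitly if needed.
DEFAULT_DROP = frozenset({"rawHtml"})


def drop_fields(
    pages: Iterable[Dict[str, Any]], drop: Iterable[str] = DEFAULT_DROP
) -> List[Dict[str, Any]]:
    """Return shallow copies of the page objects without the given keys.

    The input dicts are left untouched.
    """
    unwanted = frozenset(drop)
    return [{k: v for k, v in page.items() if k not in unwanted} for page in pages]


def pages_with(pages: Iterable[Dict[str, Any]], key: str) -> int:
    """Count how many page objects contain key, e.g. to document the inconsistency."""
    return sum(1 for page in pages if key in page)
```

Running pages_with(results, "rawHtml") before cleaning gives a concrete count of affected pages to include when reporting the issue.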