Why is the code block always empty in the markdown when I scrape a page with a pre tag / code block?

I want to get the code blocks from a documentation site, but they always come back empty in the results.
6 Replies
Gaurav Chadha, 2w ago
@orykevin can you share the url you're testing?
orykevin (OP), 2w ago
https://ai-sdk.dev/docs/reference/ai-sdk-core/agent
Agent: API Reference for the Agent class.
orykevin (OP), 2w ago
The code block comes back empty when the code is long, but a one-line code block is captured.
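One way to confirm this symptom on a given page is to count how many fenced code blocks in the returned markdown have an empty body. The sketch below is not from the thread: it assumes the scraped markdown uses triple-backtick fences, and `count_code_blocks` is a hypothetical helper name.

```python
import re

FENCE = "`" * 3  # triple-backtick fence marker (assumed markdown fence style)

# Opening fence with an optional language tag, a lazily matched body, then a closing fence.
FENCE_RE = re.compile(
    rf"^{FENCE}[^\n]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def count_code_blocks(markdown: str) -> tuple[int, int]:
    """Return (total_blocks, empty_blocks) for triple-backtick fenced blocks."""
    bodies = FENCE_RE.findall(markdown)
    empty = sum(1 for body in bodies if not body.strip())
    return len(bodies), empty


if __name__ == "__main__":
    # Tiny made-up sample: one empty block and one block with a single line of code.
    sample = (
        "Intro\n" + FENCE + "ts\n" + FENCE + "\n\n"
        "Text\n" + FENCE + "ts\nconst a = 1;\n" + FENCE + "\n"
    )
    total, empty = count_code_blocks(sample)
    print(f"{empty} of {total} code blocks are empty")
```

If most long blocks on a page are reported as empty while one-line blocks are not, that matches the behavior described here. The regex does not handle unclosed or nested fences.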
Gaurav Chadha, 2w ago
Okay, I see the issue here. When I test other websites, such as Firecrawl's, the code blocks come through flawlessly because the pages are well structured with HTML tags.
Gaurav Chadha, 2w ago
@orykevin To make it work with https://ai-sdk.dev/docs/reference/ai-sdk-core/agent, which uses unstructured HTML tags, you'll have to use the formats option to extract the code examples with a prompt. Refer to this curl request:
curl -X POST "https://api.firecrawl.dev/v2/scrape" \
  -H "Authorization: Bearer $FIRECRAWL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://ai-sdk.dev/docs/reference/ai-sdk-core/agent",
    "formats": [{
      "type": "json",
      "prompt": "Extract all code examples from this page and format them as a single, complete, well-formatted code file. Start with import statements, then show constructor examples, method calls, and usage examples. Use proper JavaScript/TypeScript formatting with correct indentation, semicolons, and line breaks. Present the result as one cohesive code file that could be executed."
    }],
    "waitFor": 3000
  }'
This will extract the code blocks.
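For anyone calling the API from a script instead of the shell, the same request can be sketched in Python using only the standard library. The endpoint, headers, and body fields are copied from the curl request; `build_scrape_payload` and `scrape_with_prompt` are hypothetical helper names, and the response is returned unmodified because the thread does not show its shape.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl request in this thread.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v2/scrape"

# Prompt text taken from the curl request in this thread.
CODE_PROMPT = (
    "Extract all code examples from this page and format them as a single, "
    "complete, well-formatted code file. Start with import statements, then "
    "show constructor examples, method calls, and usage examples. Use proper "
    "JavaScript/TypeScript formatting with correct indentation, semicolons, "
    "and line breaks. Present the result as one cohesive code file that "
    "could be executed."
)


def build_scrape_payload(url: str, prompt: str, wait_ms: int = 3000) -> dict:
    """Build the same JSON body that the curl request sends."""
    return {
        "url": url,
        "formats": [{"type": "json", "prompt": prompt}],
        "waitFor": wait_ms,
    }


def scrape_with_prompt(url: str, prompt: str, api_key: str) -> dict:
    """POST the payload and return the parsed JSON response as-is."""
    request = urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=json.dumps(build_scrape_payload(url, prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only calls the API when a key is available in the environment.
    api_key = os.environ.get("FIRECRAWL_API_KEY")
    if api_key:
        result = scrape_with_prompt(
            "https://ai-sdk.dev/docs/reference/ai-sdk-core/agent",
            CODE_PROMPT,
            api_key,
        )
        print(json.dumps(result, indent=2))
    else:
        print("Set FIRECRAWL_API_KEY to run the request.")
```

Keeping the payload builder separate from the network call makes it easy to check the request body without spending API credits.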
orykevin (OP), 2w ago
Awesome! Thanks for the solution, I will try it now.