Hi, I am using Smart Article Extractor

Hi, I am using the Smart Article Extractor actor to extract info as JSON from an article URL. When I call it from Postman, the actor runs flawlessly on the Apify Console, but Postman gets only a 201 status with no response body. How can I get the results in the response? Please help.
clever-tan•13mo ago
I guess you got a defaultDatasetId in the response, so you can make another request to get the data from that dataset.
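A minimal sketch of that two-step approach in Python (standard library only). It assumes the run object comes back wrapped in a top-level "data" key, as Apify API v2 responses are, and uses the dataset items endpoint GET /v2/datasets/{datasetId}/items; the helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"


def default_dataset_id(run_response: dict) -> str:
    """Pick the dataset ID out of a run object returned by the API.

    Assumes the run object is wrapped in a top-level "data" key.
    """
    return run_response["data"]["defaultDatasetId"]


def dataset_items_url(dataset_id: str, token: str) -> str:
    """Build the URL that lists the items of a dataset as JSON."""
    query = urlencode({"format": "json", "token": token})
    return f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/items?{query}"


def fetch_dataset_items(dataset_id: str, token: str) -> list:
    """Second request: download the extracted articles (needs network access)."""
    with urllib.request.urlopen(dataset_items_url(dataset_id, token)) as resp:
        return json.load(resp)
```

Only fetch_dataset_items touches the network; the other two helpers just pick out the ID and build the URL.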
ambitious-aqua•13mo ago
{
"articleUrls": [
{
"url": "https://www.livemint.com/science/serum-institute-faces-lawsuit-over-covishield-side-effects-deaths-from-blood-clots-says-father-of-deceased-girl-11714617925438.html"
}
],
"crawlWholeSubdomain": false,
"enqueueFromArticles": false,
"extendOutputFunction": "($) => {\n const result = {};\n // Uncomment to add a title to the output\n // result.pageTitle = $('title').text().trim();\n\n return result;\n}",
"isUrlArticleDefinition": {
"minDashes": 4,
"hasDate": true,
"linkIncludes": [
"article",
"storyid",
"?p=",
"id=",
"/fpss/track",
".html",
"/content/"
]
},
"mustHaveDate": true,
"onlyInsideArticles": true,
"onlyNewArticles": false,
"onlyNewArticlesPerDomain": false,
"onlySubdomainArticles": false,
"proxyConfiguration": {
"useApifyProxy": true
},
"saveHtml": false,
"saveHtmlAsLink": false,
"saveSnapshots": false,
"scanSitemaps": false,
"scrollToBottom": false,
"useBrowser": false,
"useGoogleBotHeaders": false
}
This is my body, and the result should show up in Postman itself. Am I doing something wrong here? This is my POST request URL: https://api.apify.com/v2/acts/lukaskrivka~article-extractor-smart/run-sync?token=*mytoken* @HonzaS
clever-tan•13mo ago
And what is the response?
ambitious-aqua•13mo ago
Nothing, it's blank in Postman, but the run works and fetches data according to my logs.
clever-tan•13mo ago
I don't think it should be blank. Let me try.
ambitious-aqua•13mo ago
Yes.
ambitious-aqua•13mo ago
Please do. The actor's ID is hy5TYiCBwQ9o8uRKG.
ambitious-aqua•13mo ago
That is the run I started.
clever-tan•13mo ago
Can you see the log? I have this in the log of the run:

2024-05-03T13:12:09.418Z WARN No text found on article page: https://www.livemint.com/science/serum-institute-faces-lawsuit-over-covishield-side-effects-deaths-from-blood-clots-says-father-of-deceased-girl-11714617925438.html
2024-05-03T13:12:09.453Z WARN IS NOT VALID ARTICLE --- Reasons: [Article has no date], [Article has too few words: 1 (should be at least 150)]

So it does return the data as the response, but I think that run produced no data.

Oh, now I see you have results on the Console. Do I have a different input, since I have no results? https://console.apify.com/view/runs/6p8nlrbV7oe6GtxwC

I have changed the input, and now it returns results both on the web and from the request.
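To see why the first run came back empty, here is a hypothetical Python re-creation of the two checks named in that log line. It is not the actor's code; it only assumes a page is dropped when mustHaveDate is on and no date is found, or when its word count is below the minimum (150 in the log).

```python
def invalid_article_reasons(word_count: int, has_date: bool,
                            min_words: int = 150,
                            must_have_date: bool = True) -> list[str]:
    """Hypothetical mirror of the reasons printed after "IS NOT VALID ARTICLE".

    Not the actor's source; it only shows how the mustHaveDate and
    minWords inputs decide whether a page is kept.
    """
    reasons = []
    # Date check: only enforced when mustHaveDate is true.
    if must_have_date and not has_date:
        reasons.append("Article has no date")
    # Length check: the page must reach the minimum word count.
    if word_count < min_words:
        reasons.append(
            f"Article has too few words: {word_count} (should be at least {min_words})"
        )
    return reasons
```

With the original input the livemint page fails both checks; with "mustHaveDate": false and "minWords": 1, the values used later in this thread, it passes. The log suggests the actor still extracted almost no text from the page, so the returned item may be sparse.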
ambitious-aqua•13mo ago
What did you change? Could you send the curl command, minus the token? I'll try running it. Thanks in advance, Honza, you are a true saviour.
clever-tan•13mo ago
There is no curl export, but the URL is https://api.apify.com/v2/acts/lukaskrivka~article-extractor-smart/run-sync-get-dataset-items?token=<token> and the body is:
{
"articleUrls": [
{
"url": "https://www.livemint.com/science/serum-institute-faces-lawsuit-over-covishield-side-effects-deaths-from-blood-clots-says-father-of-deceased-girl-11714617925438.html"
}
],
"crawlWholeSubdomain": false,
"enqueueFromArticles": false,
"extendOutputFunction": "($) => {\n const result = {};\n // Uncomment to add a title to the output\n // result.pageTitle = $('title').text().trim();\n\n return result;\n}",
"isUrlArticleDefinition": {
"minDashes": 4,
"hasDate": true,
"linkIncludes": [
"article",
"storyid",
"?p=",
"id=",
"/fpss/track",
".html",
"/content/"
]
},
"mustHaveDate": false,
"onlyInsideArticles": true,
"onlyNewArticles": false,
"onlyNewArticlesPerDomain": false,
"onlySubdomainArticles": false,
"proxyConfiguration": {
"useApifyProxy": true
},
"saveHtml": false,
"saveHtmlAsLink": false,
"saveSnapshots": false,
"scanSitemaps": false,
"scrollToBottom": false,
"useBrowser": false,
"minWords":1,
"useGoogleBotHeaders": false
}
The only header is Content-Type: application/json. I have set "minWords": 1 and "mustHaveDate": false.
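A minimal Python sketch of the working call, standard library only. Comparing the URLs in this thread, besides the two input changes the request also targets run-sync-get-dataset-items instead of run-sync; that endpoint returns the run's dataset items in the response body. To keep it short, the sketch sends only the article URL and a few fields from the body above, assuming the actor falls back to its defaults for the rest; build_request and run_and_get_items are hypothetical names.

```python
import json
import urllib.request
from urllib.parse import urlencode

ENDPOINT = (
    "https://api.apify.com/v2/acts/lukaskrivka~article-extractor-smart/"
    "run-sync-get-dataset-items"
)


def build_request(article_url: str, token: str) -> urllib.request.Request:
    """POST request that runs the actor and returns its dataset items."""
    actor_input = {
        "articleUrls": [{"url": article_url}],
        "mustHaveDate": False,  # the page had no detectable date
        "minWords": 1,          # the page yielded very little text
        "useBrowser": False,
        "proxyConfiguration": {"useApifyProxy": True},
        # Remaining fields assumed to fall back to the actor's defaults.
    }
    return urllib.request.Request(
        f"{ENDPOINT}?{urlencode({'token': token})}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_get_items(article_url: str, token: str) -> list:
    """Send the request and parse the JSON array of items (needs network)."""
    with urllib.request.urlopen(build_request(article_url, token)) as resp:
        return json.load(resp)
```

In Postman the equivalent is the same URL, a raw JSON body, and the single Content-Type header.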
