Build Asynchronous pipeline
Hi there, I'm building a pipeline with Apify. I want to run many jobs in parallel, check their status later, and export the completed ones.
Is there anything I can use to export the data of a job by its job ID or something similar?
metropolitan-bronze•2y ago
Hey there! There's the runs endpoint - https://docs.apify.com/api/v2/#/reference/actor-runs/run-collection/get-user-runs-list - it returns the list of runs for a given actor. The list contains the status of each run, so you could iterate through it, grab the run IDs, and then fetch the default storages or whatever else you need.
foreign-sapphireOP•2y ago
Thank you, Andrey.
Yes, I used this API, but I thought there might be a direct API to check the status of a specific job.
This is how I'm building my pipeline:
1. Start a job and save the job information to a database or file.
2. Later, check all jobs in my actor, and if my job is completed, export the data.
Is this the best way to do it?
Thank you again.
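The bookkeeping side of these two steps could be sketched roughly as below. This is only an illustration: the `runs.json` style file, the helper names (`record_run`, `update_status`, `runs_to_check`, `runs_to_export`) and the use of JSON as the "database" are assumptions, not anything Apify provides. The run fields (`id`, `status`, `defaultDatasetId`) are taken from the run object the API returns when a run is started, and the set of terminal statuses is my reading of the Apify docs.

```python
import json
from pathlib import Path

# Statuses after which a run no longer changes (assumed from the Apify docs).
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def _load(path: Path) -> dict:
    """Read the local run registry, or start with an empty one."""
    return json.loads(path.read_text()) if path.exists() else {}


def record_run(run: dict, path: Path) -> None:
    """Step 1: remember a freshly started run so it can be checked later."""
    runs = _load(path)
    runs[run["id"]] = {
        "status": run["status"],
        "datasetId": run["defaultDatasetId"],
    }
    path.write_text(json.dumps(runs, indent=2))


def update_status(run_id: str, status: str, path: Path) -> None:
    """Store the latest status fetched from the API for one run."""
    runs = _load(path)
    runs[run_id]["status"] = status
    path.write_text(json.dumps(runs, indent=2))


def runs_to_check(path: Path) -> list[str]:
    """Step 2a: IDs of runs that have not reached a terminal status yet."""
    return [
        run_id
        for run_id, info in _load(path).items()
        if info["status"] not in TERMINAL_STATUSES
    ]


def runs_to_export(path: Path) -> list[tuple[str, str]]:
    """Step 2b: (run ID, dataset ID) pairs of runs that finished successfully."""
    return [
        (run_id, info["datasetId"])
        for run_id, info in _load(path).items()
        if info["status"] == "SUCCEEDED"
    ]
```

A periodic job would then loop over `runs_to_check(...)`, fetch each run's status from the API, call `update_status(...)`, and export whatever `runs_to_export(...)` returns.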
metropolitan-bronze•2y ago
Then it's the following endpoint I believe: https://docs.apify.com/api/v2/#/reference/actor-runs/run-object-and-its-storages -> https://docs.apify.com/api/v2/#/reference/actor-runs/run-object-and-its-storages/get-run
foreign-sapphireOP•2y ago
Thank you for that, it's really helpful.
I found that the API looks something like this:
https://api.apify.com//v2/actor-runs/{runId}{?token}
But how can I get the runId? When I run the task I get this information:
dict_keys(['id', 'actId', 'userId', 'startedAt', 'finishedAt', 'status', 'meta', 'stats', 'options', 'createdByOrganizationMemberUserId', 'buildId', 'defaultKeyValueStoreId', 'defaultDatasetId', 'defaultRequestQueueId', 'buildNumber', 'containerUrl', 'usage', 'usageTotalUsd', 'usageUsd'])
I tested every ID, but the API always returns {'error': {'type': 'page-not-found',
'message': 'We have bad news: there is no API endpoint at this URL. Did you specify it correctly?'}}
So where can I get the runId for my job so I can check it later?
Also, can we save videos directly to AWS S3? I need to store them there anyway, so I'd like to avoid the Apify storage cost. Or is there at least an API for deleting an Apify dataset after exporting it to my S3 storage?
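For the export-then-delete part of this question, a minimal sketch might look like the following. It assumes the dataset items endpoint (`GET /v2/datasets/{datasetId}/items`) and the delete dataset endpoint (`DELETE /v2/datasets/{datasetId}`) of the Apify API; the `upload` callable is hypothetical and stands in for whatever S3 client wrapper the pipeline uses, and the helper names are made up for this example. It covers dataset items only, not files kept in other storages.

```python
import urllib.parse
import urllib.request
from typing import Callable

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, token: str, fmt: str = "json") -> str:
    """URL that downloads the items of one dataset in the given format."""
    query = urllib.parse.urlencode({"format": fmt, "token": token})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"


def dataset_url(dataset_id: str, token: str) -> str:
    """URL of the dataset object itself (used here with DELETE)."""
    query = urllib.parse.urlencode({"token": token})
    return f"{API_BASE}/datasets/{dataset_id}?{query}"


def http_get(url: str) -> bytes:
    with urllib.request.urlopen(url) as response:
        return response.read()


def http_delete(url: str) -> None:
    request = urllib.request.Request(url, method="DELETE")
    with urllib.request.urlopen(request):
        pass


def export_then_delete(
    dataset_id: str,
    token: str,
    upload: Callable[[str, bytes], None],  # hypothetical, e.g. wraps an S3 client
    get: Callable[[str], bytes] = http_get,
    delete: Callable[[str], None] = http_delete,
) -> None:
    """Download the items, hand them to `upload`, then delete the dataset.

    The delete call is only reached if `upload` did not raise.
    """
    payload = get(dataset_items_url(dataset_id, token))
    upload(f"{dataset_id}.json", payload)
    delete(dataset_url(dataset_id, token))
```

Passing `get` and `delete` as parameters is just a convenience so the flow can be tried with fakes before pointing it at real data.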
metropolitan-bronze•2y ago
It should be just the id field from the list you sent above. How do you start the run? Mainly, I need to know which call's response gives you this set of props. Generally, the error message says that something's off with the endpoint/URL, so double-check that you're using it correctly.
As for saving the videos: you can do it, but you would need to implement the upload yourself. Or you could delete the store: https://docs.apify.com/api/v2/#/reference/key-value-stores/store-object/delete-store
foreign-sapphireOP•2y ago
Thank you Andrey, I appreciate your response. It will help us start using Apify.
I used this API to trigger the job:
url = "https://api.apify.com/v2/acts/clockworks~tiktok-scraper/runs?token=***"
It returned the list above.
* Then I tried several APIs to get the task status, for example this one:
f"https://api.apify.com/v2/acts/GdWCkxBtKWOsKjdch/runs?token={token}"
It should return all the runs of the actor, but it returned only some of them, and my job wasn't among them.
* I also tested this API:
url = f"https://api.apify.com//v2/actor-runs/{runId}?{token}"
I set runId to the id I got when I started the task, as you described, but this is the response:
{'error': {'type': 'page-not-found',
'message': 'We have bad news: there is no API endpoint at this URL. Did you specify it correctly?'}}
Please help me with this step. I can't continue and start using Apify until I have an API to check the status of a previous job by its ID.
Yes, I started the run and the task is working; I checked that in the console. I even took the task_id from the console URL and used it as the runId with the same API, but it returned the same response.
metropolitan-bronze•2y ago
So do you start an actor directly or a task? The endpoints above are for actor runs, but there's a set of API endpoints for tasks: https://docs.apify.com/api/v2#/reference/actor-tasks/get-list-of-task-runs
foreign-sapphireOP•2y ago
What's the difference between starting an actor and a task?
I'm using this API to run it:
url = f"https://api.apify.com/v2/acts/{actor}/runs?token={token}"
So I think I'm running an actor, right?
foreign-sapphireOP•2y ago
This is the api I used to run the job

foreign-sapphireOP•2y ago
So I create runs on actors.
Then I want to check the status of those runs by passing a run ID or something.
This is the information I get when I run it:
dict_keys(['id', 'actId', 'userId', 'startedAt', 'finishedAt', 'status', 'meta', 'stats', 'options', 'createdByOrganizationMemberUserId', 'buildId', 'defaultKeyValueStoreId', 'defaultDatasetId', 'defaultRequestQueueId', 'buildNumber', 'containerUrl', 'usage', 'usageTotalUsd', 'usageUsd'])
metropolitan-bronze•2y ago
It should be https://api.apify.com/v2/actor-runs/{runId}?token={token} - I just checked and it works. The id from your list is the runId.
Here you're missing the actual parameter name, it seems (token={token}), and there's also a double slash:
url = f"https://api.apify.com//v2/actor-runs/{runId}?{token}"
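Both fixes mentioned here (the named token parameter and the single slash before v2) could be sketched like this. The helper names are made up for the example, the terminal status list is my reading of the Apify docs, and the code assumes the API wraps the run object in a top-level "data" key.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"  # single slash before "v2"

# Statuses after which a run no longer changes (assumed from the Apify docs).
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def run_url(run_id: str, token: str) -> str:
    """Build the get-run URL with the token as a named query parameter."""
    query = urllib.parse.urlencode({"token": token})
    return f"{API_BASE}/actor-runs/{urllib.parse.quote(run_id, safe='')}?{query}"


def get_run_status(run_id: str, token: str) -> str:
    """Fetch one run and return its status string, for example "RUNNING"."""
    with urllib.request.urlopen(run_url(run_id, token)) as response:
        # Assumption: the run object sits under the "data" key of the response.
        return json.load(response)["data"]["status"]


def is_finished(status: str) -> bool:
    """True once the run has reached a status it will not leave again."""
    return status in TERMINAL_STATUSES
```

The same token that started the run should be used for `get_run_status`, since a run started under one account may not be visible to another account's token.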
foreign-sapphireOP•2y ago
Thank you Andrey. It seems I used two different tokens, one for creating the task and another one to check its status, so the second one couldn't see the task.
I've solved it now, and the resources you sent were very helpful. I really appreciate your help.
metropolitan-bronze•2y ago
Glad this is resolved 👍