I use this to check the currently used dataset:
await (await Actor.openDataset()).getInfo()
. However, it is often not up to date (I suspect an eventual-consistency issue is at play). Is it possible to fetch the data once it is consistent? Currently I'm just waiting 3 seconds before fetching, but that feels "hacky".
6 Replies
That's a known behavior.
The docs mention that it can take up to 5 seconds for the data to update.
You might want to get the dataset items and check their length to get a real-time count.

Or you can track it manually with a counter variable that you increment whenever adding to the dataset succeeds.
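A minimal sketch of the manual-counter idea, assuming the dataset object exposes pushData() that accepts a single item or an array. The name createCountingDataset is hypothetical, not part of the Apify SDK. Note that it only counts items pushed through the wrapper in the current run, so anything already in the dataset is not included.

```javascript
// Hypothetical helper (not part of the Apify SDK): wraps a dataset-like
// object and keeps a local count of the items pushed through it.
function createCountingDataset(dataset) {
  let pushedCount = 0;
  return {
    async pushData(data) {
      // If pushData() throws, the counter below is never reached.
      await dataset.pushData(data);
      pushedCount += Array.isArray(data) ? data.length : 1;
    },
    get pushedCount() {
      return pushedCount;
    },
  };
}

// Assumed usage inside an Actor:
// const counting = createCountingDataset(await Actor.openDataset());
// await counting.pushData({ url: 'https://example.com' });
// console.log(counting.pushedCount);
```

The counter lives in memory, so it resets if the run restarts or migrates.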
Thanks for the info @azzouzana! Link to docs: https://docs.apify.com/api/v2/dataset-get
I've switched to
(await (await Actor.openDataset()).getData()).count
which works. Is this a better approach? Does it have downsides? If there are no downsides, why would you ever use getInfo().itemCount
?
Link to docs: https://docs.apify.com/sdk/js/reference/interface/DatasetContent
The documentation mentions the existence of both count
and total
. But I'm not sure what the actual difference is?
- count: Count of dataset entries returned in this set.
- total: Total count of entries in the dataset.
@Louis Deconinck count is the number of entries in the part of the data you fetched. The downside is that pagination is involved, so it's relatively slow and could become expensive if done frequently. I remember reading somewhere that the maximum you can get in a single
getData()
call was 10000, and it now seems to have been updated to 250000 (https://crawlee.dev/js/api/core/interface/DatasetDataOptions#limit), so please experiment with it if needed. total, by contrast, should return the total number of entries in the dataset. I believe getInfo().itemCount is meant for cases where you don't have to worry about caching/real-time issues: it's pre-calculated metadata, so it's cheap.
For example, if you have a dataset with 857 items, the response of
const data = await dataset.getData({ limit: 100, offset: 0 });
is
{
  items: [...], // the actual data entries
  count: 100, // how many items are in the items array
  total: 857 // total number of items in the dataset
}
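To make the relationship between count and total concrete, here is a small in-memory model of the paging behaviour described above. It is only an illustration, not Apify's implementation. The function name pageLikeGetData is hypothetical, and it returns only the three fields discussed in this thread.

```javascript
// Hypothetical in-memory model of limit/offset paging (not the real getData()).
function pageLikeGetData(allItems, { offset = 0, limit = allItems.length } = {}) {
  const items = allItems.slice(offset, offset + limit);
  return {
    items,                  // the entries in this page
    count: items.length,    // how many entries this page contains
    total: allItems.length, // how many entries exist overall
  };
}

// A stand-in dataset with 857 items, as in the example above.
const all = Array.from({ length: 857 }, (_, i) => ({ id: i }));

const firstPage = pageLikeGetData(all, { limit: 100, offset: 0 });
console.log(firstPage.count, firstPage.total); // 100 857

const lastPage = pageLikeGetData(all, { limit: 100, offset: 800 });
console.log(lastPage.count, lastPage.total); // 57 857
```

The last page shows why count can be smaller than limit, while total stays the same for every page.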
As I recall, the best-performing option is something like (await (await Actor.openDataset()).getData({limit: 1})).total
so you don't fetch all the dataset entries.
Thanks @azzouzana,
(await (await Actor.openDataset()).getData({limit: 1})).total
worked perfectly!
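For anyone landing here later, the one-liner above can be wrapped in a small helper. This is a sketch under the assumption that the dataset object exposes getData() returning an object with a total field, as described in this thread. The name getTotalItemCount is hypothetical.

```javascript
// Hypothetical helper: reads the total item count while asking for at most
// one item, so the full dataset contents are not downloaded.
async function getTotalItemCount(dataset) {
  const { total } = await dataset.getData({ limit: 1 });
  return total;
}

// Assumed usage inside an Actor:
// const total = await getTotalItemCount(await Actor.openDataset());
// console.log(`Dataset currently holds ${total} items`);
```

Each call still makes one request, so if you call it in a hot loop you may want to combine it with a local counter.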