I use this to check the currently used

I use this to check the current dataset: await (await Actor.openDataset()).getInfo(). However, the result is often out of date (I suspect an eventual-consistency issue). Is it possible to fetch the data once it is consistent? Currently I just wait 3 seconds before fetching, but that feels "hacky".
6 Replies
azzouzana, 3w ago
That's known behavior. The docs mention that this data can take up to 5 seconds to update. To get a real-time count, you could fetch the dataset items and check their length.
azzouzana, 3w ago
Or you can track it manually with a counter variable that you increment each time adding to the dataset succeeds.
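A minimal sketch of the manual-counter idea, assuming the dataset object exposes an async pushData(data) that accepts one object or an array of objects. The names createCountingDataset and createFakePushDataset are hypothetical helpers made up for this illustration, and the in-memory stand-in only exists so the snippet runs without the SDK.

```javascript
// Wraps a dataset-like object and counts items as they are pushed.
// Assumption: `dataset` exposes an async pushData(data) method that accepts
// one object or an array of objects.
function createCountingDataset(dataset) {
  let pushedCount = 0;
  return {
    async pushData(data) {
      // If the underlying push throws, the counter is not incremented.
      await dataset.pushData(data);
      pushedCount += Array.isArray(data) ? data.length : 1;
    },
    getPushedCount() {
      return pushedCount;
    },
  };
}

// Hypothetical in-memory stand-in so the sketch runs without the SDK.
// In an Actor you would pass `await Actor.openDataset()` instead.
function createFakePushDataset() {
  const stored = [];
  return {
    stored,
    async pushData(data) {
      stored.push(...(Array.isArray(data) ? data : [data]));
    },
  };
}

async function demoCounter() {
  const dataset = createCountingDataset(createFakePushDataset());
  await dataset.pushData({ url: 'https://example.com/1' });
  await dataset.pushData([
    { url: 'https://example.com/2' },
    { url: 'https://example.com/3' },
  ]);
  console.log(dataset.getPushedCount()); // 3
}

demoCounter();
```

Note that a counter like this only sees items pushed through the wrapper in the current process; items already in the dataset, or pushed elsewhere, are not included.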
Louis Deconinck (OP), 3w ago
Thanks for the info @azzouzana! Link to the docs: https://docs.apify.com/api/v2/dataset-get

I've switched to (await (await Actor.openDataset()).getData()).count, which works. Is this a better approach? Does it have downsides? If there are none, why would you ever use getInfo().itemCount?

Docs link: https://docs.apify.com/sdk/js/reference/interface/DatasetContent

The documentation lists both count and total, but I'm not sure what the actual difference is:
- count: Count of dataset entries returned in this set.
- total: Total count of entries in the dataset.
azzouzana, 3w ago
@Louis Deconinck count is the number of entries in the part of the data you fetched. The downside is that pagination is involved, so it is relatively slow and could become expensive if done frequently. I remember reading that the maximum you could get in one getData() call was 10000, and it now seems to have been raised to 250000 (https://crawlee.dev/js/api/core/interface/DatasetDataOptions#limit), so please experiment with it if needed. total, by contrast, should return the total number of entries in the dataset.

I believe getInfo().itemCount is meant for cases where you don't have to worry about caching/real-time issues: it's pre-calculated metadata, so it's cheap.

For example, if you have a dataset with 857 items, the response of

    const data = await dataset.getData({ limit: 100, offset: 0 });

is

    {
      items: [...], // the actual data entries
      count: 100,   // how many items are in the items array
      total: 857    // total number of items in the dataset
    }

As far as I remember, the best-performing option is something like (await (await Actor.openDataset()).getData({limit: 1})).total, so you don't fetch all the dataset entries.
Louis Deconinck (OP), 3w ago
Thanks @azzouzana, (await (await Actor.openDataset()).getData({limit: 1})).total worked perfectly!
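For reference, the approach that worked here can be wrapped in a small helper. This is only a sketch: getTotalItemCount and createFakePagedDataset are hypothetical names, and the in-memory stand-in simply mimics the { items, count, total } shape described above so the snippet runs without the SDK.

```javascript
// Reads the total item count while requesting only a single item.
// Assumption: `dataset` exposes getData({ offset, limit }) resolving to
// { items, count, total }, as described in the thread.
async function getTotalItemCount(dataset) {
  const { total } = await dataset.getData({ limit: 1 });
  return total;
}

// Hypothetical in-memory stand-in that mimics the paginated response shape.
// In an Actor you would pass `await Actor.openDataset()` instead.
function createFakePagedDataset(items) {
  return {
    async getData({ offset = 0, limit = items.length } = {}) {
      const page = items.slice(offset, offset + limit);
      return { items: page, count: page.length, total: items.length };
    },
  };
}

async function demoTotal() {
  const items = Array.from({ length: 857 }, (_, i) => ({ id: i }));
  const dataset = createFakePagedDataset(items);

  // count describes the fetched page, total describes the whole dataset.
  const page = await dataset.getData({ limit: 100, offset: 0 });
  console.log(page.count, page.total); // 100 857

  console.log(await getTotalItemCount(dataset)); // 857
}

demoTotal();
```

Requesting a single item keeps the response small while still returning total, which is why it is cheaper than fetching every entry just to count them.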
