Tips for merging over 10,000 CSV files in an R2 bucket.
I have tried multiple methods. First I sent an HTTP POST to a Worker that rapidly attempted to merge thousands of files, but it got rate limited for concurrently accessing too many R2 objects. Then I tried sending batches to a Queue, but that approach seemed fragile and prone to data loss (I received a "connection closed" message), and it was taking ages.
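For what it's worth, one way to avoid tripping the rate limit might be to cap how many R2 reads are in flight at once instead of firing them all concurrently. A minimal sketch of such a concurrency limiter (the helper name `mapLimited`, the limit value, and the `env.BUCKET` binding name in the usage comment are my own assumptions, not from the post):

```javascript
// Hypothetical sketch: run an async function over many items with at most
// `limit` calls in flight at once, so a single Worker invocation doesn't
// hit R2 with thousands of parallel reads.
async function mapLimited(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to claim

  // Each "lane" repeatedly claims the next unprocessed index until none remain.
  async function lane() {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  // Start at most `limit` lanes and wait for all of them to drain.
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, lane)
  );
  return results;
}

// Possible usage inside a Worker (BUCKET is a hypothetical R2 binding):
//   const texts = await mapLimited(keys, 6, (k) =>
//     env.BUCKET.get(k).then((obj) => obj.text())
//   );
```

The limiter itself is plain JavaScript, so it can be tested outside a Worker with any async function.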
The reason I want to do this is so I can store the data in the cache as a single file that can be served to the user quickly. Concatenating this many files at cache time times out, so I now want to merge them all into one file up front to make caching simpler. But Cloudflare lacks cloud functions that allow long-running tasks, which could handle something like this.
All up, the 10,000 files only come to around 50 MB. Ideally I can do this entirely within Cloudflare to avoid the egress charges that orchestrating it in Google Cloud or a similar service would incur. Any ideas?
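Since the whole dataset is only ~50 MB, the merge step itself is cheap once the texts are in memory; the hard part is the fetching, not the concatenation. A hedged sketch of the concatenation, assuming every CSV shares the same header row (the helper name `mergeCsv` is hypothetical):

```javascript
// Hypothetical sketch: concatenate CSV texts that share a header row,
// keeping the header only from the first file.
function mergeCsv(texts) {
  const out = [];
  texts.forEach((text, i) => {
    const lines = text.trim().split("\n");
    // Keep all lines of the first file; drop the header of the rest.
    out.push(...(i === 0 ? lines : lines.slice(1)));
  });
  return out.join("\n");
}
```

The merged string could then be written back with a single `put` to R2, so subsequent requests only ever touch one object.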
