Hi!
Let’s say I need 3-4 streams to handle spikes in load, given the current per-stream limit of 5 MB/s.
- Is it possible to direct the corresponding pipelines to a single Sink?
- If the Sink is an R2 bucket, can I use custom partitioning to ensure that files are written in lexical order? (I had issues with legacy Pipelines where files would be created in R2 out of order.)
Use case: event ingestion using Cloudflare Pipelines -> R2 -> ClickHouse ClickPipes S3 integration with continuous ingest (which requires lexical ordering of files).
Okay, so for the first bullet: I managed to direct 2 streams to 1 pipeline using SQL like:
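The exact query wasn't shared, but a minimal sketch of combining two streams into one sink would look something like this (the stream and sink names here are placeholders, and this assumes the Pipelines SQL engine supports `UNION ALL` across streams, as standard SQL does):

```sql
-- Hypothetical example: merge two ingest streams into a single R2 sink.
-- `events_stream_a`, `events_stream_b`, and `r2_sink` are placeholder names.
INSERT INTO r2_sink
SELECT * FROM events_stream_a
UNION ALL
SELECT * FROM events_stream_b;
```

Note that `UNION ALL` (rather than `UNION`) is the right choice for event ingestion, since it passes every row through without the deduplication overhead.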
For your second bullet: I think the old version had that issue because of the way it sharded partitions, with no coordination between them when writing to R2. We now use something completely different to write to R2 (based on Arroyo), so this shouldn't be the case, but @Micah | Data Platform or @cole | pipelines can confirm for me.
It is possible to union multiple streams, but we can also increase the limits on your stream so that isn't necessary. DM me your account and stream ID and we can discuss.
By default we write files with ULID names (https://github.com/ulid/spec), which are lexicographically sortable. We support custom partitioning by date/time fields (see https://developers.cloudflare.com/pipelines/sinks/available-sinks/r2/#partitioning), and in general there is a single writer, so files will always be written in order.