Hi @kagitac, I am trying to POC a switch from Kinesis Firehose to CF Pipelines, initially using the HTTP endpoint but also eventually moving from Lambda as the entrypoint to a worker....
So far, I'm seeing pretty sluggish perf on ingestion, ranging from 400 to 1200ms, both when using the HTTP API & a worker binding. With the worker binding I'm also seeing some
Internal Operation Failed
errors (which I've added retries for, but that makes the process even slower).
Is this expected perf, or could I be doing something funky?
At full capacity this'll be ~10k rpm (in batches of ~200 records per request), so nothing crazy, but we also can't be having 100+ms ingestion times, so I need to understand whether this is viable or nah? I considered also throwing a queue in the mix, but didn't really want to add another layer of cost or complexity unless really necessary.
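For reference, the worker-binding path looks roughly like this (simplified sketch; PIPELINE is just my binding name, and I'm assuming the binding's send() accepts an array of JSON records):

```ts
// Simplified sketch of my ingest worker (binding name and record shape are mine).
// Assumes a Pipelines binding named PIPELINE whose send() takes an array of
// JSON-serializable records.
interface Env {
  PIPELINE: { send(records: object[]): Promise<void> };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const records = (await request.json()) as object[]; // batch of ~200 records

    // Naive retry around the intermittent "Internal Operation Failed" errors.
    let lastErr: unknown;
    for (let attempt = 0; attempt < 3; attempt++) {
      const start = Date.now();
      try {
        await env.PIPELINE.send(records);
        console.log(`pipeline send took ${Date.now() - start}ms`);
        return new Response("ok");
      } catch (err) {
        lastErr = err;
        await new Promise((r) => setTimeout(r, 100 * 2 ** attempt)); // backoff
      }
    }
    return new Response(`ingest failed: ${lastErr}`, { status: 502 });
  },
};
```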
Unknown User • 2mo ago (message not public)
Thx, pipeline id is
79360a2a033f4c74a65afc6b317a1049
- I've been doing some rudimentary testing so far, no real volume (i.e. looping 1000 times sequentially, sending a few hundred events) ... and noticed the variability, but also the really high floor.
Then I went on to try the worker, and sending just a single event is still super slow (it actually seems slower than the HTTP method).
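For context, the rudimentary test is basically a loop like this (the endpoint URL is a placeholder for whatever is shown for the pipeline, and the batch contents are dummy events):

```ts
// Rough sequential latency test against the pipeline's HTTP endpoint.
// ENDPOINT is a placeholder for the pipeline's actual URL.
const ENDPOINT = "https://<pipeline-id>.pipelines.cloudflare.com";

const batch = Array.from({ length: 200 }, (_, i) => ({
  id: i,
  ts: Date.now(),
  payload: "test-event",
}));

for (let i = 0; i < 1000; i++) {
  const start = Date.now();
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(batch),
  });
  console.log(`request ${i}: ${res.status} in ${Date.now() - start}ms`);
}
```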
As for dreams, I don't have a ton. The goal here is just to aggregate events into nice tidy batches & then we bulk load them into ClickHouse
IaC friendly == :10000: tho
Unknown User • 2mo ago (message not public)
Hey @kagitac - not realllllly? We're utilizing ClickPipes right now, which cuts down on cost/complexity.
Obviously one less bit would be great, but it'd need the same performance, exactly-once semantics, etc.
Any update on the perf of my pipeline from yesterday?
Unknown User • 5w ago (message not public)
Yes, I defined it as ENAM (and I'm in US East), so I imagine the worker I was hitting was also in ENAM
Unknown User • 5w ago (message not public)
Ok, I will evaluate the cost of adding a queue in between, I just can't have the initial request from the clients taking so long
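If I do put a queue in the middle, the shape I have in mind is roughly this: the client-facing worker just enqueues the batch and returns, and a queue consumer forwards it to the pipeline off the request path (binding names are mine; the Pipelines binding shape is assumed):

```ts
// Sketch of the "queue in between" option: ack the client fast, forward later.
interface Env {
  INGEST_QUEUE: Queue;                                  // Queues producer binding
  PIPELINE: { send(records: object[]): Promise<void> }; // Pipelines binding (assumed shape)
}

export default {
  // Fast path: enqueue the whole client batch as one message and return.
  async fetch(request: Request, env: Env): Promise<Response> {
    const records = (await request.json()) as object[];
    await env.INGEST_QUEUE.send({ records }); // mind the per-message size limit
    return new Response("accepted", { status: 202 });
  },

  // Off the request path: drain queued batches into the pipeline.
  async queue(batch: MessageBatch<{ records: object[] }>, env: Env): Promise<void> {
    const records = batch.messages.flatMap((m) => m.body.records);
    await env.PIPELINE.send(records);
    batch.ackAll();
  },
};
```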
Hi @kagitac
I have my POC running at 100%, duplicating the stream from Lambda ... and it's mostly working well. I went with a tail worker to do the pipeline calls.
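Roughly how the tail-worker piece looks: the main worker console.log()s each batch with a marker, and the tail worker (attached via tail_consumers) picks those up and does the pipeline send, so that latency stays off the request path. The log-marker convention and binding shape here are my own:

```ts
// Tail worker sketch. The producer worker logs batches as:
//   console.log("PIPELINE_BATCH", JSON.stringify(batch))
// and this worker replays them into the pipeline. The marker name is my convention.
interface Env {
  PIPELINE: { send(records: object[]): Promise<void> }; // assumed binding shape
}

export default {
  async tail(events: TraceItem[], env: Env): Promise<void> {
    const records: object[] = [];

    for (const event of events) {
      for (const log of event.logs) {
        const msg = log.message as unknown[];
        if (Array.isArray(msg) && msg[0] === "PIPELINE_BATCH") {
          records.push(...JSON.parse(String(msg[1])));
        }
      }
    }

    if (records.length > 0) {
      await env.PIPELINE.send(records);
    }
  },
};
```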
One thing I'm seeing right now -- the files being written to R2 are not named in a monotonically increasing fashion.
So for example in this screenshot, the files ending in CV and 6P were missed by my ClickPipe, because it scans forward, and they appeared 2 minutes after 6W but are lexically less than it.
I know there's a bunch of path options in the create/update destination.path object - but are there any docs you can share on how to pattern it such that it is lexically correct?
My other option is to basically have a worker off the event notifications and move the files to a different prefix when they're loaded and rename them myself... which I'd rather not do
It's interesting because the names are indeed increasing, as they are nice ULIDs - but they're not available in R2 API calls / visible (maybe considered finalized) until after the previous ones are.
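For completeness, the fallback I mentioned would be something like this: R2 event notifications delivered to a queue, and a consumer that copies each new object under a prefix keyed by when the notification arrived (so keys sort by visibility time), then deletes the original. The only notification field I touch is object.key; the prefix scheme is mine:

```ts
// Sketch of the rename-on-notification fallback. R2 event notifications land on
// a queue; this consumer rewrites each object under a timestamped prefix so keys
// become lexically ordered by when the object actually became visible.
interface R2Notification {
  object: { key: string }; // subset of the notification message I care about
}

interface Env {
  BUCKET: R2Bucket;
}

export default {
  async queue(batch: MessageBatch<R2Notification>, env: Env): Promise<void> {
    for (const msg of batch.messages) {
      const key = msg.body.object.key;
      if (key.startsWith("ordered/")) continue; // skip our own copies

      const obj = await env.BUCKET.get(key);
      if (!obj) continue;

      // New key sorts by visibility time; keep the original name as a suffix.
      const newKey = `ordered/${new Date().toISOString()}-${key.split("/").pop()}`;
      await env.BUCKET.put(newKey, obj.body, { httpMetadata: obj.httpMetadata });
      await env.BUCKET.delete(key);
    }
    batch.ackAll();
  },
};
```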
Unknown User • 5w ago (message not public)
yeah, it's not...
We haven't had this issue on Firehose though, I'm able to get the name much more granular there (although in theory the ULID should take care of it)
Maybe it's just the difference of when they assign the timestamp / how quickly it finishes.
Unknown User • 5w ago (message not public)
Yeah, I get it
I think the other thing is the file size distribution, seems the smaller ones are getting picked up because they finish more quickly etc.
I don't know how distributed my pipeline should be - all the requests are coming from a worker, and all the workers' requests are coming from a single AWS AZ (Lambda)
Would lessening the shard count help, you think?
Unknown User • 5w ago (message not public)
would 1 shard make it definitely monotonic?
Unknown User • 5w ago (message not public)
it says 7k requests per second per shard, is that really requests or records?
I haven't hooked up any of the GraphQL metrics stuff, so I have no idea what our true throughput is right now
Unknown User • 5w ago (message not public)
ok, I'm turning it down from 10 to 3... hopefully that helps
I think I may just have to abandon ClickPipes unfortunately
sadly, it didn't help
Unknown User • 4w ago (message not public)
120
sorry, was off writing a new worker to consume from the queue and push directly to CH
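The new consumer is roughly this: take a queue batch, build newline-delimited JSON, and do one INSERT ... FORMAT JSONEachRow over ClickHouse's HTTP interface (URL, table name and credentials are placeholders):

```ts
// Sketch of the queue -> ClickHouse worker: one bulk insert per queue batch via
// the ClickHouse HTTP interface. CH_URL / table / credentials are placeholders.
interface Env {
  CH_URL: string; // e.g. "https://<your-clickhouse-host>:8443"
  CH_USER: string;
  CH_PASSWORD: string;
}

export default {
  async queue(batch: MessageBatch<object>, env: Env): Promise<void> {
    // One row per queued event, newline-delimited JSON.
    const rows = batch.messages.map((m) => JSON.stringify(m.body)).join("\n");

    const query = "INSERT INTO events FORMAT JSONEachRow"; // table name is a placeholder
    const res = await fetch(`${env.CH_URL}/?query=${encodeURIComponent(query)}`, {
      method: "POST",
      headers: {
        Authorization: "Basic " + btoa(`${env.CH_USER}:${env.CH_PASSWORD}`),
      },
      body: rows,
    });

    if (!res.ok) {
      batch.retryAll(); // retry the whole batch rather than ack a failed insert
      return;
    }
    batch.ackAll();
  },
};
```

Insert size then mostly comes down to the queue consumer's batching settings (max_batch_size / max_batch_timeout) rather than per-event writes.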
Unknown User • 4w ago (message not public)
yeah, I could, I just didn't want to have so many small inserts
(the real-timeness isn't so important in this case)
BTW the responsiveness of the re-sharding is super impressive!
Just went back from 3 to 8 shards, and within 2 min there were more & smaller files