Hi @kagitac I am trying to POC a switch

Hi @kagitac, I'm trying to POC a switch from Kinesis Firehose to CF Pipelines, initially using the HTTP endpoint but eventually also moving from Lambda as the entrypoint to a Worker. So far I'm seeing pretty sluggish perf on ingestion, ranging from 400 to 1200ms, both when using the HTTP API and a Worker binding. With the Worker binding I'm also seeing some Internal Operation Failed errors (which I've added retries for, but that makes the process even slower). Is this expected perf, or could I be doing something funky? At full capacity this'll be ~10k rpm (in batches of ~200 records per request), so nothing crazy, but we also can't be having 100+ms ingestion times, so I need to understand whether this is viable or nah. I also considered throwing a queue in the mix but didn't really want to add another layer of cost or complexity unless really necessary.
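For reference, the sending side is roughly this shape, a trimmed-down sketch rather than the exact code (the binding name and endpoint URL are placeholders):
```ts
// Sketch of the two ingestion paths being compared; not the exact production code.
// Assumes a Pipelines Worker binding named PIPELINE; the HTTP endpoint URL is a placeholder.

export interface Env {
  PIPELINE: { send(records: object[]): Promise<void> };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Each client request carries a batch of ~200 records.
    const records = (await request.json()) as object[];

    // Path A: send the batch through the Worker binding and time it.
    const start = Date.now();
    await env.PIPELINE.send(records);
    console.log(`pipeline send took ${Date.now() - start}ms`);

    // Path B (alternative): POST the same batch to the pipeline's HTTP endpoint.
    // await fetch("https://<pipeline-id>.pipelines.cloudflare.com", {
    //   method: "POST",
    //   headers: { "Content-Type": "application/json" },
    //   body: JSON.stringify(records),
    // });

    return new Response("ok");
  },
};
```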
24 Replies
Unknown User
Unknown Userβ€’2mo ago
Message Not Public
jsneedles
jsneedlesOPβ€’2mo ago
Thx, the pipeline ID is 79360a2a033f4c74a65afc6b317a1049. I've been doing some rudimentary testing so far, no real volume (i.e. looping 1000 times sequentially, sending a few hundred events), and noticed both the variability and the really high floor. Then I went on to try the Worker binding, and sending just a single event is still super slow (actually seems slower than the HTTP method). As for dreams, I don't have a ton 🙂 The goal here is just to aggregate events into nice tidy batches and then bulk load them into ClickHouse. IaC-friendly == :10000: tho 🙂
Unknown User
Unknown Userβ€’2mo ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
Hey @kagitac - not realllllly? We're utilizing ClickPipes right now, which cuts down on cost/complexity. Obviously one less bit would be great, but it'd need the same performance, exactly-once semantics, etc. Any update on the perf of my pipeline from yesterday?
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
Yes, I defined it as ENAM (and I'm in east US), so I imagine the Worker I was hitting was also in ENAM
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
OK, I will evaluate the cost of adding a queue in between; I just can't have the initial request from the clients taking so long
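If I do add one, the shape I have in mind is roughly this (a sketch only; the queue and binding names are made up):
```ts
// Sketch of putting a Queue in front of the pipeline so the client request returns
// quickly. INGEST_QUEUE and PIPELINE are placeholder binding names; types such as
// MessageBatch come from @cloudflare/workers-types.

export interface Env {
  INGEST_QUEUE: Queue<{ records: object[] }>;
  PIPELINE: { send(records: object[]): Promise<void> };
}

export default {
  // Client-facing handler: enqueue the batch and return immediately.
  async fetch(request: Request, env: Env): Promise<Response> {
    const records = (await request.json()) as object[];
    await env.INGEST_QUEUE.send({ records });
    return new Response(null, { status: 202 });
  },

  // Queue consumer: forward batches to the pipeline off the request path.
  async queue(batch: MessageBatch<{ records: object[] }>, env: Env): Promise<void> {
    const records = batch.messages.flatMap((m) => m.body.records);
    await env.PIPELINE.send(records);
  },
};
```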
jsneedles
jsneedlesOPβ€’5w ago
πŸ‘‹ Hi @kagitac I have my POC running at 100% duplicating the stream from lambda ... and its mostly working well. I went with a tail worker to do the pipeline calls. One thing im seeing right now -- the files being written to r2 are not named in a monotomically increasing fashion. So for example in this screenshot, the files ending in CV and 6P were missed by my clickpipe, because it scans forward, and they appeared 2 minutes after 6W but are lexically less than it. I know there's a bunch of path options in the create/update destination.path object - but is there any docs you can share on how to pattern it such that it is lexically correct?
(screenshot attachment: R2 file listing)
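For context, the tail worker side is roughly this shape (trimmed sketch; the binding name and the logged payload shape are assumptions, not the actual code):
```ts
// Sketch of a tail worker forwarding events to the pipeline. PIPELINE is a
// placeholder binding name, and the shape of the logged payload (first console.log
// argument being the event record) is an assumption.

export interface Env {
  PIPELINE: { send(records: object[]): Promise<void> };
}

export default {
  async tail(events: TraceItem[], env: Env): Promise<void> {
    // Collect the structured records the producer worker logged, then forward them.
    const records = events
      .flatMap((e) => e.logs)
      .map((log) => (Array.isArray(log.message) ? log.message[0] : log.message))
      .filter((r): r is object => typeof r === "object" && r !== null);

    if (records.length > 0) {
      await env.PIPELINE.send(records);
    }
  },
};
```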
jsneedles
jsneedlesOPβ€’5w ago
My other option is to basically have a worker consume the event notifications and move the files to a different prefix once they're loaded, renaming them myself... which I'd rather not do 🙂 It's interesting, because the names are indeed increasing, as they are nice ULIDs, but they're not available in R2 API calls / visible (maybe considered finalized) until after the previous ones
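If I did go that route, it would look roughly like this (sketch only; the bucket binding, prefix, and notification message shape are assumptions on my end):
```ts
// Sketch of the fallback: consume R2 event notifications from a Queue and rewrite
// each object under a prefix keyed by the time it became visible, so keys stay
// lexically ordered for the forward-scanning consumer. BUCKET is a placeholder
// binding and the notification body shape is an assumption.

export interface Env {
  BUCKET: R2Bucket;
}

interface R2NotificationBody {
  object: { key: string };
}

export default {
  async queue(batch: MessageBatch<R2NotificationBody>, env: Env): Promise<void> {
    for (const msg of batch.messages) {
      const key = msg.body.object.key;

      const obj = await env.BUCKET.get(key);
      if (!obj) continue;

      // New key sorts by arrival time, with the original ULID kept as a suffix.
      const newKey = `ordered/${Date.now()}-${key.split("/").pop()}`;
      await env.BUCKET.put(newKey, obj.body);
      await env.BUCKET.delete(key);

      msg.ack();
    }
  },
};
```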
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
Yeah, it's not... We haven't had this issue on Firehose though; I'm able to get the name much more granular there (although in theory the ULID should take care of it). Maybe it's just the difference in when they assign the timestamp / how quickly it finishes.
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
Yeah, I get it 🙂 I think the other thing is the file size distribution; it seems the smaller ones are getting picked up because they finish more quickly, etc. I don't know how distributed my pipeline should be 🤔 All the requests are coming from a Worker, and all the Worker's requests are coming from a single AWS AZ (Lambda). Would lessening the shard count help, do you think?
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
Would 1 shard make it definitely monotonic?
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
It says 7k requests per second per shard... is that really requests, or records? I haven't hooked up any of the GraphQL metrics stuff, so I have no idea what our true throughput is right now
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’4w ago
OK, I'm turning it down from 10 to 3... hopefully that helps 😬 I think I may just have to abandon ClickPipes, unfortunately. Sadly, it didn't help.
Unknown User
Unknown Userβ€’4w ago
Message Not Public
jsneedles
jsneedlesOPβ€’4w ago
120, sorry; I was off writing a new worker to consume from the queue and push directly to ClickHouse 🙂
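The new consumer is roughly this shape (sketch; the host, table name, and credentials are placeholders):
```ts
// Sketch of the replacement path: a queue consumer that takes a delivered batch
// and inserts it straight into ClickHouse over the HTTP interface with
// JSONEachRow. URL, table, and credential names are placeholders.

export interface Env {
  CLICKHOUSE_URL: string;      // e.g. https://<host>:8443
  CLICKHOUSE_USER: string;
  CLICKHOUSE_PASSWORD: string;
}

export default {
  async queue(batch: MessageBatch<object>, env: Env): Promise<void> {
    // One insert per delivered batch; batch size/timeout are tuned in the consumer
    // settings so the inserts don't end up too small.
    const rows = batch.messages.map((m) => JSON.stringify(m.body)).join("\n");

    const query = encodeURIComponent("INSERT INTO events FORMAT JSONEachRow");
    const res = await fetch(`${env.CLICKHOUSE_URL}/?query=${query}`, {
      method: "POST",
      headers: {
        "X-ClickHouse-User": env.CLICKHOUSE_USER,
        "X-ClickHouse-Key": env.CLICKHOUSE_PASSWORD,
      },
      body: rows,
    });

    if (!res.ok) {
      // Throwing lets the whole batch be retried by the queue.
      throw new Error(`ClickHouse insert failed: ${res.status}`);
    }
  },
};
```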
Unknown User
Unknown Userβ€’4w ago
Message Not Public
jsneedles
jsneedlesOPβ€’4w ago
Yeah, I could, I just didn't want to have so many small inserts (the real-time-ness isn't so important in this case). BTW, the responsiveness of the re-sharding is super impressive! Just went back from 3 to 8 shards, and within 2 minutes the files were more numerous and smaller 🙂
