Hi @kagitac I am trying to POC a switch

Hi @kagitac, I'm trying to POC a switch from Kinesis Firehose to CF Pipelines, initially using the HTTP endpoint but eventually also moving from Lambda as the entrypoint to a Worker. So far I'm seeing pretty sluggish perf on ingestion, ranging from 400 to 1200ms, both when using the HTTP API and a Worker binding. With the Worker binding I'm also seeing some Internal Operation Failed errors (which I've added retries for, but that makes the process even slower). Is this expected perf, or could I be doing something funky? At full capacity this'll be ~10k rpm (in batches of ~200 records per request), so nothing crazy, but we also can't be having 100+ms ingestion times, so I need to understand whether this is viable or nah. I also considered throwing a queue in the mix but didn't really want to add another layer of cost or complexity unless really necessary.
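For reference, the sending side is roughly this shape, a trimmed-down sketch rather than the exact code (the binding name and endpoint URL are placeholders):
```ts
// Sketch of the two ingestion paths being compared; not the exact production code.
// Assumes a Pipelines Worker binding named PIPELINE; the HTTP endpoint URL is a placeholder.

export interface Env {
  PIPELINE: { send(records: object[]): Promise<void> };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Each client request carries a batch of ~200 records.
    const records = (await request.json()) as object[];

    // Path A: send the batch through the Worker binding and time it.
    const start = Date.now();
    await env.PIPELINE.send(records);
    console.log(`pipeline send took ${Date.now() - start}ms`);

    // Path B (alternative): POST the same batch to the pipeline's HTTP endpoint.
    // await fetch("https://<pipeline-id>.pipelines.cloudflare.com", {
    //   method: "POST",
    //   headers: { "Content-Type": "application/json" },
    //   body: JSON.stringify(records),
    // });

    return new Response("ok");
  },
};
```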
24 Replies
Unknown User
Unknown Userβ€’2mo ago
Message Not Public
jsneedles
jsneedlesOPβ€’2mo ago
Thx, the pipeline ID is 79360a2a033f4c74a65afc6b317a1049. I've been doing some rudimentary testing so far, no real volume (i.e. looping 1000 times sequentially, sending a few hundred events), and noticed both the variability and the really high floor. Then I went on to try the Worker binding, and sending just a single event is still super slow (actually seems slower than the HTTP method). As for dreams, I don't have a ton 🙂 The goal here is just to aggregate events into nice tidy batches and then bulk load them into ClickHouse. IaC-friendly == :10000: tho 🙂
Unknown User
Unknown Userβ€’2mo ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
Hey @kagitac - not realllllly? We're utilizing ClickPipes right now, which cuts down on cost/complexity. Obviously one less bit would be great, but it'd need the same performance, exactly-once semantics, etc. Any update on the perf of my pipeline from yesterday?
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
Yes, I defined it as ENAM (and I'm in east US), so I imagine the Worker I was hitting was also in ENAM
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
OK, I will evaluate the cost of adding a queue in between; I just can't have the initial request from the clients taking so long
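If I do add one, the shape I have in mind is roughly this (a sketch only; the queue and binding names are made up):
```ts
// Sketch of putting a Queue in front of the pipeline so the client request returns
// quickly. INGEST_QUEUE and PIPELINE are placeholder binding names; types such as
// MessageBatch come from @cloudflare/workers-types.

export interface Env {
  INGEST_QUEUE: Queue<{ records: object[] }>;
  PIPELINE: { send(records: object[]): Promise<void> };
}

export default {
  // Client-facing handler: enqueue the batch and return immediately.
  async fetch(request: Request, env: Env): Promise<Response> {
    const records = (await request.json()) as object[];
    await env.INGEST_QUEUE.send({ records });
    return new Response(null, { status: 202 });
  },

  // Queue consumer: forward batches to the pipeline off the request path.
  async queue(batch: MessageBatch<{ records: object[] }>, env: Env): Promise<void> {
    const records = batch.messages.flatMap((m) => m.body.records);
    await env.PIPELINE.send(records);
  },
};
```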
jsneedles
jsneedlesOPβ€’5w ago
πŸ‘‹ Hi @kagitac I have my POC running at 100% duplicating the stream from lambda ... and its mostly working well. I went with a tail worker to do the pipeline calls. One thing im seeing right now -- the files being written to r2 are not named in a monotomically increasing fashion. So for example in this screenshot, the files ending in CV and 6P were missed by my clickpipe, because it scans forward, and they appeared 2 minutes after 6W but are lexically less than it. I know there's a bunch of path options in the create/update destination.path object - but is there any docs you can share on how to pattern it such that it is lexically correct?
(screenshot attachment: R2 file listing)
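For context, the tail worker side is roughly this shape (trimmed sketch; the binding name and the logged payload shape are assumptions, not the actual code):
```ts
// Sketch of a tail worker forwarding events to the pipeline. PIPELINE is a
// placeholder binding name, and the shape of the logged payload (first console.log
// argument being the event record) is an assumption.

export interface Env {
  PIPELINE: { send(records: object[]): Promise<void> };
}

export default {
  async tail(events: TraceItem[], env: Env): Promise<void> {
    // Collect the structured records the producer worker logged, then forward them.
    const records = events
      .flatMap((e) => e.logs)
      .map((log) => (Array.isArray(log.message) ? log.message[0] : log.message))
      .filter((r): r is object => typeof r === "object" && r !== null);

    if (records.length > 0) {
      await env.PIPELINE.send(records);
    }
  },
};
```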
jsneedles
jsneedlesOPβ€’5w ago
My other option is to basically have a worker consume the event notifications and move the files to a different prefix once they're loaded, renaming them myself... which I'd rather not do 🙂 It's interesting, because the names are indeed increasing, as they are nice ULIDs, but they're not available in R2 API calls / visible (maybe considered finalized) until after the previous ones
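If I did go that route, it would look roughly like this (sketch only; the bucket binding, prefix, and notification message shape are assumptions on my end):
```ts
// Sketch of the fallback: consume R2 event notifications from a Queue and rewrite
// each object under a prefix keyed by the time it became visible, so keys stay
// lexically ordered for the forward-scanning consumer. BUCKET is a placeholder
// binding and the notification body shape is an assumption.

export interface Env {
  BUCKET: R2Bucket;
}

interface R2NotificationBody {
  object: { key: string };
}

export default {
  async queue(batch: MessageBatch<R2NotificationBody>, env: Env): Promise<void> {
    for (const msg of batch.messages) {
      const key = msg.body.object.key;

      const obj = await env.BUCKET.get(key);
      if (!obj) continue;

      // New key sorts by arrival time, with the original ULID kept as a suffix.
      const newKey = `ordered/${Date.now()}-${key.split("/").pop()}`;
      await env.BUCKET.put(newKey, obj.body);
      await env.BUCKET.delete(key);

      msg.ack();
    }
  },
};
```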
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
Yeah, it's not... We haven't had this issue on Firehose though; I'm able to get the name much more granular there (although in theory the ULID should take care of it). Maybe it's just the difference in when they assign the timestamp / how quickly it finishes.
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
Yeah, I get it 🙂 I think the other thing is the file size distribution; it seems the smaller ones are getting picked up because they finish more quickly, etc. I don't know how distributed my pipeline should be 🤔 All the requests are coming from a Worker, and all the Worker's requests are coming from a single AWS AZ (Lambda). Would lessening the shard count help, do you think?
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
Would 1 shard make it definitely monotonic?
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’5w ago
It says 7k requests per second per shard... is that really requests, or records? I haven't hooked up any of the GraphQL metrics stuff, so I have no idea what our true throughput is right now
Unknown User
Unknown Userβ€’5w ago
Message Not Public
jsneedles
jsneedlesOPβ€’4w ago
OK, I'm turning it down from 10 to 3... hopefully that helps 😬 I think I may just have to abandon ClickPipes, unfortunately. Sadly, it didn't help.
Unknown User
Unknown Userβ€’4w ago
Message Not Public
jsneedles
jsneedlesOPβ€’4w ago
120, sorry; I was off writing a new worker to consume from the queue and push directly to ClickHouse 🙂
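The new consumer is roughly this shape (sketch; the host, table name, and credentials are placeholders):
```ts
// Sketch of the replacement path: a queue consumer that takes a delivered batch
// and inserts it straight into ClickHouse over the HTTP interface with
// JSONEachRow. URL, table, and credential names are placeholders.

export interface Env {
  CLICKHOUSE_URL: string;      // e.g. https://<host>:8443
  CLICKHOUSE_USER: string;
  CLICKHOUSE_PASSWORD: string;
}

export default {
  async queue(batch: MessageBatch<object>, env: Env): Promise<void> {
    // One insert per delivered batch; batch size/timeout are tuned in the consumer
    // settings so the inserts don't end up too small.
    const rows = batch.messages.map((m) => JSON.stringify(m.body)).join("\n");

    const query = encodeURIComponent("INSERT INTO events FORMAT JSONEachRow");
    const res = await fetch(`${env.CLICKHOUSE_URL}/?query=${query}`, {
      method: "POST",
      headers: {
        "X-ClickHouse-User": env.CLICKHOUSE_USER,
        "X-ClickHouse-Key": env.CLICKHOUSE_PASSWORD,
      },
      body: rows,
    });

    if (!res.ok) {
      // Throwing lets the whole batch be retried by the queue.
      throw new Error(`ClickHouse insert failed: ${res.status}`);
    }
  },
};
```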
Unknown User
Unknown Userβ€’4w ago
Message Not Public
jsneedles
jsneedlesOPβ€’4w ago
Yeah, I could, I just didn't want to have so many small inserts (the real-time-ness isn't so important in this case). BTW, the responsiveness of the re-sharding is super impressive! Just went back from 3 to 8 shards, and within 2 minutes the files were more numerous and smaller 🙂
