Serving TB of data from Google Cloud Storage (GCS)

Hello all,

We are serving TB of data per day from GCS (ML checkpoints, 2 GB to 10 GB per file). Egress is starting to be a pain! 💸

I'm wondering if Cloudflare could help:
- CDN: Could we just proxy a GCS bucket using a domain CNAME, and enjoy both free egress from CF and cheaper egress between GCS and CF due to the Bandwidth Alliance?
- R2: Could we use R2 with Sippy for the same purpose? Would it be a better match?
14 Replies
Erisa•7mo ago
Sippy doesn't support GCS yet, but it will in the future
Isaac McFadyen•7mo ago
CDN: probably not without an Enterprise contract, that's a lot of traffic.
R2: erisa's response ^
If you can somehow get the data into R2 without Sippy though, R2's egress is indeed unmetered.
MasterScrat•7mo ago
What about the GCS -> CF egress cost, is it automatically lower because of the Bandwidth Alliance? Or do I need to reach out to GCP and/or CF?
What's a reasonable daily upper bound for the CDN? We could use it only for files that are popular.
Isaac McFadyen•7mo ago
There's no concrete upper limit, and sales generally reach out if you are getting up there in egress. There have been a few rare cases where they cut customers off before reaching out, but generally they do reach out first.
Not sure re: the Bandwidth Alliance. I've seen other providers where it's a manual process ("open a ticket") but haven't seen anything specific to GCS.
MasterScrat•7mo ago
But on CF side, i wouldn't need to reach out?
Isaac McFadyen•7mo ago
No, but CF isn't the one billing you (unless you have an Ent contract, in which case you'll know whether you need to reach out or not).
GCP would still be the one billing for egress, and I'm uncertain whether you'd need to reach out to them or not.
Erisa•7mo ago
GCP calls it CDN Interconnect and it just happens automatically. Though they do note in their docs that it only applies to IPv4 and not IPv6. https://cloud.google.com/network-connectivity/docs/cdn-interconnect
MasterScrat•7mo ago
how can i enforce IPv4 from CF CDN?
Erisa•7mo ago
Origin requests will prefer IPv4 if both v4 and v6 addresses are given, even if the client uses IPv6. So it likely won't be a problem
MasterScrat•7mo ago
Are there recommendations about which GCS regions to use? Or does CF have PNIs with all GCP datacenters?
Would excessive bandwidth still be an issue if I used Cache Reserve? My files are too big to be cached by the CDN, so they'd end up in Cache Reserve, which is backed by R2 and for which I'd pay for storage and operations.
"Consult with your CDN provider to verify that they are an approved provider, and if so, which of their CDN locations are approved for this program."
There is no such thing as CDN locations with CF, right?
Isaac McFadyen•7mo ago
CF has CDN locations, but you cannot decide which to use; the nearest is automatically selected based on your location.
MasterScrat•7mo ago
Would there be a way on CF side to select a different GCS bucket depending on user location?
Isaac McFadyen•7mo ago
There would with a Worker. But if the datacenter (CF) is halfway across the world from the bucket (GCP), then I don't know if there's Interconnect, and there might not be any reduction in egress pricing. You'd need to reach out to CF and/or GCP to find out more specifics, sorry.
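For anyone finding this later, here is a rough sketch of what such a Worker could look like. It is only an illustration under assumptions, not something anyone in the thread tested: the bucket names and the continent-to-bucket mapping are made up, the objects are assumed to be publicly readable at `https://storage.googleapis.com/<bucket>/<object>`, and the continent is read from `request.cf.continent`, which Workers populate on incoming requests.

```typescript
// Hypothetical bucket names: one regional GCS bucket per broad geography.
const BUCKETS: Record<string, string> = {
  NA: "example-checkpoints-us",
  SA: "example-checkpoints-us",
  EU: "example-checkpoints-eu",
  AF: "example-checkpoints-eu",
  AS: "example-checkpoints-asia",
  OC: "example-checkpoints-asia",
};
const DEFAULT_BUCKET = "example-checkpoints-us";

// Map a two-letter continent code to a bucket, falling back to the default
// when the code is missing or not in the table.
function pickBucket(continent?: string): string {
  return (continent ? BUCKETS[continent] : undefined) ?? DEFAULT_BUCKET;
}

// Build the GCS URL for the object named by the incoming request path.
function originUrl(requestUrl: string, continent?: string): string {
  const { pathname } = new URL(requestUrl);
  return `https://storage.googleapis.com/${pickBucket(continent)}${pathname}`;
}

const worker = {
  async fetch(request: Request): Promise<Response> {
    // `cf` is attached by the Workers runtime; the cast keeps this sketch
    // compilable without the Workers type definitions installed.
    const cf = (request as unknown as { cf?: { continent?: string } }).cf;
    const target = originUrl(request.url, cf?.continent);
    // Forward the method and headers (e.g. Range) to the chosen bucket.
    return fetch(target, { method: request.method, headers: request.headers });
  },
};

// In a real Worker module this object would be the default export:
// export default worker;
```

The routing logic lives in two small pure functions so the mapping can be changed or tested without deploying. Whether a given CF location actually gets Interconnect pricing to the chosen bucket is still the open question from above.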
MasterScrat•3mo ago
I see Traffic Steering with Load Balancing may also help with that, but yeah, a Worker could be more flexible!
Coming back to this: after checking with our GCP rep, nothing needs to be done to enable "Bandwidth Alliance" pricing, it just works.