Hi, I'm not sure if Hyperdrive is the culprit here, but I'm seeing a huge spike in errors, and I don't see any logs in my dashboard, so I can't really trace it down. I saw this happen before when I was having issues with Hyperdrive, so I wanted to see if someone could take a look, please!
Around the same time, I see that the DatabaseConnections count on my RDS instance gets really erratic, which is what happened the last time I had a bunch of DB disconnects.
I'm now fairly convinced that it was a Hyperdrive issue. As a hot-fix, I just deployed with a raw postgres.js client (no Hyperdrive) and the errors stopped.
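For reference, the hot-fix was roughly this shape (a minimal sketch; DATABASE_URL is just an assumed name for the secret holding the raw Postgres connection string):

```ts
import postgres from "postgres";

interface Env {
  DATABASE_URL: string; // assumed secret with the raw Postgres connection string
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // Connect straight to the origin database, bypassing Hyperdrive entirely.
    const sql = postgres(env.DATABASE_URL);
    const rows = await sql`SELECT 1 AS ok`;
    ctx.waitUntil(sql.end()); // close the connection without blocking the response
    return Response.json(rows);
  },
};
```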
I'm afraid we're still having issues. I created a new Hyperdrive to the same DB and it works fine. CORRECTION: the new Hyperdrive worked fine at first, but then Client Disconnected errors started building up again.
When things have settled, can you share more about what has been causing these Client Disconnected issues over the past couple of weeks? I was considering moving to a privately accessible database with a Hyperdrive connection, but I can't justify the risk now, because at least with a publicly accessible DB I can connect to it normally.
There are a few different things contributing to this, and picking apart which are the worst offenders requires some observability improvements that should have gone out today. I'll try to set aside some time this week to write up our plans on this topic.
Hyperdrive connects to external databases (e.g. Neon, RDS, on-prem Postgres, etc.), so it would depend where and how you're hosting the database you're trying to connect to.
D1 is a Cloudflare-native data storage product. It is a truly distributed product, and it runs on our own network with our own tech stack. This means it will (usually) be more performant. However, it does have some pretty sharp restrictions on its use, such as database size limits or missing features, when compared to a more "standard" Postgres database.
Sometimes those restrictions will render D1 inappropriate for a use case. In those cases, it's nice to have the option to run your application on a more standard Postgres offering, either on-prem or from another vendor. We still want people to be able to use Cloudflare's Developer Platform to build their applications in that situation, so we built Hyperdrive: a tool for connecting from Workers to those other vendors' databases. Because those databases are not on our network or our servers, connecting to them involves a network hop to wherever they're running, which means Hyperdrive will often not be able to compete on raw latency alone. However, if you need features that aren't available in D1/R2/KV etc., you now have a way to access them.
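On the Workers side, a minimal sketch of what that looks like, assuming a Hyperdrive binding named HYPERDRIVE and postgres.js as the driver (the users table is just an example):

```ts
import postgres from "postgres";

interface Env {
  HYPERDRIVE: Hyperdrive; // binding configured in the Worker's wrangler config
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // Hyperdrive hands the Worker a connection string pointing at its pooler,
    // which maintains the actual connections to the external Postgres database.
    const sql = postgres(env.HYPERDRIVE.connectionString);
    const users = await sql`SELECT id, name FROM users LIMIT 10`;
    ctx.waitUntil(sql.end()); // release the connection after the response is sent
    return Response.json(users);
  },
};
```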
Let us know how it goes @PatrickJ! Also, depending on how cache-heavy your workloads are, you may find Workers Smart Placement interesting. If you have more cold reads/fewer cache hits and a lot of subrequests per Worker invocation, it may be best to have that Worker placed near your data source.
It'll take some traffic to kick in, but what it'll do when run with Hyperdrive is eventually learn to run the Worker itself right next to your DB. That'll make cache misses way faster, but slow down cache hits. Whether that's an improvement will depend on how often you get cache hits, like Thomas said.
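If you want to try it, Smart Placement is a one-line change in the Worker's config (wrangler.toml shown; no code changes needed):

```toml
[placement]
mode = "smart"
```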
Our Cloudflare Worker (backed by Hyperdrive) had a big spike in Errors and Wall Time today, starting at around 10:30am PT. On the Hyperdrive side, I don't see any spikes in latency, but I did see a couple of errors on each of our Hyperdrive instances at ~10:30am PT. Struggling a little with how to debug or fix this - most traffic is fine, but our P999 wall time jumped to 70k ms. All of our backing databases look completely normal, and I'm able to query through Hyperdrive normally when running locally.
Yes, in the context! Something like this middleware is recommended:
import postgres from "postgres";
import type { Context, Next } from "hono";

// Middleware to inject the database client into the context
export const database = async (c: Context, next: Next) => {
  if (c.env.ENV === "test") {
    const database = new MockDatabaseClient();
    c.set("database", database);
    await next();
  } else {
    const sql = postgres(c.env.HYPERDRIVE.connectionString);
    const database = new KVDatabaseClient(sql);
    c.set("database", database); // set the database client on the context
    await next();
    // clean up the client, ensuring we don't kill the Worker before that is completed
    c.executionCtx.waitUntil(sql.end());
  }
};
Note that I'm wrapping the sql connection in another class for mocking purposes
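In case it helps, the wrapper is roughly this shape (a hypothetical sketch: the DatabaseClient interface and the kv table/columns are illustrative; only the class names match the middleware above):

```ts
import postgres from "postgres";

// Shared interface so route handlers don't care whether they get the real client or a mock.
export interface DatabaseClient {
  getValue(key: string): Promise<string | null>;
}

// Real implementation backed by postgres.js (the kv table and its columns are illustrative).
export class KVDatabaseClient implements DatabaseClient {
  constructor(private sql: ReturnType<typeof postgres>) {}

  async getValue(key: string): Promise<string | null> {
    const rows = await this.sql`SELECT value FROM kv WHERE key = ${key}`;
    return rows.length > 0 ? (rows[0].value as string) : null;
  }
}

// In-memory stand-in used when ENV === "test", so tests never open a real DB connection.
export class MockDatabaseClient implements DatabaseClient {
  private store = new Map<string, string>();

  async getValue(key: string): Promise<string | null> {
    return this.store.get(key) ?? null;
  }
}
```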
Blog coming soon. TL;DR: that's a different "kind" of connection than the ones to the origin; it takes ~2ms to open, and closing it avoids holding your Worker open for extra processing, which helps keep your CPU time down. I'd recommend doing it the way Emilio showed.