So my Hyperdrive connection to MySQL just started throwing errors, or rather, it stopped responding. https://screen.bouma.link/fGflgtK5X2vH5wrX98nZ From the Cloudflare dashboard everything is "Active" (Hyperdrive) and "Healthy" (the tunnel), and cloudflared is also running without any log output. But workers connecting throw:
Connection lost: The server closed the connection.
Any clue where to start debugging this? The MySQL server is doing fine and nothing has changed on my end (I was asleep when the incident started). But I expected there to be some visible fault somewhere πŸ˜… Cloudflare side issue?
AJR
AJRβ€’3w ago
Need your Hyperdrive ID, and I'll take a look in the morning. I assume direct connections with a different mysql client work correctly?
Alex
AlexOPβ€’3w ago
Yes, in fact I removed the worker route to let it fall back to my origin and everything is working again (my origin talks to the same database). Luckily the worker is basically just an optimization, so removing it is fine. Thanks for looking into this, I'll be trying to catch some winks too, so I'll be back in a couple of hours πŸ™‚ The ID: e091675e42a94e789ab05718442dce6a
AJR
AJRβ€’3w ago
I also see all your metrics fall to 0 at that time. I see you're going across a tunnel. My first thing to check would be to restart the tunnel with loglevel=debug, to see if you're still successfully authing through there.
Alex
AlexOPβ€’3w ago
So restarting cloudflared did seem to help, which is pretty unfortunate. The problem came back pretty quickly though, within a few minutes. Now running under debug log level, but so far no output other than the startup.
cloudflared --no-autoupdate tunnel --loglevel debug --log-directory /var/log/cloudflared run --token <REDACTED>
is running at the moment. Also opened the connector diagnostics page in the Zero Trust dashboard. No errors I can find there. Access analytics show no failed logins for the Hyperdrive application; it shows many successful authentication attempts. The database is still running perfectly and has ~45 connections left. No Hyperdrive connections are made to it, and Workers is still throwing errors. Restarting cloudflared and re-deploying the worker seems to have no effect. Hopefully y'all will be able to tell me where the pain lies and what broke, because I am at a loss πŸ˜… The worker had been running without issues since Saturday morning and stopped working Sunday evening. I have not deployed the worker or made any changes to the server config for the whole of Sunday.
AJR
AJRβ€’3w ago
Ok, for now this is going into the category of "beta bug that we're still working to RCA". If it ends up being tunnel weirdness we'll figure that out, but I'm working from the assumption that this is a gap somewhere in how we're handling the wire protocol. With that said (feel free to answer in DMs if you're more comfortable with that):
* Can you share as much as possible about your hosting, including specific MySQL version, on-prem vs PaaS, etc.?
* Can you share as much as possible about your queries/access patterns? Query examples, whether you are using transactions, etc.
Thank you!
Alex
AlexOPβ€’3w ago
Interesting, okay cool. Let me try and answer as much as possible; luckily it's IMHO a very simple setup, which might help πŸ™‚
- I am using mysqld Ver 8.0.41-0ubuntu0.20.04.1 for Linux on x86_64 on a self-managed VPS, not with a public cloud provider. It has both v4 and v6 internet connectivity. The tunnel is configured to talk to 127.0.0.1:3306. I have configured it with a user with only SELECT and SHOW VIEW privileges on a single database.
- I run 2 queries in my worker. As far as I can tell the first query already fails (which means it doesn't execute the second). I am using Drizzle with the mysql2 connector. According to Drizzle's logs it executes:
Query: select `id`, `uuid`, `team_id`, `domain`, `with_links`, `is_default`, `target`, `include_path`, `include_query`, `redirect_default_not_found` from `custom_domains` where `custom_domains`.`domain` = ? limit ? -- params: ["example.com", 1]
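For reference, the Drizzle call producing a query of that shape would look roughly like this (a trimmed sketch; the column types are assumptions based on the logged SQL):

import { eq } from "drizzle-orm";
import { mysqlTable, int, varchar, boolean } from "drizzle-orm/mysql-core";

// Trimmed-down schema; column types are assumed from the logged SQL.
const customDomains = mysqlTable("custom_domains", {
  id: int("id").primaryKey(),
  uuid: varchar("uuid", { length: 36 }),
  teamId: int("team_id"),
  domain: varchar("domain", { length: 255 }),
  isDefault: boolean("is_default"),
  target: varchar("target", { length: 255 }),
});

// Emits: select ... from `custom_domains` where `domain` = ? limit ?
// (`db` is the drizzle() instance; top-level await for brevity)
const rows = await db
  .select()
  .from(customDomains)
  .where(eq(customDomains.domain, "example.com"))
  .limit(1);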
I am happy to answer any specific questions and/or run some non-destructive commands, or even give you access to the Hyperdrive if needed (since it's read-only it's no problem). But of course if we are arranging that we should move to DMs πŸ˜„
Alex
AlexOPβ€’3w ago
(screenshot: 2025-04-15 at 13.18.08)
Alex
AlexOPβ€’3w ago
"Magically" started working again
AJR
AJRβ€’3w ago
Man. I'm gonna have your ID memorized by May. I can tell.
Alex
AlexOPβ€’3w ago
Not sure that's a good thing... 🫣
AJR
AJRβ€’3w ago
We haven't released any changes since yesterday, to be clear.
Alex
AlexOPβ€’3w ago
Oh... I did do a deployment this morning
AJR
AJRβ€’3w ago
Worker or Hyperdrive?
Alex
AlexOPβ€’3w ago
Worker
AJR
AJRβ€’3w ago
Okay. That shouldn't interact with your Hyperdrive config at all, really. Just for context. I'm going to start with another run-through of the logs for you when I get to my desk this morning. I want to see how that all looks.
Alex
AlexOPβ€’3w ago
I ran yarn upgrade (not seeing mysql2 in there or other related libs from a quick glance) and I also lowered the compat date to 2025-04-02. No actual code changes, in case it matters. Let's see how long it keeps working this time then! Also haven't touched the MySQL server at all. So not even 3 hours, from the looks of it. I did notice when looking at my MySQL server process list that when the first errors started rolling in there were 2 connections, and then 1, and now 0. It took a minute for requests to start consistently failing too; guessing some of it was also the query cache. But now 100% failure rate again. In the cloudflared logs I see:
{"level":"debug","event":1,"connIndex":0,"originService":"tcp://127.0.0.1:3306","ingressRule":0,"destAddr":"tcp://127.0.0.1:3306","time":"2025-04-15T11:39:48Z","message":"upstream->downstream copy: read tcp 127.0.0.1:42476->127.0.0.1:3306: use of closed network connection"}
{"level":"debug","event":1,"connIndex":0,"originService":"tcp://127.0.0.1:3306","ingressRule":0,"destAddr":"tcp://127.0.0.1:3306","time":"2025-04-15T11:39:48Z","message":"upstream->downstream copy: read tcp 127.0.0.1:42476->127.0.0.1:3306: use of closed network connection"}
Checked the MySQL values, and wait_timeout is 8 hours. Not sure if other timeouts could be in play here, which is where my first thought went seeing this behaviour. I would still expect Hyperdrive to handle this and create a new connection, but maybe it wrongly detects a max_connections situation here. But I am now assuming based on nothing... I'll let you do the actual root-causing here!
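For reference, this is roughly how I checked the timeout values (a quick sketch; host and credentials are placeholders):

import mysql from "mysql2/promise";

// Placeholder credentials; run against the origin MySQL server directly.
const conn = await mysql.createConnection({
  host: "127.0.0.1",
  port: 3306,
  user: "readonly_user",
  password: "...",
  database: "mydb",
});

// wait_timeout came back as 28800 seconds (8 hours);
// interactive_timeout and the net_read/write timeouts show up here too.
const [rows] = await conn.query("SHOW VARIABLES LIKE '%timeout%'");
console.log(rows);
await conn.end();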
AJR
AJRβ€’2w ago
Agreed, at least, that Hyperdrive is designed to drop bad connections and spin up a new one. That's a good angle to pursue also: independent of why things fell out of sync, why is it not detecting that and doing the obvious thing? I'll keep you posted.
Quick follow-up here: we're adding some additional robustness to the health checks and autorefresh behavior for MySQL connections. That'll go out in our next release, starting either today or tomorrow and done by Friday/Monday.
Alex
AlexOPβ€’2w ago
Hope to see stable service after that 🀘 Thanks for the update!
AJR
AJRβ€’2w ago
@Alex The release is out, we should be in a better spot for dropping/replacing bad connections for MySQL configs. Please let me know how it goes for you.
Alex
AlexOPβ€’2w ago
Very much going in the right direction! https://screen.bouma.link/TmpHFwkgHQv2KM6CDTsK
Alex
AlexOPβ€’2w ago
Let's see how it holds up over the weekend! Currently seeing ~22 connections to MySQL, which is way more than before (I don't think I've seen more than 2 before). So something is definitely better!
Alex
AlexOPβ€’2w ago
So now we are moving the other way 🀣 I have a 51-connection limit on my database, which should be plenty, but Hyperdrive is keeping 30+ connections idle for long times: https://screen.bouma.link/V2zQ00XFj5yCB9j0jYnH
Alex
AlexOPβ€’2w ago
Some connections have been idle 4+ hours. In addition it also had ~14 active (within the last 60s) connections. That broke my app πŸ™ˆ And this time not just the worker.
AJR
AJRβ€’2w ago
Well that's not supposed to happen. We drop idle connections after 15 minutes. Generally the way this should work is that it will aggressively open connections whenever all available ones are in use, up to 60. Anything that hasn't had traffic in 15 minutes should be disconnected, though I'm assuming you don't have any middleware in your stack that'll hold things open until it gets an explicit close message?
Alex
AlexOPβ€’2w ago
Since I am using Drizzle, I am not 100% sure what exactly it is doing, of course. And I am not explicitly closing the connection to Hyperdrive either. But I also wouldn't expect a single instance of an isolate to live 4+ hours without any requests. I am at least not doing anything with the connection explicitly. I am even connecting in the fetch handler as opposed to in the global scope.
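Roughly like this, shape-wise (a simplified sketch, not my exact code):

import { drizzle } from "drizzle-orm/mysql2";

interface Env {
  HYPERDRIVE: Hyperdrive;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // The connection is created per-request, inside the fetch handler;
    // nothing is held in global scope between invocations.
    const db = drizzle({
      connection: {
        host: env.HYPERDRIVE.host,
        port: env.HYPERDRIVE.port,
        user: env.HYPERDRIVE.user,
        password: env.HYPERDRIVE.password,
        database: env.HYPERDRIVE.database,
        disableEval: true, // mysql2's eval() fast path doesn't work in Workers
      },
    });
    // ...run queries with `db`...
    return new Response("ok");
  },
};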
AJR
AJRβ€’2w ago
Hyperdrive exists separately from the isolate; couldn't have warm connections otherwise. But no, it should only live for 15 minutes without traffic. I'm planning to bring this to the team, and we'll dig in starting today.
Alex
AlexOPβ€’2w ago
Happy to provide any details, and I can also share the worker code if that helps.
knickish
knickishβ€’2w ago
I think we've found the root cause of this issue, will let you know here once we've confirmed that and released a fix for it. Thanks for your patience
Alex
AlexOPβ€’2w ago
No worries. Happy to β€œhelp” nail this down by breaking it.
AJR
AJRβ€’2w ago
No quotes needed, every problem you find is one less that everyone has to deal with. We very much appreciate it.
mehedi
mehediβ€’2w ago
Hey @Alex, sorry to bother you! I'm using Drizzle with Hono, MySQL, and Hyperdrive. Could you help me confirm if my setup looks right? I'm running into similar issues too. Thanks in advance. This is the code:
import { Hono } from "hono";
import { qfykActionschedulerActions } from "./db/schema";
import { drizzle } from "drizzle-orm/mysql2";
import mysql from "mysql2/promise";

async function getDb(hyperdrive: Hyperdrive) {
  const connection = await mysql.createConnection({
    host: hyperdrive.host,
    user: hyperdrive.user,
    password: hyperdrive.password,
    database: hyperdrive.database,
    port: hyperdrive.port,
    disableEval: true,
  });
  return drizzle(connection);
}

type Bindings = {
  HYPERDRIVE: Hyperdrive;
};
const app = new Hono<{ Bindings: Bindings }>();

app.get("/", async (c) => {
  const db = await getDb(c.env.HYPERDRIVE);
  const result = await db.select().from(qfykActionschedulerActions);

  return c.json({
    result,
  });
});

export default app;
Alex
AlexOPβ€’2w ago
The only difference from my code is that I did:
const connection = drizzle({
  connection: {
    host: env.HYPERDRIVE.host,
    port: env.HYPERDRIVE.port,
    user: env.HYPERDRIVE.user,
    password: env.HYPERDRIVE.password,
    database: env.HYPERDRIVE.database,

    // The following line is needed for mysql2 compatibility with Workers.
    // mysql2 uses eval() to optimize result parsing for rows with > 100 columns.
    // Configure mysql2 to use static parsing instead of eval() parsing with disableEval.
    disableEval: true,
  },
});
Not sure that really makes a difference, but it's a difference. The problems are mostly gone by the way; it has been running pretty well for the past few days.
mehedi
mehediβ€’2w ago
Don't we have to close the connection on every request like the docs suggest?
Alex
AlexOPβ€’2w ago
As far as I understand, no. Also: https://discord.com/channels/595317990191398933/1363026034240262266/1363311277333549280 Once your worker finishes, the connection is killed anyway. And remember it's the connection with Hyperdrive, not your SQL server, so it shouldn't matter too much.
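If you do want to close explicitly, the pattern the docs describe looks roughly like this (a sketch; note the waitUntil so the response isn't blocked):

import mysql from "mysql2/promise";

interface Env {
  HYPERDRIVE: Hyperdrive;
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const connection = await mysql.createConnection({
      host: env.HYPERDRIVE.host,
      user: env.HYPERDRIVE.user,
      password: env.HYPERDRIVE.password,
      database: env.HYPERDRIVE.database,
      port: env.HYPERDRIVE.port,
      disableEval: true,
    });
    const [rows] = await connection.query("SELECT 1");
    // Clean up after the response has been sent, without delaying it.
    ctx.waitUntil(connection.end());
    return Response.json(rows);
  },
};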
mehedi
mehediβ€’2w ago
Thank you very much for the help
AJR
AJRβ€’2w ago
The release with the fix for idle connections has gone out. That should be looking better now. Kudos to @knickish for finding and fixing that. Please let us know if anything else pops up for you!
Alex
AlexOPβ€’2w ago
Excellent! I am so far super happy with the stability. Will report if I still find problems. Thank y'all!
Alex
AlexOPβ€’6d ago
This is over the past 7 days. 9k errors sounds like a lot, but a lot of it is invalid request methods and things like that. Hyperdrive has been rock solid!
AJR
AJRβ€’6d ago
That is awesome to hear. Thanks for sharing, I'll pass that to the team. :MeowHeartCloudflare: If you're willing to speak to it a bit, would you mind sharing what you're using Hyperdrive for, and what you used prior?
Alex
AlexOPβ€’4d ago
For sure. I run a URL shortener (also don't know why, side project gone wild) @ tny.app. It allows custom domains. I am using fly.io for that infrastructure to have some global-ish coverage (unfortunately not using Cloudflare for SaaS because it's pretty costly and you need Enterprise for apex routing), but all requests went back to AMS to talk to my application servers. I've now replicated the redirect code, serving redirects directly from the worker, hence why I needed Hyperdrive: it made this change super easy without duplicating my data to KV. And with the added caching it's now pretty fast all around the world; compared to the old fly.io->AMS path it now goes fly.io->CF->AMS (Hyperdrive).

Damn. It was going so well... it just took down my application fully again by exhausting the connection limit. https://paste.chief.tools/a1ce4bd1-7d96-43d3-9fca-19d97f9c4b74/markdown It's not supposed to do that, right? This time no very old connections, just many of them. I've now been smart and added a limit on the Hyperdrive MySQL user to prevent this from happening. But it seems like for some reason Hyperdrive is maintaining a far larger connection pool than before. And this time all actively used; last time many were idle for hours, which is not the case now. No change in traffic to the worker.
AJR
AJRβ€’4d ago
How many in total? It should top out at 60. Maybe 120 if the overall system is stressed and you spill to a second pool
Alex
AlexOPβ€’4d ago
Ah, it went to 48 total. But my server is limited to 51 connections, I believe. I somehow read the ~20 connections figure, and also read that it would take any max connection limits into account. But I now see that this page says it can go up to ~100 connections: https://developers.cloudflare.com/hyperdrive/platform/limits/
AJR
AJRβ€’4d ago
Ahhhh, makes sense. Right now there's not a good way to set or change the limits on how many connections Hyperdrive will take. I'd be happy to lower yours with our next release, if you need. Though it sounds like you've already solved the problem in what would amount to the same way.
Alex
AlexOPβ€’4d ago
Yeah I should've done that immediately, this is on me πŸ™‚
AJR
AJRβ€’4d ago
We used to do one pool per region, long story that we've written a blog about if you want the details. With regional pools, of which there could be up to ~8, we limited to 20 per region. Since then (a couple months ago) it's been 60 with some wiggle room to 120
Alex
AlexOPβ€’4d ago
And I assumed this read the max_connections setting from my database, but maybe this is more of a special case for the serverless databases or something. This makes sense! So am I making Hyperdrive's job harder/impossible now with my self-imposed connection limit?
AJR
AJRβ€’4d ago
Maybe a little, not a lot. When a query comes in, Hyperdrive will (in order of preference):
1. take an unused connection out of the pool, if one is available
2. open a new connection, if you have some left before you hit your limit
3. wait for an existing connection to free up, if you're already at your limit, up to a 15-second timeout
With your self-imposed limit, it'll now try for 2 and fail, instead of waiting around for 3. For your scenario, where your DB's limit is lower than 60 anyway, the query is going to fail at option 2 either way, and now you won't cause problems for anything else sharing your DB. What would be ideal here would be for you to get to 3, since you'd likely get a successful result that way, eventually (rough sketch below). Note: this isn't MySQL-specific; this is how it works for Postgres users too. CC @thomasgauvin -- another one for user-configurable connection limits.
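In pseudocode, that acquisition order looks something like this (an illustrative sketch only, not our actual implementation):

interface Conn {} // a pooled database connection

interface Pool {
  open: number;                                  // connections currently open
  limit: number;                                 // e.g. 60
  takeIdle(): Conn | null;                       // grab an idle connection, if any
  connect(): Promise<Conn>;                      // dial a new connection
  waitForFree(timeoutMs: number): Promise<Conn>; // wait for one to free up
}

async function acquire(pool: Pool): Promise<Conn> {
  // 1. Reuse an unused connection if one is available.
  const idle = pool.takeIdle();
  if (idle) return idle;

  // 2. Open a new connection if we're below the limit.
  //    (A lower DB-side cap makes this step fail instead.)
  if (pool.open < pool.limit) return pool.connect();

  // 3. Already at the limit: wait up to 15s for a connection to free up.
  return pool.waitForFree(15_000);
}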
Alex
AlexOPβ€’4d ago
Awesome, I appreciate the details. Really cool tech πŸ™‚ And makes total sense too. Very reasonable. I never had a use for so many connections since I've never run my database so globally. If you take into account how many CF DCs there are, 120 max is downright low πŸ˜›
AJR
AJRβ€’4d ago
tx-mode poolers get a lot of mileage out of each one, it's pretty cool to see how high some people crank their traffic before we need to give them limit overrides
Alex
AlexOPβ€’4d ago
I am a MySQL guy, but I've heard of people getting some crazy mileage out of their PgBouncer setups. For my use case I'll eventually need to switch to KV or maybe D1, but Hyperdrive is performing excellently as an in-between.
