I’m getting some pressure to move to Postgres so some people can use admin/dashboard tools like Looker. Any chance D1 works with any of those things?
Is this not the reason that D1 can't compete with databases like Turso? I mean, you have to go via Workers and a DO, which means there's a certain minimum latency. Is my understanding incorrect?
Question about D1 bindings to Pages. I know that when you bind D1 to a Worker, it automatically restricts the region the Worker spins up in to be as close to D1 as possible. Does it have the same behavior when you bind a Pages project? If so, is there any way to get around that?
in theory, if they had the same number of PoPs as us, and we had to go through Workers first and they didn't, yeah, you'd gain 1-10 ms of latency overhead on us
If it is 1-10 ms of latency, that's wonderful. Based on the earlier conversations we have had about DO locations (the "where do DOs live" links), DOs are available only in a few data centers (for example, SGA for an Indian audience), and the minimum latency to a DO location from Mumbai would be ~200 ms according to the website. I am making some assumptions here: a) that the D1 replicas, when they come into the picture after Q1, would still live near SGA-like locations due to colo requirements, and b) that not just the initial connection to the DO, but all subsequent communication, would see the same latency. In that case, how can the latency be lower (like 10 ms) without Smart Placement? I would say having this many PoPs actually hurts performance rather than improving it. Another point to ponder, thinking out loud: a use case where I depend on R2/KV and a DO (I have such a use case). I am going everywhere to collect my objects, so the initial request takes multiple latency hits. I cache them using the Cache API after the initial request, but again there's no guarantee my next customer hits the same PoP.
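For what it's worth, here is a minimal sketch of that per-PoP caching pattern, assuming a Worker with hypothetical R2 (`BUCKET`) and KV (`CONFIG`) bindings. The Cache API calls (`caches.default`, `cache.match`, `cache.put`) are the standard Workers ones, and as noted above each cache entry only lives in the PoP that created it, so a visitor hitting a different PoP pays the cold-path cost again:

```ts
export interface Env {
  BUCKET: R2Bucket;    // hypothetical R2 binding name
  CONFIG: KVNamespace; // hypothetical KV binding name
}

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const cache = caches.default;

    // Serve from this PoP's cache if a previous visitor already warmed it.
    const cached = await cache.match(request);
    if (cached) return cached;

    // Cold path: fan out to KV and R2 (each hop adds its own latency).
    const key = new URL(request.url).pathname.slice(1);
    const [meta, object] = await Promise.all([
      env.CONFIG.get(`meta:${key}`, "json"),
      env.BUCKET.get(key),
    ]);
    if (!object) return new Response("not found", { status: 404 });

    const response = new Response(object.body, {
      headers: {
        "content-type": "application/octet-stream",
        "cache-control": "public, max-age=3600",
        "x-meta": JSON.stringify(meta ?? {}),
      },
    });

    // Populate this PoP's cache without blocking the response.
    ctx.waitUntil(cache.put(request, response.clone()));
    return response;
  },
};
```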
I am currently developing a microservice-like project that processes ~2 TB of data from an (S)FTP server once a day (is D1 expensive for this scale of data?).
This data should then be processed, filtered, and written to a database. I want to manage the database with an ORM as well as migrations.
Has anyone had experience with ORMs and migrations in the context of D1, and perhaps implemented a small microservice project with it (or with Wrangler, for that matter)?
Oh, there really is such a thing. I thought something like that would have to be implemented with proprietary packages.
But the migrations are implemented in SQL here, right? When I was still working a lot with Java, there was Liquibase or something like that, where you could define migrations database-agnostically.
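As an illustration (not a recommendation), a minimal sketch of what an ORM on top of D1 can look like, here using Drizzle's D1 adapter; the `products` table, the column set, and the `DB` binding name are made up for the example, and the migration files Wrangler applies are still plain SQL underneath:

```ts
import { drizzle } from "drizzle-orm/d1";
import { sqliteTable, text, integer } from "drizzle-orm/sqlite-core";
import { eq } from "drizzle-orm";

// Hypothetical schema for the example; the matching SQL migration files
// can be generated from definitions like this and applied with Wrangler.
export const products = sqliteTable("products", {
  id: integer("id").primaryKey(),
  slug: text("slug").notNull().unique(),
  title: text("title").notNull(),
  description: text("description"),
});

export interface Env {
  DB: D1Database; // hypothetical D1 binding name
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const db = drizzle(env.DB);
    const slug = new URL(request.url).searchParams.get("slug") ?? "";

    // Typed query against the D1-backed SQLite database.
    const product = await db
      .select()
      .from(products)
      .where(eq(products.slug, slug))
      .get();

    return product
      ? Response.json(product)
      : new Response("not found", { status: 404 });
  },
};
```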
I think that would be really cool. I personally like D1 and Cloudflare. However, I write my applications for customers who sometimes have strict compliance requirements, such as hosting everything themselves or only being allowed to use certain cloud providers.
High performance (around the globe) would be my main concern. That's why I was thinking about D1.
Of course there are other larger cloud providers that offer something like geo-replicated storage - but I'm not sure how well they perform.
In fact, all the data within these 2 TB is intended for public use. A web app that I would implement would then retrieve and display this data, for example.
It's basically a huge product database. Images are stored separately in the CDN.
I would still recommend a non-SQL database for this; it's much more easily horizontally scalable than SQL. Of course, PlanetScale has achieved horizontal scalability with MySQL databases using Vitess, but I don't think it's enough for 2 TB of data.
Personally, I am a big fan of relational databases. I always assumed that they could be optimised well with the appropriate data structure.
Do you think NoSQL would make more sense for my use case? A few years ago when I looked into it, NoSQL was usually still behind relational databases. But that might have changed in the meantime ...
There are big updates 1-2 times a day because the data changes regularly. However, this would not be performance-critical. Only the reads are performance-critical.
Basically, I aggregate product data from different providers. As the descriptions are often in different languages, they are run through the DeepL API and then written to the database. There are also a few more preprocessing steps to check how reputable the provider is... And the product images are uploaded to blob storage or a CDN, to avoid relying on the provider's image service.
In the end, this data should be displayed on a website where you can view and compare the active products.
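Purely as a sketch of that ingest flow on Workers: the provider fetch, the secret name, and the `products` table below are stand-ins, and the DeepL call assumes the v2 JSON `/translate` endpoint on the free-tier host, so treat the request shape as an assumption to verify against the DeepL docs:

```ts
export interface Env {
  DB: D1Database;        // hypothetical D1 binding
  DEEPL_API_KEY: string; // hypothetical secret name
}

interface ProviderProduct {
  id: string;
  slug: string;
  title: string;
  description: string;
}

// Assumed DeepL v2 JSON endpoint; swap the host for the paid tier if needed.
async function translate(text: string, apiKey: string): Promise<string> {
  const res = await fetch("https://api-free.deepl.com/v2/translate", {
    method: "POST",
    headers: {
      "Authorization": `DeepL-Auth-Key ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text: [text], target_lang: "EN" }),
  });
  if (!res.ok) throw new Error(`DeepL error: ${res.status}`);
  const data = (await res.json()) as { translations: { text: string }[] };
  return data.translations[0].text;
}

export default {
  // Runs on the cron schedule configured in wrangler.toml (e.g. once a day).
  async scheduled(_controller: ScheduledController, env: Env, _ctx: ExecutionContext) {
    // Placeholder for the real (S)FTP / provider ingestion step.
    const incoming: ProviderProduct[] = await fetchFromProviders();

    const stmt = env.DB.prepare(
      "INSERT OR REPLACE INTO products (id, slug, title, description) VALUES (?1, ?2, ?3, ?4)"
    );

    const rows = [];
    for (const p of incoming) {
      const description = await translate(p.description, env.DEEPL_API_KEY);
      rows.push(stmt.bind(p.id, p.slug, p.title, description));
    }

    // Batch the writes so the daily update is a handful of round trips, not one per row.
    await env.DB.batch(rows);
  },
};

// Stand-in for the actual provider/SFTP fetch, which isn't shown here.
async function fetchFromProviders(): Promise<ProviderProduct[]> {
  return [];
}
```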
Basically, a user will not be able to retrieve large amounts of data in one go. However, I would like every single data record to be served with very good performance.
A search function is also planned, but it's not necessarily the most important thing. I have already tried Meilisearch, but I had problems searching for a single ID, which is sometimes useful for my use case.
Just imagine a huge "products" table or bucket. An end user either clicks on a link that contains an ID or a "slug" and only wants that one product to be displayed.
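That access pattern is just a keyed lookup, so as a rough sketch with the raw D1 binding (same hypothetical `products` table and `DB` binding as above; an index on `slug` is assumed so the non-primary-key lookup stays fast):

```ts
export interface Env {
  DB: D1Database; // hypothetical D1 binding name
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Expect URLs like /products/123 or /products/my-product-slug.
    const idOrSlug = new URL(request.url).pathname.split("/").pop() ?? "";

    // Single-row lookup by primary key or by the (indexed) slug column.
    const product = await env.DB
      .prepare("SELECT * FROM products WHERE id = ?1 OR slug = ?1 LIMIT 1")
      .bind(idOrSlug)
      .first();

    return product
      ? Response.json(product)
      : new Response("not found", { status: 404 });
  },
};
```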
Hello everyone, I have a question. I am sharing my Cloudflare account with some friends. As Super Administrator I am allowed to use Cloudflare Images, but it asks my friends to pay again. Do I have to give them any permission apart from Cloudflare Images?