Storage only or storage + table?
I'm trying to decide how I should structure my storage usage in my application.
Let's say users can create "items" and those items can have files uploaded to them.
Should I have a table that is "item_files" which contains a reference + metadata to the uploaded image in storage.
Or should I structure my storage paths in a way that allows for listing the items? Eg /uploads/{itemId}/image.png
Is there a best practice approach? To me it feels safer to have a files table as that means I can control things like metadata (uploader, tags, etc) easier. It also means that you can fetch the associated files for an item, but also display all the files that a user has ever uploaded.
Just not sure if there's pros/cons for each approach so thought I'd ask the experts!
13 Replies
As these are not just less used files like profile picture where extra metadata and details are not that useful, i would choose storage + table since they are used by user, which will give you ability to filter files for any future use case easily
⚠️ Not expert
Good call, appreciate the insight. Any idea if there's a way of creating a record in the table when an upload occurs? Feels like I could do a trigger, but I'd like to add data like "itemId" etc.
I guess I could upload the file then on the client create a new record using the upload response. Was hoping there'd be a neat SQL way of achieving it.
I guess I could upload the file then on the client create a new record using the upload response.this is straight forward and simple. or else can use database functions and triggers where functions can accept the extra details as arguments and run on insert trigger. But I see no advantage over above implementation other than more complication of writing function in plpgsql
Still this approach considered "as best" when the file has metadata ?
Or is better so save the file_path only and save the metadata inside the storage ? ( looks like now is possible to save "extra" metadat )
The metadata option did not exist then.
basically the best approach would be to create a table for the metadata ?
On the example above "items" would be something like "items_files"
or if have something more global could be "attachments"
the data could be something like this:
or how it could be with the metadata ?
Store everything in the metadata and then only the file_path in the table where i need ?
How can i retrieve the metadata in the response, for ex: when i get 1 item or list of items ?
( i'm trying to find out what is the best approach right now 🙂 )
The storage option for custom metadata did not exist to store any extra data when that was written. If you just want extra info that is probably the best way now. If you are going to need to search the metadata, then a table is probably still the best way as you can't control indexing on the storage.objects columns.
I don't know what your list there represents as some of that is built in to the file metadata already. But if you are going to search on those things then the extra table is better.
Also I would not store a URL or id from storage objects, but the file path. You can build the public url with simple string concatenation and if the base URL changes in the future you don't have a bunch of URL's to update. All the REST API calls need a path so the id is not of much use.
Also I would not store a URL or id from storage objects, but the file path. You can build the public url with simple string concatenation and if the base URL changes in the future you don't have a bunch of URL's to update. All the REST API calls need a path so the id is not of much use.
Thx @garyaustin for the explanation!
About the URL is a good point, that i didn't know before.
Updated example:
Let's says i would like to fech all items for the table "items".
If i have a table named" items_files" i could just joind the tables and get all results.. something like this:
This can't be done with "metadata" or is it ?
It can't be done thru the REST API with storage metadata. You can't join data in two schemas. You might be able to use a view to the join and query that.
@garyaustin should the "path" includes the "bucket name" ?
Up to you. If you have a fixed bucket for that type of file then I would not as it is wasted space. Plus when you use REST storage calls bucket and path are separate things so you would have to split it out before make the rest call.
Ok
Basically the bucket “should/can” be hard-coded everywhere and only the path is dynamic?
Thx again 🙏
It really depends on what you are doing. But if you have say an product-images bucket and a products table with a path for an image then you know they are in the product_images bucket for that image type. If you had buckets for different types of products then you may need to save the bucket either in a column or as the the path name and then do a string split to get the bucket and path for your storage REST call.