I am currently trying to design a solution for a workflow of mine that involves custom and, in particular, partially unknown file schemas.
Workflow

Imagine a game or application (in my case a game) that stores binary data, for example item data, where you only partially know what it contains and possibly its size. For example, say there is an ItemData struct of size 300, and we know that the id is at 0x0 and the name is at 0x20, stored as ASCII with a length of 0x16.

We want to parse these records, so we create a reader and a writer for them. Unfortunately, thanks to new findings, it then turns out that the name is longer, a lot of other fields have been added, some have gone away, and so on. We now have to rearrange and adjust a lot of code just to make this work, which makes the following sample workflow very difficult:
* Reverse engineer the application
* Find a method that accesses a field of that struct at 0x40, which we know is a 32-bit integer
* Add it to your own application (be it for analysing these files or whatever else)
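To illustrate why this workflow hurts, here is a minimal sketch of the hand-written, sequential style of reader. It is written in Python purely for brevity and is not the actual code. The function name, the 32-bit little-endian id, and reading the size 300 as decimal bytes are all assumptions, since none of them are pinned down above.

```python
import io
import struct

RECORD_SIZE = 300  # assumption: "size 300" means 300 bytes (decimal)


def read_item_sequential(stream):
    """Hand-written sequential reader: every unknown gap is a hard-coded skip."""
    # 0x00: id (assumed to be an unsigned 32-bit little-endian integer)
    item_id = struct.unpack("<I", stream.read(4))[0]
    # Unknown bytes between the id and the name
    stream.read(0x20 - 0x04)
    # 0x20: ASCII name, 0x16 bytes, assumed to be NUL-padded
    name = stream.read(0x16).rstrip(b"\x00").decode("ascii")
    # Newly discovered field at 0x40: the old "skip to the end" had to be
    # split into two skips around it
    stream.read(0x40 - 0x36)
    value_40 = struct.unpack("<i", stream.read(4))[0]
    # Remaining unknown bytes up to the end of the record
    stream.read(RECORD_SIZE - 0x44)
    return item_id, name, value_40
```

Every new finding changes at least one skip length, and a single wrong subtraction silently shifts all later fields. The matching writer has to mirror each of these changes.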
The Idea

My idea is to use a source generator that produces somewhat flexible reading and writing methods purely from provided metadata. (Note: this is not yet implemented, as I'm still trying to figure out whether it is a sane and viable approach.) For example:
https://paste.mod.gg/wnucpivxjycq/0

In theory, this should make it fairly simple to add and remove fields arbitrarily without having to refactor and rewrite a lot of code that may well be fragile to simple typos, logic errors, and the like.
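The metadata-driven idea can be sketched without a source generator as a plain field table interpreted at runtime. This is a hedged illustration in Python, not the linked code. `Field`, `ITEM_FIELDS`, `read_record`, and `write_record` are hypothetical names, and the id width and byte order are assumptions.

```python
import struct
from dataclasses import dataclass

RECORD_SIZE = 300  # assumption: 300 bytes (decimal)


@dataclass(frozen=True)
class Field:
    name: str
    offset: int
    fmt: str  # struct format string for exactly one value


# The only place that changes when a new field is discovered.
ITEM_FIELDS = [
    Field("id", 0x00, "<I"),        # width and byte order are assumptions
    Field("name", 0x20, "<22s"),    # 0x16 bytes of ASCII, kept as raw bytes
    Field("value_40", 0x40, "<i"),  # the newly found 32-bit integer
]


def read_record(raw, fields):
    """Decode only the known fields; everything else stays in `raw`."""
    if len(raw) != RECORD_SIZE:
        raise ValueError(f"expected {RECORD_SIZE} bytes, got {len(raw)}")
    return {f.name: struct.unpack_from(f.fmt, raw, f.offset)[0] for f in fields}


def write_record(raw, fields, values):
    """Patch known fields into a copy of the original bytes.

    Unknown regions are carried over untouched, so a read/write round trip
    does not destroy data that has not been reverse engineered yet.
    """
    out = bytearray(raw)
    for f in fields:
        if f.name in values:
            struct.pack_into(f.fmt, out, f.offset, values[f.name])
    return bytes(out)
```

With this shape, adding the field at 0x40 is a single table entry, and removing a field that turned out to be wrong is a single deleted line. A source generator would emit the equivalent reads and writes at compile time instead of interpreting the table at runtime.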
Some of the information I am considering keeping track of, because it is more or less important to me, can be seen in the source generator intermediates: https://paste.mod.gg/wnucpivxjycq/1
A possible concrete generated solution can be found here: https://paste.mod.gg/wnucpivxjycq/2

However, I have various questions about whether the approach as it stands can be considered fine.
More generally, I'm also wondering whether there are things I'm missing or have used badly wrong (especially Pipes/Pipelines).
I have also already received an alternative suggestion to the Read/Write methods (which I prefer): creating public properties that are backed by the read data. That seems fairly complex to solve properly, though, and I personally didn't feel comfortable trying to solve it this way.
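For comparison, the property-backed alternative could look roughly like the sketch below, where each property decodes from and encodes into the raw record on access. This is only an illustration in Python under the same assumptions as before (300-byte record, 32-bit little-endian id, NUL-padded ASCII name). `ItemView` is a hypothetical name, not something from the suggestion I received.

```python
import struct


class ItemView:
    """Properties that read from and write to the raw record in place."""

    SIZE = 300  # assumption: 300 bytes (decimal)

    def __init__(self, raw):
        if len(raw) != self.SIZE:
            raise ValueError(f"expected {self.SIZE} bytes, got {len(raw)}")
        # Private mutable copy; unknown bytes live here untouched.
        self._buf = bytearray(raw)

    @property
    def id(self):
        return struct.unpack_from("<I", self._buf, 0x00)[0]

    @id.setter
    def id(self, value):
        struct.pack_into("<I", self._buf, 0x00, value)

    @property
    def name(self):
        raw = bytes(self._buf[0x20:0x20 + 0x16])
        # Treat the first NUL byte as the end of the string.
        return raw.split(b"\x00", 1)[0].decode("ascii")

    @name.setter
    def name(self, value):
        encoded = value.encode("ascii")
        if len(encoded) > 0x16:
            raise ValueError("name does not fit into 0x16 bytes")
        # "22s" pads the remainder of the slot with NUL bytes.
        struct.pack_into("<22s", self._buf, 0x20, encoded)

    def to_bytes(self):
        return bytes(self._buf)
```

The trade-off is that every access pays the decoding cost and the object keeps the whole buffer alive, but there is no separate writer to keep in sync with the reader.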
Questions:
* Is the pipeline usage fine, incorrect, or poor?
* Are there ideas for improvements?
* Is certain stuff obsolete?
* Are there alternative or existing solutions? (FlatBuffers etc. don't seem to solve this, at least not in a usable manner.)
* Any feedback, generally speaking
P.S.: I feel like I may have lost the plot at some point, but right now I'm not sure where.