Parsing/Deserializing a Structured File’s Bytes
Hi all. I have a file that I need to break down and turn into data that I can visualize. I know how the file is structured and know how to convert each field’s bytes into its corresponding data type (str, int32/16/8, uint32/16/8, etc).
Where I’m struggling is figuring out the best way to actually work through the data. So far my script is a jumbled mess that goes byte by byte and decodes each one. My current plan is to store the file header and the data (everything after the header is compressed) in an object as raw bytes, then decompress, then create methods that convert each chunk of data (each stored with its own header) into attributes on the file’s object.
Does it sound like I’m on the right track, or is there a better way to handle the parsing of structured files like this?
And forgive me if I’m using terminology incorrectly; I took one OOP class in college maybe 5 years ago, and it was in Python. So I remember the gist of OOP, but I don’t remember the terminology as well.
for sure first thing i would do is make models and parsing methods as 1:1 to the raw structure as they can be, no logic, just parsing, and build all the rest above that
so it kinda depends how this format is made: whether it's linear or recursive, whether there's inheritance or composition. i could think of some ways but can't tell for sure without more details 🤷
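something like this rough C# sketch, just for the shape of it (the record fields, offsets, sizes, and endianness are all made up, not your actual format):
```cs
using System;
using System.Text;

// Models that mirror the raw layout 1:1; they hold parsed fields and nothing else.
public sealed record FileHeader(string Source, ushort Version, uint CollectionDate);
public sealed record DataHeader(string Name, uint Size);

public static class RawParser
{
    // Pure parsing, no logic: fixed offsets straight out of the bytes.
    // BitConverter reads in the machine's byte order, so this assumes the fields match it.
    public static FileHeader ParseFileHeader(ReadOnlySpan<byte> b) =>
        new(Encoding.ASCII.GetString(b[..4]),
            BitConverter.ToUInt16(b[4..6]),
            BitConverter.ToUInt32(b[6..10]));

    public static DataHeader ParseDataHeader(ReadOnlySpan<byte> b) =>
        new(Encoding.ASCII.GetString(b[..4]),
            BitConverter.ToUInt32(b[4..8]));
}
```
everything higher level (lookups, conversions, validation) then works on those models instead of on raw bytes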
I'd recommend looking into the SequenceReader API as a basis for implementing a binary parser. BinaryPrimitives may also be of help.
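For example, an untested sketch of walking decompressed [name][size][payload] records with SequenceReader (the 4-byte ASCII name and big-endian int32 size are assumptions on my part):
```cs
using System;
using System.Buffers;
using System.Text;

static class ChunkWalker
{
    // Assumed chunk layout: 4-byte ASCII name, big-endian int32 payload size, then the payload.
    public static void WalkChunks(ReadOnlySequence<byte> decompressed)
    {
        var reader = new SequenceReader<byte>(decompressed);

        Span<byte> nameBytes = stackalloc byte[4];
        while (reader.TryCopyTo(nameBytes))          // peek the name without advancing
        {
            reader.Advance(4);

            if (!reader.TryReadBigEndian(out int size))
                break;                               // truncated chunk header

            string name = Encoding.ASCII.GetString(nameBytes);
            ReadOnlySequence<byte> payload = reader.UnreadSequence.Slice(0, size);
            reader.Advance(size);

            Console.WriteLine($"{name}: {size} bytes");  // hand payload off to a per-message parser here
        }
    }
}
```
If your decompressed data is just a byte[], you can wrap it with `new ReadOnlySequence<byte>(decompressedBytes)` before constructing the reader.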
Is there a benefit to doing that over BitConverter? Because that’s what I’ve been using, and it’s been working fine
Just looked through that, that’s super helpful, thanks!
I would assume it’s linear, but not entirely sure what you mean by that 😅
The structure of the file is:
File Header
Compression (everything below is within the compression)
    Data Header (contains data name and size of data)
    Data
    Data Header (contains data name and size of data)
    Data
    etc.
So my thought was to read the file as a byte array
Separate the header from the compressed portion in an object
Write a method for decompressing the compressed portion
Write a method for searching for the name of each data type in the file and storing it within the object
Then write methods that decode each header/data type and store the results within the object
So calling the class with the byte array would save attributes within the object that look something like the following (note: the data names are messages in the file; a rough sketch is below the list):
File.hdr
File.decompData
File.msgTypes
File.msgArray
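Roughly this shape, as a sketch (the compression scheme, header size, and chunk layout here are placeholders, not the real format):
```cs
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

public sealed partial class MessageFile
{
    public byte[] Hdr { get; }                               // raw file header bytes
    public byte[] DecompData { get; }                        // decompressed everything-after-the-header
    public IReadOnlyList<string> MsgTypes { get; }           // distinct data names found in the data headers
    public IReadOnlyList<(string Name, byte[] Payload)> MsgArray { get; }

    public MessageFile(byte[] raw, int headerSize)
    {
        Hdr = raw[..headerSize];
        DecompData = Decompress(raw[headerSize..]);
        MsgArray = SplitIntoChunks(DecompData);
        MsgTypes = MsgArray.Select(m => m.Name).Distinct().ToList();
    }

    // Placeholder: assumes zlib/deflate; swap in whatever the format actually uses.
    private static byte[] Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var zlib = new ZLibStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        zlib.CopyTo(output);
        return output.ToArray();
    }

    // Walks the assumed [4-byte name][int32 size][payload] records in the decompressed data.
    private static List<(string Name, byte[] Payload)> SplitIntoChunks(byte[] data)
    {
        var chunks = new List<(string Name, byte[] Payload)>();
        int pos = 0;
        while (pos + 8 <= data.Length)
        {
            string name = Encoding.ASCII.GetString(data, pos, 4);
            int size = BitConverter.ToInt32(data, pos + 4);
            if (size < 0 || pos + 8 + size > data.Length) break;   // bail on a malformed header
            chunks.Add((name, data[(pos + 8)..(pos + 8 + size)]));
            pos += 8 + size;
        }
        return chunks;
    }
}
```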
Then there would be methods that check to make sure the messages exist in File.msgTypes and, if they exist, convert them over. The structure of each message is a little different, so I’d have to have different methods for each (sketched after the list below).
File.getHdr: converts the header into readable data (data source, version number, collection date, etc)
File.getMsg15: checks to make sure Msg15 exists, converts the header into readable data, then converts the data as specified
File.getMsg31: checks to make sure Msg31 exists, converts the header into readable data, then converts the data as specified, etc, etc
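For example, continuing the sketch above (Msg31’s fields here are invented just to show the check-then-convert pattern):
```cs
using System;
using System.Linq;

// Invented example message; the real Msg31 fields/layout would replace these.
public sealed record Msg31(uint Timestamp, ushort Elevation)
{
    public static Msg31 Parse(ReadOnlySpan<byte> b) =>
        new(BitConverter.ToUInt32(b[..4]), BitConverter.ToUInt16(b[4..6]));
}

public sealed partial class MessageFile
{
    // Returns null if Msg31 isn't present; otherwise decodes the first matching payload.
    public Msg31? GetMsg31() =>
        MsgTypes.Contains("Msg31")
            ? Msg31.Parse(MsgArray.First(m => m.Name == "Msg31").Payload)
            : null;
}
```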
Does that sound like a sensible way to handle it?
more explicit (endianness for example), more byte[] oriented
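e.g., assuming a big-endian uint32 field (values and offsets made up):
```cs
using System;
using System.Buffers.Binary;

byte[] bytes = { 0x00, 0x00, 0x01, 0x2C };   // a big-endian uint32 (300), just as an example

// BitConverter reads in the machine's byte order (little-endian on most PCs),
// so a big-endian field needs a copy + reverse first:
byte[] tmp = bytes[..4];
if (BitConverter.IsLittleEndian) Array.Reverse(tmp);
uint viaBitConverter = BitConverter.ToUInt32(tmp, 0);

// BinaryPrimitives names the byte order in the call and reads straight off the span, no copy:
uint viaPrimitives = BinaryPrimitives.ReadUInt32BigEndian(bytes);

Console.WriteLine($"{viaBitConverter} {viaPrimitives}");   // both print 300
```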
Gotcha, I’ve got a function already that handles the endianness, so if that’s the only thing, I’m probably gonna keep it as is. There’s no real performance benefit, is there? I would assume it just goes through and flips the byte order if it needs to?
I’m more asking about structure to make sure it is fast and elegant to execute/read/edit
How big is the file?
if it's all sequential, or some of the data is flags/logic that changes how other data gets parsed
"Then there would be methods that check to make sure the messages exist in File.msgTypes"
so this is a dynamic format? there could be arbitrary data?