Agnostic client-side serialization of custom types
Hi,
I've noticed a potential trend in G.V() of serialization issues for types that appear to be custom and therefore not deserializable from G.V()'s perspective.
Where things get tricky is that G.V() needs to be able to connect to any database, regardless of this type of customer-specific situation.
Is there a way to configure a fallback deserializing mechanism client-side (e.g. GraphBinary config or otherwise) that automatically deserializes a result as string (or some other default) in the absence of an appropriate serializer being available?
I think this was really the intent of GraphSON, where the type information was included in the more verbose output.
Yeah, I've had somewhat mixed results with GraphSON. I'm unsure if it's server related, but it definitely tends to result in similar deserialisation issues
My thinking is along the lines of implementing a version of, say, GraphBinary within G.V() that would either fail silently on a deserialisation failure or handle the result in a different way somehow
we purposely didn't do a toString() serialization of GraphBinary. i see how it could be useful for this case you describe but i think it's less useful more generally. maybe there could be a configuration for it, but if the server isn't configured with it that's where you will have a problem, right? or are you having a problem with like a JanusGraph RelationIdentifier coming back from the server and you can't do anything with it (i.e. ends in error)?
yeah, i get that. my challenge is really just that a single serialization failure will throw the entire query out from G.V()'s perspective so im thinking it'd be worthwhile having a sort of soft fail mechanism where if deserialization fails a bespoke type gets returned that from G.V()'s perspective just signifies a failure to deserialize a specific value
so in essence, would i be able to implement a class inheriting from GraphBinary and overriding its fatal exception behaviour on failure to deserialize an unknown type?
wow, are you like actually asking for a change to TinkerPop right now? 🙂
Now, come to think of it, that could just be a change to the driver's implementation of GraphBinary, for instance. I have to admit, since Gryo got deprecated, G.V() already embeds these classes, so my mind turned to just making my own implementation within G.V(). But if there was value outside of my weird stuff then sure!
i dont think we want a toString() sort of default. i'm never sure what toString() contains and it might not be appropriate (could even contain a security risk). I suppose if the DataType.CUSTOM isn't found we could return an "UNKNOWN" that perhaps includes the name of the custom type and its BLOB.
Yup, that's what i had in mind
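A minimal sketch of what such an "UNKNOWN" placeholder could look like on the client side. The class name `UnknownValue` and its methods are hypothetical and not part of TinkerPop. The sketch only shows the shape of the idea: keep the custom type name and the raw bytes, and never render the payload in toString().

```java
/**
 * Hypothetical placeholder for a value whose custom type has no registered
 * serializer. Illustrative only; this class does not exist in TinkerPop.
 */
final class UnknownValue {
    private final String typeName; // custom type name as reported by the server
    private final byte[] blob;     // raw, undecoded payload bytes

    UnknownValue(String typeName, byte[] blob) {
        this.typeName = typeName;
        this.blob = blob.clone(); // defensive copy so callers cannot mutate it
    }

    String typeName() {
        return typeName;
    }

    byte[] blob() {
        return blob.clone();
    }

    @Override
    public String toString() {
        // Deliberately show only the type name and size, never the payload.
        return "UNKNOWN(" + typeName + ", " + blob.length + " bytes)";
    }
}
```

A client could then render such a value as a marked cell instead of failing the whole query, assuming the reader is able to extract the blob in the first place.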
I'll give it a quick shot on my end and let you know how that goes. I've already got these pesky DataStax Enterprise Graph custom types in mind for this type of situation.
im noticing the existence of a withFallbackResolver mechanism on type registry which sounds like it does more or less what im looking for, ill give it a shot (https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/binary/TypeSerializerRegistry.java#L227)
so just a little progress update here
the fallbackResolver mechanism felt like it should work, but I'm finding that it won't for all cases, specifically when handling classes flagged as "DataType.CUSTOM". I guess the question I'm asking myself now is this:
Should the getSerializerForCustomType method implement a fallback mechanism using the optional fallbackResolver, similar to what the getSerializerForType method does? (https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/io/binary/TypeSerializerRegistry.java#L399 for reference)
Due to the class structure at the moment I can't really extend it and override that specific method. So what I'm going to try in the meantime is to create my own "FailSafeGraphBinarySerializerV1" that uses a custom implementation of TypeSerializerRegistry, which I'll call "FailSafeTypeSerializerRegistry" and which will just be a shameless copy-paste of the code with those extra two lines
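To make the "extra two lines" idea concrete, here is a self-contained sketch of a custom-type lookup that consults an optional fallback resolver before failing. Everything in it (`SketchSerializer`, `FailSafeRegistrySketch`, the string-keyed resolver) is a hypothetical stand-in, not the actual TinkerPop classes or signatures.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a type serializer; not the TinkerPop interface. */
interface SketchSerializer {
    String describe();
}

/** Illustrative registry: exact match first, then an optional fallback. */
final class FailSafeRegistrySketch {
    private final Map<String, SketchSerializer> byCustomTypeName = new HashMap<>();
    private final Function<String, SketchSerializer> fallbackResolver; // may be null

    FailSafeRegistrySketch(Function<String, SketchSerializer> fallbackResolver) {
        this.fallbackResolver = fallbackResolver;
    }

    void register(String customTypeName, SketchSerializer serializer) {
        byCustomTypeName.put(customTypeName, serializer);
    }

    SketchSerializer getSerializerForCustomType(String customTypeName) {
        SketchSerializer serializer = byCustomTypeName.get(customTypeName);
        // The fallback step: ask the resolver before giving up.
        if (serializer == null && fallbackResolver != null) {
            serializer = fallbackResolver.apply(customTypeName);
        }
        if (serializer == null) {
            throw new IllegalStateException(
                    "Serializer for custom type '" + customTypeName + "' not found");
        }
        return serializer;
    }
}
```

With a resolver such as `name -> () -> "fallback:" + name`, a lookup for an unregistered name returns the fallback serializer instead of throwing. This only covers finding a serializer; the fallback still has to know how to read the value.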
Solution
Okay, so final update on this, hopefully. I'm coming to the realisation that this is just not straightforward in any sense. Even with some sort of fallback mechanism, there would still be the issue of determining which bytes, and how many, should be read from the buffer for the incoming type, which I guess would be nearly impossible without actual knowledge of the expected structure.
One interesting implementation of this is DataStax's own GraphBinary serializer for its custom user-defined types (UdtValue, as they're called), which I guess is the closest thing to a self-describing definition of the incoming byte data, based on the defined data schema
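The byte-count problem can be illustrated with a generic framing sketch. This is not the GraphBinary wire format, and the class and method names are made up for illustration. It assumes a payload written with an explicit length prefix, which is what lets a reader step over bytes it cannot interpret; without some such self-describing structure, the reader has no way to know where the unknown value ends.

```java
import java.nio.ByteBuffer;

/** Illustrative framing helper; names and layout are assumptions, not GraphBinary. */
final class FramingSketch {

    /** Writes [int length][payload bytes], followed by a trailing int value. */
    static ByteBuffer lengthPrefixed(byte[] opaquePayload, int nextValue) {
        ByteBuffer buf = ByteBuffer.allocate(4 + opaquePayload.length + 4);
        buf.putInt(opaquePayload.length);
        buf.put(opaquePayload);
        buf.putInt(nextValue);
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    /** Reads an opaque payload using only its length prefix. */
    static byte[] readOpaque(ByteBuffer buf) {
        int length = buf.getInt();
        byte[] raw = new byte[length];
        buf.get(raw); // advances the position past the unknown bytes
        return raw;
    }
}
```

After `readOpaque` returns, the buffer is positioned at the next value, so the rest of the response can still be decoded even though the payload itself was never understood.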
i will try to find some time to look into it this week
definitely no rush, I'm going to keep investigating because this is quite fun, and I'll put my findings here