TinkerPop Server OOM

Hi TinkerPop team,

I'm trying to understand an OOM that consistently occurs in my environment, usually over the course of a couple of hours.

Attached is a screenshot of the JVM GC metrics showing heap usage before and after each GC. The live memory that survives each collection seems to keep growing, but I'm not sure why.
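To put a number on that growth, one option is to fit a straight line through the heap-after-GC readings and estimate how long the remaining headroom lasts. The sketch below assumes you can export (timestamp, bytes retained after GC) pairs from your metrics; the function names, sample values, and 16 GiB heap size are hypothetical, and a linear fit is only a rough model of a leak.

```rust
/// Least-squares slope of post-GC heap samples, in bytes per second.
/// `samples` are (seconds since start, bytes retained after a GC).
/// Returns None with fewer than two samples or identical timestamps.
fn post_gc_growth_rate(samples: &[(f64, f64)]) -> Option<f64> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean_t = samples.iter().map(|s| s.0).sum::<f64>() / n;
    let mean_b = samples.iter().map(|s| s.1).sum::<f64>() / n;
    let (mut cov, mut var) = (0.0, 0.0);
    for &(t, b) in samples {
        cov += (t - mean_t) * (b - mean_b);
        var += (t - mean_t) * (t - mean_t);
    }
    if var == 0.0 { None } else { Some(cov / var) }
}

/// Rough seconds until the post-GC baseline reaches `max_heap`,
/// assuming it keeps growing linearly. None if it is flat or shrinking.
fn seconds_until_full(samples: &[(f64, f64)], max_heap: f64) -> Option<f64> {
    let slope = post_gc_growth_rate(samples)?;
    let &(_, last) = samples.last()?;
    if slope <= 0.0 {
        return None;
    }
    Some((max_heap - last).max(0.0) / slope)
}

fn main() {
    // Hypothetical heap-after-GC readings: (seconds since start, bytes).
    let gib = 1024.0 * 1024.0 * 1024.0;
    let samples = [(0.0, 2.0 * gib), (1800.0, 3.1 * gib), (3600.0, 4.0 * gib), (5400.0, 5.2 * gib)];
    let max_heap = 16.0 * gib; // hypothetical -Xmx16g
    if let Some(rate) = post_gc_growth_rate(&samples) {
        println!("post-GC growth: {:.1} MiB/hour", rate * 3600.0 / (1024.0 * 1024.0));
    }
    if let Some(secs) = seconds_until_full(&samples, max_heap) {
        println!("roughly {:.1} hours of headroom at this rate", secs / 3600.0);
    }
}
```

A steadily positive slope on post-GC values (rather than on raw heap usage, which always saws up and down) is what distinguishes retained objects from normal allocation churn.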

A heap dump from a different OOM showed about 8.3GB retained by Netty's NioSocketChannels; drilling deeper, it appears to be instances of org.apache.tinkerpop.gremlin.process.traversal.Bytecode.

That got me wondering: is there some kind of "close" message that clients are supposed to send?

I'm using an unofficial Rust Gremlin driver, and I'm wondering whether it's missing some housekeeping that causes my JanusGraph instance to accumulate unclosed resources until it dies.

The client sends bytecode-based traversals serialized as GraphSON v3. My understanding is that the TinkerPop server receives these, executes them, sends a response back (if needed), and that's it. Based on the heap dump, I assume that is consistent with seeing SingleTaskSession instances on the JanusGraph side.
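For reference, here is a minimal sketch of what one sessionless bytecode request looks like on the wire, as I understand the GraphSON v3 WebSocket framing used by the official drivers: a one-byte MIME-type length, the MIME type itself, then the JSON request message with `op` = `bytecode` and `processor` = `traversal`. The helper names are hypothetical, the request ID is passed in as a plain string rather than generated, and the bytecode is assumed to be pre-serialized GraphSON. As far as I know, a sessionless request like this needs no follow-up "close" message; the server streams its response back on the same connection tagged with the same `requestId`.

```rust
/// MIME type for GraphSON 3.0; its byte length (33) prefixes each frame.
const MIME: &str = "application/vnd.gremlin-v3.0+json";

/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: build the binary WebSocket payload for a single
/// sessionless bytecode request. `bytecode_json` must already be a GraphSON v3
/// `g:Bytecode` object; `traversal_source` is the server-side alias for `g`.
fn build_bytecode_request(request_id: &str, bytecode_json: &str, traversal_source: &str) -> Vec<u8> {
    let body = format!(
        r#"{{"requestId":{{"@type":"g:UUID","@value":"{}"}},"op":"bytecode","processor":"traversal","args":{{"gremlin":{},"aliases":{{"g":"{}"}}}}}}"#,
        json_escape(request_id),
        bytecode_json,
        json_escape(traversal_source)
    );
    let mut frame = Vec::with_capacity(1 + MIME.len() + body.len());
    frame.push(MIME.len() as u8); // length prefix for the MIME type
    frame.extend_from_slice(MIME.as_bytes());
    frame.extend_from_slice(body.as_bytes());
    frame
}

fn main() {
    // g.V().count() expressed as GraphSON v3 bytecode.
    let bytecode = r#"{"@type":"g:Bytecode","@value":{"step":[["V"],["count"]]}}"#;
    let frame = build_bytecode_request("41d2e28a-20a4-4ab0-b379-d810dede3786", bytecode, "g");
    let mime_len = frame[0] as usize;
    println!("mime: {}", std::str::from_utf8(&frame[1..1 + mime_len]).unwrap());
    println!("body: {}", std::str::from_utf8(&frame[1 + mime_len..]).unwrap());
}
```

Comparing a frame captured from the Rust driver against this shape is a quick way to check whether it is opening sessions or sending extra ops you don't expect.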

The 43k SingleTaskSession instances in the heap dump were unexpected. My client application should currently have at most around 12 connections, and the Rust driver library doesn't appear to multiplex them, i.e. it doesn't send multiple concurrent requests over the same connection.
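A rough back-of-the-envelope check, assuming the 43k figure counts sessions live at dump time and that each of the ~12 connections carries at most one request at a time:

$$\frac{43{,}000\ \text{sessions}}{12\ \text{connections}} \approx 3{,}583\ \text{sessions per connection}$$

With no multiplexing, at most about 12 sessions would be expected to be in flight at once, so a count in the thousands per connection would suggest sessions are being retained after their requests complete rather than reflecting work still in progress.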

On the other hand, I noticed these appeared to be held under a Netty Channel's CloseFuture. It seems unlikely, but is it possible I'm submitting traversal requests faster than they can be cleaned up? If so, is there a configuration setting to increase that capacity? I'm aware of gremlinPool and have already increased it. I also tried changing threadPoolWorker, but that didn't seem to change anything.
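One way to test the "faster than cleanup" hypothesis from the client side is to cap the submission rate and watch whether the post-GC baseline still climbs. Below is a minimal sketch of a fixed-interval pacer; the `Pacer` type and the 50-per-second cap are hypothetical, and in a real client you would share one pacer across all connection tasks (for example behind a `Mutex`) so the cap is global.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical fixed-interval pacer: successive calls to `wait` return at
/// least 1/max_per_sec apart. No bursting and no catch-up after idle periods.
struct Pacer {
    interval: Duration,
    next: Instant,
}

impl Pacer {
    fn new(max_per_sec: u32) -> Self {
        assert!(max_per_sec > 0, "rate must be positive");
        Pacer { interval: Duration::from_secs(1) / max_per_sec, next: Instant::now() }
    }

    /// Block until the next send slot, then reserve the following one.
    fn wait(&mut self) {
        let now = Instant::now();
        if self.next > now {
            thread::sleep(self.next - now);
            self.next += self.interval;
        } else {
            // Behind schedule or idle: send now, next slot one interval later.
            self.next = now + self.interval;
        }
    }
}

fn main() {
    let mut pacer = Pacer::new(50); // hypothetical cap: 50 traversals per second
    let start = Instant::now();
    for i in 0..5 {
        pacer.wait();
        // A real client would submit traversal `i` here.
        println!("request {} at {:?}", i, start.elapsed());
    }
}
```

If memory still grows at a deliberately low rate, a backlog of pending work seems less likely than objects being retained after their requests finish.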
[Attachments: Screenshot_2024-10-01_at_3.55.28_PM.png, image.png]
Solution
Sorry for the delayed response; I'll try to take a closer look soon. For now, I just wanted to point out that SingleTaskSession and related classes are part of the UnifiedChannelizer. From what I remember, the UnifiedChannelizer isn't quite production-ready, and it is in fact being removed in the next major version of TinkerPop. We can certainly still make bug and performance fixes to this part of the code for 3.7.x, though.