Docker Janusgraph Custom ID Values

Isopropyl9 11/9/2023
I'm trying to set up a JanusGraph database with custom vertex ID values. I have the following docker-compose configuration:
version: '3.8'

services:
  btc_janusgraph:
    # build: ./janusgraph
    image: janusgraph/janusgraph:latest
    container_name: btc_janusgraph
    environment:
      janusgraph.set-vertex-id: true
    ports:
      - "${JANUSGRAPH_PORT:-8182}:${JANUSGRAPH_PORT:-8182}"
    networks:
      - btc-network
    volumes:
      - btc_janusgraph_data:/var/lib/janusgraph
      - "./janusgraph/janusgraph.properties:/etc/opt/janusgraph/janusgraph.properties:ro"
Then, after setting up a Python environment with gremlin-python version 3.5.7, I execute the following:
from dotenv import load_dotenv

from graph.base import g

from gremlin_python import statics
from gremlin_python.process.traversal import T
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.process.graph_traversal import GraphTraversalSource
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection


statics.load_statics(globals())

gremlin_version = tuple([int(x) for x in version('gremlinpython').split('.')])
if (gremlin_version <= (3, 4, 0)):
    graph = Graph()
    g = graph.traversal().withRemote(DriverRemoteConnection(GRAPH_DB_URL, 'g'))
else:
    from gremlin_python.process.anonymous_traversal import traversal
    g = traversal().withRemote(DriverRemoteConnection(GRAPH_DB_URL, 'g',
                                                      username=GRAPH_DB_USER, password=GRAPH_DB_PASSWORD))


# clear database
g.V().drop().iterate()

# add vertices
g.addV('person').property(T.id, 0).next()
And I get the following error message:
gremlin_python.driver.protocol.GremlinServerError: 500: Vertex does not support user supplied identifiers
Isopropyl9 11/10/2023
My ./janusgraph/janusgraph.properties contains the following:
# Copyright 2023 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
...
# more default contents
...
# Directory to store index data locally
#
# Default: (no default value)
# Data Type: String
# Mutability: MASKABLE
index.search.directory = /var/lib/janusgraph/index

# ALLOW SETTING OF CUSTOM IDs
graph.set-vertex-id=true
I've investigated the logs and found this message:
btc_janusgraph | 23:37:12 WARN org.janusgraph.diskstorage.configuration.builder.ReadConfigurationBuilder.getOptionsWithDiscrepancies - Local setting graph.set-vertex-id=true (Type: GLOBAL_OFFLINE) is overridden by globally managed value (false). Use the ManagementSystem interface instead of the local configuration to control this setting.
So now my question becomes, how do I set the global value?
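For anyone reading along: the warning above already names the local key, its option type, and the globally managed value that wins. A minimal sketch for pulling those out of a log dump, assuming the log format matches the line quoted above; `Override` and `find_overrides` are made-up names for illustration and are not part of JanusGraph or gremlin-python:

```python
import re
from typing import Iterable, List, NamedTuple


class Override(NamedTuple):
    """One 'local setting is overridden' warning taken from a JanusGraph log."""
    key: str            # e.g. graph.set-vertex-id
    local_value: str    # value from the local properties file
    option_type: str    # e.g. GLOBAL_OFFLINE
    global_value: str   # globally managed value that actually applies


# Pattern derived from the warning text quoted in this thread (an assumption
# about the format, not a documented contract).
_PATTERN = re.compile(
    r"Local setting (?P<key>[\w.-]+)=(?P<local>\S*) "
    r"\(Type: (?P<type>\w+)\) is overridden by globally managed value "
    r"\((?P<global>[^)]*)\)"
)


def find_overrides(lines: Iterable[str]) -> List[Override]:
    """Return every overridden setting found in the given log lines."""
    found = []
    for line in lines:
        match = _PATTERN.search(line)
        if match:
            found.append(Override(match["key"], match["local"],
                                  match["type"], match["global"]))
    return found
```

Feeding it the output of `docker logs btc_janusgraph` would list each locally configured key next to the global value that overrides it.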
Isopropyl9 11/10/2023
Thank you very much for getting back to me. I tried connecting to the graph in the Gremlin console using the following:
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-server.properties')
Could not instantiate implementation: org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager
Type ':help' or ':h' for help.
Display stack trace? [yN]
As you can see, it didn't work, so I think I must have bigger issues, haha. I thought BerkeleyDB was used by default, but it doesn't seem to be working. I also tried the in-memory config file just in case, and I was able to set the value successfully, as you said:
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-inmemory-server.properties')
04:41:33 INFO org.janusgraph.diskstorage.configuration.builder.ReadConfigurationBuilder.setupTimestampProvider - Set default timestamp provider MICRO
04:41:33 INFO org.janusgraph.graphdb.idmanagement.UniqueInstanceIdRetriever.getOrGenerateUniqueInstanceId - Generated unique-instance-id=c0a8e0025844-4b81751be49f1
04:41:33 INFO org.janusgraph.diskstorage.configuration.ExecutorServiceBuilder.buildFixedExecutorService - Initiated fixed thread pool of size 24
04:41:33 INFO org.janusgraph.diskstorage.Backend.initialize - Configuring total store cache size: 437259072
04:41:33 INFO org.janusgraph.graphdb.database.StandardJanusGraph.<init> - Gremlin script evaluation is disabled
04:41:33 INFO org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller.initializeTimepoint - Loaded unidentified ReadMarker start time 2023-11-10T04:41:33.093469Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@72a0a60d
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> mgmt = graph.openManagement();
==>org.janusgraph.graphdb.database.management.ManagementSystem@2f4545c6
gremlin> mgmt.get('graph.set-vertex-id')
==>false
gremlin> mgmt.set("graph.set-vertex-id", true);
==>org.janusgraph.diskstorage.configuration.UserModifiableConfiguration@36f40d72
gremlin> mgmt.get('graph.set-vertex-id')
==>true
But my code still produced the same error.
Bo 11/10/2023
Can you show the stack trace for the Berkeley store error? Regarding the in-memory graph: could you show the full code/log demonstrating that your code still produces the same error? By the way, I would also like to point out that
g.addV('person').property(T.id, 0).next()
is invalid in JanusGraph. Not all numerical values are legal ID values in JanusGraph. https://docs.janusgraph.org/advanced-topics/custom-vertex-id/#custom-long-id shows how you can get a legal numerical ID. Unfortunately, it may not work in gremlin-python. My suggestion is to use a string-type custom ID.
Isopropyl9 11/10/2023
Here is the stack trace for graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-server.properties')
Isopropyl9 11/10/2023
And here is the stack trace when I run the same Python script:
Bo 11/11/2023
The BerkeleyDB error is specific to BerkeleyDB itself. See if https://stackoverflow.com/questions/8612659/berkeley-db-error-the-je-lck-file-could-not-be-locked works. Try destroying your container and starting over. The second one looks more interesting. What if you add the vertex from the Gremlin console? Do you still see the same problem?
Isopropyl9 11/11/2023
I tried tearing down and rebuilding the Docker container (and deleting volumes), and I got the same lock-file error. Then I tried deleting the lock file just to see what would happen, and I got the following. It seems that there is a global variable that sets the indexing backend to elasticsearch. But I thought I was using lucene? I don't believe I need a fancy indexing backend for my simple project, so I thought lucene would make things simpler. And I am not sure how I could change these global values without being able to create a graph instance in the Gremlin console.
Isopropyl9 11/11/2023
I would like to mention again that my docker config only contains the following:
version: '3.8'
services:
  btc_janusgraph:
    # build: ./janusgraph
    image: janusgraph/janusgraph:latest
    container_name: btc_janusgraph
    environment:
      janusgraph.set-vertex-id: true
      set-vertex-id: true
      janusgraph.storage.backend: berkeleyje
      storage.backend: berkeleyje
    ports:
      - "${JANUSGRAPH_PORT:-8182}:${JANUSGRAPH_PORT:-8182}"
      - "8484:8184"
    networks:
      - btc-network
    volumes:
      - btc_janusgraph_data:/var/lib/janusgraph
      - "./janusgraph/janusgraph.properties:/etc/opt/janusgraph/janusgraph.properties:ro"
    healthcheck:
      test: ["CMD", "bin/gremlin.sh", "-e", "scripts/remote-connect.groovy"]
      interval: 10s
      timeout: 60s
      retries: 4
And my custom janusgraph.properties config file volume contains the following:
gremlin.graph=org.janusgraph.core.JanusGraphFactory

storage.backend=berkeleyje
storage.directory=/var/lib/janusgraph/data
index.default.backend=lucene
index.default.directory=/var/lib/janusgraph/index
set-vertex-id=true
I also get the following messages to do with my "read only" volume at the beginning of my janusgraph logs:
2023-11-10 18:16:07 cp: cannot create regular file '/etc/opt/janusgraph/janusgraph.properties': Read-only file system
2023-11-10 18:16:07 chown: changing ownership of '/etc/opt/janusgraph/janusgraph.properties': Read-only file system
2023-11-10 18:16:07 chmod: changing permissions of '/etc/opt/janusgraph/janusgraph.properties': Read-only file system
2023-11-10 18:16:07 sed: cannot rename /etc/opt/janusgraph/sedUO9BEQ: Device or resource busy
2023-11-10 18:16:07 sed: cannot rename /etc/opt/janusgraph/sedh8uf1G: Device or resource busy
2023-11-10 18:16:18 /etc/opt/janusgraph/janusgraph-server.yaml will be used to start JanusGraph Server in foreground
Could this be a problem? If I don't make it read-only, the config file gets overridden as soon as the container starts. Notice the "ro" at the end of my volume.
Bo 11/11/2023
You should use graph.set-vertex-id instead of set-vertex-id. I see why you used set-vertex-id instead of its full form: the doc was a bit misleading.
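To make the naming point concrete: as far as I understand the JanusGraph Docker image, environment variables prefixed with `janusgraph.` are written into janusgraph.properties with that prefix stripped, so `janusgraph.set-vertex-id` ends up as the bare `set-vertex-id`. Below is a small sketch of that mapping, assuming this prefix behaviour. The helper names are hypothetical, and the namespace list only covers the namespaces that appear in this thread:

```python
from typing import Dict, Iterable, Optional

ENV_PREFIX = "janusgraph."  # assumed prefix handled by the Docker image
# Partial list, limited to the namespaces that show up in this thread.
KNOWN_NAMESPACES = {"graph", "storage", "index", "gremlin"}


def env_to_property_key(env_name: str) -> Optional[str]:
    """Return the properties key an env var would map to, or None if ignored."""
    if not env_name.startswith(ENV_PREFIX):
        return None
    return env_name[len(ENV_PREFIX):]


def check_env(env_names: Iterable[str]) -> Dict[str, str]:
    """Map each suspicious environment variable name to a short explanation."""
    problems = {}
    for name in env_names:
        key = env_to_property_key(name)
        if key is None:
            problems[name] = "no 'janusgraph.' prefix, so not treated as a JanusGraph property"
        elif key.split(".", 1)[0] not in KNOWN_NAMESPACES:
            problems[name] = f"maps to '{key}', which lacks a namespace such as 'graph.'"
    return problems
```

Run against the environment block posted above, this would flag `janusgraph.set-vertex-id`, `set-vertex-id`, and `storage.backend`, while `janusgraph.graph.set-vertex-id` maps cleanly to `graph.set-vertex-id`.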
"It seems that there is a global variable that is setting the indexing backend to elasticsearch. But I thought I was using lucene"
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-lucene-server.properties')
02:17:47 WARN org.janusgraph.diskstorage.configuration.builder.ReadConfigurationBuilder.getOptionsWithDiscrepancies - Local setting index.search.backend=lucene (Type: GLOBAL_OFFLINE) is overridden by globally managed value (elasticsearch). Use the ManagementSystem interface instead of the local configuration to control this setting.
This clearly shows you have stale configuration. Maybe your volume wasn't really completely deleted. I think it would make more sense to start without Docker. You seem to struggle with Docker setup and JanusGraph setup at the same time. Let's get a plain JanusGraph setup correct first.
Isopropyl9 11/18/2023
I have decided to just add a custom "id" property, as opposed to setting the actual ID controlled by JanusGraph. I assumed providing custom IDs would be the most practical approach, but it seems that having a custom field works well enough for my purposes. Thank you for helping me with this, but I think there are too many intricate details for me to deal with and understand right now. As you suggested, the janusgraph.graph.set-vertex-id: true property in docker-compose.yml worked, but when I tried to provide my own ID values, it said that they were invalid. I have provided the full stack trace. Strangely, using an ID of 0 returns a different error than other IDs, such as 1.
> g.addV('person').property(T.id, 1).next()
gremlin_python.driver.protocol.GremlinServerError: 500: Not a valid vertex id: 1
> g.addV('person').property(T.id, 2).next()
gremlin_python.driver.protocol.GremlinServerError: 500: Not a valid vertex id: 2
> g.addV('person').property(T.id, 132331).next()
gremlin_python.driver.protocol.GremlinServerError: 500: Not a valid vertex id: 132331
Reading the documentation, it sounds like we're supposed to get a custom ID by using the ID manager like so:
graph.getIDManager().fromVertexID(long)
But I cannot figure out how to use this in gremlin-python. I believe it is janusgraph specific.
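The custom-property workaround mentioned at the start of this post could look roughly like the sketch below. It assumes `g` is a gremlin-python traversal source connected as in the script earlier in the thread. The property name `app_id` and both function names are placeholders, not anything JanusGraph defines:

```python
def add_person(g, app_id):
    """Create a 'person' vertex and store the application ID as a normal property.

    JanusGraph still assigns the real vertex ID; `app_id` is just data.
    """
    return g.addV("person").property("app_id", app_id).next()


def find_people(g, app_id):
    """Return all 'person' vertices whose `app_id` property equals the given value."""
    return g.V().has("person", "app_id", app_id).toList()
```

Because `app_id` is an ordinary property, values such as 0, 1, or 132331 are accepted as-is. Uniqueness is not enforced by this sketch, and on a larger graph an index on the property would probably be worth adding.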
Bo 11/18/2023
Yes, this is JanusGraph-specific, so you cannot do it in gremlin-python. Unfortunately, we don't have a JanusGraph-specific driver for Python.
Solution
Bo 11/18/2023
You could do both
graph.set-vertex-id=true
graph.allow-custom-vid-types=true
Then you could use any arbitrary string ID, e.g.
g.addV("person").property(T.id, "1").next()
Bo 11/18/2023
But remember you should be using org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV3 instead of org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1 (you are probably already using it)
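On the Python side, the string-ID suggestion could be wrapped in a small helper that turns application integers into string IDs. This is a hedged sketch: my understanding is that JanusGraph string IDs should be ASCII and avoid a reserved character (assumed here to be '-'), which is worth checking against the custom vertex ID documentation linked earlier in the thread. `to_string_vertex_id` and `RESERVED_CHAR` are illustrative names only:

```python
RESERVED_CHAR = "-"  # assumption: character JanusGraph reserves in string IDs


def to_string_vertex_id(app_id: int, prefix: str = "") -> str:
    """Build a string vertex ID from a non-negative application integer.

    Raises ValueError if the result would be non-ASCII or contain the
    (assumed) reserved character.
    """
    if app_id < 0:
        raise ValueError(f"application ID must be non-negative: {app_id}")
    vertex_id = f"{prefix}{app_id}"
    if not vertex_id.isascii() or RESERVED_CHAR in vertex_id:
        raise ValueError(f"not usable as a string vertex ID: {vertex_id!r}")
    return vertex_id
```

It would then be used as g.addV("person").property(T.id, to_string_vertex_id(1)).next(). On the client, gremlin-python's DriverRemoteConnection accepts a message_serializer argument, so passing serializer.GraphSONSerializersV3d0() from gremlin_python.driver would be one way to match the GraphSON serializer mentioned above.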
