Docker Janusgraph Custom ID Values

I'm trying to set up a JanusGraph database with custom vertex ID values. I have the following docker-compose configuration:
version: '3.8'

services:
  btc_janusgraph:
    # build: ./janusgraph
    image: janusgraph/janusgraph:latest
    container_name: btc_janusgraph
    environment:
      janusgraph.set-vertex-id: true
    ports:
      - "${JANUSGRAPH_PORT:-8182}:${JANUSGRAPH_PORT:-8182}"
    networks:
      - btc-network
    volumes:
      - btc_janusgraph_data:/var/lib/janusgraph
      - "./janusgraph/janusgraph.properties:/etc/opt/janusgraph/janusgraph.properties:ro"
Then, after setting up a Python environment with gremlin-python version 3.5.7, I execute the following:
# Assumed source of version(); the original excerpt does not show this import
from importlib.metadata import version

from dotenv import load_dotenv

from graph.base import g

from gremlin_python import statics
from gremlin_python.process.traversal import T
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.process.graph_traversal import GraphTraversalSource
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection


statics.load_statics(globals())

# GRAPH_DB_URL, GRAPH_DB_USER and GRAPH_DB_PASSWORD are not defined in this excerpt
gremlin_version = tuple([int(x) for x in version('gremlinpython').split('.')])
if (gremlin_version <= (3, 4, 0)):
    graph = Graph()
    g = graph.traversal().withRemote(DriverRemoteConnection(GRAPH_DB_URL, 'g'))
else:
    from gremlin_python.process.anonymous_traversal import traversal
    g = traversal().withRemote(DriverRemoteConnection(GRAPH_DB_URL, 'g',
                               username=GRAPH_DB_USER, password=GRAPH_DB_PASSWORD))


# clear database
g.V().drop().iterate()

# add vertices
g.addV('person').property(T.id, 0).next()
And I get the following error message:
gremlin_python.driver.protocol.GremlinServerError: 500: Vertex does not support user supplied identifiers
Isopropyl9 (7mo ago)
My ./janusgraph/janusgraph.properties contains the following:
# Copyright 2023 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
...
# more default contents
...
# Directory to store index data locally
#
# Default: (no default value)
# Data Type: String
# Mutability: MASKABLE
index.search.directory = /var/lib/janusgraph/index

# ALLOW SETTING OF CUSTOM IDs
graph.set-vertex-id=true
I've investigated the logs and found this message:
btc_janusgraph | 23:37:12 WARN org.janusgraph.diskstorage.configuration.builder.ReadConfigurationBuilder.getOptionsWithDiscrepancies - Local setting graph.set-vertex-id=true (Type: GLOBAL_OFFLINE) is overridden by globally managed value (false). Use the ManagementSystem interface instead of the local configuration to control this setting.
So now my question becomes, how do I set the global value?
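The warning above says that the value already stored with the graph takes precedence over the local properties file. As a minimal sketch for spotting every setting affected this way, the snippet below scans log text for that warning. The regular expression is written against the exact wording of the log line quoted here, and the function name is hypothetical; it is not part of JanusGraph.

```python
import re

# Matches the "overridden by globally managed value" warning quoted above.
# The pattern is an assumption based on that single log line.
OVERRIDE_RE = re.compile(
    r"Local setting (?P<key>[\w.\-]+)=(?P<local>\S+) "
    r"\(Type: (?P<type>\w+)\) is overridden by globally managed value "
    r"\((?P<global>[^)]*)\)"
)


def find_overridden_settings(log_text):
    """Return one dict per overridden setting found in the log text."""
    return [match.groupdict() for match in OVERRIDE_RE.finditer(log_text)]
```

Feeding it the container logs (for example the output of `docker logs btc_janusgraph`) would list each key together with the local value that was ignored and the stored value that is actually in effect.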
Isopropyl9 (7mo ago)
Thank you very much for getting back to me. I tried connecting to the graph in gremlin console using the following:
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-server.properties')
Could not instantiate implementation: org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager
Type ':help' or ':h' for help.
Display stack trace? [yN]
As you can see, it didn't work, so I think I must have bigger issues, haha. I thought BerkeleyDB was used by default, but it doesn't seem to be working. I also tried the in-memory config file just in case, and I was able to successfully set the value, as you said:
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-inmemory-server.properties')
04:41:33 INFO org.janusgraph.diskstorage.configuration.builder.ReadConfigurationBuilder.setupTimestampProvider - Set default timestamp provider MICRO
04:41:33 INFO org.janusgraph.graphdb.idmanagement.UniqueInstanceIdRetriever.getOrGenerateUniqueInstanceId - Generated unique-instance-id=c0a8e0025844-4b81751be49f1
04:41:33 INFO org.janusgraph.diskstorage.configuration.ExecutorServiceBuilder.buildFixedExecutorService - Initiated fixed thread pool of size 24
04:41:33 INFO org.janusgraph.diskstorage.Backend.initialize - Configuring total store cache size: 437259072
04:41:33 INFO org.janusgraph.graphdb.database.StandardJanusGraph.<init> - Gremlin script evaluation is disabled
04:41:33 INFO org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller.initializeTimepoint - Loaded unidentified ReadMarker start time 2023-11-10T04:41:33.093469Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@72a0a60d
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> mgmt = graph.openManagement();
==>org.janusgraph.graphdb.database.management.ManagementSystem@2f4545c6
gremlin> mgmt.get('graph.set-vertex-id')
==>false
gremlin> mgmt.set("graph.set-vertex-id", true);
==>org.janusgraph.diskstorage.configuration.UserModifiableConfiguration@36f40d72
gremlin> mgmt.get('graph.set-vertex-id')
==>true
But my code still produced the same error.
Bo (7mo ago)
Can you show the stacktrace for the Berkeley store error? Regarding in-memory graph: could you show the full code/log that shows that your code still produces the same error? Btw I would also want to point out that,
g.addV('person').property(T.id, 0).next()
is invalid in JanusGraph: not all numerical values are legal ID values. https://docs.janusgraph.org/advanced-topics/custom-vertex-id/#custom-long-id shows how you can get a legal numerical ID. It may not work in gremlin-python, unfortunately. My suggestion is to use a string-type custom ID.
Isopropyl9 (7mo ago)
Here is the stack trace for graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-server.properties')
Isopropyl9 (7mo ago)
And here is the stack trace when I run the same Python script:
Bo (7mo ago)
The BerkeleyDB error is specific to BerkeleyDB itself. See if https://stackoverflow.com/questions/8612659/berkeley-db-error-the-je-lck-file-could-not-be-locked works. Try destroying your container and starting over.
Bo (7mo ago)
The second one looks more interesting. What if you add the vertex from the gremlin-console? Do you still see the same problem?
Isopropyl9 (7mo ago)
I tried tearing down and rebuilding the Docker container (and deleting volumes), and I got the same lock-file error. Then I tried deleting the lock file just to see what would happen, and I got the following. It seems that there is a global variable that is setting the indexing backend to Elasticsearch. But I thought I was using Lucene? I don't believe I need a fancy indexing backend for my simple project, so I thought Lucene would make things simpler. And I am not sure how I could change these global values without being able to create a graph instance in the Gremlin console.
Isopropyl9 (7mo ago)
I would like to mention again that my docker config only contains the following:
version: '3.8'
services:
  btc_janusgraph:
    # build: ./janusgraph
    image: janusgraph/janusgraph:latest
    container_name: btc_janusgraph
    environment:
      janusgraph.set-vertex-id: true
      set-vertex-id: true
      janusgraph.storage.backend: berkeleyje
      storage.backend: berkeleyje
    ports:
      - "${JANUSGRAPH_PORT:-8182}:${JANUSGRAPH_PORT:-8182}"
      - "8484:8184"
    networks:
      - btc-network
    volumes:
      - btc_janusgraph_data:/var/lib/janusgraph
      - "./janusgraph/janusgraph.properties:/etc/opt/janusgraph/janusgraph.properties:ro"
    healthcheck:
      test: ["CMD", "bin/gremlin.sh", "-e", "scripts/remote-connect.groovy"]
      interval: 10s
      timeout: 60s
      retries: 4
And my custom janusgraph.properties config file volume contains the following:
gremlin.graph=org.janusgraph.core.JanusGraphFactory

storage.backend=berkeleyje
storage.directory=/var/lib/janusgraph/data
index.default.backend=lucene
index.default.directory=/var/lib/janusgraph/index
set-vertex-id=true
I also get the following messages to do with my "read only" volume at the beginning of my janusgraph logs:
2023-11-10 18:16:07 cp: cannot create regular file '/etc/opt/janusgraph/janusgraph.properties': Read-only file system
2023-11-10 18:16:07 chown: changing ownership of '/etc/opt/janusgraph/janusgraph.properties': Read-only file system
2023-11-10 18:16:07 chmod: changing permissions of '/etc/opt/janusgraph/janusgraph.properties': Read-only file system
2023-11-10 18:16:07 sed: cannot rename /etc/opt/janusgraph/sedUO9BEQ: Device or resource busy
2023-11-10 18:16:07 sed: cannot rename /etc/opt/janusgraph/sedh8uf1G: Device or resource busy
2023-11-10 18:16:18 /etc/opt/janusgraph/janusgraph-server.yaml will be used to start JanusGraph Server in foreground
Could this be a problem? If I don't make it read-only, the config file gets overridden as soon as the container starts. Notice the "ro" at the end of my volume.
Bo (7mo ago)
You should use graph.set-vertex-id instead of set-vertex-id. I see why you used set-vertex-id instead of its full form: the doc was a bit misleading.
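To make the naming difference concrete, here is a rough model of how environment variables in the compose file end up as property keys. It assumes the JanusGraph Docker image's documented convention of stripping the `janusgraph.` prefix and writing the remainder into janusgraph.properties; the function itself is a hypothetical illustration, not the image's actual startup script.

```python
PREFIX = "janusgraph."


def env_to_properties(environment):
    """Model the env-var-to-property mapping: keep only names that start with
    'janusgraph.' and drop that prefix (assumed image behaviour)."""
    properties = {}
    for name, value in environment.items():
        if name.startswith(PREFIX):
            properties[name[len(PREFIX):]] = str(value)
    return properties


compose_environment = {
    "janusgraph.set-vertex-id": "true",        # becomes set-vertex-id (not a valid key)
    "janusgraph.graph.set-vertex-id": "true",  # becomes graph.set-vertex-id
    "set-vertex-id": "true",                   # no prefix, so it is ignored
}
print(env_to_properties(compose_environment))
```

Under that assumption, only `janusgraph.graph.set-vertex-id` produces the full-form key that JanusGraph recognises.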
"It seems that there is a global variable that is setting the indexing backend to elasticsearch. But I thought I was using lucene"
gremlin> graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-lucene-server.properties')
02:17:47 WARN org.janusgraph.diskstorage.configuration.builder.ReadConfigurationBuilder.getOptionsWithDiscrepancies - Local setting index.search.backend=lucene (Type: GLOBAL_OFFLINE) is overridden by globally managed value (elasticsearch). Use the ManagementSystem interface instead of the local configuration to control this setting.
This clearly shows you have stale configuration. Maybe your volume wasn't really completely deleted. I think it would make more sense to start without Docker. You seem to struggle with Docker setup and JanusGraph setup at the same time. Let's get a plain JanusGraph setup correct first.
Isopropyl9 (7mo ago)
I have decided to just add a custom "id" property, as opposed to setting the actual ID controlled by JanusGraph. I assumed providing custom IDs would be the most practical, but it seems that having a custom field works well enough for my purposes. Thank you for helping me with this, but I think there are too many intricate details for me to deal with and understand right now.
As you suggested, the janusgraph.graph.set-vertex-id: true property in docker-compose.yml worked, but after trying to provide my own ID values it said that they were invalid. I have provided the full stack trace. Strangely, using an ID of 0 returns a different error than other IDs, such as 1.
> g.addV('person').property(T.id, 1).next()
gremlin_python.driver.protocol.GremlinServerError: 500: Not a valid vertex id: 1
> g.addV('person').property(T.id, 2).next()
gremlin_python.driver.protocol.GremlinServerError: 500: Not a valid vertex id: 2
> g.addV('person').property(T.id, 132331).next()
gremlin_python.driver.protocol.GremlinServerError: 500: Not a valid vertex id: 132331
Reading the documentation, it sounds like we're supposed to get a custom ID by using the ID manager like so:
graph.getIDManager().toVertexId(long)
But I cannot figure out how to use this in gremlin-python. I believe it is janusgraph specific.
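The workaround described at the start of this post, storing the application's ID as an ordinary property rather than as the JanusGraph-controlled vertex ID, might look roughly like the sketch below. The property key `external_id` and both function names are hypothetical choices for illustration, and `g` is assumed to be an already connected gremlin-python traversal source.

```python
def add_person(g, external_id):
    """Create a 'person' vertex that carries the caller's ID as a normal
    property; JanusGraph still assigns the real vertex ID itself."""
    return g.addV("person").property("external_id", external_id).next()


def find_person(g, external_id):
    """Look vertices up again by that property (empty list if none match)."""
    return g.V().has("person", "external_id", external_id).toList()
```

With this approach any integer, including 0, is acceptable, because the value never has to pass JanusGraph's vertex ID validation.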
Bo (7mo ago)
Yes, this is JanusGraph-specific, so you cannot do it in gremlin-python. Unfortunately, we don't have a JanusGraph-specific driver for Python.
Solution
Bo (7mo ago)
You could do both
graph.set-vertex-id=true
graph.allow-custom-vid-types=true
Then you could use any arbitrary string ID, e.g.
g.addV("person").property(T.id, "1").next()
Bo (7mo ago)
But remember you should be using org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV3 instead of org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1 (you are probably already using it).
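On the Python side, the two suggestions (string IDs and a GraphSON v3 serializer) could be combined along these lines. This is a sketch under assumptions: the helper names are hypothetical, the server is assumed to have both graph.set-vertex-id and graph.allow-custom-vid-types enabled, and the gremlin-python imports are done lazily so the ID helper can be used on its own.

```python
def normalize_vertex_id(raw_id):
    """Convert an application-level ID (e.g. an int) to the string form
    that will be sent as the vertex ID."""
    text = str(raw_id)
    if not text:
        raise ValueError("vertex ID must not be empty")
    return text


def connect(url):
    """Open a traversal source that explicitly uses GraphSON v3."""
    from gremlin_python.driver import serializer
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.process.anonymous_traversal import traversal

    connection = DriverRemoteConnection(
        url, "g", message_serializer=serializer.GraphSONSerializersV3d0()
    )
    return traversal().withRemote(connection)


def add_person(g, raw_id):
    """Add a 'person' vertex whose ID is the string form of raw_id."""
    from gremlin_python.process.traversal import T

    return g.addV("person").property(T.id, normalize_vertex_id(raw_id)).next()
```

A call such as `add_person(connect("ws://localhost:8182/gremlin"), 1)` would then send the ID as the string "1"; the URL here is only an example.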
mle (5mo ago)
Hacky solution here: the ID manager's toVertexId is doing this operation: id <<1 3. If you do this operation yourself, you don't need to use toVertexId. Side note: I am curious where you are reading the JanusGraph logs; I don't see them in my GKE cluster.