AT
Apache TinkerPop
questions
Docker Janusgraph Custom ID Values
I'm trying to set up a JanusGraph database with custom vertex ID values. I have the following docker-compose configuration:

version: '3.8'
services:
  btc_janusgraph:
    # build: ./janusgraph
    image: janusgraph/janusgraph:latest
    container_name: btc_janusgraph
    environment:
      janusgraph.set-vertex-id: true
    ports:
      - "${JANUSGRAPH_PORT:-8182}:${JANUSGRAPH_PORT:-8182}"
    networks:
      - btc-network
    volumes:
      - btc_janusgraph_data:/var/lib/janusgraph
      - "./janusgraph/janusgraph.properties:/etc/opt/janusgraph/janusgraph.properties:ro"

Then, after setting up a Python environment with gremlin-python version 3.5.7, I execute the following:

from dotenv import load_dotenv
from graph.base import g
from importlib.metadata import version  # provides version() used below (Python 3.8+)
from gremlin_python import statics
from gremlin_python.process.traversal import T
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.process.graph_traversal import GraphTraversalSource
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
statics.load_statics(globals())
gremlin_version = tuple([int(x) for x in version('gremlinpython').split('.')])
if (gremlin_version <= (3, 4, 0)):
    graph = Graph()
    g = graph.traversal().withRemote(DriverRemoteConnection(GRAPH_DB_URL, 'g'))
else:
    from gremlin_python.process.anonymous_traversal import traversal
    g = traversal().withRemote(DriverRemoteConnection(GRAPH_DB_URL, 'g',
                               username=GRAPH_DB_USER, password=GRAPH_DB_PASSWORD))
# clear database
g.V().drop().iterate()
# add vertices
g.addV('person').property(T.id, 0).next()

And I get the following error message:

gremlin_python.driver.protocol.GremlinServerError: 500: Vertex does not support user supplied identifiers
My ./janusgraph/janusgraph.properties contains the following:

# Copyright 2023 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
...
# more default contents
...
# Directory to store index data locally
#
# Default: (no default value)
# Data Type: String
# Mutability: MASKABLE
index.search.directory = /var/lib/janusgraph/index
# ALLOW SETTING OF CUSTOM IDs
graph.set-vertex-id=true

I've investigated the logs and found this message:

btc_janusgraph | 23:37:12 WARN org.janusgraph.diskstorage.configuration.builder.ReadConfigurationBuilder.getOptionsWithDiscrepancies - Local setting graph.set-vertex-id=true (Type: GLOBAL_OFFLINE) is overridden by globally managed value (false). Use the ManagementSystem interface instead of the local configuration to control this setting.

So now my question becomes, how do I set the global value?
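The warning quoted in this post names the setting, its mutability type, and the stored global value that wins over the local file. As a minimal sketch, assuming log lines follow exactly the wording shown here (other JanusGraph versions may phrase it differently), a small Python helper can pull those fields out of a container log. The function name and the regular expression are illustrative and not part of JanusGraph:

```python
import re

# Pattern built from the warning text quoted in this thread. This is an
# assumption about the log format, not a documented JanusGraph contract.
OVERRIDE_RE = re.compile(
    r"Local setting (?P<key>[\w.\-]+)=(?P<local>\S+) "
    r"\(Type: (?P<type>\w+)\) is overridden by globally managed value "
    r"\((?P<stored>[^)]*)\)"
)


def find_overridden_settings(log_text):
    """Return one dict per 'overridden by globally managed value' warning."""
    return [match.groupdict() for match in OVERRIDE_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "btc_janusgraph | 23:37:12 WARN ...getOptionsWithDiscrepancies - "
        "Local setting graph.set-vertex-id=true (Type: GLOBAL_OFFLINE) is "
        "overridden by globally managed value (false). Use the "
        "ManagementSystem interface instead of the local configuration to "
        "control this setting."
    )
    for hit in find_overridden_settings(sample):
        print(hit)
```

Any hit means the value in janusgraph.properties is being ignored for that key because the storage backend already holds a different global value.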
Thank you very much for getting back to me. I tried connecting to the graph in gremlin console using the following:

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-server.properties')
Could not instantiate implementation: org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager
Type ':help' or ':h' for help.
Display stack trace? [yN]

As you can see, it didn't work. So I think I must have bigger issues haha. I thought berkeley was used by default, but it doesn't seem to be working.
I also tried the inmemory config file just in case, and I was able to successfully set the value, as you said:

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-inmemory-server.properties')
04:41:33 INFO org.janusgraph.diskstorage.configuration.builder.ReadConfigurationBuilder.setupTimestampProvider - Set default timestamp provider MICRO
04:41:33 INFO org.janusgraph.graphdb.idmanagement.UniqueInstanceIdRetriever.getOrGenerateUniqueInstanceId - Generated unique-instance-id=c0a8e0025844-4b81751be49f1
04:41:33 INFO org.janusgraph.diskstorage.configuration.ExecutorServiceBuilder.buildFixedExecutorService - Initiated fixed thread pool of size 24
04:41:33 INFO org.janusgraph.diskstorage.Backend.initialize - Configuring total store cache size: 437259072
04:41:33 INFO org.janusgraph.graphdb.database.StandardJanusGraph.<init> - Gremlin script evaluation is disabled
04:41:33 INFO org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller.initializeTimepoint - Loaded unidentified ReadMarker start time 2023-11-10T04:41:33.093469Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@72a0a60d
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> mgmt = graph.openManagement();
==>org.janusgraph.graphdb.database.management.ManagementSystem@2f4545c6
gremlin> mgmt.get('graph.set-vertex-id')
==>false
gremlin> mgmt.set("graph.set-vertex-id", true);
==>org.janusgraph.diskstorage.configuration.UserModifiableConfiguration@36f40d72
gremlin> mgmt.get('graph.set-vertex-id')
==>true

But my code still produced the same error.
Can you show the stacktrace for the Berkeley store error?
Regarding in-memory graph: could you show the full code/log that shows that your code still produces the same error?
Btw, I would also like to point out that g.addV('person').property(T.id, 0).next() is invalid in JanusGraph. Not all numerical values are legal ID values in JanusGraph. https://docs.janusgraph.org/advanced-topics/custom-vertex-id/#custom-long-id shows how you can get a legal numerical ID. It may not work in gremlin-python, unfortunately.
My suggestion is to use a string-type custom ID.
Here is the stack trace for graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-server.properties'):
And here is the stack trace when I run the same Python script:
The BerkeleyDB error is specific to BerkeleyDB itself. See if https://stackoverflow.com/questions/8612659/berkeley-db-error-the-je-lck-file-could-not-be-locked works. Try destroying your container and starting over.
The second one looks more interesting. What if you add the vertex from the gremlin-console? Do you still see the same problem?
I tried tearing down/rebuilding the docker container (and deleting volumes), and I got the same error to do with lock files. Then I tried deleting the lock file just to see what would happen, and I got the following. It seems that there is a global variable that is setting the indexing backend to elasticsearch. But I thought I was using lucene? I don't believe I need a fancy indexing backend for my simple project, so I thought lucene would make things more simple. And I am not sure how I could change these global values without being able to create a graph instance in the gremlin console.
I would like to mention again that my docker config only contains the following:

version: '3.8'
services:
  btc_janusgraph:
    # build: ./janusgraph
    image: janusgraph/janusgraph:latest
    container_name: btc_janusgraph
    environment:
      janusgraph.set-vertex-id: true
      set-vertex-id: true
      janusgraph.storage.backend: berkeleyje
      storage.backend: berkeleyje
    ports:
      - "${JANUSGRAPH_PORT:-8182}:${JANUSGRAPH_PORT:-8182}"
      - "8484:8184"
    networks:
      - btc-network
    volumes:
      - btc_janusgraph_data:/var/lib/janusgraph
      - "./janusgraph/janusgraph.properties:/etc/opt/janusgraph/janusgraph.properties:ro"
    healthcheck:
      test: ["CMD", "bin/gremlin.sh", "-e", "scripts/remote-connect.groovy"]
      interval: 10s
      timeout: 60s
      retries: 4

And my custom janusgraph.properties config file volume contains the following:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=berkeleyje
storage.directory=/var/lib/janusgraph/data
index.default.backend=lucene
index.default.directory=/var/lib/janusgraph/index
set-vertex-id=true

I also get the following messages to do with my "read only" volume at the beginning of my janusgraph logs:

2023-11-10 18:16:07 cp: cannot create regular file '/etc/opt/janusgraph/janusgraph.properties': Read-only file system
2023-11-10 18:16:07 chown: changing ownership of '/etc/opt/janusgraph/janusgraph.properties': Read-only file system
2023-11-10 18:16:07 chmod: changing permissions of '/etc/opt/janusgraph/janusgraph.properties': Read-only file system
2023-11-10 18:16:07 sed: cannot rename /etc/opt/janusgraph/sedUO9BEQ: Device or resource busy
2023-11-10 18:16:07 sed: cannot rename /etc/opt/janusgraph/sedh8uf1G: Device or resource busy
2023-11-10 18:16:18 /etc/opt/janusgraph/janusgraph-server.yaml will be used to start JanusGraph Server in foreground

Could this be a problem? If I don't make it read-only, the config file gets overridden as soon as the container starts. Notice the "ro" at the end of my volume.
You should use graph.set-vertex-id instead of set-vertex-id. I see why you used set-vertex-id instead of its full form - the doc was a bit misleading.
"It seems that there is a global variable that is setting the indexing backend to elasticsearch. But I thought I was using lucene"

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-berkeleyje-lucene-server.properties')
02:17:47 WARN org.janusgraph.diskstorage.configuration.builder.ReadConfigurationBuilder.getOptionsWithDiscrepancies - Local setting index.search.backend=lucene (Type: GLOBAL_OFFLINE) is overridden by globally managed value (elasticsearch). Use the ManagementSystem interface instead of the local configuration to control this setting.

This clearly shows you have stale configuration. Maybe your volume wasn't really completely deleted. I think it would make more sense to start without Docker. You seem to struggle with Docker setup and JanusGraph setup at the same time. Let's get a plain JanusGraph setup correct first.
I have decided to just add a custom "id" property, as opposed to setting the actual ID controlled by JanusGraph. I assumed providing custom IDs would be the most practical, but it seems that having a custom field works well enough for my purposes. Thank you for helping me with this, but I think there are too many intricate details for me to deal with and understand right now.
As you suggested, the janusgraph.graph.set-vertex-id: true property in docker-compose.yml worked, but after trying to provide my own ID values it said that they were invalid. I have provided the full stack trace. Strangely, using an ID of 0 returns a different error than other IDs, such as 1.

> g.addV('person').property(T.id, 1).next()
gremlin_python.driver.protocol.GremlinServerError: 500: Not a valid vertex id: 1
> g.addV('person').property(T.id, 2).next()
gremlin_python.driver.protocol.GremlinServerError: 500: Not a valid vertex id: 2
> g.addV('person').property(T.id, 132331).next()
gremlin_python.driver.protocol.GremlinServerError: 500: Not a valid vertex id: 132331

Reading the documentation, it sounds like we're supposed to get a custom ID by using the ID manager like so:

graph.getIDManager().fromVertexID(long)

But I cannot figure out how to use this in gremlin-python. I believe it is JanusGraph-specific.
Yes, this is JanusGraph-specific, so you cannot do it in gremlin-python.
Unfortunately, we don't have a JanusGraph-specific driver for Python.
Solution
You could set both:

graph.set-vertex-id=true
graph.allow-custom-vid-types=true

Then you could use any arbitrary string ID, e.g.

g.addV("person").property(T.id, "1").next()

But remember you should be using org.apache.tinkerpop.gremlin.util.ser.GraphSONMessageSerializerV3 instead of org.apache.tinkerpop.gremlin.util.ser.GraphBinaryMessageSerializerV1 (you are probably already using it).
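To apply the two settings from the solution to a properties file without leaving duplicate keys, something like the following Python sketch could be used. It is only a text-level helper with a hypothetical name (`set_properties`). As the warning quoted earlier in the thread indicates, editing the local file does not change a value already stored globally for an existing graph, so this assumes a fresh data directory.

```python
def set_properties(text, updates):
    """Return properties text with every key in `updates` set exactly once.

    Existing, uncommented `key=value` lines are rewritten in place; keys
    that are not present yet are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        key = None
        if "=" in stripped and not stripped.startswith("#"):
            key = stripped.split("=", 1)[0].strip()
        if key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    original = (
        "storage.backend=berkeleyje\n"
        "# ALLOW SETTING OF CUSTOM IDs\n"
        "graph.set-vertex-id=false\n"
    )
    wanted = {
        "graph.set-vertex-id": "true",
        "graph.allow-custom-vid-types": "true",
    }
    print(set_properties(original, wanted), end="")
```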