Getting Property Out of a Variable in Python Gremlin Query
I've been trying to find a performant way to route from point A to point B.
Right now my schema is as follows:
Vertex Labels:
1. Airport:
- Properties: airport: Airport code.
2. Day:
- Properties: date: Datestamp representing the day.
Edge Labels:
1. has_flights_on:
- Connects Airport to Day.
- Indicates an airport has flights on a specific day.
2. flight:
- Connects Day to Airport (destination).
- Properties: flight_id, origin, destination, start_timestamp, end_timestamp, capacity, tail_number, equipment_type.
Graph Structure:
- Airport vertex for each airport.
- Day vertex for specific days connected to airports with flights.
- has_flights_on edges connect Airport to Day.
- flight edges connect Day to Airport, representing flights.
Example:
- Airport (JFK) -> has_flights_on -> Day (2023-08-03) -> flight -> Airport (LAX)
- Indicates a flight from JFK to LAX on August 3, 2023.
This is my query right now
```
paths = (
    g.V().has('Airport', 'airport', origin)  # find origin airport
    .as_('current_airport')  # as_ because `as` is a Python keyword
    .repeat(
        __.out('has_flights_on')  # Go to the corresponding Day vertex
        .has('date', gte(start_datestamp))
        .has('date', lte(end_datestamp))
        .outE('flight')  # Go to the Flight edge
        .where(__.values('origin').is_(__.select('current_airport').values('airport')))  # this is the problem
        .inV().hasLabel('Airport')  # Go to the destination Airport vertex
        .as_('current_airport')  # Update the current airport alias
        .simplePath()  # Ensure that the path doesn't contain cycles
    )
    .until(__.has('airport', destination).or_().loops().is_(4))  # Finish when destination is reached or max hops exceeded
    .has('airport', destination)
    .path()
    .by(__.valueMap('airport', 'flight_id', 'start_timestamp', 'end_timestamp'))
    .toList()
)
```
The problem is the where, see below for more context.
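For checking what the traversal returns on a small dataset, a plain-Python reference of the intended search may help. This is only a sketch: it assumes the flights are available as dicts with `flight_id`, `origin`, `destination` and an ISO-formatted `day` string, and the name `find_routes` is made up for illustration. It mirrors the query's rules (day window, no airport visited twice, at most 4 legs) and does not touch the graph.

```python
def find_routes(flights, origin, destination, start_day, end_day, max_hops=4):
    """Depth-first search mirroring the traversal: follow flights leaving the
    current airport within the day window, never revisit an airport, and stop
    at the destination or after max_hops legs."""
    # Index flights by origin airport, keeping only those inside the window.
    # ISO date strings compare correctly as plain strings.
    by_origin = {}
    for flight in flights:
        if start_day <= flight["day"] <= end_day:
            by_origin.setdefault(flight["origin"], []).append(flight)

    routes = []

    def extend(airport, visited, legs):
        if airport == destination:
            routes.append(list(legs))
            return
        if len(legs) == max_hops:
            return
        for flight in by_origin.get(airport, []):
            nxt = flight["destination"]
            if nxt not in visited:
                legs.append(flight["flight_id"])
                extend(nxt, visited | {nxt}, legs)
                legs.pop()

    extend(origin, {origin}, [])
    return routes
```

Each route comes back as a list of flight ids, which can be compared by hand against the `flight_id` values in the Gremlin path output.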
Basically I'm not sure I'm grabbing the airport out correctly in the where step. When I do something like
.where(__.values('origin').is_(origin))
instead, it will print out paths (obviously not the correct paths, but it does print some). I'm basically trying to figure out how to grab the string value out of the current_airport variable and compare that to the origin of the flight edge...
The is() step doesn't take a Traversal as an argument. You want something more like:
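A hedged sketch of one common way to write that comparison (an assumption about the kind of fix meant here, not necessarily the exact snippet that was posted): `where()` accepts a predicate whose argument is a step label, and two `by()` modulators choose which property to compare on each side. The helper name `keep_flights_from_current_airport` is hypothetical, and gremlin_python needs to be installed for it to run.

```python
def keep_flights_from_current_airport(edge_traversal):
    """Keep only flight edges whose 'origin' equals the 'airport' property of
    the vertex labelled 'current_airport' earlier in the traversal."""
    # Imported here so this module still loads where gremlin_python is absent.
    from gremlin_python.process.traversal import P

    return (
        edge_traversal
        # The string inside P.eq() is read as a step label, not a literal value.
        .where(P.eq('current_airport'))
        .by('origin')   # value taken from the current flight edge
        .by('airport')  # value taken from the labelled Airport vertex
    )
```

Inside the `repeat()`, the same three steps (`.where(P.eq('current_airport')).by('origin').by('airport')`) would go directly after `.outE('flight')` in place of the failing `where()`.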
I could kiss you!
It seems to be printing out duplicate paths though, trying to figure that out now.
Hmm I'm going to spend some more time trying to troubleshoot this. I might be creating duplicate has_flights_on edges.
yup that was it, i was creating duplicate has_flights_on edges, which caused it to print duplicate paths.
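One way to guard against this at load time, sketched in plain Python: collect each (airport, day) pair once and create exactly one has_flights_on edge per pair. This assumes the flights are dicts with an `origin` code and a `start_timestamp` in epoch seconds (UTC); both the input shape and the name `airport_day_pairs` are illustrative assumptions.

```python
from datetime import datetime, timezone


def airport_day_pairs(flights):
    """Return each (origin airport, day) pair once, in first-seen order."""
    seen = set()
    pairs = []
    for flight in flights:
        # Assumption: start_timestamp is epoch seconds, interpreted as UTC.
        day = (
            datetime.fromtimestamp(flight["start_timestamp"], tz=timezone.utc)
            .date()
            .isoformat()
        )
        key = (flight["origin"], day)
        if key not in seen:
            seen.add(key)
            pairs.append(key)
    return pairs
```

Looping over the returned pairs, rather than over the raw flights, when adding has_flights_on edges keeps the edge count at one per airport and day.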
damn performance is still not where i would expect it to be...
Does saving the edges / vertices as variables have a big performance impact?
what graph database is this?
tinkerpop right now, i'll eventually move it to neptune
just trying to build out potential models and performance test them
so tinkergraph?
btw, performance can range wildly among graph databases. they all sorta have their own way of optimizing things. something wickedly fast on one could be hideously slow on another. that said, if TinkerGraph is slow, it's likely going to be slow everywhere
The binaries are from apache-tinkerpop-gremlin-server-3.5.2
so, you're running the queries against Gremlin Server which is hosting TinkerGraph?
Yes I believe so. Mostly just for ease, and if performance is good enough here then I can build it out in my eventual solution
But yeah if performance is bad here, when i scale up it's only going to get worse.
hard to say what might be the issue with performance for you. if you care to share more information about your data, the query, expectations, etc, feel free to start a fresh question with more details.
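When putting those details together, concrete timings for each candidate model are useful to include. A minimal sketch of a timing helper, assuming the query is wrapped in a zero-argument callable (the name `time_query` is hypothetical):

```python
import statistics
import time


def time_query(run_query, repeats=5):
    """Call run_query() `repeats` times and return the median duration in seconds."""
    durations = []
    for _ in range(repeats):
        started = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - started)
    return statistics.median(durations)
```

For example, wrapping the traversal as `lambda: build_route_traversal().toList()` (where `build_route_traversal` is a placeholder for whatever builds the query) measures the full round trip to Gremlin Server, not just the graph work.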