Getting Property Out of a Variable in Python Gremlin Query

I've been trying to find a performant way to route from point A to point B. Right now my schema is as follows:

Vertex Labels:
1. Airport
   - Properties: airport (airport code).
2. Day
   - Properties: datestamp representing the day.

Edge Labels:
1. has_flights_on
   - Connects Airport to Day.
   - Indicates an airport has flights on a specific day.
2. flight
   - Connects Day to Airport (destination).
   - Properties: flight_id, origin, destination, start_timestamp, end_timestamp, capacity, tail_number, equipment_type.

Graph Structure:
- Airport vertex for each airport.
- Day vertex for specific days connected to airports with flights.
- has_flights_on edges connect Airport to Day.
- flight edges connect Day to Airport, representing flights.

Example:
- Airport (JFK) -> has_flights_on -> Day (2023-08-03) -> flight -> Airport (LAX)
- Indicates a flight from JFK to LAX on August 3, 2023.

This is my query right now:

```
g.V().has('Airport', 'airport', origin)  # find origin airport
    .as_('current_airport')
    .repeat(
        __.out('has_flights_on')  # Go to the corresponding Day vertex
        .has('date', gte(start_datestamp))
        .has('date', lte(end_datestamp))
        .outE('flight')  # Go to the Flight edge
        .where(__.values('origin').is_(__.select('current_airport').values('airport')))  # this is the problem
        .inV().hasLabel('Airport')  # Go to the destination Airport vertex
        .as_('current_airport')  # Update the current airport alias
        .simplePath()  # Ensure that the path doesn't contain cycles
    )
    .until(__.has('airport', destination).or_().loops().is_(4))  # Finish when destination is reached or max hops exceeded
    .has('airport', destination)
    .path()
    .by(__.valueMap('airport', 'flight_id', 'start_timestamp', 'end_timestamp'))
    .toList()
```

The problem is the where step; see below for more context.
terrabl (16mo ago)
Basically I'm not sure I'm grabbing the airport out correctly from the where step, when I do something like .where(__.values('origin').is_(origin)) instead, it will print out paths (obviously not the correct paths but it does print some). I'm basically trying to figure out how to grab the string value out of the current_airport variable and compare that to the origin of the flight edge...
spmallette (16mo ago)
is() step doesn't take a Traversal as an argument. you want something more like:
```
...
outE('flight').as('f').
  where('f', eq('current_airport')).by('origin').by('airport').
...
```
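To make the suggestion concrete, here is a plain-Python model (not gremlin_python itself) of what `where('f', eq('current_airport')).by('origin').by('airport')` compares: the `origin` property of the element labeled `f` against the `airport` property of the element labeled `current_airport`. The `Step` class, the helper names, and the sample data are hypothetical, and the sketch assumes that a label reused inside `repeat()` resolves to its most recent binding and that a Day vertex is shared by several airports.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One entry in a traversal path: element properties plus its step labels."""
    props: dict
    labels: set = field(default_factory=set)


def select_last(path, label):
    """Return the most recently labeled element, or None if the label is unbound."""
    for step in reversed(path):
        if label in step.labels:
            return step
    return None


def where_eq(path, start_label, end_label, start_key, end_key):
    """Model of where(start_label, eq(end_label)).by(start_key).by(end_key)."""
    start = select_last(path, start_label)
    end = select_last(path, end_label)
    if start is None or end is None:
        return False
    return start.props.get(start_key) == end.props.get(end_key)


# Assumption: a Day vertex is shared by every airport flying that day, so its
# outgoing flight edges can belong to other origins; the filter drops those.
mia = Step({"airport": "MIA"}, {"current_airport"})
day = Step({"date": 20230612})  # hypothetical datestamp format
from_mia = Step({"origin": "MIA", "destination": "ATL"}, {"f"})
from_jfk = Step({"origin": "JFK", "destination": "LAX"}, {"f"})

keep = where_eq([mia, day, from_mia], "f", "current_airport", "origin", "airport")
drop = where_eq([mia, day, from_jfk], "f", "current_airport", "origin", "airport")
print(keep, drop)
```

The two `by()` modulators are applied in order: the first to the `f` side, the second to the `current_airport` side, which is why no nested traversal inside `is()` is needed.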
terrabl (16mo ago)
I could kiss you!
```
paths2 = (
    g.V().has('Airport', 'airport', origin)  # find origin airport
    .as_('current_airport')
    .repeat(
        __.out('has_flights_on')  # Go to the corresponding Day vertex
        .has('date', gte(start_datestamp))
        .has('date', lte(end_datestamp))
        .outE('flight')  # Go to the Flight edge
        .as_('f')
        .where('f', eq('current_airport')).by('origin').by('airport')
        # .where(__.values('origin').is_(origin))
        .inV().hasLabel('Airport')  # Go to the destination Airport vertex
        .as_('current_airport')  # Update the current airport alias
        .simplePath()  # Ensure that the path doesn't contain cycles
    )
    .until(__.has('airport', destination).or_().loops().is_(3))  # Finish when destination is reached or max hops exceeded
    .has('airport', destination)
    .path()
    .by(__.valueMap('airport', 'flight_id', 'start_timestamp', 'end_timestamp'))
    .toList()
)
```
It seems to be printing out duplicate paths though, trying to figure that out now.
path[{'airport': ['MIA']}, {}, {'end_timestamp': 1686567600, 'start_timestamp': 1686561000, 'flight_id': '1912MIA20230612'}, {'airport': ['ATL']}]
path[{'airport': ['MIA']}, {}, {'end_timestamp': 1686567600, 'start_timestamp': 1686561000, 'flight_id': '1912MIA20230612'}, {'airport': ['ATL']}]
path[{'airport': ['MIA']}, {}, {'end_timestamp': 1686621960, 'start_timestamp': 1686613560, 'flight_id': '0191MIA20230612'}, {'airport': ['BWI']}, {}, {'end_timestamp': 1686774840, 'start_timestamp': 1686763260, 'flight_id': '3002BWI20230614'}, {'airport': ['ATL']}]
path[{'airport': ['MIA']}, {}, {'end_timestamp': 1686621960, 'start_timestamp': 1686613560, 'flight_id': '0191MIA20230612'}, {'airport': ['BWI']}, {}, {'end_timestamp': 1686774840, 'start_timestamp': 1686763260, 'flight_id': '3002BWI20230614'}, {'airport': ['ATL']}]
path[{'airport': ['MIA']}, {}, {'end_timestamp': 1686567600, 'start_timestamp': 1686561000, 'flight_id': '1912MIA20230612'}, {'airport': ['ATL']}]
path[{'airport': ['MIA']}, {}, {'end_timestamp': 1686567600, 'start_timestamp': 1686561000, 'flight_id': '1912MIA20230612'}, {'airport': ['ATL']}]
path[{'airport': ['MIA']}, {}, {'end_timestamp': 1686621960, 'start_timestamp': 1686613560, 'flight_id': '0191MIA20230612'}, {'airport': ['BWI']}, {}, {'end_timestamp': 1686774840, 'start_timestamp': 1686763260, 'flight_id': '3002BWI20230614'}, {'airport': ['ATL']}]
path[{'airport': ['MIA']}, {}, {'end_timestamp': 1686621960, 'start_timestamp': 1686613560, 'flight_id': '0191MIA20230612'}, {'airport': ['BWI']}, {}, {'end_timestamp': 1686774840, 'start_timestamp': 1686763260, 'flight_id': '3002BWI20230614'}, {'airport': ['ATL']}]
Hmm, I'm going to spend some more time trying to troubleshoot this. I might be creating duplicate has_flights_on edges.
Yup, that was it: I was creating duplicate has_flights_on edges, which caused it to print duplicate paths.
Damn, performance is still not where I would expect it to be... Does saving the edges / vertices as variables have a big performance impact?
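The duplicate-path symptom fits how a traversal branches: each parallel edge is followed separately, so one duplicated has_flights_on edge doubles every path that passes through it. Below is a minimal plain-Python sketch of a hop-limited search over Airport -> Day -> flight. It is not gremlin_python and not the thread's actual graph; `find_paths`, the tuple layouts, and the date format are hypothetical, and the `visited` check is only a rough stand-in for `simplePath()`.

```python
def find_paths(has_flights_on, flights, origin, destination, max_hops=3):
    """Enumerate flight-id paths from origin to destination.

    has_flights_on: list of (airport, day) edges; duplicates are kept on purpose.
    flights: list of (day, flight_id, origin, destination) edges.
    """
    results = []

    def walk(airport, visited, path):
        if airport == destination:
            results.append(tuple(path))
            return
        if len(path) == max_hops:
            return
        for edge_airport, day in has_flights_on:
            # One branch per edge, even when the same edge was inserted twice.
            if edge_airport != airport:
                continue
            for f_day, flight_id, f_origin, f_dest in flights:
                # Keep flights on this day that leave from the current airport.
                if f_day != day or f_origin != airport:
                    continue
                if f_dest in visited:  # rough stand-in for simplePath()
                    continue
                walk(f_dest, visited | {f_dest}, path + [flight_id])

    walk(origin, {origin}, [])
    return results


# Sample data loosely based on the first path printed above.
flights = [("2023-06-12", "1912MIA20230612", "MIA", "ATL")]
clean = [("MIA", "2023-06-12")]
duplicated = clean + clean  # the same has_flights_on edge created twice

print(find_paths(clean, flights, "MIA", "ATL"))
print(find_paths(duplicated, flights, "MIA", "ATL"))
```

With the single edge the search yields one path; with the duplicated edge it yields the same path twice, which matches the repeated lines in the output above.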
spmallette (16mo ago)
what graph database is this?
terrabl (16mo ago)
TinkerPop right now. I'll eventually move it to Neptune; I'm just trying to build out potential models and performance test them.
spmallette (16mo ago)
so TinkerGraph?
btw, performance can range wildly among graph databases. they all sorta have their own way of optimizing things. something wickedly fast on one could be hideously slow on another. that said, if TinkerGraph is slow, it's likely going to be slow everywhere
terrabl (16mo ago)
The binaries are from apache-tinkerpop-gremlin-server-3.5.2
spmallette (16mo ago)
so, you're running the queries against Gremlin Server which is hosting TinkerGraph?
terrabl (16mo ago)
Yes, I believe so. Mostly just for ease, and if performance is good enough here then I can build it out in my eventual solution.
But yeah, if performance is bad here, it's only going to get worse when I scale up.
spmallette (16mo ago)
hard to say what might be the issue with performance for you. if you care to share more information about your data, the query, expectations, etc, feel free to start a fresh question with more details.
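One way to put numbers on "expectations" before opening a fresh question is to time the query from the client side. The helper below is a minimal sketch, not something proposed in the thread: `time_query` is a hypothetical name, and the workload is a stand-in so the example runs without a server. With gremlin_python the callable would wrap a traversal ending in `.toList()`.

```python
import statistics
import time


def time_query(run_query, repeats=5):
    """Call run_query() `repeats` times; return (median_seconds, last_result)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = run_query()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations), result


# Stand-in workload (assumption): replace the lambda with a function that
# submits the real traversal and returns its result list.
median_s, rows = time_query(lambda: sorted(range(1000), reverse=True))
print(f"median {median_s:.6f}s over 5 runs, {len(rows)} rows")
```

Client-side timing includes network and serialization overhead. Gremlin also has a `profile()` step that reports time per step where the provider supports it, which may help narrow down which part of the `repeat()` is expensive.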