LlamaIndex exposes a generator on the query result, which you can get with a call such as: result = query_eng.query(question). For whatever reason, inside a RunPod container the following loop won't execute:
    for response in result.response_gen:
        print(f"response from query: {response}")
        yield {"word": response}
It simply skips the loop (for response in result.response_gen:) entirely. I tested this locally and it runs fine. Could some kind of timeout be the cause?
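
One way to narrow this down is to make the loop report why it produced nothing, rather than silently skipping. The sketch below is only an illustration in plain Python: stream_words and FakeStreamingResult are hypothetical names, and FakeStreamingResult is a stand-in for whatever query_eng.query() returns, not a LlamaIndex class. It assumes two possible causes worth separating: the result has no response_gen attribute at all, or the generator is empty (for example because it was already iterated once elsewhere, since a Python generator can only be consumed once).

```python
class FakeStreamingResult:
    """Hypothetical stand-in for the object returned by query_eng.query()."""

    def __init__(self, tokens):
        # A generator expression can be iterated only once, like a token stream.
        self.response_gen = (t for t in tokens)


def stream_words(result):
    """Yield {"word": ...} per token, and say why if nothing was streamed."""
    gen = getattr(result, "response_gen", None)
    if gen is None:
        # The result object does not expose a stream at all.
        yield {"error": f"{type(result).__name__} has no response_gen"}
        return

    count = 0
    for response in gen:
        count += 1
        yield {"word": response}

    if count == 0:
        # The loop body never ran: the stream was empty or already exhausted.
        yield {"warning": "response_gen was empty or already consumed"}


if __name__ == "__main__":
    print(list(stream_words(FakeStreamingResult(["Hello", " world"]))))
    print(list(stream_words(FakeStreamingResult([]))))
    print(list(stream_words(object())))
```

If the warning branch fires in the container but not locally, the difference is more likely in how the result is produced or consumed there than in a timeout, though that is a guess to verify against the container logs.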