(Flux) Serverless inference crashes without logs.
Hi All!
I've built a FLUX inference container on RunPod serverless.
It works (sometimes), but I get a lot of random failures and RunPod does not return any error logs.
E.g. this is the response:
'''
{
"delayTime": 151019,
"error": "job timed out after 1 retries",
"executionTime": 102002,
"id": "64de56ee-4af2-4c64-ab84-02d4a7e81593-u1",
"retries": 1,
"status": "FAILED",
"workerId": "1qjtmj861f1278"
}
'''
But no error log is reported, either in the console or in the response, explaining what caused the job to be retried in the first place.
Also, the timeout should be one hour, but I get this message after only a few minutes.
I have also added a Telegram bot for logging, but no exception is captured there either. Did the machine just crash?
Have you experienced the same?
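One way to rule out an uncaught Python exception is to wrap the job handler so that every failure is written to stderr with a full traceback and also returned in the job output. The following is only a minimal sketch under assumptions: `log_failures`, the logger name, and the placeholder `handler` body are hypothetical, and it assumes the worker is a Python handler that receives a job dict with `id` and `input` keys. If the process is killed from outside (for example by an out-of-memory kill), no Python code runs, which would be consistent with seeing no exception anywhere.

'''
import functools
import logging
import sys
import traceback

# Log to stderr so messages end up in the container's log stream.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
log = logging.getLogger("flux-worker")  # hypothetical logger name


def log_failures(handler):
    """Wrap a job handler so exceptions are logged and returned, not lost."""

    @functools.wraps(handler)
    def wrapper(job):
        job_id = job.get("id", "unknown")
        log.info("job %s started", job_id)
        try:
            result = handler(job)
        except Exception as exc:
            # Write the full traceback to the log before reporting the failure.
            log.error("job %s failed:\n%s", job_id, traceback.format_exc())
            # Assumption: the platform surfaces an "error" key in the output.
            return {"error": f"{type(exc).__name__}: {exc}"}
        log.info("job %s finished", job_id)
        return result

    return wrapper


@log_failures
def handler(job):
    # Placeholder for the real FLUX inference call (hypothetical).
    prompt = job["input"]["prompt"]
    return {"prompt": prompt}


# In the real worker the wrapped handler would be registered with the SDK,
# e.g. runpod.serverless.start({"handler": handler}).
'''

If the "started" line shows up in the worker logs but neither "finished" nor "failed" does, the process most likely died outside Python's control rather than raising an exception.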
