Status endpoint only returns "COMPLETED" but no answer to the question

A

ashley•2/16/24, 6:30 PM

What kind of endpoint are you running? This is an issue with your endpoint, not with the status API.

Kkingclimax7569 I'm currently using the v2/model_id/status/run_id endpoint and the results I get...

J

J.•2/16/24, 6:30 PM

https://docs.runpod.io/serverless/endpoints/invoke-jobs

Run and status should be correct

J

J.•2/16/24, 6:30 PM

Ur main issue is maybe not returning properly

J

J.•2/16/24, 6:31 PM

If u want reference to functions that I made to make a /run call, and just keep polling their status:
https://github.com/justinwlin/runpod_whisperx_serverless_clientside_code/blob/main/runpod_client_helper.py

K

kingclimax7569OP•2/16/24, 6:35 PM

I was using runsync instead of run, is that incorrect? I changed it to run and now I'm receiving IN_QUEUE instead

K

kingclimax7569OP•2/16/24, 6:35 PM

So I'm supposed to keep polling that?

A

ashley•2/16/24, 6:36 PM

Yes, /run is asynchronous, but changing it will most likely not make any difference

A

ashley•2/16/24, 6:36 PM

if it does, then /runsync is broken

A

ashley•2/16/24, 6:43 PM

Just tested and both work fine for me.

J

J.•2/16/24, 6:51 PM

/run is great b/c with /runsync I find I get a network timeout :))) but certainly /runsync is also great if it's short enough

J

J.•2/16/24, 6:51 PM

but also /run gives u a 30 min cache on runpod's end to store ur answer vs /runsync I forget how long but its <1 min i think

J

J.•2/16/24, 6:51 PM

so i find the 30 min cache nice

J

J.•2/16/24, 6:51 PM

also u can add a /webhook if u want it to call back to ur webhook when done with the response instead of polling
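
For reference, a minimal sketch of what a /run request body with a callback might look like. It assumes the webhook URL is passed as a top-level `webhook` field next to `input` (as the RunPod docs describe it, to the best of my knowledge) rather than as a separate path; `build_run_payload` and the example URL are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Optional


def build_run_payload(job_input: Dict[str, Any],
                      webhook_url: Optional[str] = None) -> Dict[str, Any]:
    """Build the JSON body for a POST to /v2/{endpoint_id}/run."""
    payload: Dict[str, Any] = {"input": job_input}
    if webhook_url:
        # Assumption: the callback URL sits beside "input", not inside it.
        payload["webhook"] = webhook_url
    return payload


payload = build_run_payload(
    {"prompt": "Hello"},
    webhook_url="https://example.com/runpod-callback",  # placeholder URL
)
print(payload)
```

The resulting dict can be passed as `json=payload` to `requests.post`, in place of `dict(input=request)`.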

K

kingclimax7569OP•2/16/24, 6:56 PM

Yea im still not getting the output, just a value that says "COMPLETED"

K

kingclimax7569OP•2/16/24, 6:56 PM

import requests
import sys
import json
import time

bearer_token = "**"
endpoint_id = "**"

prompt = """
List me all of the US presidents?

"""

# Define the URL
url = f"https://api.runpod.ai/v2/{endpoint_id}/run"

# Define the headers
headers = {
    'Content-Type': 'application/json',
    'Authorization': f'Bearer {bearer_token}'
}


system_message = """You are a helpful, respectful and honest assistant and chatbot."""
prompt_template = f'''[INST] <<SYS>>
{system_message}
<</SYS>>'''

# Add the initial user message
prompt_template += f'\n{prompt} [/INST]'

print("here")
request = {
        'prompt': prompt_template,
        'max_new_tokens': 4000,
        'temperature': 0.7,
        'top_k': 50,
        'top_p': 0.7,
        'repetition_penalty': 1.2,
        'batch_size': 8,
            }

response = requests.post(url, json=dict(input=request), headers = {
"Authorization": f"Bearer {bearer_token}"
    })
print(response.text)
response_json = json.loads(response.text)

job = response_json['id']

while True:
    

  status_url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{response_json['id']}"
  get_status = requests.get(status_url, headers=headers)
  print("here",get_status.text)
  status_id = json.loads(get_status.text)['id']
  status = json.loads(get_status.text)['status']

  if status in ["IN_QUEUE", "IN_PROGRESS"]:
    time.sleep(20)
  
  else:
    if status == "COMPLETED":
      print({
          "status": "COMPLETED",
          "output": json.loads(get_status.text).get("output")
      })
    else:
        print("error")
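
Note that the loop above never exits once the job finishes, so it keeps printing the same final response. A minimal sketch of a polling loop that stops at the first non-pending status is shown here; `poll_until_done` and `fetch_status` are hypothetical names, and only the two pending statuses mentioned in this thread are assumed.

```python
import time
from typing import Any, Callable, Dict

# Statuses named in this thread that mean "keep waiting".
PENDING_STATUSES = {"IN_QUEUE", "IN_PROGRESS"}


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    max_wait: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status() until the job leaves the pending states.

    fetch_status is a hypothetical callable returning the parsed JSON body
    of GET /v2/{endpoint_id}/status/{job_id}.
    """
    waited = 0.0
    while True:
        body = fetch_status()
        if body.get("status") not in PENDING_STATUSES:
            return body  # terminal status: stop instead of looping forever
        if waited >= max_wait:
            raise TimeoutError(f"job still pending after {max_wait} seconds")
        sleep(interval)
        waited += interval


# Offline demonstration with canned responses instead of real HTTP calls.
_canned = iter([
    {"id": "job-1", "status": "IN_QUEUE"},
    {"id": "job-1", "status": "IN_PROGRESS"},
    {"id": "job-1", "status": "COMPLETED", "output": "hello"},
])
final = poll_until_done(lambda: next(_canned), interval=0.0, sleep=lambda s: None)
print(final.get("output", "<no 'output' key in response>"))
```

With the script above, `fetch_status` could be `lambda: requests.get(status_url, headers=headers).json()`. Printing the whole final body also makes it obvious when the worker returned no `output` key at all.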

A

ashley•2/16/24, 6:57 PM

How do you get a network timeout with runsync? You are doing something wrong; it eventually goes to IN_QUEUE or IN_PROGRESS if the request takes too long, it doesn't time out.

K

kingclimax7569OP•2/16/24, 6:59 PM

response:

{"delayTime":662,"executionTime":9823,"id":"1d227fac-78f9-4e22-bb2e-1ff79718704a-u1","status":"COMPLETED"}

A

ashley•2/16/24, 7:00 PM

Yes, I knew it would not make a difference

A

ashley•2/16/24, 7:00 PM

Your worker is most likely throwing an error, and you are most likely capturing a dict in the error key, which causes this to happen

A

ashley•2/16/24, 7:01 PM

error only accepts a str and not a dict, RunPod made a shitty breaking change to the SDK that causes this.

A

ashley•2/16/24, 7:01 PM

So now you have to do something like:

{
   "error": "Some error message",
   "output": someDict
}
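
As a rough sketch of what that could look like inside a handler file: the version below keeps `error` a plain string and puts any structured details under `output`, following the shape described above. It has not been checked against a specific SDK version; `run_inference` is a hypothetical stand-in for the real model call, and the registration line is shown only as a comment.

```python
from typing import Any, Dict


def run_inference(job_input: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the real model call."""
    if "prompt" not in job_input:
        raise ValueError("missing 'prompt' in input")
    return {"text": f"echo: {job_input['prompt']}"}


def handler(job: Dict[str, Any]) -> Dict[str, Any]:
    """Return the result on success; on failure keep "error" a string."""
    try:
        return run_inference(job.get("input", {}))
    except Exception as exc:
        return {
            "error": str(exc),  # a str, not a dict
            "output": {"type": type(exc).__name__},  # structured details
        }


# In the worker this would be registered with something like:
#   runpod.serverless.start({"handler": handler})

print(handler({"input": {"prompt": "hi"}}))
print(handler({"input": {}}))
```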

A

ashley•2/16/24, 7:02 PM

I had this exact same issue and had to change my error handling to fix it.

K

kingclimax7569OP•2/16/24, 7:17 PM

Sorry where does this change need to be made?

K

kingclimax7569OP•2/16/24, 7:17 PM

thank you for the response

A

ashley•2/16/24, 7:17 PM

in your endpoint handler file

K

kingclimax7569OP•2/16/24, 7:23 PM

Sorry I don't think I've ever modified that file, do I need the runpod python package to use it? I only have an endpoint that I set up

A

ashley•2/16/24, 7:27 PM

Are you using the vllm worker?

K

kingclimax7569OP•2/16/24, 7:30 PM

Im not sure, how can I find that out?

K

kingclimax7569OP•2/16/24, 8:11 PM

def generator_handler():    
    bearer_token = "**"
    endpoint_id = "**"

    prompt = """
    List me all of the US presidents?

    """

    # Define the URL
    url = f"https://api.runpod.ai/v2/{endpoint_id}/run"

    # Define the headers
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {bearer_token}'
    }


    system_message = """You are a helpful, respectful and honest assistant and chatbot."""
    prompt_template = f'''[INST] <<SYS>>
    {system_message}
    <</SYS>>'''

    # Add the initial user message
    prompt_template += f'\n{prompt} [/INST]'

    print("here")
    request = {
            'prompt': prompt_template,
            'max_new_tokens': 4000,
            'temperature': 0.7,
            'top_k': 50,
            'top_p': 0.7,
            'repetition_penalty': 1.2,
            'batch_size': 8,
                }

    response = requests.post(url, json=dict(input=request), headers = {
    "Authorization": f"Bearer {bearer_token}"
        })
    print(response.text)
    response_json = json.loads(response.text)

    job = response_json['id']

    while True:
        

      status_url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{response_json['id']}"
      get_status = requests.get(status_url, headers=headers)
      print("here",get_status.text)
      status_id = json.loads(get_status.text)['id']
      status = json.loads(get_status.text)['status']

      if status in ["IN_QUEUE", "IN_PROGRESS"]:
        time.sleep(20)
      
      else:
        if status == "COMPLETED":
          print("COMPLETED")
          return {
              "error": "error 1",
              "output": json.loads(get_status.text)
            }
        
        else:
            return {
              "error": "error 2",
              "output": json.loads(get_status.text)
            }
if __name__ == '__main__':
  runpod.serverless.start({ 
    "handler": generator_handler, # Required
  })

K

kingclimax7569OP•2/16/24, 8:11 PM

Not sure if that makes sense?

K

kingclimax7569OP•2/16/24, 8:12 PM

I get that response repeatedly

B

Boxitunny•2/17/24, 8:20 PM

I can share my code, but as far as I can see looking from what you’ve posted, your output should be in the ’tokens’ part of the json that you get back. Try just printing everything you get back. If it’s completed, it should be there…

B

Boxitunny•2/17/24, 8:29 PM

elif status == "COMPLETED":
    tokens = json_response['output'][0]['choices'][0]['tokens']
    return tokens

here's the relevant part of mine. if the status is COMPLETED, the output you want is in 'tokens'. hope this helps!
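
A defensive version of that lookup might look like the sketch below. It assumes the nested `output[0]['choices'][0]['tokens']` layout quoted above, which depends on the worker in use; `extract_tokens` is a hypothetical helper name. It returns `None` when the job is not completed or the keys are missing.

```python
from typing import Any, Dict, Optional


def extract_tokens(status_body: Dict[str, Any]) -> Optional[Any]:
    """Pull the generated tokens out of a parsed /status response, if present."""
    if status_body.get("status") != "COMPLETED":
        return None
    try:
        return status_body["output"][0]["choices"][0]["tokens"]
    except (KeyError, IndexError, TypeError):
        # No "output" key, or a different layout than assumed here.
        return None


with_output = {
    "status": "COMPLETED",
    "output": [{"choices": [{"tokens": ["Hi there"]}]}],
}
without_output = {"delayTime": 662, "executionTime": 9823, "status": "COMPLETED"}

print(extract_tokens(with_output))
print(extract_tokens(without_output))
```

A `None` result for a completed job points at the worker not returning any output, rather than at the client-side parsing.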

B

Boxitunny•2/17/24, 8:35 PM

...so if I'm reading yours right, you'll want something like

LLM_response = json.loads(get_status.text)['tokens']

I think, lol

B

Boxitunny•2/17/24, 8:48 PM

…unless the problem really is that all you’re getting back is ’completed’ and no tokens at all anywhere. In which case forget all I said

K

kingclimax7569OP•2/20/24, 1:53 PM

I will try this, thank you. Sorry, I didn't see this earlier

K

kingclimax7569OP•2/20/24, 6:19 PM

Hey the object I'm getting back doesn't have the "tokens" key. Did you use a handler function?

B

Boxitunny•2/20/24, 6:21 PM

I just used the ready made vllm endpoint.

I’m not really the one to ask.

A

Alpay Ariyak•2/21/24, 5:58 AM

Hi @kingclimax7569 , what are you looking to deploy?

K

kingclimax7569OP•2/21/24, 5:59 PM

Hey I already have a serverless endpoint deployed

K

kingclimax7569OP•2/21/24, 6:02 PM

I'm just trying to use the status endpoint to retrieve the entire result of a query instead of using the stream endpoint to retrieve the results gradually

A

Alpay Ariyak•2/21/24, 6:32 PM

Is it for a LLM?

K

kingclimax7569OP•2/21/24, 6:32 PM

Yes

A

Alpay Ariyak•2/21/24, 6:33 PM

Have you tried our https://github.com/runpod-workers/worker-vllm?
We’re adding full OpenAI compatibility this week

K

kingclimax7569OP•2/21/24, 6:33 PM

I'm sorry how would that help? The problem seems to be with the runpod endpoints

K

kingclimax7569OP•2/21/24, 6:33 PM

Not the LLM

J

J.•2/21/24, 6:34 PM

Ah i think ik why, do u have return_aggregate set to true?

J

J.•2/21/24, 6:34 PM

if mode_to_run in ["both", "serverless"]:
    runpod.serverless.start({
        "handler": handler,
        "concurrency_modifier": adjust_concurrency,
        "return_aggregate_stream": True,
    })

U prob need return_aggregate_stream set to True, so that if u are streaming, the streamed results become available on /run
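
To illustrate the idea locally, here is a small sketch of a generator-style handler and what aggregating its stream amounts to. It assumes the aggregated result is simply the list of yielded chunks, which is my reading of the flag and not something confirmed in this thread; `streaming_handler` and `aggregate` are hypothetical names, and the real registration is shown only as a comment.

```python
from typing import Any, Dict, Iterator, List


def streaming_handler(job: Dict[str, Any]) -> Iterator[str]:
    """Hypothetical generator handler that yields one chunk per word."""
    for word in job["input"]["prompt"].split():
        yield word


def aggregate(chunks: Iterator[str]) -> List[str]:
    """Collect every yielded chunk, roughly what aggregation is assumed to do."""
    return list(chunks)


# In the worker the flag would be set at start-up, as in the snippet above:
#   runpod.serverless.start({
#       "handler": streaming_handler,
#       "return_aggregate_stream": True,
#   })

job = {"input": {"prompt": "one two three"}}
print(aggregate(streaming_handler(job)))
```

Without the flag, a generator handler's chunks would only be reachable through the stream endpoint, which would match a status response that says COMPLETED but carries no output.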