justin
RunPod
Created by justin on 6/12/2024 in #⚡|serverless
SDXL Quick Deploy through RunPod Doesn't Work
I sent in a test request like the one below, and it threw an error. There are other alternatives, so this isn't the end of the world for me, but I wanted to give feedback that I don't believe this feature works.
{
  "input": {
    "prompt": "A cute cat"
  }
}
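For context, a request body like this would typically be POSTed to the endpoint's /runsync route. Here is a minimal, standard-library-only sketch; the endpoint ID is a placeholder and the RUNPOD_API_KEY environment variable is an assumption:

```python
import json
import os
import urllib.request


def build_sdxl_request(prompt: str) -> dict:
    """Wrap a prompt in the standard RunPod serverless input envelope."""
    return {"input": {"prompt": prompt}}


def run_sync(endpoint_id: str, payload: dict, api_key: str) -> dict:
    """POST the payload to the endpoint's /runsync route and return the JSON reply."""
    req = urllib.request.Request(
        f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    payload = build_sdxl_request("A cute cat")
    # Needs a real endpoint ID and API key to actually run:
    # print(run_sync("YOUR_ENDPOINT_ID", payload, os.environ["RUNPOD_API_KEY"]))
```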
5 replies
Created by justin on 6/4/2024 in #⛅|pods
Too many Open Files Error on CPU Pod - Easy Repro
No description
10 replies
Created by justin on 5/9/2024 in #⚡|serverless
How to WebSocket to Serverless Pods
@Merrell / @flash-singh Wondering if I can get code pointers on how to use the API to expose ports on serverless programmatically, so that I can do things like WebSockets?
40 replies
Created by justin on 2/20/2024 in #⛅|pods
GraphQL Cuda Version
How do I make a GPU pod through GraphQL with a specified CUDA version? https://graphql-spec.runpod.io/#definition-PodFilter I assume this is possible since RunPod has it implemented, but are the docs up to date?
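For illustration, a deploy mutation pinned to specific CUDA versions might look like the sketch below. The `allowedCudaVersions` field name is an assumption and should be checked against the current GraphQL spec before relying on it; the GPU type and image are just examples.

```python
import json
import urllib.request

API_URL = "https://api.runpod.io/graphql"


def build_pod_mutation(gpu_type_id: str, image: str, cuda_versions: list) -> dict:
    """Build a podFindAndDeployOnDemand mutation body.

    NOTE: `allowedCudaVersions` is an assumed field name -- verify it
    against https://graphql-spec.runpod.io before use.
    """
    return {
        "query": """
        mutation CreatePod($input: PodFindAndDeployOnDemandInput) {
          podFindAndDeployOnDemand(input: $input) { id imageName machineId }
        }""",
        "variables": {
            "input": {
                "cloudType": "SECURE",
                "gpuCount": 1,
                "gpuTypeId": gpu_type_id,
                "imageName": image,
                "allowedCudaVersions": cuda_versions,
            }
        },
    }


def post_graphql(body: dict, api_key: str) -> dict:
    """Send a GraphQL request to the RunPod API."""
    req = urllib.request.Request(
        f"{API_URL}?api_key={api_key}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    body = build_pod_mutation("NVIDIA RTX A4000", "runpod/pytorch:2.1.0-py3.10-cuda12.1.1-devel", ["12.1", "12.2"])
    # Requires a real API key: print(post_graphql(body, "YOUR_API_KEY"))
```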
3 replies
Created by justin on 2/20/2024 in #⚡|serverless
How do multiple GPU priorities assign workers to me?
No description
8 replies
Created by justin on 2/18/2024 in #⚡|serverless
Serverless Unable to SSH / Use Jupyter Notebook Anymore
No description
10 replies
Created by justin on 2/18/2024 in #⚡|serverless
Editing Serverless Template ENV Variable
When I edit a serverless template env variable, does it update in real time? I can't quite tell, and I'm wondering what is happening under the hood. Do I need to refresh the workers myself, or will idle workers automatically pick up the new env variables when they go active?
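One defensive pattern, regardless of how the platform propagates edits: read the variable inside the handler rather than at import time, so any worker process started after the template edit sees the new value. This is a sketch, not RunPod's documented behavior; `MODEL_MODE` is a made-up example variable.

```python
import os


def handler(job):
    # Read env vars per request, not at module import time. A running worker
    # process only sees the environment it booted with, so template edits
    # take effect as workers cycle (idle -> active spawns fresh processes).
    mode = os.environ.get("MODEL_MODE", "default")  # MODEL_MODE: hypothetical example var
    return {"mode": mode, "echo": job.get("input")}


if __name__ == "__main__":
    # In a real worker this would be: runpod.serverless.start({"handler": handler})
    print(handler({"input": {"prompt": "hi"}}))
```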
5 replies
Created by justin on 2/17/2024 in #⚡|serverless
Does RunPod auto-update images now for non-matching hashes?
No description
2 replies
Created by justin on 2/17/2024 in #⚡|serverless
vLLM Memory Error / RunPod Error?
https://pastebin.com/vjSgS4up
Error initializing vLLM engine: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (24144). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
I got this error when I tried to start my vLLM Mistral serverless endpoint. It ended up fixing itself once I increased the GPU to the 24 GB GPU Pro tier, which made me guess the GPU just wasn't good enough (even though it was my CPU showing 100% usage). But the problem I have is: how do I stop it from erroring out and retrying infinitely if it happens again? Is it possible for RunPod or vLLM to catch this somehow? (The pastebin shows it worked eventually, because that log is from my second request after I upgraded the GPU; before that it just kept looping until I manually killed it.)
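The error message itself names the two knobs. A sketch of passing them through vLLM's offline `LLM` API (the parameter names `max_model_len` and `gpu_memory_utilization` come from vLLM; the model and values are just examples, and the actual engine start is guarded since it needs a GPU):

```python
def engine_args(max_model_len: int = 8192, gpu_memory_utilization: float = 0.95) -> dict:
    """Engine arguments addressing the KV-cache error.

    Lowering max_model_len (or raising gpu_memory_utilization) keeps the
    KV cache large enough for the configured maximum sequence length, so
    the engine can initialize instead of crash-looping.
    """
    return {
        "model": "mistralai/Mistral-7B-Instruct-v0.2",  # example model
        "max_model_len": max_model_len,
        "gpu_memory_utilization": gpu_memory_utilization,
    }


if __name__ == "__main__":
    from vllm import LLM  # requires a GPU worker with vllm installed

    llm = LLM(**engine_args())
```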
7 replies
Created by justin on 1/26/2024 in #⚡|serverless
Does async generator allow a worker to take off multiple jobs? Concurrency Modifier?
I was reading the RunPod docs and saw the code below. Does an async generator_handler mean that if I send 5 jobs, for example, one worker will just keep picking up new jobs? I also tried adding:
"concurrency_modifier": 4
"concurrency_modifier": 4
But when I queued up 10 jobs, it first maxed out the workers and left the remaining jobs sitting in the queue, rather than each worker picking up jobs up to its concurrency modifier. https://docs.runpod.io/serverless/workers/handlers/handler-async
import runpod
import asyncio


async def async_generator_handler(job):
    for i in range(5):
        # Generate an asynchronous output token
        output = f"Generated async token output {i}"
        yield output

        # Simulate an asynchronous task, such as processing time for a large language model
        await asyncio.sleep(1)


# Configure and start the RunPod serverless function
runpod.serverless.start(
    {
        "handler": async_generator_handler,  # Required: Specify the async handler
        "return_aggregate_stream": True,  # Optional: Aggregate results are accessible via /run endpoint
    }
)
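One possible explanation worth checking: the RunPod SDK's handler docs describe `concurrency_modifier` as a callable that receives the current concurrency and returns the desired maximum, not a bare integer, which may be why the value 4 appeared to have no effect. A sketch under that assumption:

```python
import asyncio


def concurrency_modifier(current_concurrency: int) -> int:
    # The SDK calls this between jobs; always returning 4 lets each
    # worker run up to 4 jobs concurrently. (Assumption: the SDK expects
    # a callable here, per the handler-concurrency docs.)
    return 4


async def async_generator_handler(job):
    for i in range(5):
        yield f"Generated async token output {i}"
        await asyncio.sleep(1)


if __name__ == "__main__":
    import runpod  # available inside the worker image

    runpod.serverless.start(
        {
            "handler": async_generator_handler,
            "return_aggregate_stream": True,
            "concurrency_modifier": concurrency_modifier,
        }
    )
```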
89 replies
Created by justin on 1/14/2024 in #⛅|pods
Feature Request / Is it possible RunpodCTL
Just sharing a wish / pending thought for the backlog: would it be possible to add a CLI command to runpodctl that generates SSH keys, lets me send the public key to another pod where it is automatically added to the authorized keys, and then opens a connection for a direct SCP file transfer? In my head that would be an amazing tool to have. I'd imagine something like:
Pod 1: runpodctl setup networking send ...
Pod 2: runpodctl receive networking ...
Then: Pod 1: runpodctl send file PodID2
5 replies
Created by justin on 1/1/2024 in #⚡|serverless
How much RAM do we have per Serverless endpoint?
I am curious how RAM usage works. On fly.io you can allocate RAM / CPU, but on RunPod we only choose the GPU and whatever VRAM that GPU has. If I am doing something memory-intensive like video / audio processing, will it just crash at some point because of that?
4 replies
Created by justin on 12/24/2023 in #⚡|serverless
Is dynamically setting a minimum worker viable?
Wondering about: https://docs.runpod.io/docs/create-serverless-endpoint#modify-an-existing-serverless-endpoint Let's say I have 5 throttled workers and I dynamically set the minimum workers to 1 or 2. Does it kick off throttled workers and honor the minimum? I've been thinking that when I'm experiencing heavier loads, I should dynamically increase the minimum active workers to guarantee processing time.
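The linked doc modifies an endpoint through a GraphQL mutation; a sketch of bumping `workersMin` that way is below. The `saveEndpoint` mutation and field names follow that doc but should be treated as assumptions and checked against the current spec (the input type may require additional fields).

```python
import json
import urllib.request


def build_min_worker_update(endpoint_id: str, workers_min: int) -> dict:
    """Build a saveEndpoint mutation raising the endpoint's minimum workers.

    NOTE: mutation and field names are assumptions based on the
    "modify an existing serverless endpoint" doc -- verify before use.
    """
    return {
        "query": """
        mutation SaveEndpoint($input: EndpointInput!) {
          saveEndpoint(input: $input) { id workersMin workersMax }
        }""",
        "variables": {"input": {"id": endpoint_id, "workersMin": workers_min}},
    }


def send(body: dict, api_key: str) -> dict:
    """Send the mutation to the RunPod GraphQL API."""
    req = urllib.request.Request(
        f"https://api.runpod.io/graphql?api_key={api_key}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    body = build_min_worker_update("YOUR_ENDPOINT_ID", 2)
    # Requires a real API key: print(send(body, "YOUR_API_KEY"))
```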
2 replies
Created by justin on 12/21/2023 in #⚡|serverless
Is the RunPod UI accurate when it says all workers are throttled?
No description
5 replies