justin
RunPod
Created by justin on 6/12/2024 in #⚡|serverless
SDXL Quick Deploy through RunPod Doesn't Work
I sent in a test request like the one below, and it threw an error. There are other alternatives, so this isn't the end of the world for me, but I wanted to give feedback that I don't believe this feature works.
{
  "input": {
    "prompt": "A cute cat"
  }
}
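For context, a request body like this would typically be POSTed to the endpoint's /runsync route. Here is a minimal, standard-library-only sketch; the endpoint ID is a placeholder and the RUNPOD_API_KEY environment variable is an assumption:

```python
import json
import os
import urllib.request


def build_sdxl_request(prompt: str) -> dict:
    """Wrap a prompt in the standard RunPod serverless input envelope."""
    return {"input": {"prompt": prompt}}


def run_sync(endpoint_id: str, payload: dict, api_key: str) -> dict:
    """POST the payload to the endpoint's /runsync route and return the JSON reply."""
    req = urllib.request.Request(
        f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    payload = build_sdxl_request("A cute cat")
    # Needs a real endpoint ID and API key to actually run:
    # print(run_sync("YOUR_ENDPOINT_ID", payload, os.environ["RUNPOD_API_KEY"]))
```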
5 replies
Created by justin on 6/4/2024 in #⛅|pods
Too many Open Files Error on CPU Pod - Easy Repro
No description
10 replies
Created by justin on 5/9/2024 in #⚡|serverless
How to WebSocket to Serverless Pods
@Merrell / @flash-singh Wondering if I can get code pointers on how to use the API to expose ports on serverless programmatically, so that I can do things like WebSockets?
40 replies
Created by justin on 2/20/2024 in #⛅|pods
GraphQL Cuda Version
How do I make a GPU pod through GraphQL with a specified CUDA version? https://graphql-spec.runpod.io/#definition-PodFilter I assume this is possible since RunPod has it implemented, but are the docs up to date?
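For illustration, a deploy mutation pinned to specific CUDA versions might look like the sketch below. The `allowedCudaVersions` field name is an assumption and should be checked against the current GraphQL spec before relying on it; the GPU type and image are just examples.

```python
import json
import urllib.request

API_URL = "https://api.runpod.io/graphql"


def build_pod_mutation(gpu_type_id: str, image: str, cuda_versions: list) -> dict:
    """Build a podFindAndDeployOnDemand mutation body.

    NOTE: `allowedCudaVersions` is an assumed field name -- verify it
    against https://graphql-spec.runpod.io before use.
    """
    return {
        "query": """
        mutation CreatePod($input: PodFindAndDeployOnDemandInput) {
          podFindAndDeployOnDemand(input: $input) { id imageName machineId }
        }""",
        "variables": {
            "input": {
                "cloudType": "SECURE",
                "gpuCount": 1,
                "gpuTypeId": gpu_type_id,
                "imageName": image,
                "allowedCudaVersions": cuda_versions,
            }
        },
    }


def post_graphql(body: dict, api_key: str) -> dict:
    """Send a GraphQL request to the RunPod API."""
    req = urllib.request.Request(
        f"{API_URL}?api_key={api_key}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    body = build_pod_mutation("NVIDIA RTX A4000", "runpod/pytorch:2.1.0-py3.10-cuda12.1.1-devel", ["12.1", "12.2"])
    # Requires a real API key: print(post_graphql(body, "YOUR_API_KEY"))
```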
3 replies
Created by justin on 2/20/2024 in #⚡|serverless
How do multiple GPU priorities assign workers to me?
No description
8 replies
Created by justin on 2/18/2024 in #⚡|serverless
Serverless Unable to SSH / Use Jupyter Notebook Anymore
No description
10 replies
Created by justin on 2/18/2024 in #⚡|serverless
Editing Serverless Template ENV Variable
When I edit a serverless template env variable, does it update in real time? I can't quite tell, and I'm wondering what is happening under the hood. Do I need to refresh the workers myself, or will idle workers automatically pick up the new env variables when they go active?
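One defensive pattern, regardless of how the platform propagates edits: read the variable inside the handler rather than at import time, so any worker process started after the template edit sees the new value. This is a sketch, not RunPod's documented behavior; `MODEL_MODE` is a made-up example variable.

```python
import os


def handler(job):
    # Read env vars per request, not at module import time. A running worker
    # process only sees the environment it booted with, so template edits
    # take effect as workers cycle (idle -> active spawns fresh processes).
    mode = os.environ.get("MODEL_MODE", "default")  # MODEL_MODE: hypothetical example var
    return {"mode": mode, "echo": job.get("input")}


if __name__ == "__main__":
    # In a real worker this would be: runpod.serverless.start({"handler": handler})
    print(handler({"input": {"prompt": "hi"}}))
```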
5 replies
Created by justin on 2/17/2024 in #⚡|serverless
Does RunPod auto-update images now for non-matching hashes?
No description
2 replies
Created by justin on 2/17/2024 in #⚡|serverless
vLLM Memory Error / RunPod Error?
https://pastebin.com/vjSgS4up
Error initializing vLLM engine: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (24144). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
I got this error when I tried to start my vLLM Mistral serverless endpoint. It ended up fixing itself once I increased the GPU to the 24 GB GPU Pro tier, which made me guess the GPU just wasn't good enough (even though it was my CPU showing 100% usage). But the problem I have is: how do I stop it from erroring out and retrying infinitely if it happens again? Is it possible for RunPod or vLLM to catch this somehow? (The pastebin shows it worked eventually, because that log is from my second request after I upgraded the GPU; before that it just kept looping until I manually killed it.)
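The error message itself names the two knobs. A sketch of passing them through vLLM's offline `LLM` API (the parameter names `max_model_len` and `gpu_memory_utilization` come from vLLM; the model and values are just examples, and the actual engine start is guarded since it needs a GPU):

```python
def engine_args(max_model_len: int = 8192, gpu_memory_utilization: float = 0.95) -> dict:
    """Engine arguments addressing the KV-cache error.

    Lowering max_model_len (or raising gpu_memory_utilization) keeps the
    KV cache large enough for the configured maximum sequence length, so
    the engine can initialize instead of crash-looping.
    """
    return {
        "model": "mistralai/Mistral-7B-Instruct-v0.2",  # example model
        "max_model_len": max_model_len,
        "gpu_memory_utilization": gpu_memory_utilization,
    }


if __name__ == "__main__":
    from vllm import LLM  # requires a GPU worker with vllm installed

    llm = LLM(**engine_args())
```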
7 replies
Created by justin on 1/26/2024 in #⚡|serverless
Does async generator allow a worker to take off multiple jobs? Concurrency Modifier?
I was reading the RunPod docs and saw the code below. Does an async generator_handler mean that if I send 5 jobs, for example, one worker will just keep picking up new jobs? I also tried adding:
"concurrency_modifier": 4
"concurrency_modifier": 4
But when I queued up 10 jobs, it first maxed out the workers and left the remaining jobs sitting in the queue, rather than each worker picking up jobs up to its concurrency modifier. https://docs.runpod.io/serverless/workers/handlers/handler-async
import runpod
import asyncio


async def async_generator_handler(job):
    for i in range(5):
        # Generate an asynchronous output token
        output = f"Generated async token output {i}"
        yield output

        # Simulate an asynchronous task, such as processing time for a large language model
        await asyncio.sleep(1)


# Configure and start the RunPod serverless function
runpod.serverless.start(
    {
        "handler": async_generator_handler,  # Required: Specify the async handler
        "return_aggregate_stream": True,  # Optional: Aggregate results are accessible via /run endpoint
    }
)
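One possible explanation worth checking: the RunPod SDK's handler docs describe `concurrency_modifier` as a callable that receives the current concurrency and returns the desired maximum, not a bare integer, which may be why the value 4 appeared to have no effect. A sketch under that assumption:

```python
import asyncio


def concurrency_modifier(current_concurrency: int) -> int:
    # The SDK calls this between jobs; always returning 4 lets each
    # worker run up to 4 jobs concurrently. (Assumption: the SDK expects
    # a callable here, per the handler-concurrency docs.)
    return 4


async def async_generator_handler(job):
    for i in range(5):
        yield f"Generated async token output {i}"
        await asyncio.sleep(1)


if __name__ == "__main__":
    import runpod  # available inside the worker image

    runpod.serverless.start(
        {
            "handler": async_generator_handler,
            "return_aggregate_stream": True,
            "concurrency_modifier": concurrency_modifier,
        }
    )
```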
89 replies
Created by justin on 1/14/2024 in #⛅|pods
Feature Request / Is it possible RunpodCTL
Just sharing a wish / pending thought for the backlog: would it be possible to add a CLI command to runpodctl that generates SSH keys, lets me send the public key to another pod where it is automatically added to the authorized keys, and then opens a connection for a direct SCP file transfer? In my head that would be an amazing tool to have. I'd imagine something like:
Pod 1: runpodctl setup networking send ...
Pod 2: runpodctl receive networking ...
Then: Pod 1: runpodctl send file PodID2
5 replies
Created by justin on 1/1/2024 in #⚡|serverless
How much RAM do we have per Serverless endpoint?
I am curious how RAM usage works. On fly.io you can allocate RAM / CPU, but on RunPod we only choose the GPU and whatever VRAM that GPU has. If I am doing something memory-intensive like video / audio processing, will it just crash at some point because of that?
4 replies
Created by justin on 12/24/2023 in #⚡|serverless
Is dynamically setting a minimum worker viable?
Wondering about: https://docs.runpod.io/docs/create-serverless-endpoint#modify-an-existing-serverless-endpoint Let's say I have 5 throttled workers and I dynamically set the minimum workers to 1 or 2. Does it kick off throttled workers and honor the minimum? I've been thinking that when I'm experiencing heavier loads, I should dynamically increase the minimum active workers to guarantee processing time.
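The linked doc modifies an endpoint through a GraphQL mutation; a sketch of bumping `workersMin` that way is below. The `saveEndpoint` mutation and field names follow that doc but should be treated as assumptions and checked against the current spec (the input type may require additional fields).

```python
import json
import urllib.request


def build_min_worker_update(endpoint_id: str, workers_min: int) -> dict:
    """Build a saveEndpoint mutation raising the endpoint's minimum workers.

    NOTE: mutation and field names are assumptions based on the
    "modify an existing serverless endpoint" doc -- verify before use.
    """
    return {
        "query": """
        mutation SaveEndpoint($input: EndpointInput!) {
          saveEndpoint(input: $input) { id workersMin workersMax }
        }""",
        "variables": {"input": {"id": endpoint_id, "workersMin": workers_min}},
    }


def send(body: dict, api_key: str) -> dict:
    """Send the mutation to the RunPod GraphQL API."""
    req = urllib.request.Request(
        f"https://api.runpod.io/graphql?api_key={api_key}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    body = build_min_worker_update("YOUR_ENDPOINT_ID", 2)
    # Requires a real API key: print(send(body, "YOUR_API_KEY"))
```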
2 replies
Created by justin on 12/21/2023 in #⚡|serverless
Is the RunPod UI accurate when it says all workers are throttled?
No description
5 replies